

US008661422B2

# (12) United States Patent Lethin et al.

## (10) Patent No.: US 8,661,422 B2 (45) Date of Patent: Feb. 25, 2014

### (54) METHODS AND APPARATUS FOR LOCAL MEMORY COMPACTION

(75) Inventors: Richard A. Lethin, New York, NY (US);
Allen K. Leung, New York, NY (US);
Benoit J. Meister, New York, NY (US);
Nicolas T. Vasilache, New York, NY
(US); David E. Wohlford, Portland, OR

(US)

(73) Assignee: **Reservoir Labs, Inc.**, New York, NY

(US)

(\*) Notice: Subject to any disclaimer, the term of this

patent is extended or adjusted under 35

U.S.C. 154(b) by 1287 days.

(21) Appl. No.: 12/365,780

(22) Filed: Feb. 4, 2009

(65) Prior Publication Data

US 2010/0192138 A1 Jul. 29, 2010

#### Related U.S. Application Data

- (60) Provisional application No. 61/065,294, filed on Feb. 8, 2008.
- (51) **Int. Cl. G06F 9/45** (2006.01)

#### (56) References Cited

#### U.S. PATENT DOCUMENTS

| 5,442,699 A | 8/1995 | Arnold et al.   |
|-------------|--------|-----------------|
| 5,442,797 A | 8/1995 | Casavant et al. |
| 5,613,136 A | 3/1997 | Casavant et al. |

| 5,742,814 | A  | 4/1998  | Balasa et al.      |
|-----------|----|---------|--------------------|
| 5,920,854 | A  | 7/1999  | Kirsch et al.      |
| 5,953,531 | A  | 9/1999  | Megiddo et al.     |
| 6,006,033 | A  | 12/1999 | Heisch             |
| 6,018,735 | A  | 1/2000  | Hunter             |
| 6,038,398 | A  | 3/2000  | Schooler           |
| 6,131,092 | A  | 10/2000 | Masand             |
| 6,279,113 | B1 | 8/2001  | Vaidya             |
| 6,327,699 | B1 | 12/2001 | Larus et al.       |
| 6,338,057 | B1 | 1/2002  | Weeks              |
| 6,651,246 | B1 | 11/2003 | Archambault et al. |
| 6,754,650 | B2 | 6/2004  | Cho et al.         |
|           |    |         |                    |

(Continued)

#### OTHER PUBLICATIONS

International Report on Patentability dated Mar. 31, 2011 for PCT Application No. PCT/US2009/057194.

(Continued)

Primary Examiner — Idriss N Alrobaye
Assistant Examiner — Brooke Taylor
(74) Attorney, Agent, or Firm — Goodwin Procter LLP

#### (57) ABSTRACT

Methods, apparatus and computer software product for local memory compaction are provided. In an exemplary embodiment, a processor in connection with a memory compaction module identifies inefficiencies in array references contained within in received source code, allocates a local array and maps the data from the inefficient array reference to the local array in a manner which improves the memory size requirements for storing and accessing the data. In another embodiment, a computer software product implementing a local memory compaction module is provided. In a further embodiment a computing apparatus is provided. The computing apparatus is configured to improve the efficiency of data storage in array references. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

#### 39 Claims, 9 Drawing Sheets



#### (56) References Cited

#### U.S. PATENT DOCUMENTS

| 6,772,415    | B1  | 8/2004  | Danckaert et al.           |
|--------------|-----|---------|----------------------------|
| 6,785,677    | Bi  | 8/2004  | Fritchman                  |
|              | BI  | 9/2004  | Shanklin et al.            |
| 6,880,087    | Bi  | 4/2005  | Carter                     |
| 6,912,526    |     | 6/2005  | Akaboshi                   |
| 6,952,694    |     | 10/2005 | Mathur et al.              |
| 6,952,821    | B2  | 10/2005 | Schreiber                  |
| 7,086,038    |     | 8/2006  | Cronquist et al.           |
| 7,185,327    | B2  | 2/2007  | Scales                     |
| 7,225,188    | BI  | 5/2007  | Gai et al.                 |
| 7,260,558    |     | 8/2007  | Cheng et al.               |
| 7,594,260    | B2  | 9/2009  | Porras et al.              |
| 7,634,566    |     | 12/2009 | Turner et al.              |
| 7,757,222    |     | 7/2010  | Liao et al.                |
|              | B2  | 12/2011 | Eichenberger et al.        |
| 8,108,845    |     | 1/2012  | Little et al.              |
| 8,230,408    |     | 7/2012  | Eng                        |
| 8,250,550    | B2  | 8/2012  | Luszczek et al.            |
| 8,255,890    | B2  | 8/2012  | Luszczek et al.            |
| 8,307,347    | B2  | 11/2012 | Austin et al.              |
| 2002/0021838 | Al  | 2/2002  | Richardson et al.          |
| 2003/0097652 | Al  | 5/2003  | Roediger et al.            |
| 2004/0034754 | A1  | 2/2004  | Schreiber                  |
| 2004/0068501 | A1  | 4/2004  | McGoveran                  |
| 2005/0114700 | Al  | 5/2005  | Barrie et al.              |
| 2006/0048121 | A1  | 3/2006  | Blainey et al.             |
| 2006/0048123 | Al  | 3/2006  | Martin                     |
| 2006/0085858 | Al  | 4/2006  | Noel et al.                |
| 2007/0033367 | A1* | 2/2007  | Sakarda et al 711/170      |
| 2007/0074195 | A1  | 3/2007  | Liao et al.                |
| 2007/0192861 | A1  | 8/2007  | Varghese et al.            |
| 2008/0010680 | A1  | 1/2008  | Cao et al.                 |
| 2009/0037889 | A1* | 2/2009  | Li et al 717/140           |
| 2009/0083724 | A1* | 3/2009  | Eichenberger et al 717/160 |
| 2009/0119677 | A1  | 5/2009  | Stefansson et al.          |
| 2009/0259997 | A1  | 10/2009 | Grover et al.              |
| 2009/0307673 | A1  | 12/2009 | Eichenberger et al.        |
| 2010/0050164 | A1  | 2/2010  | Van De Waerdt et al.       |
| 2010/0162225 | Al  | 6/2010  | Huang et al.               |
|              |     |         | -                          |

#### OTHER PUBLICATIONS

International Search Report and the Written Opinion dated Mar. 18, 2010 for PCT Application No. PCT/US2009/057194.

Software Tools to Optimize BMD Radar Algorithms to COTS Hardware: Phase II Proposal, Reservoir Labs, Inc., Topic No. MDA06-031, Proposal No. B2-1415.

Optimizing and Mapping Tool Chain for FPGA Programming—Phase II Proposal, Reservoir Labs, Inc., Topic No. SB062-006, Proposal No. D2-0627.

Darte and Vivien's Algorithm, "Chapter 5: Parallelism Detection In Nested Loops", pp. 193-226.

"The Cell Roadmap", Published on PPCNUX at http://www.ppcnux.com/?q=print/6666.

"ClearSpeed™ Introductory Programming Manual—The ClearSpeed Software Development Kit", ClearSpeed Technology Inc. 2007.

"ClearSpeed™ ClearSpeed Programming Model: An introduction", ClearSpeed Technology Inc. 2007.

"ClearSpeed™ ClearSpeed Programming Model: Card-side Libraries", ClearSpeed Technology Inc. 2007.

"ClearSpeed<sup>TM</sup> ClearSpeed Programming Model: Optimizing Performance", ClearSpeed Technology Inc. 2007.

Ayers et al, Aggressive inlining, PLDI '92 Las Vegas, NV, USA.

Bastoul, "Efficient Code Generation for Automatic Parallelization and Optimization", Proceedings of the Second International Symposium on Parallel and Distributed Computing, 2003.

Bastoul, "Code Generation in the Polyhedral Model Is Easier Than You Think", Proceedings of the 13<sup>th</sup> International Conference on Parallel Architecture and Compilation Techniques, 2004.

Bastoul et al, "Putting Polyhedral Loop Transformations to Work", INRIA, No. 4902, Jul. 2003.

Bondhugula et al, "Automatic Mapping of Nested Loops to FPGAs", OSU, Mar. 19, 2007.

Bondhugula et al, "A Practical and Fully Automatic Polyhedral Program Optimization System", OSU OSU-CISRC-10/07-TR70, Dec. 14, 2007.

Cifuentes, "Structuring Decompiled Graphs", Department of Computer Science, Univ. of Tasmania, 1994.

Cifuentes, "Structuring Decompiled Graphs", Department of Computer Science, Univ. of Tasmania, 1996.

Clauss et al, "Deriving Formulae to Count Solutions to Parameterized Linear Systems using Ehrhart Polynomials: Applications to the Analysis of Nested-Loop Programs", Apr. 10, 1997.

Collard et al, "Automatic Generation of Data Parallel Code", Proceedings of the Fourth International Workshop on Compilers for Parallel Computers, Dec. 1993.

Collberg et al, "Manufacturing Cheap, Resilient, and Stealthy Opaque Constructs", POPL 98, San Diego, CA 1998.

Darte et al, "Revisiting the decomposition of Karp, Miller and Winograd", Parallel Processing Letters, 1995.

Feautrier, "Array Expansion", Labratoire PRiSM, Jul. 1998.

Feautrier, "Some efficient solutions to the affine scheduling problem Part I One-dimensional Time", Laboratoire MASI, Institute Blaise Pascal, Universite de Versailles St-Quentin, Apr. 23, 1993.

Ferrante et al, "The Program Dependence Graph and Its Use in Optimization", ACM Transactions on Programming Languages and Systems, vol. 9, No. 3, Jul. 1987, pp. 319-349.

Franke et al, "Compiler Transformation of Pointers to Explicit Array Accesses in DSP Applications", Institute for Computing Systems Architecture (ICSA), University of Edinburgh.

Gautam et al, "The Z-Polyhedral Model", PPoPP'07, San Jose, CA Mar. 14-17, 2007.

Griebl, "On the Mechanical Tiling of Space-Time Mapped Loop Nests", Fakultat fur Mthemetik and Informatik, Universitat Passau, Germany

Griebl et al, "Space-Time Mapping and Tiling: A Helpful Combination", Concurrency and Comput.: Pract. Exper. 2004, 16:221-246. Griebl, "Automatic Parallelization of Loop Programs for Distributed Memory Architectures" Fakultat fur Mathematik und Informatik,

Jun. 2, 2004. Griebl et al, "Forward Communication Only Placements and their Use for Parallel Program Construction", University of Passau.

Irigoin et al, "Supernode Partitioning", Proceedings of the 15<sup>th</sup> Annual ACM, SIGACT-SIGPLAN Symposium on Principles of Programming Languages, San Diego, CA, Jan. 1988.

Jimenez et al, "Register Tiling in Nonrectangular Iteration Spaces", ACM Transactions on Programming Languages and Systems, vol. 24, No. 4, pp. 409-453, Jul. 2002.

Kandemir et al, "Optimizing Spatial Locality in Loop Nests using Linear Algebra", Proc. 7<sup>th</sup> International Workshop on Compliers for Parallel Computers, Sweden Jun. 1998.

Lethin, "Software Tools to Optimize BMD Radar Algorithms to COTS Hardware—Final Report", Sep. 12, 2007.

Lethin et al, "Mapping Loops for the ClearSpeed Processor Using the R-Stream Compiler", Feb. 4, 2008.

Lethin et al, "The R-Stream 3.0 Compiler", Feb. 4, 2008.

Lim et al, "Maximizing Parallelism and Minimizing Synchronization with Affine Transforms", 24<sup>th</sup> Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Paris, France, Jan. 1997.

Loechner et al, "Precise Data Locality Optimization of Nested Loops", The Journal of Supercomputing, 21, pp. 37-76, 2002.

Meister et al, "Optimizing and Mapping Tool Chain for FPGA Programming—Final Report Phase 1 SBIR Project", Sep. 28, 2007.

Pop et al, "Induction Variable Analysis with Delayed Abstractions", ACM Transactions on Architecture and Code Optimization, vol. V, No. N, pp. 1-30, Aug. 2005.

Pop et al, "Fast Recognition of Scalar Evolutions on Three-Address SSA Code", CRI/ENSMP Research Report, A/354/CRI, Apr. 1, 2004

Quillere et al, "Generation of Efficient Nested Loops from Polyhedra" 2000 Kluwer Academic Publishers, 2000.

Quinton et al, "On Manipulating Z-polyhedra", IRISA, Publication Interne No. 1016, Jul. 1996.