Greenthumb: Superoptimizer Construction Framework
GreenThumb is an extensible framework for constructing superoptimizers. It is designed to be easily extended to a new target ISA using inheritance. GreenThumb is implemented in Racket.
The top level directory contains ISA-independent files, which implement the superclasses to be extended. We have built superoptimizers for ARM, GreenArrays GA144 (GA), and a small subset of LLVM IR. Directories
llvm contain ISA-specific files for ARM, GA, and LLVM IR respectively.
- Greenthumb: Superoptimizer Construction Framework (CC'16) explains the framework overview.
- Scaling Up Superoptimization (ASPLOS'16) explains the search strategy provided by GreenThumb.
- Racket: Download and install drracket version 6.7 from https://racket-lang.org/download/. Other versions of Racket may work, but they are not tested. Include the installed 'bin' directory, which contains racket, drracket, raco, and etc., to the environment path.
- Rosette: Download Rosette v1.1 and follow the instruction to install in Rosette's README file. Don't forget to put Z3 executable in rosette/bin, as GreenThumb depends on Z3 (but not CVC4).
git clone https://github.com/mangpo/greenthumb.git cd greenthumb make
path.rkt will be generated.
Running an Existing Superoptimizer
For example, we will walk through how one can run the ARM supertoptimizer.
- A code file, containing a straight-line ARM code. For example, see
- A meta-data file, containing live-in and live-out information. The name of this file has to be the same as the code file appended with '.info', for example,
arm/programs/p14_floor_avg_o0.s.info. The content of the meta-data file for ARM should look like the following:
The first line indicates live-out registers, and the second line indicates live-in registers. In this example, a live-out register is R0, and live-in registers are R0 and R1. Note that currently the tool only supports R registers. Live-in information is no longer used in GreenThumb 2.0.
You can superoptimize one of the example programs or your own program by running
cd arm racket optimize.rkt <search-type> <search-mode> -c <number-of-search-instances> \ -t <time-limit-in-sec> <file_to_optimize>
--enum(enumerative search) or
- When using
--linear: optimizing by reducing cost incrementally,
--binary: optimizing by binary searching on the cost, or
--partial: optimizing by synthesizing small parts of the given code using context-aware window decomposition. Option
- When using
--synthesize: each search instance starts the search from a random program, or
--optimize: each search instance starts the search from the original input program.
For example, to optimize
arm/programs/p14_floor_avg_o0.s using cooperative search running all search techniques using eight search instances for an hour, run
racket optimize.rkt --hybrid -p -c 8 -t 3600 programs/p14_floor_avg_o0.s
racket optimize.rkt --help to see all supported arguments and what their default values are.
An output directory containing
driver-<id>.rkt files will be created. The default name of the output directory is
-d flag to specify the output directory's name. Each
driver-<id>.rkt file runs a search instance.
- updates the shared file
best.s, which always contains the current best program, if the search instance finds a better program than the current best.
summaryfile, which contains the statistic of best programs found at different points of time.
- writes debug and error messages to
At the beginning, the search driver will report the numbers of search instances allocated for different search techniques.
SEARCH INSTANCES ---------------- stoch: 2 instances sym: 2 instances enum: 4 instances ID 0-1: stoch (optimize) << driver-0 and 1 run stochastic search. ID 2-2: sym (window=L) << driver-2 runs symbolic search. ID 3-3: sym (window=2L) << driver-3 runs symbolic search. ID 4-4: enum (no-decomposition) << driver-4 runs enumerative search (no window). ID 5-6: enum (window=L) << driver-5 and 6 run enumerative search. ID 7-7: enum (window=2L) << driver-7 run enumerative search.
X in (window=X) indicates the size of window used in the context-aware window decomposition.
Then, the search driver will report the status of the search every 10 seconds.
// This first part is the statistics from all stochastic search instances. elapsed time: 74 mutate time: 0.080541 << (time spent on mutating)/total simulate time: 0.126095 << (time spent on interpreting)/total check time: 0.054595 << (time spent on checking actual outputs against expected outputs)/total validate time: 0.702649 << (time spent on validating the correctness using solver)/total iterations/s: 480.18 best cost: 5 << best cost despite the correctness best correct cost: 5 << best performance cost considering the correctness best correct time: 41 << time in seconds to find the best correct proposal Mutate Proposed Accepted Accepted/Proposed opcode 0.282484 0.058890 0.208473 operand 0.182736 0.018227 0.099747 swap 0.266824 0.092875 0.348078 inst 0.118275 0.017906 0.151396 nop 0.149678 0.015671 0.10470257303795003 accept-count: 0.047734 << rate of accepting mutated programs accept-higher-count: 0.000457 << rate of accepting mutated programs with higher cost // This second part summarizes the charateristics of the best program found so far. // This section only appears if the superoptimizer found better program(s). =============== SUMMARY =============== cost: 3 << cost of the best program found so far len: 3 << length of the best program found so far time: 44 << time in seconds to find the best program output/0/driver-7 << the best program is found by driver-7 (enumerative search in this example).
Press Ctrl-C to end the process early or wait until time is up. At the end, the search driver will print out the optimized program. Below is the output program when optimizing p14_floor_avg_o0.s.
OUTPUT ------ and r2, r0, r1 eor r3, r0, r1 add r0, r2, r3, asr 1
Reduce Memory Usage
To reduce the memory usage and the overhead of Racket translating program to bytecode, we can precompile our superoptimizer application to bytecode by running in command line:
raco make <list_of_flies_to_be_compiled>
For example, in
ARM directory, run:
raco make test-search.rkt ../parallel-driver.rkt
- Extending GreenThumb to a New ISA
- Adding more instructions to an existing superoptimizer
- Special Objects for Program State
- Advanced Usage
Inquery and Bug Report
If you are interested in using GreenThumb, please feel free to contact mangpo [at] eecs.berkeley.edu.