
ROSE based tools

Nathan Pinnow edited this page Aug 20, 2019 · 11 revisions

ROSE-Based Tools

A list of tools based on ROSE.

Source transformation tools

identityTranslator: the simplest source-to-source translator built using ROSE.

dotGenerator and dotGeneratorWholeASTGraph: generate a dot graph dump of the AST.

pdfGenerator: generates a PDF dump of the AST.

autoPar: automatic parallelization using OpenMP.

Declaration Move Tool: re-scopes variable declarations.

Binary analysis tools

Binary analysis tools all show a Unix-style man page when invoked with "--help". This list gives the location of the tool and a brief description of what it does. The locations are:

  • ROSE: You can find the source code for the tool in the ROSE repository, usually under the "projects" directory, and most often in "projects/BinaryAnalysisTools".
  • Megachiropteran: these tools are available to collaborators and are mostly useful for debugging. BTW, megachiropteran means "big BAT" and "BAT" stands for "binary analysis tool". It also has the distinction of being one half of the most complex common anagram pair in the English language (I'll leave it to you to find the other word, which is much more common than megachiropteran).
Name Location Purpose
astDiffBinary ROSE Computes various edit distance metrics to measure the distance between two binary functions.
bat-ana Megachiropteran This tool runs the initial steps needed by almost all other analysis tools: parsing ELF and PE containers, initializing simulated virtual memory; deciding which addresses are instructions and decoding them; partitioning decoded addresses into basic blocks and functions; generating the global control flow graph (CFG) and address usage map (AUM); optionally running post-partitioning analyses such as may-return, stack-delta, calling-convention, etc. The results are saved in a binary analysis state file (*.rba) that can be read by other tools.
bat-arraybounds Megachiropteran Statically analyzes binaries to find out-of-bound reads and writes for arrays.
bat-cc Megachiropteran This tool runs the ROSE calling-convention analysis and either reports the results or inserts them into the binary analysis state file.
bat-cfg Megachiropteran Prints various kinds of control flow graphs, such as the global CFG, function CFGs, or region CFGs. It can show the CFG in human readable format or as GraphViz output and has numerous switches for controlling the style.
bat-cg Megachiropteran Prints information about the function call graph. It can show human-readable information about individual functions (the function's callers and callees), or it can produce GraphViz output of the entire call graph.
bat-container Megachiropteran Prints information about ELF and PE containers.
bat-dis Megachiropteran Disassembly lister. This tool reads a binary analysis state file and produces assembly listings. It has numerous switches to control the format of the output. Note that ROSE listings are intended for human consumption and it generally doesn't work to feed them into an assembler to produce a new binary.
bat-dis-simple Megachiropteran Similar to bat-dis but all command-line switches default to values that produce a simplified assembly listing.
bat-entropy Megachiropteran Measures symbol entropy in a sliding window through virtual memory. This is different from most other entropy tools because it scans virtual memory rather than just the executable file itself.
bat-insert-call Megachiropteran Patches a binary by inserting new code and calls to the new code.
bat-io Megachiropteran Static analysis to analyze I/O characteristics of an executable.
bat-linear Megachiropteran Linear (ordered by memory address) assembly listing. This is a trivial disassembler that disassembles each address before moving on to the instruction at the following address. No attempt is made to organize instructions into basic blocks or functions.
bat-lsb Megachiropteran Reads a binary analysis state file and lists information about each basic block, such as the number of instructions and the virtual address segments. Note that ROSE's definition of basic block is different than some other tools in that ROSE does not require the instructions to be adjacent to each other in memory; a basic block can have internal unconditional branches as long as no interior instruction is a successor of some other basic block. The output is intended to be in a format that's easily used by other tools.
bat-lsd Megachiropteran Reads a binary analysis state file and lists information about each static data block.
bat-lsf Megachiropteran Reads a binary analysis state file and lists information about each function. The output is intended to be in a format that's easily used by other tools.
bat-mem Megachiropteran Reads a binary analysis state file and shows information about the simulated virtual memory. The memory can be output in a variety of formats such as raw memory dumps with a text index file, Motorola S-Records, Intel HEX files, hexdump-style output, etc.
bat-nullptr Megachiropteran Static analysis to find possible null pointer dereferences and the inputs that would cause such an execution path to be taken.
bat-prop Megachiropteran Prints various properties about a specimen in an output format intended to be used by shell scripts.
bat-reachable Megachiropteran Computes reachability information through the global control flow graph.
bat-stack-deltas Megachiropteran Reads a binary analysis state file, runs the stack delta analysis for functions where it hasn't already been run (in parallel), and reports the results in a format that's easy for other tools to read. The output consists of memory address intervals for instruction sequences and how their incoming and outgoing stack pointers relate to the initial stack pointer at the start of the function. This is useful for unwinding call frames when there's no frame pointer register.
bat-window Megachiropteran Measures code likelihood in a sliding window through virtual memory. This can be used to find what parts of virtual memory are likely to contain code versus data.
BinaryCloneDetection ROSE This is a suite of tools to incrementally build a database and query it for results in order to find similar functions across executables based on their symbolic behavior.
rose-binary-to-source ROSE Generates low-level C source code for any architecture for which ROSE has instruction semantics.
bROwSE-server ROSE This was our first foray into embedding the ROSE library in a web server. The server analyzes the binary and the user connects to it with a web browser to see results. The server supports analyzing any binary format and architecture supported by ROSE, interactive adjustments to the virtual memory map, disassembling and partitioning, program indexed by functions showing various function properties that can be sorted, graph-based disassembly listings using the CFG, traditional linear assembly listings, cross references for constants, decoded strings, hexdumps of virtual memory, symbolic data-flow results, lists of magic numbers, etc.
rose-check-execution-addresses ROSE Compares a dynamically-generated execution trace of a Linux program with the statically-generated global control flow graph and reports differences.
rose-debug-semantics ROSE Instruction semantics trace tool. This tool runs instruction semantics in various domains and reports all operations and states per basic block. Its main purpose is to have an easy and extensible way for users to check whether semantics are operating as they expect, and to be able to report bugs that the ROSE developers can reproduce.
detectConstants ROSE Demonstrates a few ways that constants can be found in binaries, such as by traversal of the AST or by examining machine states after a data-flow analysis.
rose-dump-memory ROSE A tool to print information about virtual memory and to extract data in a few different formats. As with most binary analysis tools, this one can examine data in files, or simulated virtual memory initialized from ELF and PE files, raw memory dumps, or stopped or running Linux processes.
rose-dwarf-lines ROSE This tool reads DWARF debugging information from an executable compiled with "-g" and emits two mappings: one that shows which source code files and lines correspond to each instruction, and vice versa.
rose-find-dead-code ROSE This is a simple demonstration of how the global control flow graph can be used to propagate reachability information. It uses a custom implementation to propagate reachability, which is a good example of how to traverse CFGs, but it's been superseded by a better analysis built into the ROSE library.
rose-find-path ROSE Given end-points of an execution path and addresses and/or branches to be avoided, this tool determines whether any execution path exists and the input conditions necessary to drive that path. Most of this tool is superseded by a path feasibility analysis that's part of the ROSE library.
rose-find-similar-functions ROSE Given two closely related executables, this tool finds the best mapping of functions from one executable to the other. This is for the common case when functions don't have names. It correlates functions by computing a difference metric (several are implemented) to create a bipartite graph, then solving the minimum weight perfect matching problem with the Kuhn-Munkres algorithm. This tool is superseded by a matching analysis that's part of the ROSE library.
rose-generate-paths ROSE This tool generates source code of arbitrary size with various kinds of control flow paths in order to test algorithms that operate on control flow graphs.
rose-linear-disassemble ROSE This is a simple linear sweep disassembler: it starts at some specified address(es) and decodes instructions one after another with no regard for control flow.
rose-magic-scanner ROSE Scans binaries for magic numbers and reports them like the Unix "file" command. The differences between this tool and "file" are that this tool scans every address instead of just the beginning of the file, and it scans the simulated virtual memory rather than the file (i.e., addresses are virtual memory addresses rather than file offsets). Like most ROSE-based binary analysis tools, the virtual memory can be constructed from raw files, ELF or PE containers, Motorola S-Records, or running or stopped Linux processes.
rose-max-bijection ROSE This is a low-level tool that computes a minimum-cost 1:1 mapping between two sets of integers. It's used as part of a work flow to find where code should be mapped in memory if it is not position independent and no starting address is known.
rose-missing-semantics ROSE Shell script that runs some ROSE tools in order to obtain statistics about which instructions are missing semantics. This is mainly used as a development tool to guide how resources are used to implement instruction semantics.
rose-native-execution-trace ROSE Generates a simple execution trace by single-stepping a Linux executable within the ROSE debugger.
rose-recursive-disassemble ROSE The full-fledged ROSE disassembler with lots of command-line switches to control specimen loading into simulated virtual memory, the disassembly and partitioning process, analyses run by the partitioner, output of virtual memory information, statistics about the CFG and instruction cache, a detailed list of all CFG information, output of various kinds of control flow graphs and function call graphs, detailed information about what's at each virtual address, list of addresses that were not used during parsing, an index of functions, a list of instruction addresses for input to other tools, lists of string literals in various encodings, all details about PE or ELF containers, and assembly listings.
rose-xml ROSE This tool generates an XML representation of various ROSE data structures, such as the entire Partitioner binary analysis state, or the binary components of an AST. Various free tools are able to convert XML to JSON.
rose-simulate ROSE Parses and loads a specimen and then executes its instructions in a concrete semantic domain.
rose-string-decoder ROSE Finds strings of various formats using the string analysis in the ROSE library. This analysis is able to search for ASCII, Unicode, etc. with variable width character encoding, termination or run-length encoding, etc. It can search for multiple encodings simultaneously and reports the string literal and encoding information. Another way it differs from the Unix "strings" command is that it searches the simulated virtual address space rather than a file. Like most ROSE binary analysis tools, the virtual memory can be constructed in a variety of ways.
rose-symbolic-simplifier ROSE This implements a read-eval-print loop (REPL) for symbolic expressions, where the "eval" step sends the expression through ROSE's built-in simplification layer. This layer is not a full simplifier like what might be found in SMT solvers, but rather is tuned for those situations that commonly occur when emulating instructions symbolically. Its purpose is to give the user a way to interactively discover how the simplifier works, and to report bugs that the ROSE team can reproduce.
rose-trace ROSE Compares a dynamically-generated execution trace of a Linux program with the statically-generated global control flow graph and reports differences.
rose-x86-call-targets ROSE A low-level function that disassembles all bytes and reports the target addresses for all x86 CALL instructions it finds. This is part of a work flow to find the best address at which to map non-PIC code.
rose-xml2json ROSE Translates the XML produced by boost::serialization to JSON and is able to handle XML inputs that are many gigabytes. This tool is very fast and uses a small amount of memory compared to the input size.
x86-function-vas ROSE A low-level tool that reports addresses of functions. This is part of a work flow to find the best address at which to map non-PIC code.
simulator2 ROSE This is a suite of tools to "execute" a program by using concrete instruction semantics and a Linux system call translation layer that can be modified by the user. It can handle amd64, m68k, or x86 instruction sets (those instructions for which ROSE knows the semantics) and Linux x86 and amd64 system calls or raw hardware; it can initialize its simulation memory from some combination of files, ELF or PE containers, Motorola S-Records, and running or stopped Linux executables; it can trace system calls similar to the "strace" command, instructions, and memory access; it partially supports multi-threaded applications; user can adjust instruction, memory, and syscall behavior through callbacks; it's able to selectively disable certain system calls; it can trace file and socket I/O; it has an interactive debugger for examining the specimen being simulated.
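
As a rough illustration of the kind of scan the rose-x86-call-targets entry describes, the sketch below checks every byte offset for the x86 near-call opcode 0xE8 and computes the call target from the 32-bit displacement that follows it. It is a naive byte scan written for this page, not the ROSE implementation: the function name scan_call_targets and all of its parameters are hypothetical, and it assumes the buffer is mapped contiguously at the address given in base.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of ROSE): scan buf, assumed to be mapped
 * at virtual address base, for the byte 0xE8 (x86 near CALL with a 32-bit
 * relative displacement). Each computed target is stored in targets until
 * cap entries are filled. Returns the number of candidates seen, which
 * may be larger than cap.
 */
size_t scan_call_targets(const uint8_t *buf, size_t len, uint64_t base,
                         uint64_t *targets, size_t cap)
{
  size_t found = 0;
  for (size_t i = 0; i + 5 <= len; i++) {
    if (buf[i] != 0xE8)
      continue;
    /* Assemble the little-endian displacement, then sign-extend it. */
    uint32_t raw = (uint32_t)buf[i + 1] |
                   ((uint32_t)buf[i + 2] << 8) |
                   ((uint32_t)buf[i + 3] << 16) |
                   ((uint32_t)buf[i + 4] << 24);
    int32_t disp = (int32_t)raw;
    /* The displacement is relative to the end of the 5-byte instruction. */
    uint64_t target = base + i + 5 + (uint64_t)(int64_t)disp;
    if (found < cap)
      targets[found] = target;
    found++;
  }
  return found;
}
```

Because the scan tests every offset rather than following instruction boundaries, it can report bytes that are really data or operands; that trade-off is acceptable only when the results are treated as candidates to be ranked afterwards.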

Identity Translator


This tool is the simplest tool built using ROSE. It takes input source files, builds the AST, and then unparses the AST back to compilable source code. It tries its best to preserve everything from the input file. Any other ROSE-based tool can be built from this tool's skeleton by adding customized AST analysis and transformation.

#include "rose.h"
int main(int argc, char *argv[]){
  // Build the AST used by ROSE
  SgProject *project = frontend(argc, argv);
  // Run internal consistency tests on AST
  AstTests::runAllTests(project);
  // Insert your own manipulation of the AST here...
  // Generate source code from AST and call the vendor's compiler
  return backend(project);
}


Typing make install-core -j4 under $ROSE_build builds and installs this tool into $ROSE_INSTALL/bin.

User Instructions

There are too many options to list here, so please refer to --help for details.

  • --help will display help information for the tool


You can run the tool as follows to produce the output file rose_input.c:

identityTranslator -c input.c


Due to limitations of the frontends used by ROSE and some internal processing, identityTranslator cannot generate 100% identical output compared to the input file. Some notable changes it may introduce include:

  • "int a, b, c;" is transformed into three SgVariableDeclaration statements
  • macros are expanded
  • extra brackets are added around constants of typedef types (e.g. c=Typedef_Example(12); is translated in the output to c = Typedef_Example((12));)
  • NULL is converted to 0

This page is generated from $ROSE/docs/Rose/Tools/identityTranslator.dox

dotGenerator and dotGeneratorWholeASTGraph


There are two tools to generate AST graphs in dot format. They are:

  • dotGenerator: a simple AST graph generator showing essential nodes and edges only.
  • dotGeneratorWholeASTGraph: generates a whole AST graph showing more details.


Typing make install-core -j4 under $ROSE_build builds and installs the tools into $ROSE_INSTALL/bin.

User Instructions

dotGeneratorWholeASTGraph provides filter options to show/hide certain AST information.

        dotGeneratorWholeASTGraph --help
        -rose:help                     show this help message
        -rose:dotgraph:asmFileFormatFilter           [0|1]  Disable or enable asmFileFormat filter
        -rose:dotgraph:asmTypeFilter                 [0|1]  Disable or enable asmType filter
        -rose:dotgraph:binaryExecutableFormatFilter  [0|1]  Disable or enable binaryExecutableFormat filter
        -rose:dotgraph:commentAndDirectiveFilter     [0|1]  Disable or enable commentAndDirective filter
        -rose:dotgraph:ctorInitializerListFilter     [0|1]  Disable or enable ctorInitializerList filter
        -rose:dotgraph:defaultFilter                 [0|1]  Disable or enable default filter
        -rose:dotgraph:defaultColorFilter            [0|1]  Disable or enable defaultColor filter
        -rose:dotgraph:edgeFilter                    [0|1]  Disable or enable edge filter
        -rose:dotgraph:expressionFilter              [0|1]  Disable or enable expression filter
        -rose:dotgraph:fileInfoFilter                [0|1]  Disable or enable fileInfo filter
        -rose:dotgraph:frontendCompatibilityFilter   [0|1]  Disable or enable frontendCompatibility filter
        -rose:dotgraph:symbolFilter                  [0|1]  Disable or enable symbol filter
        -rose:dotgraph:emptySymbolTableFilter        [0|1]  Disable or enable emptySymbolTable filter
        -rose:dotgraph:typeFilter                    [0|1]  Disable or enable type filter
        -rose:dotgraph:variableDeclarationFilter     [0|1]  Disable or enable variableDeclaration filter
        -rose:dotgraph:variableDefinitionFilter      [0|1]  Disable or enable variableDefinitionFilter filter
        -rose:dotgraph:noFilter                      [0|1]  Disable or enable no filtering
     Current filter flags' values are: 
              m_asmFileFormat = 0 
              m_asmType = 0 
              m_binaryExecutableFormat = 0 
              m_commentAndDirective = 1 
              m_ctorInitializer = 0 
              m_default = 1 
              m_defaultColor = 1 
              m_edge = 1 
              m_emptySymbolTable = 0 
              m_expression = 0 
              m_fileInfo = 1 
              m_frontendCompatibility = 0 
              m_symbol = 0 
              m_type = 0 
              m_variableDeclaration = 0 
              m_variableDefinition = 0 
              m_noFilter = 0  

Visualization of dot files

To visualize the generated dot graph, you have to install

Please note that you have to configure ZGRViewer with the correct paths to some of the commands it uses. You can do this from its configuration/settings menu item, or by directly editing the text configuration file (.zgrviewer).

You also have to configure the zgrviewer launch script to use the correct path:

    # If you want to be able to run ZGRViewer from any directory,
    # set ZGRV_HOME to the absolute path of ZGRViewer's main directory
    # e.g. ZGRV_HOME=/usr/local/zgrviewer
    java -jar $ZGRV_HOME/target/zgrviewer-0.8.1.jar "$@"


You can run the tool as follows to produce dot files:

    dotGeneratorWholeASTGraph -c ttt.c


Due to the limitation of visualization tools, dotGenerator and dotGeneratorWholeASTGraph have a threshold of the max number of nodes supported. Once the threshold is reached, the tools will give up and report an error.

To stay under the threshold, use the simplest possible input code, without including any headers. Alternatively, you can use pdfGenerator to generate a pdf file for large input files.

This page is generated from $ROSE/docs/Rose/Tools/dotGenerator.dox

PDF Generator


This tool will generate a pdf file from a list of input source files. It is especially useful when the AST is too large to be visualized by the Dot graph generators.


The source file of the tool is under $ROSE/exampleTranslators/PDFGenerator. Typing make install-core -j4 under $ROSE_build builds and installs pdfGenerator into $ROSE_INSTALL/bin.

User Instructions

The translator, pdfGenerator, accepts the following options:

  • -rose:convertFullAST will convert the full AST, including portions from headers, to the pdf file
  • --help will display help information for the tool and ROSE

By default, only the AST from the input files of the command line will be converted to a pdf file.


You can run the tool as follows to produce the output file input.c.pdf:

pdfGenerator -rose:convertFullAST -c input.c

This page is generated from $ROSE/docs/Rose/Tools/pdfGenerator.dox



autoPar

This tool is an implementation of automatic parallelization using OpenMP. It can automatically insert OpenMP 3.0 directives into serial C/C++ input code. For input programs with existing OpenMP directives, the tool will double-check their correctness when the corresponding option is turned on.


Typing make install-core -j4 under $ROSE_build builds and installs the tool, autoPar, into $ROSE_INSTALL/bin.

User Instructions

The tool accepts a set of options:

     autoPar -help
     Auto parallelization-specific options
     -rose:autopar:enable_debug     run automatic parallelization in a debugging mode
     -rose:autopar:enable_patch     additionally generate patch files for translations
     -rose:autopar:unique_indirect_index assuming all arrays used as indirect indices have unique elements (no overlapping)
     -rose:autopar:enable_distance  report the absolute dependence distance of a dependence relation preventing parallelization
     -annot filename        specify annotation file for semantics of abstractions. This option can be repeated in one command line to load multiple annotation files. 
     -dumpannot            dump annotation file content


A Simple Example

For an example input code:

/* Only the inner loop can be parallelized
 */
void foo()
{
  int n=100, m=100;
  double b[n][m];
  int i,j;
  for (i=0;i<n;i++)
    for (j=0;j<m;j++)
      b[i][j]=b[i-1][j-1];
}

You can run the tool as follows:

     autoPar -rose:autopar:unique_indirect_index -rose:autopar:enable_patch inner_only.c
     Enabling generating patch files for auto parallelization ...
     Assuming all arrays used as indirect indices have unique elements (no overlapping) ...
     Unparallelizable loop at line:8 due to the following dependencies:
     2*2 TRUE_DEP; commonlevel = 2 CarryLevel = (0,0)  Is precise SgPntrArrRefExp:b[i][j]@10:14->SgPntrArrRefExp:b[i - 1][j - 1]@10:21 == -1;* 0;||* 0;== -1;||::
     Automatically parallelized a loop at line:9 

The generated output file rose_inner_only.c should look like:

/* Only the inner loop can be parallelized
 */
#include "omp.h" 
void foo()
{
  int n = 100;
  int m = 100;
  double b[n][m];
  int i;
  int j;
  for (i = 0; i <= n - 1; i += 1) {
#pragma omp parallel for private (j) firstprivate (n,m,i)
    for (j = 0; j <= m - 1; j += 1) 
      b[i][j] = b[i - 1][j - 1];
  }
}
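
The dependence reported above runs from iteration (i-1, j-1) to iteration (i, j), so it is carried by the outer loop only: for a fixed i, the inner loop writes row i and reads row i-1. The sketch below, written for this page rather than taken from the ROSE sources, checks that claim on a small grid by comparing a serial run against a run with only the inner loop parallelized. The helper names and the grid size are hypothetical, and both loops start at 1 (unlike the sample input) so that every access stays in bounds.

```c
#include <string.h>

#define N 8
#define M 8

/* Fill the grid with distinct values so that any mismatch is visible. */
static void init_grid(double b[N][M])
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      b[i][j] = i * M + j;
}

/* Reference version: fully serial diagonal shift. */
static void shift_serial(double b[N][M])
{
  for (int i = 1; i < N; i++)
    for (int j = 1; j < M; j++)
      b[i][j] = b[i - 1][j - 1];
}

/* Same update with only the inner loop parallelized. Row i depends on
 * row i - 1, so the outer loop must stay sequential. Without OpenMP
 * enabled the pragma is ignored and the loop simply runs serially. */
static void shift_inner_parallel(double b[N][M])
{
  for (int i = 1; i < N; i++) {
#pragma omp parallel for
    for (int j = 1; j < M; j++)
      b[i][j] = b[i - 1][j - 1];
  }
}

/* Returns 1 when both variants produce the same grid, 0 otherwise. */
int inner_parallel_matches_serial(void)
{
  double a[N][M], c[N][M];
  init_grid(a);
  init_grid(c);
  shift_serial(a);
  shift_inner_parallel(c);
  return memcmp(a, c, sizeof a) == 0;
}
```

Moving the pragma to the outer loop would let one thread read row i-1 while another is still writing it, which is the situation the dependence report rules out.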

Using Annotations

Annotation files provide additional program information that is traditionally hard for compilers to extract, such as aliasing and side-effect information. Multiple annotation files can be loaded in one command line; the example below loads floatArray.annot and funcs.annot.

To use annotation files:

autoPar -annot floatArray.annot -annot funcs.annot -c interp1_elem2.C

Generating Patches

Often users want to see the changed lines only instead of seeing a big output file.

To use the patch generation feature, add -rose:autopar:enable_patch to the command line, as in the inner_only.c run above.

The generated patch file should look like:

Checking Correctness

autoPar can examine pre-existing OpenMP directives in an application and verify that they correctly account for private variables, reductions, and other OpenMP data-sharing attributes. A sample input file:

#include <stdio.h>
#include <omp.h>
int main(int argc, char *argv[]) {
  int N = 20;
  int total ;
  int i,j;
#pragma omp parallel for
  for (j=0;j<N;j++) {
    for (i=0;i<N;i++) {
      total += i+j ;
    }
  }
  printf("%d\n", total) ;
  return 0;
}

The code above contains a real OpenMP bug someone struggled with while trying to add OMP annotations to their code, submitted here:

Running autoPar as an OpenMP directive checker produces this result:

% autoPar -rose:unparse_tokens -rose:autopar:unique_indirect_index -rose:autopar:enable_diff  -fopenmp -c
user defined      : #pragma omp parallel for
compiler generated:#pragma omp parallel for private (i) reduction (+:total)

The output above from autoPar indicates that the OMP pragma is missing an OpenMP 'private' declaration for the variable 'i', and a reduction annotation for the variable 'total'.
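
A minimal sketch of the corrected loop nest, applying the two clauses from the "compiler generated" line above. The function name sum_pairs is hypothetical, and total is initialized to zero here, which the sample input does not do.

```c
/* Sum of i + j over all pairs 0 <= i, j < n.
 * private(i) gives each thread its own inner loop counter, and
 * reduction(+:total) gives each thread a private partial sum that is
 * combined when the parallel loop ends. */
int sum_pairs(int n)
{
  int total = 0;
  int i, j;
#pragma omp parallel for private(i) reduction(+ : total)
  for (j = 0; j < n; j++) {
    for (i = 0; i < n; i++) {
      total += i + j;
    }
  }
  return total;
}
```

With the clauses in place the result no longer depends on the number of threads; for n = 20 the sum is 7600 whether or not OpenMP is enabled at compile time.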

This page is generated from $ROSE/docs/Rose/Tools/autoPar.dox

Declaration Move Tool


This tool moves variable declarations into the innermost scopes in which they are used. For each declaration, it finds the innermost scope the declaration can be moved into without changing the code's original semantics.

User Instructions

The translator, moveDeclarationToInnermostScope, accepts the following options:

  • -rose:merge_decl_assign will merge the moved declaration with an immediately following assignment.
  • -rose:aggressive turns on the aggressive mode, which moves declarations with initializers and moves declarations across loop boundaries. A warning message is issued if a move crosses a loop boundary. Without this option, the tool only moves declarations without initializers, to be safe.
  • -rose:debug, which is turned on by default in testing, generates dot graph files of the variables' scope trees for debugging purposes.
  • -rose:keep_going will ignore assertions as much as possible (currently it only skips the assertion on complex for-loop initialization statement lists). Without this option, the tool will stop on assertion failures.
  • -rose:identity will turn off all transformations and act like an identity translator. Useful for debugging purposes.
  • -rose:trans-tracking will turn on the transformation tracking mode, showing the source statements of a moved or merged declaration.
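
The caution around initializers and loop boundaries can be illustrated with a small hand-written example. This is a sketch of the general hazard, not a claim about what the tool does to this particular code; both function names are hypothetical. A variable that is only referenced inside a loop can still carry a value from one iteration to the next, and re-declaring it with its initializer inside the loop body resets it every time.

```c
/* Counts how often consecutive values differ, starting from 0.
 * prev is only referenced inside the loop, but it carries state
 * from one iteration to the next. */
int count_changes(const int *v, int n)
{
  int prev = 0;
  int changes = 0;
  for (int i = 0; i < n; i++) {
    if (v[i] != prev)
      changes++;
    prev = v[i];
  }
  return changes;
}

/* The same code after naively moving the declaration, initializer
 * included, into the loop body: prev is reset to 0 on every iteration,
 * so the function now counts nonzero elements instead. */
int count_changes_moved(const int *v, int n)
{
  int changes = 0;
  for (int i = 0; i < n; i++) {
    int prev = 0;
    if (v[i] != prev)
      changes++;
    prev = v[i];
  }
  return changes;
}
```

For the input {1, 1, 2, 2} the first version returns 2 and the second returns 4, which is why a move across a loop boundary deserves at least a warning.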


For the following input code,

void AccumulateForce(int *idxBound, int *idxList, int len,
                    double *tmp, double *force)
{
   register int ii ;
   register int jj ;
   int count ;
   int *list ;
   int idx ;
   double sum ;
   for (ii=0; ii<len; ++ii) {
      count = idxBound[ii+1] - idxBound[ii] ;
      list = &idxList[idxBound[ii]] ;
      sum = 0.0 ;
      for (jj=0; jj<count; ++jj) {
         idx = list[jj] ;
         sum += tmp[idx] ;
      }
      force[ii] += sum ;
   }
   return ;
}

you can run the move tool as follows to produce the output file below:

moveDeclarationToInnermostScope -rose:unparse_tokens -rose:merge_decl_assign -c

There are several things to notice about this command line. The moveDeclarationToInnermostScope tool acts as a front-end to an underlying compiler, and the command line options for that compiler will be honored. Here, we also have some ROSE/tool specific command line options. The '-rose:unparse_tokens' option tells ROSE to take extra care to preserve the source-code formatting from the input source file when producing the output file. The '-rose:merge_decl_assign' option is specific to the rescoping tool, and indicates that any moved declarations should try to be combined with pre-existing assignment statements in the target scope.

The output file will look like

void AccumulateForce(int *idxBound, int *idxList, int len,
                    double *tmp, double *force)
{
   for (register int ii = 0; ii<len; ++ii) {
      int count = idxBound[ii + 1] - idxBound[ii];
      int *list = &idxList[idxBound[ii]];
      double sum = 0.0;
      for (register int jj = 0; jj<count; ++jj) {
         int idx = list[jj];
         sum += tmp[idx] ;
      }
      force[ii] += sum ;
   }
   return ;
}

Looking at the transformed source code above, there are several points of interest:

  • Any qualifiers associated with declarations are preserved when the declaration is moved.
  • Declarations related to for-loop control variables are moved into the loop header.
  • Assignments and declarations are merged, due to the presence of the -rose:merge_decl_assign command line option.

Internal Algorithm

The algorithm focuses on finding the final target scopes, since multiple (iterative) declaration moves are unnecessary: once we know the final scopes a declaration should be moved into, we can copy-move it to all target scopes in one shot.

Analysis: findFinalTargetScopes (declaration, &target_scopes)
    while (!scope_tree_worklist.empty())
        current_scope_tree = scope_tree_worklist.front(); …
        collectCandidateTargetScopes(decl, current_scope_tree);
            if (found a bottom scope) target_scopes.push_back(candidate)
            else scope_tree_worklist.push_back(candidate)
    if (target_scopes.size()>0)
        copyMoveVariableDeclaration(decl, target_scopes);
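
The worklist search above can be sketched on a toy scope tree. Everything in this example is an assumption made for illustration: the struct scope type, the fixed array sizes, and the simplified rule that a scope is a "bottom scope" when it uses the variable directly or has no nested scopes. It also assumes that the tree contains only scopes leading to a use of the variable, and that splitting the declaration across sibling scopes is safe; the real tool has to check the latter.

```c
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_SCOPES 16

/* Hypothetical scope-tree node for one variable. */
struct scope {
  const char *name;
  int uses_variable;                     /* nonzero if used directly here */
  size_t child_count;
  struct scope *children[MAX_CHILDREN];  /* nested scopes containing uses */
};

/* Breadth-first worklist search for the final target scopes. A scope
 * that uses the variable directly, or that has no nested scopes, is a
 * target; otherwise its children are queued as further candidates.
 * Stores at most cap targets and returns how many were stored. */
size_t find_final_target_scopes(struct scope *root,
                                struct scope **targets, size_t cap)
{
  struct scope *worklist[MAX_SCOPES];
  size_t head = 0, tail = 0, found = 0;

  worklist[tail++] = root;
  while (head < tail) {
    struct scope *cur = worklist[head++];
    if (cur->uses_variable || cur->child_count == 0) {
      if (found < cap)
        targets[found++] = cur;
      continue;
    }
    for (size_t i = 0; i < cur->child_count && tail < MAX_SCOPES; i++)
      worklist[tail++] = cur->children[i];
  }
  return found;
}
```

For a function whose loop body contains an if/else where only the two branches use the variable, the search skips the function and loop scopes and returns both branches, so the declaration would be copied into each of them in a single pass.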