Nathan Pinnow edited this page Aug 23, 2019 · 12 revisions

We collect frequently asked questions about ROSE here, mostly from the rose-public mailing list.


How to search the rose-public mailing list for previously asked questions?

Use the following query in Google search:


How to check the version of ROSE?

Look in ROSE_Install_path/include/rose/rosePublicConfig.h:

/* Define to the version of this package. */

To check the version in your own code:

bool checkRoseVersionNumber(const std::string &need) {
    std::vector<std::string> needParts = rose::StringUtility::split('.',
    std::vector<std::string> haveParts = rose::StringUtility::split('.',

    for (size_t i=0; i < needParts.size() && i < haveParts.size(); ++i) {
        if (needParts[i] != haveParts[i])
            return needParts[i] < haveParts[i];
    }

    // E.g., need = "1.2" and have = "1.2.x", or vice versa
    return true;
}
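
Note that the comparison above works on strings, so a component such as "10" sorts before "9". A minimal sketch of a numeric comparison follows, assuming purely numeric, dot-separated components. The helpers parseVersion and versionAtLeast are hypothetical names and not part of ROSE, and the installed version is passed in as a plain string rather than read from rosePublicConfig.h.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "0.9.11" into {0, 9, 11}.
// Assumes every component is a non-negative decimal number.
std::vector<unsigned long> parseVersion(const std::string& version) {
    std::vector<unsigned long> parts;
    std::istringstream in(version);
    std::string piece;
    while (std::getline(in, piece, '.')) {
        assert(!piece.empty() &&
               piece.find_first_not_of("0123456789") == std::string::npos);
        parts.push_back(std::stoul(piece));
    }
    return parts;
}

// Hypothetical helper: true when `have` is at least `need`.
// Missing trailing components count as zero, so "1.2" equals "1.2.0".
bool versionAtLeast(const std::string& have, const std::string& need) {
    const std::vector<unsigned long> h = parseVersion(have);
    const std::vector<unsigned long> n = parseVersion(need);
    const std::size_t count = h.size() > n.size() ? h.size() : n.size();
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned long hv = i < h.size() ? h[i] : 0;
        const unsigned long nv = i < n.size() ? n[i] : 0;
        if (hv != nv)
            return hv > nv;  // first differing component decides
    }
    return true;  // all components are equal
}
```

Unlike the string-based version, this sketch treats a missing trailing component as zero, so a need of "1.2.1" is not satisfied by "1.2".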

Why can't ROSE staff members answer all my questions?

It can feel very frustrating when you get no response to a question submitted to the mailing list. You may wonder why the ROSE staff sometimes cannot help either. Here are some possible excuses:

  • They are just as busy as everybody else in the research and development fields. They may be working around the clock to meet deadlines for proposals, papers, project reviews, deliverables, etc.
  • They don't know every corner of their own compiler, given the breadth and depth of contributions made to ROSE by collaborators, former staff members, post-docs, and interns. Moreover, most contributions lack good documentation--something that should be remedied in the future.
  • Some questions are simply difficult and open research and development questions. They may have no clue, either.
  • They just feel lazy sometimes or are taking a thing called vacation.

Possible alternatives to have your questions answered and your problems solved in a timely fashion:

  • Please do your own homework first (e.g. Google).
  • The ROSE team is actively addressing the documentation problem, through an internal code review process to enforce well-documented contributions going forward.
  • Help others to help yourself. Answer questions on the mailing list and contribute to this community-editable Wikibook.
  • Find ways to formally collaborate with, or fund, the ROSE team. Things go faster when money is flowing :-) A sad but true reality in this busy world.

Is ROSE a preprocessor, a translator, or a compiler?

Technically, none of these. ROSE is formally a meta-tool, a tool for building tools: an object-oriented framework for building source-to-source translators. A preprocessor knows nothing of the syntax or semantics of the language being preprocessed; typically it recognizes another language embedded within the input file (or attempts to recognize subsets of the source language). In contrast, translators process the input language with precision identical to a compiler. Although ROSE helps build source-to-source translators, we resist calling those translators compilers, since the output is not machine code. Machine-code output is not a required part of the definition of a compiler: many language compilers use another language (typically C) as their assembly-level target, and they are no less compilers for it. But since we do source-to-source translation, where the output language is typically the same as the input language, we feel uncomfortable calling the translators compilers. The point is further muddled because it is common in ROSE to have a translator hide the call to the vendor's compiler, so the translator can be considered to generate machine code. But this gives little credit to the vendor's compiler. So we prefer to refer to our work as a tool (or framework) for building source-to-source translators.

How many lines of source code does ROSE have?

Excluding the EDG submodule and all source code comments, the core of ROSE (rose/src) has about 674,000 lines of C/C++ source code as of July 11, 2012.

Including the tests, projects, and tutorial directories, ROSE has about 2 million lines of code.

Some details are shown below:

[rose/src]./ .
    3076 text files.
    2871 unique files.                                          
     716 files ignored. v 1.56  T=26.0 s (91.7 files/s, 39573.3 lines/s)
Language                     files          blank        comment           code
C++                            908          75280          93960         354636
C                              123          12010           3717         199087
C/C++ Header                   915          28302          38412         121373
Bourne Shell                    17           3346           4347          25326
Perl                             4            743           1078           7888
Java                            18           1999           4517           7096
m4                               1            747             20           6489
Python                          34           1984           1174           5363
make                           148           1682           1071           3666
C#                              11            899            274           2546
SQL                              1              0              0           1817
Pascal                           5            650             31           1779
CMake                          168           1748           4880           1702
yacc                             3            352            186           1544
Visual Basic                     6            228            421           1180
Ruby                            11            281            181            809
Teamcenter def                   3              3              0            606
lex                              2            103             47            331
CSS                              1             95             32            314
Fortran 90                       1             34              6            244
Tcl/Tk                           2             29              6            212
HTML                             1              8              0             15
SUM:                          2383         130523         154360         744023

How large is ROSE?

To show top level information only (in MB): du -msl * | sort -nr

170	tests
109	projects
90	src
19	docs
16	winspecific
16	ROSE_ResearchPapers
15	binaries
7	scripts
5	LicenseInformation
4	tutorial
4	autom4te.cache
2	libltdl
2	exampleTranslators
2	configure
2	config
2	ChangeLog

To sort directories by size in megabytes:

du -m | sort -nr >~/size.txt

709	.
250	./.git
245	./.git/objects
243	./.git/objects/pack
170	./tests
109	./projects
90	./src
76	./tests/CompileTests
50	./tests/RunTests
40	./tests/RunTests/FortranTests
34	./tests/RunTests/FortranTests/LANL_POP
29	./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1
27	./src/3rdPartyLibraries
23	./tests/roseTests
23	./src/frontend
22	./tests/CompileTests/Fortran_tests
21	./tests/CompilerOptionsTests
19	./docs
18	./tests/CompileTests/RoseExample_tests
18	./src/midend
18	./docs/Rose
16	./winspecific
16	./ROSE_ResearchPapers
15	./tests/CompileTests/Fortran_tests/gfortranTestSuite
15	./binaries/samples
15	./binaries
14	./tests/CompileTests/Fortran_tests/gfortranTestSuite/gfortran.dg
14	./src/roseExtensions
11	./projects/traceAnalysis
10	./tests/CompileTests/A++Code
10	./tests/CompilerOptionsTests/testCpreprocessorOption
10	./tests/CompilerOptionsTests/A++Code
10	./src/roseExtensions/qtWidgets
10	./src/frontend/Disassemblers
10	./projects/symbolicAnalysisFramework
10	./projects/SATIrE
10	./projects/compass
9	./winspecific/MSVS_ROSE
9	./tests/RunTests/A++Tests
9	./tests/roseTests/binaryTests
9	./src/frontend/SageIII
9	./projects/symbolicAnalysisFramework/src
9	./docs/Rose/powerpoints
8	./winspecific/MSVS_project_ROSETTA_empty
8	./projects/simulator
7	./tests/RunTests/FortranTests/LANL_POP_OLD
7	./tests/CompileTests/Cxx_tests
7	./src/midend/programTransformation
7	./src/midend/programAnalysis
7	./src/3rdPartyLibraries/libharu-2.1.0
7	./scripts
7	./projects/symbolicAnalysisFramework/src/mpiAnal
7	./projects/RTC
6	./winspecific/MSVS_ROSE/Debug
6	./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1/ncdap_test
6	./tests/roseTests/programAnalysisTests
6	./src/3rdPartyLibraries/ckpt
6	./src/3rdPartyLibraries/antlr-jars
6	./projects/SATIrE/src
5	./tests/RunTests/FortranTests/LANL_POP/pop-distro
5	./tests/RunTests/FortranTests/LANL_POP/netcdf-4.1.1/libcf
5	./tests/CompileTests/ElsaTestCases
5	./src/ROSETTA
5	./src/3rdPartyLibraries/qrose
5	./projects/DatalogAnalysis
5	./projects/backstroke
5	./LicenseInformation
5	./docs/Rose/AstProcessing

To list files based on size

 find . -type f -print0 | xargs -0 ls -s | sort -k1,1rn

241568 ./.git/objects/pack/pack-f366503d291fc33cb201781e641d688390e7f309.pack
13484 ./tests/CompileTests/RoseExample_tests/Cxx_Grammar.h
10240 ./projects/traceAnalysis/vmp-hw-part.trace
6324 ./tests/RunTests/FortranTests/LANL_POP_OLD/poptest.tgz
5828 ./winspecific/MSVS_ROSE/Debug/MSVS_ROSETTA.pdb
4732 ./.git/objects/pack/pack-f366503d291fc33cb201781e641d688390e7f309.idx
4488 ./binaries/samples/bgl-helloworld-mpicc
4488 ./binaries/samples/bgl-helloworld-mpixlc
4080 ./LicenseInformation/edison_group.pdf
3968 ./projects/RTC/tags
3952 ./src/frontend/Disassemblers/x86-InstructionSetReference-NZ.pdf
3908 ./tests/CompileTests/RoseExample_tests/trial_Cxx_Grammar.C
3572 ./winspecific/MSVS_project_ROSETTA_empty/MSVS_project_ROSETTA_empty.ncb
3424 ./src/frontend/Disassemblers/x86-InstructionSetReference-AM.pdf
2868 ./.git/index
2864 ./projects/compassDistribution/COMPASS_SUBMIT.tar.gz
2864 ./projects/COMPASS_SUBMIT.tar.gz
2740 ./ROSE_ResearchPapers/2007-CommunicatingSoftwareArchitectureUsingAUnifiedSingle-ViewVisualization-ICECC
2592 ./docs/Rose/powerpoints/rose_compiler_users.pptx
2428 ./src/3rdPartyLibraries/ckpt/wrapckpt.c
2408 ./projects/DatalogAnalysis/jars/weka.jar
2220 ./scripts/graph.tar
1900 ./src/3rdPartyLibraries/antlr-jars/antlr-3.3-complete.jar
1884 ./src/3rdPartyLibraries/antlr-jars/antlr-3.2.jar
1848 ./src/midend/programTransformation/ompLowering/
1772 ./src/3rdPartyLibraries/qrose/docs/QROSE.pdf
1732 ./tests/CompileTests/Cxx_tests/longFile.C
1724 ./src/midend/programTransformation/ompLowering/
1656 ./ChangeLog
1548 ./tests/roseTests/binaryTests/yicesSemanticsExe.ans
1548 ./tests/roseTests/binaryTests/yicesSemanticsLib.ans
1480 ./ROSE_ResearchPapers/1997-ExpressionTemplatePerformanceIssues-IPPS.pdf
1408 ./docs/Rose/powerpoints/ExaCT_AllHands_March2012_ROSE.pptx


We have read that the ROSE compiler is provided under the BSD license. Is every part of the ROSE compiler under the BSD license, and is it free for commercial use?

ROSE is free for commercial use; our research license with EDG has no restrictions (except that we can only release the binary and not the source code). Obviously the EDG part is not released under BSD, only the source code part is. If you want to build products using ROSE for C/C++, you should consider contacting EDG for a license to their work; then you could build commercial products and sell them, and you don't have to worry about ROSE. I have no idea what ground you're on if you build commercial products for sale based on ROSE and just use the EDG binary that we provide. I expect it would be a complicated install for your customers. In general, if you are using EDG and building commercial products for sale, I would encourage you to contact EDG and buy a license from them. This is what a few companies have done, and they have consulted EDG on this point. Our goal is especially to encourage open-source C++ work using ROSE. Clearly we derive robustness in C++ in ROSE from the use of EDG, and we are thankful for their liberal research license.


Cannot download the EDG binary tarball

Information about obtaining EDG is located in EDG Installation Instructions.

How to access EDG or EDG-SAGE connection code?

The connection code that was used to translate EDG’s AST to SAGE III was derived loosely from the EDG C++ source generator and has formed the basis of the SAGE III translator from EDG to SAGE III’s IR.

Under the license we have, the EDG source code and the translation from the EDG AST are excluded from source releases and are made available only in binary form. No part of the EDG work is visible to the user of ROSE. The EDG source is available only to those who have an EDG research or commercial license.

How to speed up compiling ROSE?

You can try building and installing only the ROSE core by using make rose-core and make install-core. If you have a multi-core processor, also pass -j4 to make (build using four processes).

make rose-core -j4
make install-core -j4

I want to use MPI support with ROSE. I reconfigured it with ./configure --prefix=pwd --with-mpi

ROSE handles MPI without any additional flag. If you configure ROSE with --with-mpi it will allow you to traverse the AST in parallel (using MPI). There are examples for this in the projects directory under DistributedMemoryAnalysisCompass.

What libraries and include paths do I need to build an application using ROSE?

Run make installcheck and observe the command lines used to compile the example applications. These command lines will be what you will want to reproduce in your Makefile.

Can ROSE analyze Linux Kernel sources?

Yes, ROSE can analyze Linux kernel sources.

Can ROSE compile C++ Boost library?

Yes, ROSE can compile Boost.


How to find XYZ in AST?

The usual steps to retrieve information from the AST are:

  • prepare the simplest possible (preferably only 5-10 lines), compilable sample code containing the feature you want to find (e.g. array[i][j] if you are curious about how uses of multi-dimensional arrays appear in the AST).

    • Please note: don't include any headers in the sample code. A header (#include <stdio.h> for example) can bring thousands of nodes into the AST.
  • use dotGeneratorWholeASTGraph to generate a detailed AST dot graph of the input code

  • use zgrviewer-0.8.2 to visualize the dot graph

  • visually/manually locate the information you want in the dot graph, and understand what to look for and where to look

  • Some sample AST graphs are available at

How to get children of an AST node?

Once you know how to find a child in the AST manually, you can write code that walks the AST using AST member functions, traversals, SageInterface functions, etc. to retrieve the information you want.

  • ROSE provides member access functions like get_X() by default for a child named X, such as get_lhs_operand() for an SgBinaryOp with a child named lhs_operand in the AST graph.
  • The names are shown in the AST graph as labels of edges from parents to children.

To get a child by index use the function (not recommended though):

virtual SgNode * 	get_traversalSuccessorByIndex (size_t idx)

and/or related, similarly named functions.

How to filter out header files from AST traversals?

By default, an AST traversal may visit all AST nodes, including the ones that come from headers.

So the AST processing classes provide three functions:

  • T traverse(SgNode* node, ..): traverses the full AST, including nodes which represent code from include files
  • T traverseInputFiles(SgProject* projectNode, ..): traverses the subtrees of the AST which represent the files specified on the command line
  • T traverseWithinFile(SgNode* node, ..): traverses only the nodes which represent code of the same file as the start node

Should SgIfStmt::get_true_body() return SgBasicBlock?

Both true/false bodies were SgBasicBlock before.

Later, we decided to have a more faithful representation of both blocked (with {...}) and single-statement (without {...}) bodies. So they are SgStatement now (SgBasicBlock is a subclass of SgStatement).

But it seems that the documentation has not been updated to be consistent with the change.

You have to check in your code whether the body is a block or a single statement. Or you can use the following function to ensure that all bodies are SgBasicBlock.

//A wrapper of all ensureBasicBlockAs*() above to ensure the parent of s is a scope statement with list of statements as children, otherwise generate a SgBasicBlock in between.

SgLocatedNode * SageInterface::ensureBasicBlockAsParent (SgStatement *s)

How to handle #include "header.h", #if, #define etc. ?

These are called preprocessing info within ROSE's AST. They are attached before, after, or inside a nearby AST node (only nodes with source location information).

An example translator is provided to traverse the input code's AST and dump information about the found preprocessing information. The source code of this translator is here.

To use the translator:

buildtree/exampleTranslators/defaultTranslator/preprocessingInfoDumper -c main.cxx
Found an IR node with preprocessing Info attached:
(memory address: 0x2b7e1852c7d0 Sage type: SgFunctionDeclaration) in file
/export/tmp.liao6/workspace/userSupport/main.cxx (line 3 column 1)
-------------PreprocessingInfo #0 ----------- :
classification = CpreprocessorIncludeDeclaration:
  String format = #include "all_headers.h"

relative position is = before

SgClassDeclaration::get_definition() returns NULL?

If you look at the whole AST graph carefully, you can find defining and non-defining declarations for the same class.

A symbol is usually associated with a non-defining declaration. A class definition is associated with a defining declaration.

You may want to get the defining declaration from the non-defining declaration before you try to grab the definition, as in this function:

SgFunctionDefinition* getFunctionDefinitionFromDeclaration(const SgFunctionDeclaration* funcDecl) {
  // Get the defining declaration (we don't know if funcDecl is the defining or the non-defining declaration)
  SgFunctionDeclaration* funcDefDecl = isSgFunctionDeclaration(funcDecl->get_definingDeclaration());
  ROSE_ASSERT(funcDefDecl != NULL);

  // Get the definition from the defining declaration
  SgFunctionDefinition* funcDef = isSgFunctionDefinition(funcDefDecl->get_definition());
  ROSE_ASSERT(funcDef != NULL);
  return funcDef;
}

Where is the SgTypedefSeq used?

Any type may be hidden behind a chain of typedefs. The typedef sequence is the list of typedefs that have been applied to any given type.

How to handle arrays?

The first step is to get familiar with the AST representing Array types (SgArrayType) and array references (SgPntrArrRefExp). Then you can retrieve the necessary information from the AST.

Here is an example to illustrate array types and array references:

// cat ~/temp/array.c 
int a[5][10][15];  // array declaration, a type is declared
int foo()
{
  return a[0][1][2]; // a reference to array element
}

An Array Type is represented by SgArrayType.

int a[5][10][15] corresponds to three SgArrayType nodes linked together.

a->get_type() returns the first one:

  • SgArrayType_1: (index=5, base_type = SgArrayType_2)
  • SgArrayType_2: (index=10, base_type = SgArrayType_3)
  • SgArrayType_3: (index=15, base_type = SgTypeInt )

So a traversal from the first array type to the element type yields all dimension sizes: 5, 10, 15.

The subtree looks like

      SgArrayType_1
      /       \
     5      SgArrayType_2
            /       \
           10      SgArrayType_3  
                     /    \
                    15     SgTypeInt
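
The walk from the outermost array type to the element type can be sketched without ROSE. ToyType below is a hypothetical stand-in for the chain of SgArrayType nodes; it is not a ROSE class, and the function names are made up for illustration. In real code the same loop would follow the base type link of each SgArrayType (by the get_X() naming convention, presumably get_base_type()), and the sizes would come from the index expressions rather than plain integers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a type node: either an array type
// (base != nullptr) or an element type such as int (base == nullptr).
struct ToyType {
    std::string name;     // e.g. "SgArrayType" or "SgTypeInt"
    int size;             // dimension size; unused for element types
    const ToyType* base;  // next type in the chain, or nullptr
};

// Collect dimension sizes from the outermost array type inward.
std::vector<int> collectDimensions(const ToyType* type) {
    std::vector<int> dims;
    while (type != nullptr && type->base != nullptr) {
        dims.push_back(type->size);
        type = type->base;
    }
    return dims;
}

// Return the element type found at the end of the chain.
const ToyType* elementType(const ToyType* type) {
    while (type != nullptr && type->base != nullptr)
        type = type->base;
    return type;
}
```

For the chain that models int a[5][10][15], collectDimensions returns the sizes in declaration order and elementType ends at the node that models SgTypeInt.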

An array reference is represented by SgPntrArrRefExp

A reference like: a[0][1][2]

  • SgPntrArrRefExp_1 <lhs= ref_2, rhs=2>
  • SgPntrArrRefExp_2 <lhs= ref_3, rhs=1>
  • SgPntrArrRefExp_3 <lhs= SgVarRefExp (a_symbol), rhs=0>

The subtree should look like the following:

    a[0][1][2] //SgPntrArrRefExp
      /    \
  a[0][1]  2 // SgIntVal
    / \
 a[0]  1
  / \
a    0
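
Because the top-level reference holds the last subscript, walking down the lhs chain meets the subscripts in reverse order. The sketch below shows that walk on a hypothetical stand-in type, ToyExpr, which only mimics the lhs/rhs shape of SgPntrArrRefExp; neither it nor collectSubscripts is part of ROSE.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an expression node. An array reference has
// both lhs and rhs set; a leaf (variable or constant) has neither.
struct ToyExpr {
    std::string text;    // e.g. "a" or "0"; empty for an array reference
    const ToyExpr* lhs;  // array part of a reference, or nullptr
    const ToyExpr* rhs;  // subscript of a reference, or nullptr
};

// Starting at the top-level reference, walk down the lhs chain.
// Subscripts are met last-to-first, so they are reversed at the end.
// `arrayName` receives the text of the leaf at the bottom of the chain.
std::vector<std::string> collectSubscripts(const ToyExpr* ref,
                                           std::string& arrayName) {
    std::vector<std::string> subscripts;
    while (ref != nullptr && ref->lhs != nullptr && ref->rhs != nullptr) {
        subscripts.push_back(ref->rhs->text);
        ref = ref->lhs;
    }
    if (ref != nullptr)
        arrayName = ref->text;
    std::reverse(subscripts.begin(), subscripts.end());
    return subscripts;
}
```

In ROSE itself, SageInterface::isArrayReference() listed below can return the name expression and the subscripts, so a hand-written walk like this is mainly useful for understanding the tree shape.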


There are quite a few functions related to array handling here

You can just search "array" to find them:

//Check if an expression is an array access (SgPntrArrRefExp). If so, return its name expression and subscripts 
//if requested. Users can use convertRefToInitializedName() to get the possible name. It does not check if the
//expression is a top level SgPntrArrRefExp. 

bool SageInterface::isArrayReference (SgExpression *ref, SgExpression **arrayNameExp=NULL, std::vector< SgExpression * > **subscripts=NULL)

// 	returns the array dimensions in an array as defined for arrtype
std::vector< SgExpression * > 	SageInterface::get_C_array_dimensions (const SgArrayType &arrtype)

// 	Get the number of dimensions of an array type. 
 int 	SageInterface::getDimensionCount (SgType *t)

// 	Get the element type of an array. 
 SgType * 	SageInterface::getArrayElementType (SgType *t)

Some example code using these functions can be found here

For example, void linearizeArrayAccess(SgPntrArrRefExp* top_array_ref) rewrites array reference using multiple-dimension subscripts to a reference using one-dimension subscripts:

  • a[i][j] is changed to a[i*col_size +j]
  • a [i][j][k] is changed to a [(i*col_size + j)*K_size +k]
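
The index arithmetic behind this rewrite is ordinary row-major linearization. The sketch below computes the same offset on plain integers; linearOffset is a hypothetical helper written for illustration and is not the ROSE function, which rewrites AST expressions instead of computing numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major linearization by Horner's scheme:
//   offset = ((i0 * d1 + i1) * d2 + i2) ...
// `dims` holds the declared sizes and `index` the subscripts,
// one entry per dimension.
std::size_t linearOffset(const std::vector<std::size_t>& dims,
                         const std::vector<std::size_t>& index) {
    assert(dims.size() == index.size());
    std::size_t offset = 0;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        assert(index[d] < dims[d]);  // subscript must be in range
        offset = offset * dims[d] + index[d];
    }
    return offset;
}
```

For int a[5][10][15], the reference a[0][1][2] maps to (0*10 + 1)*15 + 2 = 17. The size of the first dimension never contributes to the offset, which matches the formulas above.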

Sample code to handle 1-D array references

For 1-D array element access a[0], the AST with 3 nodes looks like:

   a[0]          // node 1: SgPntrArrRefExp
  /    \
a       0    //node 3:  SgIntVal
// node 2: SgVarRefExp

So the code searching for SgVarRefExp will find a. The next step is to check its type.

SgVarRefExp *vref = ... 

SgType* t = vref->get_type();

if (SgArrayType* atype = isSgArrayType(t)) // now you have array type
{
  // obtain the dimension vector
  std::vector<SgExpression*> dimensions = SageInterface::get_C_array_dimensions(*atype);
  // dimensions.size() should be 1 if you only handle 1-D array types
  if (dimensions.size() == 1)
  {
    // now you get a[0] from a; the result is NULL if the parent is not an array reference
    SgPntrArrRefExp* arr_ref_exp = isSgPntrArrRefExp(vref->get_parent());
    // do the things you want with a (vref) and a[0] (arr_ref_exp)
  }
}
else if (SageInterface::isScalarType(t)) // if scalar types, handle them differently
{
}

How to add new AST nodes?

There is a section named "1.7 Adding New SAGE III IR Nodes (Developers Only)" in ROSE Developer’s Guide

But before you decide to add new nodes, consider whether AstAttribute (user-defined objects attached to the AST) would be sufficient for your problem.

For example, the 1st version of the OpenMP implementation in ROSE (rose/projects/OpenMP_Translator) started by using AstAttribute to represent information parsed from pragmas. Only in the 2nd version did we introduce dedicated AST nodes.

There are two separate steps when new kinds of IR nodes are added into ROSE:

  • First step (declaration): Adding class declaration/implementation into ROSE for the new IR nodes. This step is mostly related to ROSETTA.
  • Second step (creation): Creating those new IR nodes at some point: such as somewhere within frontend, midend, or even backend if desired. So this step is decided case by case.

If the new types of IR come from their counterparts in EDG, then modifications to the EDG/SAGE connection code are needed. If not, the EDG/SAGE connection code may be irrelevant.

If you are trying to add new nodes to represent pragma information, you can create your new nodes without involving EDG or its connection to ROSE. You just parse the pragma string in the original AST and create your own nodes to get a new version of AST. Then it should be done.

How does the AST merge work?

Tests that demonstrate the AST merge are in the directory:


(run "make check" to see hundreds of tests go by).

parent vs. scope

An AST node can have a parent node which is different from its scope.

For example: the struct declaration's parent is the typedef declaration. But the struct's scope is the scope of the typedef declaration.

typedef struct frame {int x;} s_frame;

Parsing text into AST

There is some experimental support for parsing simple code text into AST pieces. It is not intended to parse entire source files, but the support could be extended to handle more types of input.

Some documentation about this work:

Example project using the parser building blocks

  • projects/pragmaParsing should work.


What does the output from a ROSE translator look like?

A great deal of effort has been made to preserve the quality of your original code when regenerated by a translator built using ROSE. ROSE preserves all formatting, comments, and preprocessor control structure.

How to skip system headers in translation?

Often we are only interested in user code. The AST represents all code from both user files and system headers, so we need to skip the parts that come from system headers.

// Final, most complete version: skip all header files. We cannot unparse a changed AST from header files, at least by default.

    if (Inliner::skipHeaders)
    {
      string filename = funcall->get_file_info()->get_filename();
      string suffix = StringUtility::fileNameSuffix(filename);
      // vector.tcc: This is an internal header file, included by other library headers
      if (suffix=="h" || suffix=="hpp" || suffix=="hh" || suffix=="H" || suffix=="hxx" || suffix=="h++" || suffix=="tcc")
        return false;

      // also check if it is compiler generated, mostly template instantiations. They are not from user code.
      if (funcall->get_file_info()->isCompilerGenerated())
        return false;

      // check if the file is within include-staging/ header directories
      if (insideSystemHeader(funcall))
        return false;
    }


//------------partial solutions

bool processStatements(SgNode* n)
{
  // Skip compiler generated code, system headers, etc.
  if (isSgLocatedNode(n))
  {
    if (isSgLocatedNode(n)->get_file_info()->isCompilerGenerated())
      return false;
  }
  // otherwise, the node is a candidate for processing
  return true;
}

This is based on Sg_File_Info

Inside of Sg_File_Info::display(debug.......) 
     isTransformation                      = false 
     isCompilerGenerated                   = true (no position information) 
     isOutputInCodeGeneration              = false 
     isShared                              = false 
     isFrontendSpecific                    = true (part of ROSE support for gnu compatability) 
     isSourcePositionUnavailableInFrontend = false 
     isCommentOrDirective                  = false 
     isToken                               = false 
     file_id  = 2 
     filename = /home/liao6/daily-test-rose/upcwork/install/include/gcc_HEADERS/rose_edg_required_macros_and_functions.h 
     line     = 167  column   = 1 

shared[1] int gsj;
Inside of Sg_File_Info::display(debug.......) 
     isTransformation                      = false 
     isCompilerGenerated                   = false 
     isOutputInCodeGeneration              = false 
     isShared                              = false 
     isFrontendSpecific                    = false 
     isSourcePositionUnavailableInFrontend = false 
     isCommentOrDirective                  = false 
     isToken                               = false 
     filename = /home/liao6/svnrepos/mycode/rose/upc/unshared.upc 
     line     = 6  column = 1 
     file_id  = 1 
     filename = /home/liao6/svnrepos/mycode/rose/upc/unshared.upc 
     line     = 6  column   = 1 

Another way: ROSE makes a copy of all system headers and stores them in dedicated paths, so the file path can be checked:

  bool insideSystemHeader (SgLocatedNode* node)
  {
    bool rtval = false;
    ROSE_ASSERT (node != NULL);
    Sg_File_Info* finfo = node->get_file_info();
    if (finfo!=NULL)
    {
      string fname = finfo->get_filenameString();
      string buildtree_str1 = string("include-staging/gcc_HEADERS");
      string buildtree_str2 = string("include-staging/g++_HEADERS");
      string installtree_str1 = string("include/edg/gcc_HEADERS");
      string installtree_str2 = string("include/edg/g++_HEADERS");
      // if the file name has a system header path of either the build or the install tree
      if ((fname.find (buildtree_str1, 0) != string::npos) ||
          (fname.find (buildtree_str2, 0) != string::npos) ||
          (fname.find (installtree_str1, 0) != string::npos) ||
          (fname.find (installtree_str2, 0) != string::npos))
        rtval = true;
    }
    return rtval;
  }
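
The suffix and path checks above depend only on the file name, so they can be tried out without ROSE. The following is a sketch on plain strings, assuming '/'-separated paths. The helpers fileSuffix, hasHeaderSuffix and inSystemHeaderDir are hypothetical names: fileSuffix merely approximates what the snippet above gets from StringUtility::fileNameSuffix, and the suffix and directory lists are copied from the snippets above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: text after the last '.' of the file's base name,
// or "" when the base name contains no '.'.
std::string fileSuffix(const std::string& path) {
    const std::string::size_type slash = path.find_last_of('/');
    const std::string::size_type dot = path.find_last_of('.');
    if (dot == std::string::npos ||
        (slash != std::string::npos && dot < slash))
        return "";
    return path.substr(dot + 1);
}

// True for the header suffixes tested in the snippet above.
bool hasHeaderSuffix(const std::string& path) {
    static const char* const suffixes[] = {"h",   "hpp", "hh", "H",
                                           "hxx", "h++", "tcc"};
    const std::string suffix = fileSuffix(path);
    for (const char* s : suffixes)
        if (suffix == s)
            return true;
    return false;
}

// True when the path contains one of the header-copy directories
// named in insideSystemHeader() above.
bool inSystemHeaderDir(const std::string& path) {
    static const char* const dirs[] = {
        "include-staging/gcc_HEADERS", "include-staging/g++_HEADERS",
        "include/edg/gcc_HEADERS", "include/edg/g++_HEADERS"};
    for (const char* d : dirs)
        if (path.find(d) != std::string::npos)
            return true;
    return false;
}
```

Keeping the string logic separate from the AST access makes it easy to unit test the lists of suffixes and directories when they need to be extended.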

Why are there defining and non-defining declarations?

     class X;            // non-defining declaration
     X* foo();           // return type of function will refer to non-defining declaration
     X* xPointer = NULL; // Again, the type will refer to a pointer-to-a-type that will be the non-defining declaration.
     class X {};         // defining declaration

The traversal will visit the declarations, so you will, in this case, see the class X; class declaration and the class X {}; class declaration. In general, all references to the class X will use the non-defining declaration, and only the location where X is defined will be a defining declaration. This is discussed in great detail in the chapter on SAGE III of the ROSE User Manual and a bit in the Doxygen Web pages.

In general, while unparsing, we can't be sure where the definitions associated with declarations are in the AST (without making the code generation significantly more complex).

     class X;
     class X{};

could be unparsed as:

     class X {};  // should have been "class X;"
     class X;     // should have been "class X {};"

The previous example hardly communicates the importance of this concept, but perhaps this one does:

     class X;
     class Y {};
     class X { Y y; };

would not compile if unparsed as:

     class X { Y y; };
     class Y {};
     class X;

Note that we can't just mark a declaration as being a defining declaration, since declarations are shared internally (types and symbols can reference them, etc.).
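
The ordering constraint can be checked with an ordinary C++ compiler. Here is a small self-contained sketch; the member value and the function makeX are illustrative names added for this example only.

```cpp
#include <cassert>

class X;       // non-defining declaration: enough for pointers and references
X* makeX();    // the return type refers to the non-defining declaration

class Y {};    // Y must be complete before X's definition uses it by value

class X {      // defining declaration
public:
    Y y;
    int value = 7;
};

X* makeX() {
    static X instance;  // creating an X requires the defining declaration
    return &instance;
}
```

Moving the definition of X above class Y {}; makes the compiler reject the member Y y, which is exactly the failure the unparser has to avoid.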

Can the ROSE identityTranslator generate a 100% identical output file?

No, some modifications are performed automatically:

  • Expanding the assert macro.
  • Adding extra brackets around constants of typedef types (e.g. c=Typedef_Example(12); is translated in the output to c = Typedef_Example((12));)
  • Converting NULL to 0.

There is no easy way to avoid these changes currently. Some of them are introduced by the cpp preprocessor. Others are introduced by the EDG front end ROSE uses. 100% faithful source-to-source translation may require significant changes to preprocessing directive handling and the EDG internals.

We have had some internal discussion to save raw token strings into AST and use them to get faithful unparsed code. But this effort is still at its initial stage.

How to insert a header into an input file?

There is a SageInterface function for doing this:

// Insert include "filename" or include <filename> (system header) into the global scope containing the current scope, right after other include XXX.
PreprocessingInfo *     SageInterface::insertHeader (const std::string &filename, PreprocessingInfo::RelativePositionType position=PreprocessingInfo::after, bool isSystemHeader=false, SgScopeStatement *scope=NULL) 

How to copy/clone a function?

We need to be more specific about the function you want to copy. Is it just a prototype function declaration (a non-defining declaration in ROSE's terms) or a function with a definition (a defining declaration in ROSE's terms)?

  • Copying a non-defining function declaration can be achieved by using the following function:
// Build a prototype for an existing function declaration (defining or nondefining is fine).
SgFunctionDeclaration* SageBuilder::buildNondefiningFunctionDeclaration (const SgFunctionDeclaration *funcdecl, SgScopeStatement *scope=NULL)
  • Copying a defining function declaration is semantically a problem since it introduces redefinition of the same function.

It is at least a hack to first introduce something wrong and later correct it. Here is an example translator to do the hack (copy a defining function, rename it, fix its symbol):

#include <rose.h>
#include <stdio.h>
using namespace SageInterface;

int main(int argc, char** argv)
{
  SgProject* project = frontend(argc, argv);

  // Find a defining function named "bar" under project
  SgFunctionDeclaration* func=
findDeclarationStatement<SgFunctionDeclaration> (project, "bar", NULL,
  ROSE_ASSERT (func != NULL);

  // Make a copy and set it to a new name
  SgFunctionDeclaration* func_copy =
    isSgFunctionDeclaration(copyStatement (func));
  func_copy->set_name("bar_copy");

  // Insert it to a scope
  SgGlobal * glb = getFirstGlobalScope(project);
  appendStatement (func_copy,glb);

#if 0  // fix up the missing symbol, this should be optional now since SageInterface::appendStatement() should handle it transparently. 
  SgFunctionSymbol *func_symbol =  glb->lookup_function_symbol
    ("bar_copy", func_copy->get_type());
  if (func_symbol == NULL)
  {
    func_symbol = new SgFunctionSymbol (func_copy);
    glb ->insert_symbol("bar_copy", func_symbol);
  }
#endif
  return 0;
}
  • Another thing to consider is copying a function into another file. In that case you have to change the clone's file location information.

ROSE's unparser checks the Sg_File_Info objects of AST pieces before it decides whether to print them as text. By default, only AST pieces that come from the input file itself, or that were generated by a transformation, are unparsed. For example, some AST subtrees come from an included header, and it is usually not desirable to unparse the content of an included header.

If the file info is still the original file info, the solution is to set the copied AST to be transformation-generated:

// Recursively set source position info(Sg_File_Info) as transformation generated.
SageInterface::setSourcePositionForTransformation (SgNode *root) 

Can I transform code within a header file?

ROSE does not support writing out changed headers, for safety and practical reasons. A changed header has to be saved to another file, since writing to the original header is very dangerous (imagine debugging a header translator that corrupts its input headers). All other files and headers that use the changed header then have to be updated to use the new header file.

Also all files involved have to be writable by user's translators.

As a result, the current unparser skips subtrees of AST from headers by checking file flags (compiler_generated and/or output_in_code_generation etc.) stored in Sg_File_Info objects.

How to work with formal and actual arguments of functions?

     //Get the actual arguments
     SgExprListExp* actualArguments = NULL;
     if (isSgFunctionCallExp(callSite))
         actualArguments = isSgFunctionCallExp(callSite)->get_args();
     else if (isSgConstructorInitializer(callSite))
         actualArguments = isSgConstructorInitializer(callSite)->get_args();
     ROSE_ASSERT(actualArguments != NULL);

     const SgExpressionPtrList& actualArgList = actualArguments->get_expressions();

     //Get the formal arguments.
     SgInitializedNamePtrList formalArgList;
     if (calleeDef != NULL)
         formalArgList = calleeDef->get_declaration()->get_args();

     // The number of actual arguments can be less than the number of formal arguments
     // (with implicit arguments) or greater than the number of formal arguments (with varargs)

How to translate multiple files scattered in different directories of a project?

A translator built using ROSE is designed to act like a compiler (gcc, g++, gfortran, etc., depending on the input file types). So users of the translator only need to change the build system for the input files to use the translator instead of the original compiler.

If your original compiler implicitly includes or links anything, you may have to make the include or link paths explicit after the change. For example, if mpiCC transparently links to /path/to/mpilib.a, you have to add this link flag to your modified Makefile.


Generate code into different files

The ROSE outliner has an option to output the generated function into a new file.

// Generate the outlined function into a separated new source file
// -rose:outline:new_file
extern bool useNewFile;

You may want to check how this option is used in the outliner source files to get what you want.

Can ROSE accept incomplete code?

ROSE does not handle incomplete code, though this might be possible in the future. Support would be language dependent and would likely depend heavily on some of the language-specific tools that we use internally. This is, however, not really a priority for our work.

How does ROSE handle include directives like iostream?

In a file containing a CPP include directive, the generated file will be essentially identical (i.e., it keeps the CPP include directive). However, a traversal of the AST will include all the items from the included files. An alternative traversal lets you traverse only the input file and skip all other files (e.g., header files).
For the case of iostream this will be large, but that is what your program really is, so that is how it has to be represented; such details are important for type analysis and that trickles into every other part of analysis (especially for C++).

You mention that ROSE can refer to code locations as they are before preprocessing, although it inputs preprocessed files. So, where exactly do you get the fine-grained (row,column) info from if you only see the preprocessed files?

The EDG frontend includes CPP and reports source code positions as they were before the CPP translation, so we get and save this information. For Fortran we have to handle the CPP translation more explicitly, so we only have the source position after translation (but Fortran is always a bit special when it is preprocessed). I am not aware that CPP removes whitespace, but that is not an issue since we get the information from EDG, where it is generated before CPP translation.

Binary Analysis

What does binary analysis have to do with source-to-source translation?

ROSE was, and is still primarily, a source-to-source compiler. But it turns out that much of what happens in source code analysis can be applied to binary analysis. For instance, parsing of an ELF or PE container (the non-instructions, non-initialized data parts of an executable) is much like parsing source code: some files are read, an abstract syntax tree is produced, and the tree can be analyzed, modified, and unparsed to create a new container. Disassembly produces instructions that have static information much like a statement in a source language. Binaries have control flow graphs, call graphs, data flow, etc. that are very similar to those of source code. A goal is to be able to unify source and binary analysis as much as possible.

I'm interested in binary analysis. Where is a good place to start?

Most of the binary analysis capabilities are documented in the library itself and are part of the Doxygen output. The best place to start is the BinaryAnalysis namespace. In short,

  • ROSE parses an ELF (Unix) or PE/DOS (Windows) container (executable, library, object file, or core dump) and builds an abstract syntax tree (AST).
  • It then invokes a BinaryLoader to map sections of that container into a MemoryMap that simulates the address space of a process. The BinaryLoader can also do limited dynamic linking and relocation fixups.
  • Non-container sources are then mapped into memory. These are things like files of raw data that aren't in an ELF or PE container (e.g., some firmware and memory dumps), Motorola S-records, memory of a running process, etc.
  • A Disassembler and Partitioner then try to distinguish code from data, disassemble instructions and static data, and partition instructions into basic blocks and functions. The instructions are inserted into the AST.
  • Instruction semantics map instructions into low-level RISC-like operations. Most analysis occurs at this level.
  • Various kinds of analysis can be performed and can make limited changes to the AST.
  • An AsmUnparser can generate an assembly listing (for human consumption, not reassembly), the ::backend can generate a new executable, the GraphViz class can generate graphs.

Why is binary analysis slow, and what can I do about it?

ROSE has extensive invariant checks (i.e., assert), encapsulation, and polymorphic classes. When compiling ROSE, make sure to turn on optimizations (e.g., -O3 -fomit-frame-pointer), turn off assertion checking (-DNDEBUG), and turn off iterator checking (i.e., don't define _GLIBCXX_DEBUG). Also use optimized versions of libraries, especially Boost libraries. Doing these can easily make ROSE three or four times faster.

The disassembler in ROSE is different than most. First of all, by default it uses a recursive descent algorithm that requires a certain amount of reasoning about instructions, some of which may invoke an external SMT solver. One can use a linear disassembly instead (see projects/BinaryAnalysisTools/linearDisassemble.C) but then most analysis capabilities are sacrificed because there's no control flow graph. Other options are to turn off SMT usage or use the Yices library instead of a separate SMT executable. Secondly, being an analysis framework, ROSE's disassembler and instruction semantics APIs are more modular, complex, and configurable than simple disassemblers, and this comes at a cost.

What is "partitioning"?

Unlike many disassemblers that use a linear sweep through memory disassembling everything they encounter, ROSE uses a recursive approach. It starts with a set of addresses obtained from the user or from parsing the ELF or PE container, disassembles an instruction, and then uses instruction semantics to determine the next possible values for the instruction pointer register. It then adds those addresses to its work list, repeating until the worklist is empty. This not only generates the individual instructions (SgAsmInstruction), but also a global control flow graph (CFG). The next step is to group instructions together to form functions, essentially coloring parts of the control flow graph, thereby partitioning its vertices into smaller subgraphs.
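
The work-list loop described above can be sketched on a toy instruction set. This is only an illustration under stated assumptions: ToyInsn, ToyMemory, ToyCfg, successors, and discover are hypothetical names, not ROSE classes, and the successor computation is hard-coded per instruction kind rather than derived from instruction semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Toy instruction kinds (hypothetical); a real decoder derives successors from semantics.
enum class Kind { Fallthrough, Jump, Branch, Halt };

struct ToyInsn {
    Kind kind;
    std::uint64_t size;     // bytes occupied by the instruction
    std::uint64_t target;   // used by Jump and Branch only
};

// Hypothetical stand-in for specimen memory: address -> decodable instruction.
using ToyMemory = std::map<std::uint64_t, ToyInsn>;

struct ToyCfg {
    std::set<std::uint64_t> vertices;                         // discovered instruction addresses
    std::set<std::pair<std::uint64_t, std::uint64_t>> edges;  // (from, to) control flow edges
};

// Possible next values of the instruction pointer after executing insn at va.
std::vector<std::uint64_t> successors(std::uint64_t va, const ToyInsn &insn) {
    assert(insn.size > 0);
    switch (insn.kind) {
        case Kind::Fallthrough: return {va + insn.size};
        case Kind::Jump:        return {insn.target};
        case Kind::Branch:      return {insn.target, va + insn.size};
        case Kind::Halt:        return {};
    }
    return {};
}

// Recursive-descent discovery driven by a work list.
ToyCfg discover(const ToyMemory &memory, const std::vector<std::uint64_t> &entryPoints) {
    ToyCfg cfg;
    std::vector<std::uint64_t> workList = entryPoints;
    while (!workList.empty()) {
        std::uint64_t va = workList.back();
        workList.pop_back();
        auto found = memory.find(va);
        if (found == memory.end() || !cfg.vertices.insert(va).second)
            continue;   // nothing decodable here, or already processed
        for (std::uint64_t next : successors(va, found->second)) {
            cfg.edges.insert({va, next});
            workList.push_back(next);
        }
    }
    return cfg;
}
```

Addresses that are never reached from an entry point (for example, data placed between functions) are never decoded in this sketch, which is the main difference from a linear sweep.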

Why does ROSE use basic blocks?

ROSE's definition of basic block (BB) relaxes the usual requirement that a BB's instructions be contiguous and non-overlapping in memory, and instead uses only control flow properties: a BB's instructions are always executed sequentially from the first to the last. ROSE uses BBs in most of its analysis because at the semantics level, a BB and an individual instruction are indistinguishable, but a BB "does more". Most analysis is more efficient when it's able to perform larger units of work.

Is there an easy way to overwrite part of virtual memory?

Sometimes one wants to replace (or insert) an instruction or some data in the virtual memory space, and figuring out which bytes of the specimen (e.g., an ELF file) correspond to the virtual memory addresses can be difficult. But most binary analysis tools have an easy command-line way to modify the virtual address space before any disassembly occurs. Usually, a binary specimen is described by one command-line argument, such as the name of an executable file like "/home/user/a.out", but most tools actually understand that a specimen can be described by more than one argument. Therefore, you could use the file name as the first argument, followed by additional arguments that modify the address space. These additional arguments have various forms, which are described in the "--help" output of most commands. For instance, if you want to replace a four-byte instruction at virtual address 0x26a5bc with i386 NOP instructions, each being a single 0x90 byte, you'd use "data:0x26a5bc=rx::'0x90 0x90 0x90 0x90'". The "=rx" means the data should be mapped with read and execute permission.
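
Conceptually, such an extra argument maps new bytes over whatever the file placed at those addresses. The following is a minimal sketch of that idea, assuming a byte-granular toy address space; ToyAddressSpace, Cell, Perm, and patchWithNops are hypothetical names for illustration and do not reflect how ROSE's MemoryMap is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Access permission bits for the toy address space (hypothetical).
enum Perm : unsigned { READ = 1, WRITE = 2, EXEC = 4 };

struct Cell {
    std::uint8_t value;
    unsigned perms;
};

// Hypothetical byte-granular address space; later mappings replace earlier ones.
class ToyAddressSpace {
    std::map<std::uint64_t, Cell> cells_;
public:
    // Map bytes starting at va, overwriting whatever was mapped there before.
    void map(std::uint64_t va, const std::vector<std::uint8_t> &bytes, unsigned perms) {
        for (std::size_t i = 0; i < bytes.size(); ++i)
            cells_[va + i] = Cell{bytes[i], perms};
    }

    bool isMapped(std::uint64_t va) const { return cells_.count(va) != 0; }
    std::uint8_t read(std::uint64_t va) const { return cells_.at(va).value; }
    unsigned perms(std::uint64_t va) const { return cells_.at(va).perms; }
};

// Overlay `count` single-byte i386 NOPs (0x90) at va with read+execute permission.
void patchWithNops(ToyAddressSpace &space, std::uint64_t va, std::size_t count) {
    assert(count > 0);
    space.map(va, std::vector<std::uint8_t>(count, 0x90), READ | EXEC);
}
```

In this sketch, calling patchWithNops(space, 0x26a5bc, 4) after the file contents have been mapped mirrors the "data:0x26a5bc=rx::..." argument from the text: bytes outside the four patched addresses keep their original values and permissions.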

Why is ROSE's CFG structured the way it is?

ROSE has many CFGs that it uses internally, some of which are exposed through an API. For instance, the Partitioner2 has a CFG that it creates while discovering instructions, and a resulting global CFG at the end. Many users find this CFG convenient for analysis, but others need things to be a bit different. The good news is that CFGs are copyable, so all one needs to do is copy the CFG and then make whatever adjustments the analysis requires. There are also V_USER_DEFINED and E_USER_DEFINED vertex and edge types if one needs to insert special things into the graph (they carry no data, so a separate lookup table may be required). The Partitioner2 basic blocks, functions, etc. have a mechanism for attaching arbitrary data.

Specifically, the Partitioner2 CFG has a single vertex to represent an indeterminate address, and all function return sites branch to this vertex. A function call site has an extra "call-return" edge from the call site to the point where the called function would return.

Why not store semantics in the AST?

Instruction semantics are encoded as C++ code instead of explicitly stored in the AST as subtrees of SgAsmInstruction for two main reasons:

  • Storing them in the AST would make the AST many times larger. Consider the x86 parity status flag that needs to be set based on whether the result of an arithmetic operation has an odd or even number of bits set. This is compounded by the fact that subtrees cannot be shared in the AST because there are no immutability guarantees and because each SgNode has only one parent pointer.
  • Implementing semantics in C++ code means that users have ample opportunities to control how and what gets stored for the semantics.

But having said that, ROSE does have the ability to represent semantics within the AST. See StaticSemantics. The approach used by this relatively simple semantic domain can be used to construct pretty much any kind of semantic data structure the user wants.
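
To give a feel for the size argument in the first bullet, here is a rough sketch of the x86 parity flag written as a chain of single-bit operations, the way a RISC-like semantic layer might spell it out. On x86 the flag is set when the low 8 bits of the result contain an even number of 1 bits. The function name parityFlag and the operation counting are assumptions made for illustration; ROSE's actual decomposition may differ.

```cpp
#include <cassert>
#include <cstdint>

// x86 parity flag: set when the low 8 bits of a result contain an even number of 1 bits.
// opCount reports how many primitive operations (extract, xor, invert) were needed.
bool parityFlag(std::uint32_t result, unsigned &opCount) {
    opCount = 0;
    unsigned acc = 0;
    for (unsigned bit = 0; bit < 8; ++bit) {
        unsigned b = (result >> bit) & 1u;   // one "extract" operation
        ++opCount;
        if (bit > 0) {
            acc ^= b;                        // one "xor" operation
            ++opCount;
        } else {
            acc = b;                         // first bit seeds the accumulator
        }
    }
    ++opCount;                               // final "invert": PF = !(xor of the eight bits)
    return acc == 0;
}
```

In this sketch a single flag already costs 16 primitive operations (8 extracts, 7 XORs, 1 invert), and an explicit AST encoding would need a subtree of comparable size for every instruction that updates the flag.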

Why not use SgExpression AST nodes as symbolic expressions?

We decided to use SymbolicExpr instead because:

  • Symbolic expressions are used in many places besides abstract syntax trees. E.g., as intermediate and final results in an analysis.
  • AST node design prevents common subexpressions from sharing the same expression trees. In particular, SymbolicExpr nodes are immutable, AST nodes are not. Also AST nodes point to their unique parent.
  • AST nodes don't have an easy way to track ownership, which becomes important when multiple analyses are using and generating expressions and who knows what has references to parts of a tree. We end up in a situation where no analysis frees anything because it's afraid that something else might be referencing it still.
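
The sharing and ownership points above can be illustrated with a small sketch of immutable, reference-counted expression nodes. ExprNode, ExprPtr, leaf, binary, and toString are hypothetical names; this is not the SymbolicExpr API, only the general technique of immutable nodes without parent pointers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct ExprNode;
// Pointer-to-const: nodes are never modified after creation, so sharing is safe.
using ExprPtr = std::shared_ptr<const ExprNode>;

struct ExprNode {
    std::string op;                 // operator name, or the text of a leaf
    std::vector<ExprPtr> children;  // no parent pointer, so a node may have many parents
};

ExprPtr leaf(const std::string &name) {
    return std::make_shared<ExprNode>(ExprNode{name, {}});
}

ExprPtr binary(const std::string &op, const ExprPtr &a, const ExprPtr &b) {
    assert(a && b);
    return std::make_shared<ExprNode>(ExprNode{op, {a, b}});
}

// Render an expression in prefix form, e.g. "(add x y)".
std::string toString(const ExprPtr &e) {
    if (e->children.empty())
        return e->op;
    std::string s = "(" + e->op;
    for (const ExprPtr &c : e->children)
        s += " " + toString(c);
    return s + ")";
}
```

Building binary("mul", sum, sum) stores the same sum node twice rather than two copies, and the reference count answers the ownership question: a node is freed when the last expression or analysis holding it lets go.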

Is it possible to generate an SgExpression AST from semantics?

Yes. In fact, the semantics were designed specifically for generating a variety of data structures, and the ROSE AST is just one possibility. Some examples:

  • The SymbolicSemantics domain generates SymbolicExpr trees from the semantics encoded in the various Dispatcher classes.
  • SMT solver input is generated from SymbolicExpr. It could have also been generated directly from the semantics instead of going through this intermediate step, but the intermediate step is important because, as stated in another answer, SymbolicExpr is used for things besides instruction semantics.

By default, the AST is not populated with instruction semantics because we've found the virtual representation to be more flexible and to use less memory. But if you want them, see StaticSemantics.

How does one access the memory of a binary specimen?

ROSE provides at least two levels of access. If all you've done is parse a binary container (ELF, PE) then you can traverse the AST to find the SgAsmGenericFile or SgAsmGenericSection in which you're interested and use one of its methods.

If the BinaryLoader has executed then you can traverse the AST to find a SgAsmInterpretation, which has a method to return a MemoryMap.

How can semantics be made to use specimen memory?

Some implementations of RiscOperators have a method that can be called to set the memory map. However, many of these semantics will only read from memory locations that are non-writable because they assume that such memory will not change during the execution of a program. You might need to remove the write access for parts of a memory map, and this can be done by copying the map (a cheap operation) and using the MemoryMap API.

But let's say you want to use a semantic domain like SymbolicSemantics that doesn't have such a method. In that case, you'll need to subclass RiscOperators and augment its readMemory method to do what you want, namely, read the memory and construct a new symbolic value from what was read.
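
The two ideas above (reading only from non-writable memory, and removing write access on a copy of the map) can be sketched as follows. All names here (ToySegment, ToyMemoryMap, ToyValue, ToyRiscOperators, withoutWriteAccess) are hypothetical stand-ins rather than ROSE's MemoryMap or RiscOperators; a real subclass would construct a symbolic value of the domain's own type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One mapped region of the toy specimen memory.
struct ToySegment {
    std::uint64_t base;
    std::vector<std::uint8_t> bytes;
    bool writable;
};

using ToyMemoryMap = std::vector<ToySegment>;

// A value is either a known byte or an opaque variable identified by number.
struct ToyValue {
    bool isConcrete;
    std::uint8_t byte;    // meaningful only when isConcrete
    unsigned variable;    // meaningful only when !isConcrete
};

// Copy the map and drop write access everywhere, leaving the original untouched.
ToyMemoryMap withoutWriteAccess(ToyMemoryMap map) {
    for (ToySegment &segment : map)
        segment.writable = false;
    return map;
}

// Hypothetical operators class whose readMemory consults the specimen memory.
class ToyRiscOperators {
    ToyMemoryMap map_;
    unsigned nextVariable_ = 0;
public:
    explicit ToyRiscOperators(const ToyMemoryMap &map) : map_(map) {
        for (const ToySegment &segment : map_)
            assert(!segment.bytes.empty());
    }

    ToyValue readMemory(std::uint64_t va) {
        for (const ToySegment &segment : map_) {
            bool inside = va >= segment.base && va - segment.base < segment.bytes.size();
            if (inside && !segment.writable)
                return ToyValue{true, segment.bytes[va - segment.base], 0};
        }
        // Unmapped or writable memory may change at run time: return a fresh variable.
        return ToyValue{false, 0, nextVariable_++};
    }
};
```

Passing withoutWriteAccess(map) to the constructor makes every mapped byte readable as a concrete value, while the caller's original map keeps its permissions.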

How do I add a data member to an existing SValue?

To extend an SValue, create a new SValue class that inherits from an existing SValue. InstructionSemantics2::BaseSemantics::SValue is the base class of the SValue hierarchy. As described here, the convention is that each semantic domain has a number of main components enclosed in a namespace. If you're not overriding all of the components then add typedefs for those that you're not overriding in order that all the main components are defined in your namespace.

Within your new SValue class, add whatever members you need. You may inherit or augment the virtual constructors; they are: undefined_, number_, boolean_, and copy (all but boolean_ are pure virtual in the base class). For most other semantic classes the virtual constructors are named create and clone. You'll also need a static method to create new allocated instances; the convention is to name this method instance. You might want to augment the may_equal and must_equal if your new data member affects value equality. You might also want the print method to output your new data members, and perhaps a new Formatter to control whether and how those members are printed.
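
The virtual-constructor convention can be sketched in isolation as shown below. This is a simplified illustration in a made-up toy namespace, not the InstructionSemantics2 classes: the toy base class gives number_ and copy default bodies (the text above says the real ones are pure virtual), undefined_, boolean_, may_equal, and must_equal are omitted, and TaggedSValue, tag, and RiscOps are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

namespace toy {

class SValue;
using SValuePtr = std::shared_ptr<SValue>;

// Minimal stand-in for a semantic value base class.
class SValue {
    std::uint64_t bits_ = 0;
public:
    virtual ~SValue() = default;
    std::uint64_t bits() const { return bits_; }
    void bits(std::uint64_t b) { bits_ = b; }

    // Virtual constructors: build a new value of the same dynamic type as *this.
    virtual SValuePtr number_(std::uint64_t n) const {
        SValuePtr v = std::make_shared<SValue>();
        v->bits(n);
        return v;
    }
    virtual SValuePtr copy() const { return std::make_shared<SValue>(*this); }
    virtual std::string print() const { return std::to_string(bits_); }
};

// Extended value carrying one extra data member.
class TaggedSValue : public SValue {
    std::string tag_;
public:
    // Static allocating constructor, named "instance" by convention.
    static std::shared_ptr<TaggedSValue> instance() { return std::make_shared<TaggedSValue>(); }

    const std::string &tag() const { return tag_; }
    void tag(const std::string &t) { tag_ = t; }

    SValuePtr number_(std::uint64_t n) const override {
        std::shared_ptr<TaggedSValue> v = instance();
        v->bits(n);
        v->tag(tag_);   // new values inherit the tag of the value they were created from
        return v;
    }
    SValuePtr copy() const override { return std::make_shared<TaggedSValue>(*this); }
    std::string print() const override { return SValue::print() + "[" + tag_ + "]"; }
};

// Operators that never name a concrete value class; they go through the prototype.
class RiscOps {
    SValuePtr protoval_;
public:
    explicit RiscOps(const SValuePtr &protoval) : protoval_(protoval) {
        assert(protoval_ != nullptr);
    }
    SValuePtr add(const SValuePtr &a, const SValuePtr &b) const {
        return protoval_->number_(a->bits() + b->bits());
    }
};

} // namespace toy
```

With a prototype obtained from toy::TaggedSValue::instance(), toy::RiscOps(proto).add(a, b) yields a TaggedSValue even though add never mentions that class, which is why operators written this way keep working when someone extends the value type.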

If your new SValue data members are set/modified as the result of some RISC operation, then you must also define a new RiscOperators class that inherits ultimately from InstructionSemantics2::BaseSemantics::RiscOperators. Usually your RiscOperators will inherit from the RiscOperators defined in the same namespace as your new SValue class's superclass. You need to override only those RiscOperators operators whose behavior will be different (a list can be found here), plus provide implementations for the virtual constructors. Any operators that you don't override will use the virtual constructor mechanism (the SValue virtual constructors mentioned above) to construct your new SValue objects from a prototypical SValue specified when the RiscOperators was created. Your new RiscOperators methods should never create an SValue using a specific class name–doing so will make your operators less user friendly.

We highly recommend that you test your new semantic domain with InstructionSemantics2::TestSemantics, which tries to catch most type-related errors in your new classes and their direct super classes. If you plan to submit your semantics back to ROSE (or you want an easy way to test your semantics against real programs), then the projects/BinaryAnalysisTools/debugSemantics.C should be modified to support your semantics domain.

Can I attach information to symbolic expressions?

Each symbolic expression node can store one user-defined datum of arbitrary, copyable type via the SymbolicExpr::Node::userData property. The user data is mutable, does not participate in expression hashes, and can be clobbered by ROSE's built-in simplification layer. Users can change the value at any time from any thread (with user-supplied synchronization) regardless of how many expressions are sharing this subexpression. The boost::any type is used as the storage mechanism because the AST attribute storage mechanism is too expensive (it's common to have hundreds of millions of these nodes created during an analysis).

What binary analysis capabilities does ROSE have?

ROSE has various binary disassemblers (x86, ARM, MIPS, PowerPC) that, like source code analysis, create an internal representation of the binary in the form of an AST. Although the types of AST nodes for source and binaries are largely disjoint, one can analyze the binary AST using concepts similar to source analysis. ROSE has a few binary analyses. Here are some examples:

  • Control flow graphs, both virtual and using Boost Graph Library.
  • Function call graphs.
  • Operations on control flow graphs: dominator, post-dominator
  • Pointer detection analysis that tries to figure out which memory locations are used as if they were pointers in a higher level language.
  • Instruction partitioning: figuring out how to group instructions into basic blocks, and how to group basic blocks into functions when all you have is a list of instructions. Its accuracy on automatically partitioning stripped, obfuscated code has been shown to be better than the best disassemblers that use debugging info and symbol tables.
  • Instruction semantics for x86. This is an area of active development but supports only 32-bit integer instructions. We plan to add floating point, SIMD, 64-bit, other architectures, and a simpler API. But even as it stands, it is complete enough to simulate entire ELF executables (even "vi"). See the next bullet.
  • An x86 simulator for ELF executables. This project is able to simulate how the Linux kernel loads an executable, and the various system calls made by the executable. It is complete enough to simulate many Linux programs, but also provides callback points for the user to insert various kinds of analyses. For instance, you could use it to disassemble an entire process after it has been dynamically linked. There are many examples in the projects/simulator directory. In contrast to simulators like Qemu, Bochs, valgrind, VirtualBox, VMware, etc., where speed is a primary design driver, the ROSE simulator is designed to provide user-level access to as many aspects of execution as possible.
  • Plugins for instruction semantics. Instruction semantics is written in such a way that different "semantic domains" can be plugged in. ROSE has a symbolic domain, an interval domain, and a partial-symbolic domain. The symbolic domain can be used in conjunction with an SMT solver (currently supporting Yices). The interval domain is actually sets of intervals, and is binary-arithmetic-aware (i.e., correctly handles overflows, etc on a fixed word size). The partial-symbolic domain uses single-node expressions in order to optimize for speed and size at the expense of accuracy. Users can and have written other domains, and a new API (in the works) will make this even easier.
  • Examples of data-flow analysis (e.g., the pointer analysis already mentioned), but not a well defined framework yet (someone is working on one). Currently, data-flow type analyses are implemented using the instruction semantics support: as each instruction is "executed" the domain in which it executes causes the data to flow in the machine state. Each analysis provides its own flow equation to handle the points where control flow joins from two or more directions; and provides its own "next-instruction" function to iterate over the control flow graph.
  • Clone detection of various formats: various forms of syntactic, including one using locality-sensitive hashing; and semantic clone detection via fuzz testing in a simulator.
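
As an illustration of what "binary-arithmetic-aware" means for the interval domain mentioned in the list above, here is a minimal sketch of adding two intervals on an 8-bit word, where a result that wraps around becomes a set of two intervals. Interval and addIntervals8 are hypothetical names, and the 8-bit width is chosen only to keep the numbers small; this is not ROSE's interval implementation.

```cpp
#include <cassert>
#include <vector>

// Inclusive interval of unsigned values on an 8-bit word (0..255).
struct Interval {
    unsigned lo;
    unsigned hi;
};

// Add two 8-bit intervals modulo 256, returning one or two disjoint intervals.
std::vector<Interval> addIntervals8(Interval a, Interval b) {
    assert(a.lo <= a.hi && a.hi <= 255u);
    assert(b.lo <= b.hi && b.hi <= 255u);
    unsigned lo = a.lo + b.lo;   // at most 510, so no overflow in unsigned
    unsigned hi = a.hi + b.hi;
    if (hi - lo >= 255u)         // result covers every 8-bit value
        return std::vector<Interval>{Interval{0u, 255u}};
    if (hi <= 255u)              // no wrap-around at all
        return std::vector<Interval>{Interval{lo, hi}};
    if (lo > 255u)               // the whole result wraps
        return std::vector<Interval>{Interval{lo - 256u, hi - 256u}};
    // Only the upper part wraps: split into a low piece and a high piece.
    return std::vector<Interval>{Interval{0u, hi - 256u}, Interval{lo, 255u}};
}
```

For example, [250,252] + [3,5] spans 253..257 before truncation, so the sketch returns the two intervals [0,1] and [253,255] instead of a single interval that would wrongly include everything in between.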

Daily Work

git clone returns error: SSL certificate problem?


git clone
Cloning into rose...
error: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed while accessing

fatal: HTTP request failed

The reason may be that you are behind a firewall that alters the original SSL certificate.

Solutions: Tell cURL to not check for SSL certificates:

# Solution 1: Environment variable (temporary)
      $ env GIT_SSL_NO_VERIFY=true git pull

# Solution 2: git-config (permanent)
      # set local configuration
      $ git config --local http.sslVerify false

      # or set global configuration
      $ git config --global http.sslVerify false

What is the best IDE for ROSE developers?

There may not be a widely recognized best integrated development environment. But developers have reported that they are using

  • vim
  • emacs
  • KDevelop
  • Source Navigator
  • Eclipse
  • Netbeans

The thing is that ROSE is huge and has some ridiculously large generated source files (CxxGrammar.h and CxxGrammar.C are generated in the build tree, for example). So many code browsers may have trouble handling ROSE.

How do I debug my transformation?

There are a couple of ways to debug your transformation, but in general the process starts with knowing exactly what you want to accomplish. An example of your transformation on a specific input code is particularly useful. Depending on the type of transformation, there are different mechanisms within ROSE to support the development of a transformation. Available mechanisms include (in decreasing level of abstraction):

  • String-Based Specification. A transformation may specify new code to be inserted into the AST by specifying the new code as a source code string. Functions are included to permit insert(), replace(), remove().
  • Calling Predefined Transformations. There are a number of predefined optimizing transformations (loop optimizations) that may be called directly within a translator built using ROSE.
  • Explicit AST Manipulation. The lowest level is to manipulate the AST directly. Numerous functions within SAGE III are provided to support this, but of course it is rather tedious.

How do I use the SQLite database?

ROSE has a connection to SQLite, but you must run configure with the correct command-line options to enable it. Example scripts to configure ROSE to use SQLite are in the ROSE/scripts directory. Another detail is that SQLite development generally lags behind ROSE in the use of the newest versions of compilers. So you are likely to be forced to use an older version of your compiler (particularly with GNU g++).


What is the status for supporting Windows?

We are currently developing support for ROSE on Windows.
