This repository has been archived by the owner on Mar 30, 2021. It is now read-only.
Daniel Krupp edited this page Jun 14, 2018 · 76 revisions

Clang Cross Translation Unit (CTU) Static Analysis

The goal of this project is to improve the Clang Static Analyzer so that it can detect bugs that span multiple translation units (TUs). CTU analysis was presented at EuroLLVM '17; see the submitted Extended abstract for a more in-depth overview.

Usage

To use CTU static analysis, you need to build a version of Clang that supports this feature (see Compilation). Invoking the analyzer requires some special arguments (for an in-depth explanation, see Approach), so we suggest using CodeChecker to invoke it (see Cross Translation Unit analysis with CodeChecker). CTU support in scan-build-py is currently in progress.

  1. Compile this Clang version. Tips for cmake options:
  • If you only analyze your code with CTU (and do not develop CTU itself), compile Clang in production mode: cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_ASSERTIONS=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=<install_directory> ../llvm/
  • For CTU developers, compile Clang in debug mode: cmake -DCMAKE_BUILD_TYPE=Debug -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_ENABLE_ASSERTIONS=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=<install_directory> ../llvm/
  2. Clone CodeChecker from https://github.com/Ericsson/codechecker and create a package.
  3. Add the newly built Clang to your PATH.
  4. CodeChecker will then use the CTU-capable Clang. If all goes well, the --ctu switch will be available in the CodeChecker help.
  5. Perform the CTU analysis as described in Cross Translation Unit analysis with CodeChecker.
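
As a rough sketch of the check in the steps above, the help output can be tested for the --ctu switch from the shell. The helper name has_ctu_switch is hypothetical and not part of CodeChecker, and the assumption that the switch is listed under CodeChecker analyze --help is inferred from the CodeChecker analyze --ctu invocation used later on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of CodeChecker): succeeds if the help
# text read from standard input mentions the --ctu switch.
has_ctu_switch() {
    grep -q -e '--ctu'
}

# Assumed usage, once the CTU-capable Clang is installed
# (<install_directory> is the CMAKE_INSTALL_PREFIX chosen at build time):
#   export PATH="<install_directory>/bin:$PATH"
#   CodeChecker analyze --help | has_ctu_switch && echo "CTU is available"
```

If the helper fails, the most likely cause is that a different clang binary comes first on the PATH.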

To analyze your project in strict mode (error on import failures), pass this parameter to Clang:

-Werror=odr

Compilation

You can build a version of Clang by checking out our repository. The commits below tell you which LLVM and clang-tools-extra Git commit to use. To build clang, use the same procedure as usual, but with the commits described below.

Branches

The ctu-os branch collects commits and changes that are currently undergoing review by the community.

ctu-master and ctu-clang5 contain extra functionality that is continuously being developed to make CTU more viable, especially for C++ projects. ctu-master follows the master version of Clang, while ctu-clang5 is branched from Clang 5.0 (currently at release-candidate stage). We suggest building your Clang binaries from ctu-clang5.

Which LLVM commit to use?

Use the commit in the .llvm-commit file.

If you want to use clang-tools-extra (e.g. clang-tidy):

Use the commit in the .cte-commit file.
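
A minimal sketch of how the pinned commits could be checked out. It assumes that each of the two files contains nothing but a single commit hash, and the directory layout in the usage comments is hypothetical. pinned_commit is an illustrative helper, not something shipped with the repository.

```shell
#!/bin/sh
# Illustrative helper: print the commit hash stored in a pin file,
# with all whitespace (including the trailing newline) removed.
# Assumes the file holds a single hash and nothing else.
pinned_commit() {
    tr -d '[:space:]' < "$1"
}

# Assumed usage (hypothetical layout: this repository is checked out in
# llvm/tools/clang and clang-tools-extra in llvm/tools/clang/tools/extra):
#   git -C llvm checkout "$(pinned_commit llvm/tools/clang/.llvm-commit)"
#   git -C llvm/tools/clang/tools/extra checkout \
#       "$(pinned_commit llvm/tools/clang/.cte-commit)"
```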

Debugging and Development

Reproducing a CTU crash

This section assumes that you used CodeChecker to run the CTU analysis. You can find Clang crashes in the failed folder of the report directory (CodeChecker analyze --ctu -o <report_dir>). The steps below describe how to reproduce such a crash on another machine.

These CodeChecker debug tools help prepare the log file so that a CTU crash can be reproduced on another machine: Debug Scripts

These scripts assume that the CTU clang and CodeChecker are available in the PATH environment variable. The following command prepares the commands which reproduce the error:

prepare_all_cmd_for_ctu.py --sources_root /<path_to>/sources-root/ --report_dir /<path_to>/reports/ --clang /<path_to>/clang++ --clang_plugin_name libericsson --clang_plugin_path /<path_to>/codechecker_core/build/libericsson-checkers.so

Known issues:

  • Error messages may inform you that some directories don't exist. These have to be created manually until the script does this itself. These directories are used as the current working directory when running the analysis.

  • Some errors may occur because of missing header files (stddef.h, stdarg.h, etc.). These have to be copied from clang/install/lib/clang//include/ to the location indicated by the error message.

  • If externalFnMap.txt is reported missing, it is in the wrong directory. ctu-dir contains a directory named after an architecture (e.g. powerpc). This has to be renamed to whatever is needed on the current architecture (e.g. x86_64).
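
The last workaround above could be scripted roughly as follows. retarget_ctu_dir is a hypothetical helper, not one of the debug scripts, and using the output of uname -m as the default target name is an assumption that may not match the name Clang expects on every platform.

```shell
#!/bin/sh
# Hypothetical helper: rename the architecture directory inside ctu-dir
# so that externalFnMap.txt is found on the current machine.
#   $1: path to ctu-dir
#   $2: architecture name recorded on the original machine (e.g. powerpc)
#   $3: architecture name needed here (assumed default: uname -m)
retarget_ctu_dir() {
    ctu_dir=$1
    old_arch=$2
    new_arch=${3:-$(uname -m)}

    if [ ! -d "$ctu_dir/$old_arch" ]; then
        echo "no such directory: $ctu_dir/$old_arch" >&2
        return 1
    fi
    # Nothing to do when the names already match.
    if [ "$old_arch" = "$new_arch" ]; then
        return 0
    fi
    # Refuse to overwrite or nest into an existing target directory.
    if [ -e "$ctu_dir/$new_arch" ]; then
        echo "already exists: $ctu_dir/$new_arch" >&2
        return 1
    fi
    mv "$ctu_dir/$old_arch" "$ctu_dir/$new_arch"
}

# Assumed usage:
#   retarget_ctu_dir /<path_to>/reports/ctu-dir powerpc x86_64
```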

How to debug ASTImporter

This CTU implementation heavily relies on the ASTImporter library as it imports the implementation of functions from foreign Translation Units.

It is essential that we get a valid AST after importing. A common fault is that the merged AST becomes corrupt after the import, which prevents the Clang Static Analyzer from working properly on it.

Let's say that we want to verify how myclass.cpp and main.cpp are merged.

# Create the binary AST dump of the to-be-merged cpp file
clang -cc1 -emit-pch -o myclass.ast ./myclass.cpp
# Call Clang to merge the ASTs and create a textual dump of the merged AST
clang -cc1 -ast-merge ./myclass.ast -ast-dump main.cpp > merged_ast.txt

# Create a dump of the single-file AST
cat myclass.cpp main.cpp > main_all.cpp
clang -cc1 -ast-dump main_all.cpp > single_file_ast.txt

# Compare the ASTs using meld
meld ./single_file_ast.txt merged_ast.txt

Any structural deviation in merged_ast.txt is a potential fault in the merge. See, for example, A typical error in the merged AST.
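
Textual AST dumps contain node addresses (such as 0x55d5c8a3b2f8) that differ between any two Clang runs, which can bury the structural differences in noise. The sketch below masks them before comparing. normalize_ast_dump is a hypothetical helper, and plain diff is used here as a non-graphical stand-in for meld. Source locations still differ between the two dumps, since they come from different input files.

```shell
#!/bin/sh
# Hypothetical helper: replace every hexadecimal node address in an AST
# dump with a fixed token so that only structural changes remain.
# Reads the named files, or standard input when no file is given.
normalize_ast_dump() {
    sed -E 's/0x[0-9a-fA-F]+/0xADDR/g' "$@"
}

# Assumed usage with the files produced above:
#   normalize_ast_dump single_file_ast.txt > single_file_ast.norm.txt
#   normalize_ast_dump merged_ast.txt > merged_ast.norm.txt
#   diff -u single_file_ast.norm.txt merged_ast.norm.txt
```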

There is a more advanced ASTImporter implementation by haoNoQ, which may support importing additional AST nodes: https://github.com/haoNoQ/clang/blob/summary-ipa-draft/lib/AST/ASTImporter.cpp


Approach

Today, Clang SA can perform (context-sensitive) inter-procedural analysis by "inlining" the called function into the caller's context. This means that function parameters (including all constraints) are passed to the called function, and the return value of the function is passed back to the caller. This works well for function calls within a translation unit, but when the symbolic execution reaches a function that is implemented in another TU, the analyzer engine handles it as "unknown".
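
To make the limitation concrete, the following sketch writes a two-file example to disk. The file names, the function divisor, and the helper write_ctu_example are all made up for illustration. When main.cpp is analyzed on its own, the call to divisor() is to a function defined elsewhere, so its return value is unknown; a division-by-zero report would only be expected once the definition from lib.cpp can be inlined.

```shell
#!/bin/sh
# Hypothetical helper: create a minimal two-TU example in directory $1.
write_ctu_example() {
    dir=$1
    mkdir -p "$dir"

    cat > "$dir/lib.cpp" <<'EOF'
// lib.cpp: the definition lives in its own translation unit.
int divisor() { return 0; }
EOF

    cat > "$dir/main.cpp" <<'EOF'
// main.cpp: only the declaration of divisor() is visible here.
int divisor();

int main() {
  return 10 / divisor();  // divides by zero, but only lib.cpp shows why
}
EOF
}

# Assumed usage:
#   write_ctu_example ./ctu-example
```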

In this project we are working on a method which enables CTU analysis by inlining external function definitions using Clang's existing ASTImporter functionality.

The EuroLLVM '17 Extended abstract contains a more in-depth description in white paper style.

Two-pass analysis

To perform the analysis we need to run Clang on the whole source code two times.

1st pass

We generate a binary AST dump (using Clang's -cc1 -emit-pch feature) of each TU into a temporary directory called preanalyze-dir. We collect the Unified Symbol Resolution (USR) of all externally linkable functions into a text file (externalFnMap.txt).

2nd pass

We run the Clang Static Analyzer on all translation units. If an externally defined function is reached during inlining, we look up the definition of that function in the corresponding AST file (based on the information in externalFnMap.txt) and import the function definition into the caller's context using the ASTImporter library.
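
The lookup step can be illustrated with a small shell sketch. It assumes, without confirmation from this page, that each line of externalFnMap.txt holds a USR followed by whitespace and the path of the AST file that defines it. The sample USR and the helper lookup_ast_file are hypothetical. The analyzer performs this lookup internally, so the sketch is only useful for inspecting the map by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the AST file recorded for a given USR.
#   $1: USR to look up
#   $2: path to externalFnMap.txt (assumed format: "<USR> <ast-file>")
# Exits with a non-zero status when the USR is not in the map.
lookup_ast_file() {
    awk -v key="$1" '
        $1 == key { print $2; found = 1; exit }
        END       { exit !found }
    ' "$2"
}

# Assumed usage:
#   lookup_ast_file 'c:@F@divisor#' preanalyze-dir/externalFnMap.txt
```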

Results

We have run comparative analyses on several open source projects, such as OpenSSL, FFmpeg, Git, Xerces, and tmux. We found several additional bugs compared to the normal (non-CTU) analysis.

See the results on cc.elte.hu/, with memory usage and result comparison.

Credits

This work is based on earlier work of Aleksei Sidorin, Artem Dergachev, et al. See http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html.