[analyzer] On-demand parsing capability for CTU
Summary:
Add an option to enable on-demand parsing of needed ASTs during CTU analysis.
Two options are introduced. CTUOnDemandParsing enables the feature, and
CTUOnDemandParsingDatabase specifies the path to a compilation database, which
has all the necessary information to generate the ASTs.

Reviewers: martong, balazske, Szelethus, xazax.hun

Subscribers: ormris, mgorny, whisperity, xazax.hun, baloghadamsoftware, szepet, rnkovacs, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, Charusso, steakhal, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75665
Endre Fülöp committed Apr 27, 2020
1 parent 7a07641 commit 811c0c9
Showing 18 changed files with 652 additions and 104 deletions.
225 changes: 192 additions & 33 deletions clang/docs/analyzer/user-docs/CrossTranslationUnit.rst
@@ -3,14 +3,33 @@ Cross Translation Unit (CTU) Analysis
=====================================

Normally, static analysis works in the boundary of one translation unit (TU).
However, with additional steps and configuration we can enable the analysis to inline the definition of a function from another TU.
However, with additional steps and configuration we can enable the analysis to inline the definition of a function from
another TU.

.. contents::
:local:

Manual CTU Analysis
-------------------
Overview
________
CTU analysis can be used in a variety of ways. The importing of external TU definitions can work either with pre-dumped
PCH files or by generating the necessary AST structure on demand, during the analysis of the main TU. Driving the static
analysis can also be implemented in multiple ways. The most direct way is to specify the necessary command-line options
of the Clang frontend manually (and generate the prerequisite dependencies of the specific import method by hand). This
process can be automated by other tools, like `CodeChecker <https://github.com/Ericsson/codechecker>`_ and scan-build-py
(CodeChecker is preferred).

PCH-based analysis
__________________
The analysis needs the PCH dumps of all the translation units used in the project.
These can be generated by the Clang frontend itself, and must be arranged in a specific way in the filesystem.
The index, which maps symbols' USR names to the PCH dumps containing them, must also be generated with the
`clang-extdef-mapping` tool. This tool uses a :doc:`compilation database <../../JSONCompilationDatabase>` to
determine the compilation flags used.
The analysis invocation must be provided with the directory which contains the dumps and the mapping files.


Manual CTU Analysis
###################
Let's consider these source files in our minimal example:

.. code-block:: cpp
@@ -47,7 +66,8 @@ And a compilation database:
]
We'd like to analyze `main.cpp` and discover the division by zero bug.
In order to be able to inline the definition of `foo` from `foo.cpp` first we have to generate the `AST` (or `PCH`) file of `foo.cpp`:
In order to be able to inline the definition of `foo` from `foo.cpp` first we have to generate the `AST` (or `PCH`) file
of `foo.cpp`:

.. code-block:: bash
@@ -58,7 +78,8 @@ In order to be able to inline the definition of `foo` from `foo.cpp` first we ha
compile_commands.json foo.cpp.ast foo.cpp main.cpp
$
The next step is to create a CTU index file which holds the `USR` name and location of external definitions in the source files:
The next step is to create a CTU index file which holds the `USR` name and location of external definitions in the
source files:

.. code-block:: bash
@@ -85,47 +106,34 @@ We have to feed Clang with CTU specific extra arguments:
$ pwd
/path/to/your/project
$ clang++ --analyze -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true -Xclang -analyzer-config -Xclang ctu-dir=. -Xclang -analyzer-output=plist-multi-file main.cpp
$ clang++ --analyze \
-Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
-Xclang -analyzer-config -Xclang ctu-dir=. \
-Xclang -analyzer-config -Xclang ctu-on-demand-parsing=false \
-Xclang -analyzer-output=plist-multi-file \
main.cpp
main.cpp:5:12: warning: Division by zero
return 3 / foo();
~~^~~~~~~
1 warning generated.
$ # The plist file with the result is generated.
$ ls
$ ls -F
compile_commands.json externalDefMap.txt foo.ast foo.cpp foo.cpp.ast main.cpp main.plist
$
This manual procedure is error-prone and not scalable, therefore to analyze real projects it is recommended to use `CodeChecker` or `scan-build-py`.
This manual procedure is error-prone and not scalable, therefore to analyze real projects it is recommended to use
`CodeChecker` or `scan-build-py`.

Automated CTU Analysis with CodeChecker
---------------------------------------
#######################################
The `CodeChecker <https://github.com/Ericsson/codechecker>`_ project fully supports automated CTU analysis with Clang.
Once we have set up the `PATH` environment variable and activated the Python `venv`, a single command is all it takes:

.. code-block:: bash
$ CodeChecker analyze --ctu compile_commands.json -o reports
[INFO 2019-07-16 17:21] - Pre-analysis started.
[INFO 2019-07-16 17:21] - Collecting data for ctu analysis.
[INFO 2019-07-16 17:21] - [1/2] foo.cpp
[INFO 2019-07-16 17:21] - [2/2] main.cpp
[INFO 2019-07-16 17:21] - Pre-analysis finished.
[INFO 2019-07-16 17:21] - Starting static analysis ...
[INFO 2019-07-16 17:21] - [1/2] clangsa analyzed foo.cpp successfully.
[INFO 2019-07-16 17:21] - [2/2] clangsa analyzed main.cpp successfully.
[INFO 2019-07-16 17:21] - ----==== Summary ====----
[INFO 2019-07-16 17:21] - Successfully analyzed
[INFO 2019-07-16 17:21] - clangsa: 2
[INFO 2019-07-16 17:21] - Total analyzed compilation commands: 2
[INFO 2019-07-16 17:21] - ----=================----
[INFO 2019-07-16 17:21] - Analysis finished.
[INFO 2019-07-16 17:21] - To view results in the terminal use the "CodeChecker parse" command.
[INFO 2019-07-16 17:21] - To store results use the "CodeChecker store" command.
[INFO 2019-07-16 17:21] - See --help and the user guide for further options about parsing and storing the reports.
[INFO 2019-07-16 17:21] - ----=================----
[INFO 2019-07-16 17:21] - Analysis length: 0.659618854523 sec.
$ ls
compile_commands.json foo.cpp foo.cpp.ast main.cpp reports
$ ls -F
compile_commands.json foo.cpp foo.cpp.ast main.cpp reports/
$ tree reports
reports
├── compile_cmd.json
@@ -174,9 +182,9 @@ Or we can use `CodeChecker parse -e html` to export the results into HTML format
$ firefox html_out/index.html
Automated CTU Analysis with scan-build-py (don't do it)
-------------------------------------------------------
We actively develop CTU with CodeChecker as a "runner" script, `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues, expect it to work with the very basic projects only.
#############################################################
We actively develop CTU with CodeChecker as the driver for this feature; `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.

Example usage of scan-build-py:

@@ -191,3 +199,154 @@ Example usage of scan-build-py:
Opening in existing browser session.
^C
$
On-demand analysis
__________________
The analysis produces the necessary AST structure of external TUs on demand, while the main TU is being analyzed. This
requires the compilation database, which determines the exact compiler invocation used for each TU.
The index, which maps function USR names to the source files containing them, must also be generated with the
`clang-extdef-mapping` tool. The mapping of external definitions implicitly uses a
:doc:`compilation database <../../JSONCompilationDatabase>` to determine the compilation flags used.
Preferably, the same compilation database should be used both when generating the external definitions and
during analysis. The analysis invocation must be provided with the directory that contains the mapping
files, and with the compilation database that is used to determine compiler flags.
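
The recommendation to use one compilation database for both steps can be checked mechanically. The following is a
minimal Python sketch, not part of Clang or CodeChecker. The helper names `load_index` and
`files_missing_from_database` are hypothetical. It assumes that each index line has the form `<USR> <path>`, as in the
`clang-extdef-mapping` output shown in the manual example, and that paths are compared textually without normalization.

.. code-block:: python

  import json
  from pathlib import Path


  def load_index(index_path):
      """Map each USR to the source path recorded in the index file."""
      index = {}
      for line in Path(index_path).read_text().splitlines():
          if not line.strip():
              continue
          # Assumed layout: "<USR> <path>"; split on the first space only.
          usr, _, path = line.partition(" ")
          index[usr] = path
      return index


  def files_missing_from_database(index_path, database_path):
      """Return indexed source files that have no compilation database entry."""
      entries = json.loads(Path(database_path).read_text())
      # Join "directory" and "file"; an absolute "file" is kept unchanged.
      known = {str(Path(entry["directory"]) / entry["file"]) for entry in entries}
      indexed = set(load_index(index_path).values())
      return sorted(path for path in indexed if path not in known)

An empty result suggests that every definition in the index can be parsed with flags taken from the given database. A
non-empty result lists the source files for which no compiler invocation is recorded.
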


Manual CTU Analysis
###################

Let's consider these source files in our minimal example:

.. code-block:: cpp

  // main.cpp
  int foo();

  int main() {
    return 3 / foo();
  }

.. code-block:: cpp

  // foo.cpp
  int foo() {
    return 0;
  }

And a compilation database:

.. code-block:: json

  [
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c foo.cpp -o foo.o",
      "file": "foo.cpp"
    },
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c main.cpp -o main.o",
      "file": "main.cpp"
    }
  ]

We'd like to analyze `main.cpp` and discover the division by zero bug.
As we are using on-demand mode, we only need to create a CTU index file, which holds the `USR` name and location of
external definitions in the source files:

.. code-block:: bash

  $ clang-extdef-mapping -p . foo.cpp
  c:@F@foo# /path/to/your/project/foo.cpp
  $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt

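
For projects with more than one source file, the mapping step can be scripted. The following is a minimal Python
sketch under stated assumptions, not an official tool: the helper names `extdef_mapping_commands` and `write_index` are
hypothetical, the compilation database is assumed to sit in the build directory that is passed to `-p`, and
`clang-extdef-mapping` is assumed to be on the `PATH`. It issues one invocation per source file, mirroring the command
used in this example.

.. code-block:: python

  import json
  import subprocess
  from pathlib import Path


  def extdef_mapping_commands(database_path):
      """Build one clang-extdef-mapping command line per distinct source file."""
      database = Path(database_path)
      entries = json.loads(database.read_text())
      build_dir = str(database.parent)  # directory handed to "-p"
      commands = []
      seen = set()
      for entry in entries:
          source = entry["file"]
          if source in seen:
              continue  # a file may appear in several database entries
          seen.add(source)
          commands.append(["clang-extdef-mapping", "-p", build_dir, source])
      return commands


  def write_index(database_path, index_path="externalDefMap.txt"):
      """Run every command and collect the output in a single index file."""
      with open(index_path, "w") as out:
          for argv in extdef_mapping_commands(database_path):
              subprocess.run(argv, stdout=out, check=True)

Calling `write_index("compile_commands.json")` from the project directory would then produce an `externalDefMap.txt`
that covers both `foo.cpp` and `main.cpp`.
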
Now everything is available for the CTU analysis.
We have to feed Clang with CTU-specific extra arguments:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ --analyze \
      -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
      -Xclang -analyzer-config -Xclang ctu-dir=. \
      -Xclang -analyzer-config -Xclang ctu-on-demand-parsing=true \
      -Xclang -analyzer-config -Xclang ctu-on-demand-parsing-database=compile_commands.json \
      -Xclang -analyzer-output=plist-multi-file \
      main.cpp
  main.cpp:5:12: warning: Division by zero
    return 3 / foo();
           ~~^~~~~~~
  1 warning generated.
  $ # The plist file with the result is generated.
  $ ls -F
  compile_commands.json externalDefMap.txt foo.cpp main.cpp main.plist
  $

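
The invocation above repeats the `-Xclang -analyzer-config -Xclang` prefix for every option, which is easy to get wrong
by hand. The following is a minimal Python sketch that assembles the same argument list. The function name
`ctu_analyzer_argv` and its parameters are hypothetical; only the analyzer options shown in this document are used.
Passing a compilation database selects on-demand parsing, and omitting it yields the PCH-based variant.

.. code-block:: python

  def ctu_analyzer_argv(source, ctu_dir=".", database=None):
      """Build the clang++ argument list for a CTU analysis of one source file."""
      config = [
          "experimental-enable-naive-ctu-analysis=true",
          f"ctu-dir={ctu_dir}",
          f"ctu-on-demand-parsing={'true' if database else 'false'}",
      ]
      if database:
          # Only meaningful in on-demand mode.
          config.append(f"ctu-on-demand-parsing-database={database}")

      argv = ["clang++", "--analyze"]
      for option in config:
          # Each analyzer option is forwarded to the frontend separately.
          argv += ["-Xclang", "-analyzer-config", "-Xclang", option]
      argv += ["-Xclang", "-analyzer-output=plist-multi-file", source]
      return argv

The returned list can be passed to `subprocess.run` as is, for example
`subprocess.run(ctu_analyzer_argv("main.cpp", database="compile_commands.json"))`.
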
This manual procedure is error-prone and not scalable; therefore, to analyze real projects it is recommended to use
`CodeChecker`, because on-demand analysis is currently not supported with `scan-build-py`.

Automated CTU Analysis with CodeChecker
#######################################
The `CodeChecker <https://github.com/Ericsson/codechecker>`_ project fully supports automated CTU analysis with Clang.
Once we have set up the `PATH` environment variable and activated the Python `venv`, a single command is all it takes:

.. code-block:: bash

  $ CodeChecker analyze --ctu --ctu-on-demand compile_commands.json -o reports
  $ ls -F
  compile_commands.json foo.cpp main.cpp reports/
  $ tree reports
  reports
  ├── compile_cmd.json
  ├── compiler_info.json
  ├── foo.cpp_53f6fbf7ab7ec9931301524b551959e2.plist
  ├── main.cpp_23db3d8df52ff0812e6e5a03071c8337.plist
  ├── metadata.json
  └── unique_compile_commands.json
  0 directories, 6 files
  $

The `plist` files contain the results of the analysis, which may be viewed with the regular analysis tools.
For example, one may use `CodeChecker parse` to view the results on the command line:

.. code-block:: bash

  $ CodeChecker parse reports
  [HIGH] /home/egbomrt/ctu_mini_raw_project/main.cpp:5:12: Division by zero [core.DivideZero]
    return 3 / foo();
             ^
  Found 1 defect(s) in main.cpp
  ----==== Summary ====----
  -----------------------
  Filename | Report count
  -----------------------
  main.cpp | 1
  -----------------------
  -----------------------
  Severity | Report count
  -----------------------
  HIGH | 1
  -----------------------
  ----=================----
  Total number of reports: 1
  ----=================----

Or we can use `CodeChecker parse -e html` to export the results into HTML format:

.. code-block:: bash

  $ CodeChecker parse -e html -o html_out reports
  $ firefox html_out/index.html

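
The `plist` reports can also be inspected without CodeChecker. The following is a minimal Python sketch that uses the
standard `plistlib` module. The function name `summarize_plist` is hypothetical, and the key names `files`,
`diagnostics`, `location`, `line`, `col` and `description` are an assumption about the layout of the analyzer's plist
output that should be verified against a real report.

.. code-block:: python

  import plistlib


  def summarize_plist(plist_path):
      """Return (file, line, column, description) tuples from one report."""
      with open(plist_path, "rb") as handle:
          report = plistlib.load(handle)
      files = report.get("files", [])
      findings = []
      for diagnostic in report.get("diagnostics", []):
          location = diagnostic["location"]
          # Assumed: "file" is an index into the top-level "files" list.
          findings.append((
              files[location["file"]],
              location["line"],
              location["col"],
              diagnostic["description"],
          ))
      return findings

For the example project, such a summary would be expected to contain a single entry for the division by zero in
`main.cpp`.
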
Automated CTU Analysis with scan-build-py (don't do it)
#######################################################
We actively develop CTU with CodeChecker as the driver for this feature; `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.

Currently, on-demand analysis is not supported with `scan-build-py`.
