ECOOP 2016 Artifact
Our tool adds a new sanitizer to
clang, a const sanitizer. This sanitizer
verifies that instances of
const are treated as transitively immutable. Our
tool will display a warning for any write through a
const type qualifier,
even if a field is explicitly
const. The goal of our tool is to investigate
how developers use
const in programs.
- Virtual Machine Package
- Virtual Machine Package (VDI)
- Included Modified LLVM Sources
- Included Modified Clang Sources
- Included Modified compiler-rt Sources
Note that these instructions assume the non-VDI image, running on QEMU. For Windows users, download the VDI image and use VirtualBox. After the Virtual Machine is running, the instructions are identical.
There is an example virtual machine already set up. The username and password for this VM are both
ecoop-2016. To run the VM with QEMU, do the following:
qemu-system-x86_64 -enable-kvm -m 2048 -drive file=ecoop-2016.qcow2,format=qcow2
The login information you'll always want to use is:
Username: ecoop-2016
Password: ecoop-2016
This VM should have all the requirements needed to run all of the experiments. If you want to SSH into the VM from your host, use the following:
qemu-system-x86_64 -enable-kvm -m 2048 -drive file=ecoop-2016.qcow2,format=qcow2 -net user,hostfwd=tcp::10022-:22 -net nic
Then from your host machine do:
ssh ecoop-2016@localhost -p10022
Note that the VM needs to be connected to the internet in order for some packages to build.
If you are using the VM image that we've distributed, the
clang++ executable on
that VM points to a prebuilt version of our tool. However, we've included the
sources and you can build your own
clang++ from scratch as follows.
Ensure you have the
base-devel group installed and the
enabled. Afterwards you can build the package in the standard Arch Linux
cd ~/abs
makepkg -s
After building the tool, you can run our set of 16 small test cases to ensure the tool works correctly. Navigate to the test directory to see these tests:
Note: if you haven't built the tool, this directory will not exist. In that case, do the following first:
cd ~/abs
makepkg -o
The expected test results are embedded within the source files themselves. Any
CHECK are expected to occur on
stderr when the source file is
compiled and run with our tool enabled. Any lines beginning with
should not occur when our tool is used. To run all the tests do the following:
cd ~/abs/src/llvm-csan-0.0.1/build
make check-csan
Note 1: you must have built the tool in order to run
Note 2: for the timing results in the paper, we ran a debug version of the tool. To build a debug version, follow these steps:
cd ~/abs
makepkg -s -p PKGBUILD-debug
Then replace the current version of our packages with these debugging ones with:
sudo pacman -U *.pkg.tar.xz
Manually Running Tests
Instead of automatically running the tests with
make check-csan (which gives little feedback because of the LLVM testing framework),
you can also run the tests manually. You still need to run
cd ~/abs; makepkg -o first, as described above.
To manually run them yourself do the following:
cd ~
clang++ -fsanitize=const -g ~/abs/src/llvm-csan-0.0.1/projects/compiler-rt/test/csan/const-object.cc -o const-object
./const-object
You may try the other tests in
~/abs/src/llvm-csan-0.0.1/projects/compiler-rt/test/csan by running them in
a similar manner.
To use the tool, use
clang++ as you normally would, but add the flags
-fsanitize=const -g. You should get more precise results if
you disable optimizations and keep the frame pointer with
-O0 -fno-omit-frame-pointer. To run the example given in Listing 1 of the
paper, do the following:
cd ~/examples
clang++ -std=c++11 -fsanitize=const -g listing-1.cpp
You can run the resulting executable as
./a.out and you should see a
warning. To write to an external log file, use the
log_path option. For
example, to log the results to a file called
listing-1.log do the following:
After running the program again, there should be no extra output on
and there should be a
listing-1.log.XXXXX file in the current directory where
XXXXX are random numbers. Feel free to try it out!
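Because the numeric suffix differs on every run, a small glob-based lookup can help locate the logs. The following is a minimal sketch, assuming the log files land in the current directory as described above; list_logs is a hypothetical helper name and is not part of the artifact.

```shell
#!/bin/sh
# list_logs PREFIX: print every existing file named PREFIX.<suffix>.
# Hypothetical helper for illustration only; not shipped with the tool.
list_logs() {
    prefix=$1
    for f in "$prefix".*; do
        # When nothing matches, the glob is left unexpanded, so skip
        # any name that does not refer to an existing file.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done
}

# Show the logs produced for the Listing 1 example, if any exist.
list_logs listing-1.log
```

If no log has been written yet, the function prints nothing.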
To browse the implementation, the sources must first be extracted. To extract them, do the following:
cd ~/abs
makepkg -o
The first part of the implementation is getting Clang to annotate definition
expressions of declaration statements so that ConstSanitizer can ignore them.
The code implementing this is in:
~/abs/src/llvm-csan-0.0.1/tools/clang/lib/CodeGen/CGDebugInfo.h. The part
of the code generation we instrument is in
~/abs/src/llvm-csan-0.0.1/tools/clang/lib/CodeGen/CGDecl.cpp within the
The heart of our implementation is located at:
This file corresponds to the instrumentation of LLVM bitcode that implements
our runtime const tracking. The computation of the shadow values is in the
The runtime library is located at:
~/abs/src/llvm-csan-0.0.1/projects/compiler-rt/lib/csan/csan.cc. This file
contains the implementation that reports the stack traces at runtime.
The modification to get Clang to recognize our new sanitizer option is located
All experiments are located in the
experiments directory. To instrument a
project, for example Ninja, do the following:
cd ~/experiments
python build.py ninja
The build script stores any build-time violations (for instance, those that occur
while running a project's tests as part of the build) in the
directory, in a file named
PACKAGE-build.log. Ninja is an example of a project
that runs tests as part of its build.
To create groupings for manual inspection, run
python group.py ninja. The
group.py script collects all results from log files with
the specified project name.
The next subsections give examples of how we obtained the results in the paper.
Similar to the Ninja example above, tests are run as part of the build process. So you may do the following:
cd ~/experiments
python build.py protobuf
python group.py protobuf-build
These results should be comparable to
organization. Note that running the tests produces many
protobuf-build.log.XXXXX files. While the
group.py script does
combine all build log files, the resulting file still contains a
separate section for each build log file. We manually combined these
sections and report combined results from all build logs.
Note that before manual post-processing we found 216 unique warnings with 169736 occurrences. There was one archetype, relating to message targets, that we could not determine and did not include in the paper. Its 133 unique warnings, with 14638 occurrences, were manually identified. We also had what we believe is a false positive due to incorrect debugging information. This archetype had 7 unique locations with 27454 occurrences. After manually removing these results, the remaining counts should exactly match the results in the paper.
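The subtraction described above can be checked with a few lines of shell arithmetic. This is only a sketch using the counts quoted in this section, and it assumes the two removed archetypes do not overlap; the variable names are ours.

```shell
#!/bin/sh
# Totals before manual post-processing.
total_unique=216
total_occurrences=169736

# Message-target archetype that was left out of the paper.
excluded_unique=133
excluded_occurrences=14638

# Suspected false positive caused by incorrect debugging information.
false_positive_unique=7
false_positive_occurrences=27454

# Assumes the two removed groups are disjoint.
remaining_unique=$((total_unique - excluded_unique - false_positive_unique))
remaining_occurrences=$((total_occurrences - excluded_occurrences - false_positive_occurrences))

echo "unique warnings after removal: $remaining_unique"
echo "occurrences after removal: $remaining_occurrences"
```

Under that assumption, 76 unique warnings and 127644 occurrences remain.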
Similar to above, tests are run as part of the build process.
We obtained the Fish results by running the shell, following these steps:
cd ~/experiments
python build.py fish
CSAN_OPTIONS=log_path=fish.log fish/pkg/fish/usr/bin/fish
Then press control-D to exit. Afterwards you can do the same as with Ninja:
python group.py fish
These results should correspond to the paper.
cd ~/experiments
python build.py mosh
CSAN_OPTIONS=log_path=mosh.log mosh/pkg/mosh/usr/bin/mosh --client=/home/ecoop-2016/experiments/mosh/pkg/mosh/usr/bin/mosh-client --server=/home/ecoop-2016/experiments/mosh/pkg/mosh/usr/bin/mosh-server localhost
Answer yes to the certificate (if prompted) and log in using the same information
used for the virtual machine (username
Again, similar to the last case, use the group script:
python group.py mosh
These results should correspond to the paper (there may be more unique locations than in the paper).
Similar to Ninja, LLVM compiles
llvm-tblgen and executes it as part of its
build process. Therefore after running the build script the results should be
cd ~/experiments
python group.py llvm-build
Similar to Fish, we run the executable. You may need to also include
LD_LIBRARY_PATH like so:
cd ~/experiments
python build.py tesseract
CSAN_OPTIONS=log_path=tesseract.log LD_LIBRARY_PATH=tesseract/pkg/tesseract/usr/lib tesseract/pkg/tesseract/usr/bin/tesseract stdin stdout
You should get an error along the lines of "error opening data file", and tesseract exits immediately. However, there will still be some results. As before, run:
python group.py tesseract
In this case, as part of the build process, the tests are run. Therefore the
ninja-build.log.XXXXX shows what violations occur as part of the test suite.
If you open this file, the first non-standard-library portion of
the stack trace should be in
src/disk_interface_test.cc:226:3 matching the
results of the paper. There should be 4 unique source locations, starting in
the standard library, for all violations. To find these unique source
locations, like for all other experiments, use the group script:
cd ~/experiments
python group.py ninja-build
This will group the raw results into unique locations and also give the dynamic violation count.
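To illustrate what grouping by unique location with a dynamic count means, here is a minimal sketch using standard Unix tools. It is not how group.py is implemented, and the one-location-per-line input format is an assumption made for the example; count_locations is a hypothetical name, and src/example_a.cc:10:5 is a made-up location.

```shell
#!/bin/sh
# count_locations: read one source location per line on stdin and print
# "<count> <location>" for each unique location, most frequent first.
# Hypothetical helper; the real group.py may work differently.
count_locations() {
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Hypothetical input: one line per dynamic violation.
printf '%s\n' \
    'src/disk_interface_test.cc:226:3' \
    'src/example_a.cc:10:5' \
    'src/disk_interface_test.cc:226:3' |
    count_locations
```

Each output line pairs a unique source location with the number of times a violation was recorded there.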
Wayland / Weston
First build Wayland and install the package:
cd ~/experiments
python build.py wayland
sudo pacman -U wayland/wayland-1.9.0-1-x86_64.pkg.tar.xz
Then you can build Weston:
python build.py weston
Again, run the produced executable:
To collect the timing results, for example for Protobuf, do the following:
cd ~/experiments
python time.py protobuf
Note that you'll have to clear all the build files between each run. Do that
with the following command (you need to
cd into the project directory first).
cd ~/experiments/protobuf
rm -rf src pkg *.pkg.tar.xz
The resulting files will be in
/tmp/time-protobuf-check. The last 3 lines of the first file indicate how
long it took to build with the tool enabled. The last 3 lines of the second
file indicate how long it took to run the tests with the tool enabled. After
recording these numbers you can do the same procedure with the tool disabled.
To collect the timing results with the tool disabled (after cleaning), do:
cd ~/experiments
python time-disable-csan.py protobuf
Our results are in the
results directory, organized by project name. These
files represent our findings organized by manually categorizing the violations
and putting them all under the same heading. The remaining results show the
number of violations at each source location. These violations are annotated
with source locations.