This is the repository for the paper:
Micha Horlboge, Erwin Quiring, Roland Meyer, and Konrad Rieck. I still know it's you! On Challenges in Anonymizing Source Code. Proc. on Privacy Enhancing Technologies 2024(3), 2024.
This document contains instructions on how to generate the datasets and results presented in the submission.
Our implementation builds on the code transformation framework Code Imitator. We have modified and extended the framework, so that the different protection techniques can be applied and evaluated in a unified manner. This repository contains the resulting modified framework.
To set up the framework, we refer the reader to the build instructions provided in the GitHub repository. Additionally, the new files hooks.cpp and libnocstd.c in src/LibToolingAST have to be compiled. The corresponding compiler calls are at the top of these files, and they are also incorporated into the CMake builds.
Finally, the obfuscators Stunnix and Tigress need to be downloaded and installed. We developed fixes to compensate for limitations of the evaluation version of Stunnix. These are discussed below.
We provide a Dockerfile to set up the framework in a container. Building it requires BuildKit. As BuildKit is not included in every Docker installation, make sure the buildx plugin is available in your CLI.
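Before building the image, it can help to verify that the buildx plugin is actually installed. The following is a minimal sketch, assuming the Docker CLI is called docker and is on your PATH; the helper name buildx_available is hypothetical and not part of this repository.

```python
import shutil
import subprocess


def buildx_available(docker="docker"):
    """Return True if `<docker> buildx version` runs successfully."""
    if shutil.which(docker) is None:
        return False  # the Docker CLI itself is not on PATH
    result = subprocess.run(
        [docker, "buildx", "version"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # A non-zero exit code means the plugin is missing or broken.
    return result.returncode == 0


if __name__ == "__main__":
    print("buildx available:", buildx_available())
```

If the check fails, install the buildx plugin for your Docker installation before building the image.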
Next, we briefly describe how to generate the datasets used in our analysis, which contain the candidates for anonymization. Afterwards, all files can be checked for output equivalence by running test_obfuscated_c.py.
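To illustrate what output equivalence means here, the sketch below runs two programs on the same input and compares their exit codes and standard output. It is not the implementation of test_obfuscated_c.py; the function name same_output and the binary names in the usage note are assumptions made for illustration only.

```python
import subprocess
import sys


def same_output(cmd_a, cmd_b, stdin_text="", timeout=10):
    """Run both commands on the same input; compare exit code and stdout."""
    runs = [
        subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        for cmd in (cmd_a, cmd_b)
    ]
    first, second = runs
    return (first.returncode, first.stdout) == (second.returncode, second.stdout)


if __name__ == "__main__":
    # Self-contained demo: two Python one-liners stand in for compiled programs.
    a = [sys.executable, "-c", "print(1 + 1)"]
    b = [sys.executable, "-c", "print(2)"]
    print(same_output(a, b))
```

For compiled C programs, one would pass the two executables instead, for example same_output(["./original"], ["./obfuscated"], stdin_text=...), where both paths are hypothetical.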
To obfuscate the dataset with Tigress, the files must first be prepared. To do so, call prepare_tigress.sh (or the _advanced option for our improvements) on every file. Afterwards, run obfuscate_tigress.sh on every prepared file. There are three options:
- --random activates a random seed. Without this option, the seed is fixed.
- --rename obfuscates the files in place. Otherwise, a new file ending in .obf.c is created.
- --advanced activates our improvements.
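The two Tigress steps can be applied to a whole dataset with a small driver script. The following is a sketch under several assumptions that are not confirmed by this README: both scripts take the file path as their last argument, options precede the path, the scripts are in the current directory, and the prepared file keeps the name of the input file. The function names are hypothetical, and the _advanced variant of the preparation step is not covered.

```python
from pathlib import Path
import subprocess


def tigress_commands(c_file, random_seed=False, rename=False, advanced=False):
    """Build both script invocations for one C file (argument order is assumed)."""
    prepare = ["./prepare_tigress.sh", str(c_file)]
    obfuscate = ["./obfuscate_tigress.sh"]
    if random_seed:
        obfuscate.append("--random")    # random seed instead of the fixed one
    if rename:
        obfuscate.append("--rename")    # obfuscate in place
    if advanced:
        obfuscate.append("--advanced")  # activate the improvements
    obfuscate.append(str(c_file))
    return [prepare, obfuscate]


def obfuscate_dataset(dataset_dir, dry_run=True, **options):
    """Prepare and obfuscate every C file below dataset_dir."""
    for c_file in sorted(Path(dataset_dir).rglob("*.c")):
        if c_file.name.endswith(".obf.c"):
            continue  # skip outputs of a previous run
        for cmd in tigress_commands(c_file, **options):
            if dry_run:
                print(" ".join(cmd))  # only show what would be executed
            else:
                subprocess.run(cmd, check=True)
```

Calling obfuscate_dataset("data", dry_run=True, random_seed=True) prints the commands without running them, which is a cheap way to check the assumed argument order against the actual scripts before processing a full dataset.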
To obfuscate files with Stunnix, first follow the instructions provided by Stunnix. We used the evaluation edition for our experiments, which can be downloaded from Stunnix. Afterwards, run stunnix_postprocessing.sh with the output directory as argument.
For this purpose, run execute_transformers.py with
python anonymize/execute_transformers.py generate --numbering
For this purpose, please refer to the READMEs of Imitator.
See the original documentation of Code Imitator. We added feature_extraction_single_c.sh to extract features from C files.
A single file can be classified using classify.py. To classify whole datasets (on multiple models), this script runs all classifications for the datasets and models specified in a copy of this file (named obfus_config.yaml).
- Sometimes DNS resolution does not work while building the image. Try adding --network host to your build command; this should resolve the problem in most cases.
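As an illustration of where the flag goes, the sketch below assembles a docker buildx build command with an optional --network host. The image tag code-anonymization and the helper name docker_build_command are hypothetical; use the tag and build context that fit your setup.

```python
import shlex


def docker_build_command(tag, context=".", host_network=False):
    """Assemble a `docker buildx build` command line as a list of arguments."""
    cmd = ["docker", "buildx", "build", "-t", tag]
    if host_network:
        # Use the host's network (and DNS) for RUN steps during the build.
        cmd += ["--network", "host"]
    cmd.append(context)
    return cmd


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(docker_build_command("code-anonymization", host_network=True)))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) to start the build.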
If you are using our code, please cite our PETS paper. You may use the following BibTeX entry:
@article{horquimeyrie2024,
author = {Micha Horlboge and Erwin Quiring and Roland Meyer and Konrad Rieck},
journal = {Proc. of the Privacy Enhancing Technologies Symposium ({PETS})},
title = {I still know it's you! On Challenges in Anonymizing Source Code},
year = 2024,
number = 3,
volume = 2024
}