On Challenges in Anonymizing Source Code

This is the repository for the paper:

Micha Horlboge, Erwin Quiring, Roland Meyer, and Konrad Rieck. I still know it's you! On Challenges in Anonymizing Source Code. Proc. on Privacy Enhancing Technologies 2024(3), 2024.

This document contains instructions on how to generate the datasets and results presented in the paper.

Requirements

Our implementation builds on the code transformation framework Code Imitator. We have modified and extended the framework, so that the different protection techniques can be applied and evaluated in a unified manner. This repository contains the resulting modified framework.

To set up the framework, we refer the reader to the build instructions provided in the GitHub repository. Additionally, the new files hooks.cpp and libnocstd.c in src/LibToolingAST have to be compiled. The corresponding compiler calls are given at the top of these files and are also incorporated into the CMake build.

Finally, the obfuscators Stunnix and Tigress need to be downloaded and installed. We developed fixes to compensate for limitations of the evaluation version of Stunnix. These are discussed below.

We provide a Dockerfile to set up the framework in a container. Building it requires BuildKit. As BuildKit is not always included in a Docker installation, make sure the buildx plugin is available in your CLI.
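
Whether the buildx plugin is available can be checked before starting a build. The following is a minimal sketch in Python, assuming the docker CLI is on your PATH. The helper name buildx_available is illustrative and not part of this repository.

```python
import subprocess


def buildx_available(docker_binary="docker"):
    """Return True if `<docker_binary> buildx version` exits successfully."""
    try:
        result = subprocess.run(
            [docker_binary, "buildx", "version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The docker CLI itself is missing or cannot be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if buildx_available():
        print("buildx plugin found")
    else:
        print("buildx plugin missing; install it before building the image")
```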

Creating Datasets (Anonymization methods)

Next, we briefly describe how to generate the datasets used in our analysis with the candidate anonymization methods. Afterwards, all files can be checked for output equivalence by running test_obfuscated_c.py.
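
To illustrate what output equivalence means here, the sketch below runs two programs on the same input and compares their exit code and standard output. It is not the logic of test_obfuscated_c.py, whose interface is not described in this document. The function name outputs_match and the demo commands are assumptions made for illustration.

```python
import subprocess
import sys


def outputs_match(cmd_a, cmd_b, stdin_text="", timeout=10):
    """Run two commands on the same input; compare exit code and stdout."""
    results = []
    for cmd in (cmd_a, cmd_b):
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
        results.append((proc.returncode, proc.stdout))
    return results[0] == results[1]


if __name__ == "__main__":
    # Stand-ins for an original and a transformed program: two Python
    # one-liners that compute the same value in different ways.
    original = [sys.executable, "-c", "print(1 + 1)"]
    transformed = [sys.executable, "-c", "print(2)"]
    print(outputs_match(original, transformed))
```

For compiled C files, the two commands would be the paths of the binaries built from the original and the transformed source, with stdin_text set to a test input.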

Tigress (Obfuscation 1)

First, the dataset must be prepared for Tigress. To do so, call prepare_tigress.sh (or the _advanced option for our improvements) on every file. Afterwards, run obfuscate_tigress.sh on every prepared file. There are three options:

--random activates a random seed. Without this option, the seed is fixed.
--rename obfuscates the files in place. Otherwise, a new file ..obf.c is created.
--advanced activates our improvements.
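
Since both scripts are called once per file, the calls can be generated in a loop. The sketch below only builds the command lines. It assumes that each script takes the file path as its last argument, that options are placed before it, and that the scripts are invoked through bash from the current directory. None of this is confirmed by this document, so adapt it to the actual script interface.

```python
def prepare_commands(source_files, script="prepare_tigress.sh"):
    """Build one prepare call per source file (assumed: path is the only argument)."""
    return [["bash", script, str(path)] for path in source_files]


def obfuscate_commands(prepared_files, script="obfuscate_tigress.sh",
                       random_seed=False, rename=False, advanced=False):
    """Build one obfuscation call per prepared file with the selected options."""
    flags = []
    if random_seed:
        flags.append("--random")    # random seed instead of the fixed one
    if rename:
        flags.append("--rename")    # obfuscate in place
    if advanced:
        flags.append("--advanced")  # activate the improvements
    # Assumed argument order: options first, file path last.
    return [["bash", script, *flags, str(path)] for path in prepared_files]


if __name__ == "__main__":
    files = ["a.c", "b.c"]  # placeholder file names
    for cmd in prepare_commands(files) + obfuscate_commands(files, random_seed=True):
        print(" ".join(cmd))
```

To execute the calls instead of printing them, pass each list to subprocess.run(cmd, check=True). A file list can be collected with, for example, sorted(pathlib.Path(dataset_dir).glob("*.c")), where dataset_dir is your dataset location.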

Stunnix (Obfuscation 2)

First, follow the instructions by Stunnix to obfuscate the files. We used the evaluation edition for our experiments, which can be downloaded from Stunnix. Afterwards, run stunnix_postprocessing.sh with the output directory as its argument.

Normalization

To normalize the dataset, run execute_transformers.py with:

python anonymize/execute_transformers.py generate --numbering

Coding style imitation

For coding style imitation, please refer to the READMEs of Code Imitator.

Train attribution models

See the original documentation of Code Imitator. We added feature_extraction_single_c.sh to extract features from C files.

Classifications (Attribution methods)

A single file can be classified using classify.py. To classify whole datasets (on multiple models), this script runs all classifications for the datasets and models specified in a copy of this file (named obfus_config.yaml).
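
Classifying several datasets on several models amounts to running every dataset against every model. The sketch below shows only this pairing. The layout of obfus_config.yaml is not described in this document, so the two lists stand in for whatever the configuration specifies, and all dataset and model names are placeholders.

```python
from itertools import product


def classification_jobs(datasets, models):
    """Pair every dataset with every model, in a stable order."""
    return [{"dataset": d, "model": m} for d, m in product(datasets, models)]


if __name__ == "__main__":
    # Placeholder names; replace them with the entries of your configuration.
    datasets = ["dataset_a", "dataset_b"]
    models = ["model_x", "model_y"]
    for job in classification_jobs(datasets, models):
        print(job["dataset"], "->", job["model"])
```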

Common issues

Sometimes, DNS resolution does not work while building the image. Try adding --network host to your build command; this resolves the problem in most cases.
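
As a sketch of where the option goes, the helper below assembles a docker build call with and without --network host. The image tag anonymizer, the build context, and the helper name are assumptions for illustration. Use the tag and Dockerfile location that match your checkout.

```python
import shlex


def docker_build_command(tag, context=".", dockerfile=None, host_network=False):
    """Assemble a `docker build` call, optionally using the host network."""
    cmd = ["docker", "build"]
    if host_network:
        # Lets the RUN steps of the build use the host's network and DNS setup.
        cmd += ["--network", "host"]
    if dockerfile is not None:
        cmd += ["-f", dockerfile]
    cmd += ["-t", tag, context]
    return cmd


if __name__ == "__main__":
    # "anonymizer" is a placeholder tag.
    print(shlex.join(docker_build_command("anonymizer")))
    print(shlex.join(docker_build_command("anonymizer", host_network=True)))
```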

If you are using our code, please cite our PETS paper. You may use the following BibTeX entry:

@article{horquimeyrie2024,
  author  = {Micha Horlboge and Erwin Quiring and Roland Meyer and Konrad Rieck},
  journal = {Proc. of the Privacy Enhancing Technologies Symposium ({PETS})},
  title   = {I still know it's you! On Challenges in Anonymizing Source Code},
  year    = 2024,
  number  = 3,
  volume  = 2024
}
