horlabs/anonymizer


On Challenges in Anonymizing Source Code

This is the repository for the paper:


Micha Horlboge, Erwin Quiring, Roland Meyer, and Konrad Rieck. I still know it's you! On Challenges in Anonymizing Source Code. Proc. on Privacy Enhancing Technologies 2024(3), 2024.


This document contains instructions on how to generate the datasets and results presented in the submission.

Requirements

Our implementation builds on the code transformation framework Code Imitator. We modified and extended the framework so that the different protection techniques can be applied and evaluated in a unified manner. This repository contains the resulting modified framework.

To set up the framework, we refer the reader to the build instructions provided in the GitHub repository. Additionally, the new files hooks.cpp and libnocstd.c in src/LibToolingAST have to be compiled. The corresponding compiler calls are given at the top of these files, and they are also incorporated into the CMake builds.

Finally, the obfuscators Stunnix and Tigress need to be downloaded and installed. We developed fixes to compensate for limitations of the evaluation version of Stunnix. These are discussed below.

We provide a Dockerfile to set up the framework in a container. Building it requires BuildKit. As BuildKit is not always included in the installation, make sure the buildx plugin is available in your CLI.

Creating Datasets (Anonymization methods)

Next, we briefly describe how to generate the datasets used in our analysis with the candidate anonymization methods. Afterwards, all files can be checked for output equivalence by running test_obfuscated_c.py.
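The interface of test_obfuscated_c.py is not documented here. To illustrate what an output-equivalence check involves, the following minimal sketch (not the repository's script) runs an original and a transformed program on the same inputs and compares their exit codes and standard output. The function names are hypothetical; in practice the two commands would be the compiled original and anonymized C programs, and the inputs would be whatever test inputs your dataset provides.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_program(cmd: Sequence[str], stdin_data: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Run one program on the given stdin; return its exit code and stdout."""
    result = subprocess.run(
        list(cmd), input=stdin_data, capture_output=True, text=True, timeout=timeout
    )
    return result.returncode, result.stdout


def outputs_equivalent(
    original: Sequence[str], transformed: Sequence[str], test_inputs: Sequence[str]
) -> bool:
    """True if both programs agree on exit code and stdout for every test input."""
    return all(
        run_program(original, data) == run_program(transformed, data)
        for data in test_inputs
    )


if __name__ == "__main__":
    # Stand-in "programs": two Python one-liners that both echo their input.
    echo_a = [sys.executable, "-c", "print(input())"]
    echo_b = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"]
    print(outputs_equivalent(echo_a, echo_b, ["hello\n", "42\n"]))
```

Comparing exit codes as well as stdout catches transformations that crash after printing the expected output.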

Tigress (Obfuscation 1)

First, the dataset must be prepared for Tigress. To do so, call prepare_tigress.sh (or its _advanced variant for our improvements) on every file. Afterwards, run obfuscate_tigress.sh on every prepared file. It supports three options:

  1. --random uses a random seed. Without this option, the seed is fixed.
  2. --rename obfuscates the files in place. Otherwise, a new file ..obf.c is created.
  3. --advanced activates our improvements.
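The two Tigress steps above can be driven over a whole dataset with a small script. This is a sketch under stated assumptions, not part of the repository: it assumes both shell scripts sit in the current directory, take the C file as their last argument, and that preparation rewrites each file in place so the same path is then obfuscated. The exact file name of the _advanced preparation variant is not given here, so it is passed in through the hypothetical `prepare_script` parameter. By default the driver only returns the command lines (dry run).

```python
import subprocess
from pathlib import Path
from typing import List, Sequence, Union


def tigress_commands(
    sources: Sequence[Union[str, Path]],
    prepare_script: str = "./prepare_tigress.sh",
    random_seed: bool = False,
    in_place: bool = False,
    advanced: bool = False,
) -> List[List[str]]:
    """Build the command lines for both Tigress steps over a list of C files."""
    flags = []
    if random_seed:
        flags.append("--random")    # random seed; otherwise the seed is fixed
    if in_place:
        flags.append("--rename")    # obfuscate in place instead of writing a new file
    if advanced:
        flags.append("--advanced")  # enable the improved obfuscation
    # ASSUMPTION: both scripts take the C file as their last argument, and
    # preparation rewrites the file in place, so the same path is obfuscated.
    # Step 1: prepare every file.
    commands = [[prepare_script, str(src)] for src in sources]
    # Step 2: obfuscate every prepared file.
    commands += [["./obfuscate_tigress.sh", *flags, str(src)] for src in sources]
    return commands


def obfuscate_dataset(
    dataset_dir: Union[str, Path], dry_run: bool = True, **options
) -> List[List[str]]:
    """Collect all .c files under dataset_dir and run (or just list) both steps."""
    # Snapshot the file list before anything runs, so files written by the
    # obfuscator during this run are not picked up as new inputs.
    sources = sorted(Path(dataset_dir).rglob("*.c"))
    commands = tigress_commands(sources, **options)
    for cmd in commands:
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing script
    return commands
```

For example, `obfuscate_dataset("dataset/", random_seed=True)` (hypothetical directory name) lists the commands; pass `dry_run=False` to execute them. Run it on a clean copy of the dataset, since outputs from an earlier run without --rename also end in .c and would be collected again.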

Stunnix (Obfuscation 2)

First, follow the instructions by Stunnix to obfuscate the files. We used the evaluation edition for our experiments, which is available for download from Stunnix. Afterwards, run stunnix_postprocessing.sh with the output directory as its argument.

Normalization

To normalize the source files, run execute_transformers.py:

python anonymize/execute_transformers.py generate --numbering

Coding style imitation

For coding style imitation, please refer to the READMEs of Code Imitator.

Train attribution models

See the original documentation of Code Imitator. We added feature_extraction_single_c.sh to extract features from C files.

Classifications (Attribution methods)

A single file can be classified using classify.py. To classify entire datasets (with multiple models), this script performs all classifications for the datasets and models specified in a copy of this file (named obfus_config.yaml).
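Classifying whole datasets on multiple models amounts to running one classification per (dataset, model) pair. The sketch below shows only that pattern; it is not the repository's batch script. The `datasets`/`models` keys and the `classify(dataset, model)` callable are hypothetical stand-ins, since the schema of obfus_config.yaml and the interface of classify.py are not described here.

```python
from itertools import product
from typing import Any, Callable, Dict, Mapping, Sequence, Tuple


def classify_all(
    config: Mapping[str, Sequence[str]],
    classify: Callable[[str, str], Any],
) -> Dict[Tuple[str, str], Any]:
    """Run one classification for every (dataset, model) pair in the config.

    HYPOTHETICAL: the "datasets"/"models" keys and the classify(dataset, model)
    callable stand in for the real obfus_config.yaml schema and classify.py.
    """
    results = {}
    # Cartesian product: every dataset is classified with every model.
    for dataset, model in product(config["datasets"], config["models"]):
        results[(dataset, model)] = classify(dataset, model)
    return results


if __name__ == "__main__":
    # Illustrative names only.
    config = {"datasets": ["original", "tigress"], "models": ["model_a", "model_b"]}
    # Dummy classifier that just records which pair it was called with.
    for pair, label in classify_all(config, lambda d, m: f"{d}/{m}").items():
        print(pair, label)
```

With PyYAML installed, such a dictionary could be loaded from a YAML file with `yaml.safe_load`; the key names above remain assumptions.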

Common issues

  • Sometimes, DNS resolution fails while building the image. Adding --network host to your build command should resolve this problem in most cases.

If you use our code, please cite our PETS paper. You may use the following BibTeX entry:

@article{horquimeyrie2024,
  author  = {Micha Horlboge and Erwin Quiring and Roland Meyer and Konrad Rieck},
  journal = {Proc. of the Privacy Enhancing Technologies Symposium ({PETS})},
  title   = {I still know it's you! On Challenges in Anonymizing Source Code},
  year    = 2024,
  number  = 3,
  volume  = 2024
}
