
THIS IS A DEPRECATED REPOSITORY. PLEASE REFER TO THIS LINK.

Adversarial Examples for Models of Code - Code2vec

An adversary for Code2vec, a neural network for learning distributed representations of code. This is an official implementation of the approach described in:

Noam Yefet, Uri Alon and Eran Yahav, "Adversarial Examples for Models of Code", 2019 https://arxiv.org/abs/1910.07517

This is a TensorFlow implementation, designed to be easy to use in research and for experimenting with new ideas for attacks in machine learning on code tasks. Contributions are welcome.

Table of Contents

  • Requirements
  • Quickstart
  • Manually examine adversarial examples
  • Defense
  • Configuration

Requirements

On Ubuntu:

  • Python3. To check if you have it:

python3 --version

  • TensorFlow - version 1.13 or newer (install). To check TensorFlow version:

python3 -c 'import tensorflow as tf; print(tf.__version__)'
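
If you prefer to check both requirements at once, here is a minimal sketch (assuming a standard TensorFlow install):

# Minimal requirements check for this repository (Python 3, TensorFlow >= 1.13).
import sys
import tensorflow as tf

assert sys.version_info >= (3,), "Python 3 is required"
major, minor = (int(x) for x in tf.__version__.split(".")[:2])
assert (major, minor) >= (1, 13), "TensorFlow 1.13+ required, found " + tf.__version__
print("OK: Python %d.%d, TensorFlow %s" % (sys.version_info[0], sys.version_info[1], tf.__version__))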

Quickstart

Step 0: Cloning this repository

git clone https://github.com/noamyft/code2vec.git
cd code2vec

Step 1: Creating a new dataset from Java sources

In order to have a preprocessed dataset to attack the network on, you can either download our preprocessed dataset or create a new dataset of your own.

Download our preprocessed dataset (compressed: 200 MB, extracted: 1 GB)

We provide a preprocessed dataset (based on Uri Alon's Java-large dataset).

First, download the preprocessed dataset archive below into the directory created earlier.

Then extract it:

tar -xvzf java_large_adversarial_data.tar.gz

This will create a directory named "data" with all the relevant data for the model and the adversary.
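
If you want to sanity-check the extraction, each line of a .c2v file holds one example: the method's name followed by its contexts, where every context is a comma-separated source_token,path,target_token triple (assuming the standard code2vec preprocessing format):

# Peek at the first example in the extracted test set (standard .c2v layout assumed).
with open("data/java_large_adversarial/java_large_adversarial.test.c2v") as f:
    fields = f.readline().split()
method_name, contexts = fields[0], fields[1:]
print("method:", method_name)
print("number of contexts:", len(contexts))
print("first context (token,path,token):", contexts[0])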

Step 2: Downloading a trained model

We provide a code2vec model trained on the Java-large dataset (thanks to Uri Alon). Trainable model (3.5 GB):

wget https://code2vec.s3.amazonaws.com/model/java-large-model.tar.gz
tar -xvzf java-large-model.tar.gz

You can also train your own model; see the original Code2vec repository.
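
After extraction you can verify that the checkpoint sits where Step 3 expects it. saved_model_iter3 is a checkpoint prefix, which standard TensorFlow 1.x saving expands into several files (an assumption about this archive's layout):

# Check that the pretrained checkpoint was extracted to the expected prefix.
import glob
files = glob.glob("models/java-large/saved_model_iter3*")
if files:
    print("Found checkpoint files:", files)
else:
    print("No checkpoint at models/java-large/saved_model_iter3 - re-check the extraction")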

Step 3: Running the adversary on the trained model

Once you have downloaded the preprocessed dataset and the pretrained model, you can run the adversary on the model by running:

  • for the VarName attack:
python3 code2vec.py --load models/java-large/saved_model_iter3 --load_dict data/java_large_adversarial/java-large --test data/java_large_adversarial/java_large_adversarial.test.c2v --test_adversarial --adversarial_type targeted --adversarial_target add
  • for the DeadCode attack:
python3 code2vec.py --load models/java-large/saved_model_iter3 --load_dict data/java_large_adversarial/java-large --test data/java_large_adversarial/java_large_adversarial_with_deadcode.test.c2v --test_adversarial --adversarial_type nontargeted --adversarial_deadcode --adversarial_target "merge|from"

Where:

  • --load - the path to the pretrained model.
  • --load_dict - the path to the preprocessed dictionary.
  • --adversarial_deadcode - use the DeadCode attack (note: you should also specify the path to the deadcode dataset).
  • --adversarial_type - targeted or nontargeted.
  • --adversarial_target - the desired target name(s) for the "targeted" type, separated by '|' (e.g. "merge|from").

You can also control the depth and width of the adversary's BFS search by setting the --adversarial_depth and --adversarial_topk parameters, respectively (both default to 2); the sketch below illustrates how these two parameters bound the search.
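
To make those two parameters concrete, here is a schematic sketch (our own illustration, not the repository's actual code) of a gradient-guided BFS over variable renames. propose and is_adversarial are hypothetical callbacks standing in for the model-specific steps: gradient-based ranking of replacement names, and checking whether the model's prediction was flipped.

# Schematic BFS over renames; `propose` and `is_adversarial` are hypothetical.
def bfs_attack(program, propose, is_adversarial, depth=2, topk=2):
    frontier = [program]
    for _ in range(depth):                       # --adversarial_depth levels
        next_frontier = []
        for candidate in frontier:
            # keep only the topk best renames per node (--adversarial_topk)
            for mutated in propose(candidate)[:topk]:
                if is_adversarial(mutated):      # prediction flipped?
                    return mutated
                next_frontier.append(mutated)
        frontier = next_frontier                 # descend one level
    return None                                  # search exhausted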

Manually examine adversarial examples

You can run the examples we provided in the paper on Code2vec's online demo, available at https://code2vec.org/.

  • You can copy&paste the sort example from here

  • You can type the following code into each example to get a prediction of sort:

int introsorter = 0;

Defense

You can run the Outlier Detection defense by adding the --guard_input flag with a threshold to either of the following (a sketch of one possible outlier check follows the examples):

  • regular evaluation, e.g.:
python3 code2vec.py --load models/java-large/saved_model_iter3 --test data/java_large_adversarial/java_large_adversarial.test.c2v --guard_input 2.7
  • adversarial evaluation, e.g.:
python3 code2vec.py --load models/java-large/saved_model_iter3 --load_dict data/java_large_adversarial/java-large --test data/java_large_adversarial/java_large_adversarial.test.c2v --test_adversarial --adversarial_type targeted --adversarial_target add --guard_input 2.7
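
As a rough illustration of what an embedding-space outlier defense can look like (a sketch under our own assumptions; the exact statistic this repository computes may differ), one option is to flag input tokens whose embedding lies unusually far from the centroid of the vocabulary embeddings, using the --guard_input value as a z-score threshold:

# Illustrative outlier check; `embeddings` is the model's token-embedding matrix
# and `token_ids` indexes the tokens of one input. The statistic (z-score of the
# distance to the embedding centroid) is our own assumption, not the repo's code.
import numpy as np

def outlier_tokens(embeddings, token_ids, threshold=2.7):
    centroid = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - centroid, axis=1)
    z = (dists - dists.mean()) / dists.std()
    return [t for t in token_ids if z[t] > threshold]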

Configuration

You can change hyper-parameters by editing the file config.py. Here are some of the parameters and their descriptions:

config.MAX_WORDS_FROM_VOCAB_FOR_ADVERSARIAL = 100000

The vocabulary size of the adversary.

config.ADVERSARIAL_MINI_BATCH_SIZE = 256

The batch size for the adversary's gradient steps.

config.TEST_BATCH_SIZE = config.BATCH_SIZE = 1024

The batch size during evaluation. This affects only evaluation speed and memory consumption; it does not affect the results.

config.READING_BATCH_SIZE = 1300 * 4

The batch size of reading text lines to the queue that feeds examples to the network during training.

config.NUM_BATCHING_THREADS = 2

The number of threads enqueuing examples.

config.BATCH_QUEUE_SIZE = 300000

Max number of elements in the feeding queue.

config.DATA_NUM_CONTEXTS = 200

The number of contexts in a single example, as was created in preprocessing.

config.MAX_CONTEXTS = 200

The number of contexts to use in each example.

config.WORDS_VOCAB_SIZE = 1301136

The max size of the token vocabulary.

config.TARGET_VOCAB_SIZE = 261245

The max size of the target words vocabulary.

config.PATHS_VOCAB_SIZE = 911417

The max size of the path vocabulary.

config.EMBEDDINGS_SIZE = 128

Embedding size for tokens and paths.
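
These settings are ordinary Python assignments, so experimenting is just a matter of editing the values in config.py; for example, a lighter-weight adversarial run might look like this (illustrative values, not recommendations):

# Example edits in config.py for a lighter adversarial run (illustrative values).
config.MAX_WORDS_FROM_VOCAB_FOR_ADVERSARIAL = 20000  # smaller adversary vocabulary
config.ADVERSARIAL_MINI_BATCH_SIZE = 64              # smaller gradient batches
config.TEST_BATCH_SIZE = config.BATCH_SIZE = 256     # lower evaluation memory use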
