Given a code snippet and a collection of candidates as input, the task is to return the top K codes with the same semantics. Models are evaluated by the MAP@R score. MAP@R is the mean of the average precision scores, each computed over the R most similar samples retrieved for a query. For a code (query), R is the number of other codes in the same class. Each problem in this dataset has 500 examples, so R=499.
We use the POJ-104 dataset for this task.
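The text above describes MAP@R in words only. Assuming the usual definition of the metric (the exact formula is not spelled out here), it can be written as follows for a query set Q, where P_q(i) is the precision over the top i codes retrieved for query q, and rel_q(i) is 1 if the i-th retrieved code is in the same class as q and 0 otherwise:

```latex
\mathrm{AP@R}(q) = \frac{1}{R} \sum_{i=1}^{R} P_q(i)\,\mathrm{rel}_q(i),
\qquad
\mathrm{MAP@R} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP@R}(q)
```

Under this definition a correct code retrieved at a high rank contributes more than the same code retrieved at a low rank.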
1. Download the dataset from the website or run the following commands:
cd dataset
pip install gdown
gdown "https://drive.google.com/uc?id=0B2i-vWnOu7MxVlJwQXN6eVNONUU"
tar -xvf programs.tar.gz
cd ..
2. Preprocess the data:
cd dataset
python preprocess.py
cd ..
After preprocessing the dataset, you obtain three .jsonl files: train.jsonl, valid.jsonl, and test.jsonl.
In each file, every line represents one function. Each row has the following fields:
- code: the source code
- label: the number of the problem that the source code solves
- index: the index of example
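As a minimal sketch of how these rows could be read, the snippet below parses JSONL lines and counts examples per label. The helper names `parse_rows` and `count_per_label` are hypothetical, and the string-valued `label` and `index` fields are assumed from the evaluator example in this README.

```python
import json

def parse_rows(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    rows = []
    for raw_line in lines:
        raw_line = raw_line.strip()
        if raw_line:
            rows.append(json.loads(raw_line))
    return rows

def count_per_label(rows):
    """Count how many examples belong to each problem label."""
    counts = {}
    for row in rows:
        counts[row["label"]] = counts.get(row["label"], 0) + 1
    return counts

# Placeholder rows in the documented format; real rows hold full source code.
sample_lines = [
    '{"label": "65", "index": "0", "code": "function0"}',
    '{"label": "65", "index": "1", "code": "function1"}',
    '{"label": "66", "index": "3", "code": "function3"}',
]
rows = parse_rows(sample_lines)
print(count_per_label(rows))
```

To read a real split, pass an open file handle instead of the list, for example `parse_rows(open("dataset/train.jsonl", encoding="utf-8"))`.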
Data statistics of the dataset are shown in the table below:
Split | #Problems | #Examples |
---|---|---|
Train | 64 | 32,000 |
Dev | 16 | 8,000 |
Test | 24 | 12,000 |
We provide a script that evaluates predictions for this task and reports the MAP@R score.
Given a code file evaluator/test.jsonl:
{"label": "65", "index": "0", "code": "function0"}
{"label": "65", "index": "1", "code": "function1"}
{"label": "65", "index": "2", "code": "function2"}
{"label": "66", "index": "3", "code": "function3"}
{"label": "66", "index": "4", "code": "function4"}
{"label": "66", "index": "5", "code": "function5"}
We first extract the answers from the code file:
python evaluator/extract_answers.py -c evaluator/test.jsonl -o evaluator/answers.jsonl
The extracted answers are:
cat evaluator/answers.jsonl
{"index": "0", "answers": ["1", "2"]}
{"index": "1", "answers": ["0", "2"]}
{"index": "2", "answers": ["0", "1"]}
{"index": "4", "answers": ["3", "5"]}
{"index": "3", "answers": ["4", "5"]}
{"index": "5", "answers": ["4", "3"]}
Then report the MAP@R score:
python evaluator/evaluator.py -a evaluator/answers.jsonl -p evaluator/predictions.jsonl
{'MAP@R': 0.5833}
For each index, return the top K codes (K=2 in this example, but K=499 in the task). For example:
cat evaluator/predictions.jsonl
{"index": "0", "answers": ["3", "2"]}
{"index": "1", "answers": ["0", "4"]}
{"index": "2", "answers": ["0", "1"]}
{"index": "4", "answers": ["1", "5"]}
{"index": "3", "answers": ["4", "2"]}
{"index": "5", "answers": ["4", "3"]}
We also provide a pipeline that fine-tunes CodeBERT on this task.
cd code
python run.py \
--output_dir=./saved_models \
--model_type=roberta \
--config_name=microsoft/codebert-base \
--model_name_or_path=microsoft/codebert-base \
--tokenizer_name=roberta-base \
--do_train \
--train_data_file=../dataset/train.jsonl \
--eval_data_file=../dataset/valid.jsonl \
--test_data_file=../dataset/test.jsonl \
--epoch 2 \
--block_size 400 \
--train_batch_size 8 \
--eval_batch_size 16 \
--learning_rate 2e-5 \
--max_grad_norm 1.0 \
--evaluate_during_training \
--seed 123456 2>&1| tee train.log
To run inference on the validation and test sets with the fine-tuned model:
cd code
python run.py \
--output_dir=./saved_models \
--model_type=roberta \
--config_name=microsoft/codebert-base \
--model_name_or_path=microsoft/codebert-base \
--tokenizer_name=roberta-base \
--do_eval \
--do_test \
--train_data_file=../dataset/train.jsonl \
--eval_data_file=../dataset/valid.jsonl \
--test_data_file=../dataset/test.jsonl \
--epoch 2 \
--block_size 400 \
--train_batch_size 8 \
--eval_batch_size 16 \
--learning_rate 2e-5 \
--max_grad_norm 1.0 \
--evaluate_during_training \
--seed 123456 2>&1| tee test.log
python ../evaluator/extract_answers.py -c ../dataset/test.jsonl -o saved_models/answers.jsonl
python ../evaluator/evaluator.py -a saved_models/answers.jsonl -p saved_models/predictions.jsonl
{'MAP@R': 0.8267}
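The predictions file scored above lists, for each index, the K most similar codes. How run.py ranks candidates is not shown in this README, so the following is only a sketch under assumptions: each code has already been mapped to an embedding vector, and candidates are ranked by cosine similarity. The function names and the 2-D vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_predictions(embeddings, k):
    """For each index, return the k most similar other indices, best first."""
    predictions = {}
    for query, q_vec in embeddings.items():
        scored = [
            (cosine(q_vec, vec), other)
            for other, vec in embeddings.items()
            if other != query  # a code is never its own answer
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        predictions[query] = [other for _, other in scored[:k]]
    return predictions

# Hypothetical embeddings: indices 0-2 point one way, 3-5 another.
embeddings = {
    "0": [1.0, 0.0], "1": [0.9, 0.1], "2": [0.8, 0.3],
    "3": [0.0, 1.0], "4": [0.1, 0.9], "5": [0.3, 0.8],
}
predictions = top_k_predictions(embeddings, k=2)
print(predictions["0"])
```

In the real task K would be 499, and a vectorized implementation would be preferable to this quadratic pure-Python loop for 12,000 test examples.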
The results on the test set are shown below:
Method | MAP@R (%) |
---|---|
code2vec | 1.98 |
NCC | 39.95 |
NCC-w/o-inst2vec | 54.19 |
Aroma-Dot | 52.08 |
Aroma-Cos | 55.12 |
RoBERTa | 76.67 |
MISIM-GNN | 82.45 |
CodeBERT | 82.67 |
@inproceedings{mou2016convolutional,
title={Convolutional neural networks over tree structures for programming language processing},
author={Mou, Lili and Li, Ge and Zhang, Lu and Wang, Tao and Jin, Zhi},
booktitle={Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence},
pages={1287--1293},
year={2016}
}