The official PyTorch implementation of:
- On the Embeddings of Variables in Recurrent Neural Networks for Source Code [arxiv] (accepted to NAACL'21)
- `code_completion`: code for the code completion task (additional preprocessing, models, training, etc.)
- `var_misuse`: code for the variable misuse task (additional preprocessing, models, training, etc.)

Please refer to these subfolders for each task's instructions.
The experiments were conducted on the Python150k and JavaScript150k datasets, resplit according to https://github.com/bayesgroup/code_transformers. Please follow the instructions there to obtain the data.
The experiments were run on a Linux 3.10.0 system with a Tesla V100 GPU. The implementation is based on PyTorch >= 1.5.
Running experiments:
- Download and resplit the data, see the instructions above for details;
- Preprocess the data for the task you are interested in, see `code_completion` or `var_misuse` for details;
- Run the experiment you are interested in, see `code_completion` or `var_misuse` for details.
Parts of this code are based on the following repositories:
- A Transformer-based Approach for Source Code Summarization
- OpenNMT
- DrQA
- https://github.com/oleges1/code-completion
If you find this code useful, please cite our paper:
@inproceedings{chirkova2021embeddings,
    title={On the Embeddings of Variables in Recurrent Neural Networks for Source Code},
    author={Nadezhda Chirkova},
    booktitle={North American Chapter of the Association for Computational Linguistics},
    year={2021},
}