What is this library?
This library contains the code needed to reproduce all experiments in the paper Open Vocabulary Learning on Source Code with a Graph-Structured Cache.
It's meant to be used along with this data preprocessing library.
How do I run your code?
Install the Conda Python package manager. Then follow Conda's instructions for creating an environment from a file, using the environment.yml file in this library's root directory, to satisfy the Python requirements for running this library's code.
In theory our code is OS-agnostic, but we ran all our experiments on Ubuntu Linux, so that's where you're most likely to have installation success.
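As a rough sketch of the installation step, the snippet below builds the standard conda env create -f environment.yml command and runs it only if Conda is on the PATH and the file exists. The helper names conda_env_command and create_env are hypothetical and are not part of this library.

```python
import shutil
import subprocess
from pathlib import Path


def conda_env_command(env_file="environment.yml"):
    """Return the Conda command that creates an environment from a file."""
    return ["conda", "env", "create", "-f", str(env_file)]


def create_env(root="."):
    """Create the environment described by <root>/environment.yml."""
    env_file = Path(root) / "environment.yml"
    if shutil.which("conda") is None:
        raise RuntimeError("Conda was not found on PATH; install it first.")
    if not env_file.is_file():
        raise FileNotFoundError(f"No environment.yml in {Path(root).resolve()}")
    # check=True raises CalledProcessError if Conda reports a failure.
    subprocess.run(conda_env_command(env_file), check=True)
```

Calling create_env() from the library's root directory would be equivalent to typing the command by hand; the checks only make the failure messages more direct.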
Cloud Integration (optional)
To use these features, you'll need the AWS CLI installed and configured, and you need to edit the details in the experiments/temp.aws_config.py file and rename it to aws_config.py. (aws_config.py is in the .gitignore since it might contain sensitive info.)
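A quick pre-flight check for the cloud setup might look like the sketch below. It assumes the renamed file stays in the experiments directory, which the text above does not state explicitly, and the function name cloud_setup_problems is hypothetical. It does not inspect the contents of the config file or verify that the AWS CLI is configured.

```python
import shutil
from pathlib import Path


def cloud_setup_problems(root="."):
    """Return a list of human-readable problems with the cloud setup."""
    problems = []
    # The AWS CLI installs an executable named "aws".
    if shutil.which("aws") is None:
        problems.append("AWS CLI not found on PATH")
    # Assumption: the config file is renamed in place, inside experiments/.
    config = Path(root) / "experiments" / "aws_config.py"
    if not config.is_file():
        problems.append(f"{config} is missing")
    return problems
```

An empty list means both checks passed; anything else can be printed as a to-do list before launching a cloud run.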
We included as many unit tests as we could. They're in the tests directory, whose directory structure mirrors that of the rest of the library. (Warning: they take a while to run, and they expect a GPU.)
You can run them from the project root directory with python -m unittest.
Training and Evaluating models
All code in this library expects to be run from the library's root directory, with Python running modules as scripts, e.g. python -m experiments.VarNaming_vocab_comparison.train_models.
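The same convention can be followed from a driver script. The sketch below launches a module with python -m and sets the working directory to the library root. The helpers module_command and run_module are hypothetical names, not part of this library.

```python
import subprocess
import sys


def module_command(module, *args):
    """Build the argv for `python -m <module> [args...]`."""
    # sys.executable keeps the child process in the same environment.
    return [sys.executable, "-m", module, *args]


def run_module(module, *args, root="."):
    """Run a module as a script with the working directory set to root."""
    return subprocess.run(module_command(module, *args), cwd=root, check=True)
```

For example, run_module("experiments.VarNaming_vocab_comparison.train_models", root="/path/to/library") would mirror the command above, with the path being a placeholder for wherever you cloned the library.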
The general workflow is
- Create some .gml files with the data preprocessing library.
- Create a Task instance for the task you want the model to perform, as shown in the file
- Turn the task into preprocessed datapoints by running python -m preprocess_task_for_model.
- Train a model by running python -m train_model_on_task.
- Run python -m evaluate_model to see how your model did on a test set.
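The three script steps of this workflow can be chained so that a failure in one step stops the rest. This is only a sketch: the module names come from the list above, but the function run_workflow and its extra_args parameter are hypothetical, and each script's real command-line arguments are not covered here.

```python
import subprocess
import sys

# Module names taken from the workflow above, in the order they are run.
WORKFLOW_MODULES = [
    "preprocess_task_for_model",
    "train_model_on_task",
    "evaluate_model",
]


def run_workflow(extra_args=None, root=".", dry_run=False):
    """Run each workflow module in order, stopping at the first failure.

    extra_args maps a module name to a list of command-line arguments.
    Check each script for the options it really accepts.
    """
    extra_args = extra_args or {}
    commands = [
        [sys.executable, "-m", module, *extra_args.get(module, [])]
        for module in WORKFLOW_MODULES
    ]
    if not dry_run:
        for command in commands:
            # check=True aborts the remaining steps if one step fails.
            subprocess.run(command, cwd=root, check=True)
    return commands
```

Calling run_workflow(dry_run=True) returns the commands without executing anything, which is a cheap way to confirm the order before starting a long training run.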
Recreating the experiments in the paper
- Use the data preprocessing library with its existing repositories.txt file to download 18 Maven repositories and preprocess their contents into Augmented ASTs.
- Move the directories produced in the first step to s3shared/18_popular_mavens/repositories. (Don't worry if you're not using S3 - it'll still work.)
- Navigate to experiments/make_train_test_split.sh from the command line.
- For either the Fill In The Blank experiment (FITB_vocab_comparison) or the Variable Naming experiment (VarNaming_vocab_comparison), run python -m experiments.<experiment name>.make_tasks_and_preprocess. (You may need to change some args/kwargs in this file to suit your setup, e.g. changing 'local' if you want to run locally.)
- Run python -m experiments.<experiment name>.train_models. (Again, you may need to change some args/kwargs in this file to suit your setup.)
- Run python -m experiments.<experiment name>.evaluate_models. (Again, you may need to change some args/kwargs in this file to suit your setup.)
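To make the <experiment name> substitution concrete, the sketch below expands the three per-experiment commands for either experiment. The experiment and stage names are the ones given above; the function experiment_commands is a hypothetical helper, and it only builds command strings without running them.

```python
EXPERIMENTS = ("FITB_vocab_comparison", "VarNaming_vocab_comparison")
STAGES = ("make_tasks_and_preprocess", "train_models", "evaluate_models")


def experiment_commands(experiment):
    """Return the three `python -m` commands for one experiment, in order."""
    if experiment not in EXPERIMENTS:
        raise ValueError(
            f"Unknown experiment {experiment!r}; expected one of {EXPERIMENTS}"
        )
    return [f"python -m experiments.{experiment}.{stage}" for stage in STAGES]


if __name__ == "__main__":
    for name in EXPERIMENTS:
        for command in experiment_commands(name):
            print(command)
```

Printing the commands first, rather than running them blindly, leaves room to edit the args/kwargs in each file before every stage.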
Feel free to get in touch with Milan Cvitkovic or any of the other paper authors. We'd love to hear from you!