Mapping Language to Code in a Programmatic Context
Clone or download
Latest commit a2fe2c3 Nov 12, 2018

README.md

Mapping Language to Code in Programmatic Context

Install requirements

Pytorch 0.3 (minor changes needed for 0.4)

pip install antlr4-python3-runtime==4.6
pip install allennlp==0.3.0
pip install ipython

Download data from Google drive

mkdir concode
cd concode

Download data from: https://drive.google.com/drive/folders/1kC6fe7JgOmEHhVFaXjzOmKeatTJy1I1W into this folder.

Create production rules. This restricts the data to 100000 train and 2000 valid/test. If your resources can support it, you can use more.

python build.py -train_file concode/train_shuffled.json -valid_file concode/valid_shuffled.json -test_file concode/test_shuffled.json -output_folder data -train_num 100000 -valid_num 2000

Prepare pytorch datasets

mkdir data/d_100k_762
python preprocess.py -train data/train.dataset -valid data/valid.dataset -save_data data/d_100k_762/concode -train_max 100000 -valid_max 2000

Train seq2seq

python train.py -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/s2s -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 50 -src_word_vec_size 1024 -tgt_word_vec_size 512 -rnn_size 1024 -encoder_type regular -decoder_type regular -copy_attn

Train seq2prod

python train.py -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/s2p -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 20 -src_word_vec_size 1024 -tgt_word_vec_size 512 -rnn_size 1024 -encoder_type regular -decoder_type prod -brnn -copy_attn

Train Concode

python train.py -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/concode/ -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 20 -src_word_vec_size 512 -tgt_word_vec_size 512 -rnn_size 512 -decoder_rnn_size 1024 -encoder_type concode -decoder_type concode -brnn -copy_attn -twostep -method_names -var_names

Prediction:

On Dev:

ipython predict.ipy -- -start 5 -end 30 -beam 3 -models_dir  data/d_100k_762/concode/ -test_file data/valid.dataset -tgt_len 500 

On Test (Use best epoch from dev):

ipython predict.ipy -- -start 15 -end 15 -beam 3 -models_dir  data/d_100k_762/concode/ -test_file data/test.dataset -tgt_len 500 

For other model types, use the appropriate -models_dir.