Mapping Language to Code in Programmatic Context

Install requirements

PyTorch 0.3 (minor changes are needed for 0.4)

pip install antlr4-python3-runtime==4.6
pip install allennlp==0.3.0
pip install ipython

Download data from Google Drive

mkdir concode
cd concode

Download data from: into this folder.

Create production rules. This restricts the data to 100,000 training examples and 2,000 examples each for validation and test. If your resources can support it, you can use more.

python -train_file concode/train_shuffled.json -valid_file concode/valid_shuffled.json -test_file concode/test_shuffled.json -output_folder data -train_num 100000 -valid_num 2000
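
One way to picture the effect of -train_num and -valid_num is as keeping only the first N records of each split. The sketch below is not the repository's code. It assumes each input file holds one JSON object per line, which is an assumption about the data format, and `take_first` is a hypothetical helper introduced only for illustration.

```python
import json
from itertools import islice


def take_first(lines, limit):
    """Parse at most `limit` non-empty JSON lines from an iterable of strings.

    Hypothetical helper: assumes one JSON object per line.
    """
    non_empty = (line for line in lines if line.strip())
    return [json.loads(line) for line in islice(non_empty, limit)]


# Usage with a file (path taken from the command above):
# with open("concode/train_shuffled.json") as f:
#     train = take_first(f, 100000)
```

Because the input files are already shuffled (going by their names), taking a prefix is a plausible way to subsample; whether the actual script does exactly this is not stated here.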

Prepare pytorch datasets

mkdir data/d_100k_762
python -train data/train.dataset -valid data/valid.dataset -save_data data/d_100k_762/concode -train_max 100000 -valid_max 2000

Train seq2seq

python -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/s2s -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 50 -src_word_vec_size 1024 -tgt_word_vec_size 512 -rnn_size 1024 -encoder_type regular -decoder_type regular -copy_attn

Train seq2prod

python -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/s2p -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 20 -src_word_vec_size 1024 -tgt_word_vec_size 512 -rnn_size 1024 -encoder_type regular -decoder_type prod -brnn -copy_attn

Train Concode

python -dropout 0.5 -data data/d_100k_762/concode -save_model data/d_100k_762/concode/ -epochs 30 -learning_rate 0.001 -seed 1123 -enc_layers 2 -dec_layers 2 -batch_size 20 -src_word_vec_size 512 -tgt_word_vec_size 512 -rnn_size 512 -decoder_rnn_size 1024 -encoder_type concode -decoder_type concode -brnn -copy_attn -twostep -method_names -var_names
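
The three training commands above share most of their flags and differ in only a few. The sketch below lays the differences out side by side. It only builds the argument list, since the name of the training script is not given in the commands above, and `training_args` is a hypothetical helper, not part of the repository. All flag names and values are copied from the commands above.

```python
# Flags shared by all three training commands above.
COMMON = {
    "-dropout": "0.5",
    "-data": "data/d_100k_762/concode",
    "-epochs": "30",
    "-learning_rate": "0.001",
    "-seed": "1123",
    "-enc_layers": "2",
    "-dec_layers": "2",
    "-tgt_word_vec_size": "512",
}

# Per-model (options, switches), copied from the commands above.
MODELS = {
    "seq2seq": (
        {
            "-save_model": "data/d_100k_762/s2s",
            "-batch_size": "50",
            "-src_word_vec_size": "1024",
            "-rnn_size": "1024",
            "-encoder_type": "regular",
            "-decoder_type": "regular",
        },
        ["-copy_attn"],
    ),
    "seq2prod": (
        {
            "-save_model": "data/d_100k_762/s2p",
            "-batch_size": "20",
            "-src_word_vec_size": "1024",
            "-rnn_size": "1024",
            "-encoder_type": "regular",
            "-decoder_type": "prod",
        },
        ["-brnn", "-copy_attn"],
    ),
    "concode": (
        {
            "-save_model": "data/d_100k_762/concode/",
            "-batch_size": "20",
            "-src_word_vec_size": "512",
            "-rnn_size": "512",
            "-decoder_rnn_size": "1024",
            "-encoder_type": "concode",
            "-decoder_type": "concode",
        },
        ["-brnn", "-copy_attn", "-twostep", "-method_names", "-var_names"],
    ),
}


def training_args(model):
    """Return the flag list for one model type (hypothetical helper)."""
    options, switches = MODELS[model]
    args = []
    for flag, value in {**COMMON, **options}.items():
        args += [flag, value]
    return args + switches
```

Read this way, seq2seq is the only run without -brnn and the only one with a batch size of 50, while Concode uses smaller embedding and encoder sizes plus a separate -decoder_rnn_size.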


On Dev:

ipython predict.ipy -- -start 5 -end 30 -beam 3 -models_dir data/d_100k_762/concode/ -test_file data/valid.dataset -tgt_len 500

On Test (use the best epoch from Dev):

ipython predict.ipy -- -start 15 -end 15 -beam 3 -models_dir data/d_100k_762/concode/ -test_file data/test.dataset -tgt_len 500

For other model types, use the appropriate -models_dir.
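
Choosing the epoch for the test run amounts to comparing the dev results of the evaluated epochs and keeping the best one. A minimal sketch follows, assuming you have collected one dev score per epoch by hand and that higher is better. The `best_epoch` helper is hypothetical, and the scores in the usage comment are placeholders, not real results.

```python
def best_epoch(dev_scores):
    """Return the epoch with the highest dev score.

    `dev_scores` maps epoch number -> score (higher is assumed better).
    Ties go to the earliest epoch.
    """
    if not dev_scores:
        raise ValueError("no dev scores given")
    # max() keeps the first maximal item, and sorted() puts earlier epochs first.
    return max(sorted(dev_scores), key=lambda epoch: dev_scores[epoch])


# Placeholder numbers for illustration only:
# best_epoch({5: 0.10, 10: 0.25, 15: 0.25}) picks epoch 10.
```

The chosen epoch is then passed as both -start and -end in the test command, as in the example above.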
