Built on Ruotian Luo's captioning code and Sandeep Subramanian's seq2seq code.
Data preparation follows these steps from Alexandre Bérard's code:
> config/WMT14/download.sh # download WMT14 data into raw_data/WMT14
> config/WMT14/prepare.sh # preprocess the data, and copy the files to data/WMT14
Then run the following to save the preprocessed data in HDF5 (.h5) files:
> python scripts/prepro_text.py
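The exact layout written by prepro_text.py is not described here. As a rough illustration, assuming it follows the usual recipe (build a vocabulary, map tokens to integer ids, pad every sentence to a fixed length), the encoding step could look like the sketch below. The special-token ids and the names `build_vocab` and `encode` are hypothetical, not taken from the script. The padded rows could then be stored with h5py (for example `f.create_dataset("labels", data=rows)`), which this sketch leaves out.

```python
from collections import Counter

# Hypothetical special-token ids; the real script may use different ones.
PAD, UNK, BOS, EOS = 0, 1, 2, 3


def build_vocab(sentences, max_size=None):
    """Map the most frequent tokens to ids, reserving 0-3 for special tokens."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {"<pad>": PAD, "<unk>": UNK, "<s>": BOS, "</s>": EOS}
    for tok, _ in counts.most_common(max_size):
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(sentences, vocab, max_len):
    """Turn sentences into fixed-length id rows plus their unpadded lengths.

    Each row is <s> + tokens + </s>, truncated to fit max_len (>= 2),
    with out-of-vocabulary tokens mapped to <unk> and the tail padded.
    """
    rows, lengths = [], []
    for s in sentences:
        ids = [vocab.get(t, UNK) for t in s.split()][: max_len - 2]
        ids = [BOS] + ids + [EOS]
        lengths.append(len(ids))
        rows.append(ids + [PAD] * (max_len - len(ids)))
    return rows, lengths
```

For example, `encode(["the cat sat", "the dog"], build_vocab(["the cat sat", "the dog"]), 6)` yields two rows of length 6 with true lengths 5 and 4. Keeping the lengths separate from the padded matrix lets a seq2seq model mask or pack the padding later.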
Training requires directories for saving the model's snapshots and the TensorBoard events:
> mkdir -p save events
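If training is launched from a Python wrapper instead of the shell, the same directories can be created in code. `ensure_dirs` below is a hypothetical helper, not part of the repository; it behaves like `mkdir -p`, creating missing parents and doing nothing if a directory already exists.

```python
import os


def ensure_dirs(*paths):
    """Create each directory and any missing parents, like `mkdir -p`."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
```

Calling `ensure_dirs("save", "events")` before training is safe to repeat, since `exist_ok=True` suppresses the error for directories that are already there.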
To train a model with the parameters defined in config.yaml:
> python nmt.py -c config.yaml
See options/opts.py for the full list of available options.
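The real options are defined in options/opts.py and are not reproduced here. One common way to combine a YAML file passed with `-c` and extra command-line flags is to load the file into a dict (for example with PyYAML's `yaml.safe_load`) and hand it to argparse's `set_defaults`, so that flags given explicitly on the command line still win. The sketch assumes that pattern; `parse_opts`, `--batch_size`, and `--lr` are hypothetical names, not the repository's.

```python
import argparse


def parse_opts(argv, config):
    """Parse CLI flags, using a config mapping as defaults.

    `config` would normally come from yaml.safe_load on the file given
    with -c (assumption). Values passed on the command line override it.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config", help="path to a YAML config file")
    parser.add_argument("--batch_size", type=int, default=32)  # hypothetical option
    parser.add_argument("--lr", type=float, default=1e-3)  # hypothetical option
    # Parser-level defaults from set_defaults take precedence over the
    # add_argument defaults, but not over flags actually given in argv.
    parser.set_defaults(**config)
    return parser.parse_args(argv)
```

With `parse_opts(["-c", "config.yaml"], {"batch_size": 64})` the batch size comes from the config (64), while `parse_opts(["--batch_size", "16"], {"batch_size": 64})` lets the explicit flag override it.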
To evaluate a model:
> python eval.py -c config
To submit jobs via OAR, use either train.sh or select_train.sh.