Seq2seq code in PyTorch

Built on Ruotian Luo's captioning code and Sandeep Subramanian's seq2seq code.

Data preprocessing:

I use the following steps from Alexandre Bérard's code:

> config/WMT14/download.sh    # download WMT14 data into raw_data/WMT14
> config/WMT14/prepare.sh     # preprocess the data, and copy the files to data/WMT14

Then run the following to save the preprocessed data in HDF5 (.h5) files:

> python scripts/prepro_text.py 
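
For reference, here is a minimal sketch of what saving tokenized sentences into an HDF5 file could look like. The dataset names, padding scheme, and vocabulary handling below are assumptions for illustration, not necessarily the layout produced by scripts/prepro_text.py.

```python
import h5py
import numpy as np

def save_to_h5(sentences, word2idx, path, max_len=50):
    # Hypothetical layout: token ids padded to a fixed length plus true lengths.
    pad, unk = word2idx["<pad>"], word2idx["<unk>"]
    data = np.full((len(sentences), max_len), pad, dtype=np.int32)
    lengths = np.zeros(len(sentences), dtype=np.int32)
    for i, tokens in enumerate(sentences):
        ids = [word2idx.get(w, unk) for w in tokens[:max_len]]
        data[i, :len(ids)] = ids
        lengths[i] = len(ids)
    with h5py.File(path, "w") as f:
        f.create_dataset("labels", data=data)      # padded token ids
        f.create_dataset("lengths", data=lengths)  # actual sentence lengths

# Example (hypothetical vocabulary and output path):
# save_to_h5([["hello", "world"]],
#            {"<pad>": 0, "<unk>": 1, "hello": 2, "world": 3},
#            "data/WMT14/train.en.h5")
```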

Training:

Training requires a few directories for saving the model's snapshots and the TensorBoard events:

> mkdir -p save events

To train a model with the parameters defined in config.yaml:

> python nmt.py -c config.yaml 

Check options/opts.py for details about the available options.
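
As an illustration, a YAML config can be merged over command-line defaults roughly as sketched below. The option names (batch_size, learning_rate) are placeholders; the real set of options is defined in options/opts.py.

```python
import argparse
import yaml

# Hypothetical sketch of how a -c config.yaml flag could override defaults.
parser = argparse.ArgumentParser()
parser.add_argument("-c", "--config", required=True, help="path to a YAML config")
parser.add_argument("--batch_size", type=int, default=32)        # assumed option
parser.add_argument("--learning_rate", type=float, default=1e-3)  # assumed option
args = parser.parse_args()

# Values found in the YAML file take precedence over the argparse defaults.
with open(args.config) as f:
    overrides = yaml.safe_load(f) or {}
for key, value in overrides.items():
    setattr(args, key, value)

print(vars(args))
```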

To evaluate a model:

> python eval.py -c config
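
Evaluation for NMT is typically reported as corpus BLEU over the decoded translations. Below is a minimal sketch of how such a score could be computed with NLTK; the file paths and whitespace tokenization are assumptions, not necessarily what eval.py does.

```python
from nltk.translate.bleu_score import corpus_bleu

def bleu_from_files(hyp_path, ref_path):
    # Hypothetical example: one reference per hypothesis, whitespace-tokenized.
    with open(hyp_path) as h, open(ref_path) as r:
        hyps = [line.split() for line in h]
        refs = [[line.split()] for line in r]
    return corpus_bleu(refs, hyps)

# print(bleu_from_files("save/model/hyps.txt", "data/WMT14/test.de"))
```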

To submit jobs via OAR, use either train.sh or select_train.sh, for example as sketched below.
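
Assuming train.sh is an OAR batch script with its resource requests declared inside (an assumption about how these scripts are set up), a job could be submitted with:

> oarsub -S ./train.sh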