
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement

PyTorch implementation of the models described in the paper Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement.

We present code for training and decoding both autoregressive and non-autoregressive models, as well as preprocessed datasets and pretrained models.

Dependencies

Python

  • Python 3.6
  • PyTorch 0.3
  • Numpy
  • NLTK
  • torchtext
  • torchvision

GPU

  • CUDA (we used version 8.0 in all our experiments; we recommend the latest version)

Related code

Downloading Datasets & Pre-trained Models

The original translation corpora can be downloaded from the respective sources (IWSLT'16 En-De, WMT'16 En-Ro, WMT'15 En-De, MS COCO). For the preprocessed corpora and pre-trained models, see the table below.

Dataset          Data    Models
IWSLT'16 En-De   Data    Models
WMT'16 En-Ro     Data    Models
WMT'15 En-De     Data    Models
MS COCO          Data    Models

Before you run the code

Set the correct path to the data in the data_path() function in data.py.
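For illustration, data_path() might look like the sketch below. This is an assumption about the function's shape, not the repository's actual code; the dataset keys, directory names, and DATA_ROOT are placeholders you should adapt to your own layout.

```python
from pathlib import Path

# Assumption: all preprocessed corpora live under one root directory.
DATA_ROOT = Path("/path/to/data")

def data_path(dataset):
    # Map each dataset identifier to its preprocessed corpus directory.
    # Keys and directory names here are placeholders.
    paths = {
        "iwslt-ende": DATA_ROOT / "iwslt16_ende",
        "wmt16-enro": DATA_ROOT / "wmt16_enro",
        "wmt15-ende": DATA_ROOT / "wmt15_ende",
        "mscoco":     DATA_ROOT / "mscoco",
    }
    return str(paths[dataset])
```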

Loading & Decoding from Pre-trained Models

  1. For vocab_size, use 60000 for WMT'15 En-De, 40000 for the other translation datasets and 10000 for MS COCO.
  2. For params, use big for WMT'15 En-De and small for the other translation datasets.

Autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --mode test --debug --load_from <checkpoint>

Non-autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --mode test --remove_repeats --debug --trg_len_option predict --use_predicted_trg_len --load_from <checkpoint>

For adaptive decoding, add the flag --adaptive_decoding jaccard to the above.
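Adaptive decoding stops iterative refinement early once consecutive decoder outputs have (almost) stopped changing; with the jaccard option, the stopping criterion is based on the Jaccard distance between successive outputs. The sketch below illustrates the idea only; the function names, set-based comparison, and threshold are assumptions, not the repository's API.

```python
def jaccard_distance(prev_tokens, curr_tokens):
    # Jaccard distance between two token sequences, compared as sets:
    # 0.0 means identical token sets, 1.0 means disjoint.
    a, b = set(prev_tokens), set(curr_tokens)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def should_stop(prev_tokens, curr_tokens, threshold=0.0):
    # Stop refining when the output has (almost) stopped changing
    # between two consecutive refinement iterations.
    return jaccard_distance(prev_tokens, curr_tokens) <= threshold
```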

Training New Models

Autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal

Non-autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --fast --valid_repeat_dec 8 --use_argmax --next_dec_input both --denoising_prob --layerwise_denoising_weight --use_distillation

Training the Length Prediction Model

  1. Take a checkpoint of a pre-trained non-autoregressive model.
  2. Resume training with the same flags used to train that model, plus: --load_from <checkpoint> --resume --finetune_trg_len --trg_len_option predict
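The length predictor classifies the difference between target and source length within a bounded window (cf. the --max_offset 20 flag in the MS COCO commands below). A minimal sketch of how a predicted offset could be turned into a target length; the helper name and logit layout are assumptions for illustration, not the repository's code.

```python
def predict_target_length(offset_logits, src_len, max_offset=20):
    # offset_logits: one score per length difference in
    # [-max_offset, max_offset], so there are 2*max_offset + 1 of them.
    assert len(offset_logits) == 2 * max_offset + 1
    # Pick the most likely offset and shift it back into signed range.
    best = max(range(len(offset_logits)), key=offset_logits.__getitem__)
    return src_len + (best - max_offset)
```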

MS COCO dataset

  • Run pre-trained autoregressive model
python run.py --dataset mscoco --params big --load_vocab --mode test --n_layers 4 --ffw_block highway --debug --load_from mscoco_models_final/ar_model --batch_size 1024
  • Run pre-trained non-autoregressive model
python run.py --dataset mscoco --params big --use_argmax --load_vocab --mode test --n_layers 4 --fast --ffw_block highway --debug --trg_len_option predict --use_predicted_trg_len --load_from mscoco_models_final/nar_model --batch_size 1024
  • Train new autoregressive model
python run.py --dataset mscoco --params big --batch_size 1024 --load_vocab --eval_every 1000 --drop_ratio 0.5 --lr_schedule transformer --n_layers 4
  • Train new non-autoregressive model
python run.py --dataset mscoco --params big --use_argmax --batch_size 1024 --load_vocab --eval_every 1000 --drop_ratio 0.5 --lr_schedule transformer --n_layers 4 --fast --use_distillation --ffw_block highway --denoising_prob 0.5 --layerwise_denoising_weight --load_encoder_from mscoco_models_final/ar_model

After training it, train the length predictor (set the correct path in the --load_from argument):

python run.py --dataset mscoco --params big --use_argmax --batch_size 1024 --load_vocab --mode train --n_layers 4 --fast --ffw_block highway --eval_every 1000 --drop_ratio 0.5 --drop_len_pred 0.0 --lr_schedule anneal --anneal_steps 100000 --use_distillation --load_from mscoco_models/new_nar_model --trg_len_option predict --finetune_trg_len --max_offset 20

Citation

If you find the resources in this repository useful, please consider citing:

@article{Lee:18,
  author    = {Jason Lee and Elman Mansimov and Kyunghyun Cho},
  title     = {Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement},
  year      = {2018},
  journal   = {arXiv preprint arXiv:1802.06901},
}