
Scaling Neural Machine Translation (Ott et al., 2018)

This page includes instructions for reproducing results from the paper Scaling Neural Machine Translation (Ott et al., 2018).

Pre-trained models

Description | Dataset | Model | Test set(s)
(Ott et al., 2018) | WMT14 English-French | download (.tar.bz2) | newstest2014 (shared vocab): download (.tar.bz2)
(Ott et al., 2018) | WMT16 English-German | download (.tar.bz2) | newstest2014 (shared vocab): download (.tar.bz2)

Training a new model on WMT'16 En-De

Please first download the preprocessed WMT'16 En-De data provided by Google. Then:

  1. Extract the WMT'16 En-De data:
$ TEXT=wmt16_en_de_bpe32k
$ mkdir $TEXT
$ tar -xzvf wmt16_en_de.tar.gz -C $TEXT
  2. Preprocess the dataset with a joined dictionary:
$ fairseq-preprocess --source-lang en --target-lang de \
  --trainpref $TEXT/train.tok.clean.bpe.32000 \
  --validpref $TEXT/newstest2013.tok.bpe.32000 \
  --testpref $TEXT/newstest2014.tok.bpe.32000 \
  --destdir data-bin/wmt16_en_de_bpe32k \
  --nwordssrc 32768 --nwordstgt 32768 \
  --joined-dictionary
  3. Train a model:
$ fairseq-train data-bin/wmt16_en_de_bpe32k \
  --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
  --lr 0.0005 --min-lr 1e-09 \
  --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 3584 \
  --fp16
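
The learning-rate flags in the training command can be hard to read at a glance. The sketch below is an illustration only: it assumes the inverse_sqrt scheduler ramps linearly from --warmup-init-lr to --lr over --warmup-updates, then decays in proportion to the inverse square root of the update number. The helper name lr_at_step is hypothetical and not part of fairseq; check the fairseq source for the exact rule.

```shell
# Hypothetical helper (not part of fairseq): approximate learning rate at a
# given update, ASSUMING linear warmup followed by inverse-sqrt decay.
# Defaults mirror the flags in the training command:
#   --lr 0.0005 --warmup-init-lr 1e-07 --warmup-updates 4000
lr_at_step() {
  # $1 = update number, $2 = peak lr, $3 = warmup init lr, $4 = warmup updates
  awk -v step="$1" -v peak="${2:-0.0005}" -v init="${3:-1e-07}" -v warmup="${4:-4000}" 'BEGIN {
    s = step + 0; w = warmup + 0; p = peak + 0; i = init + 0
    if (s < w) {
      lr = i + s * (p - i) / w      # linear warmup phase
    } else {
      lr = p * sqrt(w / s)          # inverse square root decay phase
    }
    printf "%.8f\n", lr
  }'
}

# Example: learning rate at the end of warmup and after 16000 updates.
lr_at_step 4000
lr_at_step 16000
```

Under these assumptions the rate peaks at 0.0005 when warmup ends (update 4000) and has halved by update 16000. A different peak can be passed as the second argument, for example `lr_at_step 16000 0.001`.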

Note that the --fp16 flag requires CUDA 9.1 or greater and a Volta GPU.
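
Before running the preprocessing step above, it can help to confirm that the extracted archive contains the files the command expects. The sketch below assumes each prefix passed to --trainpref, --validpref and --testpref is completed with a language suffix (for example train.tok.clean.bpe.32000.en). The helper name check_parallel_files is hypothetical and not part of fairseq.

```shell
# Hypothetical helper (not part of fairseq): check that every data prefix
# has a non-empty file for both the source and the target language.
# Assumption: files are named <prefix>.<lang> inside the data directory.
check_parallel_files() {
  # $1 = data directory, $2 = source lang, $3 = target lang, rest = prefixes
  local dir=$1 src=$2 tgt=$3
  shift 3
  local missing=0 prefix lang
  for prefix in "$@"; do
    for lang in "$src" "$tgt"; do
      if [ ! -s "$dir/$prefix.$lang" ]; then
        echo "missing or empty: $dir/$prefix.$lang" >&2
        missing=$((missing + 1))
      fi
    done
  done
  # Succeed only if nothing was missing.
  [ "$missing" -eq 0 ]
}

# Usage with the prefixes from the preprocessing command:
#   TEXT=wmt16_en_de_bpe32k
#   check_parallel_files "$TEXT" en de \
#     train.tok.clean.bpe.32000 newstest2013.tok.bpe.32000 newstest2014.tok.bpe.32000 \
#     && echo "all input files present"
```

The function returns a non-zero status and lists each missing file on stderr, so it can gate the preprocessing command with `&&`.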

If you want to train the above model with big batches (assuming your machine has 8 GPUs):

  • add --update-freq 16 to simulate training on 8*16=128 GPUs
  • increase the learning rate (--lr); 0.001 works well for big batches
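
The arithmetic behind the big-batch setup can be sketched as follows. The numbers come from the training command and the bullets above; the variable names are illustrative, and the token count is an upper bound, assuming --max-tokens caps the batch on each GPU.

```shell
# Values taken from the training command and the big-batch bullets.
NUM_GPUS=8          # GPUs on the machine
UPDATE_FREQ=16      # --update-freq: batches accumulated before each update
MAX_TOKENS=3584     # --max-tokens: assumed to be a per-GPU cap per batch

# Accumulating gradients over 16 batches on 8 GPUs mimics 128 GPUs.
SIMULATED_GPUS=$((NUM_GPUS * UPDATE_FREQ))

# Upper bound on the number of tokens contributing to one parameter update.
MAX_TOKENS_PER_UPDATE=$((SIMULATED_GPUS * MAX_TOKENS))

echo "simulated GPUs: $SIMULATED_GPUS"
echo "max tokens per update: $MAX_TOKENS_PER_UPDATE"
echo "flags to change: --update-freq $UPDATE_FREQ --lr 0.001"
```

With these values the script reports 128 simulated GPUs and at most 458752 tokens per update, which is why a larger learning rate becomes appropriate.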


Citation

@inproceedings{ott2018scaling,
  title = {Scaling Neural Machine Translation},
  author = {Ott, Myle and Edunov, Sergey and Grangier, David and Auli, Michael},
  booktitle = {Proceedings of the Third Conference on Machine Translation (WMT)},
  year = 2018,
}