Scaling Neural Machine Translation (Ott et al., 2018)

This page includes instructions for reproducing results from the paper Scaling Neural Machine Translation (Ott et al., 2018).

Pre-trained models

| Description | Dataset | Model | Test set(s) |
| --- | --- | --- | --- |
| Transformer (Ott et al., 2018) | WMT14 English-French | download (.tar.bz2) | newstest2014 (shared vocab): download (.tar.bz2) |
| Transformer (Ott et al., 2018) | WMT16 English-German | download (.tar.bz2) | newstest2014 (shared vocab): download (.tar.bz2) |
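
To sanity-check a downloaded model, one option is to translate from the command line with fairseq-interactive. The sketch below is a hedged example, not part of the official instructions: the directory name wmt16.en-de.joined-dict.transformer and the file model.pt are assumptions about what the archive unpacks to, and input sentences must already be tokenized and BPE-encoded with the codes shipped alongside the model.

$ # assumed extraction path; adjust to the actual archive contents
$ MODEL_DIR=wmt16.en-de.joined-dict.transformer
$ fairseq-interactive $MODEL_DIR \
  --path $MODEL_DIR/model.pt \
  --source-lang en --target-lang de \
  --beam 5 --remove-bpe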

Training a new model on WMT'16 En-De

Please first download the preprocessed WMT'16 En-De data provided by Google. Then:

  1. Extract the WMT'16 En-De data:
$ TEXT=wmt16_en_de_bpe32k
$ mkdir $TEXT
$ tar -xzvf wmt16_en_de.tar.gz -C $TEXT
  2. Preprocess the dataset with a joined dictionary:
$ fairseq-preprocess --source-lang en --target-lang de \
  --trainpref $TEXT/train.tok.clean.bpe.32000 \
  --validpref $TEXT/newstest2013.tok.bpe.32000 \
  --testpref $TEXT/newstest2014.tok.bpe.32000 \
  --destdir data-bin/wmt16_en_de_bpe32k \
  --nwordssrc 32768 --nwordstgt 32768 \
  --joined-dictionary
  3. Train a model:
$ fairseq-train data-bin/wmt16_en_de_bpe32k \
  --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
  --lr 0.0005 --min-lr 1e-09 \
  --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 3584 \
  --fp16

Note that the --fp16 flag requires that you have CUDA 9.1 or greater and a Volta GPU.
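
After training, translations for the binarized test set can be produced with fairseq-generate. This is a minimal sketch rather than the paper's exact evaluation recipe: it assumes checkpoints were written to the default checkpoints/ directory and uses beam 4 with length penalty 0.6.

$ fairseq-generate data-bin/wmt16_en_de_bpe32k \
  --path checkpoints/checkpoint_best.pt \
  --beam 4 --lenpen 0.6 --remove-bpe

fairseq-generate prints the hypotheses along with a tokenized BLEU score at the end of the run.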

If you want to train the above model with big batches (assuming your machine has 8 GPUs):

  • add --update-freq 16 to simulate training on 8*16=128 GPUs
  • increase the learning rate; 0.001 works well for big batches (see the adjusted command below)
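
Concretely, this amounts to re-running the command from step 3 with --update-freq added and the learning rate raised; all other flags stay the same (a sketch assuming 8 GPUs):

$ fairseq-train data-bin/wmt16_en_de_bpe32k \
  --arch transformer_vaswani_wmt_en_de_big --share-all-embeddings \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
  --lr 0.001 --min-lr 1e-09 \
  --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 3584 \
  --update-freq 16 \
  --fp16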

Citation

@inproceedings{ott2018scaling,
  title = {Scaling Neural Machine Translation},
  author = {Ott, Myle and Edunov, Sergey and Grangier, David and Auli, Michael},
  booktitle = {Proceedings of the Third Conference on Machine Translation (WMT)},
  year = 2018,
}