Attention-based sequence to sequence learning


Dependencies

  • TensorFlow 1.2+ for Python 3
  • YAML and Matplotlib modules for Python 3: sudo apt-get install python3-yaml python3-matplotlib
  • A recent NVIDIA GPU

How to use

Train a model (CONFIG is a YAML configuration file, such as config/default.yaml):

./ CONFIG --train -v 

Translate text using an existing model:


or for interactive decoding:

./ CONFIG --decode

Example English→French model

This is the same model and dataset as Bahdanau et al. 2015.

config/WMT14/    # download WMT14 data into raw_data/WMT14
config/WMT14/     # preprocess the data, and copy the files to data/WMT14
./ config/WMT14/baseline.yaml --train -v   # train a baseline model on this data
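According to the feature list, the pre-processing script handles vocabulary creation, among other steps. The sketch below shows one common way to build a frequency-ranked vocabulary and map sentences to ids. It assumes whitespace tokenization, lowercasing, and the special symbols `<pad>`, `<s>`, `</s>` and `<unk>`. These symbols and the names `build_vocab` and `encode` are hypothetical and do not necessarily match what the toolkit's script does.

```python
from collections import Counter

# Hypothetical reserved symbols; the toolkit's actual special tokens may differ.
SPECIAL_TOKENS = ["<pad>", "<s>", "</s>", "<unk>"]


def build_vocab(lines, max_size=None, lowercase=True):
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter()
    for line in lines:
        if lowercase:
            line = line.lower()
        counts.update(line.split())  # plain whitespace tokenization
    # Sort by decreasing frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_size is not None:
        ranked = ranked[:max_size]  # keep only the max_size most frequent words
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for token, _ in ranked:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(line, vocab, lowercase=True):
    """Convert a sentence to ids, mapping out-of-vocabulary words to <unk>."""
    if lowercase:
        line = line.lower()
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in line.split()]


if __name__ == "__main__":
    vocab = build_vocab(["The cat sat .", "the dog sat ."], max_size=3)
    print(encode("The bird sat .", vocab))  # [6, 3, 5, 4]
```

Capping the vocabulary with `max_size` is what keeps the output softmax tractable on a corpus the size of WMT14; rare words then fall back to `<unk>`, which is one motivation for the subword support listed among the features.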

You should get BLEU scores similar to these (our model was trained on a single Titan X I for about 4 days).

Dev     Test    +beam   Steps   Time
25.04   28.64   29.22   240k    60h
25.25   28.67   29.28   330k    80h
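The scores in the table above are BLEU scores. For reference, here is a minimal sketch of corpus-level BLEU as defined by Papineni et al. (2002): clipped n-gram precisions up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It assumes one reference per sentence and applies no smoothing. This is not the toolkit's own evaluation code, and published numbers also depend on tokenization and the exact scorer, so it will not necessarily reproduce the figures above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis, no smoothing.

    hypotheses and references are lists of token lists, aligned by index.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, one zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    print(corpus_bleu([ref], [ref]))  # 100.0
    print(corpus_bleu(["the cat sat on the".split()], [ref]))  # only the brevity penalty applies
```

Statistics are accumulated over the whole corpus before taking the geometric mean, which is why corpus BLEU is not the average of sentence-level BLEU scores.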

Download this model here. To use it, extract the archive into the seq2seq/models folder and run:

 ./ models/WMT14/config.yaml --decode -v

Example German→English model

This is the same dataset as Ranzato et al. 2015.

./ config/IWSLT14/baseline.yaml --train -v

Dev     Test    +beam   Steps
28.32   25.33   26.74   44k
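The +beam columns in the tables above appear to report test scores obtained with beam-search decoding rather than greedy decoding. The following is a minimal, model-agnostic sketch of beam search. `step_fn` is a hypothetical stand-in for the decoder's output distribution, and the toy table `TOY` is invented to show a case where greedy decoding picks a worse sequence. This variant lets the beam shrink as hypotheses finish and applies no length normalization; the toolkit's decoder may differ on both points.

```python
import heapq
import math


def beam_search(step_fn, bos, eos, beam_size=4, max_len=20):
    """Return (tokens, log_prob) of the best hypothesis found by beam search.

    step_fn(prefix) must return a dict mapping each next token to its log-probability.
    """
    beam = [(0.0, [bos])]  # (cumulative log-probability, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beam:
            for token, logp in step_fn(seq).items():
                candidates.append((score + logp, seq + [token]))
        # Keep the beam_size best expansions; completed ones leave the beam.
        beam = []
        for score, seq in heapq.nlargest(beam_size, candidates, key=lambda c: c[0]):
            (finished if seq[-1] == eos else beam).append((score, seq))
        if not beam:
            break
    if not finished:  # nothing reached eos within max_len: fall back to live hypotheses
        finished = beam
    best_score, best_seq = max(finished, key=lambda c: c[0])
    return best_seq, best_score


# Hypothetical toy model: next-token probabilities keyed by the prefix.
TOY = {
    ("<s>",): {"a": 0.6, "b": 0.4},
    ("<s>", "a"): {"x": 0.5, "y": 0.5},
    ("<s>", "b"): {"</s>": 0.9, "x": 0.1},
}


def toy_step(prefix):
    probs = TOY.get(tuple(prefix), {"</s>": 1.0})  # unknown prefixes end the sentence
    return {tok: math.log(p) for tok, p in probs.items()}


if __name__ == "__main__":
    print(beam_search(toy_step, "<s>", "</s>", beam_size=1)[0])  # ['<s>', 'a', 'x', '</s>']
    print(beam_search(toy_step, "<s>", "</s>", beam_size=2)[0])  # ['<s>', 'b', '</s>']
```

With `beam_size=1` the search is greedy and commits to "a" (probability 0.6), ending with a total probability of 0.3; a beam of 2 keeps "b" alive and finds the sequence with probability 0.36.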

The model is available for download here.

Audio pre-processing

If you want to use the toolkit for Automatic Speech Recognition (ASR) or Automatic Speech Translation (AST), you'll need to pre-process your audio files first. A separate README details how this is done. In short, you'll need to install the Yaafe library and use scripts/speech/ to extract MFCCs from a set of WAV files.
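Yaafe performs the actual feature extraction. To illustrate what MFCCs are, the sketch below implements two of their building blocks in plain Python: placing triangular filters evenly on the mel scale, and taking a DCT of the log filter-bank energies. The mel formula (2595 log10(1 + f/700)), the unnormalized DCT-II, and the helper names are common textbook choices assumed here. They are not necessarily the parameters used by Yaafe or by scripts/speech/.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(num_filters, low_hz, high_hz):
    """Edge frequencies (Hz) of triangular filters spaced evenly in mels.

    Filter i rises from edges[i], peaks at edges[i + 1] and falls to edges[i + 2].
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + k * step) for k in range(num_filters + 2)]


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II; applied to log mel energies it yields cepstral coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


if __name__ == "__main__":
    edges = mel_filter_edges(26, 0.0, 8000.0)  # 26 filters up to 8 kHz (16 kHz audio)
    print([round(e) for e in edges[:5]])  # filters are narrow at low frequencies
    log_energies = [math.log(1.0 + i) for i in range(26)]  # placeholder energies
    print(dct_ii(log_energies, 13))  # 13 coefficients per frame
```

Spacing filters evenly in mels rather than in Hz gives finer frequency resolution at low frequencies, where speech carries most of its phonetic information.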


Features

  • YAML configuration files
  • Beam-search decoder
  • Ensemble decoding
  • Multiple encoders
  • Hierarchical encoder
  • Bidirectional encoder
  • Local attention model
  • Convolutional attention model
  • Detailed logging
  • Periodic BLEU evaluation
  • Periodic checkpoints
  • Multi-task training: train on several tasks at once (e.g. French->English and German->English MT)
  • Subwords training and decoding
  • Input binary features instead of text
  • Pre-processing script: we provide a fully-featured Python script for data pre-processing (vocabulary creation, lowercasing, tokenizing, splitting, etc.)
  • Dynamic RNNs: we use symbolic loops instead of statically unrolled RNNs. This means that we don't need to manually configure bucket sizes, and that model creation is much faster.
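The English→French example reproduces the model of Bahdanau et al. 2015. In that model, each encoder state h_i is scored against the decoder state s with e_i = vᵀ tanh(W s + U h_i). The scores are normalized with a softmax, and the weighted sum of encoder states is fed to the decoder as a context vector. The following is a minimal pure-Python sketch of that additive attention step. The parameter names `W`, `U` and `v` and the list-of-lists representation are assumptions for illustration. The toolkit implements attention in TensorFlow, and its local and convolutional variants compute scores differently.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(decoder_state, encoder_states, W, U, v):
    """Additive attention: e_i = v . tanh(W s + U h_i), weights = softmax(e).

    Returns (weights, context), where context is the weighted sum of encoder states.
    """
    ws = matvec(W, decoder_state)  # computed once, shared by all encoder positions
    scores = []
    for h in encoder_states:
        hidden = [math.tanh(a + b) for a, b in zip(ws, matvec(U, h))]
        scores.append(sum(vi * x for vi, x in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, encoder_states)) for j in range(dim)]
    return weights, context


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    weights, context = additive_attention(
        [0.0, 0.0], [[1.0, 0.0], [0.0, 0.0]], identity, identity, [1.0, 1.0]
    )
    print(weights, context)  # more weight on the first encoder state
```

Because the weights are recomputed at every decoder step, the context vector can focus on different source words for each target word instead of compressing the whole sentence into one fixed vector.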