
NMT State of the art


Key papers

Multi-task

Multi-source

Rare words problem

Character-level translation

Representations of words are computed from their character sequences.

Word representations are taken from an embedding matrix as usual. However, instead of mapping out-of-vocabulary words to UNK, an RNN over the word's characters is used to compute a representation. An RNN is also used on the decoder side to produce unknown words.
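A minimal sketch of the encoder-side idea, assuming PyTorch; class and parameter names are illustrative, not from the paper. In-vocabulary words use the ordinary embedding matrix, and out-of-vocabulary words fall back to the final state of a character GRU.

```python
import torch
import torch.nn as nn

class HybridWordEncoder(nn.Module):
    """Word-embedding lookup with a character-RNN fallback for OOV words (sketch)."""

    def __init__(self, word_vocab, char_vocab_size, dim=256):
        super().__init__()
        self.word_vocab = word_vocab                        # dict: word -> index
        self.word_emb = nn.Embedding(len(word_vocab), dim)
        self.char_emb = nn.Embedding(char_vocab_size, 64)
        self.char_rnn = nn.GRU(64, dim, batch_first=True)

    def forward(self, word, char_ids):
        if word in self.word_vocab:
            # In-vocabulary: ordinary embedding-matrix lookup.
            idx = torch.tensor([self.word_vocab[word]])
            return self.word_emb(idx)                       # (1, dim)
        # Out-of-vocabulary: run a GRU over the word's characters and use
        # the final hidden state as the word representation.
        chars = self.char_emb(char_ids.unsqueeze(0))        # (1, n_chars, 64)
        _, h_n = self.char_rnn(chars)                       # (1, 1, dim)
        return h_n.squeeze(0)                               # (1, dim)
```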

Translation is done entirely at the character level (no tokenization into words).
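For illustration, a small sketch of what fully character-level preprocessing means: the sentence is mapped directly to a sequence of character indices, spaces included, with no word segmentation. The symbol names and vocabulary here are assumptions.

```python
def to_char_ids(sentence, char2id, bos="<s>", eos="</s>", unk="<unk>"):
    """Map a raw sentence to a character-index sequence (sketch, no word tokenization)."""
    ids = [char2id[bos]]
    ids += [char2id.get(c, char2id[unk]) for c in sentence]  # every character, spaces included
    ids.append(char2id[eos])
    return ids

char2id = {"<s>": 0, "</s>": 1, "<unk>": 2,
           **{c: i + 3 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz .")}}
print(to_char_ids("a cat sat.", char2id))
```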

Transfer-learning

They include monolingual (target-side) data in the training set by pairing each target sentence with either a dummy source symbol or a synthetic source sentence obtained by back-translating the target. This produces good results, but the amount of monolingual data cannot greatly exceed the amount of bilingual data.
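A hedged sketch of how such a mixed training set could be assembled; the `back_translate` function, dummy token, and data format are assumptions for illustration, not the paper's exact setup.

```python
DUMMY_SOURCE = "<null>"

def augment(bitext, mono_target, back_translate=None):
    """bitext: list of (src, tgt) pairs; mono_target: list of target-side sentences."""
    synthetic = []
    for tgt in mono_target:
        if back_translate is not None:
            src = back_translate(tgt)   # synthetic source from a target-to-source model
        else:
            src = DUMMY_SOURCE          # single dummy token as the source sentence
        synthetic.append((src, tgt))
    # The monolingual portion should stay comparable in size to the bitext.
    return bitext + synthetic
```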

Includes a language model in the decoding pipeline.
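One common way to do this is a log-linear combination of translation-model and language-model scores at each decoding step ("shallow fusion"); whether the paper uses exactly this scheme is not stated here, so the sketch below is only illustrative, with assumed function names.

```python
import math

def fused_scores(log_p_tm, log_p_lm, lm_weight=0.2):
    """Combine NMT and LM log-probabilities over candidate target words (sketch)."""
    return {w: log_p_tm[w] + lm_weight * log_p_lm.get(w, -math.inf)
            for w in log_p_tm}

# Usage: rank beam-search candidates by the fused score instead of the
# translation model's score alone.
best = max(fused_scores({"cat": -0.5, "dog": -1.2},
                        {"cat": -2.0, "dog": -0.3}).items(),
           key=lambda kv: kv[1])
```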

Pre-training using an auto-encoder.

An encoder-decoder model is pre-trained on another, higher-resource source language. During the final training on the low-resource pair, the output (English) embeddings are frozen and a high dropout rate is used to avoid overfitting.
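A minimal PyTorch sketch of that fine-tuning setup, assuming a model with a `decoder.output_embedding` module (an illustrative name, not from the paper): the target-side output embeddings are frozen and dropout is raised before training on the small bitext.

```python
import torch.nn as nn

def prepare_for_transfer(model, dropout=0.5):
    """Freeze output embeddings and raise dropout before fine-tuning (sketch)."""
    # Freeze the target-side (English) output embedding weights.
    for p in model.decoder.output_embedding.parameters():
        p.requires_grad = False
    # Raise dropout everywhere to limit overfitting on the small bitext.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = dropout
    # The optimizer should only receive the parameters that remain trainable.
    return [p for p in model.parameters() if p.requires_grad]
```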

Misc

To Read

  • Pointing the Unknown Words
  • Tree-to-Sequence Attentional Neural Machine Translation
  • Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
  • Improved Neural Machine Translation with SMT Features