Non-autoregressive neural machine translation with monolingual data
Paper link:
Improving Non-autoregressive Neural Machine Translation with Monolingual Data (ACL 2020)
Jiawei Zhou, Phillip Keung
From the github repo.
Download the datasets and extract them in the current directory. All corpora are tokenized, and BLEU is evaluated on the tokenized corpora.
See the scripts and notes in `data_procs`, which cover the general pipeline, consistent with the NAR literature, for:
- data downloading
- data processing
BLEU scores
Model | WMT16 En -> Ro | WMT16 Ro -> En | WMT14 En -> De | WMT14 De -> En |
---|---|---|---|---|
Our NAR baseline | 31.21 | 32.06 | 23.57 | 29.01 |
+ monolingual data | 31.96 | 33.57 | 25.73 | 30.18 |
+ longer training till convergence | 32.30 | 33.56 | 26.54 | 30.80 |
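
BLEU here is computed on tokenized text, so the score depends on comparing whitespace-separated tokens rather than raw strings. The snippet below is a minimal sketch of corpus-level BLEU under that assumption. It is not the evaluation script used in this repo. The function names `ngrams` and `corpus_bleu` are hypothetical, and it assumes a single reference per sentence, n-grams up to order 4, and no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) for pre-tokenized, whitespace-separated sentences.

    Assumes one reference per hypothesis and applies no smoothing.
    """
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each n-gram count to the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(f"BLEU = {corpus_bleu(hyps, refs):.2f}")
```

Scores from a sketch like this are only comparable to the table above if the same tokenization is applied to both hypotheses and references.
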
@article{zhou2020improving,
title={Improving Non-autoregressive Neural Machine Translation with Monolingual Data},
author={Zhou, Jiawei and Keung, Phillip},
journal={arXiv preprint arXiv:2005.00932},
year={2020}
}