## Updates

- [July 2020] Updated the repo with scripts and notes on experimental settings.
This repo implements the following papers and their associated features, based on OpenNMT-tf 1.15:

- Transfer Learning in Multilingual Neural Machine Translation
- Adapting Multilingual Neural Machine Translation to Unseen Languages
## Setup

See/run:

```bash
./setup-env.sh
```
## Data

Experiments use the TED Talks corpus from Qi et al., chosen for its low-resource nature: more than 50 languages paired with English, with ~5k to ~200k parallel examples per pair.

```bash
./scripts/get-data.sh
```
Prepare data for the source/target pair(s) (if `flag` is specified, the target language ID is appended on the source side):

```bash
./scripts/build-training-data.sh ['src1-en en-src1 src2-en en-src2'] [flag] [exp-id]
```
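For instance, a minimal hypothetical invocation that builds data for a single Pt-En parent pair, with `pt-en-parent` as a placeholder experiment ID:

```bash
# Build Pt-En training data under a placeholder experiment ID
# (no flag: no target language ID is appended on the source side).
./scripts/build-training-data.sh 'pt-en' pt-en-parent
```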
Preprocess (clean, detokenize, and apply subword segmentation with SentencePiece):

```bash
./scripts/preprocess.sh [exp-id] [subword-size]
```
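For example, continuing with the same placeholder experiment ID and assuming a 16k subword vocabulary (the size is illustrative; use whatever fits your setup):

```bash
# Clean, detokenize, and segment the "pt-en-parent" data with a 16k SentencePiece vocabulary.
./scripts/preprocess.sh pt-en-parent 16000
```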
## Parent Model Training

Train a parent model on a relatively high-resource pair (e.g. Portuguese-English / Pt-En):

```bash
./train.sh [exp-id] [gpu-device]
```
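For example, with the placeholder experiment ID from above and GPU 0:

```bash
# Train the Pt-En parent model of the "pt-en-parent" experiment on GPU device 0.
./train.sh pt-en-parent 0
```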
## ProgAdapt

Steps for progressively adapting (ProgAdapt) the parent model Pt-En to the child low-resource pair Galician-English (Gl-En).
### Data

```bash
./scripts/build-training-data.sh 'gl-en' [child-model_exp-id]
```
### Data Preprocessing

```bash
./scripts/preprocess.sh [child-model_exp-id] [subword-size]
```
### ProgAdapt Training

Training first customizes the parent model to take into account the newly generated vocabulary of the child model (Gl-En):

```bash
./train-dynamic-tl.sh [parent-model_exp-id] [child-model_exp-id] [gpu-device]
```
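Putting the three ProgAdapt steps together, a hypothetical end-to-end run could look as follows (`pt-en-parent`, `gl-en-child`, the 16k subword size, and the GPU index are all placeholders):

```bash
# 1. Build the Gl-En child training data under a placeholder experiment ID.
./scripts/build-training-data.sh 'gl-en' gl-en-child

# 2. Preprocess the child data and build its subword vocabulary.
./scripts/preprocess.sh gl-en-child 16000

# 3. Adapt the pre-trained Pt-En parent to the Gl-En child pair.
./train-dynamic-tl.sh pt-en-parent gl-en-child 0
```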
## ProgGrow

ProgGrow differs from ProgAdapt by keeping the Pt-En parent translation direction while learning the new low-resource Gl-En (child model) direction.
### Data

```bash
./scripts/build-training-data.sh 'pt-en gl-en' flag [child-model_exp-id]
```
### Data Preprocessing

```bash
./scripts/preprocess.sh [child-model_exp-id] [subword-size]
```
### ProgGrow Training

```bash
./train-dynamic-tl.sh [parent-model_exp-id] [child-model_exp-id] [gpu-device]
```
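Analogously, a hypothetical end-to-end ProgGrow run with the same placeholder names (here the child data also contains the parent's Pt-En direction, so the literal `flag` argument is passed to append the target language ID on the source side):

```bash
# 1. Build child data covering both the parent (Pt-En) and the new (Gl-En) directions.
./scripts/build-training-data.sh 'pt-en gl-en' flag gl-en-grow

# 2. Preprocess the combined data and build its subword vocabulary.
./scripts/preprocess.sh gl-en-grow 16000

# 3. Grow the Pt-En parent model to additionally cover the Gl-En direction.
./train-dynamic-tl.sh pt-en-parent gl-en-grow 0
```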
## Transfer-Learning Options

At transfer-learning time you can optionally:

- Load specific components of the parent model. See `load_weights` in `config_adapt.yml` for the available options: `['encoder', 'decoder', 'shared_embeddings', 'src_embs', 'tgt_embs', 'optim', 'projection']`.
- Freeze sub-networks (i.e. selectively optimize only the encoder or the decoder). See `freeze` in `config_adapt.yml` for the options. A sketch of both config entries is shown after this list.
- In addition to customizing only the `encoder` and/or `decoder`, you can pre-train a parent model with an encoder-decoder shared vocabulary and customize it for the child model. See the `--shared_vocab` and `--new_shared_vocab` options in `./train-dynamic-tl.sh`.
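As an illustration, the relevant entries in `config_adapt.yml` might look roughly like the sketch below; the key names come from the list above, but the exact layout and value format are assumptions, so check the file shipped with the repo:

```yaml
# Sketch only (assumed layout, not the repo's authoritative format):
# reuse the parent encoder/decoder and embeddings, and keep the encoder
# fixed while the rest is optimized for the child pair.
load_weights: ['encoder', 'decoder', 'src_embs', 'tgt_embs']
freeze: ['encoder']
```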
Note: to replicate the experiments reported in our work, please refer to the experimental section of each paper for further details.
## Citation

```bibtex
@article{lakew2018transfer,
  title={Transfer learning in multilingual neural machine translation with dynamic vocabulary},
  author={Lakew, Surafel M and Erofeeva, Aliia and Negri, Matteo and Federico, Marcello and Turchi, Marco},
  journal={arXiv preprint arXiv:1811.01137},
  year={2018}
}

@article{lakew2019adapting,
  title={Adapting Multilingual Neural Machine Translation to Unseen Languages},
  author={Lakew, Surafel M and Karakanta, Alina and Federico, Marcello and Negri, Matteo and Turchi, Marco},
  journal={arXiv preprint arXiv:1910.13998},
  year={2019}
}
```