RMLNMT

Code for the paper Improving Both Domain Robustness and Domain Adaptability in Machine Translation (COLING 2022). The code is based on the public fairseq toolkit and provides implementations of the different domain classifiers and of word-level domain mixing.


Requirements

  1. Fairseq (v0.6.0)
  2. PyTorch
  3. All requirements are listed in requirements.txt; install them with pip install -r requirements.txt

Pipeline

To reproduce the results of our experiments, please clean your OPUS corpus first; in particular, de-duplicate it (see the Appendix of the paper for more details).
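
A de-duplication pass can be as simple as the following sketch (the file names are placeholders, and this helper is not part of the repo); it drops repeated sentence pairs while keeping the two sides of the parallel corpus aligned:

    # Hypothetical dedup helper (not part of this repo): remove duplicate
    # sentence pairs while keeping source and target files aligned.
    seen = set()
    with open("train.en") as src, open("train.de") as tgt, \
         open("train.dedup.en", "w") as src_out, \
         open("train.dedup.de", "w") as tgt_out:
        for s, t in zip(src, tgt):
            pair = (s.strip(), t.strip())
            if pair not in seen:
                seen.add(pair)
                src_out.write(s)
                tgt_out.write(t)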

  1. Train a domain classifier (e.g., BERT- or CNN-based) with domain_classification/Bert_classfier.py or domain_classification/main.py; a rough sketch of the BERT variant is shown below.

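The sketch below illustrates roughly what a BERT-based domain classifier looks like with HuggingFace Transformers; the model name, training data, and single optimization step are placeholders (the label count of 11 matches step 2), not the exact code in Bert_classfier.py:

    # Hedged sketch of BERT-based domain classification (HuggingFace
    # Transformers); see domain_classification/Bert_classfier.py for
    # the actual training code.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=11)  # 11 domain labels, as in step 2
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    sentences = ["the patient was administered 5 mg daily"]  # placeholder data
    labels = torch.tensor([3])                               # placeholder domain id

    model.train()
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
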
  2. Score each sentence to represent its domain similarity to the general domain:

    python meta_score_prepare.py \
    --num_labels 11 \
    --device_id 7 \
    --model_name bert-base-uncased \
    --input_path $YOUR_INPUT_PATH \
    --cls_data $YOUR_CLASSIFICATION_PATH \
    --out_data $YOUR_OUTPUT_PATH \
    --script_path $SCRIPT_PATH
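
Under the hood, a natural way to obtain such a score is to read it off the fine-tuned classifier's output distribution; the sketch below assumes exactly that and is not the exact logic of meta_score_prepare.py (the checkpoint path is a placeholder):

    # Assumed scoring logic (see meta_score_prepare.py for the real one):
    # use the fine-tuned classifier's softmax probabilities as a
    # domain-similarity score.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "path/to/finetuned-classifier")  # placeholder checkpoint from step 1
    model.eval()

    with torch.no_grad():
        batch = tokenizer("a sentence to score", return_tensors="pt")
        probs = torch.softmax(model(**batch).logits, dim=-1)
    score = probs.max().item()  # confidence of the most likely domain
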
  3. Run the baseline systems: vanilla fairseq, Meta-MT (Sharaf et al., 2020), and Meta-curriculum (Zhan et al., 2021).

  4. The code for word-level domain mixing is in word_moudles; please use the following command to reproduce the results in our paper:

    python -u $code_dir/meta_ws_adapt_training.py $DATA_DIR \
        --train-subset meta-train-spm $META_DEV \
        --damethod bayesian \
        --arch transformer_da_bayes_iwslt_de_en \
        --criterion $CRITERION $BASELINE \
        --domains $DOMAINS --max-tokens 1 \
        --user-dir $user_dir \
        --domain-nums 5 \
        --translation-task en2de \
        --source-lang en --target-lang de \
        --is-curriculum --split-by-cl --distributed-world-size $GPUS \
        --required-batch-size-multiple 1 \
        --tensorboard-logdir $TF_BOARD \
        --optimizer $OPTIMIZER --lr $META_LR $DO_SAVE \
        --save-dir $PT_OUTPUT_DIR --save-interval-updates $SAVEINTERVALUPDATES \
        --max-epoch 20 \
        --skip-invalid-size-inputs-valid-test \
        --flush-secs 1 --train-percentage 0.99 --restore-file $PRE_TRAIN --log-format json \
        --- --task word_adapt_new --is-curriculum \
        --train-subset support --test-subset query --valid-subset dev_sub \
        --max-tokens 2000 --skip-invalid-size-inputs-valid-test \
        --update-freq 10000 \
        --domain-nums 5 \
        --translation-task en2de \
        --distributed-world-size 1 --max-epoch 1 --optimizer adam \
        --damethod bayesian --criterion cross_entropy_da \
        --lr 5e-05 --lr-scheduler inverse_sqrt --no-save \
        --support-tokens 8000 --query-tokens 16000 \
        --source-lang en --label-smoothing 0.1 \
        --adam-betas '(0.9, 0.98)' --warmup-updates 4000 \
        --warmup-init-lr '1e-07' --weight-decay 0.0001 \
        --target-lang de \
        --user-dir $user_dir
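
Here $DATA_DIR, $DOMAINS, $GPUS, and the other upper-case variables are placeholders for your own paths and settings. The flags before the --- separator configure the outer meta-training run, while those after it are passed to the inner word-level domain-mixing task (--task word_adapt_new); see meta_ws_adapt_training.py for how the two argument groups are consumed.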

If you find our work useful, please kindly cite our paper. Thanks!

@inproceedings{lai-etal-2022-improving-domain,
    title = "Improving Both Domain Robustness and Domain Adaptability in Machine Translation",
    author = "Lai, Wen  and
      Libovick{\'y}, Jind{\v{r}}ich  and
      Fraser, Alexander",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.461",
    pages = "5191--5204",
}

Contact

If you have any questions about our paper, please feel free to contact me by email: lavine@cis.lmu.de
