This code creates a new training-test split of your data, given a dictionary, in order to evaluate the ability of the model to translate rare words from that dictionary.
Push-button scripts are available for the following corpora:
- TED English-German
- Europarl English-German
- Europarl English-Czech
The code can easily be adapted for other corpora.
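
To illustrate the general idea of a dictionary-based split, here is a minimal sketch in Python. It is not the implementation in this repository: the function name `split_by_dictionary`, the `max_freq` threshold, the whitespace tokenization, and the one-translation-per-word dictionary are all assumptions made for the example. It holds out every sentence pair that contains a dictionary entry (source word and its translation) seen in at most `max_freq` sentence pairs.

```python
from collections import Counter


def split_by_dictionary(pairs, dictionary, max_freq=2):
    """Hypothetical helper: hold out sentence pairs with rare dictionary words.

    pairs      -- list of (source_sentence, target_sentence) tuples
    dictionary -- dict mapping a source word to one target word (assumption)
    max_freq   -- entries matched in at most this many pairs count as rare
    """
    counts = Counter()
    matches = []
    for src, tgt in pairs:
        # Naive lowercased whitespace tokenization (assumption for the sketch).
        src_tokens = set(src.lower().split())
        tgt_tokens = set(tgt.lower().split())
        # An entry matches only if both the word and its translation occur.
        found = {w for w in src_tokens
                 if w in dictionary and dictionary[w] in tgt_tokens}
        matches.append(found)
        counts.update(found)

    # Dictionary entries that occur in only a few sentence pairs.
    rare = {w for w, c in counts.items() if c <= max_freq}

    train, test = [], []
    for pair, found in zip(pairs, matches):
        # Any pair containing a rare entry goes to the test set.
        (test if found & rare else train).append(pair)
    return train, test, rare


pairs = [
    ("the house is big", "das haus ist groß"),
    ("the house is old", "das haus ist alt"),
    ("the house is red", "das haus ist rot"),
    ("i saw a squirrel", "ich sah ein eichhörnchen"),
]
dictionary = {"house": "haus", "squirrel": "eichhörnchen"}
train, test, rare = split_by_dictionary(pairs, dictionary, max_freq=2)
print(rare)  # {'squirrel'}
print(test)  # [('i saw a squirrel', 'ich sah ein eichhörnchen')]
```

In this toy example "house" occurs in three sentence pairs and stays in the training data, while "squirrel" occurs only once, so its sentence pair is moved to the test set. The model then never sees the word during training, and its translation can only come from the dictionary.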
The approach was presented in the paper:
Niehues, J. (2021). Continuous Learning in Neural Machine Translation using Bilingual Dictionaries. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021). Kyiv, Ukraine.