
nikodallanoce/NeuralMachineTranslation


HLT

Human Language Technologies project for the academic year 2020/2021.

Neural Machine Translation task

English-Italian

Datasets: the Europarl corpus or the English-Italian sentence pairs from http://www.manythings.org/anki/
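A minimal sketch of how the manythings.org pairs could be read into (English, Italian) tuples. It assumes the usual tab-separated layout of those files (English sentence, Italian sentence, and an optional attribution column); the function name `load_anki_pairs` is hypothetical and not taken from this repository.

```python
from typing import Iterable, List, Tuple


def load_anki_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated English/Italian lines into (source, target) pairs.

    Assumption: column 0 is English, column 1 is Italian, and any further
    columns (e.g. attribution) are ignored.
    """
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        english, italian = fields[0].strip(), fields[1].strip()
        if english and italian:
            pairs.append((english, italian))
    return pairs
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, e.g. `with open(path, encoding="utf-8") as f: pairs = load_anki_pairs(f)`, where `path` points to the extracted text file.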

Sources used

Benchmark

The baselines against which we compared the results of our models were chosen from:

Tokenizers and models

We used https://huggingface.co/dbmdz/bert-base-italian-cased as the Italian (target-language) tokenizer for all of our models. For the source language, we used the tokenizer that matches each encoder.
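The two-tokenizer setup can be illustrated with a small sketch that builds one encoder-decoder training batch: the English side goes through the encoder's own tokenizer and the Italian side through the shared Italian tokenizer. The tokenizers are passed in as callables following the Hugging Face tokenizer call convention (`padding`, `truncation`, `max_length`). The name `build_batch` and the choice to mask padded label positions with `-100` are assumptions for illustration, not necessarily what this project does.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Any callable behaving like a Hugging Face tokenizer's __call__:
# tok(texts, padding=..., truncation=..., max_length=...)
#   -> {"input_ids": [[...]], "attention_mask": [[...]]}
Tokenizer = Callable[..., Dict[str, List[List[int]]]]

# Assumption: padded label positions use -100, the default ignore_index
# of PyTorch's CrossEntropyLoss, so they do not contribute to the loss.
IGNORE_INDEX = -100


def build_batch(
    src_tokenizer: Tokenizer,
    tgt_tokenizer: Tokenizer,
    pairs: Sequence[Tuple[str, str]],
    max_length: int = 64,
) -> Dict[str, List[List[int]]]:
    """Tokenize (English, Italian) pairs with two different tokenizers."""
    src_texts = [src for src, _ in pairs]
    tgt_texts = [tgt for _, tgt in pairs]

    # Source side: the tokenizer that belongs to the chosen encoder.
    enc = src_tokenizer(src_texts, padding="max_length", truncation=True, max_length=max_length)
    # Target side: the Italian tokenizer shared by every model.
    dec = tgt_tokenizer(tgt_texts, padding="max_length", truncation=True, max_length=max_length)

    # Replace padded target positions (attention mask 0) with IGNORE_INDEX.
    labels = [
        [tok if mask == 1 else IGNORE_INDEX for tok, mask in zip(ids, att)]
        for ids, att in zip(dec["input_ids"], dec["attention_mask"])
    ]
    return {
        "input_ids": enc["input_ids"],
        "attention_mask": enc["attention_mask"],
        "labels": labels,
    }
```

With the Hugging Face `transformers` library, the target tokenizer could be obtained with `AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-cased")` and passed as `tgt_tokenizer`, while the source tokenizer would be loaded the same way from the checkpoint of whichever encoder is being used.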

Masked language encoders:

Neural machine translation encoders: