<a href="https://colab.research.google.com/github/fpgmina/DeepNLP/blob/main/Practice_5_Part_1_Machine_Translation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Deep Natural Language Processing @ PoliTO**


---
**Teaching Assistant:** Ali Yassine

**Credits:** Moreno La Quatra

**Practice 5:** Machine Translation - Part 1

## **Machine Translation**

Machine Translation is a sub-field of Natural Language Processing that aims to translate text from a source language into a target language. In this practice, we will experiment with a Transformer-based model for Machine Translation. Specifically, we will benchmark the performance of a pre-trained MT model on Italian-English and English-Italian translation tasks.

![](https://www.deepl.com/img/press/desktop_ENIT_2020-01.png)

In this practice we will use data provided by [tatoeba](https://tatoeba.org/). The following cell downloads a subset of this collection containing parallel Italian-English sentences.


In [None]:
%%capture
!wget https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P6/train_it_en.tsv
!wget https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P6/test_it_en.tsv

### **Question 1: Parsing data**

The first step is to parse the data collection to generate a list of sentence pairs. The data are provided in `tsv` format, where each line contains a sentence pair in the following format:

`<source_language_sentence>\t<target_language_sentence>\n`

You are provided with a training and a test set. For this question you should parse both data splits and store them in your preferred data structure.

**Note:** store the train and test sets in separate data objects.
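A minimal parsing sketch, assuming each line holds one Italian sentence followed by its English translation (the `it_en` in the file names suggests this order, but printing a few pairs is the safest way to confirm it). The function name `load_parallel_tsv` is hypothetical; lines with fewer than two non-empty fields are skipped rather than raising an error.

```python
from typing import List, Tuple


def load_parallel_tsv(path: str) -> List[Tuple[str, str]]:
    """Read a tab-separated file into a list of (source, target) sentence pairs.

    Assumption: column 0 is Italian and column 1 is English; any extra columns are ignored.
    """
    pairs: List[Tuple[str, str]] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            # Skip blank or malformed lines instead of failing on them.
            if len(fields) < 2 or not fields[0].strip() or not fields[1].strip():
                continue
            # strip() also removes a stray "\r" left by Windows line endings.
            pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs
```

Calling it once per split, e.g. `train_pairs = load_parallel_tsv("train_it_en.tsv")` and `test_pairs = load_parallel_tsv("test_it_en.tsv")`, keeps the two splits in separate lists as the note requires.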

In [None]:
# your code here

### **Question 2: Pre-trained MT models**

Pre-trained MT models are publicly released so that researchers can experiment with them. In this question you will load a pre-trained MT model and use it to translate sentences from Italian to English and vice versa.

[EasyNMT](https://github.com/UKPLab/EasyNMT) is a Python library that provides an easy-to-use interface to pre-trained MT models; it is a thin wrapper around the Hugging Face `transformers` library. In this question you will use EasyNMT to:

- Load a pre-trained model for a specific translation direction (e.g., Italian→English or English→Italian).
- Translate all the sentences in the test set from the source language to the target language.


**Note 1**: the choice for the MT model is up to you.

**Note 2**: store the translated sentences in both directions using the data structure of your choice.
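One way to structure the translation step, sketched under a few assumptions: the test pairs are `(italian, english)` tuples, and the chosen model is `"opus-mt"`, which EasyNMT maps to the Helsinki-NLP Opus-MT model for each language pair. The helper names `make_easynmt_translator` and `translate_both_directions` are hypothetical. The translation function is passed in as a parameter, so the direction logic can be checked with a stand-in before any model weights are downloaded.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Anything with the same calling convention as EasyNMT's `model.translate`.
Translator = Callable[..., List[str]]


def make_easynmt_translator(model_name: str = "opus-mt") -> Translator:
    """Load an EasyNMT model and return its bound `translate` method."""
    from easynmt import EasyNMT  # lazy import: weights are only downloaded when this is called

    model = EasyNMT(model_name)
    return model.translate


def translate_both_directions(
    pairs: Sequence[Tuple[str, str]],
    translate: Translator,
    batch_size: int = 16,
) -> Dict[str, List[str]]:
    """Translate IT→EN and EN→IT; assumes each pair is (italian, english)."""
    italian = [it for it, _ in pairs]
    english = [en for _, en in pairs]
    return {
        # Hypotheses for IT→EN, to be scored against the English side.
        "it-en": translate(italian, source_lang="it", target_lang="en", batch_size=batch_size),
        # Hypotheses for EN→IT, to be scored against the Italian side.
        "en-it": translate(english, source_lang="en", target_lang="it", batch_size=batch_size),
    }
```

With real weights this would be `translations = translate_both_directions(test_pairs, make_easynmt_translator("opus-mt"))`, where `test_pairs` stands for the parsed test split from Question 1. Translating the full test set on a CPU can be slow; a GPU runtime in Colab helps.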

In [None]:
%%capture
!pip install easynmt sacremoses

In [None]:
# your code here

### **Question 3: BLEU and METEOR scores**

In this question you will evaluate the performance of your MT model using **two** evaluation metrics: **[BLEU](https://github.com/mjpost/sacrebleu)** and **[METEOR](https://huggingface.co/spaces/evaluate-metric/meteor)**. You **must** compute and report scores for both translation directions: `EN→IT` and `IT→EN`.

---

#### BLEU (Bilingual Evaluation Understudy)

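**BLEU** measures how much the model’s translation overlaps with a reference translation by counting shared **n-grams** (contiguous word sequences, typically of length 1 to 4). Each n-gram's count is clipped to the number of times it occurs in the reference, the resulting precisions are combined with a geometric mean, and a brevity penalty is applied so that overly short translations cannot score well simply by omitting words. The result is a precision-oriented score that rewards exact word matches.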

- **Pros:** Fast, standardized, and good for large-scale comparisons.
- **Cons:** Only captures exact matches, ignoring synonyms or paraphrases; may not always align with human judgment.

> Use BLEU as implemented in `sacrebleu`.
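To make the n-gram mechanics concrete, here is a small from-scratch sketch of corpus-level BLEU with one reference per sentence, plain whitespace tokenization, and no smoothing. It is for understanding only: `sacrebleu` also applies its own standard tokenization and smoothing, so its numbers will generally differ. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu_sketch(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU on a 0-100 scale (one reference per hypothesis)."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # n-grams produced by the hypotheses per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, a single zero precision zeroes the geometric mean
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is at least as long as the references, else exp(1 - r/c).
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

A hypothesis that is a correct prefix of its reference, such as `"the cat sat on the"` against `"the cat sat on the mat"`, has perfect precision at every order, yet it scores only about 81.9 because the brevity penalty is exp(1 - 6/5). For the graded scores, the `sacrebleu` call is `sacrebleu.corpus_bleu(hypotheses, [references]).score`; the references are wrapped in a list because `sacrebleu` accepts several reference streams.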


---

#### METEOR (Metric for Evaluation of Translation with Explicit ORdering)

**METEOR** was developed to better reflect human judgment by allowing more flexible word matching. It aligns hypothesis and reference words using:

- **Exact matches**
- **Stem matches** (e.g., *run* ↔ *running*)
- **Synonyms and paraphrases**

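It then computes unigram precision and recall over the aligned words, combines them into a harmonic mean that weights recall more heavily than precision, and applies a fragmentation penalty that grows as the matched words split into more non-contiguous chunks, i.e., when the output is disordered or fragmented.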

- **Pros:** More linguistically aware; correlates better with human evaluations than BLEU.
- **Cons:** Slower to compute; depends on external lexical resources.

> Compute METEOR using `evaluate` or `nltk`.
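The scoring formula can be illustrated with a simplified, exact-match-only sketch: it skips the stem and synonym modules and uses a greedy left-to-right alignment, so it will not reproduce the scores of `nltk` or `evaluate`. The parameters `alpha=0.9`, `beta=3`, `gamma=0.5` match NLTK's `meteor_score` defaults; the function names are hypothetical.

```python
from typing import List, Tuple


def exact_alignment(hyp: List[str], ref: List[str]) -> List[Tuple[int, int]]:
    """Greedy exact matching: each hypothesis word takes the first unused identical reference word."""
    used = set()
    alignment = []
    for i, word in enumerate(hyp):
        for j, ref_word in enumerate(ref):
            if j not in used and ref_word == word:
                used.add(j)
                alignment.append((i, j))
                break
    return alignment  # sorted by hypothesis position


def count_chunks(alignment: List[Tuple[int, int]]) -> int:
    """A chunk is a maximal run of matches that are adjacent in both hypothesis and reference."""
    if not alignment:
        return 0
    chunks = 1
    for (h1, r1), (h2, r2) in zip(alignment, alignment[1:]):
        if h2 != h1 + 1 or r2 != r1 + 1:
            chunks += 1
    return chunks


def meteor_exact_sketch(hypothesis: str, reference: str,
                        alpha: float = 0.9, beta: float = 3.0, gamma: float = 0.5) -> float:
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    alignment = exact_alignment(hyp, ref)
    m = len(alignment)
    if m == 0:
        return 0.0
    precision, recall = m / len(hyp), m / len(ref)
    # Weighted harmonic mean; with alpha=0.9 this equals 10PR / (R + 9P), favouring recall.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: more chunks per matched word means a larger penalty.
    penalty = gamma * (count_chunks(alignment) / m) ** beta
    return f_mean * (1 - penalty)
```

Even an identical six-word sentence scores slightly below 1 (1 - 0.5 · (1/6)³), while swapping two words (`"b a"` against `"a b"`) keeps perfect precision and recall but halves the score, because the matches now form two chunks. With the `evaluate` library, the corpus score is `evaluate.load("meteor").compute(predictions=hypotheses, references=references)["meteor"]`.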

---

The following cell installs the `sacrebleu`, `nltk`, and `evaluate` libraries that can be used to compute these metrics.

In [None]:
%%capture
!pip install sacrebleu evaluate nltk
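A sketch for scoring both directions in one place, assuming hypotheses and references are stored in dictionaries keyed by direction (`"it-en"`, `"en-it"`). The helper names are hypothetical. The two library-backed scorers import `sacrebleu` and `evaluate` lazily, and `score_all_directions` accepts any mapping from metric names to scoring functions, so it can be tested without the libraries.

```python
from typing import Callable, Dict, List

Scorer = Callable[[List[str], List[str]], float]


def sacrebleu_scorer(hypotheses: List[str], references: List[str]) -> float:
    """Corpus BLEU from sacrebleu (0-100 scale), with a single reference stream."""
    import sacrebleu

    return sacrebleu.corpus_bleu(hypotheses, [references]).score


def evaluate_meteor_scorer(hypotheses: List[str], references: List[str]) -> float:
    """METEOR from the `evaluate` library (0-1 scale); may download resources on first use."""
    import evaluate

    meteor = evaluate.load("meteor")
    return meteor.compute(predictions=hypotheses, references=references)["meteor"]


def score_all_directions(
    hypotheses: Dict[str, List[str]],
    references: Dict[str, List[str]],
    scorers: Dict[str, Scorer],
) -> Dict[str, Dict[str, float]]:
    """Return {direction: {metric_name: score}} for every direction in `hypotheses`."""
    results: Dict[str, Dict[str, float]] = {}
    for direction, hyps in hypotheses.items():
        refs = references[direction]
        # A length mismatch usually means the splits were misaligned while parsing.
        if len(hyps) != len(refs):
            raise ValueError(f"{direction}: {len(hyps)} hypotheses vs {len(refs)} references")
        results[direction] = {name: scorer(hyps, refs) for name, scorer in scorers.items()}
    return results
```

For IT→EN the references are the English side of the test pairs, and for EN→IT the Italian side. A typical call passes `{"bleu": sacrebleu_scorer, "meteor": evaluate_meteor_scorer}` as `scorers`. Keep in mind that sacrebleu reports BLEU on a 0-100 scale, while the `evaluate` METEOR metric returns a value between 0 and 1.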

In [None]:
# your code here

### **Question 4: Comparison with another Pre-trained MT Model**

In this question, you will experiment with another pre-trained MT model and compare its performance with the model used in Question 2.

Follow the same translation procedure as before (EN→IT and IT→EN) and evaluate the results using the BLEU and METEOR metrics from Question 3.

Use [EasyNMT](https://github.com/UKPLab/EasyNMT) to load and run the pre-trained model.
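A small sketch for presenting the comparison. It assumes the scores for each model have already been computed in the nested form `model → direction → metric → score`, for example `"opus-mt"` versus a multilingual EasyNMT model such as `"m2m_100_418M"`. The function name `compare_models` is hypothetical. It returns a Markdown table and, for each direction/metric column, the model with the highest score (higher is better for both BLEU and METEOR).

```python
from typing import Dict, Tuple

Scores = Dict[str, Dict[str, Dict[str, float]]]  # model -> direction -> metric -> score


def compare_models(scores: Scores) -> Tuple[str, Dict[Tuple[str, str], str]]:
    """Build a Markdown table (one row per model) and pick the best model per column."""
    if not scores:
        raise ValueError("no models to compare")
    models = list(scores)
    # Column order is taken from the first model; all models must report the same columns.
    columns = [(d, m) for d, metrics in scores[models[0]].items() for m in metrics]
    header = "| model | " + " | ".join(f"{d} {m}" for d, m in columns) + " |"
    separator = "|---" * (len(columns) + 1) + "|"
    rows = [
        f"| {model} | " + " | ".join(f"{scores[model][d][m]:.2f}" for d, m in columns) + " |"
        for model in models
    ]
    best = {(d, m): max(models, key=lambda name: scores[name][d][m]) for d, m in columns}
    return "\n".join([header, separator, *rows]), best
```

In Colab, `display(Markdown(table))` from `IPython.display` renders the table. Both models should be scored on the same test split with the same metric settings; otherwise the comparison is not meaningful.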


In [None]:
# your code here