<a href="https://colab.research.google.com/github/fpgmina/DeepNLP/blob/main/L5_Part_1_Machine_Translation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Deep Natural Language Processing @ PoliTO**


---
**Teaching Assistant:** Ali Yassine

**Credits:** Moreno La Quatra

**Practice 5:** Machine Translation - Part 1

## **Machine Translation**

Machine Translation is a sub-field of Natural Language Processing that aims at translating a text from a source language to a target language. In this practice, we will experiment with a Transformer-based model for Machine Translation. Specifically, we will benchmark the performance of a pre-trained MT model on Italian-English and English-Italian translation tasks.

![](https://www.deepl.com/img/press/desktop_ENIT_2020-01.png)

In this practice we will use a data collection provided by [Tatoeba](https://tatoeba.org/). The following cell downloads a subset of the data collection, containing parallel Italian-English sentences.


In [None]:
%%capture
!wget https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P6/train_it_en.tsv
!wget https://raw.githubusercontent.com/MorenoLaQuatra/DeepNLP/main/practices/P6/test_it_en.tsv

### **Question 1: Parsing data**

The first step is to parse the data collection to generate a list of sentence pairs. The data are provided in `tsv` format, where each line contains a sentence pair in the following format:

`<source_language_sentence>\t<target_language_sentence>\n`

You are provided with a training and a test set. For this question you should parse both data splits and store them in your preferred data structure.

**Note:** store the training and test sets in separate data objects.

In [None]:
import os

def parse_sentence_pairs(file_path):
    """
    Parses a tab-separated file to extract sentence pairs.

    Each line is expected to be in the format:
    <source_language_sentence>\t<target_language_sentence>

    Args:
        file_path (str): The path to the .tsv file.

    Returns:
        list: A list of tuples, where each tuple contains
              (source_sentence, target_sentence).
    """
    sentence_pairs = []
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            for line in f:
                # Remove leading/trailing whitespace and split by tab
                parts = line.strip().split('\t')

                # Ensure the line is valid (contains exactly two parts)
                if len(parts) == 2:
                    source_sentence = parts[0]
                    target_sentence = parts[1]
                    sentence_pairs.append((source_sentence, target_sentence))
                else:
                    # Optional: log a warning for malformed lines
                    if line.strip(): # Ignore empty lines silently
                        print(f"Warning: Skipping malformed line in {file_path}: {line.strip()}")

    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
        return [] # Return an empty list on error
    except Exception as e:
        print(f"An error occurred while reading '{file_path}': {e}")
        return [] # Return an empty list on error

    return sentence_pairs

# --- Example Usage ---

# Paths to the files downloaded by the first cell of this notebook.
train_file = 'train_it_en.tsv'
test_file = 'test_it_en.tsv'

# 1. Parse the training data
train_data = parse_sentence_pairs(train_file)

# 2. Parse the test data
test_data = parse_sentence_pairs(test_file)

# --- Display Results ---

PREVIEW = 5  # print only the first few pairs of each split

print(f"--- Training Data (from {train_file}) ---")
print(f"Total pairs: {len(train_data)}")
for i, (source, target) in enumerate(train_data[:PREVIEW]):
    print(f"  Pair {i+1}: ('{source}', '{target}')")

print("\n" + "="*30 + "\n")

print(f"--- Test Data (from {test_file}) ---")
print(f"Total pairs: {len(test_data)}")
for i, (source, target) in enumerate(test_data[:PREVIEW]):
    print(f"  Pair {i+1}: ('{source}', '{target}')")

### **Question 2: Pre-trained MT models**

Pre-trained MT models are released to the public to allow researchers to experiment with them. In this question you will load a pre-trained MT model and use it to translate sentences from Italian to English and vice-versa.

[EasyNMT](https://github.com/UKPLab/EasyNMT) is a Python library that provides an easy-to-use interface to pre-trained MT models. It is a thin wrapper over the Hugging Face `transformers` library for machine translation. Use EasyNMT to complete the following steps:

- Load the pre-trained model for a specific direction (e.g., Italian-English or English-Italian)
- Translate all the sentences in the test set from the source language to the target language.


**Note 1**: the choice for the MT model is up to you.

**Note 2**: store the translated sentences in both directions using the data structure of your choice.
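
One possible way to organise the translations for both directions is a dictionary keyed by direction. The sketch below is only an illustration: `translate_both_directions` and `fake_translate` are hypothetical names, and the stand-in translator simply tags each sentence so that the example runs without downloading a model.

```python
def translate_both_directions(pairs, translate, source_lang="it", target_lang="en"):
    """Translate every sentence pair in both directions.

    `translate` is any callable taking (sentences, source_lang, target_lang)
    and returning a list of translated strings of the same length.
    """
    source_sentences = [source for source, _ in pairs]
    target_sentences = [target for _, target in pairs]
    return {
        f"{source_lang}-{target_lang}": {
            "inputs": source_sentences,
            "references": target_sentences,
            "translations": translate(source_sentences, source_lang, target_lang),
        },
        f"{target_lang}-{source_lang}": {
            "inputs": target_sentences,
            "references": source_sentences,
            "translations": translate(target_sentences, target_lang, source_lang),
        },
    }


# Stand-in translator (an assumption for this sketch, not a real model).
def fake_translate(sentences, source_lang, target_lang):
    return [f"[{source_lang}->{target_lang}] {s}" for s in sentences]


results = translate_both_directions(
    [("Il cielo è blu.", "The sky is blue.")], fake_translate
)
print(results["it-en"]["translations"])
print(results["en-it"]["references"])
```

With a loaded EasyNMT model, the callable could be a small wrapper such as `lambda s, src, tgt: model.translate(s, source_lang=src, target_lang=tgt)`. Keeping inputs, references and translations together per direction makes the evaluation step easier to write.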

In [None]:
%%capture
!pip install easynmt sacremoses nltk

In [None]:
import sys
!{sys.executable} -m pip install easynmt sacremoses nltk

In [None]:
import nltk
from easynmt import EasyNMT

# --- 1. Setup: Download NLTK dependency and Load Model ---

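# Make sure the NLTK sentence-tokenizer data is available.
# Newer NLTK releases look for 'punkt_tab', older ones for 'punkt',
# so each resource is checked and downloaded only if it is missing.
for resource in ('punkt', 'punkt_tab'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        print(f"NLTK '{resource}' resource not found. Downloading...")
        nltk.download(resource)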

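# NOTE: this small hand-written sample overrides the 'test_data' parsed in
# Question 1 so that the cell runs quickly. Comment it out to translate the
# full test set instead.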
test_data = [
    ("Questa è una prova.", "This is a test."),
    ("Il cielo è blu.", "The sky is blue."),
    ("Mi piace la pizza.", "I like pizza."),
    ("L'intelligenza artificiale sta cambiando il mondo.", "Artificial intelligence is changing the world.")
]

print("\nOriginal test_data (it -> en):")
for pair in test_data:
    print(f"  {pair}")

# Separate the source (Italian) and target (English) sentences
source_it_sentences = [pair[0] for pair in test_data]
target_en_sentences = [pair[1] for pair in test_data]

# Load the 'opus-mt' model.
print("\nLoading EasyNMT model 'opus-mt'...")
model = EasyNMT('opus-mt')
print("Model loaded successfully.")


# --- 2. Translate: Italian to English (it-en) ---

print("\n--- Translating Italian to English ---")
translated_it_to_en = model.translate(source_it_sentences,
                                      source_lang='it',
                                      target_lang='en')


# --- 3. Translate: English to Italian (en-it) ---

print("\n--- Translating English to Italian ---")
translated_en_to_it = model.translate(target_en_sentences,
                                      source_lang='en',
                                      target_lang='it')


# --- 4. Display Results ---

print("\n" + "="*45)
print("     RESULTS: Italian -> English Translations")
print("="*45)
for i in range(len(test_data)):
    print(f"  Original (it):   '{source_it_sentences[i]}'")
    print(f"  Reference (en):  '{target_en_sentences[i]}'")
    print(f"  Translated (en): '{translated_it_to_en[i]}'\n")

print("\n" + "="*45)
print("     RESULTS: English -> Italian Translations")
print("="*45)
for i in range(len(test_data)):
    print(f"  Original (en):   '{target_en_sentences[i]}'")
    print(f"  Reference (it):  '{source_it_sentences[i]}'")
    print(f"  Translated (it): '{translated_en_to_it[i]}'\n")

### **Question 3: BLEU and METEOR scores**

In this question you will evaluate the performance of your machine translation (MT) model using **two** evaluation metrics: **[BLEU evaluation metric](https://github.com/mjpost/sacrebleu)** and **[METEOR evaluation metric](https://huggingface.co/spaces/evaluate-metric/meteor)**. You **must** compute and report scores for both translation directions: `EN→IT` and `IT→EN`.

---

#### BLEU (Bilingual Evaluation Understudy)

**BLEU** measures how much the model’s translation overlaps with a reference translation by comparing shared **n-grams** (word sequences). It is a precision-oriented score: it rewards exact n-gram matches and applies a brevity penalty to translations that are shorter than the reference.

- **Pros:** Fast, standardized, and good for large-scale comparisons.
- **Cons:** Only captures exact matches, ignoring synonyms or paraphrases; may not always align with human judgment.
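
To make the n-gram overlap idea concrete, here is a minimal sentence-level sketch of the BLEU computation: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. The function names (`ngrams`, `modified_precision`, `toy_bleu`) are illustrative, tokenization is plain whitespace splitting, and no smoothing is applied, so the numbers will generally differ from those reported by `sacrebleu`.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hypothesis, reference, n):
    """Clipped n-gram counts: each hypothesis n-gram is credited at most
    as many times as it occurs in the reference."""
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched, total


def toy_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU (0-100) on whitespace tokens."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        matched, total = modified_precision(hyp, ref, n)
        if matched == 0:
            return 0.0  # a single zero precision makes the geometric mean zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalise hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


print(toy_bleu("the sky is blue", "the sky is blue"))
print(toy_bleu("the sky is blue today", "the sky is blue"))
```

Because any zero n-gram precision drives the unsmoothed score to zero, short sentences often score 0 at sentence level. This is one reason BLEU is usually reported at corpus level.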

> Use BLEU as implemented in `sacrebleu`.


---

#### METEOR (Metric for Evaluation of Translation with Explicit ORdering)

**METEOR** was developed to better reflect human judgment by allowing more flexible word matching. It aligns hypothesis and reference words using:

- **Exact matches**
- **Stem matches** (e.g., *run* ↔ *running*)
- **Synonyms and paraphrases**

It then combines these matches into a single score with penalties for disordered or fragmented output.

- **Pros:** More linguistically aware; correlates better with human evaluations.
- **Cons:** Slower to compute; depends on external lexical resources.
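
The scoring step can be sketched in a few lines. The example below is a deliberately simplified version that uses **exact matches only** (no stemming, synonyms or paraphrases) and a greedy left-to-right alignment, which is an assumption of this sketch rather than the alignment procedure of a full implementation. The names `align_exact`, `count_chunks` and `toy_meteor` are illustrative; the default values of `alpha`, `beta` and `gamma` follow the original METEOR formulation (recall weighted nine times more than precision, penalty of 0.5 times the cubed fragmentation).

```python
def align_exact(hypothesis, reference):
    """Greedily map each hypothesis token to the first unused identical
    reference token. Returns a list of (hyp_index, ref_index) pairs."""
    used = set()
    alignment = []
    for i, token in enumerate(hypothesis):
        for j, ref_token in enumerate(reference):
            if j not in used and token == ref_token:
                used.add(j)
                alignment.append((i, j))
                break
    return alignment


def count_chunks(alignment):
    """Count runs of matches that are adjacent and in the same order
    in both the hypothesis and the reference."""
    chunks = 0
    previous = None
    for i, j in alignment:
        if previous is None or (i, j) != (previous[0] + 1, previous[1] + 1):
            chunks += 1
        previous = (i, j)
    return chunks


def toy_meteor(hypothesis, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """Exact-match-only METEOR-style score in [0, 1]."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    alignment = align_exact(hyp, ref)
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Weighted harmonic mean that favours recall when alpha is close to 1.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: more chunks means more scrambled word order.
    penalty = gamma * (count_chunks(alignment) / matches) ** beta
    return f_mean * (1 - penalty)


print(toy_meteor("the sky is blue", "the sky is blue"))
print(toy_meteor("blue is the sky", "the sky is blue"))
```

Both hypotheses contain exactly the reference words, so precision and recall are identical. The second one scores lower only because its matches split into more chunks, which is how the metric penalises disordered output.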

> Compute METEOR using `evaluate` or `nltk`.

---

The following cell installs the `sacrebleu`, `nltk`, and `evaluate` libraries that can be used to compute these metrics.

In [None]:
%%capture
!pip install sacrebleu evaluate nltk

In [None]:
import sacrebleu
import evaluate
import nltk

# --- 1. Load the METEOR metric ---
# This will also trigger downloads for 'wordnet' and 'omw-1.4'
# if NLTK doesn't have them, as METEOR relies on them.
print("Loading METEOR metric...")
try:
    meteor_metric = evaluate.load('meteor')
except LookupError:
    print("METEOR dependencies not found. Downloading 'wordnet' and 'omw-1.4'...")
    nltk.download('wordnet')
    nltk.download('omw-1.4')
    meteor_metric = evaluate.load('meteor')
print("METEOR metric loaded successfully.")


# --- 2. Evaluate: Italian to English (IT -> EN) ---
print("\n--- Evaluating: Italian to English (IT -> EN) ---")

# Define the model's outputs (hypotheses) and the correct answers (references)
hypotheses = translated_it_to_en
references = target_en_sentences

# BLEU (sacrebleu)
# sacrebleu expects a list of reference streams: each stream is a list holding
# one reference per hypothesis, in the same order as the hypotheses.
# With a single reference per sentence there is one stream: [references]
it_en_bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"  BLEU Score: {it_en_bleu.score:.2f}")

# METEOR (evaluate)
# 'evaluate' is simpler and can take a flat list of references
it_en_meteor = meteor_metric.compute(predictions=hypotheses,
                                     references=references)
print(f"  METEOR Score: {it_en_meteor['meteor']:.4f}")


# --- 3. Evaluate: English to Italian (EN -> IT) ---
print("\n--- Evaluating: English to Italian (EN -> IT) ---")

# Define the new hypotheses and references
hypotheses = translated_en_to_it
references = source_it_sentences

# BLEU (sacrebleu)
en_it_bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"  BLEU Score: {en_it_bleu.score:.2f}")

# METEOR (evaluate)
en_it_meteor = meteor_metric.compute(predictions=hypotheses,
                                     references=references)
print(f"  METEOR Score: {en_it_meteor['meteor']:.4f}")

# --- 4. Final Summary ---
print("\n" + "="*30)
print("      FINAL RESULTS")
print("="*30)
print("| Direction | Metric  | Score  |")
print("|-----------|---------|--------|")
print(f"| IT -> EN  | BLEU    | {it_en_bleu.score:<6.2f} |")
print(f"| IT -> EN  | METEOR  | {it_en_meteor['meteor']:<6.4f} |")
print(f"| EN -> IT  | BLEU    | {en_it_bleu.score:<6.2f} |")
print(f"| EN -> IT  | METEOR  | {en_it_meteor['meteor']:<6.4f} |")

### **Question 4: Comparison with another Pre-trained MT Model**

In this question, you will experiment with another pre-trained MT model and compare its performance with the model used in Question 2.

Follow the same translation procedure as before (EN→IT and IT→EN) and evaluate the results using the BLEU and METEOR metrics from Question 3.

Use [EasyNMT](https://github.com/UKPLab/EasyNMT) to load and run the pre-trained model.
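
A small harness can keep the comparison fair by running every model on the same sentences with the same metrics. The sketch below is one possible layout, not a prescribed solution: `compare_models`, `format_table` and `exact_match` are hypothetical helpers, and the two toy "translators" and the exact-match metric are placeholders so that the example runs without any model or metric library.

```python
def compare_models(translators, metrics, sources, references):
    """Score several translators on the same data.

    translators: dict name -> callable(list of sentences) -> list of translations
    metrics:     dict name -> callable(hypotheses, references) -> float
    """
    table = {}
    for model_name, translate in translators.items():
        hypotheses = translate(sources)
        table[model_name] = {
            metric_name: metric(hypotheses, references)
            for metric_name, metric in metrics.items()
        }
    return table


def format_table(table):
    """Render the nested score dictionary as a Markdown-style table."""
    metric_names = sorted({m for scores in table.values() for m in scores})
    header = "| Model | " + " | ".join(metric_names) + " |"
    rule = "|" + "---|" * (len(metric_names) + 1)
    rows = [
        "| " + model + " | "
        + " | ".join(f"{scores[m]:.2f}" for m in metric_names) + " |"
        for model, scores in table.items()
    ]
    return "\n".join([header, rule, *rows])


# Placeholder metric and translators (assumptions for this sketch only).
def exact_match(hypotheses, references):
    return sum(h == r for h, r in zip(hypotheses, references)) / len(references)


toy_translators = {
    "identity": lambda sentences: list(sentences),
    "upper": lambda sentences: [s.upper() for s in sentences],
}
sentences = ["the sky is blue", "i like pizza"]
table = compare_models(toy_translators, {"exact_match": exact_match},
                       sentences, sentences)
print(format_table(table))
```

In the notebook, the translator callables could wrap `model.translate(...)` for each EasyNMT model and direction, and the metric callables could wrap `sacrebleu.corpus_bleu(h, [r]).score` and `meteor_metric.compute(predictions=h, references=r)['meteor']` from Question 3.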


In [None]:
# your code here