## English to Italian automatic translation

Automatic language translation is often regarded as the most typical sequence-to-sequence problem. Traditional approaches based on explicitly modeling languages have proven difficult. In the last decade, deep learning has proved to be a more than viable solution to this problem.

Deep learning solutions only require a large bilingual corpus and computational resources. Encoder-decoder architectures are the most widely used. They can be implemented with recurrent networks (LSTM and the like) or transformers (which are the state of the art for this problem).

In this lab activity we will build a simple English-to-Italian translation system based on a pair of LSTM networks working together in an encoder-decoder architecture.

## Data

We will use a subset of the English-Italian bilingual dataset from the [Tatoeba Project](https://www.manythings.org/anki/).

There are two files, `text-eng.txt` and `text-ita.txt`, each containing 333112 lines with one sentence per line, in English and Italian respectively. Sentences are paired so that the i-th sentence in the English file has its translation in the i-th line of the Italian file.

Each sentence has already been converted to lowercase and rewritten as space-separated tokens (words and punctuation symbols). Each sentence starts with the special `<sos>` token and is terminated by the `<eos>` token. The longest sequences are 20 tokens long.

For instance, this is one example from the English file:

`<sos> do you want me to make coffee ? <eos>`

and this is the corresponding translation in the Italian file:

`<sos> vuoi che prepari del caffè ? <eos>`
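
To make the format concrete, the following minimal sketch splits the two example sentences above into tokens and checks the `<sos>` and `<eos>` markers. The helper name `tokenize` is an assumption introduced for illustration; it is not part of the lab code.

```python
def tokenize(line):
    """Split a preprocessed line into its space-separated tokens."""
    return line.strip().split()


eng = tokenize("<sos> do you want me to make coffee ? <eos>")
ita = tokenize("<sos> vuoi che prepari del caffè ? <eos>")

# Every sentence is delimited by the same two markers.
for tokens in (eng, ita):
    assert tokens[0] == "<sos>" and tokens[-1] == "<eos>"

# Paired sentences may have different lengths.
print(len(eng), len(ita))  # prints: 10 8
```

Since a sentence and its translation generally differ in length, the encoder and the decoder have to handle sequences of different sizes.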


In [17]:
# Download the files using gdown
import os
import subprocess

# Check if files already exist
if not os.path.exists("text-eng.txt") or not os.path.exists("text-ita.txt"):
    print("Downloading files...")
    
    # Option 1: Use gdown directly in Python (recommended for venv)
    try:
        import gdown
        url = "https://drive.google.com/file/d/1_npGYZk13fs5hE0kAggiSrmKkqW3OrLT/view?usp=sharing"
        output = gdown.download(url, fuzzy=True)
        
        # Extract the downloaded tar.gz file
        import tarfile
        with tarfile.open(output, 'r:gz') as tar:
            tar.extractall()
        
        # Remove the tar file
        os.remove(output)
        print("Files downloaded and extracted successfully!")
        
    except ImportError:
        print("gdown not installed. Installing...")
        !pip install gdown
        import gdown
        url = "https://drive.google.com/file/d/1_npGYZk13fs5hE0kAggiSrmKkqW3OrLT/view?usp=sharing"
        output = gdown.download(url, fuzzy=True)
        
        import tarfile
        with tarfile.open(output, 'r:gz') as tar:
            tar.extractall()
        os.remove(output)
        print("Files downloaded and extracted successfully!")
        
    except Exception as e:
        print(f"Error downloading with gdown: {e}")
        print("Trying alternative method...")
        
        # Option 2: Run gdown as a module with the current interpreter (fallback)
        try:
            import glob
            import sys
            import tarfile
            subprocess.run([
                sys.executable,
                "-m", "gdown",
                "--fuzzy",
                "https://drive.google.com/file/d/1_npGYZk13fs5hE0kAggiSrmKkqW3OrLT/view?usp=sharing"
            ], capture_output=True, text=True, check=True)

            # Extract every downloaded archive; a "*.tar.gz" pattern passed in an
            # argument list to subprocess would not be expanded
            for archive in glob.glob("*.tar.gz"):
                with tarfile.open(archive, "r:gz") as tar:
                    tar.extractall()
            print("Files downloaded and extracted successfully with subprocess!")

        except subprocess.CalledProcessError as e:
            print(f"Subprocess error: {e}")
            print("Please download files manually from the provided URL")
            
else:
    print("Files already exist!")

# Verify files exist
if os.path.exists("text-eng.txt") and os.path.exists("text-ita.txt"):
    with open("text-eng.txt", 'r') as f:
        eng_lines = len(f.readlines())
    with open("text-ita.txt", 'r') as f:
        ita_lines = len(f.readlines())
    
    print(f"text-eng.txt: {eng_lines:,} lines")
    print(f"text-ita.txt: {ita_lines:,} lines")
else:
    print("Files not found. Please check the download process.")

Downloading files...


Downloading...
From: https://drive.google.com/uc?id=1_npGYZk13fs5hE0kAggiSrmKkqW3OrLT
To: /home/naoya/pv2/deeplearning/jpn2ita/eng-ita.tar.gz
100%|██████████| 3.92M/3.92M [00:05<00:00, 748kB/s]


Files downloaded and extracted successfully!
text-eng.txt: 333,112 lines
text-ita.txt: 333,112 lines


### Vocabularies

First, we need to build separate vocabularies for English and Italian.
For each language we need the list of unique tokens, and an inverse mapping from each token to its index in the list.

The vocabularies must also include the special tokens `<sos>`, `<eos>` and `<pad>` (the last one is not in the dataset, but we will need it later). It is best to have these tokens at the same positions (indices) in both vocabularies. The code below additionally reserves `<unk>` for unknown tokens.

The tokens are collected in a `set`, which keeps a single copy of each token.

The dataset contains no uppercase letters, so there is no need to convert the tokens to lowercase at this stage.
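
As a rough illustration of the idea, the sketch below builds the vocabulary and its inverse mapping for two one-sentence toy corpora. The function `build_vocabulary` and the toy sentences are assumptions made for this example, and only the three special tokens named above are used.

```python
SPECIAL = ["<sos>", "<eos>", "<pad>"]


def build_vocabulary(sentences):
    """Return (vocabulary, inverse) with the special tokens at indices 0, 1, 2."""
    tokens = set()
    for sentence in sentences:
        tokens.update(sentence.split())
    # Special tokens first, then the remaining tokens in sorted order,
    # so that the result does not depend on set iteration order.
    vocabulary = SPECIAL + sorted(tokens - set(SPECIAL))
    inverse = {token: index for index, token in enumerate(vocabulary)}
    return vocabulary, inverse


eng_voc, eng_inv = build_vocabulary(["<sos> i like coffee . <eos>"])
ita_voc, ita_inv = build_vocabulary(["<sos> mi piace il caffè . <eos>"])

# The special tokens share the same index in both vocabularies.
for token in SPECIAL:
    assert eng_inv[token] == ita_inv[token]

print(eng_voc)
```

Putting the special tokens in front of both lists is what lets a single constant (for instance the index of `<pad>`) be used for both languages.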

In [None]:
SPECIAL = ["<sos>", "<eos>", "<pad>","<unk>"]  # Added "<unk>" for unknown tokens
MAXLEN = 50

# English vocabulary creation with unique tokens
f = open("text-eng.txt")
eng_tokens_set = set(SPECIAL)  # Start with special tokens
for line in f:
    line = line.strip() # Remove leading/trailing whitespace
    if line and len(line.split()) <= MAXLEN: # Check if the line is not empty and does not exceed MAXLEN
        tokens = line.split() # Split the line into tokens
        eng_tokens_set.update(tokens) # Add tokens to the set (automatically handles duplicates)

f.close()

# Convert set to list for vocabulary
ENG_VOCABULARY = list(eng_tokens_set)

# Italian vocabulary creation with unique tokens
f = open("text-ita.txt")
ita_tokens_set = set(SPECIAL)  # Start with special tokens
for line in f:
    line = line.strip()  # Remove leading/trailing whitespace
    if line and len(line.split()) <= MAXLEN:
        tokens = line.split()
        ita_tokens_set.update(tokens) # Add tokens to the set (automatically handles duplicates)

f.close()

# Convert set to list for vocabulary
ITA_VOCABULARY = list(ita_tokens_set)

# Make sure that the special tokens have the same indices in the two vocabularies.
# Sort vocabularies to ensure consistent ordering, with special tokens first
ENG_VOCABULARY = SPECIAL + sorted([token for token in ENG_VOCABULARY if token not in SPECIAL])
ITA_VOCABULARY = SPECIAL + sorted([token for token in ITA_VOCABULARY if token not in SPECIAL])

# Indices of the special tokens (their positions in SPECIAL)
SOS = 0  # Index of "<sos>"
EOS = 1  # Index of "<eos>"  
PAD = 2  # Index of "<pad>"
UNK = 3  # Index of "<unk>" (unknown token)

# Inverse mappings.
ENG_INVERSE = {w: n for n, w in enumerate(ENG_VOCABULARY)}
ITA_INVERSE = {w: n for n, w in enumerate(ITA_VOCABULARY)}

print(f"English vocabulary size: {len(ENG_VOCABULARY)}")
print(f"Italian vocabulary size: {len(ITA_VOCABULARY)}")
print(f"Special token indices - SOS: {SOS}, EOS: {EOS}, PAD: {PAD}, UNK: {UNK}")
print(f"First 100 English tokens: {ENG_VOCABULARY[:100]}")
print(f"First 100 Italian tokens: {ITA_VOCABULARY[:100]}")
print(f"ENG_INVERSE: {list(ENG_INVERSE.items())[:100]}")
print(f"ITA_INVERSE: {list(ITA_INVERSE.items())[:100]}")


English vocabulary size: 7583
Italian vocabulary size: 9963
Special token indices - SOS: 0, EOS: 1, PAD: 2
First 10 English tokens: ['<sos>', '<eos>', '<pad>', '!', '"', '$', '%', ',', '.', '00', '000', '1', '10', '100', '11', '110', '119', '12', '13', '13-year-old', '15', '18', '1939', '1941', '1945', '1950s', '1960', '1969', '1980', '2', '20', '200', '2003', '2013', '20th', '22', '24', '25', '3', '30', '300', '4', '40', '5', '50', '500', '5th', '6', '60', '7', '70', '8', '80', '9', '90', '911', ':', '?', 'a', "a's", 'abacus', 'abandon', 'abandoned', 'abducted', 'abilities', 'ability', 'able', 'aboard', 'about', 'above', 'abroad', 'abrupt', 'absence', 'absent', 'absent-minded', 'absolute', 'absolutely', 'absorb', 'absorbed', 'absorbs', 'absurd', 'abundant', 'abused', 'abusive', 'accent', 'accept', 'acceptable', 'accepted', 'accepting', 'accepts', 'access', 'accident', 'accidentally', 'accidents', 'accompanied', 'accomplice', 'accomplish', 'accomplished', 'accomplishment', 'according']

In [32]:
# Usage example
print(ENG_VOCABULARY[100])   # Token stored at index 100
print(ENG_INVERSE["italy"])  # Index of the token "italy"

account
3571


### Encoding/decoding functions

We now need functions that map sentences (strings) to lists of numerical indices, and vice versa. These functions also take the vocabularies, or their inverses, as arguments, so that we can use them for both English and Italian.

Having all sequences of the same length simplifies training.
For this reason, `encode_sentence` should add padding to make sure that the list of codes includes exactly `MAXLEN` elements.

In [35]:
def encode_sentence(sentence, inverse):
    """Translate the sentence as a list of numerical codes, given the inverse mapping."""
    inverse_sentence = []  # goal: append numerical codes for each word
    words = sentence.strip().split()  # Split the sentence into words and remove whitespace
    
    # Convert words to indices
    for word in words:
        if word in inverse:
            inverse_sentence.append(inverse[word])
        else:
            # Map out-of-vocabulary words to the <unk> token
            inverse_sentence.append(UNK)
            print(f"Warning: unknown word '{word}' replaced by <unk>")
    
    # Add padding to make all sequences the same length (MAXLEN)
    while len(inverse_sentence) < MAXLEN:
        inverse_sentence.append(PAD)  # PAD = 2
    
    # Truncate if too long
    if len(inverse_sentence) > MAXLEN:
        inverse_sentence = inverse_sentence[:MAXLEN]
    
    return inverse_sentence



def decode_sentence(codes, voc):
    """Translate a list of numerical codes into a sentence, given the mapping."""
    sentence = []
    for code in codes:
        if code == PAD:  # Stop at padding
            break
        sentence.append(voc[code])
    return " ".join(sentence)


# Test the functions
eng = "<sos> do you want me to make coffee ? <eos>"
codes = encode_sentence(eng, ENG_INVERSE)
print(f"Encoded length: {len(codes)}")
print(f"Codes: {codes}")
print(f"Decoded: {decode_sentence(codes, ENG_VOCABULARY)}")

ita = "<sos> vuoi che prepari del caffè ? <eos>"
codes = encode_sentence(ita, ITA_INVERSE)
print(f"Encoded length: {len(codes)}")
print(f"Codes: {codes}")
print(f"Decoded: {decode_sentence(codes, ITA_VOCABULARY)}")

Encoded length: 50
Codes: [0, 2004, 7560, 7271, 4082, 6805, 4016, 1322, 57, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
Decoded: <sos> do you want me to make coffee ? <eos>
Encoded length: 50
Codes: [0, 9917, 1648, 6772, 2579, 1345, 51, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
Decoded: <sos> vuoi che prepari del caffè ? <eos>


### Dataset and data loader

All the data will be loaded into memory. The `torch.utils.data.TensorDataset` will make the data accessible to the data loader.

In [36]:
import torch


with open("text-eng.txt") as f:
    eng_sentences = [encode_sentence(line, ENG_INVERSE) for line in f]

with open("text-ita.txt") as f:
    ita_sentences = [encode_sentence(line, ITA_INVERSE) for line in f]

train_set = torch.utils.data.TensorDataset(torch.tensor(eng_sentences), torch.tensor(ita_sentences))
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True, drop_last=True)

eng, ita = next(iter(train_loader))
print(eng.shape, eng.dtype, ita.shape, ita.dtype)

print(decode_sentence(eng[0], ENG_VOCABULARY))
print(decode_sentence(ita[0], ITA_VOCABULARY))

torch.Size([64, 50]) torch.int64 torch.Size([64, 50]) torch.int64
<sos> i thought that tom would say that . <eos>
<sos> pensavo che tom l'avrebbe detto . <eos>


## Model

We will use an encoder-decoder architecture (picture from "Dive into Deep Learning").

![Sequence-to-sequence encoder-decoder architecture](https://d2l.ai/_images/seq2seq.svg)

The encoder will read the English sentence and encode it into a vector of features (we will use both the final hidden state and cell state).

The decoder will output Italian tokens one at a time, each given the previous ones.
The encoded input is passed to the decoder as its initial state and, after a linear projection of the last layer's cell state, as an additional input at each step.

In [37]:
DIM = 256
DROPOUT = 0.2
LAYERS = 2

encoder = torch.nn.Sequential(
    torch.nn.Embedding(len(ENG_VOCABULARY), DIM),
    torch.nn.LSTM(DIM, DIM, batch_first=True, dropout=DROPOUT, num_layers=LAYERS)
)

class Decoder(torch.nn.Module):
    def __init__(self, embedding_size, hidden_size):
        super().__init__()
        self.embedding = torch.nn.Embedding(len(ITA_VOCABULARY), embedding_size)
        self.cell_linear = torch.nn.Linear(hidden_size, embedding_size)
        self.lstm = torch.nn.LSTM(embedding_size, hidden_size, batch_first=True, dropout=DROPOUT, num_layers=LAYERS)
        self.linear = torch.nn.Linear(hidden_size, len(ITA_VOCABULARY))

    def forward(self, input, hidden):
        cell_state = hidden[1][-1]
        output = self.embedding(input)
        y = self.cell_linear(cell_state).unsqueeze(1)
        output = output + y
        output, _ = self.lstm(output, hidden)
        output = self.linear(output)
        return output


decoder = Decoder(DIM, DIM)

input1 = torch.zeros(7, 22, dtype=torch.long)
_, hidden = encoder(input1)
print(input1.shape, "->", hidden[0].shape, hidden[1].shape)

input2 = torch.zeros(7, 22, dtype=torch.long)
output = decoder(input2, hidden)
print(input2.shape, "->", output.shape)

torch.Size([7, 22]) -> torch.Size([2, 7, 256]) torch.Size([2, 7, 256])
torch.Size([7, 22]) -> torch.Size([7, 22, 9963])


## Training

During training the cross entropy is minimized.
Each output from the decoder is compared to the next token in the output sequence.

Padding should be ignored during training. The `torch.nn.CrossEntropyLoss` has an optional argument for this.
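
The two ideas above (shifting the target by one position and ignoring padding) can be illustrated without PyTorch. The sketch below is plain Python; the function `masked_cross_entropy` and the toy probabilities are assumptions made for illustration. In the lab itself this role is played by `torch.nn.CrossEntropyLoss` with its `ignore_index` argument.

```python
import math

SOS, EOS, PAD = 0, 1, 2

# A padded target sentence: <sos> 5 7 <eos> <pad> <pad>
sequence = [SOS, 5, 7, EOS, PAD, PAD]

decoder_input = sequence[:-1]  # what the decoder reads
target = sequence[1:]          # what it should predict at each step


def masked_cross_entropy(probabilities, target, ignore_index):
    """Mean negative log-likelihood over positions whose target is not ignore_index.

    probabilities[t] maps a token index to its predicted probability at step t.
    """
    losses = [
        -math.log(step[token])
        for step, token in zip(probabilities, target)
        if token != ignore_index
    ]
    return sum(losses) / len(losses)


# Toy predictions: the correct token gets probability 0.5 at the real positions;
# the values at the padded positions are arbitrary and must not matter.
probabilities = [{5: 0.5}, {7: 0.5}, {EOS: 0.5}, {PAD: 0.01}, {PAD: 0.01}]

loss = masked_cross_entropy(probabilities, target, ignore_index=PAD)
print(decoder_input, target, round(loss, 4))
# prints: [0, 5, 7, 1, 2] [5, 7, 1, 2, 2] 0.6931
```

The loss equals `-log(0.5)` regardless of the probabilities assigned at the padded positions, which is the behaviour wanted during training.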

In [38]:
EPOCHS = 10
LEARNING_RATE = 0.001
DEVICE = ("cuda" if torch.cuda.is_available() else "cpu")

encoder.to(DEVICE)
decoder.to(DEVICE)

optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=LEARNING_RATE)
loss_fun = torch.nn.CrossEntropyLoss(ignore_index=PAD)

In [None]:
encoder.train()
decoder.train()

steps = 0
for epoch in range(EPOCHS):
    for lq, sq in train_loader:
        lq = lq.to(DEVICE)
        sq = sq.to(DEVICE)
        _, hidden = encoder(lq)
        output = decoder(sq[:, :-1], hidden)
        loss = loss_fun(output.permute(0, 2, 1), sq[:, 1:])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        steps += 1
        if steps % 1000 == 0:
            predictions = output.argmax(2)
            mask = sq[:, 1:] != PAD  # score only the non-padding positions
            correct = ((predictions == sq[:, 1:]) & mask).sum().item()
            total = mask.sum().item()
            accuracy = 100 * correct / max(total, 1)
            print(f"{steps} [{epoch}]  Loss: {loss.item():.4f}  Acc: {accuracy:.1f}%")
            print(decode_sentence(lq[0], ENG_VOCABULARY))
            print(decode_sentence(sq[0], ITA_VOCABULARY))
            print(decode_sentence(predictions[0], ITA_VOCABULARY))
            print()

## Using the model

To translate a new sentence, you need to:

1. encode the input sentence;
2. initialize the output sentence with the `<sos>` token;
3. pass the current output into the decoder together with the encoder state;
4. take the output token with the highest score, and add it to the current output.
5. repeat from step 3 until the `<eos>` token is generated.

Implement this algorithm and use it to translate some English sentences.
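
The steps above amount to greedy decoding. As a model-independent sketch, the loop can be written against any function that scores the next token. Here `greedy_decode` and the stand-in scorer `toy_scores` are hypothetical names, and step 1 (encoding the input) is assumed to be hidden inside the scorer.

```python
SOS, EOS = 0, 1


def greedy_decode(next_token_scores, max_length):
    """Generate tokens one at a time, always keeping the highest-scoring one.

    next_token_scores(prefix) returns a list of scores, one per vocabulary
    index, for the token that should follow prefix.
    """
    output = [SOS]                              # step 2
    while len(output) < max_length:
        scores = next_token_scores(output)      # step 3
        best = max(range(len(scores)), key=scores.__getitem__)  # step 4
        output.append(best)
        if best == EOS:                         # step 5
            break
    return output


def toy_scores(prefix):
    """Hypothetical stand-in for encoder + decoder: emits 3, 4, then <eos>."""
    script = {1: 3, 2: 4, 3: EOS}
    scores = [0.0] * 5
    scores[script[len(prefix)]] = 1.0
    return scores


print(greedy_decode(toy_scores, max_length=10))  # prints: [0, 3, 4, 1]
```

The `max_length` bound matters in practice: a poorly trained model may never produce `<eos>`, and without the bound the loop would not terminate.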

In [None]:
encoder.eval()
decoder.eval()

def translate_sentence(sentence, max_length=MAXLEN):
    """Translate an English sentence into Italian with the LSTM encoder-decoder."""
    with torch.no_grad():
        # 1. Encode the input sentence
        input_tensor = torch.tensor([encode_sentence(sentence, ENG_INVERSE)], device=DEVICE)
        _, hidden = encoder(input_tensor)

        # 2. Fill the output with padding and put <sos> in the first position
        #    (PAD lets decode_sentence stop right after the generated tokens)
        output = torch.full((1, max_length), PAD, dtype=torch.long, device=DEVICE)
        output[0, 0] = SOS

        # 3. Predict the next token at each step
        for i in range(1, max_length):
            decoder_output = decoder(output[:, :i], hidden)
            next_token = decoder_output[0, -1].argmax().item()
            output[0, i] = next_token

            # Stop as soon as the <eos> token is generated
            if next_token == EOS:
                break

        # 4. Decode the result
        return decode_sentence(output[0], ITA_VOCABULARY)

# English test sentences
test_sentences = [
    "<sos> how old are you ? <eos>",
    "<sos> i like to play tennis . <eos>",
    "<sos> i hope it snows at christmas . <eos>",
    "<sos> would you like to go to the movie theater . <eos>"
]

print("=== LSTM Translation Results ===")
for eng in test_sentences:
    ita = translate_sentence(eng)
    print(f"EN: {eng}")
    print(f"IT: {ita}")
    print()

# LSTM vs BERT Performance Comparison

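For this translation task, we compare the performance of the LSTM encoder-decoder with that of a BERT-based model. Note that the pretrained model loaded in the following cells, `Helsinki-NLP/opus-mt-en-it`, is a MarianMT Transformer encoder-decoder; it is referred to as "BERT" throughout this comparison.

## Comparison criteria

1. **Translation quality** - BLEU score, human evaluation
2. **Training efficiency** - convergence speed, number of epochs required
3. **Computational efficiency** - inference time, memory usage
4. **Model size** - number of parameters

## Implementation plan

### LSTM Encoder-Decoder (implemented above)
- ✅ Custom vocabulary
- ✅ Sequence-to-sequence architecture
- ✅ Lightweight and fast

### BERT-based Translation
- 🔄 Uses a pretrained BERT
- 🔄 mBERT (Multilingual BERT) or a dedicated model
- 🔄 Transformer architecture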

In [None]:
# BERT Implementation Setup
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Trainer, TrainingArguments
from transformers import DataCollatorForSeq2Seq
import evaluate
import numpy as np
from datasets import Dataset
import time

# Settings for the BERT-based translation model
MODEL_NAME = "Helsinki-NLP/opus-mt-en-it"  # Pretrained English-to-Italian model

print("Available BERT-based models for EN-IT translation:")
print("1. Helsinki-NLP/opus-mt-en-it (MarianMT)")
print("2. facebook/mbart-large-50-many-to-many-mmt")
print("3. google/mt5-small")

# Check GPU availability
print(f"\nDevice: {DEVICE}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

In [None]:
# BERT Model and Tokenizer Setup
try:
    # Load the MarianMT model (English to Italian)
    bert_tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    bert_model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    bert_model.to(DEVICE)
    
    print(f"✅ Model loaded: {MODEL_NAME}")
    print(f"Model parameters: {bert_model.num_parameters():,}")
    
    # Test the tokenizer
    test_input = "How old are you?"
    tokens = bert_tokenizer(test_input, return_tensors="pt")
    print(f"Tokenizer test: '{test_input}' -> {tokens['input_ids'].shape[1]} tokens")
    
except Exception as e:
    print(f"❌ Error loading BERT model: {e}")
    # The transformers package may not be installed
    print("Please check that the transformers package is installed.")

In [None]:
# BERT Data Preprocessing
def prepare_bert_data():
    """Convert the existing data into the format used by the BERT model."""
    
    # Read the raw data from the files
    with open("text-eng.txt", 'r') as f:
        eng_raw = [line.strip() for line in f if line.strip()]
    
    with open("text-ita.txt", 'r') as f:
        ita_raw = [line.strip() for line in f if line.strip()]
    
    # Remove the <sos> and <eos> tags (BERT handles them automatically)
    eng_clean = [sent.replace("<sos>", "").replace("<eos>", "").strip() for sent in eng_raw]
    ita_clean = [sent.replace("<sos>", "").replace("<eos>", "").strip() for sent in ita_raw]
    
    # Discard empty sentences and extremely long ones
    valid_pairs = []
    for eng, ita in zip(eng_clean, ita_clean):
        if eng and ita and len(eng.split()) <= 30 and len(ita.split()) <= 30:
            valid_pairs.append({"english": eng, "italian": ita})
    
    print(f"Valid pairs: {len(valid_pairs):,}")
    print(f"Sample data:")
    for i in range(3):
        print(f"  EN: {valid_pairs[i]['english']}")
        print(f"  IT: {valid_pairs[i]['italian']}")
        print()
    
    return valid_pairs

# Prepare the data
bert_data = prepare_bert_data()

# Train/Validation split (80/20)
split_idx = int(0.8 * len(bert_data))
train_data = bert_data[:split_idx]
val_data = bert_data[split_idx:]

print(f"Training samples: {len(train_data):,}")
print(f"Validation samples: {len(val_data):,}")

In [None]:
# BERT Translation Function
def translate_with_bert(sentence, model, tokenizer, max_length=50):
    """Translate an English sentence into Italian with the BERT model."""
    model.eval()
    with torch.no_grad():
        # Tokenize
        inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=max_length)
        inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
        
        # Generate the translation
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            num_beams=4,
            early_stopping=True,
            do_sample=False
        )
        
        # Decode
        translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
        return translation

# Performance Comparison Function
def compare_translations(test_sentences):
    """Compare the LSTM and BERT translations."""
    
    print("=" * 80)
    print("TRANSLATION COMPARISON: LSTM vs BERT")
    print("=" * 80)
    
    lstm_times = []
    bert_times = []
    
    for i, eng_sentence in enumerate(test_sentences, 1):
        print(f"\n--- Test {i} ---")
        
        # Input sentence (with <sos> and <eos> removed)
        clean_eng = eng_sentence.replace("<sos>", "").replace("<eos>", "").strip()
        print(f"Input (EN): {clean_eng}")
        
        # LSTM translation
        start_time = time.time()
        lstm_result = translate_sentence(eng_sentence)
        lstm_time = time.time() - start_time
        lstm_times.append(lstm_time)
        print(f"LSTM (IT):  {lstm_result} ({lstm_time:.3f}s)")
        
        # BERT translation
        start_time = time.time()
        bert_result = translate_with_bert(clean_eng, bert_model, bert_tokenizer)
        bert_time = time.time() - start_time
        bert_times.append(bert_time)
        print(f"BERT (IT):  {bert_result} ({bert_time:.3f}s)")
    
    # Compare the average processing times
    print("\n" + "=" * 80)
    print("PERFORMANCE SUMMARY")
    print("=" * 80)
    print(f"Average LSTM time: {np.mean(lstm_times):.3f}s (±{np.std(lstm_times):.3f}s)")
    print(f"Average BERT time: {np.mean(bert_times):.3f}s (±{np.std(bert_times):.3f}s)")
    print(f"Speed ratio (BERT/LSTM): {np.mean(bert_times)/np.mean(lstm_times):.2f}x")

# Test sentences
test_sentences = [
    "<sos> how old are you ? <eos>",
    "<sos> i like to play tennis . <eos>",
    "<sos> i hope it snows at christmas . <eos>",
    "<sos> would you like to go to the movie theater . <eos>",
    "<sos> the weather is beautiful today . <eos>"
]

# Run the comparison
compare_translations(test_sentences)

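## Detailed performance evaluation

### Evaluation metrics

1. **BLEU Score**: automatic evaluation of translation quality
2. **Translation Speed**: inference speed (sentences/second)
3. **Memory Usage**: GPU/CPU memory usage
4. **Model Size**: number of parameters and model size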
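
BLEU is built from clipped n-gram precisions combined with a brevity penalty. As a rough illustration of the first ingredient only, the sketch below computes the clipped precision for a single prediction and reference. The names `ngrams` and `clipped_precision` and the example sentences are assumptions made for this example; this is not a full BLEU implementation, and an established library should be preferred for real evaluations.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) contained in tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(prediction, reference, n):
    """Fraction of predicted n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the reference.
    """
    predicted = Counter(ngrams(prediction.split(), n))
    allowed = Counter(ngrams(reference.split(), n))
    total = sum(predicted.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, allowed[gram]) for gram, count in predicted.items())
    return matched / total


reference = "vuoi che prepari del caffè ?"
prediction = "vuoi che faccia del caffè ?"

p1 = clipped_precision(prediction, reference, 1)  # 5 of 6 unigrams match
p2 = clipped_precision(prediction, reference, 2)  # 3 of 5 bigrams match
print(f"{p1:.3f} {p2:.3f}")  # prints: 0.833 0.600
```

The clipping prevents a prediction from being rewarded for repeating a correct word more often than the reference contains it.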

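### Expected results

**LSTM Encoder-Decoder**
- ✅ **Speed**: fast (lightweight architecture)
- ⚠️ **Quality**: limited (vocabulary size and context)
- ✅ **Memory**: low usage
- ✅ **Training**: fast convergence

**BERT-based (MarianMT)**
- ⚠️ **Speed**: slow (large Transformer)
- ✅ **Quality**: high (pretrained)
- ⚠️ **Memory**: high usage
- ✅ **Training**: pretrained, high performance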

In [None]:
# Detailed Performance Evaluation
import psutil
import gc
from collections import defaultdict

def calculate_bleu_score(predictions, references):
    """Simplified BLEU computation (sacrebleu is recommended for real projects)."""
    try:
        from nltk.translate.bleu_score import sentence_bleu
        from nltk.tokenize import word_tokenize
        import nltk
        nltk.download('punkt', quiet=True)
        
        total_score = 0
        for pred, ref in zip(predictions, references):
            pred_tokens = word_tokenize(pred.lower())
            ref_tokens = [word_tokenize(ref.lower())]
            score = sentence_bleu(ref_tokens, pred_tokens)
            total_score += score
        
        return total_score / len(predictions)
    except ImportError:
        print("NLTK not available, using simple word overlap metric")
        total_score = 0
        for pred, ref in zip(predictions, references):
            pred_words = set(pred.lower().split())
            ref_words = set(ref.lower().split())
            if len(ref_words) > 0:
                overlap = len(pred_words & ref_words) / len(ref_words)
                total_score += overlap
        return total_score / len(predictions)

def get_memory_usage():
    """Return the current memory usage."""
    process = psutil.Process()
    cpu_memory = process.memory_info().rss / 1024 / 1024  # MB
    
    gpu_memory = 0
    if torch.cuda.is_available():
        gpu_memory = torch.cuda.memory_allocated() / 1024 / 1024  # MB
    
    return {"CPU": cpu_memory, "GPU": gpu_memory}

def comprehensive_evaluation():
    """Comprehensive performance evaluation."""
    print("🔍 COMPREHENSIVE PERFORMANCE EVALUATION")
    print("=" * 60)
    
    # Prepare the test data
    test_data = val_data[:50]  # Evaluate on 50 samples
    eng_sentences = [f"<sos> {item['english']} <eos>" for item in test_data]
    eng_clean = [item['english'] for item in test_data]
    ita_references = [item['italian'] for item in test_data]
    
    # Memory usage (at start)
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    start_memory = get_memory_usage()
    
    print(f"📊 Evaluating on {len(test_data)} samples...")
    print(f"💾 Initial memory: CPU {start_memory['CPU']:.1f}MB, GPU {start_memory['GPU']:.1f}MB")
    
    # === LSTM evaluation ===
    print("\n🤖 LSTM Evaluation...")
    lstm_start_time = time.time()
    lstm_predictions = []
    
    for eng_sentence in eng_sentences:
        pred = translate_sentence(eng_sentence)
        # Remove <sos> and <eos>
        clean_pred = pred.replace("<sos>", "").replace("<eos>", "").strip()
        lstm_predictions.append(clean_pred)
    
    lstm_total_time = time.time() - lstm_start_time
    lstm_memory = get_memory_usage()
    
    # === BERT evaluation ===
    print("\n🧠 BERT Evaluation...")
    bert_start_time = time.time()
    bert_predictions = []
    
    for eng_sentence in eng_clean:
        pred = translate_with_bert(eng_sentence, bert_model, bert_tokenizer)
        bert_predictions.append(pred)
    
    bert_total_time = time.time() - bert_start_time
    bert_memory = get_memory_usage()
    
    # === Compute the results ===
    lstm_bleu = calculate_bleu_score(lstm_predictions, ita_references)
    bert_bleu = calculate_bleu_score(bert_predictions, ita_references)
    
    # === Display the results ===
    print("\n" + "=" * 60)
    print("📈 FINAL RESULTS")
    print("=" * 60)
    
    results = {
        "LSTM": {
            "BLEU Score": f"{lstm_bleu:.4f}",
            "Total Time": f"{lstm_total_time:.2f}s",
            "Speed": f"{len(test_data)/lstm_total_time:.1f} sent/s",
            "Memory (CPU)": f"{lstm_memory['CPU']:.1f}MB",
            "Memory (GPU)": f"{lstm_memory['GPU']:.1f}MB",
        },
        "BERT": {
            "BLEU Score": f"{bert_bleu:.4f}",
            "Total Time": f"{bert_total_time:.2f}s", 
            "Speed": f"{len(test_data)/bert_total_time:.1f} sent/s",
            "Memory (CPU)": f"{bert_memory['CPU']:.1f}MB",
            "Memory (GPU)": f"{bert_memory['GPU']:.1f}MB",
        }
    }
    
    for model_name, metrics in results.items():
        print(f"\n{model_name}:")
        for metric, value in metrics.items():
            print(f"  {metric:12}: {value}")
    
    # Decide the winner
    print(f"\n🏆 WINNER:")
    print(f"  Quality (BLEU):  {'BERT' if bert_bleu > lstm_bleu else 'LSTM'}")
    print(f"  Speed:           {'LSTM' if len(test_data)/lstm_total_time > len(test_data)/bert_total_time else 'BERT'}")
    print(f"  Efficiency:      {'LSTM' if lstm_memory['GPU'] < bert_memory['GPU'] else 'BERT'}")
    
    return results

# Run the evaluation
if 'bert_model' in locals():
    evaluation_results = comprehensive_evaluation()
else:
    print("❌ BERT model not loaded. Please run the previous cells first.")