# Fine-Tuning an LLM for Vietnamese-to-English Translation

This notebook demonstrates how to fine-tune a pre-trained sequence-to-sequence language model to translate Vietnamese text into English using the Hugging Face ecosystem. The code below fine-tunes `google/mt5-small` with LoRA; other seq2seq models such as MarianMT should follow the same workflow.

## Project Overview

- **Objective**: Fine-tune a pre-trained model for Vietnamese-English translation
- **Models**: mT5, MarianMT, or similar sequence-to-sequence models
- **Dataset**: Vietnamese-English parallel corpus
- **Evaluation**: BLEU and ROUGE scores, plus qualitative assessment of sample translations

## Workflow

1. Set up environment and install dependencies
2. Load and preprocess Vietnamese-English translation dataset
3. Tokenize and prepare data for model input
4. Load pre-trained LLM model and tokenizer
5. Configure training arguments for fine-tuning
6. Fine-tune the model on translation data
7. Evaluate model performance on validation set
8. Translate sample Vietnamese sentences

## 1. Set Up Environment and Install Dependencies

First, let's install all the required libraries for our translation project.

In [None]:
# Install required packages
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install transformers datasets accelerate peft bitsandbytes sentencepiece  # sentencepiece backs the mT5 tokenizer
!pip install rouge-score evaluate
!pip install pandas numpy matplotlib seaborn plotly
!pip install pyvi underthesea  # Vietnamese NLP libraries
!pip install tqdm wandb tensorboard  # Utilities

print("✅ Install commands finished; check the pip output above for errors.")
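
The message printed at the end of the install cell appears whether or not pip succeeded. As a hedged alternative, the sketch below checks which distributions are actually importable in the current environment using only the standard library. The helper name `installed_versions` is hypothetical, and the package list is simply copied from the pip commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip commands above
required = ["torch", "transformers", "datasets", "accelerate", "peft", "evaluate"]
for name, version in installed_versions(required).items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

Any entry reported as missing points to a pip command that needs to be rerun before continuing.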

In [None]:
# Import necessary libraries
import os
import json
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

# Transformers and related libraries
from transformers import (
    AutoTokenizer, 
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    DataCollatorForSeq2Seq,
    pipeline
)
from datasets import Dataset, DatasetDict, load_dataset
import evaluate

# PEFT for parameter-efficient fine-tuning
from peft import LoraConfig, get_peft_model, TaskType

# Vietnamese text processing
try:
    from pyvi import ViTokenizer
    from underthesea import word_tokenize as vn_tokenize
    print("✅ Vietnamese NLP libraries loaded")
except ImportError:
    print("⚠️ Vietnamese NLP libraries not available")

# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"🔧 Using device: {device}")
if torch.cuda.is_available():
    print(f"📊 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

## 2. Load and Preprocess Vietnamese-English Translation Dataset

We'll create a small sample dataset so the notebook runs quickly. For real training, substitute a larger Vietnamese-English parallel corpus with the same two columns.

In [None]:
# Create a sample Vietnamese-English dataset
sample_data = [
    {"vietnamese": "Xin chào! Tôi tên là Nam.", "english": "Hello! My name is Nam."},
    {"vietnamese": "Hôm nay thời tiết rất đẹp.", "english": "The weather is very nice today."},
    {"vietnamese": "Tôi đang học tiếng Anh.", "english": "I am learning English."},
    {"vietnamese": "Bạn có khỏe không?", "english": "How are you?"},
    {"vietnamese": "Cảm ơn bạn rất nhiều.", "english": "Thank you very much."},
    {"vietnamese": "Xin lỗi, tôi không hiểu.", "english": "Sorry, I don't understand."},
    {"vietnamese": "Bạn có thể giúp tôi được không?", "english": "Can you help me?"},
    {"vietnamese": "Tôi muốn đi du lịch Việt Nam.", "english": "I want to travel to Vietnam."},
    {"vietnamese": "Món phở này rất ngon.", "english": "This pho is very delicious."},
    {"vietnamese": "Chúc bạn một ngày tốt lành.", "english": "Have a nice day."},
    {"vietnamese": "Tôi yêu văn hóa Việt Nam.", "english": "I love Vietnamese culture."},
    {"vietnamese": "Hà Nội là thủ đô của Việt Nam.", "english": "Hanoi is the capital of Vietnam."},
    {"vietnamese": "Tôi thích ăn bánh mì.", "english": "I like to eat banh mi."},
    {"vietnamese": "Ngày mai tôi sẽ đi làm.", "english": "Tomorrow I will go to work."},
    {"vietnamese": "Gia đình tôi có bốn người.", "english": "My family has four people."},
    {"vietnamese": "Tôi học tại trường đại học.", "english": "I study at the university."},
    {"vietnamese": "Xe buýt sẽ đến lúc 8 giờ.", "english": "The bus will arrive at 8 o'clock."},
    {"vietnamese": "Tôi cần mua một cuốn sách.", "english": "I need to buy a book."},
    {"vietnamese": "Bác sĩ nói tôi khỏe mạnh.", "english": "The doctor says I'm healthy."},
    {"vietnamese": "Chúng tôi sẽ gặp nhau tối nay.", "english": "We will meet tonight."}
]

# Convert to DataFrame
df = pd.DataFrame(sample_data)
print(f"📊 Created sample dataset with {len(df)} translation pairs")

# Display first few examples
print("\n🔍 Sample translation pairs:")
for i in range(5):
    print(f"🇻🇳 Vietnamese: {df.iloc[i]['vietnamese']}")
    print(f"🇺🇸 English: {df.iloc[i]['english']}")
    print("-" * 50)
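
To move beyond the 20 hand-written pairs, a parallel corpus can be read from disk into the same `vietnamese`/`english` record layout. The following is a minimal sketch, assuming a UTF-8 file with one sentence pair per line and a tab between the Vietnamese and English sides. The function name `load_parallel_tsv` and the file path in the usage comment are hypothetical.

```python
import csv


def load_parallel_tsv(path):
    """Read (Vietnamese, English) pairs from a two-column tab-separated file."""
    pairs = []
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks in the sentences as literal characters
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip empty or malformed rows instead of failing on them
            if len(row) != 2:
                continue
            vi, en = (cell.strip() for cell in row)
            if vi and en:
                pairs.append({"vietnamese": vi, "english": en})
    return pairs


# Usage (hypothetical path):
# df = pd.DataFrame(load_parallel_tsv("data/vi_en.tsv"))
```

Because the records use the same keys as `sample_data`, the resulting DataFrame can be passed through the rest of the notebook unchanged.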

In [None]:
# Text preprocessing functions
import re

def clean_vietnamese_text(text):
    """Clean Vietnamese text."""
    # Collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    # Remove special characters. \w is Unicode-aware in Python 3, so Vietnamese
    # diacritics survive; basic punctuation is kept to match the English side.
    text = re.sub(r"[^\w\s.,!?'\u00C0-\u1EF9]", '', text)
    return text.strip()

def clean_english_text(text):
    """Clean English text."""
    # Collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    # Remove special characters (keep basic punctuation and apostrophes, e.g. "don't")
    text = re.sub(r"[^\w\s.,!?']", '', text)
    return text.strip()

# Apply preprocessing
df['vietnamese_clean'] = df['vietnamese'].apply(clean_vietnamese_text)
df['english_clean'] = df['english'].apply(clean_english_text)

# Analyze text lengths
df['vn_length'] = df['vietnamese_clean'].str.len()
df['en_length'] = df['english_clean'].str.len()
df['vn_words'] = df['vietnamese_clean'].str.split().str.len()
df['en_words'] = df['english_clean'].str.split().str.len()

# Display statistics
print("📈 Dataset Statistics:")
print(f"Average Vietnamese length: {df['vn_length'].mean():.1f} characters")
print(f"Average English length: {df['en_length'].mean():.1f} characters")
print(f"Average Vietnamese words: {df['vn_words'].mean():.1f} words")
print(f"Average English words: {df['en_words'].mean():.1f} words")

# Visualize length distributions
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].hist(df['vn_words'], bins=10, alpha=0.7, label='Vietnamese', color='blue')
axes[0].hist(df['en_words'], bins=10, alpha=0.7, label='English', color='red')
axes[0].set_xlabel('Number of Words')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Word Count Distribution')
axes[0].legend()

axes[1].scatter(df['vn_words'], df['en_words'], alpha=0.7)
axes[1].set_xlabel('Vietnamese Words')
axes[1].set_ylabel('English Words')
axes[1].set_title('Vietnamese vs English Word Count')

plt.tight_layout()
plt.show()
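
One preprocessing step the cleaning functions do not cover is Unicode normalization. Vietnamese letters can be stored either as a single precomposed code point or as a base letter followed by combining marks, and the two forms may tokenize differently even though they look identical. A minimal sketch, assuming NFC is the desired form; the helper name `normalize_vietnamese` is hypothetical.

```python
import re
import unicodedata


def normalize_vietnamese(text):
    """Compose characters to NFC form, then collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


# 'e' followed by combining dot-below and combining circumflex, plus a double space
decomposed = "Vie\u0323\u0302t  Nam"
composed = normalize_vietnamese(decomposed)
print(composed, len(decomposed), len(composed))
```

Applying this before `clean_vietnamese_text` would make text from different sources consistent before it reaches the tokenizer.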

## 3. Tokenize and Prepare Data for Model Input

We'll use a multilingual T5 (mT5) tokenizer to process our Vietnamese-English data.

In [None]:
# Choose model and load tokenizer
MODEL_NAME = "google/mt5-small"  # Start with small model for testing
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

print(f"🤖 Loaded tokenizer: {MODEL_NAME}")
print(f"📚 Vocabulary size: {len(tokenizer)}")

# Test tokenization on a sample
sample_vn = df.iloc[0]['vietnamese_clean']
sample_en = df.iloc[0]['english_clean']

print(f"\n🧪 Tokenization Test:")
print(f"Vietnamese: {sample_vn}")
vn_tokens = tokenizer.tokenize(sample_vn)
print(f"VN Tokens: {vn_tokens}")
print(f"VN Token IDs: {tokenizer.encode(sample_vn)}")

print(f"\nEnglish: {sample_en}")
en_tokens = tokenizer.tokenize(sample_en)
print(f"EN Tokens: {en_tokens}")
print(f"EN Token IDs: {tokenizer.encode(sample_en)}")

In [None]:
# Prepare data in T5 format: "translate Vietnamese to English: [VN_TEXT]" -> "[EN_TEXT]"
def create_t5_format(vietnamese_text, english_text):
    """Create T5 input-output format."""
    input_text = f"translate Vietnamese to English: {vietnamese_text}"
    target_text = english_text
    return input_text, target_text

# Create T5 formatted data
inputs = []
targets = []

for _, row in df.iterrows():
    input_text, target_text = create_t5_format(
        row['vietnamese_clean'], 
        row['english_clean']
    )
    inputs.append(input_text)
    targets.append(target_text)

# Create Hugging Face Dataset
dataset_dict = {
    'input_text': inputs,
    'target_text': targets,
    'vietnamese': df['vietnamese_clean'].tolist(),
    'english': df['english_clean'].tolist()
}

dataset = Dataset.from_dict(dataset_dict)

# Split into train/validation sets (80/20 split)
train_size = int(0.8 * len(dataset))
dataset = dataset.shuffle(seed=42)
train_dataset = dataset.select(range(train_size))
val_dataset = dataset.select(range(train_size, len(dataset)))

dataset_splits = DatasetDict({
    'train': train_dataset,
    'validation': val_dataset
})

print(f"📊 Dataset splits:")
print(f"  Training: {len(train_dataset)} examples")
print(f"  Validation: {len(val_dataset)} examples")

# Show example of formatted data
print(f"\n🔍 Example formatted data:")
example = train_dataset[0]
print(f"Input: {example['input_text']}")
print(f"Target: {example['target_text']}")

In [None]:
# Tokenization preprocessing function
max_input_length = 128
max_target_length = 128

def preprocess_function(examples):
    """Tokenize inputs and targets."""
    inputs = examples['input_text']
    targets = examples['target_text']
    
    # Tokenize inputs
    model_inputs = tokenizer(
        inputs,
        max_length=max_input_length,
        truncation=True,
        padding=False  # We'll pad dynamically during training
    )
    
    # Tokenize targets
    labels = tokenizer(
        targets,
        max_length=max_target_length,
        truncation=True,
        padding=False
    )
    
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Apply tokenization to datasets
tokenized_datasets = dataset_splits.map(
    preprocess_function,
    batched=True,
    remove_columns=dataset_splits["train"].column_names
)

print("✅ Tokenization completed")
print(f"📊 Tokenized dataset structure:")
print(tokenized_datasets)

# Show tokenized example
example = tokenized_datasets['train'][0]
print(f"\n🔍 Tokenized example:")
print(f"Input IDs length: {len(example['input_ids'])}")
print(f"Label IDs length: {len(example['labels'])}")
print(f"First few input tokens: {tokenizer.convert_ids_to_tokens(example['input_ids'][:10])}")
print(f"First few label tokens: {tokenizer.convert_ids_to_tokens(example['labels'][:10])}")
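
Because `preprocess_function` uses `padding=False`, every example keeps its own length and padding is deferred to batch time. The sketch below illustrates the idea in plain Python: inputs are padded with the pad token, labels with `-100` so the loss ignores those positions, and lengths can optionally be rounded up to a multiple. It is a conceptual illustration, not the Hugging Face collator itself; the function names and the toy token IDs are made up, and `pad_token_id=0` is an assumption that should be checked against `tokenizer.pad_token_id`.

```python
def pad_to_length(seq, length, pad_value):
    """Right-pad a list of IDs to the requested length."""
    return seq + [pad_value] * (length - len(seq))


def collate(features, pad_token_id=0, label_pad_id=-100, multiple_of=None):
    """Pad a batch of variable-length examples to a common length."""
    def target_length(seqs):
        longest = max(len(s) for s in seqs)
        if multiple_of:
            # Round up to the next multiple (ceiling division)
            longest = -(-longest // multiple_of) * multiple_of
        return longest

    in_len = target_length([f["input_ids"] for f in features])
    lab_len = target_length([f["labels"] for f in features])
    return {
        "input_ids": [pad_to_length(f["input_ids"], in_len, pad_token_id) for f in features],
        # 1 marks real tokens, 0 marks padding
        "attention_mask": [pad_to_length([1] * len(f["input_ids"]), in_len, 0) for f in features],
        # -100 marks label positions that the loss should skip
        "labels": [pad_to_length(f["labels"], lab_len, label_pad_id) for f in features],
    }


batch = collate([
    {"input_ids": [11, 12, 13, 1], "labels": [21, 1]},
    {"input_ids": [14, 1], "labels": [22, 23, 24, 1]},
])
print(batch)
```

Padding per batch rather than to a fixed 128 tokens keeps short sentences from wasting computation on pad positions.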

## 4. Load Pre-trained LLM Model and Tokenizer

We'll load the mT5 model and set it up for translation fine-tuning.

In [None]:
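# Load the pre-trained model in full precision (float32).
# Loading the weights as float16 and then training with mixed precision can fail
# with "Attempting to unscale FP16 gradients"; leave precision handling to the Trainer.
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)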

print(f"🤖 Loaded model: {MODEL_NAME}")
print(f"📊 Model parameters: {model.num_parameters():,}")

# Move model to device
model = model.to(device)

# Set up LoRA (Parameter-Efficient Fine-Tuning)
USE_PEFT = True  # Set to False for full fine-tuning

if USE_PEFT:
    peft_config = LoraConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,
        inference_mode=False,
        r=16,                    # LoRA rank
        lora_alpha=32,          # LoRA alpha
        lora_dropout=0.1,       # LoRA dropout
        target_modules=["q", "v"]  # Target attention modules
    )
    
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    print("✅ LoRA configuration applied")
else:
    print("📚 Using full fine-tuning")

# Setup data collator for dynamic padding
data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    model=model,
    label_pad_token_id=-100,
    pad_to_multiple_of=8 if torch.cuda.is_available() else None
)
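
The number of trainable parameters that LoRA adds follows from simple arithmetic: each adapted linear layer with shape `d_in -> d_out` gets two low-rank matrices with `r * d_in` and `d_out * r` entries. The sketch below applies that formula to the `r=16`, `target_modules=["q", "v"]` setting used above. The mT5-small dimensions in it are assumptions and should be verified against `model.config`; the function name is hypothetical.

```python
def lora_trainable_params(d_in, d_out, rank, num_modules):
    """Count LoRA parameters: A is (rank x d_in) and B is (d_out x rank) per adapted layer."""
    return num_modules * rank * (d_in + d_out)


# Assumed mT5-small shapes (verify against model.config)
d_model = 512        # hidden size
inner_dim = 6 * 64   # num_heads * d_kv
num_layers = 8       # layers per stack (encoder and decoder)

# q and v projections appear in encoder self-attention, decoder self-attention,
# and decoder cross-attention
num_modules = 2 * (num_layers * 3)

print(lora_trainable_params(d_model, inner_dim, rank=16, num_modules=num_modules))
```

Comparing this figure with the output of `model.print_trainable_parameters()` is a quick way to confirm that the adapter was attached to the intended modules.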

## 5. Configure Training Arguments for Fine-Tuning

Set up training parameters optimized for our translation task.

In [None]:
# Training arguments
output_dir = "./results/mt5-vietnamese-english"

training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    
    # Training configuration
    num_train_epochs=10,  # More epochs for small dataset
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    
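    # Learning rate and optimization
    learning_rate=3e-4,
    weight_decay=0.01,
    warmup_steps=2,  # the run has only ~20 optimizer steps (16 examples, effective batch of 8, 10 epochs)
    
    # Evaluation and saving
    # Intervals must be smaller than the total step count; otherwise evaluation and
    # checkpointing never run and load_best_model_at_end has nothing to choose from
    eval_steps=2,   # roughly once per epoch
    save_steps=2,   # must be a multiple of eval_steps when load_best_model_at_end=True
    logging_steps=1,
    evaluation_strategy="steps",  # named `eval_strategy` in recent transformers releases
    save_strategy="steps",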
    
    # Model selection
    load_best_model_at_end=True,
    metric_for_best_model="eval_bleu",
    greater_is_better=True,
    
    # Generation parameters
    predict_with_generate=True,
    generation_max_length=max_target_length,
    generation_num_beams=4,
    
    # Performance optimizations
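    # mT5 is prone to overflow (NaN loss) in fp16; use bf16 where the GPU supports it
    fp16=False,
    bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),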
    dataloader_num_workers=0,  # Reduce for small dataset
    remove_unused_columns=False,
    
    # Reporting
    report_to="none",  # Disable wandb and other integrations for this demo (None does not disable them)
    save_total_limit=3,
)

print("⚙️ Training arguments configured:")
print(f"  📁 Output directory: {output_dir}")
print(f"  🔢 Epochs: {training_args.num_train_epochs}")
print(f"  📏 Batch size: {training_args.per_device_train_batch_size}")
print(f"  📊 Learning rate: {training_args.learning_rate}")
print(f"  🎯 Evaluation steps: {training_args.eval_steps}")
print(f"  💾 Save steps: {training_args.save_steps}")
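
Step-based settings such as `warmup_steps`, `eval_steps`, and `save_steps` only make sense relative to the total number of optimizer updates in the run. The sketch below works through that arithmetic for the 16-example training split, assuming a single device. The function name is hypothetical, the Trainer's own rounding may differ slightly, and the interval values 2 and 50 in the loop are illustrative.

```python
import math


def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_devices=1):
    """Estimate how many optimizer updates a training run performs."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer update happens every `grad_accum_steps` batches
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return {
        "effective_batch_size": per_device_batch_size * num_devices * grad_accum_steps,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": updates_per_epoch * num_epochs,
    }


schedule = training_schedule(num_examples=16, per_device_batch_size=2,
                             grad_accum_steps=4, num_epochs=10)
print(schedule)

# An interval larger than the total number of updates never fires
for interval in (2, 50):
    fires = interval <= schedule["total_updates"]
    print(f"interval={interval}: {'fires' if fires else 'never fires'}")
```

Rerunning this calculation after changing the dataset size or batch settings shows whether the step-based intervals still fall inside the run.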

In [None]:
# Load evaluation metrics
bleu_metric = evaluate.load("bleu")
rouge_metric = evaluate.load("rouge")

def compute_metrics(eval_preds):
    """Compute BLEU and ROUGE metrics for evaluation."""
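    predictions, labels = eval_preds
    # Some models return a tuple; the generated token IDs are the first element
    if isinstance(predictions, tuple):
        predictions = predictions[0]
    
    # Generated sequences can be padded with -100 when batches are gathered;
    # replace it with the pad token so the IDs can be decoded
    predictions = np.where(predictions != -100, predictions, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    
    # Replace -100 in the labels as we can't decode them
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)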
    
    # Clean up text
    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [label.strip() for label in decoded_labels]
    
    # Compute BLEU score
    bleu_result = bleu_metric.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels]
    )
    
    # Compute ROUGE scores
    rouge_result = rouge_metric.compute(
        predictions=decoded_preds,
        references=decoded_labels
    )
    
    # Combine results
    result = {
        "bleu": bleu_result["bleu"],
        "rouge1": rouge_result["rouge1"],
        "rouge2": rouge_result["rouge2"],
        "rougeL": rouge_result["rougeL"]
    }
    
    # Add generation length info
    prediction_lens = [len(pred.split()) for pred in decoded_preds]
    result["gen_len"] = np.mean(prediction_lens)
    
    return result

print("📊 Evaluation metrics configured:")
print("  - BLEU score for translation quality")
print("  - ROUGE scores for text similarity")
print("  - Generation length statistics")
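
To make the BLEU numbers easier to interpret, it helps to see its core ingredient: clipped (modified) n-gram precision, where each n-gram in the candidate counts at most as often as it occurs in the reference. The sketch below computes only that component for a single sentence pair. It is an illustration, not the implementation used by the `evaluate` library, and it omits the brevity penalty and the geometric mean over n = 1..4.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it appears in the reference
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


reference = "the weather is very nice today".split()
candidate = "the weather is nice today".split()
print(modified_precision(candidate, reference, 1))
print(modified_precision(candidate, reference, 2))
```

Because standard BLEU multiplies the precisions for all four n-gram orders, a short sentence with no matching 4-gram scores zero without smoothing, which is worth keeping in mind when reading BLEU on a validation set of only a few short sentences.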

## 6. Fine-Tune the Model on Translation Data

Now we'll train our model on the Vietnamese-English dataset.

In [None]:
# Create trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("🚀 Trainer initialized!")
print(f"📊 Training dataset size: {len(tokenized_datasets['train'])}")
print(f"📊 Validation dataset size: {len(tokenized_datasets['validation'])}")

# Test the model before training
print("\n🧪 Testing model before training:")
test_input = "translate Vietnamese to English: Xin chào bạn!"
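input_ids = tokenizer.encode(test_input, return_tensors="pt").to(device)
with torch.no_grad():
    # Pass inputs by keyword: PEFT-wrapped seq2seq models expect generate() arguments as keywords
    outputs = model.generate(input_ids=input_ids, max_length=50, num_beams=4)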
    translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Before training: {translation}")

# Start training
print("\n🏋️ Starting training...")
print("This may take a few minutes...")

trainer.train()

In [None]:
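# Save the fine-tuned model
# With USE_PEFT=True, save_pretrained() writes only the LoRA adapter weights, not the full mT5 checkpoint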
model.save_pretrained(f"{output_dir}/final_model")
tokenizer.save_pretrained(f"{output_dir}/final_model")

print("💾 Model saved successfully!")

# Plot training history
training_history = pd.DataFrame(trainer.state.log_history)

if len(training_history) > 0:
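    # Separate training and evaluation logs. Per-step training loss is logged under
    # 'loss'; 'train_loss' appears only once, in the end-of-training summary row.
    has_train = 'loss' in training_history.columns
    has_eval = 'eval_loss' in training_history.columns
    train_logs = training_history[training_history['loss'].notna()] if has_train else pd.DataFrame()
    eval_logs = training_history[training_history['eval_loss'].notna()] if has_eval else pd.DataFrame()
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Training loss
    if len(train_logs) > 0:
        axes[0, 0].plot(train_logs['step'], train_logs['loss'], 'b-', label='Training Loss')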
        axes[0, 0].set_title('Training Loss')
        axes[0, 0].set_xlabel('Steps')
        axes[0, 0].set_ylabel('Loss')
        axes[0, 0].legend()
    
    # Evaluation loss
    if len(eval_logs) > 0:
        axes[0, 1].plot(eval_logs['step'], eval_logs['eval_loss'], 'r-', label='Validation Loss')
        axes[0, 1].set_title('Validation Loss')
        axes[0, 1].set_xlabel('Steps')
        axes[0, 1].set_ylabel('Loss')
        axes[0, 1].legend()
    
    # BLEU score
    if len(eval_logs) > 0 and 'eval_bleu' in eval_logs.columns:
        axes[1, 0].plot(eval_logs['step'], eval_logs['eval_bleu'], 'g-', label='BLEU Score')
        axes[1, 0].set_title('BLEU Score')
        axes[1, 0].set_xlabel('Steps')
        axes[1, 0].set_ylabel('BLEU')
        axes[1, 0].legend()
    
    # Learning rate
    if len(train_logs) > 0 and 'learning_rate' in train_logs.columns:
        axes[1, 1].plot(train_logs['step'], train_logs['learning_rate'], 'm-', label='Learning Rate')
        axes[1, 1].set_title('Learning Rate')
        axes[1, 1].set_xlabel('Steps')
        axes[1, 1].set_ylabel('Learning Rate')
        axes[1, 1].legend()
    
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No training history available for plotting")

print("✅ Training completed!")

## 7. Evaluate Model Performance on Validation Set

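Let's evaluate our fine-tuned model on the validation set, first with the Trainer's built-in evaluation and then with a manual generation loop that lets us inspect individual translations.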

In [None]:
# Evaluate the fine-tuned model
print("📊 Evaluating fine-tuned model...")
eval_results = trainer.evaluate()

print("\n🎯 Evaluation Results:")
for key, value in eval_results.items():
    if isinstance(value, float):
        print(f"  {key}: {value:.4f}")
    else:
        print(f"  {key}: {value}")

# Detailed evaluation on validation set
validation_data = dataset_splits['validation']
print(f"\n🔍 Detailed evaluation on {len(validation_data)} examples:")

predictions = []
references = []
vietnamese_texts = []

for i, example in enumerate(validation_data):
    vietnamese_text = example['vietnamese']
    english_text = example['english']
    
    # Create input for the model
    input_text = f"translate Vietnamese to English: {vietnamese_text}"
    inputs = tokenizer.encode(input_text, return_tensors="pt", max_length=max_input_length, truncation=True).to(device)
    
    # Generate translation
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs,  # keyword argument: PEFT-wrapped seq2seq models expect generate() inputs by keyword
            max_length=max_target_length,
            num_beams=4,
            early_stopping=True,
            do_sample=False
        )
    
    # Decode prediction
    prediction = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    predictions.append(prediction)
    references.append(english_text)
    vietnamese_texts.append(vietnamese_text)

# Calculate metrics
final_bleu = bleu_metric.compute(predictions=predictions, references=[[ref] for ref in references])
final_rouge = rouge_metric.compute(predictions=predictions, references=references)

print(f"\n📈 Final Metrics:")
print(f"  BLEU Score: {final_bleu['bleu']:.4f}")
print(f"  ROUGE-1: {final_rouge['rouge1']:.4f}")
print(f"  ROUGE-2: {final_rouge['rouge2']:.4f}")
print(f"  ROUGE-L: {final_rouge['rougeL']:.4f}")

# Show some examples
print(f"\n🌟 Translation Examples:")
print("=" * 80)
for i in range(min(5, len(predictions))):
    print(f"Vietnamese: {vietnamese_texts[i]}")
    print(f"Reference:  {references[i]}")
    print(f"Predicted:  {predictions[i]}")
    print("-" * 80)

## 8. Translate Sample Vietnamese Sentences

Let's test our fine-tuned model with some new Vietnamese sentences and see how well it performs.

In [None]:
# Create a translation function
def translate_vietnamese(text, max_length=128, num_beams=4):
    """Translate Vietnamese text to English using our fine-tuned model."""
    input_text = f"translate Vietnamese to English: {text}"
    inputs = tokenizer.encode(
        input_text, 
        return_tensors="pt", 
        max_length=max_input_length, 
        truncation=True
    ).to(device)
    
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs,  # keyword argument: PEFT-wrapped seq2seq models expect generate() inputs by keyword
            max_length=max_length,
            num_beams=num_beams,
            early_stopping=True,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id
        )
    
    translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return translation

# Test sentences (not in training data)
test_sentences = [
    "Tôi rất thích món phở Việt Nam.",
    "Hôm nay trời mưa to lắm.",
    "Bạn có thể cho tôi biết đường đi không?",
    "Chúng tôi sẽ gặp nhau vào cuối tuần.",
    "Tiếng Việt là ngôn ngữ rất hay.",
    "Tôi muốn học lập trình máy tính.",
    "Cuộc sống ở thành phố rất náo nhiệt.",
    "Gia đình là điều quan trọng nhất."
]

print("🇻🇳 ➡️ 🇺🇸 Vietnamese to English Translation Results")
print("=" * 70)

for i, vietnamese_text in enumerate(test_sentences, 1):
    translation = translate_vietnamese(vietnamese_text)
    print(f"{i}. Vietnamese: {vietnamese_text}")
    print(f"   English:   {translation}")
    print("-" * 70)

# Interactive translation (you can modify this cell to test your own sentences)
print("\n🚀 Try your own translations!")
print("Modify the sentence below and run the cell to see the translation:")

custom_sentence = "Tôi yêu Việt Nam và văn hóa truyền thống."
custom_translation = translate_vietnamese(custom_sentence)

print(f"\n🇻🇳 Vietnamese: {custom_sentence}")
print(f"🇺🇸 English: {custom_translation}")

## Conclusion and Next Steps

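🎉 **Congratulations!** You have worked through the complete workflow for fine-tuning a pre-trained language model for Vietnamese-to-English translation. With only 20 sentence pairs, treat the translations above as a smoke test of the pipeline rather than a measure of translation quality.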

### What we accomplished:
- ✅ Set up the environment with all necessary libraries
- ✅ Created and preprocessed a Vietnamese-English dataset
- ✅ Tokenized data using mT5 tokenizer
- ✅ Loaded and configured a pre-trained mT5 model
- ✅ Applied LoRA for parameter-efficient fine-tuning
- ✅ Trained the model with proper evaluation metrics
- ✅ Evaluated performance using BLEU and ROUGE scores
- ✅ Tested translations on new Vietnamese sentences

### Next Steps to Improve:

1. **Larger Dataset**: 
   - Use larger parallel corpora (OPUS, OpenSubtitles, etc.)
   - Implement the data download script in `scripts/download_data.py`

2. **Better Models**: 
   - Try larger models (mT5-base, mT5-large)
   - Experiment with other architectures (MarianMT, NLLB)

3. **Advanced Techniques**:
   - Data augmentation (back-translation)
   - Multi-task learning
   - Domain adaptation

4. **Production Deployment**:
   - Use the inference script in `src/inference/translate.py`
   - Deploy as a web service or API
   - Optimize for speed and memory usage
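
The back-translation idea listed under Advanced Techniques can be sketched as follows: take monolingual English sentences, translate them into Vietnamese with some existing English-to-Vietnamese system, and use the resulting pairs as extra training data for the Vietnamese-to-English direction. This is a minimal illustration under stated assumptions. `translate_en_to_vi` is a hypothetical callable supplied by the caller, and the dictionary-based stand-in below exists only so the example runs without a model.

```python
def back_translate(english_sentences, translate_en_to_vi):
    """Build synthetic (Vietnamese, English) pairs from monolingual English text."""
    pairs = []
    for en in english_sentences:
        vi = translate_en_to_vi(en).strip()
        # Drop empty outputs and sentences the reverse system merely copied
        if vi and vi != en:
            pairs.append({"vietnamese": vi, "english": en, "synthetic": True})
    return pairs


# Stand-in for a real English-to-Vietnamese model (hypothetical)
toy_lookup = {
    "Thank you very much.": "Cảm ơn bạn rất nhiều.",
    "How are you?": "Bạn có khỏe không?",
}


def toy_translate(sentence):
    # Unknown sentences are returned unchanged, which back_translate filters out
    return toy_lookup.get(sentence, sentence)


synthetic_pairs = back_translate(
    ["Thank you very much.", "How are you?", "Unseen sentence."],
    toy_translate,
)
print(synthetic_pairs)
```

The `synthetic` flag makes it possible to weight or filter machine-generated pairs separately from human-translated ones during training.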

### Resources:
- 📖 [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers)
- 🔧 [PEFT Documentation](https://huggingface.co/docs/peft)
- 📊 [Evaluate Library](https://huggingface.co/docs/evaluate)
- 🌐 [OPUS Parallel Corpora](http://opus.nlpl.eu/)

### Model Performance Tips:
- Monitor validation loss to avoid overfitting
- Experiment with different learning rates
- Try different beam search settings for generation
- Consider ensembling multiple models for better results

Happy translating! 🚀