# NLLB Amharic-English Translation Evaluation

This notebook evaluates three NLLB models (600M, 1.3B, and 3.3B) for bidirectional Amharic-English translation.

**Models to evaluate:**
- `facebook/nllb-200-distilled-600M` (600M parameters)
- `facebook/nllb-200-1.3B` (1.3B parameters)
- `facebook/nllb-200-3.3B` (3.3B parameters)

**Evaluation includes:**
- Simple daily-conversation sentences (greetings, questions)
- News and social sentences
- Complex sentences (multi-clause, conditionals)
- Domain-specific text (medical, technical, financial)
- Paragraphs (long-form text)
- Bidirectional translation (Amharic ↔ English)

**Metrics:**
- BLEU Score
- chrF Score
- Translation time per sentence
- Overall performance comparison
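
To give a feel for what a character n-gram metric such as chrF measures, here is a minimal sketch of a simplified sentence-level character F-score. It is an illustration only: the function names `char_ngrams` and `simple_chrf` are hypothetical, and the numbers it produces are not guaranteed to match sacreBLEU's `CHRF`, which the notebook actually uses for scoring.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after stripping all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score on a 0-100 scale (illustrative)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta=2 weights recall more heavily than precision
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


score_same = simple_chrf("How are you?", "How are you?")
score_diff = simple_chrf("How are you?", "What is your name?")
print(f"identical: {score_same:.1f}, different: {score_diff:.1f}")
```

Because it works on characters rather than words, this kind of score can still reward partially correct word forms, which is one reason chrF is often reported next to BLEU for morphologically rich languages such as Amharic.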


In [None]:
# Install required packages
%pip install torch transformers sentencepiece sacrebleu pandas numpy tqdm -q
%pip install accelerate bitsandbytes -q  # For memory optimization

import torch
import json
import time
import pandas as pd
from pathlib import Path
from typing import Dict, List
from sacrebleu import BLEU, CHRF
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from tqdm import tqdm

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")


In [None]:
# Define comprehensive test dataset
# This includes all original test pairs plus additional extensive examples

EXTENSIVE_TEST_DATASET = {
    "test_pairs": [
        # === DAILY CONVERSATION ===
        {"id": 1, "category": "daily_conversation", "amharic": "እንዴት ነህ?", "english": "How are you?"},
        {"id": 2, "category": "daily_conversation", "amharic": "ስምዎ ማን ነው?", "english": "What is your name?"},
        {"id": 3, "category": "daily_conversation", "amharic": "ኢትዮጵያ ጥሩ ናት።", "english": "Ethiopia is good."},
        {"id": 4, "category": "daily_conversation", "amharic": "ዛሬ ምን እየሰራን ነው?", "english": "What are we doing today?"},
        {"id": 5, "category": "daily_conversation", "amharic": "እባክህ ይሄንን አስተርጎምልኝ።", "english": "Please translate this for me."},
        {"id": 6, "category": "daily_conversation", "amharic": "እኔ አሚን ነኝ።", "english": "I am Amin."},
        {"id": 7, "category": "daily_conversation", "amharic": "በጣም ደስ ይለኛል።", "english": "I am very happy."},
        {"id": 8, "category": "daily_conversation", "amharic": "ምን እየሰራክ ነው?", "english": "What are you doing?"},
        
        # === NEWS/SOCIAL ===
        {"id": 9, "category": "news_social", "amharic": "የኢትዮጵያ መንግሥት አዲስ የጤና ፕሮግራሞችን አስጀመረ።", "english": "The Ethiopian government launched new health programs."},
        {"id": 10, "category": "news_social", "amharic": "በአዲስ አበባ የተካሄደው ስብሰባ በተሳካ ሁኔታ ተጠናቋል።", "english": "The meeting held in Addis Ababa was completed successfully."},
        {"id": 11, "category": "news_social", "amharic": "የትምህርት ሚኒስቴር የአዲስ ትምህርት ሥርዓት አስተዋወቀ።", "english": "The Ministry of Education announced a new education system."},
        {"id": 12, "category": "news_social", "amharic": "የኢትዮጵያ አየር መንገድ አዲስ አውሮፕላኖችን ገዝቷል።", "english": "Ethiopian Airlines bought new airplanes."},
        
        # === COMPLEX SENTENCES ===
        {"id": 13, "category": "complex", "amharic": "አንድ ሰው ስለራሱ ማሰብ እንዳይችል እና ስለሌሎች ሰዎች እንዲያስብ የሚያስተምረው ትምህርት አለ።", "english": "There is a lesson that teaches a person not to think about himself and to think about other people."},
        {"id": 14, "category": "complex", "amharic": "በኢትዮጵያ ውስጥ የሚኖሩ የተለያዩ የብሔረሰቦች የራሳቸውን ቋንቋ እና ባህላቸውን ለመጠበቅ መብት አላቸው።", "english": "Different ethnic groups living in Ethiopia have the right to preserve their own language and culture."},
        {"id": 15, "category": "complex", "amharic": "አንድ ሰው የተወሰነ ነገር ለማድረግ አስቸጋሪ ከሆነ በተለያዩ መንገዶች መሞከር አለበት።", "english": "When it is difficult for a person to do something specific, they should try different ways."},
        
        # === DOMAIN-SPECIFIC ===
        {"id": 16, "category": "domain_specific", "amharic": "የኮምፒዩተር ፕሮግራሚንግ የሚያስተምረው ትምህርት በአሁኑ ጊዜ በጣም አስፈላጊ ነው።", "english": "The lesson that teaches computer programming is very important now."},
        {"id": 17, "category": "domain_specific", "amharic": "የተፈጥሮ ሀብቶችን ለመጠበቅ እና ለመጠቀም አዲስ ቴክኖሎጂዎች አስፈላጊ ናቸው።", "english": "New technologies are necessary to preserve and use natural resources."},
        {"id": 18, "category": "domain_specific", "amharic": "የባንክ ስርዓት አሁን ከወደፊቱ የኢኮኖሚ እድገት አስፈላጊ ነው።", "english": "The banking system is important for future economic growth now."},
        {"id": 19, "category": "domain_specific", "amharic": "የሕክምና ሙያዎች ለሰዎች ጤና አስፈላጊ ናቸው።", "english": "Medical professions are important for people's health."},
        {"id": 20, "category": "domain_specific", "amharic": "የአርቲፊሻል ኢንተሊጀንስ ቴክኖሎጂ በፍጥነት እያደገ ነው።", "english": "Artificial intelligence technology is growing rapidly."},
        
        # === PARAGRAPHS (Long-form text) ===
        {"id": 21, "category": "paragraph", "amharic": "ኢትዮጵያ በአፍሪካ ምሥራቅ ውስጥ የምትገኝ ሀገር ናት። ይህች ሀገር ከጥንት ጀምሮ የተለያዩ ባህሎችን እና ቋንቋዎችን አላት። የኢትዮጵያ ህዝብ በጣም ሰላማዊ እና ሞገሳም ነው።", "english": "Ethiopia is a country located in East Africa. This country has various cultures and languages since ancient times. The Ethiopian people are very peaceful and hospitable."},
        
        {"id": 22, "category": "paragraph", "amharic": "የኮምፒዩተር ሳይንስ በዘመናዊው ዓለም ውስጥ በጣም አስፈላጊ የሆነ የመማሪያ ዘርፍ ነው። ብዙ የኮሌጆች ተማሪዎች ይህንን ሙያ ይመርጣሉ ምክንያቱም ከፍተኛ የስራ እድሎች ስላሉት ነው። የአይቲ ኢንዱስትሪ በፍጥነት እያደገ ነው።", "english": "Computer science is a very important field of study in the modern world. Many college students choose this profession because it has high job opportunities. The IT industry is growing rapidly."},
        
        {"id": 23, "category": "paragraph", "amharic": "የጤና አጠባበቅ ለሁሉም ሰው አስፈላጊ ነው። ለጤናማ ሕይወት መኖር ምግብን በትክክል መመገብ እና የሰውነት መለማለድ አስፈላጊ ናቸው። እንዲሁም ሙሉ እንቅልፍ መቀበል እና የጤና ክትትል መስራት አስፈላጊ ነው።", "english": "Health care is important for everyone. To live a healthy life, eating food properly and exercising the body are necessary. Also, getting full sleep and having health checkups are important."},
        
        {"id": 24, "category": "paragraph", "amharic": "የትምህርት ሥርዓት ለሀገር እድገት አስፈላጊ ነው። መንግሥት ለሁሉም ልጆች የትምህርት እድል ለመስጠት እየሰራ ነው። በአሁኑ ጊዜ የአዲስ ቴክኖሎጂ በትምህርት ውስጥ የሚጠቀም ሲሆን ይህ የትምህርት ሥርዓትን ያሻሽላል።", "english": "The education system is important for national development. The government is working to provide educational opportunities for all children. Currently, new technology is being used in education, and this improves the education system."},
        
        {"id": 25, "category": "paragraph", "amharic": "የኢኮኖሚ እድገት ለሀገር እድገት አስፈላጊ ነው። የንግድ እና የኢንዱስትሪ ኢንቬስትመንቶች ኢኮኖሚውን እያሻሻሉ ነው። መንግሥት የኢንቨስትመንት አካባቢዎችን ለመፍጠር እየሰራ ነው። ይህ የስራ እድሎችን ይፈጥራል እና የሕዝቡን የሕይወት ደረጃ ያሻሽላል።", "english": "Economic growth is important for national development. Business and industrial investments are improving the economy. The government is working to create investment opportunities. This creates job opportunities and improves the standard of living of the people."},
    ]
}

print(f"Total test pairs: {len(EXTENSIVE_TEST_DATASET['test_pairs'])}")
print(f"Categories: {set(p['category'] for p in EXTENSIVE_TEST_DATASET['test_pairs'])}")


In [None]:
# NLLB Translator Class
class NLLBTranslator:
    """NLLB Translation Model Wrapper"""
    
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model = None
        self.tokenizer = None
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.lang_codes = {
            "amharic": "amh_Ethi",
            "english": "eng_Latn"
        }
        
    def load_model(self):
        """Load the NLLB model from Hugging Face Hub"""
        print(f"Loading {self.model_name}...")
        print(f"Device: {self.device}")
        
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(self.model_name)
        self.model.eval()
        
        if self.device == "cuda":
            self.model = self.model.to(self.device)
            print(f"✓ Model loaded on GPU: {torch.cuda.get_device_name(0)}")
        else:
            print("✓ Model loaded on CPU")
    
    def translate_amh_to_eng(self, amharic_text: str) -> str:
        """Translate Amharic to English"""
        if self.model is None or self.tokenizer is None:
            raise ValueError("Model not loaded. Call load_model() first.")
        
        try:
            src_lang_code = self.lang_codes["amharic"]
            tgt_lang_code = self.lang_codes["english"]
            
            self.tokenizer.src_lang = src_lang_code
            vocab = self.tokenizer.get_vocab()
            
            tgt_lang_id = vocab.get(tgt_lang_code)
            if tgt_lang_id is None:
                raise ValueError(f"Could not find language code token: {tgt_lang_code}")
            
            inputs = self.tokenizer(amharic_text, return_tensors="pt")
            
            if self.device == "cuda":
                inputs = {k: v.to(self.device) for k, v in inputs.items()}
            
            with torch.no_grad():
                generated_tokens = self.model.generate(
                    **inputs,
                    forced_bos_token_id=tgt_lang_id,
                    max_length=512,
                    num_beams=5,
                    length_penalty=1.0
                )
            
            translation = self.tokenizer.batch_decode(
                generated_tokens, 
                skip_special_tokens=True
            )[0]
            
            return translation
        except Exception as e:
            return f"[Error: {str(e)}]"
    
    def translate_eng_to_amh(self, english_text: str) -> str:
        """Translate English to Amharic"""
        if self.model is None or self.tokenizer is None:
            raise ValueError("Model not loaded. Call load_model() first.")
        
        try:
            src_lang_code = self.lang_codes["english"]
            tgt_lang_code = self.lang_codes["amharic"]
            
            self.tokenizer.src_lang = src_lang_code
            vocab = self.tokenizer.get_vocab()
            
            tgt_lang_id = vocab.get(tgt_lang_code)
            if tgt_lang_id is None:
                raise ValueError(f"Could not find language code token: {tgt_lang_code}")
            
            inputs = self.tokenizer(english_text, return_tensors="pt")
            
            if self.device == "cuda":
                inputs = {k: v.to(self.device) for k, v in inputs.items()}
            
            with torch.no_grad():
                generated_tokens = self.model.generate(
                    **inputs,
                    forced_bos_token_id=tgt_lang_id,
                    max_length=512,
                    num_beams=5,
                    length_penalty=1.0
                )
            
            translation = self.tokenizer.batch_decode(
                generated_tokens, 
                skip_special_tokens=True
            )[0]
            
            return translation
        except Exception as e:
            return f"[Error: {str(e)}]"

print("✓ Translator class defined")


In [None]:
# Evaluation Function
def evaluate_model(translator, test_pairs, direction="amh_to_eng"):
    """Evaluate translation quality for a given direction"""
    bleu = BLEU()
    chrf = CHRF()
    
    sources = []
    references = []
    hypotheses = []
    translation_times = []
    
    print(f"\n{'='*70}")
    # Replace the whole "_TO_" separator; replacing each "_" would print "AMH → TO → ENG"
    print(f"Evaluating {direction.upper().replace('_TO_', ' → ')}")
    print(f"{'='*70}")
    
    for pair in tqdm(test_pairs, desc="Translating"):
        if direction == "amh_to_eng":
            source = pair["amharic"]
            reference = pair["english"]
            start_time = time.time()
            translation = translator.translate_amh_to_eng(source)
            translation_time = time.time() - start_time
        else:  # eng_to_amh
            source = pair["english"]
            reference = pair["amharic"]
            start_time = time.time()
            translation = translator.translate_eng_to_amh(source)
            translation_time = time.time() - start_time
        
        sources.append(source)
        references.append(reference)
        hypotheses.append(translation)
        translation_times.append(translation_time)
    
    # Calculate metrics
    bleu_score = bleu.corpus_score(hypotheses, [references]).score
    chrf_score = chrf.corpus_score(hypotheses, [references]).score
    
    results = {
        "direction": direction,
        "model": translator.model_name,
        "metrics": {
            "bleu": bleu_score,
            "chrf": chrf_score
        },
        "performance": {
            "total_time": sum(translation_times),
            "avg_time_per_sentence": sum(translation_times) / len(translation_times),
            "sentences_per_second": len(test_pairs) / sum(translation_times)
        },
        "translations": [
            {
                "id": pair["id"],
                "category": pair["category"],
                "source": src,
                "reference": ref,
                "translation": hyp,
                "time": t
            }
            for pair, src, ref, hyp, t in zip(test_pairs, sources, references, hypotheses, translation_times)
        ]
    }
    
    print(f"\nBLEU Score: {bleu_score:.2f}")
    print(f"chrF Score: {chrf_score:.2f}")
    print(f"Avg Time/Sentence: {results['performance']['avg_time_per_sentence']:.2f}s")
    
    return results

print("✓ Evaluation function defined")


## Model Evaluation

We'll evaluate all three models sequentially (since running them in parallel would require too much GPU memory).
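
The load, evaluate, unload pattern can be sketched on its own as below. This is a minimal illustration, not the notebook's actual loop: `evaluate_sequentially`, `fake_load`, and `fake_evaluate` are hypothetical names, and the fake callables stand in for real model loading so the sketch runs without downloading anything. On a GPU, the `release` step would typically also call `torch.cuda.empty_cache()`.

```python
import gc
from typing import Any, Callable, Dict, List, Tuple


def evaluate_sequentially(
    model_names: List[str],
    load: Callable[[str], Any],
    evaluate: Callable[[Any], Dict],
    release: Callable[[], object] = gc.collect,
) -> Tuple[Dict[str, Dict], Dict[str, str]]:
    """Evaluate models one at a time, releasing each before loading the next."""
    results: Dict[str, Dict] = {}
    errors: Dict[str, str] = {}
    for name in model_names:
        model = None
        try:
            model = load(name)
            results[name] = evaluate(model)
        except Exception as exc:  # one failing model should not stop the rest
            errors[name] = str(exc)
        finally:
            del model   # drop our reference even if evaluation raised
            release()   # e.g. gc.collect(), plus a CUDA cache clear on GPU
    return results, errors


# Hypothetical stand-ins so the sketch runs without any real model.
events: List[str] = []


def fake_load(name: str) -> dict:
    events.append(f"load {name}")
    if name == "broken":
        raise RuntimeError("download failed")
    return {"name": name}


def fake_evaluate(model: dict) -> Dict:
    events.append(f"evaluate {model['name']}")
    return {"model": model["name"]}


results, errors = evaluate_sequentially(
    ["small", "broken", "large"], fake_load, fake_evaluate,
    release=lambda: events.append("release"),
)
print(results, errors)
```

Putting the cleanup in `finally` means memory is released after every model, including one that fails partway through, so the next model always starts from a clean state.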


In [None]:
# Models to evaluate
MODELS = [
    "facebook/nllb-200-distilled-600M",
    "facebook/nllb-200-1.3B",
    "facebook/nllb-200-3.3B"
]

test_pairs = EXTENSIVE_TEST_DATASET["test_pairs"]
all_results = {}

# Evaluate each model
for model_name in MODELS:
    print(f"\n{'='*70}")
    print(f"EVALUATING MODEL: {model_name}")
    print(f"{'='*70}")
    
    try:
        # Load model
        translator = NLLBTranslator(model_name)
        translator.load_model()
        
        # Clear GPU cache
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        
        # Evaluate both directions
        amh_to_eng_results = evaluate_model(translator, test_pairs, "amh_to_eng")
        
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        
        eng_to_amh_results = evaluate_model(translator, test_pairs, "eng_to_amh")
        
        all_results[model_name] = {
            "amh_to_eng": amh_to_eng_results,
            "eng_to_amh": eng_to_amh_results
        }
        
        # Unload model to free memory
        del translator.model
        del translator.tokenizer
        del translator
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        
        print(f"\n✓ Completed evaluation for {model_name}")
        
    except Exception as e:
        print(f"\n✗ Error evaluating {model_name}: {e}")
        import traceback
        traceback.print_exc()
        
    print("\n" + "="*70 + "\n")

print("\n✓ All evaluations completed!")


## Results Summary

Generate formatted results for easy copying to Google Docs.
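
Since every entry in a result's `translations` list carries a `category` and a `time`, a per-category breakdown is easy to derive. The sketch below assumes that record shape; `mean_time_by_category` is a hypothetical helper, and the timings in `sample` are made-up placeholders for illustration, not measured results.

```python
from collections import defaultdict
from typing import Dict, List


def mean_time_by_category(translations: List[dict]) -> Dict[str, float]:
    """Average the per-sentence translation time within each category."""
    buckets = defaultdict(list)
    for t in translations:
        buckets[t["category"]].append(t["time"])
    return {cat: sum(times) / len(times) for cat, times in buckets.items()}


# Placeholder records in the same shape as results["translations"];
# the timings are invented for illustration only.
sample = [
    {"id": 1, "category": "daily_conversation", "time": 0.5},
    {"id": 2, "category": "daily_conversation", "time": 1.5},
    {"id": 21, "category": "paragraph", "time": 4.0},
]

per_category = mean_time_by_category(sample)
for category, seconds in per_category.items():
    print(f"{category}: {seconds:.2f}s")
```

With real results, the same function could be called on `all_results[model_name]["amh_to_eng"]["translations"]`.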


In [None]:
# Create comprehensive results summary
print("="*80)
print("COMPREHENSIVE EVALUATION RESULTS")
print("="*80)
print("\n")

# Summary table
summary_data = []
for model_name in MODELS:
    if model_name in all_results:
        amh_to_eng = all_results[model_name]["amh_to_eng"]
        eng_to_amh = all_results[model_name]["eng_to_amh"]
        
        summary_data.append({
            "Model": model_name.replace("facebook/nllb-200-", ""),
            "Amh→Eng BLEU": f"{amh_to_eng['metrics']['bleu']:.2f}",
            "Amh→Eng chrF": f"{amh_to_eng['metrics']['chrf']:.2f}",
            "Eng→Amh BLEU": f"{eng_to_amh['metrics']['bleu']:.2f}",
            "Eng→Amh chrF": f"{eng_to_amh['metrics']['chrf']:.2f}",
            # Timing is taken from the Amharic → English run only
            "Amh→Eng Avg Time/Sent": f"{amh_to_eng['performance']['avg_time_per_sentence']:.2f}s"
        })

summary_df = pd.DataFrame(summary_data)
print(summary_df.to_string(index=False))
print("\n" + "="*80 + "\n")


In [None]:
# Detailed results formatted for Google Docs
def format_results_for_docs(all_results, test_pairs):
    """Format results in a way that's easy to copy to Google Docs"""
    
    output = []
    output.append("="*80)
    output.append("NLLB AMHARIC-ENGLISH TRANSLATION EVALUATION RESULTS")
    output.append("="*80)
    output.append("")
    output.append(f"Test Dataset: {len(test_pairs)} sentence pairs")
    # Sort the categories so the report is identical from run to run
    output.append(f"Categories: {', '.join(sorted(set(p['category'] for p in test_pairs)))}")
    output.append("")
    
    # Summary table
    output.append("SUMMARY METRICS")
    output.append("-"*80)
    # Right-align the headers with the same widths as the value columns below
    output.append(f"{'Model':<30} {'Amh→Eng BLEU':>14} {'Amh→Eng chrF':>14} {'Eng→Amh BLEU':>14} {'Eng→Amh chrF':>14} {'Avg Time':>10}")
    output.append("-"*80)
    
    for model_name in MODELS:
        if model_name in all_results:
            amh_to_eng = all_results[model_name]["amh_to_eng"]
            eng_to_amh = all_results[model_name]["eng_to_amh"]
            
            model_short = model_name.replace("facebook/nllb-200-", "")
            output.append(
                f"{model_short:<30} "
                f"{amh_to_eng['metrics']['bleu']:>14.2f} "
                f"{amh_to_eng['metrics']['chrf']:>14.2f} "
                f"{eng_to_amh['metrics']['bleu']:>14.2f} "
                f"{eng_to_amh['metrics']['chrf']:>14.2f} "
                f"{amh_to_eng['performance']['avg_time_per_sentence']:>9.2f}s"
            )
    
    output.append("")
    output.append("="*80)
    output.append("")
    
    # Detailed results for each model
    for model_name in MODELS:
        if model_name not in all_results:
            continue
            
        model_short = model_name.replace("facebook/nllb-200-", "")
        output.append(f"\n{'='*80}")
        output.append(f"MODEL: {model_short}")
        output.append(f"{'='*80}\n")
        
        # Amharic to English
        amh_to_eng = all_results[model_name]["amh_to_eng"]
        output.append(f"AMHARIC → ENGLISH")
        output.append(f"BLEU Score: {amh_to_eng['metrics']['bleu']:.2f}")
        output.append(f"chrF Score: {amh_to_eng['metrics']['chrf']:.2f}")
        output.append(f"Average Time per Sentence: {amh_to_eng['performance']['avg_time_per_sentence']:.2f}s")
        output.append("")
        
        # Sample translations by category
        for category in ["daily_conversation", "news_social", "complex", "domain_specific", "paragraph"]:
            # Each translation record already stores its category
            category_translations = [
                t for t in amh_to_eng['translations']
                if t['category'] == category
            ]
            
            if category_translations:
                output.append(f"\n{category.upper().replace('_', ' ')} ({len(category_translations)} examples):")
                output.append("-"*80)
                for t in category_translations[:5]:  # Show first 5
                    output.append(f"\nExample {t['id']}:")
                    output.append(f"  Source (Amharic): {t['source']}")
                    output.append(f"  Reference (English): {t['reference']}")
                    output.append(f"  Translation: {t['translation']}")
                    output.append(f"  Time: {t['time']:.2f}s")
                if len(category_translations) > 5:
                    output.append(f"  ... and {len(category_translations) - 5} more examples")
        
        output.append("\n" + "-"*80)
        
        # English to Amharic
        eng_to_amh = all_results[model_name]["eng_to_amh"]
        output.append(f"\nENGLISH → AMHARIC")
        output.append(f"BLEU Score: {eng_to_amh['metrics']['bleu']:.2f}")
        output.append(f"chrF Score: {eng_to_amh['metrics']['chrf']:.2f}")
        output.append(f"Average Time per Sentence: {eng_to_amh['performance']['avg_time_per_sentence']:.2f}s")
        output.append("")
        
        # Sample translations
        for category in ["daily_conversation", "news_social", "complex", "domain_specific", "paragraph"]:
            # Each translation record already stores its category
            category_translations = [
                t for t in eng_to_amh['translations']
                if t['category'] == category
            ]
            
            if category_translations:
                output.append(f"\n{category.upper().replace('_', ' ')} ({len(category_translations)} examples):")
                output.append("-"*80)
                for t in category_translations[:5]:  # Show first 5
                    output.append(f"\nExample {t['id']}:")
                    output.append(f"  Source (English): {t['source']}")
                    output.append(f"  Reference (Amharic): {t['reference']}")
                    output.append(f"  Translation: {t['translation']}")
                    output.append(f"  Time: {t['time']:.2f}s")
                if len(category_translations) > 5:
                    output.append(f"  ... and {len(category_translations) - 5} more examples")
        
        output.append("\n" + "="*80 + "\n")
    
    return "\n".join(output)

# Generate formatted output
formatted_output = format_results_for_docs(all_results, test_pairs)
print(formatted_output)


In [None]:
# Save results to a text file for easy copying
with open("evaluation_results.txt", "w", encoding="utf-8") as f:
    f.write(formatted_output)

print("✓ Results saved to 'evaluation_results.txt'")
print("You can download this file and copy its contents to Google Docs.")


In [None]:
# Create a comparison table showing all models side by side
comparison_data = []

for pair in test_pairs[:10]:  # Show first 10 examples
    row = {
        "ID": pair["id"],  # use the dataset ID so rows match the detailed report
        "Category": pair["category"],
        "Source (Amharic)": pair["amharic"][:50] + "..." if len(pair["amharic"]) > 50 else pair["amharic"],
        "Reference (English)": pair["english"]
    }
    
    # Add translations from each model
    for model_name in MODELS:
        if model_name in all_results:
            model_short = model_name.replace("facebook/nllb-200-", "")
            amh_to_eng = all_results[model_name]["amh_to_eng"]
            translation = next(
                (t["translation"] for t in amh_to_eng["translations"] if t["id"] == pair["id"]),
                "N/A"
            )
            row[f"{model_short} Translation"] = translation[:100] + "..." if len(translation) > 100 else translation
    
    comparison_data.append(row)

comparison_df = pd.DataFrame(comparison_data)
print("\n" + "="*80)
print("SIDE-BY-SIDE COMPARISON (First 10 Examples)")
print("="*80)
print(comparison_df.to_string(index=False))


## How to Use This Notebook on Google Colab

### Step 1: Open the Notebook
1. Go to [Google Colab](https://colab.research.google.com/)
2. Click "File" → "Open Notebook"
3. Select "GitHub" tab
4. Enter your GitHub repository URL: `https://github.com/YOUR_USERNAME/YOUR_REPO`
5. Select `amh_translation.ipynb`

### Step 2: Enable GPU Runtime
1. Click "Runtime" → "Change runtime type"
2. Select "GPU" from the Hardware accelerator dropdown
3. Click "Save"

### Step 3: Run All Cells
1. Click "Runtime" → "Run all" (or press Ctrl+F9)
2. The notebook will:
   - Install required packages
   - Load and evaluate all three models sequentially
   - Generate formatted results

### Step 4: Copy Results to Google Docs
1. After execution completes, scroll to the results section
2. Select the formatted text output
3. Copy (Ctrl+C)
4. Paste into your Google Doc

**Note:** Downloading the models may take 10-30 minutes on the first run. Later runs in the same Colab session are faster because the models are cached on the runtime's disk.

### Expected Runtime:
- **600M model**: ~5-10 minutes
- **1.3B model**: ~10-15 minutes  
- **3.3B model**: ~15-20 minutes
- **Total**: ~30-45 minutes for complete evaluation

### Memory Requirements:
- The Colab free tier typically provides ~15 GB of GPU memory, which is enough to hold one model at a time (the 3.3B model is the tightest fit)
- Models are evaluated sequentially and GPU memory is cleared between models, so only one model is loaded at any time
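
As a rough sanity check on these numbers, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. The sketch below uses the nominal sizes from the model names (600M, 1.3B, 3.3B), which are assumptions rather than exact parameter counts, and `weight_memory_gb` is a hypothetical helper. The estimate covers weights only; activations, beam-search state, and the CUDA context add to it.

```python
# Bytes used to store one parameter at each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Nominal parameter counts taken from the model names (approximate)
MODEL_PARAMS = {
    "nllb-200-distilled-600M": 600e6,
    "nllb-200-1.3B": 1.3e9,
    "nllb-200-3.3B": 3.3e9,
}


def weight_memory_gb(num_params: float, dtype: str = "float32") -> float:
    """Estimate the memory taken by model weights alone, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in MODEL_PARAMS.items():
    fp32 = weight_memory_gb(params)
    fp16 = weight_memory_gb(params, "float16")
    print(f"{name}: ~{fp32:.1f} GB in float32, ~{fp16:.1f} GB in float16")
```

By this estimate the 3.3B model needs about 13 GB in float32 for weights alone, so if it does not fit, loading it in half precision is one option worth trying.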


<a href="https://colab.research.google.com/github/pyyas-star/colab/blob/main/amh_translation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>