# Week 37: Language Modeling for Multilingual Reading Comprehension


## Setup Instructions

### Virtual Environment Setup

This notebook uses a dedicated Python virtual environment. To set it up:

```powershell
# Navigate to the W37 folder
cd "Final Delivery/W37"

# Create virtual environment
python -m venv venv

# Activate the virtual environment (Windows PowerShell)
.\venv\Scripts\Activate.ps1

# Upgrade pip
python -m pip install --upgrade pip

# Install all required packages
pip install -r requirements.txt

# Register the kernel with Jupyter
python -m ipykernel install --user --name w37_nlp --display-name "Python (W37 NLP)"
```

**Important:** After setup, select the **"Python (W37 NLP)"** kernel for this notebook using the kernel picker in VS Code.

In [1]:
# Environment detection for compatibility with Google Colab and local environments
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("Running in Google Colab")
    from google.colab import drive
    drive.mount('/content/drive')
else:
    print("Running in local environment with dedicated virtual environment")
    print("Virtual environment: Final Delivery/W37/venv")

Running in local environment with dedicated virtual environment
Virtual environment: Final Delivery/W37/venv


### Library Imports

All required packages are pre-installed in the virtual environment.

In [2]:
# Core Python libraries
import os
import math
import pickle
from pathlib import Path
from collections import Counter, defaultdict
from typing import List, Tuple, Dict, Any

# Data manipulation and analysis
import pandas as pd
import numpy as np

# Machine Learning and Deep Learning
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# PyTorch device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"PyTorch device: {device}")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("✓ All libraries imported successfully")

PyTorch device: cpu
✓ All libraries imported successfully


### Dataset Configuration & Loading

In [8]:
# Dataset configuration
LANGUAGES = ["ar", "ko", "te"]
LANGUAGE_NAMES = {"ar": "Arabic", "ko": "Korean", "te": "Telugu"}

# Dataset paths (auto-detect Colab vs local)
if IN_COLAB:
    BASE_DIR = Path("/content/drive/MyDrive/Colab_Notebooks/NLP/tydi_xor_rc")
else:
    # Local path - dataset is in parent directory
    BASE_DIR = Path("../tydi_xor_rc")

TRAIN_PATH = BASE_DIR / "train.parquet"
VAL_PATH = BASE_DIR / "validation.parquet"

print(f"Dataset directory: {BASE_DIR.resolve()}")

# Load datasets
if TRAIN_PATH.exists() and VAL_PATH.exists():
    df_train = pd.read_parquet(TRAIN_PATH)
    df_val = pd.read_parquet(VAL_PATH)

    # Filter for target languages (Arabic, Korean, Telugu)
    df_train = df_train[df_train["lang"].isin(LANGUAGES)].copy()
    df_val = df_val[df_val["lang"].isin(LANGUAGES)].copy()

    print(f"Training examples: {len(df_train):,}")
    print(f"Validation examples: {len(df_val):,}")
    print("✓ Dataset loaded successfully")
else:
    print("⚠ Dataset files not found. Please ensure the data is downloaded.")
    print(f"  Expected location: {BASE_DIR.resolve()}")
    df_train = None
    df_val = None

Dataset directory: C:\Users\mateu\Documents\UNIVERSIDADE\Mestrado\NLP\NLP_Assignments_25-26\Final Delivery\tydi_xor_rc
Training examples: 6,335
Validation examples: 1,155
✓ Dataset loaded successfully


### BPEmb Model Loading

Loading BPEmb (Byte-Pair Encoding) models for multilingual subword tokenization.

**Note:** BPEmb requires compatible versions of numpy, scipy, and gensim. The virtual environment has these dependencies properly configured (numpy==1.26.4, scipy==1.12.0, gensim==4.3.3).

In [4]:
from bpemb import BPEmb

# BPEmb configuration
BPE_VOCAB_SIZE = 50000
BPE_EMBEDDING_DIM = 100

# Load BPEmb models for all target languages
print("Loading BPEmb models...")
bpemb_models = {}

for lang_code in LANGUAGES + ['en']:
    lang_name = LANGUAGE_NAMES.get(lang_code, 'English')
    print(f"  Loading {lang_name} ({lang_code})...")

    try:
        bpemb_models[lang_code] = BPEmb(lang=lang_code, vs=BPE_VOCAB_SIZE, dim=BPE_EMBEDDING_DIM)
        print(f"    ✓ {lang_name} loaded successfully")
    except Exception as e:
        print(f"    ✗ Unable to load {lang_name}. Error: {e}")

print(f"\n✓ Setup complete - {len(bpemb_models)}/4 models loaded")
print("✓ Ready for Week 37 tasks")

Loading BPEmb models...
  Loading Arabic (ar)...
downloading https://nlp.h-its.org/bpemb/ar/ar.wiki.bpe.vs50000.d100.w2v.bin.tar.gz


100%|██████████| 19006950/19006950 [00:01<00:00, 12927153.19B/s]



    ✓ Arabic loaded successfully
  Loading Korean (ko)...
downloading https://nlp.h-its.org/bpemb/ko/ko.wiki.bpe.vs50000.model


100%|██████████| 1145092/1145092 [00:00<00:00, 8815344.15B/s]



downloading https://nlp.h-its.org/bpemb/ko/ko.wiki.bpe.vs50000.d100.w2v.bin.tar.gz


100%|██████████| 18986830/18986830 [00:01<00:00, 12881787.48B/s]



    ✓ Korean loaded successfully
  Loading Telugu (te)...
downloading https://nlp.h-its.org/bpemb/te/te.wiki.bpe.vs50000.model


100%|██████████| 1616305/1616305 [00:00<00:00, 11513459.33B/s]



downloading https://nlp.h-its.org/bpemb/te/te.wiki.bpe.vs50000.d100.w2v.bin.tar.gz


100%|██████████| 19080935/19080935 [00:01<00:00, 17083933.69B/s]



    ✓ Telugu loaded successfully
  Loading English (en)...
downloading https://nlp.h-its.org/bpemb/en/en.wiki.bpe.vs50000.model


100%|██████████| 1100587/1100587 [00:00<00:00, 9431299.01B/s]



downloading https://nlp.h-its.org/bpemb/en/en.wiki.bpe.vs50000.d100.w2v.bin.tar.gz


100%|██████████| 18972246/18972246 [00:01<00:00, 14824221.28B/s]



    ✓ English loaded successfully

✓ Setup complete - 4/4 models loaded
✓ Ready for Week 37 tasks


---

## Tasks

### Task (a): Bigram Language Model with Laplace Smoothing

Implement statistical n-gram models using bigram probabilities with Laplace smoothing for unseen word combinations.

In [5]:
def tokenize_text(text, bpemb_model):
    """Tokenize text using BPEmb model."""
    try:
        subwords = bpemb_model.encode(text.lower())
        return ['<s>'] + subwords + ['</s>']
    except:
        return ['<s>', '<unk>', '</s>']

class BigramLanguageModel:
    """Bigram language model with Laplace smoothing."""

    def __init__(self):
        self.unigram_counts = Counter()
        self.bigram_counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()
        self.vocab_size = 0

    def train(self, texts, bpemb_model):
        """Train model on text data."""
        print(f"Training bigram model on {len(texts)} texts...")

        for text in texts:
            tokens = tokenize_text(text, bpemb_model)

            # Count unigrams and bigrams
            for token in tokens:
                self.vocab.add(token)
                self.unigram_counts[token] += 1

            for i in range(len(tokens) - 1):
                w1, w2 = tokens[i], tokens[i + 1]
                self.bigram_counts[w1][w2] += 1

        self.vocab_size = len(self.vocab)
        print(f"  Vocabulary size: {self.vocab_size:,}")

    def get_probability(self, w1, w2):
        """Get P(w2|w1) with Laplace smoothing."""
        numerator = self.bigram_counts[w1][w2] + 1
        denominator = self.unigram_counts[w1] + self.vocab_size
        return numerator / denominator if denominator > 0 else 1.0 / self.vocab_size

    def calculate_perplexity(self, test_texts, bpemb_model):
        """Calculate perplexity on test data."""
        total_log_prob = 0.0
        total_bigrams = 0

        for text in test_texts:
            tokens = tokenize_text(text, bpemb_model)

            for i in range(len(tokens) - 1):
                w1, w2 = tokens[i], tokens[i + 1]
                prob = self.get_probability(w1, w2)
                total_log_prob += math.log(prob + 1e-10)
                total_bigrams += 1

        if total_bigrams == 0:
            return float('inf')

        avg_log_prob = total_log_prob / total_bigrams
        return math.exp(-avg_log_prob)

# Train bigram models for all languages
print("BIGRAM LANGUAGE MODELS")
print("=" * 50)

bigram_models = {}
bigram_results = {}

for lang_code in LANGUAGES + ['en']:
    lang_name = LANGUAGE_NAMES.get(lang_code, 'English')
    print(f"\n{lang_name} ({lang_code}):")

    # Get data
    if lang_code == 'en':
        train_texts = df_train.sample(n=min(2000, len(df_train)), random_state=42)['context'].tolist()
        val_texts = df_val.sample(n=min(500, len(df_val)), random_state=42)['context'].tolist()
    else:
        train_texts = df_train[df_train['lang'] == lang_code]['question'].tolist()[:2000]
        val_texts = df_val[df_val['lang'] == lang_code]['question'].tolist()

    # Train model
    model = BigramLanguageModel()
    model.train(train_texts, bpemb_models[lang_code])

    # Evaluate
    val_perplexity = model.calculate_perplexity(val_texts[:100], bpemb_models[lang_code])

    bigram_models[lang_code] = model
    bigram_results[lang_code] = val_perplexity

    print(f"  Validation perplexity: {val_perplexity:.2f}")

# Summary
print(f"\nBIGRAM RESULTS SUMMARY:")
for lang_code in LANGUAGES + ['en']:
    lang_name = LANGUAGE_NAMES.get(lang_code, 'English')
    print(f"{lang_name}: {bigram_results[lang_code]:.2f}")

BIGRAM LANGUAGE MODELS

Arabic (ar):
Training bigram model on 2000 texts...
  Vocabulary size: 5,027
  Validation perplexity: 760.67

Korean (ko):
Training bigram model on 2000 texts...
  Vocabulary size: 3,904
  Validation perplexity: 348.22

Telugu (te):
Training bigram model on 1355 texts...
  Vocabulary size: 2,682
  Validation perplexity: 596.78

English (en):
Training bigram model on 2000 texts...
  Vocabulary size: 3,904
  Validation perplexity: 348.22

Telugu (te):
Training bigram model on 1355 texts...
  Vocabulary size: 2,682
  Validation perplexity: 596.78

English (en):
Training bigram model on 2000 texts...
  Vocabulary size: 20,782
  Validation perplexity: 4594.44

BIGRAM RESULTS SUMMARY:
Arabic: 760.67
Korean: 348.22
Telugu: 596.78
English: 4594.44
  Vocabulary size: 20,782
  Validation perplexity: 4594.44

BIGRAM RESULTS SUMMARY:
Arabic: 760.67
Korean: 348.22
Telugu: 596.78
English: 4594.44


### Task (b): Trigram Language Model with Kneser-Ney Smoothing


In [6]:
class TrigramKneserNeyModel:
    """Trigram model with Kneser-Ney smoothing."""

    def __init__(self, discount=0.75):
        self.discount = discount
        self.unigram_counts = Counter()
        self.bigram_counts = defaultdict(lambda: defaultdict(int))
        self.trigram_counts = defaultdict(lambda: defaultdict(int))
        self.unigram_continuations = defaultdict(int)
        self.vocab = set()
        self.vocab_size = 0

    def train(self, texts, bpemb_model):
        """Train trigram model."""
        print(f"Training Kneser-Ney trigram model on {len(texts)} texts...")

        for text in texts:
            tokens = tokenize_text(text, bpemb_model)
            if len(tokens) < 3:
                continue

            # Count n-grams
            for i, token in enumerate(tokens):
                self.vocab.add(token)
                self.unigram_counts[token] += 1

                if i > 0:
                    w1, w2 = tokens[i-1], tokens[i]
                    self.bigram_counts[w1][w2] += 1

                if i > 1:
                    w1, w2, w3 = tokens[i-2], tokens[i-1], tokens[i]
                    self.trigram_counts[(w1, w2)][w3] += 1

        # Calculate continuation counts
        for w1 in self.bigram_counts:
            for w2 in self.bigram_counts[w1]:
                self.unigram_continuations[w2] += 1

        self.vocab_size = len(self.vocab)
        print(f"  Vocabulary size: {self.vocab_size:,}")

    def get_unigram_prob_kn(self, w):
        """Kneser-Ney unigram probability."""
        total_continuations = sum(self.unigram_continuations.values())
        if total_continuations == 0:
            return 1.0 / self.vocab_size
        return self.unigram_continuations[w] / total_continuations

    def get_bigram_prob_kn(self, w1, w2):
        """Kneser-Ney bigram probability."""
        bigram_count = self.bigram_counts[w1][w2]
        unigram_count = self.unigram_counts[w1]

        if unigram_count == 0:
            return self.get_unigram_prob_kn(w2)

        discounted_count = max(bigram_count - self.discount, 0)
        prob_bigram = discounted_count / unigram_count

        unique_continuations = len(self.bigram_counts[w1])
        lambda_weight = (self.discount * unique_continuations) / unigram_count

        return prob_bigram + lambda_weight * self.get_unigram_prob_kn(w2)

    def get_trigram_prob_kn(self, w1, w2, w3):
        """Kneser-Ney trigram probability."""
        history = (w1, w2)
        trigram_count = self.trigram_counts[history][w3]
        bigram_count = self.bigram_counts[w1][w2]

        if bigram_count == 0:
            return self.get_bigram_prob_kn(w2, w3)

        discounted_count = max(trigram_count - self.discount, 0)
        prob_trigram = discounted_count / bigram_count

        unique_continuations = len(self.trigram_counts[history])
        lambda_weight = (self.discount * unique_continuations) / bigram_count

        return prob_trigram + lambda_weight * self.get_bigram_prob_kn(w2, w3)

    def calculate_perplexity(self, test_texts, bpemb_model):
        """Calculate perplexity using trigrams."""
        total_log_prob = 0.0
        total_trigrams = 0

        for text in test_texts:
            tokens = tokenize_text(text, bpemb_model)
            if len(tokens) < 3:
                continue

            for i in range(2, len(tokens)):
                w1, w2, w3 = tokens[i-2], tokens[i-1], tokens[i]
                prob = self.get_trigram_prob_kn(w1, w2, w3)
                total_log_prob += math.log(prob + 1e-10)
                total_trigrams += 1

        if total_trigrams == 0:
            return float('inf')

        avg_log_prob = total_log_prob / total_trigrams
        return math.exp(-avg_log_prob)

# Train trigram models
print("\nTRIGRAM LANGUAGE MODELS (KNESER-NEY)")
print("=" * 50)

trigram_models = {}
trigram_results = {}

for lang_code in LANGUAGES + ['en']:
    lang_name = LANGUAGE_NAMES.get(lang_code, 'English')
    print(f"\n{lang_name} ({lang_code}):")

    # Get data (same as bigram)
    if lang_code == 'en':
        train_texts = df_train.sample(n=min(2000, len(df_train)), random_state=42)['context'].tolist()
        val_texts = df_val.sample(n=min(500, len(df_val)), random_state=42)['context'].tolist()
    else:
        train_texts = df_train[df_train['lang'] == lang_code]['question'].tolist()[:2000]
        val_texts = df_val[df_val['lang'] == lang_code]['question'].tolist()

    # Train model
    model = TrigramKneserNeyModel()
    model.train(train_texts, bpemb_models[lang_code])

    # Evaluate
    val_perplexity = model.calculate_perplexity(val_texts[:100], bpemb_models[lang_code])

    trigram_models[lang_code] = model
    trigram_results[lang_code] = val_perplexity

    print(f"  Validation perplexity: {val_perplexity:.2f}")

# Compare with bigram
print(f"\nBIGRAM vs TRIGRAM COMPARISON:")
for lang_code in LANGUAGES + ['en']:
    lang_name = LANGUAGE_NAMES.get(lang_code, 'English')
    bigram_ppl = bigram_results[lang_code]
    trigram_ppl = trigram_results[lang_code]
    improvement = ((bigram_ppl - trigram_ppl) / bigram_ppl) * 100
    direction = "↓" if improvement > 0 else "↑"
    print(f"{lang_name}: {bigram_ppl:.2f} → {trigram_ppl:.2f} ({direction}{abs(improvement):.1f}%)")


TRIGRAM LANGUAGE MODELS (KNESER-NEY)

Arabic (ar):
Training Kneser-Ney trigram model on 2000 texts...
  Vocabulary size: 5,027
  Validation perplexity: 1285.01

Korean (ko):
Training Kneser-Ney trigram model on 2000 texts...
  Vocabulary size: 3,904
  Vocabulary size: 3,904
  Validation perplexity: 117.65

Telugu (te):
Training Kneser-Ney trigram model on 1355 texts...
  Vocabulary size: 2,682
  Validation perplexity: 399.59

English (en):
Training Kneser-Ney trigram model on 2000 texts...
  Validation perplexity: 117.65

Telugu (te):
Training Kneser-Ney trigram model on 1355 texts...
  Vocabulary size: 2,682
  Validation perplexity: 399.59

English (en):
Training Kneser-Ney trigram model on 2000 texts...
  Vocabulary size: 20,782
  Vocabulary size: 20,782
  Validation perplexity: 643.43

BIGRAM vs TRIGRAM COMPARISON:
Arabic: 760.67 → 1285.01 (↑68.9%)
Korean: 348.22 → 117.65 (↓66.2%)
Telugu: 596.78 → 399.59 (↓33.0%)
English: 4594.44 → 643.43 (↓86.0%)
  Validation perplexity: 643.43

B

### Task (c): Neural Language Model (LSTM-based)


In [7]:
class SimpleLanguageModelDataset(Dataset):
    """Simple dataset for language modeling."""

    def __init__(self, texts, bpemb_model, vocab_to_idx, max_length=100):
        self.sequences = []

        for text in texts:
            tokens = tokenize_text(text, bpemb_model)
            if len(tokens) < 2:
                continue

            # Truncate if too long
            if len(tokens) > max_length:
                tokens = tokens[:max_length]

            # Convert to indices
            indices = [vocab_to_idx.get(token, vocab_to_idx['<unk>']) for token in tokens]

            if len(indices) >= 2:
                self.sequences.append({
                    'input_ids': indices[:-1],
                    'target_ids': indices[1:],
                    'length': len(indices) - 1
                })

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return self.sequences[idx]

def collate_batch(batch):
    """Collate function for batching."""
    max_len = max(item['length'] for item in batch)

    input_ids = []
    target_ids = []
    lengths = []

    for item in batch:
        padded_input = item['input_ids'] + [0] * (max_len - len(item['input_ids']))
        padded_target = item['target_ids'] + [0] * (max_len - len(item['target_ids']))

        input_ids.append(padded_input)
        target_ids.append(padded_target)
        lengths.append(item['length'])

    return {
        'input_ids': torch.tensor(input_ids, dtype=torch.long),
        'target_ids': torch.tensor(target_ids, dtype=torch.long),
        'lengths': torch.tensor(lengths, dtype=torch.long)
    }

class SimpleLSTMLanguageModel(nn.Module):
    """Simple LSTM language model."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=2, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, dropout=dropout, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.output_projection = nn.Linear(hidden_dim, vocab_size)

    def forward(self, input_ids, lengths=None):
        embedded = self.embedding(input_ids)
        lstm_output, _ = self.lstm(embedded)
        output = self.dropout(lstm_output)
        logits = self.output_projection(output)
        return logits

def build_vocab(texts, bpemb_model, min_freq=2):
    """Build vocabulary from texts."""
    token_counts = Counter()

    for text in texts:
        tokens = tokenize_text(text, bpemb_model)
        token_counts.update(tokens)

    vocab_to_idx = {'<pad>': 0, '</s>': 1, '<s>': 2, '<unk>': 3}
    idx = 4

    for token, count in token_counts.most_common():
        if count >= min_freq and token not in vocab_to_idx:
            vocab_to_idx[token] = idx
            idx += 1

    return vocab_to_idx

def train_neural_model(train_texts, val_texts, bpemb_model, lang_name, epochs=5):
    """Train neural language model."""
    print(f"Training neural model for {lang_name}...")

    # Build vocabulary
    vocab_to_idx = build_vocab(train_texts, bpemb_model)
    vocab_size = len(vocab_to_idx)
    print(f"  Vocabulary size: {vocab_size:,}")

    # Create datasets
    train_dataset = SimpleLanguageModelDataset(train_texts[:1000], bpemb_model, vocab_to_idx)
    val_dataset = SimpleLanguageModelDataset(val_texts[:200], bpemb_model, vocab_to_idx)

    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, collate_fn=collate_batch)
    val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False, collate_fn=collate_batch)

    # Initialize model
    model = SimpleLSTMLanguageModel(vocab_size, BPE_EMBEDDING_DIM, 256).to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss(ignore_index=0)

    # Training loop
    best_val_loss = float('inf')
    for epoch in range(epochs):
        # Training
        model.train()
        total_loss = 0
        num_batches = 0

        for batch in train_loader:
            input_ids = batch['input_ids'].to(device)
            target_ids = batch['target_ids'].to(device)

            optimizer.zero_grad()
            logits = model(input_ids)
            loss = criterion(logits.view(-1, vocab_size), target_ids.view(-1))
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()

            total_loss += loss.item()
            num_batches += 1

        # Validation
        model.eval()
        val_loss = 0
        val_batches = 0

        with torch.no_grad():
            for batch in val_loader:
                input_ids = batch['input_ids'].to(device)
                target_ids = batch['target_ids'].to(device)

                logits = model(input_ids)
                loss = criterion(logits.view(-1, vocab_size), target_ids.view(-1))
                val_loss += loss.item()
                val_batches += 1

        avg_val_loss = val_loss / val_batches if val_batches > 0 else float('inf')
        val_perplexity = math.exp(avg_val_loss)

        print(f"    Epoch {epoch+1}: Train Loss = {total_loss/num_batches:.3f}, Val Perplexity = {val_perplexity:.2f}")

        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
        elif epoch > 2:  # Early stopping
            break

    return model, math.exp(best_val_loss)

# Train neural models
print("\nNEURAL LANGUAGE MODELS (LSTM)")
print("=" * 50)

neural_models = {}
neural_results = {}

for lang_code in LANGUAGES + ['en']:
    lang_name = LANGUAGE_NAMES.get(lang_code, 'English')
    print(f"\n{lang_name} ({lang_code}):")

    try:
        # Get data (same as previous models)
        if lang_code == 'en':
            train_texts = df_train.sample(n=min(1000, len(df_train)), random_state=42)['context'].tolist()
            val_texts = df_val.sample(n=min(200, len(df_val)), random_state=42)['context'].tolist()
        else:
            train_texts = df_train[df_train['lang'] == lang_code]['question'].tolist()[:1000]
            val_texts = df_val[df_val['lang'] == lang_code]['question'].tolist()[:200]

        # Train model
        model, best_perplexity = train_neural_model(train_texts, val_texts, bpemb_models[lang_code], lang_name)

        neural_models[lang_code] = model
        neural_results[lang_code] = best_perplexity

        print(f"  Best validation perplexity: {best_perplexity:.2f}")

    except Exception as e:
        print(f"  Training failed: {e}")
        neural_results[lang_code] = float('inf')

# Final comparison
print(f"\nFINAL MODEL COMPARISON:")
print("=" * 50)
for lang_code in LANGUAGES + ['en']:
    lang_name = LANGUAGE_NAMES.get(lang_code, 'English')
    bigram_ppl = bigram_results[lang_code]
    trigram_ppl = trigram_results[lang_code]
    neural_ppl = neural_results[lang_code]

    print(f"\n{lang_name}:")
    print(f"  Bigram:  {bigram_ppl:.2f}")
    print(f"  Trigram: {trigram_ppl:.2f}")
    print(f"  Neural:  {neural_ppl:.2f}")


NEURAL LANGUAGE MODELS (LSTM)

Arabic (ar):
Training neural model for Arabic...
  Vocabulary size: 958
    Epoch 1: Train Loss = 4.980, Val Perplexity = 25.99
    Epoch 1: Train Loss = 4.980, Val Perplexity = 25.99
    Epoch 2: Train Loss = 3.953, Val Perplexity = 18.47
    Epoch 2: Train Loss = 3.953, Val Perplexity = 18.47
    Epoch 3: Train Loss = 3.749, Val Perplexity = 17.00
    Epoch 3: Train Loss = 3.749, Val Perplexity = 17.00
    Epoch 4: Train Loss = 3.658, Val Perplexity = 16.39
    Epoch 4: Train Loss = 3.658, Val Perplexity = 16.39
    Epoch 5: Train Loss = 3.565, Val Perplexity = 16.08
  Best validation perplexity: 16.08

Korean (ko):
Training neural model for Korean...
  Vocabulary size: 825
    Epoch 5: Train Loss = 3.565, Val Perplexity = 16.08
  Best validation perplexity: 16.08

Korean (ko):
Training neural model for Korean...
  Vocabulary size: 825
    Epoch 1: Train Loss = 4.676, Val Perplexity = 27.56
    Epoch 1: Train Loss = 4.676, Val Perplexity = 27.56
    Ep