# Dictionary-Based Approach Implementation
## Roman Urdu to Urdu Script Conversion Project

This notebook covers Step 3 of our methodology:
- Dictionary-Based Model Implementation
- Model Testing and Evaluation
- Performance Analysis

### Objectives:
1. Load and test the dictionary-based conversion model
2. Evaluate performance on test data
3. Analyze strengths and limitations
4. Implement improvements and optimizations
5. Compare different dictionary strategies
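
As a point of reference for these objectives, the core idea of a dictionary-based converter fits in a few lines. This is a minimal sketch, not the project's `DictionaryModel`: the function name `convert_with_dictionary` and the three-entry `toy_dictionary` are assumptions made for illustration.

```python
def convert_with_dictionary(text, mapping):
    """Convert Roman Urdu text word by word; unknown words pass through unchanged."""
    converted = []
    for word in text.split():
        # Case-insensitive exact lookup, falling back to the original token
        converted.append(mapping.get(word.lower(), word))
    return " ".join(converted)


# Hypothetical toy dictionary, for illustration only
toy_dictionary = {"main": "میں", "acha": "اچھا", "hun": "ہوں"}

print(convert_with_dictionary("main acha hun", toy_dictionary))
print(convert_with_dictionary("main school ja raha hun", toy_dictionary))
```

Words without an entry ("school", "ja", "raha" in the second call) are left in Roman script, which is why dictionary coverage matters so much in the evaluation that follows.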

In [None]:
# Import required libraries
import sys
import os
from pathlib import Path
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter, defaultdict
import warnings
warnings.filterwarnings('ignore')

# Add project root to path
project_root = Path('../')
sys.path.append(str(project_root))

from models.dictionary_model import DictionaryModel
from utils.data_loader import DataLoader
from utils.preprocessing import RomanUrduPreprocessor
from evaluation.metrics import (
    calculate_bleu_score, calculate_rouge_l, calculate_word_accuracy,
    calculate_sentence_accuracy, calculate_character_accuracy, calculate_edit_distance
)

# Set up plotting
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 8)
sns.set_palette("husl")

print("Libraries imported successfully!")

## 1. Model Initialization and Setup

In [None]:
# Initialize components
data_loader = DataLoader("../data")
preprocessor = RomanUrduPreprocessor()

# Load data
dictionary = data_loader.load_dictionary()
test_data = data_loader.load_test_data()
sample_data = data_loader.load_sample_data()

# Initialize dictionary model
dict_model = DictionaryModel("../data/roman_urdu_dictionary.json")

print(f"Dictionary loaded with {len(dictionary)} entries")
print(f"Test data: {len(test_data)} sentences")
print(f"Sample data: {len(sample_data)} sentences")
print("Dictionary model initialized successfully")

## 2. Basic Dictionary Conversion Testing

In [None]:
# Test individual word conversions
test_words = [
    "main", "aap", "kaise", "hain", "ghar", "ja", "raha", "hun",
    "kitab", "parh", "kya", "kar", "rahe", "ho", "time", "school"
]

print("Individual Word Conversions:")
print("=" * 40)
for word in test_words:
    converted = dict_model.convert_word(word)
    print(f"{word:12} -> {converted}")

In [None]:
# Test sentence conversions
test_sentences = [
    "main acha hun",
    "aap kaise hain",
    "wo ghar ja raha hai",
    "main kitab parh raha hun",
    "aap kya kar rahe hain"
]

print("\nSentence Conversions:")
print("=" * 50)
for sentence in test_sentences:
    converted = dict_model.convert_text(sentence)
    print(f"Roman: {sentence}")
    print(f"Urdu:  {converted}")
    print("-" * 30)

## 3. Performance Evaluation on Test Data

In [None]:
# Evaluate on test data
predictions = []
references = []
conversion_details = []

print("Evaluating Dictionary Model on Test Data:")
print("=" * 50)

for i, item in enumerate(test_data):
    roman_text = item['roman']
    reference_urdu = item['urdu']
    
    # Convert using dictionary model
    predicted_urdu = dict_model.convert_text(roman_text)
    
    predictions.append(predicted_urdu)
    references.append(reference_urdu)
    
    conversion_details.append({
        'index': i,
        'roman': roman_text,
        'reference': reference_urdu,
        'prediction': predicted_urdu,
        'english': item.get('english', '')
    })
    
    if i < 5:  # Show first 5 examples
        print(f"Example {i+1}:")
        print(f"  Roman:     {roman_text}")
        print(f"  Reference: {reference_urdu}")
        print(f"  Predicted: {predicted_urdu}")
        print(f"  English:   {item.get('english', 'N/A')}")
        print()

print(f"\nTotal conversions: {len(predictions)}")

## 4. Comprehensive Metrics Calculation

In [None]:
# Calculate all evaluation metrics
metrics_results = {}

# BLEU Score (computed per sentence, then averaged)
bleu_scores = [calculate_bleu_score(pred, ref) for pred, ref in zip(predictions, references)]
metrics_results['BLEU'] = np.mean(bleu_scores)

# ROUGE-L Score (computed per sentence, then averaged)
rouge_scores = [calculate_rouge_l(pred, ref) for pred, ref in zip(predictions, references)]
metrics_results['ROUGE-L'] = np.mean(rouge_scores)

# Word Accuracy (computed per sentence, then averaged)
word_accuracies = [calculate_word_accuracy(pred, ref) for pred, ref in zip(predictions, references)]
metrics_results['Word_Accuracy'] = np.mean(word_accuracies)

# Sentence Accuracy (computed over the whole test set at once)
sentence_accuracy = calculate_sentence_accuracy(predictions, references)
metrics_results['Sentence_Accuracy'] = sentence_accuracy

# Character Accuracy (computed per sentence, then averaged)
char_accuracies = [calculate_character_accuracy(pred, ref) for pred, ref in zip(predictions, references)]
metrics_results['Character_Accuracy'] = np.mean(char_accuracies)

# Edit Distance (computed per sentence, then averaged)
edit_distances = [calculate_edit_distance(pred, ref) for pred, ref in zip(predictions, references)]
metrics_results['Avg_Edit_Distance'] = np.mean(edit_distances)

print("Dictionary Model Performance Metrics:")
print("=" * 40)
for metric, value in metrics_results.items():
    if 'Distance' in metric:
        print(f"{metric:20}: {value:.3f}")
    else:
        print(f"{metric:20}: {value:.3f} ({value*100:.1f}%)")
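
The metric functions imported from `evaluation.metrics` are project code whose definitions are not shown in this notebook. As a rough guide to what two of them measure, the sketch below gives one common definition of position-wise word accuracy and of Levenshtein edit distance. The names `word_accuracy_sketch` and `levenshtein` are hypothetical, and the project's implementations may differ, for example in how length mismatches are normalized.

```python
def word_accuracy_sketch(prediction, reference):
    """Fraction of word positions where prediction and reference agree."""
    pred_words = prediction.split()
    ref_words = reference.split()
    total = max(len(pred_words), len(ref_words))
    if total == 0:
        return 1.0  # two empty strings are treated as a perfect match
    matches = sum(p == r for p, r in zip(pred_words, ref_words))
    return matches / total


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(word_accuracy_sketch("a b c", "a x c"))
print(levenshtein("kitten", "sitting"))
```

Under this definition a single inserted or dropped word shifts every later position, so position-wise word accuracy can fall sharply even when most words are right. Edit distance is less sensitive to that kind of shift.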

## 5. Detailed Error Analysis

In [None]:
# Analyze conversion errors
word_level_errors = []
unknown_words = set()
partial_matches = []

for detail in conversion_details:
    ref_words = detail['reference'].split()
    pred_words = detail['prediction'].split()
    
    # Track unknown words, tokenized the same way as the coverage analysis
    for word in preprocessor.tokenize(detail['roman']):
        normalized = preprocessor.normalize_spelling(word.lower())
        if normalized not in dictionary:
            unknown_words.add(word)
    
    # Word-level error analysis
    max_len = max(len(ref_words), len(pred_words))
    for i in range(max_len):
        ref_word = ref_words[i] if i < len(ref_words) else ''
        pred_word = pred_words[i] if i < len(pred_words) else ''
        
        if ref_word != pred_word:
            word_level_errors.append({
                'sentence_idx': detail['index'],
                'position': i,
                'reference': ref_word,
                'prediction': pred_word,
                'roman_context': detail['roman']
            })

print("Error Analysis Summary:")
print(f"Total word-level errors: {len(word_level_errors)}")
print(f"Unknown words: {len(unknown_words)}")

# Sort so the sample is the same on every run (set order is not stable)
print("\nSample unknown words:")
for word in sorted(unknown_words)[:15]:
    print(f"  {word}")

print("\nSample word-level errors:")
for error in word_level_errors[:10]:
    print(f"  Sentence {error['sentence_idx']}: '{error['reference']}' vs '{error['prediction']}'")

In [None]:
# Coverage analysis
total_test_words = 0
covered_test_words = 0

for item in test_data:
    words = preprocessor.tokenize(item['roman'])
    total_test_words += len(words)
    
    for word in words:
        normalized = preprocessor.normalize_spelling(word.lower())
        if normalized in dictionary:
            covered_test_words += 1

test_coverage = (covered_test_words / total_test_words) * 100 if total_test_words > 0 else 0

print("Test Data Coverage Analysis:")
print(f"Total words in test data: {total_test_words}")
print(f"Covered by dictionary: {covered_test_words}")
print(f"Coverage percentage: {test_coverage:.2f}%")
print(f"Uncovered words: {total_test_words - covered_test_words}")

## 6. Performance Visualization

In [None]:
# Visualize metrics
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))

# BLEU scores distribution
ax1.hist(bleu_scores, bins=15, alpha=0.7, color='skyblue', edgecolor='black')
ax1.set_title('BLEU Scores Distribution')
ax1.set_xlabel('BLEU Score')
ax1.set_ylabel('Frequency')
ax1.axvline(np.mean(bleu_scores), color='red', linestyle='--', label=f'Mean: {np.mean(bleu_scores):.3f}')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Word accuracy distribution
ax2.hist(word_accuracies, bins=15, alpha=0.7, color='lightcoral', edgecolor='black')
ax2.set_title('Word Accuracy Distribution')
ax2.set_xlabel('Word Accuracy')
ax2.set_ylabel('Frequency')
ax2.axvline(np.mean(word_accuracies), color='red', linestyle='--', label=f'Mean: {np.mean(word_accuracies):.3f}')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Edit distance distribution
ax3.hist(edit_distances, bins=15, alpha=0.7, color='lightgreen', edgecolor='black')
ax3.set_title('Edit Distance Distribution')
ax3.set_xlabel('Edit Distance')
ax3.set_ylabel('Frequency')
ax3.axvline(np.mean(edit_distances), color='red', linestyle='--', label=f'Mean: {np.mean(edit_distances):.1f}')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Overall metrics comparison
metrics_names = ['BLEU', 'ROUGE-L', 'Word Acc', 'Char Acc']
metrics_values = [
    metrics_results['BLEU'],
    metrics_results['ROUGE-L'],
    metrics_results['Word_Accuracy'],
    metrics_results['Character_Accuracy']
]

bars = ax4.bar(metrics_names, metrics_values, color=['skyblue', 'lightcoral', 'lightgreen', 'gold'])
ax4.set_title('Overall Performance Metrics')
ax4.set_ylabel('Score')
ax4.set_ylim(0, 1)
ax4.grid(True, alpha=0.3)

# Add value labels on bars
for bar, value in zip(bars, metrics_values):
    height = bar.get_height()
    ax4.text(bar.get_x() + bar.get_width()/2., height + 0.01,
             f'{value:.3f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

## 7. Dictionary Enhancement Strategies

In [None]:
# Test fuzzy matching performance
fuzzy_test_words = [
    "mein",     # should match "main"
    "kese",     # should match "kaise"
    "hen",      # should match "hain"
    "rahay",    # should match "rahe"
    "kry",      # should match "kar"
]

print("Fuzzy Matching Test:")
print("=" * 30)
for word in fuzzy_test_words:
    converted = dict_model.convert_word(word)
    best_match = dict_model.find_best_match(word)
    # str() keeps the width specifier from failing if the model returns None
    print(f"{word:10} -> {str(converted):15} (best match: {best_match})")

In [None]:
# Test spelling variations
variation_tests = {
    "main": ["mein", "mian", "men"],
    "kaise": ["kese", "keyse", "kaisy"],
    "hain": ["hen", "han", "hein"],
    "rahe": ["rahay", "rhy", "rehy"]
}

print("Spelling Variation Handling:")
print("=" * 40)
for standard, variations in variation_tests.items():
    standard_result = dict_model.convert_word(standard)
    print(f"Standard '{standard}' -> {standard_result}")
    
    for variation in variations:
        var_result = dict_model.convert_word(variation)
        match_score = dict_model.calculate_similarity(standard, variation)
        print(f"  Variation '{variation}' -> {var_result} (similarity: {match_score:.2f})")
    print()
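
`find_best_match` and `calculate_similarity` are methods of the project's `DictionaryModel`, and their internals are not shown in this notebook. One way such matching could work, sketched here with the standard-library `difflib` module, is to score candidates with `SequenceMatcher.ratio()` and accept the best one above a cutoff. The helper names `similarity` and `closest_key`, the key list, and the 0.6 cutoff are illustrative assumptions.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a, b):
    """Similarity in [0, 1]: 2 * matched characters / total characters."""
    return SequenceMatcher(None, a, b).ratio()


def closest_key(word, keys, cutoff=0.6):
    """Return the most similar dictionary key, or None if none reaches the cutoff."""
    matches = get_close_matches(word, keys, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical dictionary keys, for illustration only
known_keys = ["main", "kaise", "hain", "rahe", "kar"]

for variant in ["mein", "kese", "rahay", "xyz"]:
    print(f"{variant:6} -> {closest_key(variant, known_keys)}")
```

The cutoff trades recall for precision. A lower value accepts more spelling variants but also raises the risk of mapping an unrelated short word to the wrong entry.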

## 8. Model Statistics and Analytics

In [None]:
# Get model statistics
stats = dict_model.get_stats()

print("Dictionary Model Statistics:")
print("=" * 30)
for key, value in stats.items():
    if isinstance(value, float):
        print(f"{key:25}: {value:.3f}")
    else:
        print(f"{key:25}: {value}")

# Performance breakdown by sentence length
length_performance = defaultdict(list)

for i, detail in enumerate(conversion_details):
    sentence_length = len(detail['roman'].split())
    word_acc = word_accuracies[i]
    length_performance[sentence_length].append(word_acc)

print("\nPerformance by Sentence Length:")
print("=" * 35)
for length in sorted(length_performance.keys()):
    avg_acc = np.mean(length_performance[length])
    count = len(length_performance[length])
    print(f"{length:2} words: {avg_acc:.3f} accuracy ({count} sentences)")

In [None]:
# Visualize performance by sentence length
# Sort the lengths so the line plot runs left to right instead of zigzagging
lengths = sorted(length_performance.keys())
avg_accuracies = [np.mean(length_performance[length]) for length in lengths]
counts = [len(length_performance[length]) for length in lengths]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Performance by length
ax1.plot(lengths, avg_accuracies, marker='o', linewidth=2, markersize=8)
ax1.set_title('Performance vs Sentence Length')
ax1.set_xlabel('Sentence Length (words)')
ax1.set_ylabel('Average Word Accuracy')
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0, 1)

# Sample count by length
ax2.bar(lengths, counts, alpha=0.7, color='lightblue', edgecolor='black')
ax2.set_title('Sample Count by Sentence Length')
ax2.set_xlabel('Sentence Length (words)')
ax2.set_ylabel('Number of Sentences')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 9. Comparison with Baseline

In [None]:
# Create a simple baseline: leave the Roman text unconverted.
# The references are in Urdu script, so this serves as a lower bound.
baseline_predictions = [item['roman'] for item in test_data]

# Calculate baseline metrics
baseline_bleu = np.mean([calculate_bleu_score(pred, ref) for pred, ref in zip(baseline_predictions, references)])
baseline_word_acc = np.mean([calculate_word_accuracy(pred, ref) for pred, ref in zip(baseline_predictions, references)])
baseline_char_acc = np.mean([calculate_character_accuracy(pred, ref) for pred, ref in zip(baseline_predictions, references)])

# Comparison
comparison_data = {
    'Metric': ['BLEU', 'Word Accuracy', 'Character Accuracy'],
    'Baseline': [baseline_bleu, baseline_word_acc, baseline_char_acc],
    'Dictionary Model': [
        metrics_results['BLEU'],
        metrics_results['Word_Accuracy'],
        metrics_results['Character_Accuracy']
    ]
}

comparison_df = pd.DataFrame(comparison_data)
comparison_df['Improvement'] = comparison_df['Dictionary Model'] - comparison_df['Baseline']

print("Model vs Baseline Comparison:")
print("=" * 45)
print(comparison_df.to_string(index=False, float_format='%.3f'))

# Visualize comparison
fig, ax = plt.subplots(figsize=(12, 6))
x = np.arange(len(comparison_data['Metric']))
width = 0.35

bars1 = ax.bar(x - width/2, comparison_data['Baseline'], width, label='Baseline', alpha=0.7)
bars2 = ax.bar(x + width/2, comparison_data['Dictionary Model'], width, label='Dictionary Model', alpha=0.7)

ax.set_xlabel('Metrics')
ax.set_ylabel('Score')
ax.set_title('Dictionary Model vs Baseline Performance')
ax.set_xticks(x)
ax.set_xticklabels(comparison_data['Metric'])
ax.legend()
ax.grid(True, alpha=0.3)
ax.set_ylim(0, 1)

# Add value labels
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                f'{height:.3f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

## 10. Export Results and Analysis

In [None]:
# Prepare comprehensive results
results = {
    'model_type': 'Dictionary-Based',
    'test_set_size': len(test_data),
    'dictionary_size': len(dictionary),
    'metrics': metrics_results,
    'coverage': {
        'total_words': total_test_words,
        'covered_words': covered_test_words,
        'coverage_percentage': test_coverage
    },
    'error_analysis': {
        'total_errors': len(word_level_errors),
        'unknown_words_count': len(unknown_words),
        'unknown_words': list(unknown_words)
    },
    'model_stats': stats
}

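# Save detailed results (create the output directory if it does not exist)
results_dir = Path('../results')
results_dir.mkdir(parents=True, exist_ok=True)
with open(results_dir / 'dictionary_model_results.json', 'w', encoding='utf-8') as f:
    json.dump(results, f, ensure_ascii=False, indent=2)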

# Save predictions for further analysis
predictions_df = pd.DataFrame(conversion_details)
predictions_df['bleu_score'] = bleu_scores
predictions_df['word_accuracy'] = word_accuracies
predictions_df['edit_distance'] = edit_distances

predictions_df.to_csv('../results/dictionary_model_predictions.csv', index=False, encoding='utf-8')

print("Results exported successfully!")
print("Files created:")
print("  - ../results/dictionary_model_results.json")
print("  - ../results/dictionary_model_predictions.csv")

# Print summary
print("\n" + "=" * 50)
print("DICTIONARY MODEL SUMMARY")
print("=" * 50)
print("Overall Performance:")
print(f"  BLEU Score:      {metrics_results['BLEU']:.3f}")
print(f"  Word Accuracy:   {metrics_results['Word_Accuracy']:.3f}")
print(f"  Sentence Acc:    {metrics_results['Sentence_Accuracy']:.3f}")
print(f"  Coverage:        {test_coverage:.1f}%")
print("\nStrengths:")
print("  - Fast and efficient conversion")
print("  - High accuracy for known words")
print("  - Interpretable results")
print("\nLimitations:")
print("  - Limited to dictionary vocabulary")
print(f"  - {len(unknown_words)} unknown words in test set")
print("  - No context awareness")

## Conclusions

### Key Findings:
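1. **Performance**: The dictionary model performs well on words it contains; the scores in Section 4 are measured over the whole test set, including uncovered words
2. **Coverage**: Dictionary coverage (Section 5) is the main limiting factor, because a word with no entry and no close match cannot be converted
3. **Fuzzy Matching**: Fuzzy matching (Section 7) helps with spelling variations such as "mein" for "main"
4. **Speed**: Conversion is a per-word lookup, which makes it fast enough for real-time applications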

### Strengths:
- Simple and interpretable
- Fast execution
- High precision for known words
- Easy to update and maintain

### Limitations:
- Limited vocabulary coverage
- No context awareness
- Cannot handle new/unknown words well
- Fixed mapping without learning capability

### Recommendations:
1. Expand dictionary with more common words
2. Implement better fuzzy matching algorithms
3. Add context-aware word disambiguation
4. Combine with ML models for unknown words
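
Recommendation 4 can be pictured as a fallback chain: try the dictionary first and hand only the unknown words to a second component. The sketch below is an assumption about how such a hybrid might be wired, not part of this project. `hybrid_convert` is a hypothetical name, and `mark_unknown` is a stand-in for where a trained model's prediction function would go.

```python
def hybrid_convert(text, mapping, fallback=None):
    """Use the dictionary where possible; delegate unknown words to `fallback`."""
    converted = []
    for word in text.split():
        key = word.lower()
        if key in mapping:
            converted.append(mapping[key])    # exact dictionary hit
        elif fallback is not None:
            converted.append(fallback(word))  # e.g. an ML transliteration model
        else:
            converted.append(word)            # no fallback: pass through unchanged
    return " ".join(converted)


def mark_unknown(word):
    """Placeholder fallback that only tags the word so it is easy to spot."""
    return f"<{word}>"


# Hypothetical one-entry dictionary, for illustration only
print(hybrid_convert("main school", {"main": "میں"}, fallback=mark_unknown))
```

Keeping the fallback as a plain callable means the dictionary path stays fast and interpretable, while the second component can be swapped or evaluated separately.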

### Next Steps:
- Implement machine learning models for comparison
- Develop hybrid approaches combining dictionary and ML
- Evaluate ensemble methods for improved performance