# Module 1 — Environment & Basic Translation Test (Colab)

This notebook checks that the basic Hindi ↔ English translation pipeline runs end to end in Colab.

## Step 1: Install Dependencies

In [None]:
# Colab: pin versions that install together cleanly
# (transformers 4.36.x requires huggingface-hub>=0.19.3)
!pip install -q transformers==4.36.2 sentencepiece==0.1.99 sacrebleu==2.3.1 huggingface-hub==0.19.4
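
After the install finishes, it can help to confirm which versions actually ended up in the runtime, since Colab ships with its own preinstalled packages. The following is a minimal sketch using the standard library; the helper name `installed_versions` and the `PACKAGES` list are illustrative, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed in Step 1 (names only; compare against your pins by eye)
PACKAGES = ["transformers", "sentencepiece", "sacrebleu", "huggingface-hub"]


def installed_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:20s} {ver or 'NOT INSTALLED'}")
```

If a version differs from the pin, restarting the Colab runtime after the install is usually the first thing to try.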

## Step 2: Create Sample Data

In [None]:
%%bash
cat > sample_texts.csv <<'CSV'
id,text,lang
1,"Hello, how are you?","en"
2,"नमस्ते, आप कैसे हैं?","hi"
3,"Good morning, everyone","en"
4,"क्या आप आज उपलब्ध हैं?","hi"
CSV
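
The `%%bash` cell above assumes a Unix-like shell, which Colab provides. If the notebook is run somewhere without one, the same file can be written from Python. This is a sketch of an equivalent cell; `ROWS` and `write_samples` are illustrative names, and the rows are copied from the heredoc above.

```python
import csv

# Same four rows as the heredoc in the bash cell
ROWS = [
    {"id": 1, "text": "Hello, how are you?", "lang": "en"},
    {"id": 2, "text": "नमस्ते, आप कैसे हैं?", "lang": "hi"},
    {"id": 3, "text": "Good morning, everyone", "lang": "en"},
    {"id": 4, "text": "क्या आप आज उपलब्ध हैं?", "lang": "hi"},
]


def write_samples(path, rows):
    """Write rows to a UTF-8 CSV with the header id,text,lang."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "text", "lang"])
        writer.writeheader()
        writer.writerows(rows)


write_samples("sample_texts.csv", ROWS)
print(f"Wrote {len(ROWS)} rows to sample_texts.csv")
```

The `csv` module quotes only the fields that need it (here, the ones containing commas), so the file is not byte-identical to the heredoc version, but `csv.DictReader` reads both the same way.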

## Step 3: Quick Translation Test

In [None]:
from transformers import pipeline
import time

# Load Hindi to English translator
print("Loading hi->en model (this may take a minute on first run)...")
hi_en = pipeline("translation", model="Helsinki-NLP/opus-mt-hi-en")

# Test translation
test_text = "नमस्ते, आप कैसे हैं?"
t0 = time.time()
result = hi_en(test_text, max_length=512)
elapsed = time.time() - t0

print(f"\nInput (HI): {test_text}")
print(f"Output (EN): {result[0]['translation_text']}")
print(f"Time: {elapsed:.3f}s")

In [None]:
# Load English to Hindi translator
print("Loading en->hi model...")
en_hi = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")

# Test translation
test_text = "Hello, how are you?"
t0 = time.time()
result = en_hi(test_text, max_length=512)
elapsed = time.time() - t0

print(f"\nInput (EN): {test_text}")
print(f"Output (HI): {result[0]['translation_text']}")
print(f"Time: {elapsed:.3f}s")
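
The timings above come from a single call each, and the first call to a freshly loaded pipeline typically includes one-off overhead. A steadier number can be had by discarding a warm-up call and averaging a few repeats. The helper below is a sketch; `time_translation` and its parameters are hypothetical names, and a stand-in function is used so the cell runs on its own.

```python
import time


def time_translation(translate, text, warmup=1, repeats=3):
    """Average wall-clock seconds per call, after discarding warm-up calls."""
    for _ in range(warmup):
        translate(text)
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        translate(text)
        timings.append(time.perf_counter() - t0)
    return sum(timings) / len(timings)


# Stand-in translator so this cell is self-contained.
# In the notebook, pass: lambda t: hi_en(t, max_length=512)
def fake_translate(text):
    return text.upper()


avg = time_translation(fake_translate, "hello")
print(f"Average over 3 runs: {avg:.6f}s")
```

`time.perf_counter()` is used here instead of `time.time()` because it is the clock the standard library recommends for measuring short durations.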

## Step 4: Batch Test with Sample CSV

In [None]:
import csv
import time

# Load samples
with open('sample_texts.csv', newline='', encoding='utf-8') as f:
    samples = list(csv.DictReader(f))

# Map each source language to its label and translator
translators = {
    'hi': ("HI→EN", hi_en),
    'en': ("EN→HI", en_hi),
}

# Translate each sample
print("Batch Translation Results:\n")
print("="*60)
for sample in samples:
    text = sample['text']
    lang = sample['lang']

    # Skip rows whose language has no translator instead of guessing
    if lang not in translators:
        print(f"[skip]   unsupported lang {lang!r}: {text}\n")
        continue

    label, translator = translators[lang]
    t0 = time.time()
    out = translator(text, max_length=512)[0]['translation_text']
    elapsed = time.time() - t0
    print(f"[{label}]  {text}")
    print(f"         → {out} ({elapsed:.3f}s)\n")
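
The loop above calls a translator once per row. A `transformers` pipeline also accepts a list of strings and returns one result per input, so rows can be grouped by language and sent in a single call each. The grouping step can be sketched as follows; `group_by_lang` is an illustrative helper, and the rows are inlined so the cell runs without the models.

```python
from collections import defaultdict


def group_by_lang(rows):
    """Group row texts by their 'lang' value, preserving input order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["lang"]].append(row["text"])
    return dict(groups)


# Inlined copy of the sample rows; in the notebook, pass `samples` instead
rows = [
    {"text": "Hello, how are you?", "lang": "en"},
    {"text": "नमस्ते, आप कैसे हैं?", "lang": "hi"},
    {"text": "Good morning, everyone", "lang": "en"},
]
groups = group_by_lang(rows)
for lang, texts in groups.items():
    print(lang, len(texts), "texts")

# With the pipelines from Step 3 loaded, each group becomes one call:
#   outputs = hi_en(groups["hi"], max_length=512)
#   translations = [o["translation_text"] for o in outputs]
```

With only four short sentences the difference is negligible; grouping starts to matter once the CSV grows.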

## Optional: Hugging Face Authentication

If you need access to private or gated models, or want to avoid anonymous rate limits, log in with a Hugging Face token:

In [None]:
from huggingface_hub import notebook_login
notebook_login()
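
`notebook_login()` shows an interactive prompt, which is awkward for unattended runs. One possible alternative, sketched below, is to read a token from an environment variable and pass it to `huggingface_hub.login`. The helper `login_from_env` is hypothetical, and the use of `HF_TOKEN` as the variable name is an assumption about how the token is stored in your environment.

```python
import os


def login_from_env(var="HF_TOKEN"):
    """Log in with a token from the environment; return False if it is unset."""
    token = os.environ.get(var)
    if not token:
        return False
    # Imported lazily so the helper is harmless when no token is configured
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` in place of `notebook_login()` returns `False` without contacting the Hub when the variable is missing, so the public models used in this module still load anonymously.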

## Summary

If all cells ran successfully, Module 1 is complete! You've verified:
- ✅ Transformers pipeline can be loaded
- ✅ Hindi ↔ English translation works
- ✅ Per-sentence inference time is measured and printed for both directions

**Next:** Proceed to Module 2 for audio processing and speech recognition.