# **6. Develop a Machine Translation system to translate public information content between English and any Indian language.**

## Step 1: Install Required Libraries

In this step, we install the necessary libraries:
- transformers → for loading pre-trained models
- torch → for deep learning computations
- sentencepiece → for tokenization

In [1]:
!pip install transformers torch sentencepiece

Collecting sentencepiece
  Downloading sentencepiece-0.2.1-cp312-cp312-win_amd64.whl.metadata (10 kB)
Downloading sentencepiece-0.2.1-cp312-cp312-win_amd64.whl (1.1 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.2.1
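
A quick way to confirm that the install worked is to check that each package can be found. The sketch below uses only the standard library; `check_packages` and `REQUIRED` are illustrative names and not part of the original notebook.

```python
import importlib.util

# Packages installed in Step 1.
REQUIRED = ["transformers", "torch", "sentencepiece"]

def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, ok in check_packages(REQUIRED).items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```

If any package is reported as missing, re-running the install cell (and restarting the kernel) is usually the first thing to try.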




## Step 2: Import Required Modules

We import:
- AutoTokenizer → converts text to token IDs (numbers)
- AutoModelForSeq2SeqLM → loads the sequence-to-sequence translation model
- torch → tensor operations

In [14]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

## Step 3: Load Pre-trained Translation Model

We use the Helsinki-NLP English-to-Hindi translation model (opus-mt-en-hi).
This is a Transformer-based encoder-decoder model:

- Encoder → reads the English sentence
- Decoder → generates the Hindi sentence

In [15]:
model_name = "Helsinki-NLP/opus-mt-en-hi"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

print("Model Loaded Successfully 🚀")

Model Loaded Successfully 🚀
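
The task statement asks for translation between English and an Indian language, while the cell above loads only the English-to-Hindi direction. The model ID follows a source-target naming pattern, so a small helper can build the ID for another direction or language. This is a sketch: `opus_mt_model_name` is a hypothetical helper, and whether a checkpoint actually exists for a given language pair is an assumption that has to be checked on the Hugging Face Hub before loading it.

```python
def opus_mt_model_name(src_lang, tgt_lang):
    """Build a Helsinki-NLP OPUS-MT model ID from two language codes."""
    return f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"

# Same ID as the one used in this notebook.
print(opus_mt_model_name("en", "hi"))

# Reverse direction; the checkpoint's availability is an assumption here.
print(opus_mt_model_name("hi", "en"))
```

The returned string can be passed to `AutoTokenizer.from_pretrained` and `AutoModelForSeq2SeqLM.from_pretrained` exactly as `model_name` is above.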


## Step 4: Tokenization (Text → Numbers)

Neural networks operate on numbers, not raw text.

The tokenizer:
- Breaks the sentence into subword tokens
- Converts them into token IDs
- Appends an end-of-sequence token (</s>, which appears as ID 0 in the output below)

In [16]:
sentence = "India is a diverse country."

inputs = tokenizer(sentence, return_tensors="pt")

print("Token IDs:")
print(inputs["input_ids"])

print("\nDecoded back to text:")
print(tokenizer.decode(inputs["input_ids"][0]))

Token IDs:
tensor([[ 4535,    23,    19, 13938,  2126,     3,     0]])

Decoded back to text:
India is a diverse country.</s>
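
To make the idea of subword tokens concrete, here is a toy tokenizer with a tiny made-up vocabulary. It is only a sketch: the real model uses a SentencePiece vocabulary with its own segmentation algorithm, whereas this example uses simple greedy longest-match, and every token and ID in `TOY_VOCAB` is hypothetical.

```python
# Toy vocabulary: every token and ID here is made up for illustration.
TOY_VOCAB = {"<unk>": 0, "</s>": 1, "▁India": 2, "▁is": 3, "▁a": 4,
             "▁divers": 5, "e": 6, "▁country": 7, ".": 8}

def toy_tokenize(sentence, vocab):
    """Greedy longest-match split of a sentence into subword tokens."""
    # Mark word starts with "▁", in the style of SentencePiece.
    text = "▁" + sentence.replace(" ", "▁")
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append("<unk>")  # no known piece starts here
            i += 1
    return tokens + ["</s>"]  # end-of-sequence marker

def toy_encode(sentence, vocab):
    """Convert a sentence into a list of token IDs."""
    return [vocab[t] for t in toy_tokenize(sentence, vocab)]

tokens = toy_tokenize("India is a diverse country.", TOY_VOCAB)
print(tokens)
print(toy_encode("India is a diverse country.", TOY_VOCAB))
```

Because "diverse" is not in the toy vocabulary, it is split into two known pieces ("▁divers" and "e"), which is the reason subword vocabularies can cover words they have never stored whole.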


## Step 5: Generate Translation

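The model:
1. Takes the tokenized input
2. Passes it through the encoder
3. Uses the decoder to predict output tokens one at a time
4. Repeats this inside generate(), which runs the decoding loop (beam search when the model's generation settings specify more than one beam, greedy search otherwise)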

In [5]:
outputs = model.generate(**inputs)

print("Generated Token IDs:")
print(outputs)

Generated Token IDs:
tensor([[61949,  5682,  6962,    12,  1391,   153,     5,    14,     0]])
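
The difference between greedy decoding and beam search can be sketched with a toy "model" that is just a table of next-token probabilities. Everything here is hypothetical (`TOY_MODEL`, its tokens and probabilities), and the sketch leaves out details that a real `generate()` call may apply, such as length penalties.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
TOY_MODEL = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A":   {"x": 0.5, "y": 0.5},
    "B":   {"z": 0.9, "x": 0.1},
    "x":   {"</s>": 1.0},
    "y":   {"</s>": 1.0},
    "z":   {"</s>": 1.0},
}

def beam_search(model, num_beams=2, max_len=5):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [(["<s>"], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for nxt, p in model[tokens[-1]].items():
                candidates.append((tokens + [nxt], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == "</s>":
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Fall back to unfinished beams if nothing reached </s> within max_len.
    return max(finished or beams, key=lambda c: c[1])[0]

print(beam_search(TOY_MODEL, num_beams=1))  # behaves like greedy search
print(beam_search(TOY_MODEL, num_beams=2))
```

With one beam the search commits to "A" because it is the most likely first token, and ends with a sequence of probability 0.30. With two beams it also keeps "B" alive and finds the sequence through "z", whose overall probability (0.36) is higher.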


## Step 6: Decode Output Tokens

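We convert the predicted token IDs back to readable Hindi text. Note that the output below reflects an earlier run in which the sentence was "India is developing rapidly." rather than the Step 4 example; the cell numbers (In [5] and In [6] versus In [16]) show that the cells were executed out of order.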

In [6]:
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Original Sentence:")
print(sentence)

print("\nTranslated Sentence:")
print(translated_text)

Original Sentence:
India is developing rapidly.

Translated Sentence:
भारत तेज़ी से बढ़ रहा है ।
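
The effect of skip_special_tokens can be illustrated with a toy decoder. This is a sketch with a made-up ID table (`TOY_ID_TO_TOKEN` is hypothetical); the real IDs and pieces come from the model's own vocabulary.

```python
# Hypothetical ID-to-token table; real IDs come from the model's vocabulary.
TOY_ID_TO_TOKEN = {0: "</s>", 1: "<pad>", 2: "▁भारत", 3: "▁एक", 4: "▁देश", 5: "▁है"}
SPECIAL_TOKENS = {"</s>", "<pad>"}

def toy_decode(ids, skip_special_tokens=True):
    """Map IDs to subword pieces, then join them into readable text."""
    pieces = [TOY_ID_TO_TOKEN[i] for i in ids]
    if skip_special_tokens:
        pieces = [p for p in pieces if p not in SPECIAL_TOKENS]
    # "▁" marks the start of a word, so it becomes a space.
    return "".join(pieces).replace("▁", " ").strip()

print(toy_decode([2, 3, 4, 5, 0]))
print(toy_decode([2, 3, 4, 5, 0], skip_special_tokens=False))
```

The second call keeps the trailing </s> marker, which mirrors what the Step 4 output shows when the input IDs are decoded without skipping special tokens.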


## Step 7: Create Translation Function

This function wraps the full process:
- Tokenize
- Generate
- Decode

In [7]:
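def translate(text):
    """Translate a single English string into Hindi."""
    # Tokenize; truncation guards against inputs longer than the model accepts
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Generate output token IDs with the model's default decoding settings
    outputs = model.generate(**inputs)
    # Only one sentence was passed in, so decode the first (and only) output
    return tokenizer.decode(outputs[0], skip_special_tokens=True)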
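
The translate function above handles one string at a time. When several sentences need translating, they can be sent through the model together. The following is a minimal sketch, assuming `tokenizer` and `model` are the objects loaded in Step 3; `translate_batch` is an illustrative name that does not appear in the original notebook.

```python
def translate_batch(texts, tokenizer, model):
    """Translate a list of strings in one generate() call."""
    # padding=True pads shorter sentences so the batch forms one rectangular tensor.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs)
    # batch_decode returns one decoded string per row of generated IDs.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Usage, assuming `tokenizer` and `model` were loaded as in Step 3:
# translate_batch(["India is a diverse country.", "I am going to Pune."], tokenizer, model)
```

Passing the tokenizer and model as arguments, rather than relying on notebook globals, also makes it easy to reuse the function with a different checkpoint.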

## Step 8: Translate Custom Input

The user can enter any English public-information text.

In [15]:
user_text = input("Enter English text: ")

print("\nTranslated Text:")
print(translate(user_text))


Translated Text:
मैं एक आर्टिक विद्यार्थी हूँ


In [17]:
user_text = input("Enter English text: ")

print("\nOriginal Text:")
print(user_text)

print("\nTranslated Text:")
print(translate(user_text))



Original Text:
I Am Going To Pune

Translated Text:
मैं पुने के लिए जा रहा हूँ
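
Public information notices often run to several sentences, and translation models accept only a limited input length. One possible approach is to split the text into sentences and translate them one by one. The sketch below is a rough heuristic under that assumption: `split_sentences` and `translate_long` are hypothetical helpers, `translate_fn` stands for the translate function from Step 7, and the regular expression will mis-split abbreviations such as "Dr.".

```python
import re

def split_sentences(text):
    """Split on ., ! or ? followed by whitespace; a rough heuristic only."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def translate_long(text, translate_fn):
    """Translate sentence by sentence and join the results with spaces."""
    return " ".join(translate_fn(s) for s in split_sentences(text))

notice = "Schools are closed tomorrow. Buses will run as usual."
print(split_sentences(notice))

# Usage, assuming `translate` from Step 7 is defined:
# print(translate_long(notice, translate))
```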


## Conclusion

This notebook implements a Neural Machine Translation system
using a pre-trained Transformer-based model.

The system:
- Translates English text into Hindi
- Uses a Transformer encoder-decoder architecture
- Demonstrates a practical NLP application

