Skip to content

nullHawk/translator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

English to Hindi Neural Machine Translator

This project implements a neural machine translation system for translating English text to Hindi using a sequence-to-sequence (Seq2Seq) architecture with attention mechanism. The model is built with PyTorch and features a GRU-based encoder-decoder network.

Features

  • Seq2Seq architecture with attention mechanism
  • GRU-based encoder and decoder
  • Customizable model parameters (hidden size, embedding dimensions, etc.)
  • Interactive translation through command line interface
  • Web interface using Gradio
  • Support for vocabulary management and data preprocessing
  • Teacher forcing during training

Project Structure

├── app.py                 # Gradio web interface for translation
├── inference.py           # Functions for model inference
├── train.py               # Model training script
├── models/                # Neural network architecture components
│   ├── encoder.py         # Encoder implementation
│   ├── decoder.py         # Decoder implementation  
│   ├── attention.py       # Attention mechanism
│   └── seq2seq.py         # Seq2Seq model
├── utils/                 # Utility functions
│   ├── config.py          # Configuration parameters
│   ├── data_loader.py     # Data loading utilities
│   └── preprocessing.py   # Text preprocessing functions
├── data/                  # Data directory
│   └── hindi_english_parallel.csv  # Parallel corpus
└── bin/                   # Model checkpoints and vocabularies
    ├── seq2seq.pth        # Trained model weights
    ├── eng_vocab.pkl      # English vocabulary
    └── hin_vocab.pkl      # Hindi vocabulary

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/english-hindi-translator.git
cd english-hindi-translator
  1. Install the required packages:
pip install -r requirment.txt

Usage

Training the Model

To train the translation model:

python train.py

This will:

  • Load and preprocess the Hindi-English parallel corpus
  • Build vocabularies for both languages
  • Initialize and train the Seq2Seq model
  • Save model checkpoints after each epoch

Translation via Command Line

For interactive translation through command line:

python inference.py

Web Interface

To launch the web interface for translation:

python app.py

This will start a Gradio interface that you can access in your web browser.

Model Architecture

  • Encoder: GRU-based with configurable layers and embedding dimensions
  • Decoder: GRU with attention mechanism
  • Attention: Calculates attention scores between encoder outputs and decoder hidden states
  • Training: Uses teacher forcing and cross-entropy loss

Configuration

Model parameters can be adjusted in the config.py file:

  • embedding_dim: Size of word embeddings
  • hidden_size: Size of hidden layers
  • num_layers: Number of RNN layers
  • dropout: Dropout rate
  • batch_size: Training batch size
  • learning_rate: Learning rate for optimizer
  • epochs: Number of training epochs
  • teacher_forcing_ratio: Ratio of teacher forcing during training
  • max_vocab_english: Maximum size of English vocabulary
  • max_vocab_hindi: Maximum size of Hindi vocabulary
  • max_length: Maximum sentence length

Requirements

  • torch
  • pandas
  • numpy
  • tqdm
  • gradio

About

RNN Based Seq2Seq Model to Translate English to Hindi

Topics

Resources

Stars

Watchers

Forks

Languages