# Dialect-Aware Multilingual Translator

## Problem Definition & Objective
India has rich linguistic diversity with multiple languages and regional Hindi dialects. 
Standard translators often fail to capture regional expressions and dialectal nuances.

The objective of this project is to build a dialect-aware AI translation system that can:
- Detect Indian languages and Hindi dialects
- Translate text into regionally accurate language
- Provide voice output for accessibility


## Real-World Relevance & Motivation

India has millions of speakers who communicate in regional dialects such as Bhojpuri, Haryanvi, Awadhi, and Maithili.
Most existing translation systems translate only into standard languages and ignore dialectal variations, making translations less natural and less accessible.

This project aims to bridge that gap by providing dialect-aware translations that sound natural to native speakers and support voice output for better usability.

## Selected Project Track

Natural Language Processing (NLP)  
AI-Powered Language Translation System

## Tech Stack

- Python 3.11
- Streamlit
- Google Gemini API
- gTTS (Text-to-Speech)
- Jupyter Notebook


## Data Understanding & Preparation

This project does not rely on a traditional static dataset. Instead, it works
with real-time user-provided text as input.

### Input Data
- User enters text in any Indian language or Hindi dialect
- The input may include regional vocabulary, informal expressions, or mixed language usage

### Data Handling
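- The input text is cleaned by trimming leading and trailing whitespace and collapsing repeated spaces
- Language and dialect are identified by the Gemini model
- The application itself does not store or log user input; the text is only sent to the Gemini API for processing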
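
The cleaning step could look something like the minimal sketch below. The helper name `clean_input` is hypothetical and does not come from the project; the sketch assumes that "unnecessary whitespace" means leading and trailing whitespace plus runs of internal whitespace.

```python
import re


def clean_input(text: str) -> str:
    """Collapse runs of whitespace and trim both ends (hypothetical helper)."""
    # \s matches spaces, tabs, and newlines; each run becomes a single space.
    return re.sub(r"\s+", " ", text).strip()


print(clean_input("  का हाल   बा?\n"))
```

Only whitespace is touched, so regional vocabulary and mixed-language input pass through unchanged.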

### Assumptions
- Input text is meaningful and written in a recognizable Indian language
- Internet connectivity is available for API-based processing


## Model / System Design

The Dialect-Aware Multilingual Translator is designed as a modular AI system
that processes user input through multiple logical stages.

### System Architecture
1. User enters text through the Streamlit interface
2. The text is sent to the Google Gemini API
3. The model identifies:
   - Source language
   - Hindi dialect (if applicable)
4. Based on detection, an appropriate translation prompt is generated
5. The translated output is returned to the user
6. Text-to-Speech converts translated text into voice output
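
The six steps above can be sketched as a single orchestration function. This is an illustration only: `run_pipeline` and its parameters are hypothetical names, and the detection, translation, and speech steps are passed in as callables so the flow can be run offline with stubs in place of the Gemini and gTTS calls.

```python
from typing import Callable, Dict


def run_pipeline(
    text: str,
    target: str,
    detect: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    speak: Callable[[str], str],
) -> Dict[str, str]:
    """Chain detection, translation, and speech (hypothetical orchestration)."""
    cleaned = " ".join(text.split())                  # step 1: tidy the user's input
    source = detect(cleaned)                          # steps 2-3: identify language or dialect
    translated = translate(cleaned, source, target)   # steps 4-5: translate using the detection
    audio_path = speak(translated)                    # step 6: synthesize voice output
    return {"detected": source, "translation": translated, "audio": audio_path}


# Stub callables stand in for the real API calls.
result = run_pipeline(
    "  का हाल बा?  ",
    "English",
    detect=lambda t: "Bhojpuri",
    translate=lambda t, src, tgt: f"[{src} -> {tgt}] {t}",
    speak=lambda t: "output.mp3",
)
print(result)
```

Passing the stages in as arguments keeps each one replaceable, which fits the modular structure described in this section.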

### Key Components
- **Input Module**: Accepts user text
- **Language Detection Module**: Identifies language and dialect
- **Translation Module**: Produces regionally accurate translations
- **Speech Module**: Generates voice output using gTTS
- **UI Module**: Displays text and audio via Streamlit

### Design Considerations
- Modular structure for easy updates
- API-based model for scalability
- Focus on regional accuracy over literal translation


In [1]:
# Core Implementation: Library Imports & Setup

import os
import streamlit as st
from gtts import gTTS

# Gemini API setup
import google.generativeai as genai

# Load API key from environment variable
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")

if GEMINI_API_KEY:
    genai.configure(api_key=GEMINI_API_KEY)
else:
    print("Warning: GEMINI_API_KEY not found in environment variables")


In [2]:
def dialect_aware_translate(text, target_dialect):
    """
    Uses Gemini to translate text into a specific Indian language or Hindi dialect
    using natural regional expressions.
    """
    prompt = f"""
    You are an expert Indian linguist.

    Translate the following text into {target_dialect}.
    Use natural, culturally accurate, and region-specific expressions.
    Do NOT use formal textbook language.

    Text:
    {text}
    """

    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(prompt)

    # Strip surrounding whitespace, matching the other helpers in this notebook
    return response.text.strip()
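
Because the prompt is written as an indented f-string inside the function body, the leading spaces on each line are sent to the model as well. That is probably harmless, but one way to keep the prompt clean and testable is to build it separately. The sketch below is an assumption-laden illustration: `build_dialect_prompt` and `_DIALECT_TEMPLATE` are hypothetical names, not part of the project.

```python
import textwrap

# Dedent the template once, before substitution, so multi-line user text
# cannot interfere with the indentation handling.
_DIALECT_TEMPLATE = textwrap.dedent(
    """\
    You are an expert Indian linguist.

    Translate the following text into {target_dialect}.
    Use natural, culturally accurate, and region-specific expressions.
    Do NOT use formal textbook language.

    Text:
    {text}
    """
)


def build_dialect_prompt(text: str, target_dialect: str) -> str:
    """Return the translation prompt without leading indentation (hypothetical helper)."""
    return _DIALECT_TEMPLATE.format(text=text.strip(), target_dialect=target_dialect)


print(build_dialect_prompt("How are you?", "Bhojpuri"))
```

Separating prompt construction from the API call also makes it possible to inspect the exact prompt without spending a request.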


In [3]:
def detect_language_or_dialect(text):
    """
    Detects Indian language or Hindi dialect from input text using Gemini.
    """
    prompt = f"""
    Identify the language or Hindi dialect of the following text.
    Choose from examples like:
    Hindi, Bhojpuri, Haryanvi, Awadhi, Maithili, Punjabi, Bengali, Tamil, Telugu, etc.

    Text:
    {text}

    Respond with ONLY the language or dialect name.
    """

    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(prompt)

    return response.text.strip()
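
Even when asked to respond with only a name, a language model may add punctuation, Markdown emphasis, or different casing. A small post-processing step can map the reply onto a fixed label set. The following is a hedged sketch: `KNOWN_LABELS` and `normalize_label` are hypothetical, and the label list simply mirrors the examples named in the prompt above.

```python
from typing import Optional, Sequence

# Labels taken from the examples listed in the detection prompt.
KNOWN_LABELS = (
    "Hindi", "Bhojpuri", "Haryanvi", "Awadhi", "Maithili",
    "Punjabi", "Bengali", "Tamil", "Telugu",
)


def normalize_label(raw: str, known: Sequence[str] = KNOWN_LABELS) -> Optional[str]:
    """Return the canonical label for a model reply, or None if it is not recognized."""
    # Remove whitespace and common decoration such as "**Bhojpuri**." or quotes.
    cleaned = raw.strip().strip("*`\"'.:! ").lower()
    for label in known:
        if cleaned == label.lower():
            return label
    return None


print(normalize_label(" **bhojpuri**.\n"))
```

Returning `None` for unrecognized replies leaves the decision to the caller, for example falling back to the raw reply, since the prompt's "etc." allows languages outside this list.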


In [4]:
def translate_text(text, target_language="English"):
    """
    Translates text into the target language while preserving dialect context.
    """
    detected_lang = detect_language_or_dialect(text)

    prompt = f"""
    You are a professional Indian language translator.

    Detected language/dialect: {detected_lang}
    Target language: {target_language}

    Translate the following text while preserving cultural and dialectal meaning:

    Text:
    {text}
    """

    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(prompt)

    return {
        "detected_language": detected_lang,
        "translated_text": response.text.strip()
    }


In [5]:
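def text_to_speech(text, filename="output.mp3", lang="hi"):
    """
    Converts text to speech and saves it as an audio file.

    `lang` is the gTTS language code. It defaults to Hindi ("hi"), so pass
    another code (for example "en") when the translated text is not in Hindi.
    """
    tts = gTTS(text=text, lang=lang)
    tts.save(filename)
    return filename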
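
gTTS chooses its voice from a language code, so a translation into English or Tamil needs a code other than "hi". One possible way to pick the code from a language name is sketched below. The mapping `GTTS_CODES` and the helper `gtts_lang_for` are hypothetical, and falling back to Hindi for dialects such as Bhojpuri or Awadhi is an assumption of this sketch rather than documented project behaviour.

```python
# Hypothetical mapping from a language name to a gTTS language code.
GTTS_CODES = {
    "english": "en",
    "hindi": "hi",
    "bengali": "bn",
    "tamil": "ta",
    "telugu": "te",
}


def gtts_lang_for(language: str, fallback: str = "hi") -> str:
    """Return a gTTS code for the language name, or the fallback if unmapped."""
    # Assumption: Hindi dialects without their own entry are read with the Hindi voice.
    return GTTS_CODES.get(language.strip().lower(), fallback)


print(gtts_lang_for("English"), gtts_lang_for("Bhojpuri"))
```

The returned code would then be passed as the `lang` argument when constructing `gTTS`.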
