 **PROJECT DETAILS**

>**AI-POWERED MULTILEVEL LANGUAGE TRANSLATOR**

*   **AIM:** Develop a multilevel language translator using an AI model capable of understanding and responding in multiple languages for tasks such as translation, sentiment analysis, and summarization.



*  **PURPOSE:** To bridge language barriers by building a unified NLP model that performs well across diverse languages.




___________________________________________________










**STEP 1:** IMPORT THE REQUIRED LIBRARIES

__________________________________________________________________________________


In [1]:
import os
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
import gradio as gr

**STEP 2:** ENVIRONMENT CLEAN-UP

________________________________________________________________________________

In [2]:
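# Drop any stale token from the environment; the public checkpoint needs no authentication (note that huggingface_hub itself reads HF_TOKEN).
os.environ.pop("HUGGINGFACE_TOKEN", None)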

**STEP 3:** LOAD MODEL AND TOKENIZER

_________________________________________________________________________________________

In [7]:
model_name = "facebook/m2m100_418M"

tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)


**STEP 4:** INFERENCE

_________________________________________________________________________________



In [4]:
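def translate(text, src_lang, tgt_lang):
    """Translate `text` from `src_lang` to `tgt_lang` (M2M100 language codes such as "en" or "hi")."""
    # Tell the tokenizer which language the input is written in.
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    # Force the first generated token to be the target-language token.
    generated_tokens = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    # Decode the generated ids back to text, dropping language and end-of-sequence tokens.
    translated = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
    return translated[0]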
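
`translate` hands whatever codes it receives to `tokenizer.get_lang_id`, so an unsupported code is likely to surface as a lookup error inside the tokenizer rather than as a readable message. The sketch below shows one possible guard. The names `make_safe_translate` and `fake_translate` are hypothetical and not part of the notebook; the stub stands in for the real `translate` so the example runs without downloading the model.

```python
def make_safe_translate(translate_fn, supported_codes):
    """Wrap a translate function with basic input checks (illustrative only)."""
    supported = set(supported_codes)

    def safe_translate(text, src_lang, tgt_lang):
        # Nothing to translate: return an empty string instead of calling the model.
        if not text or not text.strip():
            return ""
        # Reject codes that are not in the supported list before touching the tokenizer.
        for code in (src_lang, tgt_lang):
            if code not in supported:
                raise ValueError(f"Unsupported language code: {code!r}")
        # Same source and target: no translation needed.
        if src_lang == tgt_lang:
            return text.strip()
        return translate_fn(text.strip(), src_lang, tgt_lang)

    return safe_translate


def fake_translate(text, src_lang, tgt_lang):
    # Stub standing in for the notebook's translate(); it only tags the text.
    return f"[{src_lang}->{tgt_lang}] {text}"


safe = make_safe_translate(fake_translate, ["en", "hi", "fr"])
print(safe("Hello", "en", "hi"))
```

In the notebook, `translate` and `src_langs` could be passed in place of the stub and the short code list.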

**STEP 5:** DEPLOYMENT

_______________________________________________________________________________________

In [5]:
src_langs = [
    "en",  # English
    "hi",  # Hindi
    "fr",  # French
    "de",  # German
    "es",  # Spanish
    "zh",  # Chinese
    "ja",  # Japanese
    "ko",  # Korean
    "ar",  # Arabic
    "ru",  # Russian
    "pt",  # Portuguese
    "it",  # Italian
    "tr",  # Turkish
    "bn",  # Bengali
    "ta",  # Tamil
    "gu",  # Gujarati
    "ml",  # Malayalam
    "te",  # Telugu (verify against the tokenizer's supported codes; it may not be covered by M2M100)
    "ur",  # Urdu
    "fa",  # Persian
    "id",  # Indonesian
    "vi",  # Vietnamese
    "th",  # Thai
    "pl",  # Polish
    "uk",  # Ukrainian
    "nl",  # Dutch
    "ro",  # Romanian
    "el",  # Greek
    "sv",  # Swedish
    "no",  # Norwegian
    "fi",  # Finnish
]

with gr.Blocks() as demo:
    gr.Markdown("# GAURAV's MULTILEVEL-LANGUAGE TRANSLATOR")
    with gr.Row():
        input_text = gr.Textbox(label="ENTER TEXT TO TRANSLATE")
    with gr.Row():
        # Default values must be entries of `choices`, which holds language codes rather than names.
        src = gr.Dropdown(choices=src_langs, value="en", label="Source Language")
        tgt = gr.Dropdown(choices=src_langs, value="hi", label="Target Language")
    with gr.Row():
        translate_btn = gr.Button("TRANSLATE")
    output = gr.Textbox(label="TRANSLATION OUTPUT")

    translate_btn.click(translate, inputs=[input_text, src, tgt], outputs=output)

demo.launch(share=True)



Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://77d28522baaa2fe943.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
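
The dropdowns in Step 5 list raw language codes, and the selected value is passed straight to `translate`. A minimal sketch of one way to show readable names while still handing codes to the model is given here. `LANGUAGE_NAMES`, `dropdown_choices`, and `resolve_language` are hypothetical names introduced for illustration, and the mapping covers only a few of the languages in `src_langs`.

```python
# Hypothetical mapping from display name to language code (subset of src_langs).
LANGUAGE_NAMES = {
    "English": "en",
    "Hindi": "hi",
    "French": "fr",
    "German": "de",
    "Spanish": "es",
}


def dropdown_choices(names_to_codes):
    """Return (display name, code) pairs sorted by display name."""
    return sorted(names_to_codes.items())


def resolve_language(selection, names_to_codes):
    """Accept either a display name or a code and return the code."""
    if selection in names_to_codes:
        return names_to_codes[selection]
    if selection in names_to_codes.values():
        return selection
    raise ValueError(f"Unsupported language: {selection!r}")


print(dropdown_choices(LANGUAGE_NAMES))
print(resolve_language("English", LANGUAGE_NAMES))
```

Recent Gradio releases are understood to accept `(name, value)` pairs as dropdown choices, in which case the output of `dropdown_choices` could be passed as `choices`; treat that as an assumption to check against the installed version. Otherwise `resolve_language` can be called at the top of the click handler.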




**CHALLENGES AND SOLUTIONS**

*   **Data imbalance across languages:** Used data augmentation and sampling techniques to balance the dataset.

*   **Token length variations:** Applied dynamic padding and truncation with attention masks.

*   **Low performance in low-resource languages:** Leveraged transfer learning and back-translation techniques.

*   **Inference latency:** Optimized the model using ONNX or quantization for faster inference.

*   **Memory issues during training:** Used gradient accumulation and mixed-precision training.
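
To make the dynamic padding and truncation item concrete, here is a minimal pure-Python sketch. It is not the project's code: `pad_batch` and the `pad_id` default are hypothetical, and in practice the Hugging Face tokenizer does the same job when called with `padding=True` and `truncation=True`.

```python
def pad_batch(sequences, pad_id=0, max_length=None):
    """Pad token-id lists to the longest sequence in the batch.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding, so the model can ignore padded positions.
    """
    # Truncation: cut every sequence to at most max_length tokens.
    if max_length is not None:
        sequences = [seq[:max_length] for seq in sequences]
    # Dynamic padding: pad only up to the longest sequence in this batch.
    longest = max((len(seq) for seq in sequences), default=0)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    attention_mask = [
        [1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences
    ]
    return input_ids, attention_mask


ids, mask = pad_batch([[5, 6, 7], [8]])
print(ids)
print(mask)
```

Padding to the longest sequence in each batch, rather than to a fixed global length, keeps short batches small.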
________________________________________________________________________________________________________
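
Back-translation, listed above for low-resource languages, builds synthetic training pairs by translating monolingual target-language text back into the source language. The sketch below illustrates the idea under stated assumptions: `back_translate` and `stub_translate` are hypothetical names, and the stub only tags the text so the example runs without a model. It is not a record of how the project's data was produced.

```python
def back_translate(target_sentences, translate_fn, tgt_lang, src_lang):
    """Build synthetic (source, target) pairs from monolingual target text."""
    pairs = []
    for sentence in target_sentences:
        # Translate the real target sentence back into the source language.
        synthetic_source = translate_fn(sentence, tgt_lang, src_lang)
        # Pair the synthetic source with the original, human-written target.
        pairs.append((synthetic_source, sentence))
    return pairs


def stub_translate(text, src_lang, tgt_lang):
    # Stand-in for a real translation function; it only tags the text.
    return f"<{tgt_lang}>{text}"


pairs = back_translate(["hola", "gracias"], stub_translate, tgt_lang="es", src_lang="en")
print(pairs)
```

With the notebook's `translate` passed as `translate_fn`, the resulting pairs could be added to the training data for the `src_lang` to `tgt_lang` direction.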
