# Language translation application

To create a language translation application that translates text from English to Spanish, you can use the `transformers` library from Hugging Face, which provides access to pretrained machine translation models.

As with any machine translation, it's important to note that the translation might not always capture nuanced meanings, especially with complex sentences or idiomatic expressions.

If you encounter any limitations with the model's translations, consider fine-tuning the model on a specific dataset, if available, to better suit your needs.

The code below is suitable for small-scale or demonstration purposes. For large-scale or production use, you would also need to address performance and scalability, for example by loading the model once instead of on every call.

Here's a complete example using the Helsinki-NLP/opus-mt-en-es model, which is specifically trained for English-to-Spanish translation:

In [1]:
pip install transformers sentencepiece


Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99


In [2]:
from transformers import MarianMTModel, MarianTokenizer

def translate_to_spanish(text):
    model_name = 'Helsinki-NLP/opus-mt-en-es'
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the text
    batch = tokenizer([text], return_tensors="pt", padding=True)

    # Generate translation
    translated = model.generate(**batch)

    # Decode the translation
    translated_text = tokenizer.decode(translated[0], skip_special_tokens=True)
    return translated_text

# Example usage
english_text = "Hello, how are you?"
spanish_translation = translate_to_spanish(english_text)
print("Translated Text:", spanish_translation)





Translated Text: Hola, ¿cómo estás?


This script defines a function, `translate_to_spanish`, that takes English text as input and returns its Spanish translation. It uses the Helsinki-NLP/opus-mt-en-es model, which belongs to the Marian family of translation models.
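Because the function above reloads the tokenizer and model on every call, repeated translations are slow. The following is a minimal sketch of one way to avoid that, not the only approach: it caches the loaded pair and translates several sentences in a single `generate()` call. The helper names `load_marian` and `translate_batch` are hypothetical and introduced only for illustration, and `truncation=True` is an added assumption to keep over-long inputs within the model's input limit.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_marian(model_name):
    """Load a Marian tokenizer/model pair once and reuse it (hypothetical helper)."""
    # Imported lazily so the heavy dependency is only needed when a model is loaded.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return tokenizer, model


def translate_batch(texts, tokenizer, model):
    """Translate a list of strings with one generate() call (hypothetical helper)."""
    # Pad to the longest sentence in the batch; truncate over-long inputs (assumption).
    batch = tokenizer(list(texts), return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# Example usage (downloads the model on the first call only):
# tokenizer, model = load_marian("Helsinki-NLP/opus-mt-en-es")
# print(translate_batch(["Hello, how are you?", "Good morning."], tokenizer, model))
```

Passing the tokenizer and model in as arguments keeps `translate_batch` independent of how they were loaded, so the same function works for any Marian language pair.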

# English text to Arabic
To translate English text to Arabic, you can use the Helsinki-NLP/opus-mt-en-ar model with the Hugging Face Transformers library. This model is trained specifically for English-to-Arabic translation.

In [4]:
!pip install transformers sentencepiece



In [7]:
from transformers import MarianMTModel, MarianTokenizer

def translate_to_arabic(text):
    model_name = 'Helsinki-NLP/opus-mt-en-ar'
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the text
    batch = tokenizer([text], return_tensors="pt", padding=True)

    # Generate translation
    translated = model.generate(**batch)

    # Decode the translation
    translated_text = tokenizer.decode(translated[0], skip_special_tokens=True)
    return translated_text

# Example usage
english_text = "Hello, how are you?"
arabic_translation = translate_to_arabic(english_text)
print("Translated Text (Arabic):", arabic_translation)



Translated Text (Arabic): مرحباً، كيف حالك؟
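
The Spanish and Arabic functions differ only in the model name, so they can be folded into one function. The sketch below assumes the `Helsinki-NLP/opus-mt-{source}-{target}` naming pattern seen in the two model names used here; not every language pair is necessarily published under that pattern, so check the Hugging Face Hub before relying on it. The names `opus_mt_model_name` and `translate` are hypothetical.

```python
def opus_mt_model_name(source_lang, target_lang):
    """Build a model id such as 'Helsinki-NLP/opus-mt-en-es' (pattern is an assumption)."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def translate(text, source_lang, target_lang):
    """Translate `text` using the Marian model for the given language pair."""
    # Imported lazily so building a model name does not require transformers.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = opus_mt_model_name(source_lang, target_lang)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Same three steps as above: tokenize, generate, decode.
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)


# Example usage (each call downloads or loads the corresponding model):
# print(translate("Hello, how are you?", "en", "es"))
# print(translate("Hello, how are you?", "en", "ar"))
```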
