## Text translation using pre-trained Transformers

In [1]:
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-es"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

First, we import the `MarianMTModel` and `MarianTokenizer` classes from `transformers`, a widely used Python library for working with Transformer-based models such as BERT, GPT, and MarianMT.

Next, we set the `model_name` variable to `Helsinki-NLP/opus-mt-en-es`, the Hugging Face Hub identifier of the pre-trained model used for English-to-Spanish translation. The [model card](https://huggingface.co/Helsinki-NLP/opus-mt-en-es) describes this model in more detail.
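The identifier above appears to follow the pattern `Helsinki-NLP/opus-mt-{source}-{target}`. As a minimal sketch, assuming that pattern, a small helper can build the name for another language pair. The function `opus_mt_model_name` is hypothetical (it is not part of `transformers`), and not every language pair is published under this pattern, so the resulting name should be checked on the Hub before loading.

```python
def opus_mt_model_name(source_lang: str, target_lang: str) -> str:
    """Build a Hub identifier of the assumed form Helsinki-NLP/opus-mt-{src}-{tgt}.

    Hypothetical helper: it only formats a string and does not check
    that a model with this name actually exists.
    """
    src = source_lang.strip()
    tgt = target_lang.strip()
    if not src or not tgt:
        raise ValueError("both language codes must be non-empty")
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


model_name = opus_mt_model_name("en", "es")
print(model_name)  # Helsinki-NLP/opus-mt-en-es
```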

`MarianTokenizer.from_pretrained` then creates a `tokenizer` object, which splits the input text into subword tokens and maps them to the integer IDs the model expects.

Similarly, `MarianMTModel.from_pretrained` creates the translation `model` object, initialized with the pre-trained weights of the English-to-Spanish model named by `model_name`.
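To see what the tokenizer described above produces before the text reaches the model, the following sketch pairs each subword piece with its vocabulary ID. The helper `inspect_tokenization` is a hypothetical name introduced here for illustration. It relies on the `tokenize` and `convert_tokens_to_ids` methods of the tokenizer and takes the tokenizer as an argument, so it assumes the `tokenizer` object from the first cell is passed in.

```python
from typing import List, Tuple


def inspect_tokenization(text: str, tokenizer) -> List[Tuple[str, int]]:
    """Pair each subword piece of `text` with its vocabulary ID.

    `tokenizer` is assumed to be the MarianTokenizer loaded earlier.
    """
    # Split the raw string into subword pieces
    pieces = tokenizer.tokenize(text)
    # Look up the integer ID of each piece in the vocabulary
    ids = tokenizer.convert_tokens_to_ids(pieces)
    return list(zip(pieces, ids))


# Usage with the tokenizer loaded above (the exact pieces and IDs
# depend on the model's vocabulary):
# for piece, token_id in inspect_tokenization("Hello, how are you doing today?", tokenizer):
#     print(piece, token_id)
```

Calling `tokenizer(text)` directly performs the same two steps and additionally appends the end-of-sequence token, which is why `translate` below passes `skip_special_tokens=True` when decoding.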

In [3]:
import warnings
warnings.filterwarnings('ignore')

def translate(text: str) -> str:
    """
    :param text: English text
    :return: Spanish text (translated from the English input)
    """
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt")
    # Generate the corresponding Spanish translation
    outputs = model.generate(**inputs)
    # Decode the translated text
    translated = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return translated

text = "Hello, how are you doing today?"
translated = translate(text)
print(translated)

Hola, ¿cómo estás hoy?


In [4]:
text = "Life is what happens when you're busy making other plans."
translated = translate(text)
print(translated)

La vida es lo que pasa cuando estás ocupado haciendo otros planes.
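
The `translate` function above handles one string per call. When there are many sentences, it is usually more efficient to pass them to the model in batches. The following is a minimal sketch of that idea, not a tuned implementation: `chunked` and `translate_batch` are hypothetical helpers, the default `batch_size` of 8 is an arbitrary assumption, and the tokenizer and model are passed in as arguments. It uses `padding=True` so that sentences of different lengths fit in one tensor, `truncation=True` to cut inputs that exceed the model's maximum length, and `batch_decode` to decode all outputs at once.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_batch(texts: Sequence[str], tokenizer, model, batch_size: int = 8) -> List[str]:
    """Translate several texts, `batch_size` at a time, preserving input order."""
    translations: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest sentence in the batch; truncate overly long inputs
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        # Generate translations for the whole batch in one call
        outputs = model.generate(**inputs)
        # Decode every generated sequence back to text
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations


# Usage with the objects loaded in the first cell:
# sentences = ["Hello, how are you doing today?",
#              "Life is what happens when you're busy making other plans."]
# for line in translate_batch(sentences, tokenizer, model):
#     print(line)
```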
