# Translation and Summarization with Hugging Face

In [3]:
from transformers import pipeline 
import torch

# Here is some code that suppresses warning messages.
from transformers.utils import logging
logging.set_verbosity_error()

## Translation

We start with a translation example, using a model from Meta:

NLLB (No Language Left Behind): '[nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M)'

Setting ```torch_dtype``` to ```bfloat16``` loads the weights in 16-bit precision. This roughly halves the model's memory footprint, usually with little or no noticeable loss in output quality:

In [4]:
translator = pipeline(task="translation",
                      model="facebook/nllb-200-distilled-600M",
                      torch_dtype=torch.bfloat16)

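As a rough illustration of what the lower precision buys, the sketch below estimates the size of the weights at 32-bit and 16-bit precision. The figure of 600 million parameters is an assumption taken from the model name (the real count differs slightly), and `estimate_weight_gb` is a hypothetical helper, not part of `transformers`.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def estimate_weight_gb(num_params, dtype):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: ~600M parameters, as suggested by the name "distilled-600M".
num_params = 600_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{estimate_weight_gb(num_params, dtype):.1f} GB")
# float32: ~2.4 GB
# bfloat16: ~1.2 GB
```

The float32 estimate is close to the size of the downloaded checkpoint (about 2.46 GB), and bfloat16 halves it. The estimate covers the weights only and ignores activations and other runtime overhead.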

Now that the translator is loaded, we can start translating.

As an example, we will translate the following text:

In [5]:
text = """\
My puppy is adorable, \
Your kitten is cute.
Her panda is friendly.
His llama is thoughtful. \
We all have nice pets!"""

We now pass the text to the translator and choose the source and target languages:

In [6]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="fra_Latn")

To translate between other languages, look up their codes on the page [Languages in FLORES-200](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200).
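The codes follow a fixed shape: a three-letter language code, an underscore, and a four-letter script name. A minimal sketch of a lookup table is shown below. The table holds only a small sample of codes, and `FLORES_CODES` and `flores_code` are hypothetical names introduced for illustration.

```python
import re

# A small sample of FLORES-200 codes; the linked page has the full list.
FLORES_CODES = {
    "english": "eng_Latn",
    "french": "fra_Latn",
    "marathi": "mar_Deva",
    "spanish": "spa_Latn",
    "german": "deu_Latn",
    "hindi": "hin_Deva",
}

# Three lowercase letters, underscore, capitalised four-letter script name.
CODE_PATTERN = re.compile(r"^[a-z]{3}_[A-Z][a-z]{3}$")


def flores_code(language):
    """Return the FLORES-200 code for a language name in the sample table."""
    code = FLORES_CODES[language.lower()]
    # Sanity-check the shape so a typo in the table is caught early.
    if not CODE_PATTERN.match(code):
        raise ValueError(f"Malformed code: {code!r}")
    return code


print(flores_code("French"))   # fra_Latn
print(flores_code("Marathi"))  # mar_Deva
```

The returned strings can be passed directly as `src_lang` and `tgt_lang`.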

In [7]:
text_translated

[{'translation_text': 'Mon chiot est adorable, ton chaton est mignon, son panda est ami, sa lamme est attentive, nous avons tous de beaux animaux de compagnie.'}]

The translation above is not perfect: the translator got some words wrong.

For example, llama was translated as lamme, which is not a French word; the correct French is lama. Likewise, friendly came out as ami (friend), where amical would be expected.
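One cheap way to catch errors like this is to check the output against a small hand-written glossary. The sketch below is only an illustration: the glossary and the `missing_terms` helper are assumptions, and an exact word match like this will miss inflected forms in real text.

```python
import re

# Output copied from the translation cell above.
text_translated = [{'translation_text': 'Mon chiot est adorable, ton chaton est mignon, son panda est ami, sa lamme est attentive, nous avons tous de beaux animaux de compagnie.'}]

# Hypothetical glossary: English term -> expected French term.
glossary = {
    "puppy": "chiot",
    "kitten": "chaton",
    "panda": "panda",
    "llama": "lama",
}


def missing_terms(translation, glossary):
    """Return the source terms whose expected translation is absent."""
    words = set(re.findall(r"\w+", translation.lower()))
    return [src for src, tgt in glossary.items() if tgt not in words]


print(missing_terms(text_translated[0]["translation_text"], glossary))
# ['llama']
```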

Now let's try to translate English to Marathi:

In [9]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="mar_Deva")
text_translated

[{'translation_text': 'माझे पिल्लू खूप सुंदर आहे, तुमची मांजरी खूप गोड आहे. तिचे पांडा खूप मैत्रीपूर्ण आहेत. तिचे लामा खूप विचारशील आहेत. आपल्या सर्वांना छान पाळीव प्राणी आहेत.'}]
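Since only `tgt_lang` changes between the two calls, translating into several languages can be wrapped in a small loop. This is a sketch, not part of the `transformers` API: `translate_to_many` is a hypothetical helper, and it assumes the `translator` argument behaves like the pipeline above, returning a list with one `translation_text` dictionary per input.

```python
def translate_to_many(translator, text, src_lang, tgt_langs):
    """Translate `text` into each target language and collect the results."""
    results = {}
    for tgt_lang in tgt_langs:
        output = translator(text, src_lang=src_lang, tgt_lang=tgt_lang)
        # The pipeline returns a list with one dict per input text.
        results[tgt_lang] = output[0]["translation_text"]
    return results
```

With the pipeline loaded, `translate_to_many(translator, text, "eng_Latn", ["fra_Latn", "mar_Deva"])` would return a dictionary keyed by target language code.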

## Summarization

To leave enough free memory for the rest of the code, run the following to release the translation model:

In [10]:
import gc
del translator
gc.collect()

48
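The number printed by `gc.collect()` is the count of unreachable objects the collector found. Note that `del` only removes a name; the memory is reclaimed once no other reference to the object remains. The sketch below illustrates this with a stand-in class (`FakeModel` is hypothetical and holds no real weights).

```python
import gc
import weakref


class FakeModel:
    """Stand-in for a large model; holds a list in place of real weights."""

    def __init__(self):
        self.weights = [0.0] * 1000


model = FakeModel()
alias = model                 # a second reference to the same object
probe = weakref.ref(model)    # lets us observe whether the object still exists

del model
gc.collect()
print(probe() is not None)    # True: `alias` still keeps the object alive

del alias
gc.collect()
print(probe() is None)        # True: no references left, object was freed
```

In a notebook, a lingering reference of this kind can come from another variable that points to the same pipeline. On a GPU, PyTorch may also keep freed memory cached; `torch.cuda.empty_cache()` releases that cache, though this notebook does not show whether a GPU is in use.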

Now build the summarization pipeline:

In [11]:
summarizer = pipeline(task="summarization",
                      model="facebook/bart-large-cnn",
                      torch_dtype=torch.bfloat16)


Model info: '[Bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn)'

In [12]:
# Sample text to summarise:
text = """Paris is the capital and most populous city of France, with
          an estimated population of 2,175,601 residents as of 2018,
          in an area of more than 105 square kilometres (41 square
          miles). The City of Paris is the centre and seat of
          government of the region and province of Île-de-France, or
          Paris Region, which has an estimated population of
          12,174,880, or about 18 percent of the population of France
          as of 2017."""
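Because the string is indented inside triple quotes, it contains line breaks and runs of spaces. Whether this affects the summary is not tested here, but the whitespace can be normalised before passing the text to a model. A minimal sketch, using a shortened copy of the text and a hypothetical variable name `raw`:

```python
# Shortened copy of the sample text, keeping its indentation.
raw = """Paris is the capital and most populous city of France, with
          an estimated population of 2,175,601 residents as of 2018."""

# split() with no arguments breaks on any whitespace run, including newlines.
clean = " ".join(raw.split())
print(clean)
```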

In [15]:
summary = summarizer(text,
                     min_length=10,
                     max_length=70)

In [16]:
summary

[{'summary_text': 'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018. The City of Paris is the centre and seat of the government of the region and province of Île-de-France.'}]
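Note that `min_length` and `max_length` are measured in tokens, not words or characters. As a quick sanity check on how much shorter the summary is, the sketch below compares word counts. It is a rough measure only, and `word_count` is a hypothetical helper; both strings are copied from the cells above.

```python
# Original text and summary, copied from the cells above.
text = (
    "Paris is the capital and most populous city of France, with "
    "an estimated population of 2,175,601 residents as of 2018, "
    "in an area of more than 105 square kilometres (41 square "
    "miles). The City of Paris is the centre and seat of "
    "government of the region and province of Île-de-France, or "
    "Paris Region, which has an estimated population of "
    "12,174,880, or about 18 percent of the population of France "
    "as of 2017."
)
summary = [{'summary_text': 'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018. The City of Paris is the centre and seat of the government of the region and province of Île-de-France.'}]


def word_count(s):
    """Count whitespace-separated words."""
    return len(s.split())


ratio = word_count(summary[0]["summary_text"]) / word_count(text)
print(f"Summary is {ratio:.0%} of the original length in words")
```

The summary keeps the opening of each sentence nearly verbatim and drops the area and the regional population figures.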