<a href="https://colab.research.google.com/github/ric4234/AI-Fridays/blob/main/Analisi%20Di%20Testi/03_Translation_Summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Translation and Summarization





The goal of this exercise is to use one model to translate text into different languages and another model to summarize text.





#### 1 - Install dependencies and create utils functions

First, we install the required libraries.

In [1]:
!pip install transformers
!pip install torch


Next, we suppress warning messages from `transformers` so that only errors are logged.

In [2]:
from transformers.utils import logging
logging.set_verbosity_error()

#### 2 - Build and use a translation pipeline

At this point, we create a translation pipeline using the nllb-200-distilled-600M model from Facebook (https://huggingface.co/facebook/nllb-200-distilled-600M). We chose this model because it is relatively small (about 600M parameters) and because it can translate between roughly 200 languages.
You can also find many other models on the Hugging Face Hub by filtering by the Translation task (https://huggingface.co/models?pipeline_tag=translation&sort=trending).

In [3]:
from transformers import pipeline
import torch
translator = pipeline(task="translation",
                      model="facebook/nllb-200-distilled-600M",
                      torch_dtype=torch.bfloat16) # Load the weights in bfloat16: about half the memory of float32, usually with little loss in quality
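
To get a feel for what `torch_dtype=torch.bfloat16` saves, here is a rough back-of-the-envelope sketch. It assumes about 600 million parameters (taken from the model name, so the real count differs slightly) and counts only the weights, not activations or other runtime overhead. The helper `weight_memory_gb` is a hypothetical name introduced for illustration.

```python
# Rough memory estimate for model weights at different precisions.
# Assumption: ~600 million parameters, read off the model name.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}  # 32-bit vs 16-bit storage


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


num_params = 600_000_000
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(num_params, dtype):.1f} GB")
# float32: ~2.4 GB
# bfloat16: ~1.2 GB
```

The halving comes purely from storing each weight in 16 bits instead of 32. Whether translation quality is affected depends on the model and the text, so it is worth comparing outputs if precision matters.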


Now that the translator is loaded, let's define the text we want to translate.

In [4]:
text = """
Imagine there's no heaven
It's easy if you try
No hell below us
Above us only sky.
"""

To translate into other languages, look up the corresponding language codes on this page: https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200
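
The codes on that page share a common shape: a three-letter language code, an underscore, and a four-letter script name (for example `eng_Latn`). As a minimal sketch, a quick shape check can catch typos before calling the model. The helper name `looks_like_flores_code` is hypothetical, and the check only validates the format, not whether the language is actually supported.

```python
import re

# Assumption: codes follow the "xxx_Ssss" shape, e.g. "eng_Latn" or "ita_Latn".
FLORES_CODE = re.compile(r"[a-z]{3}_[A-Z][a-z]{3}")


def looks_like_flores_code(code: str) -> bool:
    """Return True if `code` has the expected language_Script shape."""
    return FLORES_CODE.fullmatch(code) is not None


print(looks_like_flores_code("eng_Latn"))  # True
print(looks_like_flores_code("it"))        # False
```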

In [5]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="ita_Latn") # Italian
text_translated

[{'translation_text': 'Immaginate che non ci sia nessun paradiso, è facile provare. Nessun inferno sotto di noi, sopra di noi solo il cielo.'}]

In [6]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="lmo_Latn") # Lombard
text_translated

[{'translation_text': "Imagin che no gh'è nissun paradiso, se provi a provà a no inferno sott de noi, sora de noi, solo cielo."}]

In [7]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="lij_Latn") # Ligurian
text_translated

In [8]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="srd_Latn") # Sardinian
text_translated

[{'translation_text': "Imaginadu chi non b'at paradisu est fàtzile si proas No infernu suta de nois subra de nois solu su celu."}]

In [9]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="scn_Latn") # Sicilian
text_translated

[{'translation_text': 'Immaginate chì ùn ci sia micca celu hè faciule se pruvate No infernu sottu à noi sopra à noi solu celu. '}]

In [10]:
text_translated = translator(text,
                             src_lang="eng_Latn",
                             tgt_lang="vec_Latn") # Venetian
text_translated

[{'translation_text': 'Imagina che no ghe sia paradiso, xe facile se provi No inferno soto de noi Sopra de noi solo cielo.'}]
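
The cells above repeat the same call with a different `tgt_lang` each time. As a minimal sketch, the repetition can be folded into a loop that also unpacks the `translation_text` field from each result. The function name `translate_all` and the `TARGET_LANGS` dictionary are hypothetical helpers, and the translator is passed in as an argument so the function works with any callable that has the same call signature and return format as the pipeline above.

```python
from typing import Callable, Dict

# A few of the language codes used above (FLORES-200 naming).
TARGET_LANGS: Dict[str, str] = {
    "Italian": "ita_Latn",
    "Lombard": "lmo_Latn",
    "Sardinian": "srd_Latn",
    "Sicilian": "scn_Latn",
    "Venetian": "vec_Latn",
}


def translate_all(translator: Callable,
                  text: str,
                  src_lang: str = "eng_Latn",
                  targets: Dict[str, str] = TARGET_LANGS) -> Dict[str, str]:
    """Translate `text` into every target language and return {name: translation}."""
    results = {}
    for name, code in targets.items():
        output = translator(text, src_lang=src_lang, tgt_lang=code)
        # The pipeline returns a list with one dict per input text.
        results[name] = output[0]["translation_text"]
    return results
```

With the `translator` and `text` defined earlier, `translate_all(translator, text)` would return a dictionary keyed by language name, which is easier to print or compare than separate cell outputs.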

#### 3 - Build and use a summarization pipeline

In the following code, we build a summarization pipeline using the bart-large-cnn model (https://huggingface.co/facebook/bart-large-cnn).

As usual, you can find other models that perform this task in the Hugging Face Models section: https://huggingface.co/models?pipeline_tag=summarization

In [None]:
summarizer = pipeline(task="summarization",
                      model="facebook/bart-large-cnn",
                      torch_dtype=torch.bfloat16)

In [None]:
text = """Paris is the capital and most populous city of France, with
          an estimated population of 2,175,601 residents as of 2018,
          in an area of more than 105 square kilometres (41 square
          miles). The City of Paris is the centre and seat of
          government of the region and province of Île-de-France, or
          Paris Region, which has an estimated population of
          12,174,880, or about 18 percent of the population of France
          as of 2017."""

In [None]:
summary = summarizer(text,
                     min_length=10,
                     max_length=100)
summary

[{'summary_text': 'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018. The City of Paris is the centre and seat of the government of the region and province of Île-de-France.'}]
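
To check how much the summarizer actually shortened the input, here is a minimal sketch that pulls the text out of the result and compares word counts. The helper `summary_stats` is a hypothetical name. Note that `min_length` and `max_length` are measured in tokens, so word counts are only a rough proxy for those limits.

```python
def summary_stats(original: str, result: list) -> dict:
    """Extract the summary text and compare its word count to the original's."""
    summary_text = result[0]["summary_text"]  # one dict per input text
    original_words = len(original.split())
    summary_words = len(summary_text.split())
    return {
        "summary_text": summary_text,
        "original_words": original_words,
        "summary_words": summary_words,
        "ratio": summary_words / original_words,
    }
```

Calling `summary_stats(text, summary)` with the variables from the cells above gives the ratio of summary length to input length. A ratio close to 1 suggests the model mostly copied the input rather than condensing it.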