# Translation and Summarization

- Install `transformers`
- Install PyTorch (`torch`)
  

### Make sure you are using the right python environment

In [None]:
# Windows: list the Python executables found on PATH
!where python
# Linux/macOS: show the Python executable found on PATH
!which python

# Alternatively, use Python itself to get the path of the running interpreter
import sys
print(sys.executable)

d:\GitHub\hf-trans-sum\.venv\Scripts\python.exe
C:\Users\Mentash\AppData\Local\Programs\Python\Python312\python.exe
C:\Users\Mentash\AppData\Local\Microsoft\WindowsApps\python.exe
d:\GitHub\hf-trans-sum\.venv\Scripts\python.exe
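
As a complementary check, here is a minimal sketch that reports whether the running interpreter belongs to a virtual environment. It assumes a standard `venv`/`virtualenv` setup, where `sys.prefix` differs from `sys.base_prefix`; conda environments are not detected this way. The helper name `in_virtualenv` is illustrative and not part of the original notebook.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv/virtualenv."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```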


In [None]:
%pip install transformers
%pip install torch
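
To confirm that both packages ended up in the active environment, a small check along these lines may help. It is only a sketch: it uses the standard-library `importlib.metadata` module, and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the status of the two packages this notebook depends on.
for name in ("transformers", "torch"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either line reports `not installed`, the `%pip install` cell probably ran against a different interpreter than the one the notebook kernel uses.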

### Logging Configuration for Transformers

The following code configures the `transformers` library to log only error messages, which keeps the console or notebook output free of clutter:

1. **`from transformers.utils import logging`**:
   - Imports the logging utility from the `transformers` library.

2. **`logging.set_verbosity_error()`**:
   - Sets the logging verbosity level to `ERROR`, so only error messages are logged and less critical messages (warnings, info, and debug logs) are suppressed.

In [72]:
# Import the logging utility from the transformers library
from transformers.utils import logging

# Set the logging verbosity to only display error messages
logging.set_verbosity_error()
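
The effect of an `ERROR` threshold can be illustrated with Python's standard `logging` module, which the `transformers` logging utility builds on. This is a stand-alone sketch, not the library's internals: the logger name `verbosity_demo` is made up, and the module is imported under an alias so it does not shadow the `logging` name imported above.

```python
import io
import logging as std_logging  # alias avoids shadowing transformers' `logging`

# Capture log output in memory so it can be inspected afterwards.
stream = io.StringIO()
demo_logger = std_logging.getLogger("verbosity_demo")  # hypothetical logger name
demo_logger.propagate = False
demo_logger.addHandler(std_logging.StreamHandler(stream))

# With the threshold at ERROR, lower-severity records are dropped.
demo_logger.setLevel(std_logging.ERROR)
demo_logger.info("loading weights")
demo_logger.warning("something is deprecated")
demo_logger.error("something failed")

captured = stream.getvalue()
print(captured)
```

Only the error record reaches the handler; the info and warning messages are filtered out before they are emitted.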

### Build the `translation` pipeline using 🤗 Transformers Library

In [73]:
from transformers import pipeline 
import torch

In [74]:
translator = pipeline(task="translation",
                      # Meta's No Language Left Behind (NLLB) model, loaded directly from the Hugging Face Hub
                      model="facebook/nllb-200-distilled-600M",
                      # Load the weights in half precision (float16) to save memory
                      torch_dtype=torch.float16)
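
A rough back-of-the-envelope estimate shows why the reduced-precision dtype matters. The sketch below assumes about 600 million parameters (read off the "600M" in the model name; the exact count may differ) and counts only the weights, not activations or framework overhead. The helper `weights_size_gib` is hypothetical.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_size_gib(num_params: int, dtype: str) -> float:
    """Approximate size of the model weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumption: parameter count taken from the "600M" in the model name.
approx_params = 600_000_000

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weights_size_gib(approx_params, dtype):.2f} GiB")
```

Under these assumptions, float16 needs half the memory of float32 for the weights.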

In [75]:
text = """\
Hello, My name is Mohamed, \
I would like to open a bank account.
Here is my passport and ID.
I want a checking account type. \
I would like to have a credit card."""

In [None]:
text_translated = translator(text,
                             src_lang="eng_Latn", # Source language: English
                             tgt_lang="deu_Latn") # Target language: German

To translate from or into other languages, look up their codes on the page [Languages in FLORES-200](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200).

For example:
- Afrikaans: afr_Latn
- Chinese: zho_Hans
- Egyptian Arabic: arz_Arab
- French: fra_Latn
- German: deu_Latn
- Greek: ell_Grek
- Hindi: hin_Deva
- Indonesian: ind_Latn
- Italian: ita_Latn
- Japanese: jpn_Jpan
- Korean: kor_Hang
- Persian: pes_Arab
- Portuguese: por_Latn
- Russian: rus_Cyrl
- Spanish: spa_Latn
- Swahili: swh_Latn
- Thai: tha_Thai
- Turkish: tur_Latn
- Vietnamese: vie_Latn
- Zulu: zul_Latn
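
To avoid retyping codes, the lookup can be wrapped in a small helper. This is a sketch only: the dictionary `FLORES_CODES` and the function `language_kwargs` are hypothetical names, and the mapping contains just a few of the codes listed above plus the English code used earlier.

```python
# A few FLORES-200 codes, keyed by lowercase language name.
FLORES_CODES = {
    "english": "eng_Latn",
    "german": "deu_Latn",
    "french": "fra_Latn",
    "egyptian arabic": "arz_Arab",
    "spanish": "spa_Latn",
}


def language_kwargs(source: str, target: str) -> dict:
    """Build the src_lang/tgt_lang keyword arguments for the translation pipeline."""
    try:
        return {
            "src_lang": FLORES_CODES[source.lower()],
            "tgt_lang": FLORES_CODES[target.lower()],
        }
    except KeyError as missing:
        raise ValueError(f"No FLORES-200 code registered for {missing}") from None


print(language_kwargs("English", "French"))
# Intended use with the pipeline built earlier:
# text_translated = translator(text, **language_kwargs("English", "French"))
```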

In [90]:
print(text_translated)

[{'translation_text': 'مرحبا، اسمي محمد، عايز افتتح حساب مصرفي. هاهي جواز سفر و هويتي. عايز نوع حساب التحقق. عايز بطاقة ائتمان.'}]

Note: the output shown above is in Egyptian Arabic, which corresponds to `tgt_lang="arz_Arab"`. With `tgt_lang="deu_Latn"`, as in the translation cell, the pipeline returns German text instead.


## Free up some memory before continuing
- To leave enough free memory for the rest of the code, run the following cells before continuing.
- Delete the `translation` pipeline (the `translator` variable) so that the model it holds is no longer referenced.
- Then call the garbage collector to reclaim memory that is no longer in use, so it is available for the next task.

In [51]:
import gc

In [52]:
del translator

In [53]:
gc.collect()

8919
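
The reason for calling `gc.collect()` after `del` can be seen in a small stand-alone demonstration. It is a sketch that assumes CPython: an object caught in a reference cycle survives `del` and is only reclaimed once the garbage collector runs. The `Node` class is a made-up stand-in for a large object.

```python
import gc
import weakref


class Node:
    """Tiny stand-in for a large object that holds a reference to itself."""

    def __init__(self):
        self.me = self  # reference cycle: the object points at itself


gc.disable()  # keep automatic collection from interfering with the demo
node = Node()
probe = weakref.ref(node)  # lets us observe the object without keeping it alive

del node  # removes the name, but the cycle still holds a reference
alive_after_del = probe() is not None

gc.collect()  # the cycle detector finds and frees the unreachable object
alive_after_collect = probe() is not None
gc.enable()

print("alive after del:", alive_after_del)
print("alive after gc.collect():", alive_after_collect)
```

The number returned by `gc.collect()`, such as the `8919` above, is the count of unreachable objects it found. If the pipeline had been placed on a GPU, `torch.cuda.empty_cache()` would additionally release PyTorch's cached GPU memory.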

### Build the `summarization` pipeline using 🤗 Transformers Library
We will use another model from Meta, `facebook/bart-large-cnn`, for summarization. It is based on the BART architecture and fine-tuned on the CNN/Daily Mail summarization dataset.

In [54]:
summarizer = pipeline(task="summarization",
                      model="facebook/bart-large-cnn",
                      torch_dtype=torch.bfloat16)

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


In [55]:
text = """Paris is the capital and most populous city of France, with
          an estimated population of 2,175,601 residents as of 2018,
          in an area of more than 105 square kilometres (41 square
          miles). The City of Paris is the centre and seat of
          government of the region and province of Île-de-France, or
          Paris Region, which has an estimated population of
          12,174,880, or about 18 percent of the population of France
          as of 2017."""

In [56]:
summary = summarizer(text,
                     min_length=10,   # minimum summary length, in tokens
                     max_length=100)  # maximum summary length, in tokens

In [57]:
summary

[{'summary_text': 'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018. The City of Paris is the centre and seat of the government of the region and province of Île-de-France.'}]
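
Both pipelines return a list with one dictionary per input, so the text itself has to be pulled out by key. The sketch below does this for the summary shown above; the helper `first_text` and the variable `sample_summary` are hypothetical names, and the sample value is copied from the output rather than produced by running the model.

```python
def first_text(results: list, key: str) -> str:
    """Return the text stored under `key` in the first result of a pipeline call."""
    return results[0][key]


# Copied from the summarizer output above, so this runs without loading the model.
sample_summary = [{'summary_text': 'Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018. The City of Paris is the centre and seat of the government of the region and province of Île-de-France.'}]

summary_text = first_text(sample_summary, "summary_text")
print(summary_text)
print("Length in words:", len(summary_text.split()))
```

The same helper works for the translation result by passing `"translation_text"` as the key.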