# Using transformers
We start by learning how to use transformer models for different tasks with the Hugging Face ``transformers`` library. Before starting, make sure the ``transformers`` library is installed, then import it.

In [2]:
!pip uninstall sentence-transformers transformers

Found existing installation: sentence-transformers 5.2.2
Uninstalling sentence-transformers-5.2.2:
  Would remove:
    /usr/local/lib/python3.12/dist-packages/sentence_transformers-5.2.2.dist-info/*
    /usr/local/lib/python3.12/dist-packages/sentence_transformers/*
Proceed (Y/n)? Y
  Successfully uninstalled sentence-transformers-5.2.2
Found existing installation: transformers 4.36.0
Uninstalling transformers-4.36.0:
  Would remove:
    /usr/local/bin/transformers-cli
    /usr/local/lib/python3.12/dist-packages/transformers-4.36.0.dist-info/*
    /usr/local/lib/python3.12/dist-packages/transformers/*
Proceed (Y/n)? Y
  Successfully uninstalled transformers-4.36.0


In [3]:
!pip install transformers==4.36.0

Collecting transformers==4.36.0
  Using cached transformers-4.36.0-py3-none-any.whl.metadata (126 kB)
Using cached transformers-4.36.0-py3-none-any.whl (8.2 MB)
Installing collected packages: transformers
Successfully installed transformers-4.36.0


In [None]:
#!pip install transformers torch

In [1]:
from transformers import pipeline
import torch



## Sentiment analysis
The easiest way to use a transformer model from the Hugging Face library is by using the ``pipeline()`` function. Hugging Face describes the ``pipeline()`` function as
> connecting a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.
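To make those three stages concrete, here is a minimal, self-contained sketch in plain Python. Everything in it (`toy_tokenize`, `toy_model`, `toy_postprocess`, `toy_pipeline` and the tiny word lists) is hypothetical and stands in for a real tokenizer and model; it only mirrors the preprocessing, model and postprocessing structure that `pipeline()` wraps, and is not how `transformers` implements it.

```python
import math

# Hypothetical word lists standing in for what a real model learns from data.
POSITIVE_WORDS = {"love", "loved", "great", "good"}
NEGATIVE_WORDS = {"hate", "hated", "bad", "awful"}


def toy_tokenize(text):
    """Preprocessing: lowercase the text and split it into word tokens."""
    return text.lower().split()


def toy_model(tokens):
    """Toy 'model': one raw score (logit) per label, ordered NEGATIVE, POSITIVE."""
    pos = sum(tok in POSITIVE_WORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_WORDS for tok in tokens)
    return [float(neg), float(pos)]


def toy_postprocess(logits, labels=("NEGATIVE", "POSITIVE")):
    """Postprocessing: softmax over the logits, then keep the most likely label."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": labels[best], "score": probs[best]}


def toy_pipeline(inputs):
    """Accept one string or a list of strings and return one dict per input."""
    texts = [inputs] if isinstance(inputs, str) else list(inputs)
    return [toy_postprocess(toy_model(toy_tokenize(t))) for t in texts]


print(toy_pipeline("I loved the theater yesterday"))
print(toy_pipeline(["I love it", "I hate it"]))
```

A single-string call returns a one-element list of `{'label': ..., 'score': ...}` dicts, which is the same shape as the real `classifier(sentence)` output in the cell below.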

For example, if we want to use a pre-trained classifier for sentiment analysis, we may do the following:

In [5]:
classifier = pipeline("sentiment-analysis")
sentence = "I loved the theater yesterday"
classifier(sentence)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



[{'label': 'POSITIVE', 'score': 0.9997727274894714}]

**Question.** Which model is used by default?

**Exercise.** The ``classifier`` model can process several input sentences at a time. Create a list of 3 sentences and input the whole list in order to classify the 3 sentences simultaneously.

In [6]:
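# TODO: create a list of 3 sentences
sentences_list = ["I love driving motor cars", "Your mom need to go to an hospital", "In a far far galaxy a great war begun"]

# TODO: classify the above sentences using the classifier model
# "sentiment-analysis" is an alias of the "text-classification" task, so this
# loads the same default model as before (re-creating the pipeline is optional).
classifier = pipeline("text-classification")
classifier(sentences_list)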

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9992521405220032},
 {'label': 'NEGATIVE', 'score': 0.9974527955055237},
 {'label': 'POSITIVE', 'score': 0.698837399482727}]

**Question.** When classifying the above sentences, how confident is the model?

**Exercise.** Try to write a sentence that the model classifies as positive or negative with a probability close to 0.5.
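Under the hood, a two-label classifier such as the default sentiment model produces two raw scores (logits), turns them into probabilities with a softmax and reports the label with the larger probability. That score is therefore never below 0.5, and it equals 0.5 exactly when the two logits are equal. A small sketch of the computation, using made-up logits rather than real model outputs (`two_label_score` is a hypothetical helper):

```python
import math


def two_label_score(logit_neg, logit_pos):
    """Return (label, score) for a two-label classifier from its two logits."""
    # A softmax over two classes equals a sigmoid of the logit difference.
    p_pos = 1.0 / (1.0 + math.exp(-(logit_pos - logit_neg)))
    if p_pos >= 0.5:
        return "POSITIVE", p_pos
    return "NEGATIVE", 1.0 - p_pos


# Made-up logits: a large gap gives a confident score, a tiny gap a score near 0.5.
for logits in [(-4.0, 4.0), (0.3, -0.3), (0.0, 0.03)]:
    print(logits, two_label_score(*logits))
```

The sigmoid form is mathematically identical to the two-class softmax; it simply avoids computing two exponentials.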

In [26]:
# TODO: get a score as close to 0.5 as possible.
middle_sentiment = "A red car and a blue moto in a red city"

classifier(middle_sentiment)

[{'label': 'POSITIVE', 'score': 0.5083218216896057}]

**Discussion.** Take a pause to discuss with your classmates and with the teacher:
- Did anyone manage to get a score close to 0.5?
- Which sentences work best?
- Some sentences *feel* like they should get a score close to 0.5, but they don't. Why does this happen?

## Translation
Next, we use the ``pipeline()`` function for machine translation. We load Google's *T5* transformer model, which was trained to translate from English into French, German and Romanian. To do so, we tell the ``pipeline()`` function to use the model ``google-t5/t5-base``, as follows:

In [6]:
translator = pipeline("translation", model="google-t5/t5-base")



**Exercise.** Use the translator model to translate the sentence *The farmers take the cows up to the mountains during summer*.

In [8]:
# TODO: Translate the given sentence

sentence = "The farmers take the cows up to the mountains during summer"
translator(sentence)

[{'translation_text': 'Die Bauern bringen die Kühe im Sommer in die Berge.'}]

**Question.** What language did the model translate the sentence to?

**Exercise.** Choose a different language and have the model translate the same sentence to the new language.
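T5 was trained with plain-text task prefixes such as `translate English to German: `, `translate English to French: ` and `translate English to Romanian: `, so the target language is selected through the input text itself rather than through a separate argument. A minimal helper for building such prompts (the name `t5_translation_prompts` is hypothetical, not part of `transformers`):

```python
def t5_translation_prompts(sentences, target_language):
    """Prefix each English sentence with a T5-style translation instruction."""
    # T5's translation training covered English into German, French and Romanian.
    prefix = f"translate English to {target_language}: "
    return [prefix + sentence for sentence in sentences]


prompts = t5_translation_prompts(
    ["The farmers take the cows up to the mountains during summer"], "Romanian"
)
print(prompts[0])
# translate English to Romanian: The farmers take the cows up to the mountains during summer
```

The resulting strings can be passed to `translator` like any other input, either one at a time or as a list.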

In [12]:
# TODO: Translate the same sentence to a different language
sentence = "The farmers take the cows up to the mountains during summer"
translator("translate English to French: " + sentence)

[{'translation_text': "Les agriculteurs transportent les vaches vers les montagnes durant l'été"}]

**Exercise.** Try to translate the same sentence to Spanish.

In [13]:
# TODO: try and translate to Spanish
translator("translate English to Spanish: " + sentence)

[{'translation_text': 'Spanisch: Im Sommer fahren die Bauern die Kühe in die Berge'}]

**Exercise.** Check the [Hugging Face Model Hub](https://huggingface.co/models) for a translation model that can translate from English to Spanish and use it to translate the sentence above.

**Hint.** Use the task tags on the left-hand side of the Hugging Face Model Hub to filter the models by task.
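Many of the translation models published on the Hub by the Helsinki-NLP group follow the naming pattern `Helsinki-NLP/opus-mt-{source}-{target}` with two-letter language codes (for instance `Helsinki-NLP/opus-mt-en-es` for English to Spanish). The helper below is hypothetical, not a library function: it only builds an identifier following that pattern and assumes two-letter codes. Not every language pair exists, so check the Hub before relying on the result.

```python
def opus_mt_model_id(source, target):
    """Build a Hub model id following the Helsinki-NLP opus-mt naming pattern."""
    # Assumption: two-letter ISO 639-1 codes; the pair may not exist on the Hub.
    for code in (source, target):
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a two-letter language code, got {code!r}")
    return f"Helsinki-NLP/opus-mt-{source.lower()}-{target.lower()}"


print(opus_mt_model_id("en", "es"))  # Helsinki-NLP/opus-mt-en-es
```

The returned string can then be given to `pipeline("translation", model=...)`.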

In [4]:
# TODO: repeat the steps above with a model that handles English-to-Spanish translation.
sentence = "The farmers take the cows up to the mountains during summer"
translator_esp = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator_esp(sentence)



[{'translation_text': 'Los granjeros llevan las vacas a las montañas durante el verano'}]

## Sampling with temperature and top-p
Let us now check whether our `translator` model always produces the same answer for a given prompt. To that end, we create a batch of 10 identical sequences and check whether all the answers are the same.
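When `do_sample` is not set, generation picks the highest-scoring continuation (greedy or beam search, depending on the model's generation settings), so identical inputs yield identical outputs. The toy sketch below contrasts one deterministic decoding step with one sampled step on a single next-token distribution; the tokens, probabilities and helper names are invented for illustration and do not come from T5.

```python
import random

# Made-up probabilities for the next token after "Doc".
next_token_probs = {"teur": 0.80, "tor": 0.15, "trine": 0.05}


def greedy_pick(probs):
    """Deterministic decoding step: always take the most likely token."""
    return max(probs, key=probs.get)


def sample_pick(probs, rng):
    """Stochastic decoding step: draw a token in proportion to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random(0)  # fixed seed so the sampled run is reproducible
print([greedy_pick(next_token_probs) for _ in range(10)])
print([sample_pick(next_token_probs, rng) for _ in range(10)])
```

The greedy list is always ten copies of `'teur'`; the sampled list usually contains mostly `'teur'` with an occasional less likely token.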

In [7]:
# Build a batch of 10 identical inputs containing the word "Doctor"
batch = ["French: Doctor"] * 10
print(translator(batch))

[{'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': 'Docteur'}]


**Exercise.** Add the option `do_sample=True`, together with `temperature` and `top_p` values, to the `translator` call. Are the answers any different? Which `temperature` values give interesting answers, and which do not? Feel free to compare your results with your classmates'.
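Conceptually, `temperature` divides the logits before the softmax (values below 1 sharpen the distribution, values above 1 flatten it), and `top_p` keeps only the smallest set of most likely tokens whose cumulative probability reaches `top_p`, then renormalizes before sampling. Because `top_p` is a cumulative probability, only values in (0, 1] are meaningful. The sketch below applies both steps to made-up logits in plain Python; it illustrates the idea and is not the `transformers` implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature; temperature must be positive."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Nucleus filtering: keep the most likely tokens until their cumulative
    probability reaches top_p, zero out the rest, and renormalize."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature=t)
    nucleus = top_p_filter(probs, 0.9)
    print(f"T={t}: probs={[round(p, 3) for p in probs]} top_p=0.9 -> {[round(p, 3) for p in nucleus]}")
```

Lowering the temperature concentrates probability on the top token, so sampled outputs become more repetitive; raising it spreads the mass and lets unlikely tokens through, and top-p then cuts off the least likely tail.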

In [24]:
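# TODO: Generate answers with different temperature and top-p values.
# Note: top_p is a cumulative probability in (0, 1]; a value of 1 or more
# disables nucleus filtering, so top_p=2 leaves the distribution unchanged.
print(translator(batch, do_sample=True, temperature=1, top_p=2))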

[{'translation_text': ': Docteur'}, {'translation_text': ': Docteur'}, {'translation_text': 'Docteur'}, {'translation_text': ': Docteur'}, {'translation_text': ': Docteur'}, {'translation_text': ': Docteur'}, {'translation_text': ': Docteur'}, {'translation_text': ': Docteur'}, {'translation_text': ': Docteur'}, {'translation_text': ': Docteur'}]
