# Transformers

Transformers are used in all kinds of NLP applications:
<ul>
<li> Sentiment analysis</li>
<li> Zero-shot classification</li>
<li> Text generation</li>
<li> Mask filling</li>
<li> Named entity recognition</li>
<li> Question answering</li>
<li> Summarization</li>
<li> Translation</li>
</ul>
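Each application in the list corresponds to a task string that can be passed as the first argument of `pipeline()`. The mapping below is a sketch based on the task identifiers documented by 🤗 Transformers (`sentiment-analysis` and `ner` are aliases for `text-classification` and `token-classification`). `TASKS` and `make_pipeline` are hypothetical names, and `transformers` is imported inside the helper because building a pipeline downloads a model.

```python
# Hypothetical mapping from the applications listed above to pipeline() task strings.
TASKS = {
    "Sentiment analysis": "sentiment-analysis",  # alias of "text-classification"
    "Zero-shot classification": "zero-shot-classification",
    "Text generation": "text-generation",
    "Mask filling": "fill-mask",
    "Named entity recognition": "ner",  # alias of "token-classification"
    "Question answering": "question-answering",
    "Summarization": "summarization",
    # Needs a model (e.g. Helsinki-NLP/opus-mt-en-es) or a language-specific
    # task string such as "translation_en_to_fr".
    "Translation": "translation",
}


def make_pipeline(application: str, **kwargs):
    """Hypothetical helper: build the pipeline for one of the applications above.

    transformers is imported lazily because creating a pipeline downloads
    (and caches) a model the first time it is used.
    """
    from transformers import pipeline

    return pipeline(TASKS[application], **kwargs)
```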

Sentiment analysis using the default pre-trained model

In [None]:
!pip install datasets evaluate transformers[sentencepiece]

In [4]:
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598049521446228}]
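The warning above recommends naming the model and revision explicitly. A minimal sketch that pins both, using the values reported in the log; `PINNED_SENTIMENT` and `load_pinned_classifier` are hypothetical names, and `transformers` is imported inside the function so nothing is downloaded until it is called.

```python
# Model name and revision copied from the log message above.
PINNED_SENTIMENT = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def load_pinned_classifier():
    """Build the sentiment classifier with an explicit model and revision.

    Pinning both keeps results reproducible even if the task's default model changes.
    """
    from transformers import pipeline  # imported lazily: this call downloads the model

    return pipeline(**PINNED_SENTIMENT)
```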

Providing a list of strings to the classifier:

In [5]:
classifier(["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"])

[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
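The classifier returns one dict per input string, in the same order, so inputs and predictions can be paired with `zip()`. The sketch below reuses the output printed above as literal data (in practice `results = classifier(texts)`); the 0.99 confidence threshold is an arbitrary choice for illustration.

```python
# Inputs and the predictions printed above, in the same order.
texts = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
results = [
    {"label": "POSITIVE", "score": 0.9598049521446228},
    {"label": "NEGATIVE", "score": 0.9994558691978455},
]

# One result per input, so zip() lines them up.
for text, result in zip(texts, results):
    print(f"{result['label']:<8} {result['score']:.3f}  {text}")

# Keep only predictions above an (arbitrary) confidence threshold.
confident = [text for text, result in zip(texts, results) if result["score"] >= 0.99]
```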

Translation, this time specifying the model explicitly:

In [None]:
!pip install sacremoses

In [7]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator("Have a nice day!")

[{'translation_text': '¡Que tengas un buen día!'}]
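The model name used above encodes the language pair (English to Spanish). A small sketch of that naming pattern; `opus_mt_model` is a hypothetical helper, and not every language pair is published on the Hub, so the resulting id should be checked before use.

```python
def opus_mt_model(src: str, tgt: str) -> str:
    """Hypothetical helper: build a Helsinki-NLP Opus-MT model id from two language codes.

    Follows the pattern of "Helsinki-NLP/opus-mt-en-es"; the pair may not exist on the Hub.
    """
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


model_id = opus_mt_model("en", "es")
# translator = pipeline("translation", model=model_id)  # downloads the model
```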

[More examples of NLP applications based on Transformers](https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter1/section3.ipynb)

* Function [pipeline()](https://huggingface.co/docs/transformers/pipeline_tutorial):
    * The first parameter defines the task.
    * Defaults (including the model) are used for any parameter that is not specified.
    * There are general and task-specific parameters.
* General parameters include `model` (the model name, a string), `device` (e.g. a GPU index) and `batch_size` (an integer).
* The model is downloaded (on first use) and cached when the pipeline object, e.g. `classifier`, is created.
* Three main steps happen inside `pipeline()`:
    * The text is preprocessed (tokenized) into a format the model understands.
    * The preprocessed inputs are passed to the model.
    * The model's predictions are post-processed into human-readable output.
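The three steps can be sketched with toy stand-ins. Everything here (vocabulary, weights, the whitespace "tokenizer", the summed-weights "model") is invented for illustration and is not how a real Transformer works; in the actual pipeline step 1 is the model's tokenizer and step 2 the Transformer itself. Step 3 mirrors what the sentiment pipeline does for a two-label model: a softmax over the logits, reporting the most likely label and its probability.

```python
import math

# Toy stand-ins for the three stages inside pipeline(); all values are invented.
VOCAB = {"[UNK]": 0, "love": 1, "great": 2, "hate": 3, "awful": 4}
LABELS = ["NEGATIVE", "POSITIVE"]
# One (negative, positive) logit contribution per token id.
WEIGHTS = {0: (0.0, 0.0), 1: (-1.0, 2.0), 2: (-0.5, 1.5), 3: (2.0, -1.0), 4: (1.5, -0.5)}


def preprocess(text: str) -> list[int]:
    """Step 1: turn raw text into token ids (a crude whitespace tokenizer)."""
    return [VOCAB.get(word.strip(".,!?").lower(), VOCAB["[UNK]"]) for word in text.split()]


def model(token_ids: list[int]) -> list[float]:
    """Step 2: map token ids to one logit per label (here: a sum of per-token weights)."""
    logits = [0.0, 0.0]
    for token_id in token_ids:
        neg, pos = WEIGHTS[token_id]
        logits[0] += neg
        logits[1] += pos
    return logits


def postprocess(logits: list[float]) -> dict:
    """Step 3: softmax the logits and report the most likely label with its probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text: str) -> list[dict]:
    """Chain the three steps, returning a list like the real classifier does."""
    return [postprocess(model(preprocess(text)))]


print(toy_pipeline("I hate this so much!"))
```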

Check out [all the pre-trained models available](https://huggingface.co/models)

**Exercise:** Use the [small version of OpenAI's Whisper model](https://huggingface.co/openai/whisper-small) to transcribe [this Spanish audio clip](https://huggingface.co/datasets/Narsil/asr_dummy/resolve/285aeb6e0cb9a9dbba1ce9b16a98f0b1655d4884/4.flac), generating at most 30 new tokens.

In [9]:
from transformers import pipeline
# Speech recognition with Whisper-small; max_new_tokens caps the transcription at 30 generated tokens
transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-small", max_new_tokens=30)
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/285aeb6e0cb9a9dbba1ce9b16a98f0b1655d4884/4.flac")

{'text': ' Y en las ramas medio sumergidas revoloteaban algunos pájaros de quimérico y legendario plumaque.'}
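`max_new_tokens` bounds how many tokens the decoder may generate; a longer transcription would be cut off. A toy greedy loop showing how such a cap works: `generate` and its `next_token` argument are hypothetical stand-ins for the model's prediction step, and only tokens produced by the loop count toward the limit (the prompt does not).

```python
def generate(next_token, prompt_ids, max_new_tokens, eos_id):
    """Greedy decoding that stops at an end-of-sequence token or after max_new_tokens.

    next_token is any function mapping the ids so far to the next id
    (a hypothetical stand-in for the model's prediction step).
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        if token == eos_id:
            break
    return ids[len(prompt_ids):]  # only the newly generated tokens


# A "model" that never emits the end token is stopped by the cap.
capped = generate(lambda ids: 7, prompt_ids=[1, 2], max_new_tokens=30, eos_id=0)
# A "model" that emits the end token immediately stops after one step.
stopped = generate(lambda ids: 0, prompt_ids=[1], max_new_tokens=30, eos_id=0)
```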