# Chapter 1 - Transformer Models

*This notebook contains notes on, and translations of, selected sections of the following chapter:* [🤗 LLM Course - Chapter 1 - Transformer Models](https://huggingface.co/learn/llm-course/chapter1/)

## Transformers, what can they do?

### Working with pipelines

The most basic object in the Transformers library is the `pipeline()` function. It instantiates the desired model (a sentiment-analysis model in the example below) and applies the appropriate preprocessing and postprocessing steps to it.

In [13]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9598049521446228}]

We can even process multiple sentences at once!

In [14]:
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.", 
        "I hate this so much!"
    ]
)

[{'label': 'POSITIVE', 'score': 0.9598049521446228},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

By default, this pipeline selects a pretrained model that has been fine-tuned for sentiment analysis in English. When the `classifier` object is created, the model is downloaded and cached locally, so later calls do not download it again.
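
The log printed by the default pipeline advises against relying on an unspecified model in production. A minimal sketch of pinning the checkpoint explicitly, reusing the model name and revision shown in that log; the helper name `build_classifier` is hypothetical and only serves this illustration:

```python
# Model name and revision copied from the log message printed by the default pipeline.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "714eb0f"


def build_classifier():
    """Create a sentiment-analysis pipeline pinned to a specific checkpoint.

    The import is done lazily so that merely defining this helper does not
    load the library or trigger a download.
    """
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_ID, revision=REVISION)
```

Calling `build_classifier()` downloads the checkpoint on first use (or reuses the cached copy) and returns an object that is used the same way as `classifier` above.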

When we pass sentences to the pipeline, three main steps are involved:

1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The model's predictions are postprocessed so that you can interpret them.
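
The three steps can be illustrated with a toy stand-in written in plain Python. Everything here is an assumption made for illustration: the vocabulary, the per-token scores, and the function names are hypothetical, and a real pipeline uses a trained tokenizer and model instead. Only the overall shape (text to ids, ids to raw scores, raw scores to a label and probability) mirrors the description above.

```python
import math

# Hypothetical toy vocabulary and labels; not taken from any real model.
VOCAB = {"i": 0, "love": 1, "hate": 2, "this": 3}
UNK_ID = len(VOCAB)  # id used for every word outside the vocabulary
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}
# One (negative, positive) score contribution per token id, including UNK_ID.
TOKEN_SCORES = [(0.0, 0.0), (-2.0, 2.0), (2.0, -2.0), (0.0, 0.0), (0.0, 0.0)]


def preprocess(text):
    """Step 1: turn raw text into token ids the model can consume."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [VOCAB.get(word, UNK_ID) for word in words]


def forward(input_ids):
    """Step 2: the toy 'model' maps token ids to one raw score per class."""
    negative = sum(TOKEN_SCORES[i][0] for i in input_ids)
    positive = sum(TOKEN_SCORES[i][1] for i in input_ids)
    return [negative, positive]


def postprocess(logits):
    """Step 3: apply softmax and report the most likely label with its score."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


def toy_pipeline(text):
    return postprocess(forward(preprocess(text)))


result = toy_pipeline("I hate this so much!")
print(result)
```

The returned dictionary has the same `label` and `score` keys as the real classifier output, which is why the postprocessing step matters: the raw scores alone are not directly interpretable.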

### Pipelines available for different modalities

The `pipeline()` function supports modalities beyond text, such as images, audio, and combinations of them (*multimodal*).

> The full list of tasks that can be explored is available in the [🤗 Hub tasks documentation](https://huggingface.co/docs/hub/en/models-tasks)

In general terms, the tasks below, grouped by modality, can be passed as the task argument to `pipeline()`:

Text pipelines
- `text-generation`: generate text from a prompt
- `text-classification`: classify text into predefined categories
- `summarization`: create a shorter version of a text while preserving key information
- `translation`: translate text from one language to another
- `zero-shot-classification`: classify text without prior training on the specific labels
- `feature-extraction`: extract vector representations of text

Image pipelines
- `image-to-text`: generate text descriptions of images
- `image-classification`: identify objects in an image
- `object-detection`: locate and identify objects in images

Audio pipelines
- `automatic-speech-recognition`: convert speech to text
- `audio-classification`: classify audio into categories
- `text-to-speech`: convert text to spoken audio

Multimodal pipelines
- `image-text-to-text`: respond to an image based on a text prompt

Let's explore some of these pipelines in more detail!

### Zero-shot classification

In [15]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445994257926941, 0.11197380721569061, 0.04342673346400261]}
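
The result is a dictionary whose `labels` and `scores` lists are aligned position by position. A short sketch of reading it, with the values copied from the output above (the variable names are only illustrative):

```python
# Output copied from the zero-shot classification cell above.
output = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445994257926941, 0.11197380721569061, 0.04342673346400261],
}

# Pair each candidate label with its score.
ranked = list(zip(output["labels"], output["scores"]))

# Pick the highest-scoring label.
best_label, best_score = max(ranked, key=lambda pair: pair[1])

# In this output the scores behave like a probability distribution over the labels.
total = sum(output["scores"])

print(best_label, round(best_score, 3), round(total, 3))
```

Note that the labels come back ordered by score, not in the order they were passed in `candidate_labels`.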

### Text generation

In [16]:
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In this course, we will teach you how to install and use a Raspberry Pi to build your own projects. We will also show you how to use a Raspberry Pi to create your own web apps.\n\nWe're going to install a Raspberry Pi to run these tutorials, but for now, we want to explain a few things. The first thing that you will need to do is to install the following dependencies.\n\nRaspbian - Required\n\nsudo apt-get install libgcc-dev libreoffice-dev\n\nsudo apt-get install libelf-dev libboost-0.4-dev\n\nsudo apt-get install libboost-0.7-dev\n\nsudo apt-get install libssl-dev libgmp-dev libevent-dev\n\nYou'll need a file called libpcap.so. It's located here.\n\nWe'll also install a few packages from the official repository.\n\nInstall\n\npkg install pcap\n\nsudo apt-get install pcap-dev libmp4-dev libgmp2-dev\n\nsudo apt-get install libgmp2-dev libgcrypt-dev libssl-dev\n\nInstall\n\npkg install pcap-dev libcurl3-dev"}]

### Using any model from the Hub in a pipeline

In [17]:
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': "In this course, we will teach you how to make a better quality home theater system. We will cover not only the basics that you need to know to build a good system, but also some of the finer points that can make a difference.\n\nWith some common sense, you should be able to take a few minutes out of your day to build a system that will work for you. If you don't have the time to learn on your own, you can always hire a professional to do it for you.\n\nWhat You Will Learn\n\nWe will cover everything you need to know to build a good home theater system. We will cover everything from building the components to the installation.\n\nWhat You Will Get\n\nThis course will teach you how to build a home theater system. We will give you step-by-step instructions on how to build a home theater system. We will teach you how to choose the right components for your system. We will explain how to select the right speakers and speakers for your system.\n\nWe will teach you how to

### Mask filling

In [18]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.19619767367839813,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052715748548508,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

### Named entity recognition

In [19]:
ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity_group': 'PER',
  'score': np.float32(0.9981694),
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': np.float32(0.9796019),
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': np.float32(0.9932106),
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
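
The `start` and `end` fields are character offsets into the input string, so they can be used to locate or replace each entity. A small sketch using the entities from the output above, with scores rounded and written as plain floats for brevity; the redaction step is an illustrative use, not something the pipeline does itself:

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Entities copied from the output above (scores rounded, plain floats).
entities = [
    {"entity_group": "PER", "score": 0.9982, "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "score": 0.9796, "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "score": 0.9932, "word": "Brooklyn", "start": 49, "end": 57},
]

# `start` is inclusive and `end` is exclusive, so a plain slice recovers each span.
spans = [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]

# Replace each entity with its tag, working right to left so that
# earlier offsets remain valid after each substitution.
redacted = text
for e in sorted(entities, key=lambda e: e["start"], reverse=True):
    redacted = redacted[: e["start"]] + f"[{e['entity_group']}]" + redacted[e["end"]:]

print(spans)
print(redacted)
```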

### Question answering

In [20]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.6949766278266907, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}

### Summarization

In [21]:
summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

### Translation

In [22]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

Device set to use cpu


[{'translation_text': 'This course is produced by Hugging Face.'}]

### Image and audio pipelines

#### Image classification

In [23]:
image_classifier = pipeline(
    task="image-classification", model="google/vit-base-patch16-224"
)
result = image_classifier(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
print(result)

Device set to use cpu


[{'label': 'lynx, catamount', 'score': 0.43349990248680115}, {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor', 'score': 0.03479622304439545}, {'label': 'snow leopard, ounce, Panthera uncia', 'score': 0.032401926815509796}, {'label': 'Egyptian cat', 'score': 0.023944783955812454}, {'label': 'tiger cat', 'score': 0.02288925088942051}]


#### Automatic speech recognition

In [24]:
transcriber = pipeline(
    task="automatic-speech-recognition", model="openai/whisper-large-v3"
)
result = transcriber(
    "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
)
print(result)

Device set to use cpu


{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}


### Combining data from multiple sources

A powerful application of Transformer models is combining and processing data from different sources. This is especially useful when you need to:

1. Search across multiple databases or repositories
2. Consolidate information from different formats (text, images, and audio)
3. Create a unified view of related information

For example, you could build a system that:

- Searches for information across databases in multiple modalities, such as text and images
- Combines results from different sources into a single response, for example from an audio file and a text
- Presents the most relevant information from a database of documents and metadata
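
As a rough sketch of the "combine results into a single response" idea, the snippet below merges ranked hits from several sources into one list. The `Hit` class, the `merge_hits` helper, and all the sample data are hypothetical, and the sketch assumes the relevance scores are already comparable across sources, which a real system would have to ensure.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str   # where the hit came from, e.g. "text", "image", or "audio"
    content: str  # snippet, caption, or transcript excerpt
    score: float  # relevance score, assumed comparable across sources


def merge_hits(*result_lists, top_k=3):
    """Merge per-source result lists into one ranking, best score first."""
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit.score, reverse=True)[:top_k]


# Hypothetical results from three separate searches.
text_hits = [Hit("text", "Transformers course notes", 0.91), Hit("text", "Unrelated memo", 0.22)]
image_hits = [Hit("image", "Diagram of an encoder-decoder", 0.78)]
audio_hits = [Hit("audio", "Lecture transcript excerpt", 0.64)]

top = merge_hits(text_hits, image_hits, audio_hits)
for hit in top:
    print(f"{hit.score:.2f} [{hit.source}] {hit.content}")
```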