# A Practical Introduction to Hugging Face Models
This notebook introduces the basics of working with Hugging Face models through the `transformers` library. It covers:
- How to search for models.
- How to load pretrained models.
- How to run inference for tasks such as classification, text generation, and translation.

In [None]:
# Install the Hugging Face Transformers library
!pip install transformers

## Searching and Exploring Models
The models are available at [https://huggingface.co/models](https://huggingface.co/models).

They are organized by task, for example:
- Text Classification
- Text Generation
- Translation
- Question Answering
- Token Classification
- Feature Extraction (embeddings)
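
The task filter on the models page can also be reached directly through the URL. The sketch below builds such a link with the standard library only; it assumes the site accepts the `pipeline_tag` and `sort` query parameters, and the `models_url` helper is a hypothetical name introduced here for illustration.

```python
from urllib.parse import urlencode

HUB_MODELS_URL = "https://huggingface.co/models"

def models_url(pipeline_tag, sort="downloads"):
    """Build a models-page link filtered by task (assumed query parameters)."""
    # Example tags: "text-classification", "text-generation", "translation",
    # "question-answering", "token-classification", "feature-extraction".
    query = urlencode({"pipeline_tag": pipeline_tag, "sort": sort})
    return f"{HUB_MODELS_URL}?{query}"

print(models_url("translation"))
```

Opening the printed link in a browser should list translation models ordered by download count.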

## Simple Pipeline
The `pipeline` function is the simplest way to use models for common tasks: it bundles tokenization, model inference, and post-processing behind a single call.

In [16]:
from transformers import pipeline

# Sentiment classification
classifier = pipeline('sentiment-analysis')
result = classifier('I love studying AI on saturday mornings!')
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'label': 'POSITIVE', 'score': 0.9992026686668396}]
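
The warning in the output above recommends naming the model and revision explicitly. A minimal sketch of doing that, reusing the model name and revision hash printed in the log. The `PINNED` dictionary and the `build_classifier` helper are hypothetical names introduced here for illustration, and the import is deferred so that nothing is downloaded until the helper is called.

```python
# Values copied from the warning log above; the names below are illustrative only.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}

def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily: constructing the pipeline downloads the model weights.
    from transformers import pipeline
    return pipeline(**PINNED)
```

Calling `build_classifier()('I love studying AI on saturday mornings!')` should then behave like the cell above, without the default-model warning. Each result is a list of dictionaries, so the label is `result[0]['label']` and its confidence is `result[0]['score']`.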


## Text Generation

In [17]:
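generator = pipeline('text-generation', model='gpt2')
# Note: as the warning below shows, the pipeline's default `max_new_tokens` (256)
# overrides `max_length=30` here; pass `max_new_tokens=30` instead to cap the output.
result = generator('Once upon a time,', max_length=30)
print(result)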

Device set to use mps:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': "Once upon a time, I may have been the best of the best. I have been a loyal and trusted friend of the family.\n\nI was a member of the church before I ever got out of my parents' house. I know most of my friends and neighbors are not my friends. And though I have been baptized many times, I have never been able to get baptized.\n\nI will say that the time when I was a young boy was a wonderful time, a great time, a great time. However, I have never been able to get baptized.\n\nI was not baptized until I was 16 years old. I had no idea it would be that much of a change.\n\nI knew that I was going to be a Mormon and wanted to be a missionary. I didn't know what to do with the money that I was giving. I was so focused on getting to be a Mormon that I didn't even know if I was going to get baptized.\n\nI was a missionary at that time. I was making the difference in the world, and I was going to do something that the world couldn't do. I was going to do something that 
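
As the output above shows, `generated_text` contains the prompt followed by the continuation. A small sketch of separating the two in plain Python; the `continuation` helper is a hypothetical name, and the sample is a shortened excerpt of the recorded output rather than a fresh run.

```python
def continuation(prompt, result):
    """Return only the newly generated text from a text-generation result."""
    text = result[0]["generated_text"]
    # The returned string starts with the prompt, so slice it off when present.
    return text[len(prompt):] if text.startswith(prompt) else text

# Shortened excerpt of the output recorded above (for illustration only).
sample = [{"generated_text": "Once upon a time, I may have been the best of the best."}]
print(continuation("Once upon a time,", sample).strip())
```

The text-generation pipeline also accepts `return_full_text=False`, which should give the continuation directly without post-processing.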

## Language Translation

In [18]:
translator = pipeline('translation_en_to_fr')
result = translator('Machine learning is amazing!')
print(result)

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'translation_text': "L'apprentissage par machine est fantastique!"}]


## Question Answering with Context

In [None]:
qa = pipeline('question-answering')
# Pass the question and the context as keyword arguments
result = qa(
  question='What is Hugging Face creating?',
  context='Hugging Face is creating a tool that democratizes AI.'
)
print(result)
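
The question-answering pipeline returns a dictionary with the keys `score`, `start`, `end`, and `answer`, where `start` and `end` are character offsets into the context. The sketch below illustrates that relationship with a hypothetical result; the answer text and score are made up for illustration and are not a recorded model output.

```python
context = "Hugging Face is creating a tool that democratizes AI."

# Hypothetical result, shaped like the pipeline's output (not a recorded run).
answer = "a tool that democratizes AI"
start = context.find(answer)
result = {"score": 0.9, "start": start, "end": start + len(answer), "answer": answer}

# `start` and `end` index into the context string, so slicing recovers the answer.
span = context[result["start"]:result["end"]]
print(span == result["answer"])
```

The offsets are useful when the answer needs to be highlighted inside the original text.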

## Working Directly with Models and Tokenizers

In [None]:
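import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the model from the same checkpoint
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize the text into PyTorch tensors and run a forward pass without tracking gradients
inputs = tokenizer('I love AI', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits)  # raw, unnormalized scores, one per class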
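
The logits printed above are unnormalized scores, not probabilities. A minimal sketch of turning them into a label and a confidence with a softmax, written in plain Python so it runs without the model. The logit values are hypothetical, and the label order is assumed to match this checkpoint's `model.config.id2label`.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence (not actual model output).
logits = [-4.0, 4.3]
# Assumed label order for this SST-2 checkpoint (check `model.config.id2label`).
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 4))
```

With real tensors, the same step can be written as `torch.softmax(outputs.logits, dim=-1)`; this is what the sentiment pipeline does internally to produce its `score`.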

## Conclusion

Hugging Face makes it possible to use state-of-the-art language models with just a few lines of code.