# Using pre-trained models from HuggingFace
In this demo we will try out models available on [HuggingFace](https://huggingface.co/), a massive repository of models and datasets for NLP, vision, audio, and other applications.

As the backbone library we will use Transformers, which in turn builds on PyTorch.

## The "pipeline" abstraction

Transformers pipelines are an easy way to use models for prediction or inference.
A pipeline offers a simple API, available for a variety of classification tasks (e.g. Sentiment Analysis, Named Entity Recognition) and generation tasks (e.g. Question Answering, Summarization).
Naturally, we can also use models trained on tasks from other areas of AI, such as Computer Vision, Audio Generation, and more. More information on the available tasks is [here](https://huggingface.co/docs/transformers/en/task_summary).
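
As a rough mental model, a pipeline chains three stages: preprocessing (tokenization), a model forward pass, and postprocessing of the raw scores into labels. The sketch below imitates those stages in plain Python. Everything in it is a toy assumption: `TOY_WEIGHTS`, the whitespace "tokenizer", and the two-logit "model" are hypothetical stand-ins, not how any real checkpoint works.

```python
import math

# Hypothetical word weights standing in for a trained model
# (these are NOT real model parameters).
TOY_WEIGHTS = {"love": 2.0, "great": 1.5, "hate": -2.0, "bad": -1.5}
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Stand-in for tokenization: lowercase and split on whitespace."""
    return text.lower().split()


def forward(tokens):
    """Stand-in for the model: return one logit per label."""
    score = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return [-score, score]  # [NEGATIVE logit, POSITIVE logit]


def postprocess(logits):
    """Apply softmax to the logits and report the most probable label."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(texts):
    """Run the three stages on each input text."""
    return [postprocess(forward(preprocess(t))) for t in texts]


results = toy_pipeline(["I love you", "I hate you"])
print(results)
```

The returned list has the same shape as a text-classification pipeline result: one dictionary with `label` and `score` per input.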



In this example, we will use a *Sentiment Analysis* pipeline.
The default model (distilbert-base-uncased-finetuned-sst-2-english, a DistilBERT checkpoint fine-tuned on SST-2) classifies text as POSITIVE or NEGATIVE.

In [1]:
from transformers import pipeline

sentiment_pipeline = pipeline("sentiment-analysis")
data = [
    "I love you",
    "I hate you"
]
sentiment_pipeline(data)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9998656511306763},
 {'label': 'NEGATIVE', 'score': 0.9991129040718079}]
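
The warning printed with this cell recommends naming the model and revision explicitly. A minimal sketch of doing so follows; the model id and revision are copied from that warning, while `build_sentiment_pipeline` is a hypothetical helper name introduced here for illustration.

```python
# Model id and revision copied from the warning printed by the cell above.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "714eb0f"


def build_sentiment_pipeline():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so the constants can be reused without loading transformers.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_ID, revision=REVISION)
```

Calling `build_sentiment_pipeline()(["I love you", "I hate you"])` should then load the same checkpoint as the default run, without depending on whatever default a future library version picks.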

## Using a specific model
To use a model from the Hub, specify its name. To browse more models, see the [model hub](https://huggingface.co/models).

The model used below was trained to classify text into rating categories, from 1 to 5 stars, and can classify text in six languages: English, Dutch, German, French, Spanish, and Italian. More details are on the [model card](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment).

In [None]:
# Using a specific model for sentiment analysis
specific_model = pipeline(model="nlptown/bert-base-multilingual-uncased-sentiment")

data = [
    "Me encanta esta maquina",
    "I don't like this product",
    "No me gusta this movie"
]


specific_model(data)

Device set to use cpu


[{'label': '5 stars', 'score': 0.8067932724952698},
 {'label': '1 star', 'score': 0.571632444858551},
 {'label': '1 star', 'score': 0.5722576379776001}]
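
The star labels come back as strings, and two of the three scores above are fairly low (about 0.57). One possible way to post-process such results is sketched below: it parses the label into an integer and flags low-confidence predictions. The helper `to_rating` and the `min_score` threshold of 0.6 are illustrative assumptions, not part of the library or a recommended setting.

```python
# Predictions copied from the output above.
predictions = [
    {"label": "5 stars", "score": 0.8067932724952698},
    {"label": "1 star", "score": 0.571632444858551},
    {"label": "1 star", "score": 0.5722576379776001},
]


def to_rating(prediction, min_score=0.6):
    """Parse '<n> star(s)' into an int and flag low-confidence predictions.

    min_score is an arbitrary illustrative threshold.
    """
    stars = int(prediction["label"].split()[0])
    return {"stars": stars, "confident": prediction["score"] >= min_score}


ratings = [to_rating(p) for p in predictions]
print(ratings)
```

With this threshold, only the first prediction is marked as confident; the two "1 star" results would be candidates for manual review.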

## Text generation pipeline

In [None]:
from transformers import pipeline
from pprint import pprint

summ_pipeline = pipeline(task="summarization")


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [None]:
document = """
In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most
commonly used in encoder-decoder architectures with multi-headed self-attention. For translation tasks, the Transformer can be trained significantly
faster than architectures based on recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks,
we achieve a new state of the art. In the former task our best model outperforms even all previously reported ensembles.
"""
outputs = summ_pipeline(document, max_length=64)
pprint(outputs[0]["summary_text"])

(' The Transformer is the first sequence transduction model based entirely on '
 'attention . It replaces recurrent layers most commonly used in '
 'encoder-decoder architectures with multi-headed self-attention . For '
 'translation tasks, the Transformer can be trained significantly faster than '
 'architectures based on recurrent or convolutional layers .')


## Loading models manually

Although *pipeline* offers an easy way to use pre-trained models, specialized applications require implementing the pipeline's modules manually.
In this case, these modules are the *Tokenizer* and the *Model*.

More information:
* [Tokenizers](https://huggingface.co/learn/nlp-course/en/chapter2/4)
* [Models](https://huggingface.co/models)


In [2]:
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
)

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForCausalLM.from_pretrained('gpt2')

In [3]:
text = "I saw a bird"
inputs = tokenizer(text, return_tensors="pt")
print(inputs)

{'input_ids': tensor([[  40, 2497,  257, 6512]]), 'attention_mask': tensor([[1, 1, 1, 1]])}
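
With a single sentence the `attention_mask` is all ones. It becomes relevant when sequences of different lengths are batched: shorter ones are padded, and the mask marks padding positions with 0 so the model can ignore them. The plain-Python sketch below illustrates the idea. `pad_batch` is a hypothetical helper, right padding is chosen only for simplicity, and reusing id 50256 (GPT-2's end-of-sequence token) as the padding id is an assumption.

```python
def pad_batch(sequences, pad_id):
    """Right-pad token-id lists to a common length and build attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# First row: the ids printed above for "I saw a bird"; second row: its first two ids.
batch = pad_batch([[40, 2497, 257, 6512], [40, 2497]], pad_id=50256)
print(batch)
```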


In [4]:
outputs = model.generate(**inputs)
print(outputs)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


tensor([[  40, 2497,  257, 6512,  326,  373,  546,  284, 6129, 1497,   13,  314,
          373,  588,   11,  705, 5812,  616, 1793,   11,  326,  338,  257, 6512]])


In [5]:
decoded_text = tokenizer.batch_decode(outputs)
print(decoded_text)

["I saw a bird that was about to fly away. I was like, 'Oh my God, that's a bird"]
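
The generated tensor above contains the 4 prompt tokens plus 20 new ones, which is why the text stops mid-sentence. Assuming the call used greedy search (no sampling, no beam search), each new token is simply the highest-scoring candidate given the tokens so far. The toy loop below sketches that procedure; `TOY_SCORES` is a hypothetical lookup table keyed on the last token only, whereas a real model scores the entire vocabulary from the whole prefix.

```python
# Hypothetical next-token scores; a real model computes these from the full prefix.
TOY_SCORES = {
    "bird": {"that": 0.6, "and": 0.3, ".": 0.1},
    "that": {"was": 0.7, "is": 0.3},
    "was": {"flying": 0.5, "small": 0.4, ".": 0.1},
    "flying": {".": 0.9, "away": 0.1},
}


def greedy_generate(prompt_tokens, max_new_tokens=20, eos="."):
    """Repeatedly append the highest-scoring next token (greedy search)."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = TOY_SCORES.get(tokens[-1])
        if not candidates:
            break  # the toy table has no continuation for this token
        next_token = max(candidates, key=candidates.get)
        tokens.append(next_token)
        if next_token == eos:
            break  # stop at the end-of-sequence marker
    return tokens


generated = greedy_generate(["I", "saw", "a", "bird"])
print(" ".join(generated))
```

The loop ends either at the end-of-sequence marker or when the `max_new_tokens` budget runs out, mirroring the two usual stopping conditions of text generation.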
