In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.21.2-py3-none-any.whl (4.7 MB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.21.2


In [None]:
seq1 = """
This restaurant has beautiful outdoor seating and exceptional customer service.
They start you off with some bread and cookies and the bread is divine.
I ordered the quesadillas placeras which is corn tortillas with cheese,
onion and squash blossom flowers with a side of guacamole.
It really filled me up and was only 100 pesos.
They have a menu in English and were following good Covid measures.
"""

In [None]:
seq2 = """
Le compre esta webcam a mi sobrino por cuestiones de la escuela y le funciona tal cual para ello.
La calidad de imagen obviamente no es la mejor.
Pues solamente la usa para que vean que esta presente y listo. Por eso mismo no pidió algo mas.
"""

In [None]:
seq3 = """
Wir waren alle gespannt auf das Geisterhaus besonders unsere Tochter - war ein Weihnachtsgeschenk. Es ist an sich sehr nice mit einigen Gimmicks und einer Taschenlampe mit Gruselgeräuschen. Jetzt das große Aber: Auf dem Foto bzw. Karton sieht das Haus echt groß aus, in Wirklichkeit es es jedoch echt unerwartet klein - es ist etwa nur halb so groß wie das Playmobil Dollhouse, welches wir auch besitzen. Zeitgleich haben wir passend dazu The Mystery Machine gekauft - auch hier: im Vergleich zum Hau
"""

In [None]:
seq4 = """
I was supposed to eat here as my first meal in CDMX. However, there was a huge delay at the airport and I'm getting to my hotel which was a 2-minute walk from Casa de Toño. So, I got there around 9pm on a cold rainy evening and was getting hangry and saw a very long line with a 30-40 minute wait. Nope!
"""

### Sentiment Analysis

BERT (*Bidirectional Encoder Representations from Transformers*) is a neural-network-based technique for natural language processing (NLP) developed by Google. The model linked below is a multilingual BERT fine-tuned for sentiment analysis of product reviews; it predicts a rating from 1 to 5 stars.

https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

In [None]:
tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")


In [None]:
from torch import argmax
from torch.nn.functional import softmax

In [None]:
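# Tokenize the review and run it through the classifier
tokens = tokenizer.encode(seq4, return_tensors='pt')
result = model(tokens)

# Convert the logits into one probability per star rating (1 to 5)
probas = softmax(result.logits, dim=1)[0]
probabilities = {str(star): p.item() for star, p in zip(range(1, 6), probas)}

# Map each star rating to a sentiment label
mapping = {
    "1": "Very Negative",
    "2": "Negative",
    "3": "Neutral",
    "4": "Positive",
    "5": "Very Positive"
}

# The predicted rating is the index of the largest logit, plus one
stars = str(int(argmax(result.logits)) + 1)
print(probabilities)
print(mapping[stars])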

{'1': 0.3319258987903595, '2': 0.42771202325820923, '3': 0.187200665473938, '4': 0.040829624980688095, '5': 0.012331805191934109}
Negative
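
The five probabilities printed above can also be condensed into a single expected star rating, which may be easier to compare across reviews than the top class alone. The following is a minimal sketch in plain Python that does not need the model: the probability values are copied from the output above, while the `softmax` helper and the names `top_star` and `expected_stars` are illustrative and not part of the transformers API. The logits are reconstructed as log-probabilities only to show the softmax step; the model's real logits may differ from them by a constant.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Probabilities copied from the cell output above (stars 1 to 5).
probabilities = {
    "1": 0.3319258987903595,
    "2": 0.42771202325820923,
    "3": 0.187200665473938,
    "4": 0.040829624980688095,
    "5": 0.012331805191934109,
}

# Assumed logits: log-probabilities stand in for the model's real scores.
# Softmax ignores constant shifts, so it maps them back to the probabilities.
assumed_logits = [math.log(p) for p in probabilities.values()]
recovered = softmax(assumed_logits)

# Most likely rating (the same choice argmax makes on the logits).
top_star = max(probabilities, key=probabilities.get)

# Probability-weighted average rating.
expected_stars = sum(int(star) * p for star, p in probabilities.items())

print(top_star)                  # 2
print(round(expected_stars, 2))  # 1.97
```

An expected rating just under 2 agrees with the "Negative" label while also showing that the model puts substantial weight on 1 star.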


### Text Generation

GPT-2 (*Generative Pre-trained Transformer 2*) is a language model for text generation released by OpenAI in February 2019; OpenAI is a research laboratory whose co-founders include Elon Musk. The largest version has 1.5 billion parameters, while the `gpt2` checkpoint linked below is the smallest one, with about 124 million. The model generates text by predicting one token at a time from the text so far. This lets it attempt tasks such as translation, question answering, and summarization without task-specific training. It was trained on about 40 GB of text drawn from roughly 8 million web pages.

https://huggingface.co/gpt2
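
To make the idea of generating text one step at a time concrete, here is a toy sketch of an autoregressive loop. It is not GPT-2: the `NEXT_WORD` table and the `generate_greedy` function are hypothetical, the probabilities are invented, and the toy only looks at the last word, whereas a real model conditions on the whole preceding context and works on subword tokens.

```python
# Hypothetical next-word probabilities, invented for illustration.
NEXT_WORD = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 0.9, "<eos>": 0.1},
    "ran": {"away": 0.6, "<eos>": 0.4},
    "down": {"<eos>": 1.0},
    "away": {"<eos>": 1.0},
}

def generate_greedy(prompt, max_length=10):
    """Extend the prompt one word at a time, always taking the likeliest word."""
    words = prompt.split()
    while len(words) < max_length:  # the prompt counts toward max_length
        candidates = NEXT_WORD.get(words[-1])
        if not candidates:
            break  # the toy table knows nothing about this word
        next_word = max(candidates, key=candidates.get)
        if next_word == "<eos>":
            break  # end-of-sequence marker: stop generating
        words.append(next_word)
    return " ".join(words)

print(generate_greedy("the"))                # the cat sat down
print(generate_greedy("the", max_length=3))  # the cat sat
```

Always picking the likeliest word is called greedy decoding; it is deterministic, so the same prompt always gives the same continuation.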

In [None]:
from transformers import pipeline, set_seed

In [None]:
generator = pipeline("text-generation", model="gpt2")


In [None]:
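set_seed(42)  # fix the random seed so the sampled text is reproducible
sequence = "Hello, I'm an artifial intelligence that,"
# max_length counts the prompt tokens as well as the newly generated ones
generated_text = generator(sequence, max_length=50, num_return_sequences=1)[0]["generated_text"]
print(generated_text)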

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Hello, I'm an artifial intelligence that, when I first was born, I thought that my eyesight was completely fine. In fact, my eyesight was almost nonexistent. I wanted to go to college, and I couldn't afford it!
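
The call to `set_seed(42)` matters because generation here appears to sample from the model's next-token distribution rather than always taking the most likely token. The sketch below illustrates the effect of seeding with Python's standard `random` module only. The token list, the weights, and the `sample_tokens` helper are assumptions made up for the example and have nothing to do with GPT-2's actual vocabulary or probabilities.

```python
import random

# Hypothetical next-token distribution, invented for illustration.
TOKENS = ["when", "which", "who", "and"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]

def sample_tokens(seed, n=5):
    """Draw n tokens at random, weighted by WEIGHTS, from a seeded generator."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return rng.choices(TOKENS, weights=WEIGHTS, k=n)

first = sample_tokens(42)
second = sample_tokens(42)

# Same seed, same sequence of random draws, same "generated" tokens.
print(first == second)  # True
```

Without a fixed seed, rerunning the generation cell would usually produce a different continuation each time.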


### Zero-Shot Classification

Zero-shot learning is a machine learning setting in which, at test time, the model sees samples from classes that were not observed during training and must predict which class each one belongs to.

Zero-shot methods generally work by linking observed and unobserved classes through some form of auxiliary information that encodes the distinguishing, observable properties of the objects. For example, given a set of animal images to classify together with auxiliary textual descriptions of what the animals look like, a model trained to recognize horses but never shown an image of a zebra can still recognize one if it also knows that zebras look like striped horses.

In short, zero-shot learning applies when some labels are not available in the training set.
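
The horse and zebra example can be sketched with attribute vectors standing in for the auxiliary descriptions. This is only an illustration of the general idea under made-up assumptions: the attribute names, the `CLASS_ATTRIBUTES` table, the `classify` function, and the detector scores are all hypothetical, and the pipeline used later in this notebook works differently (it relies on a language model rather than hand-written attributes).

```python
import math

# Hypothetical auxiliary information: one attribute vector per class.
# Attribute order: [has_stripes, has_mane, has_hooves, has_wings]
CLASS_ATTRIBUTES = {
    "horse": [0, 1, 1, 0],  # seen during training
    "zebra": [1, 1, 1, 0],  # never seen; known only through its description
    "eagle": [0, 0, 0, 1],
}

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(predicted_attributes):
    """Return the class whose description best matches the detected attributes."""
    return max(
        CLASS_ATTRIBUTES,
        key=lambda name: cosine(predicted_attributes, CLASS_ATTRIBUTES[name]),
    )

# Assumed output of an attribute detector for a new image: striped, with a
# mane and hooves, and no wings.
image_attributes = [0.9, 0.8, 0.95, 0.05]
print(classify(image_attributes))  # zebra
```

The detector only needs to have learned the attributes from classes it has seen; the class description then bridges to the unseen zebra.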

In [None]:
from transformers import pipeline

In [None]:
zero_shot = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [None]:
output = zero_shot("This is a course about Deep Learning and Big Data", candidate_labels=["education", "politics", "business"])

In [None]:
output

{'sequence': 'This is a course about Deep Learning and Big Data',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.5277082920074463, 0.35652944445610046, 0.11576218158006668]}
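
The returned dictionary lists the labels sorted by score, and in this output the scores add up to roughly 1, so they can be read as a distribution over the candidate labels. The sketch below works on a copy of the output above and needs no model. The `THRESHOLD` value and the `"uncertain"` fallback are assumptions added for illustration, not something the pipeline provides.

```python
# Copy of the pipeline output shown above.
output = {
    "sequence": "This is a course about Deep Learning and Big Data",
    "labels": ["education", "business", "politics"],
    "scores": [0.5277082920074463, 0.35652944445610046, 0.11576218158006668],
}

# Pair each label with its score; the first pair is the top prediction.
ranked = list(zip(output["labels"], output["scores"]))
best_label, best_score = ranked[0]

# Sum of the scores over the candidate labels.
total = sum(output["scores"])

# Hypothetical rule: accept the top label only if it clears a threshold.
THRESHOLD = 0.5
prediction = best_label if best_score >= THRESHOLD else "uncertain"

print(prediction, round(best_score, 3), round(total, 3))  # education 0.528 1.0
```

Here "education" only narrowly clears the assumed threshold, which reflects how close "business" is as a second choice.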