In [1]:
# install the requirements
# pip install torch torchvision
# pip install transformers

# OpenAI Language Models

In mid-February 2019, [OpenAI published a language model](https://blog.openai.com/better-language-models/) capable of generating coherent natural language. The model is general-purpose and, despite that, it can rival the best task-specific systems on tasks such as reading comprehension, machine translation, question answering, and automatic summarization.

This model, called GPT-2, has 1.5 billion parameters and was trained on 8 million web pages (40 GB of text) with a single objective: predicting the next word.

However, OpenAI has not released the full model, to keep anyone with bad intentions from putting this technology to harmful use. They did release a smaller, simplified version, along with the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), which explains the whole process.

With enough motivation and GPUs (plus time and money), the process can be replicated. Other interesting reading on the topic:

- [OpenAI's new Multitalented AI Writes, Translates, and Slanders](https://www.theverge.com/2019/2/14/18224704/ai-machine-learning-language-models-read-write-openai-gpt2)
- [Some thoughts on zero-day threats in AI, and OpenAI's GPT-2](https://www.fast.ai/2019/02/15/openai-gp2/)


This example code is inspired by [a tweet by Thomas Wolf](https://twitter.com/Thom_Wolf/status/1097465312579072000) of [Hugging Face](https://huggingface.co/).

In [2]:
import torch
from torch.nn import functional as F
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

Next, we define a function that:

1. tokenizes the input text and encodes it as a sequence of token ids from the GPT-2 vocabulary
2. predicts the next token by sampling from the probability distribution the model assigns to the vocabulary, repeating once per requested token
3. decodes the resulting sequence of token ids back into text
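
Step 2 can be illustrated without loading the model. The following plain-Python sketch mimics, under simplifying assumptions, what a softmax followed by a weighted random draw does: raw scores become probabilities, and one index is drawn according to them. The four-word vocabulary, the logit values, and the helper names `softmax` and `sample_index` are invented for this illustration and are not part of GPT-2 or the `transformers` library.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng):
    """Draw one index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, invented for this illustration
vocab = ["water", "wine", "milk", "sand"]
logits = [3.0, 2.0, 0.5, -2.0]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed so the draws are repeatable
draws = [vocab[sample_index(probs, rng)] for _ in range(10)]
print(draws)
```

Because the draw is random rather than always taking the highest score, less likely words still appear from time to time, which is why repeated calls to the generation function can return different text for the same prompt.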

In [3]:
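def generate(text, length=50):
    """Generate `length` tokens that continue the input text."""
    vec_text = tokenizer.encode(text)
    my_input, past = torch.tensor([vec_text]), None

    with torch.no_grad():  # inference only, no gradients needed
        for _ in range(length):
            # reuse the cached keys/values so only the newest token is processed
            outputs = model(my_input, past_key_values=past, return_dict=True)
            logits, past = outputs.logits, outputs.past_key_values
            # sample the next token id from the distribution over the vocabulary
            my_input = torch.multinomial(F.softmax(logits[:, -1], dim=-1), 1)
            vec_text.append(my_input.item())

    return tokenizer.decode(vec_text)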

In [4]:
# define an input text
text = "The only think we can do to fight climate change is"

# and sample three continuations from the model
for _ in range(3):
    print(generate(text, 35), "\n")
    print("-"*50)

The only think we can do to fight climate change is stem the source of the bunsen system," said Kanaana Read, office manager for the Farm Animal Confidence Association.

Watch Grant Britain's Open Road announcement including 

--------------------------------------------------
The only think we can do to fight climate change is to help Canadian farmers," Gunden said. "On the other hand, some of the smaller farmers have opened up new locations and some GMO crops are being introduced in the 

--------------------------------------------------
The only think we can do to fight climate change is say NO else, and build a platform and great response to it," he quipped. "And you'll be in a better position now to criticize immigrants and immigrants groups that 

--------------------------------------------------


In [5]:
countries = "Spain France Italy Greece Russia China Japan India".split()

for country in countries:
    text = f"Since I was born in {country} my mother language is"
    print(generate(text, 1))

Since I was born in Spain my mother language is Spanish
Since I was born in France my mother language is French
Since I was born in Italy my mother language is Italian
Since I was born in Greece my mother language is som
Since I was born in Russia my mother language is Russian
Since I was born in China my mother language is lit
Since I was born in Japan my mother language is j
Since I was born in India my mother language is Hindi


In [6]:
for _ in range(10):
    text = "I'm thirsty and I need to drink a glass of"
    print(generate(text, 1))

I'm thirsty and I need to drink a glass of water
I'm thirsty and I need to drink a glass of wine
I'm thirsty and I need to drink a glass of water
I'm thirsty and I need to drink a glass of wine
I'm thirsty and I need to drink a glass of water
I'm thirsty and I need to drink a glass of wine
I'm thirsty and I need to drink a glass of water
I'm thirsty and I need to drink a glass of water
I'm thirsty and I need to drink a glass of water
I'm thirsty and I need to drink a glass of water
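
Since every call samples from the model's distribution, repeated runs give a rough picture of that distribution. As a small sketch, the ten completions printed above can be tallied with `collections.Counter`; the list `completions` is a name introduced here and was copied by hand from that output, so ten samples are only a coarse estimate.

```python
from collections import Counter

# Last word of each of the ten lines printed above, copied by hand
completions = ["water", "wine", "water", "wine", "water",
               "wine", "water", "water", "water", "water"]

counts = Counter(completions)
for word, n in counts.most_common():
    print(f"{word}: {n}/{len(completions)}")
# water: 7/10
# wine: 3/10
```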


In [7]:
for _ in range(10):
    text = "After 10 years practicing kung-fu I'm a"
    print(generate(text, 3))

After 10 years practicing kung-fu I'm a bit heavy-
After 10 years practicing kung-fu I'm a more experienced force
After 10 years practicing kung-fu I'm a big fan of
After 10 years practicing kung-fu I'm a full-time
After 10 years practicing kung-fu I'm a bit old,
After 10 years practicing kung-fu I'm a big fan.
After 10 years practicing kung-fu I'm a number two in
After 10 years practicing kung-fu I'm a big believer in
After 10 years practicing kung-fu I'm a total novice,
After 10 years practicing kung-fu I'm a master of this


In [8]:
for _ in range(10):
    text = "To become a good person, you need to"
    print(generate(text, 5))

To become a good person, you need to change your mindset, or
To become a good person, you need to be someone of integrity who
To become a good person, you need to learn the things that're
To become a good person, you need to know the big stories and
To become a good person, you need to be deaf. If you
To become a good person, you need to know how to live your
To become a good person, you need to have self-writing.
To become a good person, you need to go through a little bit
To become a good person, you need to have an idea of what
To become a good person, you need to do everything within your power
