## Generative Models with GPT-2

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import numpy as np
import pandas as pd

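#### Let's play with gpt2 and gpt2-xl
The `Auto` classes (`AutoTokenizer`, `AutoModelForCausalLM`) pick the matching tokenizer and model class from the checkpoint name, so switching between `gpt2` and `gpt2-xl` only requires changing `model_name`.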

In [2]:
model_name = 'gpt2'
# model_name = 'gpt2-xl'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

In [3]:
seq = "Machine learning with PyTorch can do amazing"
print("\nInput sequence: ")
print(seq)


Input sequence: 
Machine learning with PyTorch can do amazing


In [4]:
inputs = tokenizer(seq, return_tensors="pt").to(device)
print("\nTokenized input data structure: ")
print(inputs)


Tokenized input data structure: 
{'input_ids': tensor([[37573,  4673,   351,  9485, 15884,   354,   460,   466,  4998]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}


In [5]:
input_ids = inputs["input_ids"]  # just the token IDs, without the attention mask
print("\nToken IDs and their words: ")
for token_id in input_ids[0]:  # avoid shadowing the built-in `id`
  word = tokenizer.decode(token_id)
  print(token_id, word)


Token IDs and their words: 
tensor(37573, device='cuda:0') Machine
tensor(4673, device='cuda:0')  learning
tensor(351, device='cuda:0')  with
tensor(9485, device='cuda:0')  Py
tensor(15884, device='cuda:0') Tor
tensor(354, device='cuda:0') ch
tensor(460, device='cuda:0')  can
tensor(466, device='cuda:0')  do
tensor(4998, device='cuda:0')  amazing


#### Now let's run it through the model

In [6]:
with torch.no_grad():
  logits = model(**inputs).logits[:, -1, :]
print("\nAll logits for next word: ")
print(logits)
print(logits.shape)


All logits for next word: 
tensor([[-114.9653, -118.0909, -123.3015,  ..., -124.5989, -127.7998,
         -118.4347]], device='cuda:0')
torch.Size([1, 50257])


In [7]:
probs = torch.softmax(logits, dim=-1)
print("\nAll probabilities: ")
print(probs)


All probabilities: 
tensor([[2.7682e-05, 1.2155e-06, 6.6349e-09,  ..., 1.8128e-09, 7.3829e-11,
         8.6184e-07]], device='cuda:0')
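
The logits above are all large negative numbers, yet softmax still turns them into a valid probability distribution, because only the differences between logits matter. A minimal pure-Python sketch of that property follows; the four logit values are made up for illustration and are not taken from GPT-2.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)                        # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, chosen only to resemble the magnitude of the values printed above
toy_logits = [-114.97, -118.09, -104.56, -123.30]
toy_probs = softmax(toy_logits)

# Adding the same constant to every logit leaves the probabilities unchanged
shifted_probs = softmax([x + 100.0 for x in toy_logits])

print([round(p, 4) for p in toy_probs])
print(abs(sum(toy_probs) - 1.0) < 1e-9)
```

The largest logit dominates the result, which is why a single token can end up with most of the probability mass even when every logit is far below zero.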


In [8]:
pred_id = torch.argmax(logits).item()
pred_word = tokenizer.decode(pred_id)
pd.DataFrame([pred_id, logits[0, pred_id].cpu(), probs[0, pred_id].cpu(), pred_word], 
              index=['Token ID', 'Logits', 'Probability', 'Predicted Word'], columns =['Value'])

| | Value |
|---|---|
| Token ID | 1243 |
| Logits | tensor(-104.5570) |
| Probability | tensor(0.9172) |
| Predicted Word | things |


### Let's look a bit closer

In [9]:
import pandas as pd

#input_txt = "Transformers are the"
input_txt = "Transformers are the "                    # This one is interesting to see how things change
#input_txt = "Transformers are built using the"         # Note the word pieces

input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
iterations = []
n_steps = 10
choices_per_step = 5

with torch.no_grad():
    for _ in range(n_steps):
        iteration = dict()
        iteration["Input"] = tokenizer.decode(input_ids[0])
        output = model(input_ids=input_ids)
        
        # Select logits of the first batch and the last token and apply softmax to get the probability
        next_token_logits = output.logits[0, -1, :]
        next_token_probs = torch.softmax(next_token_logits, dim=-1)
        sorted_ids = torch.argsort(next_token_probs, dim=-1, descending=True)
        
        # Store tokens with highest probabilities in our little table
        for choice_idx in range(choices_per_step):
            token_id = sorted_ids[choice_idx]
            token_prob = next_token_probs[token_id].cpu().numpy()
            token_choice = (
                f"{tokenizer.decode(token_id)} ({100 * token_prob:.2f}%)"
            )
            iteration[f"Choice {choice_idx+1}"] = token_choice
        iterations.append(iteration)

            
        # Append predicted next token to input
        input_ids = torch.cat([input_ids, sorted_ids[None, 0, None]], dim=-1)

pd.DataFrame(iterations)

| | Input | Choice 1 | Choice 2 | Choice 3 | Choice 4 | Choice 5 |
|---|---|---|---|---|---|---|
| 0 | Transformers are the | vern (14.87%) | (11.98%) | ids (7.52%) | ices (4.99%) | urs (4.02%) |
| 1 | Transformers are the vern | acular (94.56%) | al (3.49%) | ier (1.01%) | iers (0.16%) | us (0.08%) |
| 2 | Transformers are the vernacular | of (28.65%) | for (17.40%) | term (5.95%) | name (5.18%) | words (2.47%) |
| 3 | Transformers are the vernacular of | the (22.59%) | all (1.54%) | a (1.45%) | our (0.75%) | this (0.69%) |
| 4 | Transformers are the vernacular of the | game (1.31%) | time (1.14%) | world (1.08%) | modern (0.92%) | ancient (0.71%) |
| 5 | Transformers are the vernacular of the game | . (25.34%) | , (22.62%) | 's (8.07%) | and (5.68%) | world (4.95%) |
| 6 | Transformers are the vernacular of the game. | \n (15.49%) | They (11.45%) | The (8.13%) | In (2.92%) | It (2.64%) |
| 7 | Transformers are the vernacular of the game.\n | \n (99.65%) | The (0.04%) | A (0.02%) | I (0.01%) | In (0.01%) |
| 8 | Transformers are the vernacular of the game.\n\n | The (9.98%) | In (3.43%) | Contents (3.10%) | A (2.06%) | They (1.84%) |
| 9 | Transformers are the vernacular of the game.\n... | game (5.70%) | first (1.94%) | main (1.30%) | player (1.04%) | " (0.96%) |


### The generate method runs the transformer several steps

In [10]:
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output = model.generate(input_ids, max_new_tokens=n_steps, do_sample=False)
print(tokenizer.decode(output[0]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Transformers are the vernacular of the game.

The game


In [11]:
max_length = 128
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output_greedy = model.generate(input_ids, max_length=max_length, do_sample=False)
print(tokenizer.decode(output_greedy[0]))

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


"The unicorns were very intelligent, and they were very intelligent," said Dr. David S. Siegel, a professor of anthropology at the University of California, Berkeley. "They were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very


### Let's play around with sampling methods

In [12]:
# encode context the generation is conditioned on
input_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='pt').to(device)

In [13]:
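# Some scoring functions

import torch.nn.functional as F

def log_probs_from_logits(logits, labels):
    """Return the log-probability the model assigned to each label token."""
    logp = F.log_softmax(logits, dim=-1)
    # Pick, at every position, the log-prob of the token that actually occurred
    logp_label = torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)
    return logp_label

def sequence_logprob(model, labels, input_len=0):
    """Sum the log-probabilities of the tokens after the prompt, over all rows of `labels`."""
    with torch.no_grad():
        output = model(labels)
        # Logits at position t predict token t+1, so drop the last logit and the first label
        log_probs = log_probs_from_logits(
            output.logits[:, :-1, :], labels[:, 1:])
        # Note: log_probs[:, i] scores token i+1, so this slice starts at the
        # second generated token; use input_len - 1 to include the first one
        seq_log_prob = torch.sum(log_probs[:, input_len:])
    return seq_log_prob.cpu().numpy()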
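
To make the scoring idea concrete without loading a model, here is a small pure-Python sketch that sums the log-probabilities of the continuation tokens only. The per-step distributions, the token list, and the helper name `continuation_logprob` are all hypothetical. The sketch also starts the sum at the first generated token, which is an indexing choice of this example.

```python
import math

def continuation_logprob(step_probs, tokens, prompt_len):
    """Sum log P(token | previous tokens) over the tokens after the prompt.

    step_probs[t] is the (toy) distribution over the token at position t + 1
    given tokens[: t + 1], mirroring the one-step shift between logits and labels.
    """
    total = 0.0
    for t in range(prompt_len - 1, len(tokens) - 1):
        total += math.log(step_probs[t][tokens[t + 1]])
    return total

# Hypothetical sequence: a two-token prompt followed by two generated tokens
tokens = ["I", "enjoy", "walking", "dogs"]
step_probs = [
    {"enjoy": 0.5, "like": 0.5},       # distribution after "I"
    {"walking": 0.25, "tea": 0.75},    # distribution after "I enjoy"
    {"dogs": 0.5, "alone": 0.5},       # distribution after "I enjoy walking"
]

score = continuation_logprob(step_probs, tokens, prompt_len=2)
print(round(score, 4))  # log(0.25) + log(0.5)
```

Because the score is a sum of logs, longer continuations naturally score lower, so comparisons are most meaningful between outputs of the same length.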

#### Greedy Search

In [14]:
# generate text until the output length (which includes the context length) reaches 50
greedy_output = model.generate(input_ids, max_length=50)

In [15]:
logp = sequence_logprob(model, greedy_output, input_len=len(input_ids[0]))
print(tokenizer.decode(greedy_output[0]))
print(f"\nlog-prob: {logp:.2f}")

I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with my dog. I'm not sure if I'll ever be able to walk with my dog.

I'm not sure if I'll

log-prob: -38.90


#### Beam Search

In [16]:
beam_output = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    early_stopping=True
)

In [17]:
logp = sequence_logprob(model, beam_output, input_len=len(input_ids[0]))
print(tokenizer.decode(beam_output[0]))
print(f"\nlog-prob: {logp:.2f}")

I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with him again.

I'm not sure if I'll ever be able to walk with him again. I'm not sure if I'll

log-prob: -37.01
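
Beam search scored higher than greedy search here (-37.01 versus -38.90) because it keeps several partial sequences alive instead of committing to the single best token at each step. The sketch below shows the effect on a hypothetical three-entry lookup table standing in for a language model. It omits details that a real implementation such as `generate` handles, for example length penalties and end-of-sequence handling.

```python
import math

# Hypothetical toy "language model": next-token probabilities keyed by the prefix so far
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"C": 0.5, "D": 0.5},
    ("B",): {"E": 0.9, "F": 0.1},
}

def greedy_decode(n_steps):
    """Pick the single most likely token at every step."""
    seq, logp = (), 0.0
    for _ in range(n_steps):
        probs = TOY_MODEL[seq]
        token = max(probs, key=probs.get)
        logp += math.log(probs[token])
        seq += (token,)
    return seq, logp

def beam_decode(n_steps, num_beams):
    """Keep the num_beams best partial sequences (by total log-prob) at every step."""
    beams = [((), 0.0)]
    for _ in range(n_steps):
        candidates = [
            (seq + (tok,), logp + math.log(p))
            for seq, logp in beams
            for tok, p in TOY_MODEL[seq].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]

greedy_seq, greedy_logp = greedy_decode(2)
beam_seq, beam_logp = beam_decode(2, num_beams=2)
print(greedy_seq, round(greedy_logp, 3))  # ('A', 'C') -1.204
print(beam_seq, round(beam_logp, 3))      # ('B', 'E') -1.022
```

Greedy search takes `A` because it is the best first token, but the best two-token sequence starts with `B`; keeping two beams is enough to find it in this toy case.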


In [18]:
# set no_repeat_ngram_size to 2
beam_output = model.generate(
    input_ids, 
    max_length=50, 
    num_beams=5, 
    no_repeat_ngram_size=2, 
    early_stopping=True
)

In [19]:
logp = sequence_logprob(model, beam_output, input_len=len(input_ids[0]))
print(tokenizer.decode(beam_output[0]))
print(f"\nlog-prob: {logp:.2f}")

I enjoy walking with my cute dog, but I'm not sure if I'll ever be able to walk with him again.

I've been thinking about this for a while now, and I think it's time for me to take a break

log-prob: -52.53
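
The log-prob drops from -37.01 to -52.53 because the n-gram constraint forces the search away from the repetitive, high-probability continuation. A rough sketch of the underlying bookkeeping: given the tokens so far, work out which next tokens would recreate an n-gram that already occurred. The helper name `banned_next_tokens` and the word-level tokens are assumptions for illustration and do not reproduce the library's internals.

```python
def banned_next_tokens(generated, n):
    """Return the tokens that would complete an n-gram already present in `generated`.

    Assumes n >= 1.
    """
    if len(generated) < n:
        return set()                      # no complete n-gram exists yet
    # The last n-1 tokens form the prefix of whatever n-gram comes next
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    for i in range(len(generated) - n + 1):
        ngram = tuple(generated[i:i + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

# Hypothetical word-level tokens; a real tokenizer would produce sub-word IDs
so_far = ["I", "am", "not", "sure", "if", "I"]
print(banned_next_tokens(so_far, 2))  # {'am'}
```

In a decoder, the probability of each banned token would be set to zero before the next token is chosen.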


#### Sampling within the probability distribution of words

In [20]:
# set seed to reproduce results. Feel free to change the seed though to get different results
torch.random.manual_seed(42)

# activate sampling and deactivate top_k filtering by setting top_k to 0
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0
)

In [21]:
logp = sequence_logprob(model, sample_output, input_len=len(input_ids[0]))
print(tokenizer.decode(sample_output[0]))
print(f"\nlog-prob: {logp:.2f}")

I enjoy walking with my cute dog for the rest of the day, but this had me staying in an unusual room and not going on nights out with friends (which will always be wondered for a mere minute or so at this point).

I

log-prob: -132.17
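
With `do_sample=True` and no filtering, each next token is drawn at random in proportion to its probability, which is why unlikely words appear and the log-prob falls so far below the greedy and beam results. A minimal sketch of that draw on a hypothetical three-token distribution, using Python's standard `random` module rather than PyTorch:

```python
import random

# Hypothetical next-token distribution
toy_dist = {"dog": 0.6, "cat": 0.3, "unicorn": 0.1}

def sample_token(dist, rng):
    """Draw one token with probability proportional to its weight."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(42)                   # fixed seed so the draws are reproducible
draws = [sample_token(toy_dist, rng) for _ in range(1000)]
counts = {t: draws.count(t) for t in toy_dist}
print(counts)
```

Over many draws the counts track the probabilities, but any single draw can still land on the rare token.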


#### Change the temperature

In [22]:
# use temperature to decrease the sensitivity to low probability candidates
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=0, 
    temperature=0.7
)

In [23]:
logp = sequence_logprob(model, sample_output, input_len=len(input_ids[0]))
print(tokenizer.decode(sample_output[0]))
print(f"\nlog-prob: {logp:.2f}")

I enjoy walking with my cute dog and playing around with him. I have a strong desire to have children, and I want them to be as healthy as possible. We do not mind having the dog play around in the backyard or outside. I really

log-prob: -97.50
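
Temperature divides the logits before the softmax, so a value below 1 sharpens the distribution and makes low-probability tokens rarer. A small sketch with made-up logits (the three values are illustrative only):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.0]              # hypothetical logits
p_default = softmax_with_temperature(toy_logits, 1.0)
p_cool = softmax_with_temperature(toy_logits, 0.7)

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_cool])
```

At temperature 0.7 the top token gains probability and the least likely token loses it, which fits the more coherent sample and the higher log-prob seen above.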


#### Top-K sampling

In [24]:
# set top_k to 50
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_k=50
)

In [25]:
logp = sequence_logprob(model, sample_output, input_len=len(input_ids[0]))
print(tokenizer.decode(sample_output[0]))
print(f"\nlog-prob: {logp:.2f}")

I enjoy walking with my cute dog," said Boudreau.

Other breeds, such as dogs and dogs with arthritis, also have been found to have increased fitness.

"Our dog is very intelligent and enjoys to be around people,

log-prob: -102.27
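
Top-K sampling keeps only the K most likely tokens, renormalizes their probabilities, and samples from that reduced set. A sketch of the filtering step on a hypothetical four-token distribution, with K=2 instead of 50 so the effect is visible:

```python
def top_k_filter(dist, k):
    """Keep the k most probable tokens and renormalize so they sum to 1."""
    top = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

# Hypothetical next-token distribution
toy_dist = {"dog": 0.5, "cat": 0.25, "bird": 0.15, "unicorn": 0.1}
filtered = top_k_filter(toy_dist, 2)
print(filtered)
```

The cutoff is fixed at K regardless of how the probability mass is spread, so a flat distribution loses many plausible tokens while a peaked one keeps implausible ones.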


#### Top-P sampling

In [26]:
# deactivate top_k sampling and sample only from the smallest set of tokens whose cumulative probability reaches 92%
sample_output = model.generate(
    input_ids, 
    do_sample=True, 
    max_length=50, 
    top_p=0.92, 
    top_k=0
)

In [27]:
logp = sequence_logprob(model, sample_output, input_len=len(input_ids[0]))
print(tokenizer.decode(sample_output[0]))
print(f"\nlog-prob: {logp:.2f}")

I enjoy walking with my cute dog and talking to him because he reminds me so much of me. I know he'll pick up on the little pictures and love the way he looks and I want to show him things that he doesn't find so hilarious

log-prob: -105.89
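
Top-P (nucleus) sampling adapts the cutoff to the distribution: it keeps the smallest set of tokens whose cumulative probability reaches `top_p`. A sketch of that filter on a hypothetical four-token distribution, with a threshold of 0.8 chosen for the toy example:

```python
def top_p_filter(dist, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

# Hypothetical next-token distribution
toy_dist = {"dog": 0.5, "cat": 0.25, "bird": 0.15, "unicorn": 0.1}
nucleus = top_p_filter(toy_dist, 0.8)
print(sorted(nucleus))
```

Here three tokens are needed to reach 0.8; with a more peaked distribution the same threshold would keep fewer.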


#### Combining Top-K and Top-P Sampling

In [28]:
# set top_k = 25 and top_p = 0.95, and return 3 sequences (num_return_sequences = 3)
sample_outputs = model.generate(
    input_ids,
    do_sample=True, 
    max_length=50, 
    top_k=25, 
    top_p=0.95, 
    num_return_sequences=3
)

In [29]:
sample_outputs

tensor([[   40,  2883,  6155,   351,   616, 13779,  3290,   287,   616,  1310,
          1363,    13,   679,  1595,   470,   804,   881,   588,   502,   379,
           477,    13,   314,  1107,  9144,   465,  3241,   284,  3703,   523,
           314,   655,   460,   470,  1037,   475,  8212,    11,   475,   340,
           318,  1107,  1593,   284,   502,   326,   339,  1595,   470,   787],
        [   40,  2883,  6155,   351,   616, 13779,  3290,   526,   628,   198,
           464,  3290,   290,   262,  1545,  1718,   257,  2513,   286,   511,
           898,   706,   484,  5284,   379,   262,  1363,    13,   628,   198,
             1,  1026,   373,  1611,   286, 13779,   553,   531,  4767,    89,
            13,   198,   198,   464,  3155,   338,  3290,   550,   550,   257],
        [   40,  2883,  6155,   351,   616, 13779,  3290,   284,  8073,    13,
           887,   314,   892,   340,   338,  1107,  1593,   329,   257,  1048,
           588,   502,   284,   651,   284,   760,

In [30]:
print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
    logp = sequence_logprob(model, sample_outputs, input_len=len(input_ids[0]))
    sample_outputs = sample_outputs[1:, :]
    print(f"\nlog-prob: {logp:.2f}\n")

Output:
----------------------------------------------------------------------------------------------------
0: I enjoy walking with my cute dog in my little home. He doesn't look much like me at all. I really appreciate his attention to detail so I just can't help but smile, but it is really important to me that he doesn't make

log-prob: -280.66

1: I enjoy walking with my cute dog."


The dog and the friend took a walk of their own after they arrived at the home.


"It was kind of cute," said Petz.

The couple's dog had had a

log-prob: -183.95

2: I enjoy walking with my cute dog to dinner. But I think it's really important for a person like me to get to know them as much as possible. It means that we can learn from each other, and we can share some of our own

log-prob: -81.51
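
Reading the loop as written, each call passes the whole remaining batch to `sequence_logprob`, which sums over every row, so each printed value appears to cover the current sequence plus all later ones. Under that reading, approximate per-sequence scores can be recovered by differencing the printed numbers. The result is only as precise as the two-decimal values shown above.

```python
# Values printed by the loop above, in order
printed = [-280.66, -183.95, -81.51]

# Each printed value covers the current sequence and all later ones, so
# subtracting the next printed value isolates the current sequence.
per_sequence = [
    round(current - following, 2)
    for current, following in zip(printed, printed[1:] + [0.0])
]
print(per_sequence)  # [-96.71, -102.44, -81.51]
```

Seen this way, the three samples score in the same range as the earlier Top-K and Top-P runs, rather than differing by a factor of three.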



#### Trying our own prompts

In [38]:
def do_gpt(prompt):
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
    sample_outputs = model.generate(
        input_ids,
        do_sample=True, 
        max_length=100, 
        top_k=25, 
        top_p=0.95
    )
    print(tokenizer.decode(sample_outputs[0]))

In [33]:
do_gpt("I love to eat pizza all day")

I love to eat pizza all day," he tells me, with an amused snort. "And I know I should be better at it. Because I know, like, when you eat something, you'll get a kick out of it."

He looks back at the pizza, which was his favorite, and takes a bite. "But when I get home…


In [37]:
do_gpt("It's the end of the world")

It's the end of the world, it's what I do. I want to live life and make money and not feel like I'm not doing good things. I want people to see I'm not doing what they want to see. I want people to go out and see the things that I've done that have made the world better and the way I'm doing it, I want to make sure that those people know I don't care about anything."

In an interview with The Associated


In [39]:
do_gpt("My cat turned to me with a murderous look in his eyes")

My cat turned to me with a murderous look in his eyes. "Are you going to tell me anything about what you're up to?" "No, I think you're going to tell me something about what I've been up to for a long time," she said. "This is the first time, though, I've heard any real information about who my cat is and what it's capable of doing." "I'll do anything to help," I said. Her face hardened in embarrassment.
