<a href="https://colab.research.google.com/github/evan-placenis/Pretrained-Transformer-Text-Generation/blob/main/Transformer_Text_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
pip install huggingface_hub



In [2]:
pip install transformers



In [34]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

In [6]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")


In [35]:
max_length = 128
input_txt = """In a shocking finding, scientists discovered a heard of unicorns living in a \n
remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the \n
researchers was the fact that the unicorns spoke perfect english. \n\n
"""
input_ids = tokenizer(input_txt, return_tensors = "pt")["input_ids"]

Helper functions to compute the log probability that the model assigns to a generated sequence, so the decoding strategies below can be compared.

In [10]:
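import torch.nn.functional as F

def log_probs_from_logits(logits, labels):
  # Convert logits to log-probabilities over the vocabulary
  logp = F.log_softmax(logits, dim = -1)
  # Pick out the log-probability of each label token
  logp_label = torch.gather(logp, 2, labels.unsqueeze(2)).squeeze(-1)
  return logp_label

def sequence_logprob(model, labels, input_len=0):
  with torch.no_grad():
    output = model(labels)
    # The logits at position t predict the token at position t + 1, so drop
    # the last logit and the first label to align predictions with targets
    log_probs = log_probs_from_logits(
        output.logits[:,:-1,:], labels[:, 1:])
    # Sum over the generated part only, skipping the prompt. Because of the
    # one-position shift, this slice starts at the second generated token.
    seq_log_prob = torch.sum(log_probs[:, input_len:])

  return seq_log_prob.cpu().numpy()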
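
To make the shift-and-sum logic above concrete, here is a minimal pure-Python sketch of the same computation on a toy example. The names `log_softmax`, `toy_sequence_logprob`, `tokens` and `logits_per_step` are hypothetical and exist only for illustration; no model is involved and the logits are made up.

```python
import math

def log_softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def toy_sequence_logprob(logits_per_step, tokens, input_len=0):
    # logits_per_step[t] scores the token at position t + 1,
    # mirroring the logits[:, :-1] / labels[:, 1:] alignment
    total = 0.0
    for t in range(input_len, len(tokens) - 1):
        total += log_softmax(logits_per_step[t])[tokens[t + 1]]
    return total

# Hypothetical 3-token sequence over a 3-word vocabulary
tokens = [0, 2, 1]
logits_per_step = [
    [0.0, 0.0, math.log(2.0)],  # after token 0: P(token 2) = 2/4 = 0.5
    [0.0, math.log(3.0), 0.0],  # after token 2: P(token 1) = 3/5 = 0.6
    [0.0, 0.0, 0.0],            # last step has no label, so it is unused
]
seq_logp = toy_sequence_logprob(logits_per_step, tokens)
print(f"{seq_logp:.4f}")  # log(0.5) + log(0.6) = log(0.3)
```

Log probabilities are summed rather than multiplied because the product of many small probabilities underflows quickly; a value closer to zero means the model considers the sequence more likely.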

## Greedy Search

In [36]:
output_greedy = model.generate(input_ids, max_length = max_length, do_sample = False)
print(tokenizer.decode(output_greedy[0]))

In a shocking finding, scientists discovered a heard of unicorns living in a 

remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the 

researchers was the fact that the unicorns spoke perfect english. 


The unicorns were found in a small, isolated area of the Andes Mountains, in the Andes Mountains, in the Andes Mountains, in the Andes Mountains, in the Andes Mountains, in the Andes Mountains, in the Andes Mountains, in the Andes Mountains, in the Andes Mountains, in the Andes Mountains,


In [11]:
logp = sequence_logprob(model, output_greedy, input_len=len(input_ids[0]))
print(f"log-prob: {logp:.2f}")

log-prob: -62.87
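
Greedy search picks the single most probable token at every step. The sketch below illustrates this on a hypothetical toy "model" (the `NEXT` table and `greedy_decode` are made up for illustration and are not part of `transformers`), where the next-token distribution depends only on the previous token.

```python
# Hypothetical next-token probabilities, conditioned on the previous token only
NEXT = {
    "A": {"B": 0.6, "C": 0.4},
    "B": {"D": 0.55, "E": 0.45},
    "C": {"F": 0.9, "G": 0.1},
}

def greedy_decode(start, steps):
    seq, prob = [start], 1.0
    for _ in range(steps):
        choices = NEXT.get(seq[-1])
        if not choices:
            break  # no known continuation for this token
        # Always take the locally most probable next token
        token = max(choices, key=choices.get)
        prob *= choices[token]
        seq.append(token)
    return seq, prob

greedy_seq, greedy_prob = greedy_decode("A", steps=2)
print(greedy_seq, round(greedy_prob, 2))  # ['A', 'B', 'D'] 0.33
```

In this toy table the sequence A, C, F has a higher overall probability (0.4 × 0.9 = 0.36), but greedy search never sees it because C is not the best choice at the first step. The same local decision-making is one reason greedy output can get stuck in loops like the one above.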


## Beam Search Decoding

In [47]:
output_beam = model.generate(input_ids, max_length = max_length, num_beams = 5, do_sample = False)

print(tokenizer.decode(output_beam[0]))
logp = sequence_logprob(model, output_beam, input_len=len(input_ids[0]))
print(f"\nlog-prob: {logp:.2f}")

In a shocking finding, scientists discovered a heard of unicorns living in a 

remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the 

researchers was the fact that the unicorns spoke perfect english. 


According to the researchers, the unicorns were able to communicate with each other through the 

sound of their own voices. The unicorns were able to communicate with each other through the 

sound of their own voices. The unicorns were able to communicate with each other through the 

sound of their own voices. The unicorns

log-prob: -43.64
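
Beam search keeps the `num_beams` best partial sequences at each step instead of only one, which is consistent with the higher log-probability it reaches here (-43.64 versus -62.87 for greedy search). The following is a minimal sketch on a hypothetical toy table (`NEXT` and `beam_search` are illustrative names, not the library's implementation, which also handles details such as length penalties and end-of-sequence tokens).

```python
import math

# Hypothetical next-token probabilities, conditioned on the previous token only
NEXT = {
    "A": {"B": 0.6, "C": 0.4},
    "B": {"D": 0.55, "E": 0.45},
    "C": {"F": 0.9, "G": 0.1},
}

def beam_search(start, steps, num_beams):
    # Each beam is (sequence, cumulative log-probability)
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            choices = NEXT.get(seq[-1], {})
            if not choices:
                candidates.append((seq, logp))  # carry finished beams forward
                continue
            for token, p in choices.items():
                candidates.append((seq + [token], logp + math.log(p)))
        # Keep only the num_beams highest-scoring partial sequences
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:num_beams]
    return beams

beams = beam_search("A", steps=2, num_beams=2)
best_seq, best_logp = beams[0]
print(best_seq, round(math.exp(best_logp), 2))  # ['A', 'C', 'F'] 0.36
```

With `num_beams=1` the function reduces to greedy search and returns A, B, D (probability 0.33). With two beams it keeps the initially less likely branch through C alive and finds the more probable sequence.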


In [46]:
# Stop repetition by forbidding any 2-gram from appearing more than once
output_beam = model.generate(input_ids, max_length = max_length, num_beams = 5,
                             do_sample = False, no_repeat_ngram_size = 2)

print(tokenizer.decode(output_beam[0]))
logp = sequence_logprob(model, output_beam, input_len=len(input_ids[0]))
print(f"\nlog-prob: {logp:.2f}")

In a shocking finding, scientists discovered a heard of unicorns living in a 

remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the 

researchers was the fact that the unicorns spoke perfect english. 


According to a study published in Nature Communications, a team of researchers from the University of California, Santa Barbara, and the Université de Montréal in Paris, France, found that a group of <|endoftext|>

log-prob: -53.27


## Sampling Methods

In [48]:
output_temp = model.generate(input_ids, max_length = max_length,
                             do_sample = True, temperature = 2.0, top_k = 0)
print("Temperature = 2:")
print(tokenizer.decode(output_temp[0]))

Temperature = 2:
In a shocking finding, scientists discovered a heard of unicorns living in a 

remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the 

researchers was the fact that the unicorns spoke perfect english. 


AtSouthClean ca separated fingers thinather cancers ☭а Date satcosbf TRA transient flask sum Ter anti plate TC Borg Games Zar Agency ground access Nam thri automotive Mechanical firm Video Wyoming Turkey Sue United Pictures behindasu cheersDisclaimer took complaining fulfilling Crania FAA piecesury Meadow : Ack want PC Alleg Directory adventureair BMC Personal AIVo Mathahn Horizon continu run


In [49]:
output_temp = model.generate(input_ids, max_length = max_length,
                             do_sample = True, temperature = 0.5, top_k = 0)
print("Temperature = 0.5:")
print(tokenizer.decode(output_temp[0]))

Temperature = 0.5:
In a shocking finding, scientists discovered a heard of unicorns living in a 

remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the 

researchers was the fact that the unicorns spoke perfect english. 


The group of unicorns, which had been caught in the middle of a

forest and were caught by the same team of scientists, was able to speak the

English of the unicorns.


The researchers from the University of Colorado Boulder said, "The unicorns are very kind and

very intelligent and they are very intelligent.
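
The contrast between the two outputs above comes from how temperature rescales the logits before the softmax. A minimal sketch, using made-up logits and a hypothetical helper `softmax_with_temperature`:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide the logits by the temperature, then apply a stable softmax
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature below 1 sharpens the distribution toward the top token, which gives more coherent but less varied text. A temperature above 1 flattens it, so rare tokens are sampled more often, which helps explain the gibberish at `temperature = 2.0`.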


## Top-k Sampling

In [50]:
output_topk = model.generate(input_ids, max_length = max_length,
                             do_sample = True, top_k = 50)

print(tokenizer.decode(output_topk[0]))

In a shocking finding, scientists discovered a heard of unicorns living in a 

remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the 

researchers was the fact that the unicorns spoke perfect english. 


A few dozen unicorns lived in the "Cumar Valley" during the 1980s and 1990s... according to The

Cumar Valley. And with only a few hundreds living at the site, there was little in terms of

population. Yet, they could speak English quite well: A third of all living in the
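
Top-k sampling restricts each step to the `k` most probable tokens and samples from that renormalized set. The sketch below shows only the filtering step on a made-up distribution; `top_k_filter` is a hypothetical helper, not the library's implementation.

```python
import math

def top_k_filter(probs, k):
    # Indices of the k most probable tokens
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the kept probabilities sum to 1
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
filtered = top_k_filter(probs, k=2)
print({i: round(p, 3) for i, p in filtered.items()})
```

Cutting off the long tail avoids the very unlikely tokens that derail high-temperature sampling, but a fixed `k` ignores the shape of the distribution: it may keep poor candidates when the model is confident and discard reasonable ones when it is not.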


## Top-p (Nucleus) Sampling

In [51]:
output_top = model.generate(input_ids, max_length = max_length,
                             do_sample = True, top_p = 0.9)

print(tokenizer.decode(output_top[0]))

In a shocking finding, scientists discovered a heard of unicorns living in a 

remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the 

researchers was the fact that the unicorns spoke perfect english. 


The unicorns are not only in the Andes Mountains but also in Mexico, Central America and Argentina as well. 


Many studies of these unicorns have confirmed their existence.

One study found that unicorns can detect their parents and thus learn language. 

Another study has discovered that the unicorns may learn from a person.
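
Top-p sampling keeps the smallest set of tokens whose cumulative probability reaches `top_p`, so the number of candidates adapts to how confident the model is. A minimal sketch of the filtering step, with made-up distributions and a hypothetical helper `top_p_filter`:

```python
def top_p_filter(probs, p):
    # Visit tokens from most to least probable
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # the nucleus now covers at least p of the probability mass
    # Renormalize so the kept probabilities sum to 1
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

peaked = [0.92, 0.05, 0.02, 0.01]  # hypothetical: model is confident
flat = [0.3, 0.25, 0.25, 0.2]      # hypothetical: model is uncertain
print(len(top_p_filter(peaked, 0.9)), len(top_p_filter(flat, 0.9)))  # 1 4
```

With `p = 0.9`, the peaked distribution keeps a single token while the flat one keeps all four. A fixed `top_k` cannot adapt in this way.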
