# Exercise: Generating one token at a time

In this exercise, we will see how an LLM generates text: one token at a time, using the previous tokens to predict the next one.

## Step 1. Load a tokenizer and a model

First we load a tokenizer and a model from Hugging Face's `transformers` library. A tokenizer splits a string into tokens and maps each token to an integer ID that the model can understand.

In this exercise, all the code will be written for you. All you need to do is follow along!

In [37]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [38]:
# To load a pretrained model and a tokenizer using Hugging Face, we only need two lines of code.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")



In [39]:
# We create a partial sentence and tokenize it.
text = "Once upon a time, generative models"

inputs = tokenizer(text, return_tensors="pt")

In [40]:
# Show the tokens as numbers, i.e. "input_ids"
print(inputs["input_ids"])

tensor([[7454, 2402,  257,  640,   11, 1152,  876, 4981]])


In [41]:
inputs

{'input_ids': tensor([[7454, 2402,  257,  640,   11, 1152,  876, 4981]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}

## Step 2. Examine the tokenization

Let's explore what these tokens mean!

In [42]:
# Show how the sentence is tokenized
import pandas as pd



In [43]:
def show_tokenization(inputs):
    # One row per token: the token id and the text it decodes to.
    return pd.DataFrame(
        [(token_id, tokenizer.decode(token_id)) for token_id in inputs["input_ids"][0]],
        columns=["id", "token"],
    )
    
    

In [44]:
show_tokenization(inputs)

|   | id | token |
|---|---|---|
| 0 | tensor(7454) | Once |
| 1 | tensor(2402) | upon |
| 2 | tensor(257) | a |
| 3 | tensor(640) | time |
| 4 | tensor(11) | , |
| 5 | tensor(1152) | gener |
| 6 | tensor(876) | ative |
| 7 | tensor(4981) | models |


### Subword tokenization

Notice that the tokens here are neither single letters nor whole words. Short, common words such as "Once" and "time" are each a single token, while "generative" is split into two tokens, "gener" and "ative". A token can even be a single character, such as the comma. This is called subword tokenization.
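
To make the idea of splitting a word into known pieces concrete, here is a minimal sketch. It is not how GPT-2 tokenizes: GPT-2 uses byte-pair encoding learned from data, whereas this sketch uses a simpler greedy longest-match rule. The vocabulary `TOY_VOCAB`, its ids, and the function `toy_tokenize` are hypothetical and exist only for illustration.

```python
# Hypothetical toy vocabulary; the ids are made up and are NOT GPT-2's ids.
TOY_VOCAB = {"gener": 0, "ative": 1, "model": 2, "s": 3, "a": 4, "t": 5, "v": 6}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split `word` into vocabulary pieces using greedy longest-match, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, then shrink until one is in the vocabulary.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No piece matched, so this toy vocabulary cannot cover the word.
            raise ValueError(f"cannot tokenize {word[start:]!r} with this vocabulary")
    return pieces


pieces = toy_tokenize("generative")
ids = [TOY_VOCAB[piece] for piece in pieces]
print(pieces, ids)
```

With this toy vocabulary, a word that is not stored whole falls back to smaller pieces, down to single letters, which mirrors the behaviour seen in the table above.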

## Step 3. Calculate the probability of the next token

Now let's use PyTorch to calculate the probability of the next token given the previous ones.

In [45]:
# Calculate the probabilities of every possible next token. In the following cells we
# show the top 5 choices and the corresponding words or subwords for these tokens.

import torch 

with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]
    probabilities = torch.nn.functional.softmax(logits[0], dim=-1)
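
The softmax step above turns raw scores (logits) into probabilities using p_i = exp(z_i) / sum_j exp(z_j). As a small worked example, here is a plain-Python sketch of the same formula. The function name `softmax` and the three logits are made up for illustration; the notebook itself relies on `torch.nn.functional.softmax`.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing and does not change the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a three-token vocabulary (made-up numbers).
toy_logits = [2.0, 1.0, 0.1]
toy_probs = softmax(toy_logits)
print([round(p, 3) for p in toy_probs])
```

A larger logit always yields a larger probability, so taking the argmax of the probabilities picks the same token as taking the argmax of the logits.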

In [46]:
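def show_next_token_choices(probabilities, top_n=5):
    # Build one row per token, then keep the top_n rows with the highest probability.
    return pd.DataFrame(
        [
            (token_id, tokenizer.decode(token_id), p.item())
            for token_id, p in enumerate(probabilities)
            if p.item()  # skip tokens whose probability is exactly zero
        ],
        columns=["id", "token", "p"],
    ).sort_values("p", ascending=False)[:top_n]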

In [47]:
show_next_token_choices(probabilities)

|   | id | token | p |
|---|---|---|---|
| 286 | 286 | of | 0.194091 |
| 547 | 547 | were | 0.179078 |
| 389 | 389 | are | 0.052251 |
| 550 | 550 | had | 0.031235 |
| 423 | 423 | have | 0.031007 |


In [48]:
# Obtain the token id for the most probable next token
next_token_id = torch.argmax(probabilities).item()

In [49]:
print(f"Next token id: {next_token_id}")
print(f"Next token: {tokenizer.decode(next_token_id)}")

Next token id: 286
Next token:  of


In [50]:
# We append the most likely token to the text.
text += tokenizer.decode(next_token_id)
text

'Once upon a time, generative models of'

## Step 4. Generate some more tokens

The following cell will take `text`, show the most probable tokens to follow, and append the most likely token to text. Run the cell over and over to see it in action!
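
Running the cell repeatedly amounts to a loop: predict, pick the most probable token, append, repeat. This is often called greedy decoding. The sketch below shows only the shape of that loop, assuming a stand-in for the model. `TOY_NEXT`, `toy_next_token_probs`, and `greedy_generate` are hypothetical names, and the probabilities are made up rather than taken from GPT-2.

```python
# Hypothetical stand-in for a language model: maps the last token to made-up
# probabilities for the next token. A real model conditions on the whole
# sequence, not only on the last token.
TOY_NEXT = {
    "models": {"of": 0.6, "were": 0.4},
    "of": {"the": 0.7, "a": 0.3},
    "the": {"world": 0.9, "of": 0.1},
}


def toy_next_token_probs(tokens):
    """Return {token: probability} for the token that follows `tokens`."""
    return TOY_NEXT.get(tokens[-1], {"<end>": 1.0})


def greedy_generate(tokens, next_token_probs, steps):
    """Append the most probable next token `steps` times."""
    tokens = list(tokens)  # copy so the caller's list is not modified
    for _ in range(steps):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)  # argmax over the candidate tokens
        tokens.append(best)
    return tokens


result = greedy_generate(["generative", "models"], toy_next_token_probs, steps=3)
print(result)
```

Because `greedy_generate` takes the prediction function as an argument, the same loop could in principle wrap a real model by passing a function that tokenizes, runs the model, and returns the next-token probabilities.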

In [51]:
# Press ctrl + enter to run the code in a cell
from IPython.display import display, Markdown 

# Show the text
print(text)

Once upon a time, generative models of


In [None]:
# Convert to tokens
inputs = tokenizer(text, return_tensors="pt")