# Exercise: Generating one token at a time

In this exercise, we will see how an LLM generates text: one token at a time, using the previous tokens to predict the next one.

## Step 1. Load a tokenizer and a model

First we load a tokenizer and a model from Hugging Face's transformers library. A tokenizer splits a string into pieces called tokens and maps each one to an integer ID that the model can process.

In this exercise, all the code will be written for you. All you need to do is follow along!

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
# To load a pretrained model and a tokenizer using Hugging Face, we only need two lines of code.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

In [3]:
# We create a partial sentence and tokenize it.

inputs = tokenizer("I love Generative AI", return_tensors="pt")

In [4]:
# Show the tokens as numbers, i.e. "input_ids"
print(inputs["input_ids"])

tensor([[  40, 1842, 2980,  876, 9552]])


In [5]:
inputs

{'input_ids': tensor([[  40, 1842, 2980,  876, 9552]]), 'attention_mask': tensor([[1, 1, 1, 1, 1]])}

## Step 2. Examine the tokenization

Let's explore what these tokens mean!

In [6]:
# Show how the sentence is tokenized
import pandas as pd



In [7]:
def show_tokenization(inputs):
    # Pair each token ID with the text it decodes to.
    return pd.DataFrame(
        [(token_id, tokenizer.decode(token_id)) for token_id in inputs["input_ids"][0]],
        columns=["id", "token"],
    )

In [8]:
show_tokenization(inputs)

|   | id | token |
|---|---|---|
| 0 | tensor(40) | I |
| 1 | tensor(1842) | love |
| 2 | tensor(2980) | Gener |
| 3 | tensor(876) | ative |
| 4 | tensor(9552) | AI |


### Subword tokenization

Notice that the tokens here are neither individual letters nor whole words. Short, common words such as "love" map to a single token, while a longer word such as "Generative" is split into the pieces "Gener" and "ative"; a token can even be a single letter. This is called subword tokenization.
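
To make the idea concrete, here is a minimal, library-free sketch of splitting words into subword pieces. It is an illustration only: the vocabulary `TOY_VOCAB` and the helpers `split_word` and `toy_tokenize` are hypothetical, and the greedy longest-match rule used here is a simplification, not the byte-pair encoding that the GPT-2 tokenizer actually applies.

```python
# Toy vocabulary (hypothetical): NOT GPT-2's real vocabulary or IDs.
TOY_VOCAB = {
    "I": 0, "love": 1, "Gener": 2, "ative": 3, "AI": 4,
    "G": 5, "e": 6, "n": 7,
}


def split_word(word, vocab):
    """Greedily split `word` into the longest pieces found in `vocab`."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # No piece of any length matched at this position.
            raise ValueError(f"no vocabulary entry covers {word[start:]!r}")
    return pieces


def toy_tokenize(text, vocab):
    """Split on whitespace, then split each word into subword pieces."""
    pieces = [p for word in text.split() for p in split_word(word, vocab)]
    return pieces, [vocab[p] for p in pieces]


pieces, ids = toy_tokenize("I love Generative AI", TOY_VOCAB)
print(pieces)
print(ids)
```

With this toy vocabulary, "love" stays whole because it has its own entry, "Generative" has none and falls back to two shorter pieces, and a word such as "Gen" would fall all the way back to single letters.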

## Step 3. Calculate the probability of the next token

Now let's use PyTorch to calculate the probability of the next token given the previous ones.
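
Before touching the real model, it may help to see the arithmetic on a tiny scale. A language model produces one raw score (a logit) per vocabulary entry, and softmax turns those scores into probabilities that sum to 1. The sketch below uses only the Python standard library rather than PyTorch, and the four candidate tokens and their scores in `toy_logits` are made up for illustration; they are not GPT-2 outputs.

```python
import math

# Hypothetical raw scores (logits) for four candidate next tokens.
toy_logits = {" models": 2.0, " art": 1.0, " and": 0.5, ".": -1.0}


def softmax(scores):
    """Convert a dict of raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the maximum avoids overflow in exp() without changing the result.
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


probs = softmax(toy_logits)

# Print the candidates from most to least likely.
for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tok!r}: {p:.3f}")

# The single most likely next token under these toy scores.
most_likely = max(probs, key=probs.get)
```

The real model does the same thing over its entire vocabulary rather than four candidates.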