## Validation of GPT-2 trained on TinyStories

Load the model and tokenizer.
The model checkpoint produced by the training script `gpt2_tinystories.py` is `gpt2-tinystories-final`.


In [1]:
import torch
from transformers import (
    GPT2LMHeadModel,
    GPT2TokenizerFast,
)

# Load tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token

# Load model and move to GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = GPT2LMHeadModel.from_pretrained('gpt2-tinystories-fr-scratch-final')
model.to(device)
model.eval()

print("Model loaded on: ", device)

Model loaded on:  cuda


### Generate sample stories given prompts

In [2]:
# Text generation helper
def generate_text(prompt, max_length=100):
    """Sample a continuation of `prompt`; max_length counts the prompt tokens too."""
    # The tokenizer output (input_ids and attention_mask) is moved to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=max_length,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.7
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Test the model
test_prompts = [
    "Once upon a time",
    "The little dog",
    "In the garden"
]

print("\nTesting model generation:")
for prompt in test_prompts:
    print(f"\nPrompt: {prompt}")
    print(f"Generated text: {generate_text(prompt)}")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Testing model generation:

Prompt: Once upon a time




Generated text: Once upon a time, there was a little girl named Lily. She loved to play in the park with her friends. One day, Lily's mom took her to the store to buy some milk. While they were there, a big dog came running towards them. Lily got scared and started to cry. 

Her mom told her not to worry and that they would be safe in their house. They bought some ice cream and went back to their car. When they got home, they sat

Prompt: The little dog




Generated text: The little dog was walking down the street. He was so excited! He had never been so close to the playground before.

As he walked, he saw a tall slide. It looked so fun! The dog ran to it and started to climb. His little legs took him up the slide and he felt like he was flying!
 
Suddenly, the little boy saw the dog and shouted, "Hey, that's my slide! Give it back!"
 The little kid was angry

Prompt: In the garden
Generated text: In the garden, there was a big, red wagon. It was so big that it could carry lots of things. Little Jane was playing with her toys when she saw something shiny in the corner. She went over to take a look and saw it was an onion.

Jane was very excited and she said to her mom, "Mommy, look! An onion!" Her mom smiled and said, “That’s an important onion, Jane. Let
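
The `Setting pad_token_id to eos_token_id` message in the output appears because `generate` is called without an explicit `pad_token_id`, and the stories stop mid-sentence because `max_length=100` counts prompt tokens as well as generated ones. A minimal sketch of one way to address both, assuming the same sampling settings as `generate_text`; the helper name `sampling_kwargs` is hypothetical, while `max_new_tokens` and `pad_token_id` are standard `generate` arguments:

```python
def sampling_kwargs(eos_token_id, max_new_tokens=100):
    """Hypothetical helper: the sampling settings from generate_text as a dict."""
    return dict(
        max_new_tokens=max_new_tokens,  # budget for generated tokens only
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.7,
        pad_token_id=eos_token_id,      # explicit, so generate() need not fall back
    )

# Usage (assumes `model`, `tokenizer`, and a tokenized `inputs` from the cells above):
# outputs = model.generate(**inputs, **sampling_kwargs(tokenizer.eos_token_id))
```

Keeping the settings in one place also makes it easier to compare samples across different temperatures or `top_p` values.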


### Use an interactive demo

This cell asks for the beginning of a story and generates a continuation of up to 100 tokens, prompt included.
Type `exit` to quit the demo.


In [3]:
# make an interactive prompt that asks for user input
def interactive_prompt():
    while True:
        prompt = input("\nEnter a prompt: ")
        if prompt.lower() == "exit":
            break
        print(f"Generated text: {generate_text(prompt)}")
        
interactive_prompt()

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated text: At the river, there was a little girl named Lily. She was three years old. One day, she was walking near the edge of the lake when she saw something incredible. It was an enormous boat!

Lily was so excited! She wanted to get closer and take a closer look. So, with one big push, Lily started to pull the boat closer to the shore.
 
When she reached the beach, the sun was shining so brightly that it made the water spark


### Evaluate the model on a validation set

Perplexity on the TinyStories validation split is the evaluation metric.
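
For reference, perplexity over $N$ scored tokens is the exponential of the average negative log-likelihood per token, which is what `math.exp(avg_loss)` computes when `avg_loss` is a per-token mean in nats. The symbols below are notation introduced here, not taken from the notebook:

$$
\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i})\right)
$$

A model that guesses uniformly over a vocabulary of size $V$ has $\mathrm{PPL} = V$, that is, a loss of $\ln V$.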


In [4]:
import math
from transformers import DataCollatorForLanguageModeling
from torch.utils.data import DataLoader
from datasets import load_dataset
from IPython.display import display
import ipywidgets as widgets


dataset = load_dataset("roneneldan/TinyStories")

def tokenize_function(examples):
    """Tokenize dataset examples."""
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding="max_length"
    )

# Tokenize the validation split
tokenized_test = dataset["validation"].map(
    tokenize_function,
    batched=True,
    remove_columns=dataset["validation"].column_names,
    num_proc=4  # Parallel processing
)

# Create data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# Create dataloader
test_dataloader = DataLoader(
    tokenized_test,
    batch_size=8,
    collate_fn=data_collator,
    shuffle=False
)

# Initialize metrics
total_loss = 0
total_tokens = 0


device = model.device

progress_bar = widgets.IntProgress(min=0, max=len(test_dataloader), description='Evaluating')
display(progress_bar)


# Evaluation loop
with torch.no_grad():
    for i, batch in enumerate(test_dataloader):
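        # Prepare input
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        # Labels are a plain copy of input_ids (the collator's own `labels`
        # are not used), so padded positions are scored like real tokens
        labels = input_ids.clone()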
        
        # Forward pass
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )
        
        # Mean loss over every scored position in the batch
        loss = outputs.loss
        
        # Count tokens that differ from the pad token (EOS doubles as the pad token)
        non_pad_mask = labels.ne(tokenizer.pad_token_id)
        num_tokens = non_pad_mask.sum().item()
        
        # Weight the batch loss by that count and accumulate
        total_loss += loss.item() * num_tokens
        total_tokens += num_tokens
        
        progress_bar.value += 1

# Calculate perplexity
avg_loss = total_loss / total_tokens
perplexity = math.exp(avg_loss)

print(f"\nResults on TinyStories validation set:")
print(f"Average Loss: {avg_loss:.4f}")
print(f"Perplexity: {perplexity:.4f}")
print(f"Total tokens evaluated: {total_tokens}")

Map (num_proc=4):   0%|          | 0/21990 [00:00<?, ? examples/s]

IntProgress(value=0, description='Evaluating', max=2749)


Results on TinyStories validation set:
Average Loss: 14.6298
Perplexity: 2257549.5746
Total tokens evaluated: 4675638
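
The average loss reported above, 14.63 nats, is higher than ln(50257) ≈ 10.82, the loss of a uniform guess over GPT-2's 50,257-token vocabulary (assuming this checkpoint keeps that vocabulary size). One likely contributor is that the loop builds `labels` directly from `input_ids`, so the padded tail of each 512-token row is scored. The sketch below shows one way to exclude padding, using the attention mask and the label value `-100` that the Hugging Face language-modeling loss ignores. `mask_padding`, `count_scored_tokens`, and the toy batch are hypothetical names and data for illustration, not part of the notebook.

```python
import torch

IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss

def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    labels = input_ids.clone()
    labels[attention_mask == 0] = IGNORE_INDEX
    return labels

def count_scored_tokens(labels):
    """Positions that contribute to the loss after the one-token label shift."""
    return int((labels[:, 1:] != IGNORE_INDEX).sum())

# Toy batch: 50256 stands in for padding, as in the notebook.
input_ids = torch.tensor([[10, 11, 12, 50256, 50256],
                          [20, 21, 50256, 50256, 50256]])
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 0, 0, 0]])

labels = mask_padding(input_ids, attention_mask)
num_tokens = count_scored_tokens(labels)
```

In the evaluation loop, `labels = mask_padding(input_ids, attention_mask)` and `num_tokens = count_scored_tokens(labels)` would make `outputs.loss` a mean over real tokens only, weighted by the same count that was used to compute it. The figures above would need to be recomputed after such a change.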


In [5]:
def calculate_perplexity(text, model, tokenizer, device="cuda"):
    """Return exp(mean next-token loss) of `text` under `model`."""
    # Encode text
    encodings = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    
    # Move to device
    input_ids = encodings["input_ids"].to(device)
    
    # Calculate perplexity
    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
        loss = outputs.loss
    
    return math.exp(loss.item())

# Test the model
texts = [
    "Once upon a time",
    "The little dog",
    "In the garden"
]

# Calculate perplexity for each text
for text in texts:
    perplexity = calculate_perplexity(text, model, tokenizer, device)
    print(f"\nText: {text}")
    print(f"Perplexity: {perplexity:.2f}")


Text: Once upon a time
Perplexity: 1.01

Text: The little dog
Perplexity: 38.56

Text: In the garden
Perplexity: 5.08


In [6]:
# Evaluation loop (second pass over the same dataloader)
total_loss = 0
total_perplexity = 0
counter = 0
with torch.no_grad():
    for i, batch in enumerate(test_dataloader):
        input_ids = batch['input_ids'].to(device)

        # No attention mask is passed and labels equal input_ids,
        # so padded positions are scored like real tokens
        outputs = model(input_ids, labels=input_ids)
        loss = outputs.loss
        total_loss += loss.item()
        total_perplexity += math.exp(loss.item())
        counter += 1

        progress_bar.value += 1

# Unweighted averages over batches; the perplexity here is the mean of
# per-batch exp(loss), which differs from exp(avg_loss)
avg_loss = total_loss / counter
perplexity = total_perplexity / counter

print(f"Average Loss: {avg_loss:.4f}")
print(f"Perplexity: {perplexity:.4f}")

Average Loss: 10.3397
Perplexity: 86240.4700
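
The two numbers above are not related by `exp`: exp(10.3397) is about 30,900, while the reported perplexity is 86,240.47. That gap is expected, because the cell averages per-batch perplexities, and by Jensen's inequality the mean of `exp(loss)` is never smaller than `exp` of the mean loss. A small sketch of the difference, using made-up batch losses and hypothetical helper names:

```python
import math

def perplexity_from_losses(losses, weights=None):
    """exp of the (optionally token-weighted) mean loss."""
    if weights is None:
        weights = [1] * len(losses)
    total = sum(loss * w for loss, w in zip(losses, weights))
    return math.exp(total / sum(weights))

def mean_of_batch_perplexities(losses):
    """Average of per-batch exp(loss), as computed in the cell above."""
    return sum(math.exp(loss) for loss in losses) / len(losses)

batch_losses = [2.0, 3.0, 4.0]  # made-up values for illustration

ppl_corpus = perplexity_from_losses(batch_losses)        # exp(3.0)
ppl_averaged = mean_of_batch_perplexities(batch_losses)  # larger than exp(3.0)
print(f"exp(mean loss): {ppl_corpus:.2f}")
print(f"mean exp(loss): {ppl_averaged:.2f}")
```

Passing per-batch token counts as `weights` gives the token-weighted form, which matches the usual corpus-level definition of perplexity.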
