## Validation of GPT-2 Trained on TinyStories

Load the model and tokenizer. The checkpoint `gpt2-tinystories-final` was produced by the training script `gpt2_tinystories.py`.


In [1]:
import torch
from transformers import (
    GPT2LMHeadModel,
    GPT2TokenizerFast,
)

# Load the GPT-2 tokenizer; GPT-2 has no pad token, so reuse EOS for padding
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token

# Load the trained model and move it to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GPT2LMHeadModel.from_pretrained('gpt2-tinystories-final')
model.to(device)

print("Model loaded on: ", device)

Model loaded on:  cuda


### Generate sample stories given prompts

In [2]:
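# Generate a story continuation for a prompt by sampling from the model
def generate_text(prompt, max_length=100):
    # Tokenize; .to() moves both input_ids and attention_mask to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=max_length,    # total length in tokens, prompt included
        num_return_sequences=1,
        no_repeat_ngram_size=2,   # never let any bigram occur twice
        do_sample=True,           # sample instead of greedy decoding
        top_k=50,                 # consider only the 50 most likely tokens
        top_p=0.95,               # nucleus sampling: smallest token set with 95% of the mass
        temperature=0.7           # < 1 sharpens the next-token distribution
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)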

# Test the model
test_prompts = [
    "Once upon a time",
    "The little dog",
    "In the garden"
]

print("\nTesting model generation:")
for prompt in test_prompts:
    print(f"\nPrompt: {prompt}")
    print(f"Generated text: {generate_text(prompt)}")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Testing model generation:

Prompt: Once upon a time


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated text: Once upon a time, there was a little girl named Lily. She loved to play outside in the park with her friends. One day, Lily and her friend Jack were playing hide-and-seek. Jack was hiding behind a big tree when he heard a loud noise. 

"What was that?" asked Jack.
   "I don't know," replied Lily, "but it sounds like someone's house is on fire." 

 Jack and Lily quickly ran to the nearby

Prompt: The little dog


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated text: The little dog was running around the yard and he saw a cable lying on the ground. He picked it up and started to play with it. Suddenly, he heard a loud noise. It was his mom calling him for lunch.

He ran to the kitchen to get something to eat. But when he got there, his dog's mom had already gone. She had left the cable on her desk. 
The dog didn't want to leave the cord alone. So he started running away

Prompt: In the garden
Generated text: In the garden, there was a little girl named Lucy. She had a toy garden with lots of flowers and vegetables. Every day, she would take her toy shovel and dig in the dirt.

One day Lucy found something very special. It was an old box with a mysterious lid. Lucy opened the lid and saw lots and lots inside. Inside were lots, but not all. There were many different kinds of vegetables, too. 
   Lucy picked up a few of the vegetables
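The sampling arguments passed to `model.generate` in `generate_text` reshape the next-token distribution before every draw. The sketch below reimplements that filtering for a single logits vector in plain PyTorch, only to show what each argument does. It is an illustration, not the transformers implementation: `ban_repeated_ngrams`, `filter_logits` and `sample_next_token` are hypothetical names, and the order in which the steps are applied here is an assumption.

```python
from typing import List, Optional

import torch


def ban_repeated_ngrams(logits: torch.Tensor, token_ids: List[int], n: int = 2) -> torch.Tensor:
    """Mask tokens that would repeat an n-gram already in token_ids (cf. no_repeat_ngram_size)."""
    if len(token_ids) < n - 1:
        return logits
    prefix = tuple(token_ids[len(token_ids) - (n - 1):])  # the last n-1 tokens
    banned = {
        token_ids[i + n - 1]
        for i in range(len(token_ids) - n + 1)
        if tuple(token_ids[i:i + n - 1]) == prefix
    }
    logits = logits.clone()
    if banned:
        logits[sorted(banned)] = float("-inf")
    return logits


def filter_logits(logits: torch.Tensor, temperature: float = 0.7,
                  top_k: int = 50, top_p: float = 0.95) -> torch.Tensor:
    """Temperature scaling, then top-k and top-p (nucleus) filtering of a 1-D logits vector.

    Filtered tokens get -inf, i.e. zero probability after softmax.
    """
    logits = logits / temperature  # temperature < 1 sharpens the distribution

    if top_k > 0:  # keep only the top_k highest-scoring tokens (ties are kept)
        kth_best = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))

    if top_p < 1.0:  # keep the smallest set of tokens whose probability mass reaches top_p
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        mass_before = probs.cumsum(dim=-1) - probs  # mass of strictly higher-ranked tokens
        drop_sorted = mass_before >= top_p
        drop = torch.zeros_like(drop_sorted)
        drop[sorted_idx] = drop_sorted  # map the decision back to vocabulary order
        logits = logits.masked_fill(drop, float("-inf"))
    return logits


def sample_next_token(logits: torch.Tensor, token_ids: List[int],
                      generator: Optional[torch.Generator] = None, **kwargs) -> int:
    """Ban repeated bigrams, filter, then draw one token id from the remaining distribution."""
    logits = ban_repeated_ngrams(logits, token_ids, n=2)
    probs = torch.softmax(filter_logits(logits, **kwargs), dim=-1)
    return int(torch.multinomial(probs, num_samples=1, generator=generator))
```

The defaults mirror the values used in `generate_text`. For example, with `top_k=2` only the two highest-scoring tokens keep finite logits, and with the history `[1, 2, 3, 1]` and `n=2` token `2` is banned, because the bigram `(1, 2)` has already occurred.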


### Use an interactive demo

This cell prompts the user for the opening of a story and generates the rest with `generate_text`.
Type `exit` to quit the demo.

In [3]:
# Interactive loop: read a prompt, print the generated story, stop on "exit"
def interactive_prompt():
    while True:
        prompt = input("\nEnter a prompt: ")
        if prompt.strip().lower() == "exit":
            break
        print(f"Generated text: {generate_text(prompt)}")

interactive_prompt()

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated text: In the middle of the road was a beautiful beach. The sand was soft and the sun shone down on it.

Two children, Max and Lily, were playing in the sand. They ran and laughed as they built castles, dug tunnels and even built a sandcastle. 
Max had a bucket of water and he was pouring it on the ground. Lily laughed and said "It's so funny!"
  Max was playing with his bucket and having a great time. He was having


### Evaluate the model on a validation set

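Perplexity, the exponential of the mean per-token cross-entropy loss, is used as the evaluation metric; lower is better. A perplexity of k means the model is, on average, as uncertain as a uniform choice among k tokens.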
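For a tokenized text $x_1, \dots, x_N$, every token after the first is predicted from the tokens before it, so perplexity averages the negative log-likelihood over those $N-1$ predictions. When a Hugging Face causal LM is called with `labels`, `outputs.loss` is exactly this average (taken over the positions whose label is not `-100`), which is why `math.exp(loss)` gives the perplexity. The second line is Jensen's inequality for per-batch mean losses $\ell_1, \dots, \ell_B$: averaging per-batch perplexities can only give a value at least as large as exponentiating the average loss.

$$
\mathrm{PPL} = \exp\left(-\frac{1}{N-1}\sum_{i=2}^{N}\log p_\theta\left(x_i \mid x_1, \dots, x_{i-1}\right)\right)
$$

$$
\frac{1}{B}\sum_{b=1}^{B} e^{\ell_b} \;\ge\; \exp\left(\frac{1}{B}\sum_{b=1}^{B}\ell_b\right)
$$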

In [4]:
import math
from transformers import DataCollatorForLanguageModeling
from torch.utils.data import DataLoader
from datasets import load_dataset
from IPython.display import display
import ipywidgets as widgets


dataset = load_dataset("roneneldan/TinyStories")

def tokenize_function(examples):
    """Tokenize dataset examples."""
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding="max_length"
    )

# Tokenize the validation split, used here as the held-out test set
tokenized_test = dataset["validation"].map(
    tokenize_function,
    batched=True,
    remove_columns=dataset["validation"].column_names
)

# Causal-LM collator (mlm=False): labels are a copy of input_ids with pad positions set to -100
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# Create dataloader
test_dataloader = DataLoader(
    tokenized_test,
    batch_size=8,
    collate_fn=data_collator
)

def compute_metrics(eval_pred):
    """Compute perplexity from (logits, labels) NumPy arrays, Trainer-style.

    Not called by the evaluation loop below.
    """
    logits, labels = eval_pred
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    
    loss = torch.nn.functional.cross_entropy(
        torch.from_numpy(shift_logits.reshape(-1, shift_logits.shape[-1])),
        torch.from_numpy(shift_labels.reshape(-1))
    )
    
    try:
        perplexity = math.exp(loss.item())
    except OverflowError:
        perplexity = float("inf")
    
    return {"perplexity": perplexity}

###################

device = model.device
# Evaluation loop
total_loss = 0
total_perplexity = 0
counter = 0
progress_bar = widgets.IntProgress(min=0, max=len(test_dataloader), description='Evaluating')
display(progress_bar)

with torch.no_grad():
    for batch in test_dataloader:
        input_ids = batch['input_ids'].to(device)
        # labels=input_ids (rather than batch['labels']) and no attention_mask:
        # the loss therefore also covers the EOS padding up to max_length=512
        outputs = model(input_ids, labels=input_ids)
        loss = outputs.loss
        total_loss += loss.item()
        total_perplexity += math.exp(loss.item())  # perplexity of this batch
        counter += 1
        progress_bar.value += 1

# Mean of per-batch losses, and mean of per-batch perplexities
# (this is not exp(avg_loss), the usual corpus-level perplexity)
avg_loss = total_loss / counter
perplexity = total_perplexity / counter

print(f"Average Loss: {avg_loss:.4f}")
print(f"Perplexity: {perplexity:.4f}")


Average Loss: 10.3397
Perplexity: 86240.4700
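In the evaluation cell above, `labels=input_ids` makes every position count toward the loss, including the EOS tokens that pad each story to 512 tokens, and the reported perplexity is a mean of per-batch perplexities. If training masked padding positions (as `DataCollatorForLanguageModeling(mlm=False)` does), the model was never trained to predict EOS after EOS, which is one plausible but unverified reason for the very large value; the per-batch averaging can only inflate it further. Below is a minimal sketch of a token-weighted alternative, assuming labels in which padding positions are `-100`, as the collator already produces in `batch["labels"]`. `batch_nll` and `corpus_perplexity` are hypothetical helper names.

```python
import math

import torch
import torch.nn.functional as F


def batch_nll(logits: torch.Tensor, labels: torch.Tensor, ignore_index: int = -100):
    """Return (summed NLL, number of predicted tokens) for one batch.

    logits: (batch, seq_len, vocab); labels: (batch, seq_len), with ignore_index
    at positions such as padding that must not count. The logits at position t
    predict the token at t + 1, so both are shifted by one.
    """
    shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    shift_labels = labels[:, 1:].reshape(-1)
    nll_sum = F.cross_entropy(
        shift_logits.float(), shift_labels, ignore_index=ignore_index, reduction="sum"
    )
    n_tokens = int((shift_labels != ignore_index).sum())
    return nll_sum.item(), n_tokens


def corpus_perplexity(batches) -> float:
    """Token-weighted perplexity: exp(total NLL / total predicted tokens)."""
    total_nll, total_tokens = 0.0, 0
    for logits, labels in batches:
        nll, n = batch_nll(logits, labels)
        total_nll += nll
        total_tokens += n
    return math.exp(total_nll / total_tokens)


# Toy check: uniform logits over 2 and over 8 classes, 4 predicted tokens each.
# Per-batch perplexities are 2 and 8 (mean 5), but the token-weighted value is 4.
toy_batches = [
    (torch.zeros(1, 5, 2), torch.zeros(1, 5, dtype=torch.long)),
    (torch.zeros(1, 5, 8), torch.zeros(1, 5, dtype=torch.long)),
]
print(corpus_perplexity(toy_batches))  # ≈ 4.0
```

In the notebook's loop this would replace the `labels=input_ids` call: run `model(input_ids, attention_mask=attention_mask)`, then accumulate `batch_nll(outputs.logits, batch["labels"].to(device))` across batches and exponentiate the token-weighted mean at the end.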


In [5]:
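def calculate_perplexity(text, model, tokenizer, device=None):
    """Perplexity of one text: exp of the model's mean next-token cross-entropy."""
    device = device or model.device
    # Encode text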
    encodings = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    
    # Move to device
    input_ids = encodings["input_ids"].to(device)
    
    # Calculate perplexity
    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
        loss = outputs.loss
    
    return math.exp(loss.item())

# Test the model
texts = [
    "Once upon a time",
    "The little dog",
    "In the garden"
]

# Calculate perplexity for each text
for text in texts:
    perplexity = calculate_perplexity(text, model, tokenizer, device)
    print(f"\nText: {text}")
    print(f"Perplexity: {perplexity:.2f}")


Text: Once upon a time
Perplexity: 1.01

Text: The little dog
Perplexity: 38.56

Text: In the garden
Perplexity: 5.08
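`calculate_perplexity` averages the loss over only the few tokens of each prompt after the first, so a single hard-to-predict word can dominate the score. That is one plausible reading, not verified here, of why "The little dog" scores 38.56 while "Once upon a time" scores 1.01. To see which tokens drive a score, per-token losses can be read off the logits directly. A minimal sketch; `per_token_nll` is a hypothetical helper, and it assumes the standard causal-LM shift in which the logits at position i predict token i + 1:

```python
import torch
import torch.nn.functional as F


def per_token_nll(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Per-token negative log-likelihood for one sequence.

    logits: (seq_len, vocab); input_ids: (seq_len,). Returns a (seq_len - 1,)
    tensor whose entry i is -log p(input_ids[i + 1] | input_ids[: i + 1]).
    """
    log_probs = F.log_softmax(logits[:-1].float(), dim=-1)  # position i predicts token i + 1
    targets = input_ids[1:].unsqueeze(-1)
    return -log_probs.gather(-1, targets).squeeze(-1)
```

With the notebook's model, `per_token_nll(model(input_ids).logits[0], input_ids[0])` (under `torch.no_grad()`) gives one loss per predicted token, `tokenizer.convert_ids_to_tokens(input_ids[0, 1:].tolist())` labels them, and `math.exp` of their mean should match the printed perplexity up to floating-point error.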
