# Interactive Generation with Your Pretrained GPT

Use this notebook to load a pretrained (or SFT) GPT checkpoint from this repo and generate text.

- Adjust the checkpoint path and model hyperparameters to match your training run.
- The tokenizer is set up with the special tokens used in this project.
- Generation is simple and fast; for longer outputs, increase `max_new_tokens`, and for more varied outputs, raise `temperature`.

Tip: If you fine-tuned the model (SFT), you can point `checkpoint_path` to an SFT checkpoint instead.


In [None]:
# Setup: imports and utility paths
import os
import torch
from transformers import AutoTokenizer

import gpt  # uses this repo's GPT implementation

# Prefer GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")


Using device: cuda:7


In [38]:
# Configuration: set your checkpoint path and model config here
# NOTE: Adjust to match the model you trained.

# Example default paths (edit as needed)
# - Pretraining outputs typically saved by pretrain_gpt.py
# - SFT outputs typically saved by sft_gpt.py
# Change the path below to point at your own checkpoint file.
checkpoint_path = '/shared/0/projects/teaching/eecs595/models/pico-gpt/pretrained-models/gpt.1B-2-epoch.rope.model.pth'

# Model architecture used during training
config = {
    "vocab_size": None,          # will be computed from tokenizer below
    "context_length": 1024,
    "emb_dim": 512,
    "n_heads": 8,
    "n_layers": 12,
    "drop_rate": 0.0,
    "qkv_bias": False,
}

# Generation defaults (you can tweak later)
max_new_tokens = 128
temperature = 0.8
print(checkpoint_path)

/shared/0/projects/teaching/eecs595/models/pico-gpt/pretrained-models/gpt.1B-2-epoch.rope.model.pth


In [39]:
# Tokenizer: use the repo's setup so the special tokens match training.
# `gpt.setup_tokenizer()` prints a short before/after report as it adds them.

tokenizer = gpt.setup_tokenizer()

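# Compute the actual vocab size, including the special tokens used during training.
# The setup report still shows 50257 after the tokens are added, so derive the size
# from the highest special-token ID instead.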
special_tokens = ["<|user|>", "<|assistant|>", "<|end|>", "<|system|>", "<|pad|>"]
max_token_id = max(tokenizer.convert_tokens_to_ids(tok) for tok in special_tokens)
actual_vocab_size = max_token_id + 1
print(f"Tokenizer vocab size detected: {actual_vocab_size}")

if config["vocab_size"] is None:
    config["vocab_size"] = actual_vocab_size


Original tokenizer:
Tokens: [15496, 11, 616, 1438, 318, 6035, 1763, 741]
Decoded: Hello, my name is Dan Pressel
Original vocab size: 50257
=== Adding Special Tokens ===
After adding special tokens:
New vocab size: 50257
Special tokens: {'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|pad|>', 'additional_special_tokens': ['<|system|>', '<|user|>', '<|assistant|>', '<|end|>']}
Tokens with special tokens: [15496, 11, 616, 1438, 318, 6035, 1763, 741]
Decoded: Hello, my name is Dan Pressel
Tokenizer vocab size detected: 50262


In [40]:
# Load model and checkpoint

# Create the model on the CPU first to avoid GPU OOM on load
model = gpt.GPTModel(config)

print(checkpoint_path)
# Load checkpoint state dict (map to CPU, then move)
state_dict = torch.load(checkpoint_path, map_location='cpu')

# Backward compatibility: some checkpoints were saved with a misspelled 'embbeding.' prefix
if 'embbeding.token_embeddings.weight' in state_dict:
    state_dict = {k.replace('embbeding.', 'embedding.'): v for k, v in state_dict.items()}

# Validate vocab size compatibility with checkpoint
original_vocab_size = state_dict['embedding.token_embeddings.weight'].shape[0]
if original_vocab_size != config['vocab_size']:
    raise ValueError(f"Vocabulary size mismatch: checkpoint={original_vocab_size}, expected={config['vocab_size']}")

missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"Loaded state_dict. Missing keys: {len(missing)}, Unexpected keys: {len(unexpected)}")

# Move model to device
model = model.to(device)
model.eval()

print("Model loaded successfully!")


/shared/0/projects/teaching/eecs595/models/pico-gpt/pretrained-models/gpt.1B-2-epoch.rope.model.pth
Loaded state_dict. Missing keys: 0, Unexpected keys: 0
Model loaded successfully!
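
The vocabulary check in the cell above reads the row count of the token embedding matrix. Assuming that tensor has the usual `(vocab_size, emb_dim)` layout, the same tensor can also be used to sanity-check `emb_dim` before loading. The helper name `check_embedding_config` is hypothetical and not part of this repo; treat this as a minimal sketch rather than the repo's own validation.

```python
def check_embedding_config(state_dict, config,
                           key="embedding.token_embeddings.weight"):
    """Return a list of mismatches between `config` and the embedding shape.

    Assumption: the embedding weight is laid out as (vocab_size, emb_dim).
    Works with any object exposing a 2-element `.shape`, e.g. a torch.Tensor.
    """
    vocab_size, emb_dim = state_dict[key].shape
    problems = []
    # A vocab_size of None means "not set yet", so it is not a mismatch.
    if config.get("vocab_size") not in (None, vocab_size):
        problems.append(
            f"vocab_size: checkpoint={vocab_size}, config={config['vocab_size']}"
        )
    if config.get("emb_dim") != emb_dim:
        problems.append(
            f"emb_dim: checkpoint={emb_dim}, config={config.get('emb_dim')}"
        )
    return problems
```

Calling `check_embedding_config(state_dict, config)` returns an empty list when both values agree with the checkpoint. Other settings such as `n_heads` and `n_layers` cannot be recovered from this one tensor and still have to match the training run.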


In [None]:
# Simple generation helper using repo's functions

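def generate(prompt: str, max_new_tokens: int = None, temperature: float = None):
    """Return `prompt` plus a sampled continuation; unset arguments fall back to the notebook-level defaults."""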
    max_tokens = max_new_tokens if max_new_tokens is not None else globals().get('max_new_tokens', 128)
    temp = temperature if temperature is not None else globals().get('temperature', 1.0)
    with torch.no_grad():
        model_device = next(model.parameters()).device
        encoded = tokenizer.encode(prompt)
        input_ids = torch.tensor(encoded, dtype=torch.long, device=model_device).unsqueeze(0)
        out_ids = gpt.generate_new_tokens(
            model=model,
            idx=input_ids,
            max_new_tokens=max_tokens,
            context_size=config['context_length'],
            temperature=temp,
        )
        # Move to CPU for decoding only
        out_ids = out_ids.to('cpu')
        return tokenizer.decode(out_ids.squeeze(0).tolist())

# Quick smoke test
print(generate("Hello, my name is", max_new_tokens=32, temperature=0.9))


Hello, my name is my name, first cemeter, but I like many. His name is bintmo-sickiers.
Norris]. A name a cat
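
The `temperature` argument controls how sharply peaked the next-token distribution is. The sketch below shows the common formulation, in which logits are divided by the temperature before the softmax. It is plain Python for illustration and reflects an assumption about how temperature sampling usually works, not a copy of `gpt.generate_new_tokens`; the name `softmax_with_temperature` is hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # favors the top token more
flat = softmax_with_temperature(logits, temperature=2.0)   # spreads probability out
```

Under this formulation, values below 1.0 concentrate probability on the most likely tokens, while values above 1.0 give unlikely tokens a better chance of being sampled.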



## Chat-style prompting (optional)
You can format prompts with the special tokens used in training for chat-like behavior. Example:

```
<|system|>You are a helpful assistant.<|end|>
<|user|>Write a haiku about autumn leaves.<|end|>
<|assistant|>
```

Then call `generate(prompt)`; the model will continue the assistant response.
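
The prompt format above can be assembled with a small helper. This is a minimal sketch: the names `build_chat_prompt` and `extract_reply` are hypothetical and not part of this repo, and the newline between turns simply mirrors the example above (whether the training data used the same separator is an assumption).

```python
def build_chat_prompt(user_message, system_message="You are a helpful assistant."):
    """Format a single-turn chat prompt using this project's special tokens."""
    lines = []
    if system_message:
        lines.append(f"<|system|>{system_message}<|end|>")
    lines.append(f"<|user|>{user_message}<|end|>")
    lines.append("<|assistant|>")  # left open so the model writes the reply
    return "\n".join(lines)

def extract_reply(generated_text):
    """Return the text between the first <|assistant|> and the next <|end|>.

    Returns an empty string if <|assistant|> does not appear at all.
    """
    _, _, after = generated_text.partition("<|assistant|>")
    return after.split("<|end|>", 1)[0].strip()
```

With the `generate` helper defined earlier in this notebook, a call such as `extract_reply(generate(build_chat_prompt("Write a haiku about autumn leaves.")))` would return only the assistant's turn, provided the decoded output keeps the special tokens as text.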
