# Quick Start: Train a GPT-2 Style Model in 10 Minutes

This notebook demonstrates the simplest possible workflow:
1. Load a pre-built transformer model
2. Train on the WikiText-2 dataset
3. Generate text samples

**Hardware**: Works on Colab free tier (T4 GPU)

**Time**: ~10 minutes

## Setup

Install dependencies and download utilities.

In [None]:
# Install dependencies
!pip install -q torch pytorch-lightning transformers datasets tokenizers

# Download utils package
!wget -q https://github.com/matt-hans/transformer-builder-colab-templates/archive/refs/heads/main.zip
!unzip -q main.zip
!mv transformer-builder-colab-templates-main/utils .
!rm -rf transformer-builder-colab-templates-main main.zip
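
Before training, it can help to confirm that the runtime actually has a GPU attached, since the timing above assumes a T4. The cell below is a minimal sketch: `describe_accelerator` is a hypothetical helper, not part of the `utils` package, and it only relies on standard `torch.cuda` calls.

```python
import importlib.util


def describe_accelerator():
    """Return a short description of the available compute device."""
    # Guard the import so the check also works before torch is installed.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if torch.cuda.is_available():
        return f"GPU: {torch.cuda.get_device_name(0)}"
    return "CPU only"


print(describe_accelerator())
```

If this reports `CPU only` on Colab, switching the runtime type to a GPU and re-running the setup cell is usually enough.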

## Load Model

This example builds a small, randomly initialized GPT-2 style model (6 layers, 8 attention heads, 512-dimensional embeddings).
Replace it with your own model from Transformer Builder!

In [None]:
import torch
import torch.nn as nn

# Simple transformer for demonstration
from transformers import GPT2Config, GPT2LMHeadModel

# Create small GPT-2 config
config = GPT2Config(
    vocab_size=50257,
    n_positions=512,
    n_embd=512,
    n_layer=6,
    n_head=8,
)

model = GPT2LMHeadModel(config)
print(f"Model created: {sum(p.numel() for p in model.parameters()):,} parameters")
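
As a sanity check on the printed parameter count, the size of a GPT-2 style model can be estimated by hand from the config values. The sketch below assumes the standard GPT-2 layout (fused QKV projection, MLP hidden width of 4 × `n_embd`, two LayerNorms per block) and that the output head shares its weights with the token embedding; `gpt2_param_count` is a hypothetical helper written for this illustration.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Estimate trainable parameters of a GPT-2 style LM with tied embeddings."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d  # token + position tables
    attention = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # two linear layers, width 4*d
    layer_norms = 2 * (2 * d)                      # two LayerNorms (weight + bias)
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d                             # LayerNorm after the last block
    return embeddings + n_layer * per_block + final_norm


estimate = gpt2_param_count(vocab_size=50257, n_positions=512, n_embd=512, n_layer=6)
print(f"Estimated parameters: {estimate:,}")
```

Note that `n_head` does not appear in the formula: splitting the embedding into more heads changes how attention is computed, not how many weights it has. More than half of this small model's parameters sit in the token embedding table.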

## Train Model

A single call to `train_model` handles the whole pipeline:
- Loads the WikiText-2 dataset
- Creates the GPT-2 tokenizer (its 50,257-token vocabulary matches the model's `vocab_size`)
- Trains for 3 epochs
- Saves the best checkpoint

In [None]:
from utils.training import train_model

results = train_model(
    model=model,
    dataset='wikitext',
    config_name='wikitext-2-raw-v1',
    vocab_size=50257,
    max_epochs=3,
    batch_size=16,
    learning_rate=1e-4
)

print("\n✓ Training complete!")
print(f"Best checkpoint: {results['best_model_path']}")
print(f"Final metrics: {results['final_metrics']}")
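
Language-model losses are often easier to interpret as perplexity, which is the exponential of the mean per-token cross-entropy (in nats). The sketch below shows the conversion; the `val_loss` key and its value are assumptions made for illustration, so check what `results['final_metrics']` actually contains before adapting it.

```python
import math


def perplexity_from_loss(mean_cross_entropy_loss):
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_loss)


# Hypothetical metrics dict; the real keys in results['final_metrics'] may differ.
example_metrics = {"val_loss": 5.0}
print(f"Perplexity: {perplexity_from_loss(example_metrics['val_loss']):.1f}")
```

For reference, a model that guesses uniformly over all 50,257 tokens has a perplexity equal to the vocabulary size, so anything well below that indicates the model has learned something.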

## Generate Text

Test the trained model with text generation.

In [None]:
from transformers import GPT2Tokenizer

# Load tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Get trained model
trained_model = results['model'].model  # Extract from adapter
trained_model.eval()

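# Generate text (keep the prompt on the same device as the model)
prompt = "The transformer architecture"
device = next(trained_model.parameters()).device
input_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)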

with torch.no_grad():
    output = trained_model.generate(
        input_ids,
        max_length=100,
        num_return_sequences=3,
        temperature=0.8,
        top_k=50,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token; avoids a warning
    )

print("\n" + "="*80)
print("Generated Samples")
print("="*80 + "\n")

for i, sample in enumerate(output, 1):
    text = tokenizer.decode(sample, skip_special_tokens=True)
    print(f"Sample {i}:")
    print(text)
    print()
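
To make the `temperature` and `top_k` arguments more concrete, here is a minimal pure-Python sketch of how a single next-token distribution is shaped by them. It illustrates the general technique only, not the library's internal implementation; the function names and the toy logits are made up for this example.

```python
import math
import random


def top_k_temperature_probs(logits, temperature=0.8, top_k=50):
    """Return sampling probabilities after temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = min(top_k, len(logits))
    # Keep only the k highest-scoring tokens; all others get probability 0.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Dividing by a temperature below 1 sharpens the distribution.
    scaled = {i: logits[i] / temperature for i in keep}
    # Softmax over the kept logits (subtract the max for numerical stability).
    peak = max(scaled.values())
    exps = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(exps.values())
    probs = [0.0] * len(logits)
    for i, e in exps.items():
        probs[i] = e / total
    return probs


def sample_token(logits, temperature=0.8, top_k=50, rng=random):
    """Draw one token index from the filtered, rescaled distribution."""
    probs = top_k_temperature_probs(logits, temperature, top_k)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
probs = top_k_temperature_probs(toy_logits, temperature=0.8, top_k=2)
print([round(p, 3) for p in probs])
print("Sampled token index:", sample_token(toy_logits, temperature=0.8, top_k=2))
```

Lowering `temperature` concentrates probability on the top token and makes samples more repetitive, while raising `top_k` lets less likely tokens through and makes them more varied.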

## Summary

Congratulations! You've:
- ✅ Trained a transformer model
- ✅ Saved checkpoints automatically
- ✅ Generated text samples

### Next Steps

- **Customize**: Adjust hyperparameters (epochs, batch size, learning rate)
- **Use Your Model**: Replace the demo model with your Transformer Builder model
- **Export**: See `04_model_export.ipynb` for ONNX/TorchScript export
- **Advanced**: Check `03_large_scale_training.ipynb` for multi-GPU training