# Part 5: Instruction Tuning
## From text completion to following instructions

In Parts 1-4, we built models that complete text. Give them a prompt, they continue it. But that's not how we use ChatGPT or Claude:

**Pretrained model**:
```
Input: "What is the capital of France?"
Output: "What is the capital of Germany? What is the capital of Spain?..."
```

**Instruction-tuned model**:
```
Input: "What is the capital of France?"
Output: "Paris"
```

The pretrained model doesn't know it should **answer** the question—it just continues the text pattern.

## What is Instruction Tuning?

Instruction tuning (also called supervised fine-tuning or SFT) teaches the model to:

1. Recognize that an input is a question/instruction
2. Generate an appropriate response
3. Stop after answering

We do this by fine-tuning on (instruction, response) pairs.

## Setup

In [None]:
import torch
import torch.nn.functional as F
from torch import nn
import math
import matplotlib.pyplot as plt

torch.manual_seed(42)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## Step 1: Create an Instruction Dataset

For a real model, we'd use datasets like Alpaca, Dolly, or OpenAssistant. For our educational purposes, we'll create a small synthetic dataset.

In [None]:
# Our mini instruction dataset
instruction_data = [
    # Simple Q&A
    {"instruction": "What is the capital of France?", "response": "The capital of France is Paris."},
    {"instruction": "What is the capital of India?", "response": "The capital of India is New Delhi."},
    {"instruction": "What is the capital of Japan?", "response": "The capital of Japan is Tokyo."},
    {"instruction": "What is the capital of Germany?", "response": "The capital of Germany is Berlin."},
    {"instruction": "What is the capital of Italy?", "response": "The capital of Italy is Rome."},
    
    # Math
    {"instruction": "What is 2 + 2?", "response": "2 + 2 equals 4."},
    {"instruction": "What is 5 * 3?", "response": "5 * 3 equals 15."},
    {"instruction": "What is 10 - 7?", "response": "10 - 7 equals 3."},
    {"instruction": "What is 20 / 4?", "response": "20 / 4 equals 5."},
    
    # Definitions
    {"instruction": "Define machine learning.", "response": "Machine learning is a type of artificial intelligence where computers learn patterns from data."},
    {"instruction": "What is Python?", "response": "Python is a popular programming language known for its simplicity and readability."},
    {"instruction": "What is an algorithm?", "response": "An algorithm is a step-by-step procedure for solving a problem or performing a task."},
    
    # Instructions
    {"instruction": "Say hello.", "response": "Hello!"},
    {"instruction": "Count to five.", "response": "1, 2, 3, 4, 5."},
    {"instruction": "Name three colors.", "response": "Red, blue, and green."},
    {"instruction": "List three fruits.", "response": "Apple, banana, and orange."},
    
    # More Q&A
    {"instruction": "Who wrote Romeo and Juliet?", "response": "William Shakespeare wrote Romeo and Juliet."},
    {"instruction": "What planet is closest to the sun?", "response": "Mercury is the planet closest to the sun."},
    {"instruction": "How many days are in a week?", "response": "There are 7 days in a week."},
    {"instruction": "What is H2O?", "response": "H2O is the chemical formula for water."},
]

print(f"Number of instruction-response pairs: {len(instruction_data)}")

## Step 2: Format for Training

We need a consistent format that the model can learn. A common format is:

```
### Instruction:
{instruction}

### Response:
{response}<|endoftext|>
```

In [None]:
def format_example(example):
    """Format an instruction-response pair for training."""
    return f"""### Instruction:
{example['instruction']}

### Response:
{example['response']}<|endoftext|>"""

# Preview formatting
print("Example formatted training data:")
print("=" * 50)
print(format_example(instruction_data[0]))
print("=" * 50)
print(format_example(instruction_data[5]))

In [None]:
# Create training text
training_text = "\n\n".join(format_example(ex) for ex in instruction_data)
print(f"Total training characters: {len(training_text)}")
print(f"\nFirst 500 characters:")
print(training_text[:500])

## Step 3: Build Vocabulary and Dataset

In [None]:
# Build vocabulary from training data
chars = sorted(set(training_text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
vocab_size = len(stoi)

print(f"Vocabulary size: {vocab_size}")
print(f"Characters: {''.join(chars[:50])}...")
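The vocabulary above is character-level, so every character gets its own integer id. As a minimal sketch (the `demo_` names and `sample_text` are illustrative and not part of the notebook), an encode/decode round trip shows that the mapping is lossless and that the end marker is spelled out as several characters rather than stored as one special token:

```python
# Toy text standing in for training_text (assumption: any string works here)
sample_text = "### Instruction:\nSay hello.\n\n### Response:\nHello!<|endoftext|>"

# Same construction as the notebook's stoi/itos, under different names
demo_chars = sorted(set(sample_text))
demo_stoi = {ch: i for i, ch in enumerate(demo_chars)}
demo_itos = {i: ch for ch, i in demo_stoi.items()}

def encode(text):
    """Map each character to its integer id."""
    return [demo_stoi[ch] for ch in text]

def decode(ids):
    """Map integer ids back to a string."""
    return "".join(demo_itos[i] for i in ids)

ids = encode(sample_text)
print(decode(ids) == sample_text)    # True: the round trip is lossless

# At character level the end marker costs one id per character
print(len(encode("<|endoftext|>")))  # 13
```

This is why the inference code later has to look for the marker as a substring of the generated characters instead of checking for a single token id.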

In [None]:
# Encode training data
block_size = 128

def build_dataset(text, block_size, stoi):
    data = [stoi[ch] for ch in text]
    X, Y = [], []
    for i in range(len(data) - block_size):
        X.append(data[i:i + block_size])
        Y.append(data[i + 1:i + block_size + 1])
    return torch.tensor(X), torch.tensor(Y)

X, Y = build_dataset(training_text, block_size, stoi)
print(f"Training examples: {len(X)}")
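To see what `build_dataset` produces, here is a small sketch of the same sliding-window logic on a toy list of ids (`build_windows` and the `toy_` names are hypothetical stand-ins). Each target row is the input row shifted one position to the right:

```python
import torch

def build_windows(ids, block_size):
    """Same sliding-window idea as build_dataset, on a plain list of ids."""
    X, Y = [], []
    for i in range(len(ids) - block_size):
        X.append(ids[i:i + block_size])          # input window
        Y.append(ids[i + 1:i + block_size + 1])  # same window shifted by one
    return torch.tensor(X), torch.tensor(Y)

toy_ids = list(range(6))  # stand-in for encoded characters
toy_X, toy_Y = build_windows(toy_ids, block_size=3)

print(toy_X.tolist())  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(toy_Y.tolist())  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Because the windows slide over the concatenated training text, many of them span the boundary between two examples, so the model also sees what follows an end marker.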

## Step 4: Model Architecture

We'll use the transformer from Part 4. In a real scenario, instruction tuning starts from pretrained weights. To keep this notebook self-contained, we'll instead train from scratch on our small dataset.
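For reference, starting from pretrained weights could look roughly like the sketch below. `TinyLM`, the checkpoint path, and the learning rate are all assumptions made for illustration; the notebook itself does not save or load a checkpoint.

```python
import os
import tempfile

import torch
from torch import nn

class TinyLM(nn.Module):
    """Hypothetical stand-in for the Part 4 transformer."""

    def __init__(self, vocab_size=16, d_model=8):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.output = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        return self.output(self.token_emb(x))

# Pretend this is the end of pretraining: save the weights to disk.
pretrained = TinyLM()
ckpt_path = os.path.join(tempfile.mkdtemp(), "pretrained.pt")  # hypothetical path
torch.save(pretrained.state_dict(), ckpt_path)

# Start fine-tuning from those weights instead of a random initialization.
finetune_model = TinyLM()
state = torch.load(ckpt_path, map_location="cpu")
finetune_model.load_state_dict(state)

# Illustrative value only: fine-tuning usually uses a smaller learning rate
# than pretraining, but the right number depends on the model and data.
optimizer = torch.optim.AdamW(finetune_model.parameters(), lr=1e-5)
```

The architecture must match the checkpoint for `load_state_dict` to succeed, which is one reason instruction tuning normally reuses the pretraining model class unchanged.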

In [None]:
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, :x.shape[1]]


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        self.W_qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.W_out = nn.Linear(d_model, d_model, bias=False)

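    def forward(self, x):
        batch, seq_len, d_model = x.shape
        # Project to Q, K, V in one matmul, then split into heads:
        # (batch, seq, 3 * d_model) -> (3, batch, heads, seq, d_k)
        qkv = self.W_qkv(x).reshape(batch, seq_len, 3, self.n_heads, self.d_k)
        qkv = qkv.permute(2, 0, 3, 1, 4)
        Q, K, V = qkv[0], qkv[1], qkv[2]

        # Scaled dot-product scores: (batch, heads, seq, seq)
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
        # Causal mask: each position may attend only to itself and earlier positions
        mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        scores = scores.masked_fill(mask, float('-inf'))

        attention = F.softmax(scores, dim=-1)
        output = torch.matmul(attention, V)
        # Merge heads back: (batch, heads, seq, d_k) -> (batch, seq, d_model)
        output = output.permute(0, 2, 1, 3).reshape(batch, seq_len, d_model)
        return self.W_out(output)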


class TransformerBlock(nn.Module):
    def __init__(self, d_model, n_heads, dropout=0.1):
        super().__init__()
        self.attention = MultiHeadAttention(d_model, n_heads)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout)
        )
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x = x + self.dropout(self.attention(self.ln1(x)))
        x = x + self.ffn(self.ln2(x))
        return x


class InstructionLM(nn.Module):
    """Transformer for instruction following."""

    def __init__(self, vocab_size, d_model, n_heads, n_layers, block_size, dropout=0.1):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_enc = PositionalEncoding(d_model, block_size)
        self.dropout = nn.Dropout(dropout)
        self.blocks = nn.ModuleList([
            TransformerBlock(d_model, n_heads, dropout)
            for _ in range(n_layers)
        ])
        self.ln_final = nn.LayerNorm(d_model)
        self.output = nn.Linear(d_model, vocab_size)
        self.block_size = block_size

    def forward(self, x):
        x = self.token_emb(x)
        x = self.pos_enc(x)
        x = self.dropout(x)
        for block in self.blocks:
            x = block(x)
        x = self.ln_final(x)
        return self.output(x)
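The causal mask inside `MultiHeadAttention` is what makes this a left-to-right language model. A small sketch with dummy scores (all zeros, an assumption chosen so the result is easy to read) shows its effect on the attention weights:

```python
import torch
import torch.nn.functional as F

seq_len = 4
scores = torch.zeros(seq_len, seq_len)  # dummy attention scores, all equal

# True above the diagonal marks positions a token is NOT allowed to see
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
masked = scores.masked_fill(mask, float("-inf"))
weights = F.softmax(masked, dim=-1)

# Row i spreads its weight evenly over positions 0..i and gives 0 to the future
print(weights)
```

Each row still sums to 1, but no weight lands on later positions, so the prediction for a character can never depend on the characters it is supposed to predict.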

In [None]:
# Create model
d_model = 128
n_heads = 4
n_layers = 4

model = InstructionLM(
    vocab_size=vocab_size,
    d_model=d_model,
    n_heads=n_heads,
    n_layers=n_layers,
    block_size=block_size,
    dropout=0.1
).to(device)

num_params = sum(p.numel() for p in model.parameters())
print(f"Model parameters: {num_params:,}")

## Step 5: Training

In [None]:
def train(model, X, Y, epochs=1000, batch_size=32, lr=3e-4):
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    X, Y = X.to(device), Y.to(device)
    losses = []

    for epoch in range(epochs):
        perm = torch.randperm(X.shape[0])
        total_loss, n_batches = 0, 0

        for i in range(0, len(X), batch_size):
            idx = perm[i:i+batch_size]
            x_batch, y_batch = X[idx], Y[idx]

            logits = model(x_batch)
            loss = F.cross_entropy(logits.view(-1, vocab_size), y_batch.view(-1))

            optimizer.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()

            total_loss += loss.item()
            n_batches += 1

        losses.append(total_loss / n_batches)
        if epoch % 200 == 0:
            print(f"Epoch {epoch}: Loss = {losses[-1]:.4f}")

    return losses

losses = train(model, X, Y, epochs=2000, batch_size=32, lr=1e-3)

In [None]:
plt.figure(figsize=(10, 4))
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Instruction Tuning Loss')
plt.grid(True, alpha=0.3)
plt.show()
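The training loop above computes the loss on every character, including the instruction itself. A common variation in instruction tuning, not used in this notebook, is to compute the loss only on response tokens. The sketch below shows the mechanism with made-up logits and targets; `prompt_len` is an assumption, and applying this to our sliding windows would additionally require tracking which characters belong to a response.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, seq_len = 10, 6

logits = torch.randn(1, seq_len, vocab)          # stand-in for model output
targets = torch.randint(0, vocab, (1, seq_len))  # stand-in for Y

# Assumption: the first 4 target positions belong to the instruction.
prompt_len = 4
masked_targets = targets.clone()
masked_targets[:, :prompt_len] = -100  # default ignore_index of F.cross_entropy

full_loss = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))
response_loss = F.cross_entropy(logits.view(-1, vocab), masked_targets.view(-1))

# The masked loss equals the mean loss over the response positions only
manual = F.cross_entropy(logits[0, prompt_len:], targets[0, prompt_len:])
print(torch.allclose(response_loss, manual))  # True
```

Positions set to `-100` contribute neither to the loss nor to the gradient, so the model is only pushed toward producing good responses, not toward reproducing instructions.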

## Step 6: Inference with Instructions

In [None]:
@torch.no_grad()
def ask(model, instruction, max_tokens=100, temperature=0.7):
    """
    Ask the model a question in instruction format.
    """
    model.eval()

    # Format the prompt
    prompt = f"""### Instruction:
{instruction}

### Response:
"""

    # Encode prompt (characters never seen in training fall back to id 0)
    tokens = [stoi.get(ch, 0) for ch in prompt]
    generated = list(prompt)

    for _ in range(max_tokens):
        # Use at most the last block_size tokens as context
        context = tokens[-block_size:]
        x = torch.tensor([context]).to(device)

        logits = model(x)
        logits = logits[0, -1, :] / temperature
        probs = F.softmax(logits, dim=-1)
        next_idx = torch.multinomial(probs, 1).item()

        next_char = itos[next_idx]
        tokens.append(next_idx)
        generated.append(next_char)

        # Stop once the end-of-text marker has been spelled out
        # (at character level it is 13 separate characters, not one token)
        if '<|endoftext|>' in ''.join(generated[-15:]):
            break

    # Keep only the text generated after the prompt, without the end marker
    response = ''.join(generated[len(prompt):])
    response = response.split("<|endoftext|>")[0].strip()

    return response
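The `temperature` argument in `ask` rescales the logits before sampling. A minimal sketch with made-up logits for three characters shows how it changes the sampling distribution:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.0])  # made-up scores for three characters

probs_by_temp = {}
for temperature in (0.2, 0.7, 1.0, 2.0):
    probs = F.softmax(logits / temperature, dim=-1)
    probs_by_temp[temperature] = probs
    print(f"T={temperature}: {[round(p, 3) for p in probs.tolist()]}")
```

Lower temperatures concentrate probability on the top-scoring character, which makes answers more repeatable. Higher temperatures flatten the distribution and produce more varied, and more error-prone, output.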

In [None]:
# Test with training examples
print("=" * 60)
print("TESTING ON TRAINING EXAMPLES")
print("=" * 60)

test_questions = [
    "What is the capital of France?",
    "What is 2 + 2?",
    "Who wrote Romeo and Juliet?",
    "Say hello.",
]

for q in test_questions:
    print(f"\nQ: {q}")
    print(f"A: {ask(model, q)}")

In [None]:
# Test generalization (these weren't in training!)
print("\n" + "=" * 60)
print("TESTING GENERALIZATION (not in training)")
print("=" * 60)

novel_questions = [
    "What is the capital of Spain?",  # Similar pattern, different answer
    "What is 3 + 3?",                  # Similar pattern
    "Name three animals.",             # Similar to "Name three colors"
    "Count to three.",                 # Similar to "Count to five"
]

for q in novel_questions:
    print(f"\nQ: {q}")
    print(f"A: {ask(model, q)}")

### Generalization Limits

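With only 20 training examples, generalization is limited. For comparison, Alpaca was fine-tuned on about 52K instruction-response pairs, and commercial models like GPT-3.5 are generally assumed to use considerably more data, though exact figures are not public.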

However, even with few examples, the model learns the **format**: instruction → response pattern.

## Step 7: Understanding What Changed

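The key insight: when you start from a pretrained model, **instruction tuning isn't primarily about adding new knowledge**. It teaches the model to **access and format** its knowledge appropriately. (Our toy model was trained from scratch, so here everything it can answer comes from the 20 training pairs.)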

In [None]:
# Let's see what happens WITHOUT instruction format
@torch.no_grad()
def complete(model, prompt, max_tokens=100, temperature=0.7):
    """Raw text completion without instruction formatting."""
    model.eval()

    tokens = [stoi.get(ch, 0) for ch in prompt]
    generated = list(prompt)

    for _ in range(max_tokens):
        context = tokens[-block_size:]
        x = torch.tensor([context]).to(device)

        logits = model(x)[0, -1, :] / temperature
        probs = F.softmax(logits, dim=-1)
        next_idx = torch.multinomial(probs, 1).item()

        tokens.append(next_idx)
        generated.append(itos[next_idx])

    return ''.join(generated)

print("Raw completion (no instruction format):")
print("-" * 50)
print(complete(model, "What is the capital of France?", max_tokens=80))

With the instruction format, the model knows to:

1. Look for the "### Instruction:" marker
2. Generate a direct response after "### Response:"
3. Stop after the end-of-text marker

## The Real-World Instruction Tuning Pipeline

```
┌─────────────────────────────────────────────────────────────┐
│                 INSTRUCTION TUNING PIPELINE                  │
└─────────────────────────────────────────────────────────────┘

1. PRETRAIN on massive text corpus
   └── Model learns language patterns, facts, reasoning

2. COLLECT instruction data
   ├── Human-written (expensive, high quality)
   ├── GPT-generated (Alpaca approach)
   └── From existing NLP datasets

3. FORMAT data consistently
   └── Instruction: {task}
       Response: {answer}

4. FINE-TUNE on instruction data
   ├── Usually 1-3 epochs
   ├── Lower learning rate than pretraining
   └── Careful not to forget pretrained knowledge

5. EVALUATE
   ├── Held-out instructions
   ├── Human evaluation
   └── Benchmark tasks
```
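The evaluation step on held-out instructions can be sketched as follows. The dataset, `split_held_out`, `exact_match`, and the memorizing placeholder model are all hypothetical. Exact match is also only one possible metric and is a strict one for free-form text.

```python
import random

# Hypothetical mini dataset in the same shape as instruction_data
pairs = [
    {"instruction": f"What is {a} + {b}?", "response": f"{a} + {b} equals {a + b}."}
    for a in range(1, 6) for b in range(1, 5)
]

def split_held_out(data, held_out_fraction=0.2, seed=42):
    """Shuffle a copy of the data and reserve a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_held_out = max(1, int(len(shuffled) * held_out_fraction))
    return shuffled[n_held_out:], shuffled[:n_held_out]

def exact_match(answer_fn, examples):
    """Fraction of examples where answer_fn reproduces the reference exactly."""
    hits = sum(answer_fn(ex["instruction"]) == ex["response"] for ex in examples)
    return hits / len(examples)

train_pairs, held_out_pairs = split_held_out(pairs)

# Placeholder "model" that only memorizes the training pairs
lookup = {ex["instruction"]: ex["response"] for ex in train_pairs}

def memorizer(question):
    return lookup.get(question, "")

print(exact_match(memorizer, train_pairs))     # 1.0
print(exact_match(memorizer, held_out_pairs))  # 0.0
```

In this notebook, `answer_fn` could be something like `lambda q: ask(model, q)`. The gap between the two scores is the point of a held-out set: a model that only memorizes looks perfect on its training data and fails on everything else.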

## Summary

We implemented instruction tuning from scratch:

| Aspect | What We Did |
|--------|-------------|
| Dataset | 20 instruction-response pairs |
| Format | "### Instruction:\n{q}\n\n### Response:\n{a}" followed by an end-of-text marker |
| Model | Same transformer architecture as Part 4, trained from scratch |
| Training | Standard next-token prediction |
| Result | Model follows instruction format |

**Key insight**: Instruction tuning is just supervised fine-tuning on carefully formatted data. The magic is in the data, not the training procedure.

## What's Next

In **Part 6**, we'll add **DPO (Direct Preference Optimization)**—training the model to prefer good responses over bad ones. This is the final step toward alignment.

## Exercises

1. **More data**: Add 50 more instruction-response pairs. How does quality improve?
2. **Different formats**: Try a "User: {q}\nAssistant: {a}" format.
3. **Multi-turn**: Add conversation examples with multiple turns
4. **Negative examples**: What if you include bad examples? (Spoiler: Part 6!)
5. **LoRA**: Implement low-rank adaptation for efficient fine-tuning