[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shang-vikas/series1-coding-exercises/blob/main/exercises/blog-09/exercise-01.ipynb)

# 🧪 Exercise 3: Real Pretraining on IMDB (Next-Token Prediction)

We train a small decoder-only Transformer as a language model on IMDB reviews.

This is the real pretraining objective: at every position, predict the next token from the tokens before it.
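
The objective can be sketched on a toy sequence. The input is every token but the last, the target is the same sequence shifted left by one, and the loss is the average negative log-probability assigned to each target. The token ids and the uniform `toy_model_probs` stand-in below are made up for illustration and are not part of the notebook.

```python
import math

# Hypothetical token ids for one short review (illustrative values only).
token_ids = [11, 42, 7, 99, 3]

# Next-token prediction: predict token t+1 from the tokens up to t.
inputs = token_ids[:-1]   # [11, 42, 7, 99]
targets = token_ids[1:]   # [42, 7, 99, 3]

VOCAB_SIZE = 100

def toy_model_probs(prefix):
    """Stand-in for a language model: a uniform distribution over the vocabulary."""
    return [1.0 / VOCAB_SIZE] * VOCAB_SIZE

# Cross-entropy = mean negative log-probability of the correct next token.
nll = []
for t, target in enumerate(targets):
    probs = toy_model_probs(inputs[: t + 1])
    nll.append(-math.log(probs[target]))

loss = sum(nll) / len(nll)
print(f"loss = {loss:.4f}")
```

A model that guesses uniformly scores exactly `ln(VOCAB_SIZE)`; training is the process of pushing the loss below that baseline.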

## Install + Setup

In [None]:
%pip install datasets transformers accelerate -q

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoTokenizer
import math
import time

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

## Load Real Dataset

In [None]:
dataset = load_dataset("imdb")

# Use the unlabeled "unsupervised" split for language modeling
train_texts = dataset["unsupervised"]["text"][:20000]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# GPT-2 ships without a pad token, so reuse EOS for padding
tokenizer.pad_token = tokenizer.eos_token

def tokenize(text):
    # Truncate or pad every review to exactly 128 tokens
    return tokenizer(text, truncation=True, padding="max_length", max_length=128, return_tensors="pt")

# One encoding per review; each input_ids tensor has shape (1, 128)
encoded = [tokenize(t) for t in train_texts]

## Build Simple Decoder Transformer

In [None]:
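class TinyGPT(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Learned positional embeddings for sequences of up to 128 tokens
        self.pos_emb = nn.Embedding(128, d_model)
        # Decoder-only = self-attention layers plus a causal mask (applied in forward)
        decoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(decoder_layer, num_layers)
        self.ln = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        b, t = x.shape
        pos = torch.arange(t, device=x.device).unsqueeze(0)
        x = self.token_emb(x) + self.pos_emb(pos)
        # True above the diagonal blocks attention to future positions;
        # without this mask the model could read the token it must predict
        causal_mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        x = self.transformer(x, mask=causal_mask)
        x = self.ln(x)
        return self.head(x)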

model = TinyGPT(tokenizer.vocab_size).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)
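
Self-attention layers only behave as a decoder when each position is prevented from attending to later ones. The sketch below shows that idea in plain Python on a 4-token sequence: masked entries get zero attention weight, so row `i` only mixes information from positions `0..i`. The scores are arbitrary illustrative values, and `causal_mask` and `masked_softmax` are hypothetical helpers, not notebook code. PyTorch's boolean attention masks follow the same convention (`True` means blocked).

```python
import math

def causal_mask(t):
    """mask[i][j] is True where position i must NOT attend to position j (j > i)."""
    return [[j > i for j in range(t)] for i in range(t)]

def masked_softmax(scores, mask):
    """Row-wise softmax that gives masked positions zero weight."""
    weights = []
    for row, mask_row in zip(scores, mask):
        kept = [s for s, m in zip(row, mask_row) if not m]
        top = max(kept)  # subtract the max for numerical stability
        exps = [0.0 if m else math.exp(s - top) for s, m in zip(row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Arbitrary attention scores for a 4-token sequence (illustrative values).
scores = [
    [0.2, 1.0, -0.5, 0.3],
    [0.1, 0.4, 0.9, -1.2],
    [1.5, -0.3, 0.0, 0.7],
    [0.0, 0.6, 0.2, 0.1],
]

weights = masked_softmax(scores, causal_mask(4))
for row in weights:
    print([round(w, 3) for w in row])
```

The first row puts all of its weight on position 0, and every entry above the diagonal is zero.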

## Training Loop (Real Pretraining)

In [None]:
loader = DataLoader(encoded, batch_size=16, shuffle=True)

for epoch in range(2):
    model.train()
    total_loss = 0
    start = time.time()

    for batch in loader:
        input_ids = batch["input_ids"].squeeze(1).to(device)

        inputs = input_ids[:, :-1]
        targets = input_ids[:, 1:]

        logits = model(inputs)

        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(loader)
    print(f"Epoch {epoch+1} | Loss {avg_loss:.4f} | Perplexity {math.exp(avg_loss):.2f} | Time {time.time()-start:.1f}s")
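
The perplexity printed by the loop is just the exponential of the average cross-entropy, so the two numbers always move together. A small sketch of the relationship follows; the loss values in the loop are hypothetical and are not measured results from this notebook.

```python
import math

def perplexity(avg_loss):
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(avg_loss)

gpt2_vocab = 50257                    # size of the GPT-2 tokenizer vocabulary
uniform_loss = math.log(gpt2_vocab)   # loss of a model that guesses uniformly

print(f"uniform baseline: loss {uniform_loss:.2f}, perplexity {perplexity(uniform_loss):.0f}")
for loss in (6.0, 5.0, 4.0):          # hypothetical losses for illustration
    print(f"loss {loss:.1f} -> perplexity {perplexity(loss):.1f}")
```

Perplexity can be read as the effective number of tokens the model is choosing between at each step, which is why a uniform guesser has a perplexity equal to the vocabulary size.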

## Test Generation

In [None]:
@torch.no_grad()  # no gradients are needed while sampling
def generate(prompt, max_tokens=50, temperature=1.0):
    model.eval()
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    for _ in range(max_tokens):
        # pos_emb covers only 128 positions, so feed at most the last 128 tokens
        logits = model(input_ids[:, -128:])
        # Temperature below 1 sharpens the distribution, above 1 flattens it
        next_token_logits = logits[:, -1, :] / temperature
        probs = F.softmax(next_token_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_token], dim=1)

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

print(generate("This movie was absolutely", temperature=0.7))
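
The `temperature` argument divides the logits before the softmax. The plain-Python sketch below shows the effect on three hypothetical token scores: a lower temperature concentrates probability on the top token, a higher one spreads it out. `softmax_with_temperature` is an illustrative helper, not part of the notebook.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens

for temp in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```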

## 🔎 What They Will See

- Loss decreases
- Perplexity decreases
- The model generates IMDB-style continuations
- No concept of "answering", just continuation

This is real pretraining behavior.

# 🧪 Exercise 4: Real Instruction Tuning (SFT)

Now we fine-tune the same model on real instruction data.

We use the first 5,000 examples of the open Alpaca dataset.

## Load Instruction Dataset

In [None]:
alpaca = load_dataset("tatsu-lab/alpaca", split="train[:5000]")

def format_example(example):
    prompt = f"### Instruction:\n{example['instruction']}\n### Response:\n{example['output']}"
    return tokenizer(prompt, truncation=True, padding="max_length", max_length=128, return_tensors="pt")

sft_data = [format_example(ex) for ex in alpaca]
sft_loader = DataLoader(sft_data, batch_size=16, shuffle=True)
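
During SFT the model only sees instructions wrapped in this template, so prompts at generation time should use the same wrapper and stop right after the response header. The helper below is a hypothetical sketch of that idea (`build_prompt` is not part of the notebook, and the example texts are made up).

```python
def build_prompt(instruction, response=None):
    """Format text with the template used for the SFT examples.

    With response=None the prompt stops right after the response header,
    which is the form to feed the model at generation time.
    """
    prompt = f"### Instruction:\n{instruction}\n### Response:\n"
    if response is not None:
        prompt += response
    return prompt

train_text = build_prompt("Name a primary color.", "Red.")  # hypothetical training example
infer_text = build_prompt("Explain gravity simply.")        # prompt for generation

print(repr(train_text))
print(repr(infer_text))
```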

## Supervised Fine-Tuning Loop

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(1):
    model.train()
    total_loss = 0

    for batch in sft_loader:
        input_ids = batch["input_ids"].squeeze(1).to(device)

        inputs = input_ids[:, :-1]
        targets = input_ids[:, 1:]

        logits = model(inputs)
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print("SFT Loss:", total_loss / len(sft_loader))

## Compare Before vs After

In [None]:
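# Wrap the question in the same template used during SFT
print(generate("### Instruction:\nExplain gravity simply.\n### Response:\n", temperature=0.7))

The model now tends to produce answer-shaped text instead of a review-style continuation. At this scale the answer will rarely be correct or fluent. For the "before" side of the comparison, run the same prompt prior to the SFT cell.

The change happens because instruction→response patterns were injected.

No architecture changed.

Only the training distribution changed.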

# 🧪 Exercise 5: Preference Steering (Simplified Real Ranking)

We simulate preference optimization using IMDB sentiment.

Goal: teach the model to prefer a positive tone.

## Create Preference Pairs

In [None]:
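# Crude keyword heuristic: "good" marks a review as positive, "bad" as negative.
# Reviews containing both words are skipped so the two sets do not overlap.
positive_reviews = [t for t in train_texts if "good" in t.lower() and "bad" not in t.lower()][:1000]
negative_reviews = [t for t in train_texts if "bad" in t.lower() and "good" not in t.lower()][:1000]

# We treat positive as preferred.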

## Reward Model

In [None]:
class RewardModel(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states):
        pooled = hidden_states.mean(dim=1)
        return self.linear(pooled)

reward_model = RewardModel(256).to(device)
reward_optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

## Train Reward Model

In [None]:
for i in range(200):
    pos = tokenizer(positive_reviews[i], truncation=True, padding="max_length", max_length=128, return_tensors="pt").input_ids.to(device)
    neg = tokenizer(negative_reviews[i], truncation=True, padding="max_length", max_length=128, return_tensors="pt").input_ids.to(device)

    # Detach so reward training does not push gradients into the LM's embeddings
    pos_hidden = model.token_emb(pos).detach()
    neg_hidden = model.token_emb(neg).detach()

    pos_score = reward_model(pos_hidden)
    neg_score = reward_model(neg_hidden)

    # Margin ranking loss: the preferred review should outscore the other by at least 1
    loss = F.relu(1 - pos_score + neg_score).mean()

    reward_optimizer.zero_grad()
    loss.backward()
    reward_optimizer.step()
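
The loss `F.relu(1 - pos_score + neg_score)` is a margin ranking (hinge) loss: it is zero once the preferred example outscores the rejected one by at least the margin, and it grows linearly otherwise. A plain-Python sketch with hypothetical scores (`hinge_ranking_loss` is an illustrative helper, not notebook code):

```python
def hinge_ranking_loss(pos_score, neg_score, margin=1.0):
    """max(0, margin - pos_score + neg_score): zero once pos beats neg by the margin."""
    return max(0.0, margin - pos_score + neg_score)

# Hypothetical (preferred, rejected) score pairs.
pairs = [(0.2, 0.1), (1.0, 0.5), (2.0, 0.5), (-0.3, 0.4)]

for pos, neg in pairs:
    print(f"pos={pos:+.1f} neg={neg:+.1f} loss={hinge_ranking_loss(pos, neg):.2f}")
```

Only pairs that violate the margin contribute gradient, so the reward model stops updating on pairs it already ranks confidently.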

## Use Reward to Reweight LM

In [None]:
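alpha = 2.0  # weight of the reward term relative to the LM loss

model.train()  # generate() left the model in eval mode

for batch in loader:
    input_ids = batch["input_ids"].squeeze(1).to(device)

    inputs = input_ids[:, :-1]
    targets = input_ids[:, 1:]

    logits = model(inputs)
    lm_loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

    # Score the batch with the reward model (its features are the LM's token embeddings)
    hidden = model.token_emb(inputs)
    reward = reward_model(hidden).mean()

    # Lower LM loss and higher reward both reduce the objective
    weighted_loss = lm_loss - alpha * reward

    optimizer.zero_grad()
    weighted_loss.backward()
    optimizer.step()

    break  # a single update step, enough to show the mechanics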

Now test:

In [None]:
print(generate("This movie was", temperature=0.7))

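The tone may shift slightly positive, but a single update step on a model this small will rarely produce a visible change.

What this demonstrates is the mechanics of alignment: a learned preference signal is folded into the LM's training loss.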

# 🧪 Exercise 6: Show Hallucination Mechanically

Ask the model something outside the IMDB distribution.

In [None]:
print(generate("The capital of Atlantis is", temperature=0.7))

It will invent an answer.

That is because it learned patterns of answering, not truth retrieval.
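
The reason is visible in the sampling step itself: a softmax always produces a valid distribution, and sampling from it always returns some token, however flat the logits are. The sketch below uses a toy four-word vocabulary and made-up logits; `sample_next_token` is a hypothetical helper, not the notebook's `generate`.

```python
import math
import random

def sample_next_token(logits, rng):
    """Softmax the logits and sample one index; there is no 'abstain' outcome."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs

rng = random.Random(0)
vocab = ["Paris", "Rome", "Zor", "Atla"]   # toy vocabulary (hypothetical)
flat_logits = [0.1, 0.0, 0.05, -0.1]       # the model has no strong preference

index, probs = sample_next_token(flat_logits, rng)
print(vocab[index], [round(p, 3) for p in probs])
```

Even with nearly uniform probabilities, a token is emitted. Nothing in the mechanism distinguishes "confident and right" from "unsure but forced to pick".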

## 💡 What This Entire Notebook Demonstrates

- Pretraining = next-token prediction on real text
- SFT = behavior steering via distribution shift
- Preference signals = probability reweighting
- No truth module
- Hallucination is structural

## 💡 What This Does NOT Demonstrate

- Billion-scale representation power
- The full RLHF pipeline
- True large-scale generalization

But the core mechanisms are the same.

Scale changes capability, not the objective.