# 02 – Cache Activations

**Purpose:** Generate the activation dataset we'll use to train SAEs.

This notebook is orchestration + sanity checks:
1. Explain what we're caching and why
2. Run `cache_activations.py`
3. Load the resulting memmap
4. Sanity-check: shape, mean/std, histogram

---

## What are we caching?

We extract the **MLP outputs from layer 6** of GPT-2 at every token position in the input text. Each position yields one 768-dim vector.

**Why cache?**
- SAE training needs many activation vectors
- Running the model during SAE training is slow
- Pre-caching lets us iterate quickly on SAE hyperparameters

**What gets saved?**
- A memory-mapped file (`*.mmap`) of shape `[n_tokens, 768]`
- dtype: float16 (half the disk footprint of float32, with sufficient precision for this use)
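
To make the float16 trade-off concrete, here is a small sketch that computes the on-disk footprint for the sizes used in this notebook and measures the round-trip error of a float32 → float16 cast. The data is synthetic (standard normal), not real GPT-2 activations, so treat the error figure as illustrative only.

```python
import numpy as np

n_tokens, d_in = 50_000, 768  # sizes used later in this notebook

# Footprint of the raw buffer: 2 bytes per float16 value, 4 per float32.
fp16_mb = n_tokens * d_in * np.dtype(np.float16).itemsize / 1e6
fp32_mb = n_tokens * d_in * np.dtype(np.float32).itemsize / 1e6
print(f"float16: {fp16_mb:.1f} MB, float32: {fp32_mb:.1f} MB")

# Round-trip error on synthetic stand-in data (assumption: not real activations).
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, d_in)).astype(np.float32)
x16 = x.astype(np.float16).astype(np.float32)

nonzero = np.abs(x) > 1e-3  # skip values near zero to keep the ratio meaningful
max_rel_err = (np.abs(x - x16)[nonzero] / np.abs(x[nonzero])).max()
print(f"Max relative round-trip error: {max_rel_err:.2e}")
```

Values far outside the float16 range (above roughly 65504 in magnitude) would overflow to Inf, which is one reason the notebook checks for Inf after loading.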

---

## Run the caching script

This cell runs `src/cache_activations.py`, which:
1. Loads GPT-2
2. Hooks layer 6 MLP
3. Runs text through the model
4. Saves activations to disk
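
The hooking step can be sketched with a PyTorch forward hook. The example below is not the contents of `cache_activations.py`; it uses a toy stack of blocks (`ToyBlock`, a hypothetical stand-in with a narrow width) so it runs without downloading GPT-2. If the script uses Hugging Face `transformers`, the analogous module would be something like `model.h[i].mlp`, but both that and whether "layer 6" is zero-indexed are assumptions here.

```python
import numpy as np
import torch
import torch.nn as nn

D_MODEL = 64  # toy width for speed; GPT-2 small uses 768


class ToyBlock(nn.Module):
    """Hypothetical stand-in for a transformer block; only the MLP matters here."""

    def __init__(self, d_model):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        return x + self.mlp(x)  # residual connection around the MLP


model = nn.Sequential(*[ToyBlock(D_MODEL) for _ in range(8)])
captured = []


def save_mlp_output(module, inputs, output):
    # output is [batch, seq, d_model]; flatten to one row per token position.
    flat = output.detach().reshape(-1, output.shape[-1])
    captured.append(flat.to(torch.float16).cpu().numpy())


# Attach the hook to the MLP of the block at index 6, run a batch, then detach.
handle = model[6].mlp.register_forward_hook(save_mlp_output)
with torch.no_grad():
    model(torch.randn(2, 16, D_MODEL))  # 2 sequences of 16 positions
handle.remove()

acts = np.concatenate(captured)
print(acts.shape, acts.dtype)
```

Removing the hook once caching is done keeps later forward passes from appending to `captured` unintentionally.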

In [None]:
import os
# Adjust this path to your local checkout of the repo
os.chdir('/Users/poonam/projects/mechinterp-from-scratch')

# Cache 50k tokens (the script's default is 250k)
!python -m src.cache_activations --n_tokens 50000

---

## Load and inspect the cached activations

In [None]:
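import os
import numpy as np

CACHE_PATH = "artifacts/cache/gpt2_l6_mlpout_fp16.mmap"
N_TOKENS = 50000  # must match the --n_tokens value passed to the caching script
D_IN = 768        # width of each MLP output vector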

# Load memmap (read-only)
activations = np.memmap(CACHE_PATH, dtype=np.float16, mode='r', shape=(N_TOKENS, D_IN))

print(f"Shape: {activations.shape}")
print(f"Dtype: {activations.dtype}")
print(f"Size on disk: {os.path.getsize(CACHE_PATH) / 1e6:.1f} MB")
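
A raw memmap file has no header, so `np.memmap` takes the `shape` argument on trust. If the file on disk is larger than the requested shape (for example, left over from a longer run), NumPy silently maps only a prefix. One way to guard against this, sketched below, is to infer the row count from the file size and fail loudly when it does not divide evenly. The helper name `open_activation_cache` is hypothetical, and the demo runs on a throwaway file rather than the real cache.

```python
import os
import tempfile

import numpy as np


def open_activation_cache(path, d_in, dtype=np.float16):
    """Open a raw memmap read-only, inferring n_tokens from the file size."""
    row_bytes = d_in * np.dtype(dtype).itemsize
    n_bytes = os.path.getsize(path)
    if n_bytes == 0 or n_bytes % row_bytes != 0:
        raise ValueError(
            f"{path}: {n_bytes} bytes is not a whole number of "
            f"{row_bytes}-byte rows; check d_in and dtype"
        )
    n_tokens = n_bytes // row_bytes
    return np.memmap(path, dtype=dtype, mode="r", shape=(n_tokens, d_in))


# Demo on a temporary file so the real cache is not required.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.mmap")
    np.zeros((100, 768), dtype=np.float16).tofile(demo_path)
    demo = open_activation_cache(demo_path, d_in=768)
    demo_shape = demo.shape
    del demo  # release the mapping before the directory is cleaned up

print(demo_shape)
```

Comparing the inferred row count against `N_TOKENS` then confirms the file matches the run you intended to load.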

---

## Sanity checks

In [None]:
# Convert sample to float32 for stats
sample = activations[:10000].astype(np.float32)

print("Basic statistics (first 10k tokens):")
print(f"  Mean: {sample.mean():.4f}")
print(f"  Std:  {sample.std():.4f}")
print(f"  Min:  {sample.min():.4f}")
print(f"  Max:  {sample.max():.4f}")

In [None]:
# Check for NaNs or Infs
n_nan = np.isnan(sample).sum()
n_inf = np.isinf(sample).sum()
print(f"NaN count: {n_nan}")
print(f"Inf count: {n_inf}")

if n_nan == 0 and n_inf == 0:
    print("\n✓ No NaN or Inf values")
else:
    print("\n✗ Found non-finite values; inspect the cache before training")
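
The checks above only cover the first 10k tokens. To cover the whole file without loading it into memory at once, the same statistics can be accumulated chunk by chunk. The sketch below is one way to do that; `chunked_stats` is a hypothetical helper, and the demo array is synthetic so the block runs on its own. Passing the `activations` memmap instead should work the same way.

```python
import numpy as np


def chunked_stats(acts, chunk_rows=8192):
    """Mean, std, and non-finite count over a 2-D array, one chunk at a time."""
    total, s, ss, n_bad = 0, 0.0, 0.0, 0
    for start in range(0, acts.shape[0], chunk_rows):
        # Accumulate in float64 so float16 sums neither overflow nor lose precision.
        chunk = np.asarray(acts[start:start + chunk_rows], dtype=np.float64)
        finite = np.isfinite(chunk)
        n_bad += chunk.size - int(finite.sum())
        vals = chunk[finite]
        total += vals.size
        s += vals.sum()
        ss += np.square(vals).sum()
    if total == 0:
        raise ValueError("no finite values found")
    mean = s / total
    std = np.sqrt(max(ss / total - mean**2, 0.0))
    return {"mean": mean, "std": std, "n_nonfinite": n_bad}


# Synthetic stand-in for the cache, with one NaN injected on purpose.
rng = np.random.default_rng(0)
demo = rng.standard_normal((20_000, 16)).astype(np.float16)
demo[0, 0] = np.nan

stats = chunked_stats(demo)
print(stats)
```

Mean and std here are computed over finite values only, so a non-zero `n_nonfinite` should be treated as a failure in its own right.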

In [None]:
import matplotlib.pyplot as plt

# Per-token L2 norm of each activation vector
magnitudes = np.linalg.norm(sample, axis=1)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# L2 norm distribution
axes[0].hist(magnitudes, bins=50, edgecolor='black', alpha=0.7)
axes[0].set_xlabel('L2 Norm')
axes[0].set_ylabel('Count')
axes[0].set_title('Distribution of Activation Magnitudes')
axes[0].axvline(magnitudes.mean(), color='red', linestyle='--', label=f'Mean: {magnitudes.mean():.1f}')
axes[0].legend()

# Individual activation values
axes[1].hist(sample.flatten(), bins=100, edgecolor='black', alpha=0.7)
axes[1].set_xlabel('Activation Value')
axes[1].set_ylabel('Count')
axes[1].set_title('Distribution of Individual Activation Values')

plt.tight_layout()
plt.show()

In [None]:
# Check a few individual vectors
print("Sample activation vectors (first 3 tokens, first 10 dims):")
print(activations[:3, :10])

---

## Summary

**What we did:**
- Ran GPT-2 on text and cached layer 6 MLP outputs
- Loaded the cache as a `[50000, 768]` float16 memmap and checked its statistics
- Found no NaN/Inf values in the first 10k tokens; activations look well-behaved

**Next:** Use these activations to train a Sparse Autoencoder (notebook 03).