# Intro

![Machine Learning](https://imgs.xkcd.com/comics/machine_learning.png)

# Setup

I stole this from an example. We don't need all this complexity, but I think it's cool to see.

Moreover, most of my Python / Jupyter / Colab knowledge is copied from a bunch of examples. See [Sources](#Sources).

To open this in Google Colab, click [here](https://colab.research.google.com/github/klao/t9r-class/blob/master/htt_clean.ipynb).

In [None]:
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    %pip install einops
    %pip install jaxtyping
    %pip install transformer_lens
    %pip install git+https://github.com/callummcdougall/CircuitsVis.git#subdirectory=python
else:
    # See README.md for local setup
    pass

In [None]:
import os
import sys
import plotly.express as px
import torch as t
from torch import Tensor
import torch.nn as nn
import torch.nn.functional as F
from pathlib import Path
import numpy as np
import einops
from fancy_einsum import einsum
from jaxtyping import Int, Float
from typing import List, Optional, Tuple
import functools
from tqdm import tqdm
from IPython.display import display
import webbrowser
import gdown
from transformer_lens.hook_points import HookPoint
from transformer_lens import utils, HookedTransformer, HookedTransformerConfig, FactoredMatrix, ActivationCache
from transformer_lens.utils import get_corner
import circuitsvis as cv

# Disable gradient tracking: this notebook only runs inference, so skipping it saves time and memory
t.set_grad_enabled(False)

device = t.device("cuda" if t.cuda.is_available() else "cpu")

# Getting acquainted with the model

In [None]:
gpt2 = HookedTransformer.from_pretrained("gpt2-small")

## Input: "What does this eat?" aka Tokenization

In [None]:
gpt2.tokenizer

In [None]:
text = ("This is a story about Quomatarus."
  + " When one day Quomatarus decided to do something different and bought a plane ticket to Lamanandu."
  + " When he arrived to the airport Quomatarus noticed")

In [None]:
tokens = gpt2.to_tokens(text)
str_tokens = gpt2.to_str_tokens(text)
print(str_tokens)
print(tokens.shape)

### Embedding

In [None]:
gpt2.W_E.shape
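
`W_E` is the token embedding matrix, with one row per vocabulary entry (for GPT-2 small the shape is `[50257, 768]`). Embedding a sequence is then just row lookup. Here is a minimal NumPy sketch with a toy matrix. The sizes and the names `W_E_toy` and `toy_tokens` are made up for illustration; in the notebook the equivalent would be `gpt2.W_E[tokens]`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, far smaller than the real model
d_vocab, d_model = 10, 4
W_E_toy = rng.normal(size=(d_vocab, d_model))

# A toy "sentence" of three token ids; id 3 appears twice
toy_tokens = np.array([3, 7, 3])

# Embedding = picking one row of the matrix per token
embedded = W_E_toy[toy_tokens]  # shape [seq_len, d_model]
print(embedded.shape)
```

Note that the two occurrences of token 3 get the same vector: the embedding alone knows nothing about position.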

## Output: "What comes out?"

In [None]:
print(gpt2(tokens).shape)
print(gpt2(tokens, return_type="loss"))

In [None]:
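# run_with_cache returns the logits plus a cache of all intermediate activations.
# Alternative that drops the batch dimension from the cached activations:
# logits, cache = gpt2.run_with_cache(tokens, remove_batch_dim=True)
logits, cache = gpt2.run_with_cache(tokens)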
print(logits.shape)
print(cache)

In [None]:
# Logits? Probabilities?
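
One possible answer to the cell above: logits are unnormalized scores over the vocabulary, and a softmax over the last axis turns them into probabilities. The sketch below uses a toy NumPy array in place of the real `logits`, and `toy_logits` is an invented name. In the notebook itself, `logits.softmax(dim=-1)` would do the same thing.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max first so exp() cannot overflow
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy logits with shape [batch=1, seq=2, vocab=3]
toy_logits = np.array([[[2.0, 1.0, 0.1],
                        [0.0, 0.0, 0.0]]])

toy_probs = softmax(toy_logits)
print(toy_probs.sum(axis=-1))  # each position sums to 1
```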

In [None]:
# Next token?
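
One possible answer: the model's guess for the token that follows the prompt is the highest-scoring vocabulary entry at the last position. A toy sketch with a made-up three-word vocabulary (`toy_vocab` and `toy_logits` are illustrative only):

```python
import numpy as np

toy_vocab = ["the", "cat", "sat"]

# Toy logits with shape [batch=1, seq=2, vocab=3]
toy_logits = np.array([[[0.1, 2.0, 0.3],
                        [1.5, 0.2, 3.1]]])

# The last position holds the prediction for what comes after the prompt
next_id = int(toy_logits[0, -1].argmax())
next_token = toy_vocab[next_id]
print(next_id, next_token)
```

In the notebook the same idea would be `logits[0, -1].argmax()`, which `gpt2.to_string` should be able to turn back into text.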

In [None]:
# How well did it predict the actual tokens?
# Log probs
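
One way to approach this: the logits at position `i` predict the token at position `i + 1`, so shift by one, take a log-softmax, and pick out the log probability of the token that actually came next. The sketch below uses toy arrays with invented names. The mean of the negated values is what `return_type="loss"` reports.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

toy_tokens = np.array([[0, 2, 1]])   # [batch, seq]
toy_logits = np.zeros((1, 3, 3))     # [batch, seq, vocab]
toy_logits[0, 0, 2] = 5.0  # position 0 confidently predicts token 2 (correct)
toy_logits[0, 1, 0] = 5.0  # position 1 confidently predicts token 0 (actual: 1)

log_probs = log_softmax(toy_logits)

# Predictions at positions 0..n-2 are scored against tokens 1..n-1
pred = log_probs[:, :-1]
target = toy_tokens[:, 1:]
correct_log_probs = np.take_along_axis(pred, target[..., None], axis=-1)[..., 0]

loss = -correct_log_probs.mean()
print(correct_log_probs, loss)
```

The last position has no "next token" to compare against, which is why there is one fewer log prob than there are tokens.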

In [None]:
# Plot it! Which tokens did it do well on? Which poorly? Why?

# Structure

## What do the "big brothers" look like?

- GPT-3: https://arxiv.org/abs/2005.14165v4
- PaLM: https://jmlr.org/papers/v24/22-1144.html
- LLaMA: https://arxiv.org/abs/2302.13971

In [None]:
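# Print the parameters of block 0 plus everything outside the blocks;
# the other blocks have the same shapes
for name, p in gpt2.named_parameters():
    if ".0." in name or "blocks" not in name:
        print(name, p.shape)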

In [None]:
for activation_name, activation in cache.items():
    # Only print activations from block 0 and those outside the blocks
    if ".0." in activation_name or "blocks" not in activation_name:
        print(activation_name, activation.shape)

In [None]:
# Replicate part of the code. Maybe MLP?
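
A possible starting point for the MLP: the parameter listing shows `W_in`, `b_in`, `W_out` and `b_out` for each block, and GPT-2 uses the tanh approximation of GELU between the two linear maps. The sketch below runs on random toy weights rather than the real ones, and every size and variable name in it is made up. Checking it against the cached activations of block 0 is left as the actual exercise.

```python
import numpy as np

def gelu_new(x):
    # Tanh approximation of GELU, the variant GPT-2 uses
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, W_in, b_in, W_out, b_out):
    pre = x @ W_in + b_in        # [seq, d_mlp]
    post = gelu_new(pre)         # elementwise nonlinearity
    return post @ W_out + b_out  # back to [seq, d_model]

rng = np.random.default_rng(0)
seq_len, d_model, d_mlp = 5, 8, 32  # toy sizes; GPT-2 small uses 768 and 3072

x = rng.normal(size=(seq_len, d_model))
W_in = rng.normal(size=(d_model, d_mlp))
b_in = np.zeros(d_mlp)
W_out = rng.normal(size=(d_mlp, d_model))
b_out = np.zeros(d_model)

mlp_out = mlp(x, W_in, b_in, W_out, b_out)
print(mlp_out.shape)
```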

In [None]:
# Look at attention patterns
# cv.attention.attention_pattern(s), don't forget to squeeze!

# Compare block 0 head 5 to block 5 head 5!
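
For reference when reading the plots: each head's pattern is a softmax over scaled query-key dot products, with future positions masked out so a token can only attend to itself and earlier tokens. The following is a rough single-head sketch on random toy inputs, not the library's implementation; the function name and the sizes are assumptions.

```python
import numpy as np

def causal_attention_pattern(q, k):
    """q, k: [seq_len, d_head] for one head. Returns [query_pos, key_pos]."""
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)

    # Mask out keys that lie in the future of each query
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax; masked entries become exactly 0
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
k = rng.normal(size=(4, 16))
pattern = causal_attention_pattern(q, k)
print(pattern.round(2))
```

This is why every pattern in the visualization is lower triangular, and why the first token always attends fully to itself.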

# Induction Heads

In [None]:

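# Attention patterns of all heads in layer 5; squeeze drops the batch dimension,
# leaving shape [n_heads, query_pos, key_pos]
attention_pattern = cache["pattern", 5, "attn"].squeeze()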

display(cv.attention.attention_patterns(
    tokens=str_tokens,
    attention=attention_pattern,
    attention_head_names=[f"L5H{i}" for i in range(12)],
))
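
What to look for in the plot: an induction head, when it sees a token that occurred earlier, attends to the token right after that earlier occurrence. The helper below computes those target positions for a token list and then scores each head by how much attention it puts on them. It is a rough heuristic written for this notebook, not a standard metric. The token split of "Quomatarus" and both toy heads are invented.

```python
import numpy as np

def induction_targets(tokens):
    """For each position, the index just after the previous occurrence of the same token."""
    last_seen, targets = {}, []
    for i, tok in enumerate(tokens):
        prev = last_seen.get(tok)
        targets.append(prev + 1 if prev is not None else None)
        last_seen[tok] = i
    return targets

def induction_scores(pattern, tokens):
    """pattern: [n_heads, query_pos, key_pos]. Mean attention on induction targets, per head."""
    pairs = [(q, k) for q, k in enumerate(induction_targets(tokens)) if k is not None]
    if not pairs:
        raise ValueError("no repeated tokens, so there is nothing to score")
    q_idx, k_idx = zip(*pairs)
    return pattern[:, list(q_idx), list(k_idx)].mean(axis=-1)

toy_tokens = ["Qu", "om", "at", "arus", "went", "Qu", "om"]
targets = induction_targets(toy_tokens)
n = len(toy_tokens)

toy_pattern = np.zeros((2, n, n))
# Head 0: uniform attention over all allowed (past and current) positions
toy_pattern[0] = np.tril(np.ones((n, n))) / np.arange(1, n + 1)[:, None]
# Head 1: a perfect induction head (falls back to position 0 when there is no target)
for q, k in enumerate(targets):
    toy_pattern[1, q, k if k is not None else 0] = 1.0

scores = induction_scores(toy_pattern, toy_tokens)
print(targets, scores)
```

With the real data, `induction_scores(attention_pattern.cpu().numpy(), str_tokens)` would give one number per head in layer 5, which can be compared against what the visualization shows.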

# Sources

These are the main inspirations:

* https://arena-ch1-transformers.streamlit.app/[1.2]_Intro_to_Mech_Interp
* https://transformer-circuits.pub/2021/framework/index.html

Videos:

* https://neelnanda.io/transformer-tutorial

Other:

* https://www.lesswrong.com/posts/TvrfY4c9eaGLeyDkE/induction-heads-illustrated
