# Investigation of single token prediction in Pythia 160m

This notebook investigates how Pythia-160m represents knowledge acquired during pre-training, using a prompt intended to elicit the token ' Dublin' as the capital of Ireland. It is the smallest Pythia model that behaves this way.

In [1]:
# Install dependencies and set up the package
# %pip install transformers torch matplotlib

# Make the local transformer_algebra package importable from ../src (no installation needed)
import sys
sys.path.insert(0, '../src')

In [7]:
from transformer_algebra import PromptedTransformer, load_pythia_model, logits, predict, expand

# Load Pythia-160m-deduped from HuggingFace
model, tokenizer = load_pythia_model("EleutherAI/pythia-160m-deduped")
print(f"Model: {model.config.name_or_path}")
print(f"Layers: {model.config.num_hidden_layers}")
print(f"Heads: {model.config.num_attention_heads}")
print(f"d_model: {model.config.hidden_size}")

Model: EleutherAI/pythia-160m-deduped
Layers: 12
Heads: 12
d_model: 768


In [3]:
T = PromptedTransformer(model, tokenizer, "The capital of Ireland")
T  # Displays as T(<4 tokens>)

T(<4 tokens>)

In [4]:
result = T(" is")
result  # T(embed(' is'))

T(embed(' is'))

In [5]:
type(result)

transformer_algebra.core.ResidualVector

In [8]:
expand(result)

embed(' is') + ΔB^0 + ΔB^1 + ΔB^2 + ΔB^3 + ΔB^4 + ΔB^5 + ΔB^6 + ΔB^7 + ΔB^8 + ΔB^9 + ΔB^10 + ΔB^11

`expand(result)` should return something like $LN^T(\underline{\text{is}} + \Delta B^0(\underline{\text{is}}) + \Delta B^1(\underline{\text{is}}) + \dots + \Delta B^{11}(\underline{\text{is}}))$
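The additive structure behind `expand` can be illustrated on a toy residual stream: each block reads the current residual vector and adds its output back, so the residual entering $LN^T$ is exactly the embedding plus the sum of the per-block updates $\Delta B^i$. This is a minimal sketch with made-up 4-dimensional vectors and a hypothetical `toy_block` standing in for Pythia's attention and MLP sublayers (it ignores attention to earlier prompt tokens). It is not how `transformer_algebra` is implemented.

```python
import math
import random


def toy_block(seed):
    """Hypothetical stand-in for one transformer block: returns a function
    mapping the current residual vector to the update the block adds."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(4)]

    def delta(x):
        # A nonlinear update computed from the residual so far
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return delta


def run_residual_stream(embedding, blocks):
    """Run the toy stream, recording each block's contribution Delta B^i."""
    x = list(embedding)
    deltas = []
    for block in blocks:
        d = block(x)  # the block reads the current residual...
        deltas.append(d)
        x = [xi + di for xi, di in zip(x, d)]  # ...and adds its output back
    return x, deltas


embedding = [0.3, -1.2, 0.7, 0.05]  # made-up embedding of ' is'
blocks = [toy_block(i) for i in range(12)]  # 12 blocks, as in Pythia-160m
final, deltas = run_residual_stream(embedding, blocks)

# The final residual equals embed + Delta B^0 + ... + Delta B^11
reconstructed = [e + sum(d[k] for d in deltas) for k, e in enumerate(embedding)]
print(all(abs(a - b) < 1e-12 for a, b in zip(final, reconstructed)))  # True
```

In this sketch each update is computed from the residual accumulated so far, so $\Delta B^i(\underline{\text{is}})$ is shorthand for "block $i$ applied to the stream that started from $\underline{\text{is}}$".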

In [17]:

# logits(x) returns an object representing a mapping from tokens to logits
L = logits(result)
L

logits(T(embed(' is')))

In [18]:
L.summary()

"Top 5 predictions:\n  1. ' the' (logit: 786.61)\n  2. ' a' (logit: 784.93)\n  3. ' Belfast' (logit: 784.57)\n  4. ' Dublin' (logit: 784.47)\n  5. ' now' (logit: 784.44)"

Expected result: top 5 logits after the final layer:
  1. ' the' (logit: 786.61)
  2. ' a' (logit: 784.93)
  3. ' Belfast' (logit: 784.57)
  4. ' Dublin' (logit: 784.47)
  5. ' now' (logit: 784.44)

In [19]:
L[" Dublin"]

<unembed(' Dublin'), T(embed(' is'))> = 784.47

Expected:

$\langle \overline{\text{Dublin}}, T(\underline{\text{is}}) \rangle = 784.47$


In [23]:
expand(L[" Dublin"])

<unembed(' Dublin'), T(embed(' is'))> = 784.47

Intended:

$\langle \overline{\text{Dublin}}, LN^T(\underline{\text{is}} + \Delta B^0(\underline{\text{is}}) + \Delta B^1(\underline{\text{is}}) + \dots + \Delta B^{11}(\underline{\text{is}})) \rangle = 784.47$
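The intended expansion only splits into per-block inner products if $LN^T$ can be distributed over the sum. The inner product is linear, but layer normalisation is not, since it divides by the standard deviation of the whole residual vector (mean subtraction on its own would be linear). A quick check with made-up 4-dimensional vectors, assuming a plain LayerNorm with unit gain and zero bias (Pythia's final layer norm also has a learned gain $\gamma$ and bias $\beta$):

```python
import math


def layer_norm(x, eps=1e-5):
    """Plain LayerNorm with gamma = 1 and beta = 0 (a simplifying assumption)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


u = [0.9, -0.4, 0.1, 0.6]  # made-up unembedding vector for ' Dublin'
a = [1.0, -2.0, 0.5, 0.0]  # made-up embedding component
b = [0.2, 0.3, -1.5, 2.0]  # made-up block contribution

# Normalising the sum vs. summing the normalised parts
whole = dot(u, layer_norm([ai + bi for ai, bi in zip(a, b)]))
split = dot(u, layer_norm(a)) + dot(u, layer_norm(b))
print(round(whole, 3), round(split, 3))  # the two values differ
```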

In [27]:
predict(result)
# Returns an object representing a mapping from tokens to probabilities

P(token | T(embed(' is')))

Intended:

P(token | T(embed(' is')))


In [28]:
p = predict(result)
p.summary()

"Top 5 predictions:\n  1. ' the' (22.66%, logit=786.61)\n  2. ' a' (4.22%, logit=784.93)\n  3. ' Belfast' (2.94%, logit=784.57)\n  4. ' Dublin' (2.67%, logit=784.47)\n  5. ' now' (2.59%, logit=784.44)"

Expected:

"Top 5 predictions:\n  1. ' the' (22.66%, logit=786.61)\n  2. ' a' (4.22%, logit=784.93)\n  3. ' Belfast' (2.94%, logit=784.57)\n  4. ' Dublin' (2.67%, logit=784.47)\n  5. ' now' (2.59%, logit=784.44)"


In [30]:
p[" Dublin"]  # probability of ' Dublin' as the next token

P(' Dublin' | T(embed(' is'))) = 2.67%

In [31]:
expand(p[" Dublin"])

P(' Dublin' | T(embed(' is'))) = 2.67%

Intended:

$\lambda \langle \overline{\text{Dublin}}, LN^T(\underline{\text{is}} + \Delta B^0(\underline{\text{is}}) + \Delta B^1(\underline{\text{is}}) + \dots + \Delta B^{11}(\underline{\text{is}})) \rangle = 2.67\%$ where $\lambda = ...$
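The percentages printed by `p.summary()` above are consistent with a standard softmax over the logits: the ratio of two probabilities depends only on the difference of their logits, $p_i / p_j = e^{\ell_i - \ell_j}$, and $p = e^{\ell}/Z$ for a single normaliser $Z$. The sketch below checks this using only the printed top-5 numbers (it does not call `transformer_algebra`). One consequence is that $1/Z$ multiplies $e^{\ell}$ rather than the logit $\ell$, so a $\lambda$ multiplying the inner product directly would have to depend on all the other logits through $Z$.

```python
import math

# (token, logit, reported probability) for the top 5 in p.summary() above
top5 = [
    (" the", 786.61, 0.2266),
    (" a", 784.93, 0.0422),
    (" Belfast", 784.57, 0.0294),
    (" Dublin", 784.47, 0.0267),
    (" now", 784.44, 0.0259),
]

# Softmax is shift-invariant: p_i / p_j = exp(logit_i - logit_j),
# so adjacent ratios can be checked without the rest of the vocabulary
relative_errors = []
for (tok_i, l_i, p_i), (tok_j, l_j, p_j) in zip(top5, top5[1:]):
    predicted = math.exp(l_i - l_j)
    observed = p_i / p_j
    relative_errors.append(abs(observed / predicted - 1))
    print(f"{tok_i!r} vs {tok_j!r}: exp(diff) = {predicted:.3f}, ratio = {observed:.3f}")

# Implied log Z = logit - log p; it should be (nearly) the same for every token
log_z = [logit - math.log(p) for _, logit, p in top5]
print(f"log Z ranges from {min(log_z):.3f} to {max(log_z):.3f}")
```

Discrepancies well under 1% are expected, since the printed logits and percentages are rounded.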


The next step depends on the expansion of layer normalisation described in speculation.md.
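One common way to make $LN^T$ distribute over the sum, used for example in direct logit attribution, is to freeze the normalising scale $\sigma$ at its value for the full residual vector. LayerNorm is then affine in its input: each component $x_i$ contributes $\gamma \odot (x_i - \bar{x}_i)/\sigma$, and the bias $\beta$ appears once. Whether this matches the expansion in speculation.md is an assumption. The sketch uses made-up components, gain, bias and unembedding vector.

```python
import math


def mean(x):
    return sum(x) / len(x)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm: gamma * (x - mean) / sigma + beta."""
    m = mean(x)
    sigma = math.sqrt(mean([(xi - m) ** 2 for xi in x]) + eps)
    return [g * (xi - m) / sigma + b for xi, g, b in zip(x, gamma, beta)]


# Made-up components of the final residual: embed(' is') and three block updates
components = [
    [1.0, -2.0, 0.5, 0.0],
    [0.2, 0.3, -1.5, 2.0],
    [-0.4, 0.1, 0.9, -0.3],
    [0.05, 0.6, -0.2, 0.7],
]
gamma = [1.1, 0.9, 1.3, 0.8]  # made-up LN gain
beta = [0.01, -0.02, 0.03, 0.0]  # made-up LN bias
u = [0.9, -0.4, 0.1, 0.6]  # made-up unembedding vector for ' Dublin'

residual = [sum(c[k] for c in components) for k in range(4)]
full_logit = dot(u, layer_norm(residual, gamma, beta))

# Freeze sigma at its value on the full residual; LN is then affine in x
m = mean(residual)
sigma = math.sqrt(mean([(r - m) ** 2 for r in residual]) + 1e-5)


def contribution(c):
    """Logit contribution of one component, with sigma frozen."""
    c_mean = mean(c)
    return dot(u, [g * (ci - c_mean) / sigma for ci, g in zip(c, gamma)])


contributions = [contribution(c) for c in components]
bias_term = dot(u, beta)
print(contributions, bias_term)
print(abs(full_logit - (sum(contributions) + bias_term)) < 1e-9)  # True
```

The per-component terms only add up exactly because $\sigma$ is taken from the full residual; computing a separate $\sigma$ for each component would break the identity.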

In [34]:
x1 = expand(p[" Dublin"])
expand(x1) # maybe we need to identify what part of the expression is to be expanded?

P(' Dublin' | T(embed(' is'))) = 2.67%

Intended:

$\Lambda \left( \langle \overline{\text{Dublin}}, \gamma^T \odot \underline{\text{is}} \rangle + \langle \overline{\text{Dublin}}, \Delta B^0(\gamma^T \odot \underline{\text{is}}) \rangle + \dots + \langle \overline{\text{Dublin}}, \Delta B^{11}(\gamma^T \odot \underline{\text{is}}) \rangle \right)$ where $\Lambda = ...$

Note: the definition of $\Lambda$ will be more complicated than that of $\lambda$ in the previous intended output.
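A sketch of what $\Lambda$ would have to account for, assuming a standard final layer norm $LN^T(x) = \gamma^T \odot (x - \bar{x})/\sigma + \beta^T$ with $\sigma$ frozen at its value for the full residual $x = \sum_i x_i$, a softmax over the vocabulary, and writing $x_0 = \underline{\text{is}}$, $x_{i+1} = \Delta B^i(\underline{\text{is}})$ and $\tilde{x}_i = x_i - \bar{x}_i$. The bias $\beta^T$ is an assumption not present in the expression above:

$$
P(\text{Dublin} \mid T(\underline{\text{is}})) = \frac{1}{Z} \exp\left( \frac{1}{\sigma} \sum_{i=0}^{12} \langle \overline{\text{Dublin}}, \gamma^T \odot \tilde{x}_i \rangle + \langle \overline{\text{Dublin}}, \beta^T \rangle \right), \qquad Z = \sum_{t} \exp\left( \langle \overline{t}, LN^T(x) \rangle \right)
$$

Equivalently, $\log P = \frac{1}{\sigma} \sum_i \langle \overline{\text{Dublin}}, \gamma^T \odot \tilde{x}_i \rangle + \langle \overline{\text{Dublin}}, \beta^T \rangle - \log Z$, so in log space the per-component terms add, and $\sigma$, $\beta^T$ and $Z$ are the pieces a factor like $\Lambda$ would need to capture.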