A simplified and more performant next token Patchscope.
Alternatively, a more expensive and more accurate version of the logit lens.
The Patchscopes framework, introduced in the Patchscopes paper, is a general framework for patching residual activations from some prompt/layer/position/model into another prompt/layer/position/model.
A Patchscope technique is a specific parameterization of the Patchscopes framework.
Next token extraction is the general problem of extracting the next token of a sequence using a single residual activation somewhere in the sequence. This roughly indicates when a specific residual contains enough information to predict the next token and can be useful for circuit finding. For the problem to be non-trivial, the residual activation should not be the one in the last layer and last position. The logit lens can be viewed as a technique for solving this problem.
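To make the logit lens concrete, here is a minimal NumPy sketch. All names and dimensions (`d_model`, `W_U`, the LayerNorm parameters) are hypothetical toy stand-ins, not GPT-J's actual weights; the point is only the shape of the computation: an intermediate residual is normalized and multiplied by the unembedding matrix, skipping every remaining layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50  # toy dimensions, not a real model's

# Hypothetical stand-ins for the model's final LayerNorm and unembedding.
gamma, beta = np.ones(d_model), np.zeros(d_model)
W_U = rng.normal(size=(d_model, vocab_size))

def logit_lens(resid):
    """Project an intermediate residual straight to logits,
    skipping all remaining transformer layers."""
    h = (resid - resid.mean()) / np.sqrt(resid.var() + 1e-5)  # LayerNorm
    h = h * gamma + beta
    return h @ W_U

resid = rng.normal(size=d_model)  # residual from some layer/position
logits = logit_lens(resid)
predicted_token = int(np.argmax(logits))
```

The prediction `predicted_token` is the logit lens's guess at the next token from that single residual.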
I use "next token Patchscope" to refer to a Patchscope technique for solving next token extraction.
The token identity Patchscope is the next token Patchscope introduced in the Patchscopes paper. The idea is to take a prompt like "cat -> cat\n1135 -> 1135\nhello -> hello\n?" and patch a residual activation into the ? token without a layer shift. For instance, it could patch the layer 5 residual of = in "1+1=" into the layer 5 residual for the ? token to see if it predicts 2.
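The patching mechanics can be sketched with a toy model. This is an illustrative assumption-laden stand-in, not the paper's implementation: the "layers" below act on each position independently (a real transformer mixes positions via attention), and all shapes are made up. It shows the core move of a no-layer-shift Patchscope: capture the residual at (source layer, source position), then overwrite the residual at (same layer, target position) in the target run before continuing the forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_layers = 8, 6

# Toy per-position "layers" standing in for real transformer blocks.
layers = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(n_layers)]

def forward(resids, patch=None):
    """Run all layers, recording the residual stream before each layer.
    `patch` = (layer, pos, vector) overwrites one position's residual
    just before that layer runs (i.e. no layer shift)."""
    history = []
    resids = resids.copy()
    for l, W in enumerate(layers):
        if patch is not None and patch[0] == l:
            resids = resids.copy()
            resids[patch[1]] = patch[2]
        history.append(resids.copy())
        resids = resids + np.tanh(resids @ W)
    return resids, history

source = rng.normal(size=(4, d_model))   # stand-in embeddings for "1+1="
target = rng.normal(size=(10, d_model))  # stand-in embeddings for the few-shot prompt ending in "?"

# 1. Source run: grab the layer-5 residual of the final "=" position.
_, src_hist = forward(source)
src_resid = src_hist[5][-1]

# 2. Target run: patch it into the "?" (last) position at the same layer.
tgt_final, _ = forward(target, patch=(5, len(target) - 1, src_resid))
```

In the real setting, `tgt_final[-1]` would then be unembedded to see whether the model predicts `2`.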
This project studies a new next token Patchscope that is identical to the token identity Patchscope except with the prompt "?" instead of "cat -> cat\n1135 -> 1135\nhello -> hello\n?".
The goal of this is to fix some failure modes of "cat -> cat\n1135 -> 1135\nhello -> hello\n?":
- The in-context examples show `->` tokens after the first tokens, so there's a bias to predicting `->` after the `?`.
  - Indeed, the most likely next token after `?` is `->` if early layers are patched, which can explain why `patchscope` has low performance in early layers.
- If `->` is added to the end of the prompt, there's a bias towards predicting `?`.
  - This causes a significant drop in performance if `->` is added to the end of the prompt.
"?" tries to fix these two issues by removing the in-context examples:
- On a surface level, the bias towards `->` gets fixed because `->` doesn't appear.
- On a deeper level, the bias towards `?` can be viewed as a bias towards predicting the current token rather than the next token.
  - As a thought experiment, imagine patching the 0th layer pre-residual of a token like `"health"` into the `?` token in `"cat -> cat\n1135 -> 1135\nhello -> hello\n?"`. This is equivalent to the prompt `"cat -> cat\n1135 -> 1135\nhello -> hello\nhealth"`, for which the most likely next two tokens are `-> health`, not `-> care`. However, in the next token extraction problem, we want to extract `care` and not `health`.
While performance could perhaps be improved with different in-context examples, keeping it simple makes it feel like a more expensive and more accurate version of the logit lens where we do the rest of the forward pass instead of skipping to the decoder. The name logitscope captures how we use a longer "scope" (all remaining layers) as opposed to a thin "lens" (decoder layer) to observe the logits.
`logitscope` performs better than `patchscope` in both precision@1 and surprisal (the two metrics in the Patchscopes paper), with the same dataset (The Pile) and one of the same models (GPT-J 6B). I used the same preprocessing steps with slight changes: 2K examples from the start instead of after an offset of 10K, and reduced word/character limits to avoid running out of GPU memory.
For precision@1, higher is better. For surprisal, lower is better.
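For reference, both metrics can be computed from the patched run's next-token logits. A minimal NumPy sketch, assuming `logits` has shape `(n_examples, vocab_size)` and `targets` holds the true next-token ids (my reading of the metrics, not the paper's exact code):

```python
import numpy as np

def precision_at_1(logits, targets):
    """Fraction of examples where the top-1 predicted token
    matches the true next token (higher is better)."""
    return float(np.mean(np.argmax(logits, axis=-1) == targets))

def surprisal(logits, targets):
    """Mean negative log-probability of the true next token,
    in nats (lower is better)."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(targets)), targets]))

# Tiny example with a 3-token vocabulary.
logits = np.array([[2.0, 0.5, 0.1], [0.2, 3.0, 0.1]])
targets = np.array([0, 1])
precision_at_1(logits, targets)  # → 1.0 (both argmaxes match)
```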
Run `pip install -r requirements.txt` and then all cells in `logitscope.ipynb`. I ran this on an A100 80GB with 100GB of disk space, using Python 3.11.

