## Note on the Functioning of Logit Lens

This note is based on the article [Interpreting GPT: The Logit Lens](https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens#other_examples).


## What is Logit Lens?

*Logit Lens* is a technique for analyzing language models like GPT. Instead of focusing on **how** the model processes data, it lets us examine **what** the model would predict as the next token at intermediate layers of its computation. This helps us understand **what the model "thinks" during its operation**.

With *Logit Lens*, we can:
- Peek inside the model and see which words it considers at different layers,
- Check how early the model begins to "guess" what word should appear next; often, by the middle layers, the model already has accurate predictions about the next token,
- Trace how these predictions change across subsequent layers until the final output.

However, it's important to note that *Logit Lens* shows **only one aspect of the model's operation** – its predictions at various stages. It doesn't answer questions like:

- **How** does the model arrive at these predictions,
- What other information (e.g., intermediate representations, rules, structures) is stored and processed in the network.


## How Does Logit Lens Work?

- **Tokenization**
    The input text, e.g.,

    `"We train GPT-3, an aut..."`
    is split into tokens:

    `["We", " train", " GPT", "-", "3", ",", " an", " aut", ...]`

    In practice, these are often not full words but parts of words or individual characters.

- **Embedding**

    Each token is mapped to a vector of numbers by looking up its row in the **embedding matrix**. This converts the text into a sequence of numerical vectors. Each token becomes a point in a multidimensional space, where points that are close to each other tend to represent similar meanings.

- **Processing Through GPT Layers**

    The vectors pass through successive **transformer layers**.

    Each layer adds new information, such as:

    - Considering the context (what came before?),
    - Recognizing linguistic structures (sentences, syntax),
    - Developing meaning based on the surrounding tokens.

    After each layer, we get **updated vectors** that increasingly "understand" the meaning of the text and what might come next.

- **Standard Model Output**
    At the end of the last layer, we get the final vector representing "what should come next." This vector is multiplied by the transpose of the embedding matrix (in GPT-2, after a final layer normalization), yielding logits: one score per vocabulary token. A softmax applied to these logits gives a probability distribution over the next token. Under greedy decoding, the token with the highest probability is selected as the next one; sampling strategies may pick a different token.

- **Logit Lens: What Do We Do Differently?**

    Logit Lens performs exactly **the same step** (multiplying by the transposed embedding matrix, then softmax), but **for the vectors from each intermediate layer**.

    In the standard model operation, logits are computed **only at the end**, after the last layer. Applying the same decoding to earlier layers lets us see how the model's "guesses" change throughout the processing.

    This way, we can:

    - See **what the model's predictions were** after each layer.
    - Compare whether it **guessed correctly early on**, or if it **changed its mind only towards the end**.
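
The first two steps above (tokenization and embedding) can be sketched in a few lines of plain Python. This is only a toy illustration: the eight-entry vocabulary, the greedy longest-match rule, the 4-dimensional vectors, and the names `tokenize` and `embed` are all assumptions made for this sketch. Real GPT models use a learned byte-pair-encoding vocabulary and a learned embedding matrix.

```python
import random

# Hypothetical toy vocabulary taken from the example sentence above.
VOCAB = ["We", " train", " GPT", "-", "3", ",", " an", " aut"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def tokenize(text):
    """Split text into token ids, always taking the longest matching vocabulary entry.

    This greedy rule is a simplification, not GPT's actual byte-pair encoding.
    """
    ids = []
    pos = 0
    while pos < len(text):
        match = max(
            (tok for tok in VOCAB if text.startswith(tok, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no token matches the text at position {pos}")
        ids.append(TOKEN_TO_ID[match])
        pos += len(match)
    return ids


D_MODEL = 4  # assumed toy dimension; real models use hundreds or thousands
rng = random.Random(0)
# Embedding matrix: one row of D_MODEL numbers per vocabulary entry.
# Random here; in a real model these values are learned during training.
EMBEDDING = [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in VOCAB]


def embed(ids):
    """Look up the embedding row for each token id."""
    return [EMBEDDING[i] for i in ids]


ids = tokenize("We train GPT-3, an aut")
vectors = embed(ids)
print([VOCAB[i] for i in ids])
# ['We', ' train', ' GPT', '-', '3', ',', ' an', ' aut']
print(len(vectors), "vectors of dimension", len(vectors[0]))
# 8 vectors of dimension 4
```

The point of the sketch is only the data flow: text becomes a list of integer ids, and each id selects one row of the embedding matrix.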

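The Logit Lens step itself can be sketched the same way. The sketch below is not a real transformer: each "layer" is a random linear map added to the running vector (a stand-in for a residual update), the final layer normalization is left out, and all names (`decode`, `logit_lens`, `LAYERS`) and sizes are hypothetical. What it does show faithfully is the idea described above: the decoding step the model normally applies once at the end (multiply by the transposed embedding matrix, then softmax) is applied after every layer.

```python
import math
import random

VOCAB = ["We", " train", " GPT", "-", "3", ",", " an", " aut"]  # toy vocabulary
D_MODEL = 4   # assumed toy hidden size
N_LAYERS = 3  # assumed toy depth
rng = random.Random(0)


def random_matrix(rows, cols, scale=1.0):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


EMBEDDING = random_matrix(len(VOCAB), D_MODEL)  # shape: vocab x d_model
# Stand-ins for transformer layers (random, untrained linear maps).
LAYERS = [random_matrix(D_MODEL, D_MODEL, scale=0.5) for _ in range(N_LAYERS)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(hidden):
    """Project a hidden vector onto the vocabulary.

    Taking the dot product of `hidden` with every embedding row is the same
    as multiplying it by the transposed embedding matrix.
    """
    return softmax(matvec(EMBEDDING, hidden))


def logit_lens(token_id):
    """Return one next-token distribution per layer for a single input token."""
    hidden = list(EMBEDDING[token_id])
    per_layer = []
    for layer in LAYERS:
        update = matvec(layer, hidden)
        hidden = [h + u for h, u in zip(hidden, update)]  # residual-style update
        per_layer.append(decode(hidden))  # the Logit Lens: decode at every layer
    return per_layer


for depth, probs in enumerate(logit_lens(token_id=1), start=1):
    best = max(range(len(VOCAB)), key=probs.__getitem__)
    print(f"after layer {depth}: top token {VOCAB[best]!r} (p={probs[best]:.2f})")
```

The last distribution in the returned list corresponds to the standard model output; the earlier ones are what Logit Lens adds. Because the weights here are random, the printed tokens carry no meaning. With a trained model, the same loop would use the model's real hidden states and, for GPT-2, apply the final layer normalization before decoding.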

### What can we learn from reading Logit Lens visualisations?

TO DO (to be finished tomorrow morning)

### Interesting cases introduced in the article:

TO DO (postponed because I am feeling unwell)
