Add a helper function to display vectors of logits nicely #112

neelnanda-io · 2022-12-19T14:22:51Z

Often you want to look at vectors over the vocabulary (e.g. the logits at a specific position). That is more than 50,000 dimensions, which is hard to interpret! I want there to be nice utils to visualize a vector like this.

An MVP would be a function mapping this to a pandas DataFrame with the token index, token string value, logit, log prob and probability, either for just the top K tokens or for the entire vocab.
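
A minimal sketch of that MVP, under stated assumptions: `logit_table`, `to_dataframe`, the `id_to_token` mapping and the column names are hypothetical and not part of any existing library. The core uses only the standard library so the arithmetic is easy to check; the rows are plain dicts that `pandas.DataFrame` accepts directly.

```python
import math


def logit_table(logits, id_to_token, top_k=None):
    """Return one row per token: index, string, logit, log prob, probability.

    `logits` is a flat sequence of floats over the vocabulary and
    `id_to_token` maps token index -> token string (both assumed inputs).
    """
    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_z = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    rows = [
        {
            "token_index": i,
            "token": id_to_token[i],
            "logit": x,
            "log_prob": x - log_z,
            "prob": math.exp(x - log_z),
        }
        for i, x in enumerate(logits)
    ]
    # Highest logit first; keep the entire vocab when top_k is None.
    rows.sort(key=lambda row: row["logit"], reverse=True)
    return rows if top_k is None else rows[:top_k]


def to_dataframe(rows):
    """Wrap the rows in a pandas DataFrame (requires pandas to be installed)."""
    import pandas as pd

    return pd.DataFrame(rows)
```

For example, `to_dataframe(logit_table(logits, id_to_token, top_k=10))` would give the top 10 tokens, and omitting `top_k` gives the whole vocabulary.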

But I expect there are many ways to make something nice here! One option is to imitate nostalgebraist's graphing style for plot_logit_lens in `transformer_utils`. This takes a layer x position x d_vocab tensor and visualises it as a layer x position heatmap, printing the string value of the top token in each cell and colouring by the top token value.
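
The reduction behind that kind of heatmap can be sketched as below. This is an illustration under assumptions, not nostalgebraist's implementation: `top_token_grid` and `id_to_token` are hypothetical names, and the tensor is represented as nested Python lists indexed `[layer][position][vocab]`.

```python
def top_token_grid(logits, id_to_token):
    """Reduce [layer][position][vocab] logits to two [layer][position] grids.

    Returns (labels, values): the top token string for each cell and the
    logit of that token, which a heatmap can use for text and colour.
    """
    labels, values = [], []
    for layer in logits:
        label_row, value_row = [], []
        for vocab_logits in layer:
            # Index of the largest logit at this layer and position.
            best = max(range(len(vocab_logits)), key=vocab_logits.__getitem__)
            label_row.append(id_to_token[best])
            value_row.append(vocab_logits[best])
        labels.append(label_row)
        values.append(value_row)
    return labels, values
```

The `values` grid could then be drawn with any heatmap routine (for instance matplotlib's `imshow`), with the matching entry of `labels` written into each cell.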

sheikheddy · 2023-03-02T11:11:42Z

I recommend http://circos.ca/intro/circular_approach/.

Python implementations: https://github.com/ponnhide/pyCircos or https://github.com/moshi4/pyCirclize

sheikheddy · 2023-03-02T13:49:56Z

Okay, I'm going to put down some rough thoughts:

> Often you want to look at vectors over the vocabulary (eg the logits at a specific position). This is >50,000 dimensions and this is hard to interpret! I want there to be nice utils to visualize a vector like this.

A more explicit way to put it:

| Encoding name | OpenAI models |
| --- | --- |
| `gpt2` (or `r50k_base`) | Most GPT-3 models (and GPT-2) |
| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
| `cl100k_base` | `text-embedding-ada-002` |

Let's start with this snippet from https://github.com/openai/tiktoken:

def gpt2():
    mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
        vocab_bpe_file="https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/vocab.bpe",
        encoder_json_file="https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/encoder.json",
    )
    return {
        "name": "gpt2",
        "explicit_n_vocab": 50257,
        "pat_str": r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+""",
        "mergeable_ranks": mergeable_ranks,
        "special_tokens": {"<|endoftext|>": 50256},
    }

For clarity, here are a few assumptions:

- There is a static dictionary mapping token ids (int) to token values (string).
- Tokens earlier in the sequence affect the likelihood of tokens later (attention).
- We are interested in how simple local interactions affect complex global structure.
- You can have up to 4096 tokens in the context window, and each position has 50k+ choices of token.
- We usually look at logits at the final layer, but in principle can check them at any output layer.
- You can normalize with softmax(logits) to get probabilities, or with log_softmax(logits) to get logprobs.
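
For reference, the normalization in the last assumption can be written out explicitly. The symbols are notation introduced here, not from the thread: $z$ is the logit vector at one position, $V$ is the vocabulary size, and $p$ is the resulting distribution.

$$
p_i = \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{V} e^{z_j}},
\qquad
\log p_i = z_i - \log \sum_{j=1}^{V} e^{z_j}
$$

So softmax gives probabilities, and taking the log of those (or applying log-softmax directly) gives log probs.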

Here are a few ideas:

- If you have a really long wire, or a really long strip of film, then it will take up a lot of horizontal space, and you will only be able to see a small slice at once.
- In the same way we wrap up wires into coils, and filmstrips into spools or cassette tapes, we can take our string of positions and bend it into a circle like a paperclip.
- There are two main types of loops. In the first one, you touch the two ends together to make the outline of a circle. You can use the space inside to draw lines that show connections between different sections, and you can use the space outside for hierarchy.
- In the second one, you wind it up really tightly, like a rope, or DNA. For these, we can imagine marking each cell with a color according to state (this makes me think about Turing Machines for some reason).
- You can also animate it to unwind at a variable rate, where the speed is controlled by uncertainty: faster when probability mass is concentrated, slower when it is more spread out. (This makes me think about Fourier decompositions and spectrograms.) Ideally this would autoplay, like https://gource.io/.
- See https://ourworldindata.org/technology-long-run and https://socks-studio.com/2021/11/03/constructing-knowledge-through-geometry-ramon-llulls-figures-in-ars-magna-1305/ for inspiration.
- Analogy to video editing: the timeline is your 1d position vector, and you layer different effects and masks and footage together into one final rendered composite. For interpretability, you run this process in reverse: go back from a final video to the source.
- Finally, interactivity would be nice to have. Bret Victor has written a lot about this from a user design point of view, e.g. http://worrydream.com/MagicInk/#reducing_interaction. To make the programming a bit easier, I'd recommend borrowing heavily from existing component libraries.
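
Two of the ideas above are concrete enough to sketch: bending the position axis into a circle, and pacing an animation by uncertainty. This is only an illustration under assumptions. The function names, the choice of Shannon entropy as the uncertainty measure, and the linear pacing rule are all hypothetical, not something specified in this thread.

```python
import math


def circle_layout(n_positions, radius=1.0):
    """Place sequence positions evenly on a circle.

    Position 0 sits at 12 o'clock and later positions run clockwise.
    Returns a list of (x, y) points.
    """
    points = []
    for i in range(n_positions):
        angle = 2 * math.pi * i / n_positions
        points.append((radius * math.sin(angle), radius * math.cos(angle)))
    return points


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def dwell_time(probs, base_seconds=1.0):
    """Hypothetical pacing rule: linger longer where the distribution is spread out.

    Scales base_seconds by entropy normalised to [0, 1], so a one-hot
    distribution gets zero dwell time and a uniform one gets base_seconds.
    """
    max_entropy = math.log(len(probs))
    if max_entropy == 0:
        return 0.0
    return base_seconds * entropy(probs) / max_entropy
```

A renderer could draw chords between the points from `circle_layout` to show connections between positions, and use `dwell_time` on each position's probability vector to decide how long to stay there during playback.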

All of this sounds a bit overkill for a helper function, but if fully realized, I think it'd be a really neat tool.

sheikheddy · 2023-03-02T14:14:06Z

I'll try to put a prototype up this weekend

neelnanda-io · 2023-03-02T14:56:25Z

Thanks! I'll admit that those takes were too in-depth for me to really get my head around them, but it sounded interesting and I would love to see a prototype.

sheikheddy · 2023-03-05T23:44:35Z

Still working on this. Here are some links in the meantime:

https://observablehq.com/@bstaats/graph-visualization-introduction
https://observablehq.com/@observablehq/why-use-a-radial-data-visualization
https://observablehq.com/@kerryrodden/equal-area-radial-matrix-of-lgbt-rights
https://observablehq.com/@mbostock/polar-clock

sheikheddy · 2023-03-07T15:05:36Z

Seems like this would be a contribution to https://github.com/alan-cooney/CircuitsVis/blob/main/python/circuitsvis/logits.py, not TransformerLens?

neelnanda-io · 2023-03-07T16:18:17Z

Ah, yes, if you're imagining a real interactive visualisation, putting it in CircuitsVis seems more natural. It's set up to make it easy to integrate JavaScript code with Python.

jbloomAus · 2023-03-27T07:49:41Z

@sheikheddy @neelnanda-io What's the plan here? Do we need an interactive visualization or will something else do?

abdurraheemali · 2023-03-30T21:03:02Z

For a non-interactive visualization, flame graphs do pretty well: https://www.brendangregg.com/blog/2017-02-06/flamegraphs-vs-treemaps-vs-sunburst.html

(I'm @sheikheddy from an alt-account)

neelnanda-io added the `enhancement`, `help wanted`, and `good first issue` labels on Dec 19, 2022.
