This tutorial (code copied and updated from https://github.com/neelnanda-io/EasyTransformer/blob/main/Hacky-Interactive-Lexoscope.ipynb) shows how to interactively visualize neuron activations in a Transformer language model. We use @neelnanda-io's `Easy-Transformer` library.

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import os

In [3]:
!pip install gradio plotly
!pip install "git+https://github.com/neelnanda-io/Easy-Transformer.git"

Collecting git+https://github.com/neelnanda-io/Easy-Transformer.git
  Cloning https://github.com/neelnanda-io/Easy-Transformer.git to /tmp/pip-req-build-yjykeksm
  Running command git clone --filter=blob:none --quiet https://github.com/neelnanda-io/Easy-Transformer.git /tmp/pip-req-build-yjykeksm
  Resolved https://github.com/neelnanda-io/Easy-Transformer.git to commit ee0fc75942365dec64d476aa72219a0fca628059
  Preparing metadata (setup.py) ... done
Collecting einops
  Downloading einops-0.5.0-py3-none-any.whl (36 kB)
Collecting fancy_einsum
  Downloading fancy_einsum-0.0.3-py3-none-any.whl (6.2 kB)
Collecting torchtyping
  Downloading torchtyping-0.1.4-py3-none-any.whl (17 kB)
Collecting typeguard>=2.11.1
  Downloading typeguard-2.13.3-py3-none-any.whl (17 kB)
Building wheels for collected packages: easy-transformer
  Building wheel for easy-transformer (setup.py) ... done
  Created wheel for easy-transformer: filename=easy_transformer-0.1.0-py3-none-any.whl si

In [6]:
import gradio as gr
from easy_transformer import EasyTransformer
from easy_transformer.utils import to_numpy
from IPython.display import HTML

### GPT2-small (117M)

We use `gpt2-small` in this example, a language model with 117M parameters. It embeds the input tokens, contextualizes them through its Transformer layers, and predicts the next token at each position; during training, those predictions are scored with a loss against the known next token. It was trained on the WebText dataset, which emphasizes quality by scraping only web pages linked from Reddit posts that received at least 3 karma, then extracting their text with the `Dragnet` and `Newspaper` (https://github.com/codelucas/newspaper) content extractors.
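
To make "predict the next token and compute a loss against a known target" concrete, here is a minimal pure-Python sketch of the cross-entropy loss at a single position. The four-word vocabulary and the logits are made up for illustration; a real model produces one logit per entry of a vocabulary with tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_idx):
    """Cross-entropy loss for one position: -log p(target)."""
    probs = softmax(logits)
    return -math.log(probs[target_idx])

# Hypothetical vocabulary and made-up logits for one position.
vocab = ["cat", "dog", "mat", "sat"]
logits = [1.0, 0.5, 3.0, 0.2]

probs = softmax(logits)
predicted = vocab[probs.index(max(probs))]

# The loss is low when the target is the token the model favours, higher otherwise.
loss_if_mat = next_token_loss(logits, vocab.index("mat"))
loss_if_dog = next_token_loss(logits, vocab.index("dog"))
print(predicted, round(loss_if_mat, 3), round(loss_if_dog, 3))
```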

### Extracting model activations

In [7]:
model_name = "gpt2-small"
model = EasyTransformer.from_pretrained(model_name)

Loading model: gpt2-small



Using pad_token, but it is not set yet.


Moving model to device:  cuda
Finished loading pretrained model gpt2-small into EasyTransformer!


In [125]:
def get_activations(inp, layer, neuron_idx):
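    """Use a forward hook to extract one neuron's activation from a layer's MLP.
    In other libraries the same role might be played by a callback.
    """
    cache = {}
    
    def caching_hook(act, hook):
        # act has shape [batch, position, neuron]; keep the first batch item, all positions
        cache["activation"] = act[0, :, neuron_idx]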
    
    model.run_with_hooks(
        inp, fwd_hooks=[(f"blocks.{layer}.mlp.hook_post", caching_hook)]
    )
    return to_numpy(cache["activation"])

In [89]:
# Testing the outputs of the function above

In [90]:
default_layer = 9
default_neuron_idx = 652
default_inp = "The following is a list of powers of 10: 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000"
print(model.to_str_tokens(default_inp))
print(get_activations(default_inp, default_layer, default_neuron_idx))

['<|endoftext|>', 'The', ' following', ' is', ' a', ' list', ' of', ' powers', ' of', ' 10', ':', ' 1', ',', ' 10', ',', ' 100', ',', ' 1000', ',', ' 10000', ',', ' 100', '000', ',', ' 100', '0000', ',', ' 100', '00000']
[-0.08643487 -0.14071974 -0.10398163 -0.12390741 -0.04058979 -0.110649
 -0.0518984  -0.11276124 -0.06905475 -0.11189392 -0.03059199 -0.10336907
 -0.04322352  1.5935557  -0.14205763  2.5116613  -0.13316426  2.51967
 -0.11360867  3.0765228  -0.1163746   0.53939086  2.3499641  -0.1495217
 -0.16476327  1.9449065  -0.1369017  -0.0880252   2.1848845 ]
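
The caching-hook pattern used by `get_activations` can be illustrated without loading a model. The sketch below is a toy stand-in: `ToyModel`, its hook names and `get_toy_activation` are hypothetical and are not part of `Easy-Transformer`. It only shows the idea that a hook is called with an intermediate activation and can stash part of it in a dictionary that outlives the forward pass.

```python
class ToyModel:
    """Hypothetical stand-in for a network with named hook points."""

    def __init__(self):
        # Two "layers", each just a function over a list of floats.
        self.layers = {
            "layer0.post": lambda xs: [2 * x for x in xs],
            "layer1.post": lambda xs: [x + 1 for x in xs],
        }

    def run_with_hooks(self, xs, fwd_hooks=()):
        hooks = dict(fwd_hooks)
        for name, fn in self.layers.items():
            xs = fn(xs)
            if name in hooks:
                # The hook receives the activation and the name of the hook point.
                hooks[name](xs, name)
        return xs


def get_toy_activation(model, xs, hook_name, idx):
    """Run the model once and return element idx of the activation at hook_name."""
    cache = {}

    def caching_hook(act, name):
        cache["activation"] = act[idx]

    model.run_with_hooks(xs, fwd_hooks=[(hook_name, caching_hook)])
    return cache["activation"]


toy = ToyModel()
first = get_toy_activation(toy, [1.0, 2.0, 3.0], "layer0.post", 1)
second = get_toy_activation(toy, [1.0, 2.0, 3.0], "layer1.post", 1)
print(first, second)
```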


### Visualizing Model Activations

Here we will implement a simple interface that visualizes the neurons of a GPT language model on some text.
Each token element is coloured according to the intensity of the selected neuron's activation. We normalize the activations to lie in the range [0, 1].

Question: How does one tell if neuron 652 in layer 9 activates strongly on powers of 10?

This visualization is sensitive to `max_val` and `min_val`. When doing research on neuron activations, these values can be tuned to whatever seems reasonable for the distribution of activations in question. The suggested defaults come from Nanda: `min_val=0`, and `max_val` set to the neuron's max activation across the dataset. When either is left unset, the code below falls back to the min or max activation on the current input text.
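
To see why the choice of range matters, the sketch below normalizes two made-up activation lists twice: once with each input's own min and max, and once with a fixed range of [0, 4]. The `normalize` helper and the numbers are illustrative assumptions, not part of the notebook's code.

```python
def normalize(acts, min_val, max_val):
    """Map activations into [0, 1], clipping values outside [min_val, max_val]."""
    span = max_val - min_val
    if span <= 0:
        return [0.0 for _ in acts]
    return [min(max((a - min_val) / span, 0.0), 1.0) for a in acts]


# Made-up activations for two hypothetical prompts.
strong = [-0.1, 1.6, 2.5, 3.1]
weak = [-0.1, 0.1, 0.2, 0.3]

# Per-input range: each prompt gets a fully saturated token, even the weak one.
per_input_strong = normalize(strong, min(strong), max(strong))
per_input_weak = normalize(weak, min(weak), max(weak))

# Fixed range [0, 4]: intensities are comparable across prompts.
fixed_strong = normalize(strong, 0.0, 4.0)
fixed_weak = normalize(weak, 0.0, 4.0)

print(max(per_input_weak), max(fixed_weak), max(fixed_strong))
```

With a per-input range the weak prompt looks as strongly activated as the strong one, which is why a fixed range is preferable when comparing texts.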

In [91]:
# We need CSS to give each token a thin gray border, to make it easy to see token separation
style = """<style>span.token {border: 1px solid gray}</style>"""

In [92]:
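def calculate_color(val, max_val, min_val):
    """Normalize val from [min_val, max_val] to [0, 1] and return a colour that interpolates
    between slightly off-white and red (0 = white, 1 = red).
    Returns a string of the form "rgb(240, 240, 240)", which CSS recognizes.
    """
    span = max_val - min_val
    # Guard against an empty range, then clip so out-of-range values stay valid colours
    normalized_val = (val - min_val) / span if span > 0 else 0.0
    normalized_val = min(max(normalized_val, 0.0), 1.0)
    channel = int(240 * (1 - normalized_val))
    return f"rgb(240, {channel}, {channel})"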


In [106]:
def neuron_viz(inp, layer, neuron_idx, max_val=None, min_val=None):
    """
    inp: Input text to be visualised
    layer: layer index into the model
    neuron_idx: neuron index
    max_val: Upper end of the colour range, defaults to the max activation on this input
    min_val: Lower end of the colour range, defaults to the min activation on this input
    
    Returns: a string of HTML that displays the text with each token colored according to its activation
    It is useful to be able to input a layer's fixed max_val and min_val, otherwise the colors change as you edit the text, which is annoying.
    """
    if layer is None:
        return "Please select a layer"
    
    if neuron_idx is None:
        return "Please select a neuron"
    
    acts = get_activations(inp, layer, neuron_idx)
    act_max = acts.max()
    act_min = acts.min()
    
    # Override defaults if not set
    if max_val is None:
        max_val = act_max
    if min_val is None:
        min_val = act_min
    
    # Make a list of HTML markup to concat into our final HTML
    # First add style which adds a border to each token element
    markup = [style]
    
    # Add text to tell us which layer and neuron we're looking at.
    markup.append(f"""
<section>
Layer: <b>{layer}</b>.

Neuron Index: <b>{neuron_idx}</b> </section>""")
    # Add a line telling us the limits of our range
    markup.append(f"""
<section>
Max Range: <b>{max_val:.4f}</b>
    
Min Range: <b>{min_val:.4f}</b> </section>""")
    
    # If we added a custom range, print a line telling us the range of our activations too
    if act_max != max_val or act_min != min_val:
        markup.append(f"""
<section>Custom range: 

Max Act: <b>{act_max:.4f}</b>
        
Min Act: <b>{act_min:.4f}</b>

</section>""")
    
    # Tokenize input text
    toks = model.to_str_tokens(inp)
    for tok, act in zip(toks, acts):
        # Set the span's background colour from the activation, using the chosen
        # range (max_val, min_val) so that a custom range actually takes effect
        markup.append(f"""<span class="token" style="background-color:{calculate_color(act, max_val, min_val)}">{tok}</span>""")
    return "".join(markup)

In [107]:
default_max = 4.0
default_min = 0.0
default_markup = neuron_viz(
    default_inp,
    default_layer,
    default_neuron_idx,
    max_val=default_max,
    min_val=default_min,
)

display(HTML(default_markup))
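
One caveat when building HTML from tokens: token strings come from arbitrary input text and may contain `<`, `>` or `&`. The sketch below is a possible variant of the span-building loop that escapes each token with the standard library's `html.escape`. The `render_tokens` helper and the demo tokens are assumptions for illustration; the notebook's `neuron_viz` does not do this.

```python
import html


def render_tokens(tokens, colors):
    """Build one <span> per token, escaping the text so it cannot be parsed as HTML."""
    spans = []
    for tok, color in zip(tokens, colors):
        safe = html.escape(tok)
        spans.append(
            f'<span class="token" style="background-color:{color}">{safe}</span>'
        )
    return "".join(spans)


# Made-up tokens, including two that look like markup.
demo_tokens = ["<|endoftext|>", "The", " <b>", " 10"]
demo_colors = ["rgb(240, 240, 240)"] * 3 + ["rgb(240, 120, 120)"]
demo_markup = render_tokens(demo_tokens, demo_colors)
print(demo_markup)
```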

### Interactive UI: Building a gradio app

We introduce a few input elements (`Textbox`, `Number`, etc.) that return strings and numbers, and an output element that displays the result. For each input we call `input.change(update_function, inputs, output)`.
This says: "if this input element changes, run the update function on the current value of each of the `inputs` and set the value of `output` to the function's return value."
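
The wiring described above can be mimicked in a few lines of plain Python. This is a toy model of the idea only: `ToyComponent`, `set_value` and `describe` are hypothetical names, and nothing here reflects how Gradio is implemented internally.

```python
class ToyComponent:
    """Hypothetical stand-in for a UI element; not part of Gradio."""

    def __init__(self, value=None):
        self.value = value
        self._listeners = []

    def change(self, fn, inputs, output):
        # Remember what to run when this component's value changes.
        self._listeners.append((fn, inputs, output))

    def set_value(self, value):
        # Simulate the user editing this element.
        self.value = value
        for fn, inputs, output in self._listeners:
            output.value = fn(*(c.value for c in inputs))


def describe(text, layer):
    return f"{text!r} at layer {layer}"


text_box = ToyComponent("hello")
layer_box = ToyComponent(9)
out_box = ToyComponent()

# Register the same update on every input, as the app below does.
toy_inputs = [text_box, layer_box]
for component in toy_inputs:
    component.change(describe, toy_inputs, out_box)

layer_box.set_value(3)
print(out_box.value)
```

Registering the same update function on every input means that editing any one of them recomputes the output from the current values of all of them.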

In [113]:
with gr.Blocks() as app:
    gr.HTML(value=f"Interactive Token Visualizer for {model_name}")
    
    #input elements
    with gr.Row():
        with gr.Column():
            text = gr.Textbox(label="Input Text", value=default_inp)
            # precision=0 makes it an int, otherwise it's a float
            # Value sets the initial default value
            layer = gr.Number(label="Layer", value=default_layer, precision=0)
            neuron_idx = gr.Number(label="Neuron Index", value=default_neuron_idx, precision=0)
            
            # If empty, these two map to None
            max_val = gr.Number(label="Max Value", value=default_max)
            min_val = gr.Number(label="Min Value", value=default_min)
            inputs = [text, layer, neuron_idx, max_val, min_val]
        with gr.Column():
            # output Element
            out = gr.HTML(label="Neuron Activations", value=default_markup)
    for inp in inputs:
        inp.change(neuron_viz, inputs, out)

In [114]:
app.launch(share=True, height=1000)

Running on local URL:  http://127.0.0.1:7863
Running on public URL: https://6870c2a1d159526b.gradio.app

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces


(<gradio.routes.App at 0x7fe4d134bfa0>,
 'http://127.0.0.1:7863/',
 'https://6870c2a1d159526b.gradio.app')