<img src="https://nnsight.net/_images/nnsight_logo.svg" alt="drawing" width="200"/>

# **NNsight**
## The API for a transparent science on black-box AI

In this era of large-scale deep learning, the most interesting AI models are massive black boxes that are hard to run. Ordinary commercial inference service APIs let you interact with huge models, but not model internals.

The NNsight library is different: it gives you (yes you) full access to all the neural network internals. When used together with a remote service like the [National Deep Inference Facility](https://thevisible.net/docs/NDIF-proposal.pdf) (NDIF), it lets you run complex experiments on huge open source models easily, with fully transparent access.

Our team wants to enable entire labs and independent researchers alike, as we believe a large, passionate, and collaborative community will produce the next big insights into a profoundly important field.




# But first, let's start small





## The Tracing Context

To demonstrate the core functionality and syntax of NNsight, we'll define and use a toy two-layer neural network.

In [1]:
# # Install nnsight
# !pip install git+https://github.com/JadenFiotto-Kaufman/nnsight.git@dev

# from IPython.display import clear_output

# clear_output()

Our little model here is composed of two linear layers. We specify the sizes of each of these modules, and create some complementary example input.

In [2]:
from collections import OrderedDict
import torch

from rich import print as rprint


input_size = 5
hidden_dims = 10
output_size = 2

net = torch.nn.Sequential(OrderedDict([
    ('layer1', torch.nn.Linear(input_size, hidden_dims)),
    ('layer2', torch.nn.Linear(hidden_dims, output_size)),
])).requires_grad_(False)

input = torch.rand((1, input_size))

The core object of the `nnsight` package is `NNsight`. This wraps around a given PyTorch model to enable NNsight's capabilities.

In [3]:
from nnsight import NNsight

model = NNsight(net)



When printed, PyTorch models show a named hierarchy of modules. This is a useful reference for accessing sub-components directly, and NNsight models work just the same.

In [4]:
print(model)

Sequential(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)


Before we actually get to using the model we just created, let's talk about what a `context` is in Python.

Often enough, when coding, you want to create some object, or initiate some logic, that you later want to destroy or conclude.

The most common application is opening files like the following example:

```python
with open('myfile.txt', 'r') as file:
  text = file.read()
```

Python uses the `with` keyword to enter a context-like object. This object defines logic to be run at the start of the `with` block, and logic to be run when exiting. In this case, entering the context opens a file and exiting the context closes it. Being within the context means we can read from the file. Simple enough! Now we can discuss how NNsight uses contexts to enable powerful and intuitive access into the internals of model computation.
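To make the enter/exit mechanics concrete, here is a minimal sketch of a hand-written context manager. `DeferredRun` is a hypothetical name invented for this illustration and is not part of NNsight. It only records work inside the block and runs it on exit.

```python
class DeferredRun:
    """Toy context manager: collects work inside the block, runs it on exit."""

    def __init__(self):
        self.pending = []   # callables queued inside the with block
        self.results = []   # filled in only when the block exits

    def __enter__(self):
        # Runs at the start of the with block.
        return self

    def add(self, fn):
        # Inside the block we only record the work; nothing runs yet.
        self.pending.append(fn)

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends: execute everything that was queued,
        # but only if the block finished without raising.
        if exc_type is None:
            self.results = [fn() for fn in self.pending]
        return False  # do not suppress exceptions


with DeferredRun() as run:
    run.add(lambda: 1 + 1)
    results_inside = list(run.results)  # still empty while inside the block

results_after = run.results  # populated once the block has exited
```

Inside the block `results_inside` is empty; the queued work has only run by the time we read `results_after`.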


Introducing the tracing context. Just like before, something happens upon entering the tracing context, something happens when exiting, and being inside enables some functionality.

We enter the tracing context by calling `.trace(<input>)` on the `NNsight` model we created before. Entering it denotes we want to run the model given our input... but not yet! The model is only run upon exiting the tracing context.

In [5]:
with model.trace(input) as tracer:
  pass

But where's the output? To get that, we'll have to learn how to request it from within the tracing context.

## Getting

Earlier, when we wrapped our toy model with the `NNsight` class, this added a couple properties to each module in the model (including the root model itself). The ones we care about are `.input` and `.output`.

```python
model.input
model.output
```

The names are pretty self explanatory. They correspond to the inputs and outputs of their respective modules during some forward pass of an input through the model. These are what we're going to interact with in the tracing context.

However, the model isn't executed until the end of the tracing context, so we can't access the actual inputs and outputs of components within the context. For example, if we tried to print a module's output within a context, we'd just see an `InterventionProxy`.

`.input` and `.output` are Proxies for the eventual inputs and outputs of a module. In other words, when you access `model.output`, you're telling `NNsight`: 
> "When you compute the output of `model`, please grab it for me and put the value into its corresponding Proxy object's `.value` attribute."

Let's try just that.

In [6]:
# with model.trace(input) as tracer:

#   output = model.output

# print(output.value)

Oh no, an error! `"Accessing Proxy value before it's been set."`

If `.value` isn't filled in after leaving the tracing context, accessing the value will give you this error.  In reality however, the value was filled in, it was just immediately removed. Why?

Proxy objects track their listeners (that is, the other Proxy objects that rely on them). When all of a Proxy's listeners are complete, the `.value` associated with that Proxy is deleted in order to save memory. To prevent this, we call `.save()` on the Proxy objects we want to access outside of the tracing context:

In [7]:
with model.trace(input) as tracer:

  output = model.output.save()

print(output.value)

tensor([[-0.2326,  0.0592]])


Success! We now have the model output, meaning you just completed your first intervention request using Proxies.

These requests are handled at the soonest possible moment they can be completed. In this case, right after the model's output was computed. We call this process `interleaving`.
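For intuition, this is roughly how one might grab an activation at the moment it is computed in plain PyTorch, using a forward hook. It is a sketch for comparison only: the text above does not say how NNsight implements interleaving internally, and the names `captured` and `grab_output` are made up for this example.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(5, 10),
    torch.nn.Linear(10, 2),
)
x = torch.rand(1, 5)

captured = {}

def grab_output(module, inputs, output):
    # PyTorch calls this immediately after the module's forward finishes.
    captured["layer1"] = output.detach().clone()

handle = net[0].register_forward_hook(grab_output)
with torch.no_grad():
    final = net(x)
handle.remove()  # remove hooks once they are no longer needed
```

After the forward pass, `captured["layer1"]` holds the first layer's output, and feeding it through the second layer reproduces `final`.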

What else can we request? There's nothing special about the model itself versus its submodules. Just like we saved the output of the model as a whole, we can save the output of any of its submodules. To get to them we use normal Python attribute syntax, and we know where the modules are because we printed out the model earlier:

In [8]:
print(model)

Sequential(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)


In [9]:
with model.trace(input) as tracer:

  l1_output = model.layer1.output.save()

print(l1_output.value)

tensor([[-0.1175,  0.1461, -0.1080, -0.2595,  0.1440, -0.0546, -0.0097,  0.8252,
          0.1391, -0.4482]])


Let's do the same for the input of layer2. While we're at it, let's also drop the `as tracer`, as we won't be needing the tracer object itself for a few sections:

In [10]:
with model.trace(input):

  l2_input = model.layer2.input.save()

print(l2_input.value)

((tensor([[-0.1175,  0.1461, -0.1080, -0.2595,  0.1440, -0.0546, -0.0097,  0.8252,
          0.1391, -0.4482]]),), {})


<details>
  <summary>On module inputs</summary>

  ---

  Notice how the value for `l2_input` was not just a single tensor.
  Values from `.input` take the form:

      tuple(tuple(args), dictionary(kwargs))

  Where the first index of the tuple is itself a tuple of all positional arguments, and the second index is a dictionary of the keyword arguments.

  ---

</details>
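As a small illustration of that `(args, kwargs)` layout, the sketch below captures the inputs of an ordinary Python function in the same form and unpacks them. `call_and_capture` and `scale` are hypothetical helpers written for this example, not NNsight functions.

```python
def call_and_capture(fn, *args, **kwargs):
    """Call fn and also return its inputs in (args, kwargs) form."""
    captured_input = (args, kwargs)
    return fn(*args, **kwargs), captured_input

def scale(x, factor=1.0):
    return [v * factor for v in x]

output, captured_input = call_and_capture(scale, [1.0, 2.0], factor=3.0)

# Unpack the two parts: positional arguments first, keyword arguments second.
args, kwargs = captured_input
first_positional = args[0]
```

With a saved `.input` value, the same unpacking gives you the tensor passed to the module as the first positional argument.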


Now that we can access activations, we also want to do some post-processing on them. Let's find out which dimension of layer1's output has the highest value.

## Functions, Methods, and Operations

We could do this by calling `torch.argmax(...)` after the tracing context... or we can just leverage the fact that `nnsight` handles functions and methods within the tracing context, by creating a Proxy request for it:

In [11]:
with model.trace(input):

  # Note we don't need to call .save() on the output,
  # as we're only using its value within the tracing context.
  l1_output = model.layer1.output

  l1_amax = torch.argmax(l1_output, dim=1).save()

print(l1_amax[0])

tensor(7)


Nice! That worked seamlessly, but hold on, how come we didn't need to call `.value[0]` on the result? In previous sections, we were just being explicit to get an understanding of Proxies and their values. In practice, however, `nnsight` knows that outside of the tracing context we only care about the actual value, so printing, indexing, and applying functions all immediately return and reflect the data in `.value`. For the rest of the tutorial we won't use it.

The same principles work for methods and operations as well:

In [12]:
with model.trace(input):

  value = (model.layer1.output.sum() + model.layer2.output.sum()).save()

print(value)

tensor(0.0834)
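The idea that functions, methods, and operators inside the context only build a request, which is evaluated later, can be sketched with a toy class. `LazyValue` is a hypothetical stand-in written for this illustration; it is not NNsight's Proxy implementation and omits everything except deferred evaluation.

```python
class LazyValue:
    """Toy stand-in for a proxy: records a computation, runs it later."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument callable producing the value

    def __add__(self, other):
        # Adding two lazy values builds a new lazy value; nothing runs yet.
        return LazyValue(lambda: self.evaluate() + other.evaluate())

    def total(self):
        # A method call is recorded the same way as an operator.
        return LazyValue(lambda: sum(self.evaluate()))

    def evaluate(self):
        return self._compute()


calls = []  # records when the underlying "layers" actually run

def layer1_output():
    calls.append("layer1")
    return [1.0, 2.0, 3.0]

def layer2_output():
    calls.append("layer2")
    return [0.5, 0.5]

l1 = LazyValue(layer1_output)
l2 = LazyValue(layer2_output)

combined = l1.total() + l2.total()  # builds the request only
ran_before = list(calls)            # nothing has executed at this point
result = combined.evaluate()        # now both "layers" run
```

Building `combined` runs nothing; the two toy layers execute only when `evaluate()` is called.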


Getting and analyzing the activations from various points in a model can be really insightful, and a number of ML techniques do exactly that. Often, however, we want not only to view the computation of a model, but to influence it as well.

## Setting

To demonstrate editing the flow of information through the model, let's set the first dimension of the first layer's output to 0. `NNsight` makes this really easy using the familiar assignment `=` operator:

In [13]:
with model.trace(input):

  # Save the output before the edit to compare.
  # Notice we call .clone() before .save(); the note after this cell explains why.
  l1_output_before = model.layer1.output.clone().save()

  # Access the 0th index of the hidden state dimension and set it to 0.
  model.layer1.output[:, 0] = 0

  # Save the output after to see our edit.
  l1_output_after = model.layer1.output.save()

print(l1_output_before.value)
print(l1_output_after.value)

tensor([[-0.1175,  0.1461, -0.1080, -0.2595,  0.1440, -0.0546, -0.0097,  0.8252,
          0.1391, -0.4482]])
tensor([[ 0.0000,  0.1461, -0.1080, -0.2595,  0.1440, -0.0546, -0.0097,  0.8252,
          0.1391, -0.4482]])


Note the use of `.clone()` to save the output of layer 1 before we applied an edit. Because `.save()` returns a reference to values *after* computation, we use `.clone()` to copy the value before edits are applied.
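The difference between a reference and a clone can be seen with plain PyTorch tensors. This is a minimal sketch outside of any tracing context; the variable names are chosen for this example only.

```python
import torch

activation = torch.tensor([[1.0, 2.0, 3.0]])

reference = activation          # another name for the same storage
snapshot = activation.clone()   # independent copy taken before the edit

activation[:, 0] = 0            # in-place edit, like the assignment above
```

After the edit, `reference` shows the zeroed entry while `snapshot` still holds the original value.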

## Gradients

Awesome! Now we know how to access and intervene on a simple model's forward pass. What about the backward pass?

Let's declare the same toy model, this time leaving out `.requires_grad_(False)` so that its parameters keep PyTorch's default `requires_grad=True`.

In [14]:
net = torch.nn.Sequential(OrderedDict([
    ('layer1', torch.nn.Linear(input_size, hidden_dims)),
    ('layer2', torch.nn.Linear(hidden_dims, output_size)),
]))

model = NNsight(net)

As a simple example, let's do a backward pass on the sum of the logits. We can save gradients just as we do activations, by calling `.save()` on a module output's `.grad`.

In [15]:
with model.trace(input):
    l1_output = model.layer1.output.save()
    
    l1_gradients = model.layer1.output.grad.save()

    model.output.sum().backward()

print(l1_output)
print(l1_gradients)

tensor([[ 0.3012, -0.5661, -0.3582, -0.5639,  0.8994,  0.7014,  0.0738,  0.2688,
         -0.2155,  0.1100]], grad_fn=<AddmmBackward0>)
tensor([[ 0.0437, -0.4045,  0.3466, -0.0019,  0.4575, -0.0447,  0.4492, -0.3388,
          0.1754, -0.2691]])


Note two things about the example above:
1. Inference mode is off by default.
2. We can call `.backward()` on a value within the tracing context just as we normally would.
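For reference, here is a sketch of how the gradient with respect to an intermediate activation could be read in plain PyTorch, without NNsight. It assumes a fresh two-layer model of the same shapes, so the numbers will differ from the output above.

```python
import torch

torch.manual_seed(0)
layer1 = torch.nn.Linear(5, 10)
layer2 = torch.nn.Linear(10, 2)
x = torch.rand(1, 5)

hidden = layer1(x)
hidden.retain_grad()   # keep .grad on this non-leaf tensor after backward
out = layer2(hidden)

out.sum().backward()   # backward pass on the sum of the outputs

hidden_grad = hidden.grad  # gradient of out.sum() w.r.t. layer1's output
```

Because the loss is the sum of `layer2`'s outputs, `hidden_grad` equals the column sums of `layer2.weight`.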

With that, we've learned the basics of getting and setting activations and gradients.

# Bigger

## LanguageModel

*I thought you said one trillion parameter models!*

Yes! NNsight provides the `LanguageModel` class which allows us to load models by their HuggingFace repo-id.

<details>
  <summary>On custom models</summary>

  ---

  You can also pass in custom models! Note that you'll have to pass in a tokenizer as well as the model. 

    model = LanguageModel(<CUSTOM_MODEL>, tokenizer=<TOKENIZER>)

  ---

</details>

In [16]:
from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map="auto")

## Tokenization

We can access the model's tokenizer with `.tokenizer`, and it will return the default HuggingFace tokenizer for that `AutoModel` class.

A couple of useful methods for encoding and decoding are listed below. See the official HuggingFace page [here](https://huggingface.co/docs/transformers/main_classes/tokenizer) for more arguments and methods.

In [28]:
tokenizer = model.tokenizer

prompt = "John and Mary went to the store. John handed the milk to"

tokens = tokenizer.encode(prompt) # str -> list[int]
token_tensor = tokenizer(prompt, return_tensors="pt")["input_ids"] # str -> Tensor[int]

str_tokens = tokenizer.convert_ids_to_tokens(tokens) # list[int] -> list[str]
decoded = tokenizer.decode(tokens) # list[int] -> str

print("Tokens:", tokens)
print("Token Tensor:", token_tensor)
print("String Tokens:", str_tokens)
print("Decoded:", decoded)

Tokens: [7554, 290, 5335, 1816, 284, 262, 3650, 13, 1757, 10158, 262, 7545, 284]
Token Tensor: tensor([[ 7554,   290,  5335,  1816,   284,   262,  3650,    13,  1757, 10158,
           262,  7545,   284]])
String Tokens: ['John', 'Ġand', 'ĠMary', 'Ġwent', 'Ġto', 'Ġthe', 'Ġstore', '.', 'ĠJohn', 'Ġhanded', 'Ġthe', 'Ġmilk', 'Ġto']
Decoded: John and Mary went to the store. John handed the milk to


Let's run our prompt through the model. Setting `trace=False` tells NNsight that we don't plan to access or intervene on any hidden states; this simplifies syntax quite a bit and directly returns the model's output.

<details>
  <summary>Passing arguments to tracer</summary>

  ---

  If you pass input directly to the `trace` context, you can include arguments for tokenization in `invoker_args`. They will be passed downstream to the invoker and then to `_prepare_inputs`. See the later section on Batching for more information about the invoker context.

    with model.trace(<TEXT>, invoker_args={"key": "value"}):

  ---

</details>

In [29]:
output = model.trace(prompt, trace=False)
tokens = output.logits.softmax(-1).argmax(-1)

print("Next token prediction:", tokenizer.decode(tokens[0,-1]))

Next token prediction:  Mary


Not only can `LanguageModel` take a tensor as input, but it will automatically tokenize text or batches of text.

In [34]:
batch_of_text = [prompt, "John handed the milk to"]

with model.trace(batch_of_text):
    prepared_input = model.input[1]["input_ids"].save()

print(prepared_input)

tensor([[ 7554,   290,  5335,  1816,   284,   262,  3650,    13,  1757, 10158,
           262,  7545,   284],
        [50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256,  7554, 10158,
           262,  7545,   284]], device='cuda:0')


Note how the shorter prompt is padded. By default, `LanguageModel` will prepare text inputs with left-sided padding. It will not prepend a `BOS` token.
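Left-sided padding itself is easy to sketch in plain Python. The helper `left_pad` below is hypothetical and written for this illustration; the attention mask it returns is an assumption about what typically accompanies padded input, not something shown in the output above.

```python
def left_pad(batch, pad_id):
    """Pad each list of token ids on the left to the longest length."""
    width = max(len(ids) for ids in batch)
    padded = [[pad_id] * (width - len(ids)) + ids for ids in batch]
    # Mask: 0 for padding positions, 1 for real tokens.
    mask = [[0] * (width - len(ids)) + [1] * len(ids) for ids in batch]
    return padded, mask

long_ids = [7554, 290, 5335, 1816, 284, 262, 3650, 13, 1757, 10158, 262, 7545, 284]
short_ids = [7554, 10158, 262, 7545, 284]

padded, mask = left_pad([long_ids, short_ids], pad_id=50256)
```

With left padding, the final position of every row is a real token, which is convenient when reading next-token predictions at index `-1`.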

For easier indexing, you can use `.t[<idx>]` to index into tokens from the back.

In [35]:
toks = tokenizer(prompt, return_tensors="pt")["input_ids"]

with model.trace(toks) as tracer:
    token_0 = model.transformer.input[0][0].t[0].save()

# Check that the value from t[0] is the same as the 0th token in the input.
print("Traced token:", token_0)
print("Original token:", toks[:,0])

Traced token: tensor([7554], device='cuda:0')
Original token: tensor([7554])


From this example forward, we'll be using a tokenized version of our text input to make it easier to refer to and compare shapes.

## Getting and Setting

Everything that we've learned so far works exactly the same with LanguageModel. LanguageModels tend to be quite a bit more complicated than the simple model we declared above. We can print the model to figure out how to correctly access its components.

In [20]:
print(model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
  (generator): WrapperModule()
)


To demonstrate, let's ablate over a token position at an arbitrary layer.

In [21]:
with model.trace(toks):
    # Intervene on a module
    model.transformer.h[-3].attn.output[0].t[0] = 0.

    # Save the model's outputs
    logits = model.output.logits.save()

# Just print the first token's logits for visual clarity.
print(logits[:,0])

tensor([[-34.9196, -34.4503, -37.5173,  ..., -42.6724, -41.8168, -34.7129]],
       device='cuda:0', grad_fn=<SelectBackward0>)


How can we compare the difference in the clean (unedited) versus edited output? Let's try saving outputs with `.clone()` like we did earlier.

In [22]:
with model.trace(toks):

    output_one = model.output.logits.clone().save()

    model.transformer.h[-3].attn.output[0].t[0] = 0.

    output_two = model.output.logits.save()

print(output_one[:,0])
print(output_two[:,0])

tensor([[-34.9196, -34.4503, -37.5173,  ..., -42.6724, -41.8168, -34.7129]],
       device='cuda:0', grad_fn=<SelectBackward0>)
tensor([[-34.9196, -34.4503, -37.5173,  ..., -42.6724, -41.8168, -34.7129]],
       device='cuda:0', grad_fn=<SelectBackward0>)


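Wait, they're the same? Both `output_one` and `output_two` read the logits at the end of the same forward pass, by which point the ablation at `h[-3]` has already been applied. Since the trace context operates on a single forward pass, you can't get the value of some later module and set it as the value of an earlier module.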

Instead, we can use batching and multi invoke contexts!

## Batching

Remember that `tracer` object that we dropped earlier? Well, we can call `.invoke()` on the tracer object to create an `invoke` context. Each `invoke` context takes an input, and those inputs are combined in a single forward pass through the model. However, all the operations we declare under each `invoke` context will only apply to the context's input.

Let's see how this works in practice by solving our problem from before.

In [None]:
with model.trace() as tracer:
    with tracer.invoke(toks):
        output_one = model.lm_head.output.save()

    with tracer.invoke(toks):
        model.transformer.h[-3].attn.output[0][:] = 0.

        output_two = model.lm_head.output.save()

print(output_one[:,0])
print(output_two[:,0])

Notice how we're calling `model.lm_head.output` instead of `model.output` here. Since operations under each `invoke` context are acting on a slice of the larger `trace` batch, we can only operate on tensors or other collections. `model.output` is a `CausalLMOutput` object which can't be indexed.
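The idea that several inputs share one forward pass while each edit touches only its own rows can be sketched in plain Python. `run_batched` and the helper lambdas are hypothetical names for this illustration and say nothing about how NNsight implements `invoke` internally.

```python
def run_batched(invokes, forward):
    """Toy sketch: run several inputs as one batch, editing per-input slices.

    invokes: list of (rows, edit) pairs; edit is applied only to that
             input's rows of the hidden state, or is None for no edit.
    forward: (first_half, second_half) functions applied to the whole batch.
    """
    first_half, second_half = forward
    # 1. Concatenate every invoke's rows into a single batch.
    batch, slices, start = [], [], 0
    for rows, _ in invokes:
        batch.extend(rows)
        slices.append(slice(start, start + len(rows)))
        start += len(rows)
    # 2. One pass through the first half of the "model".
    hidden = first_half(batch)
    # 3. Apply each edit only to the rows that belong to its invoke.
    for (_, edit), rows_slice in zip(invokes, slices):
        if edit is not None:
            hidden[rows_slice] = edit(hidden[rows_slice])
    # 4. Finish the pass and split the outputs back per invoke.
    output = second_half(hidden)
    return [output[s] for s in slices]


double = lambda batch: [[2 * v for v in row] for row in batch]
row_sum = lambda batch: [sum(row) for row in batch]
zero_out = lambda rows: [[0 for _ in row] for row in rows]

toks = [[1, 2, 3]]
clean, edited = run_batched([(toks, None), (toks, zero_out)], (double, row_sum))
```

The same input is run twice in one batch: `clean` is untouched, while `edited` reflects the zeroing applied only to its own slice.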

One cool thing we can do is pass values across invoke contexts. Let's say we wanted to use the output of the last layer as the input to the first layer. We can do that as such:

In [None]:
with model.trace() as tracer:
    with tracer.invoke(toks):
        final_layer_output = model.transformer.h[-1].output[0]

    with tracer.invoke(toks):
        model.transformer.h[0].attn.output[0][:] = final_layer_output

        output = model.output.logits.save()

print(output[:,0])

## .next()

We've talked a lot about accessing and intervening on a single forward pass. But the generative capabilities of modern language models are part of what makes them so impactful. NNsight enables users to access and intervene at every step of generation.

Using the same `LanguageModel` as before, we can declare a new context, `.generate`. The keyword argument `max_new_tokens` determines how many tokens we'll generate. To get the generated output, call `model.generator.output.save()`.

In [None]:
with model.generate(toks, max_new_tokens=3) as tracer:
    output = model.generator.output.save()

print("Original Shape:", toks.shape)
print("Original + Generated Tokens:", output.shape)

By default, operations within the generate context will act on the original prompt.

In [None]:
with model.generate(toks, max_new_tokens=3) as tracer:
    print(model.transformer.h[0].output[0].shape)

To access the hidden states of generated token sequences, use `.next()`.

In [None]:
new_tokens = 3
with model.generate(toks, max_new_tokens=new_tokens) as tracer:
    hidden_states1 = model.transformer.h[-1].output[0].save()

    tracer.next()
    
    hidden_states2 = model.transformer.h[-1].output[0].save()

    tracer.next()
    
    hidden_states3 = model.transformer.h[-1].output[0].save()

    out = model.generator.output.save()

print(hidden_states1.shape)
print(hidden_states2.shape)
print(hidden_states3.shape)
print(out.shape)

## Logit Lens

Let's put what we've learned so far into practice. As an example, let’s track the probability with which the model outputs “Mary” as we move through layers. One technique for this is the logit lens: we’ll treat the language model’s decoder applied to the intermediate activations as giving a crude approximation to this probability.

In [None]:
prompt = "John and Mary went to the store. John handed the milk to"

In [None]:
def decoder(x):
  # decoder consists of final layer norm + unembedding
  return model.lm_head(model.transformer.ln_f(x))

all_probs = []
with model.trace(prompt) as tracer:
  for layer in model.transformer.h:
    logits = decoder(layer.output[0]) # apply decoder to hidden state
    logits = logits[0,-1,:] # only over the final token
    probs = logits.softmax(dim=-1).save() # apply softmax to get probabilities and save
    all_probs.append(probs)
    # all_probs[i].value now stores the probability distribution over tokens after layer i

In [None]:
import plotly.express as px

mary_token = model.tokenizer(" Mary").input_ids[0] # token id for Mary

px.line(
    [layer_probs.value[mary_token].item() for layer_probs in all_probs],
    title="Probability of Mary after each layer, according to logit lens",
    labels={"value":"Probability", "index":"Layer"}
)

# I thought you said huge models?

## Remote execution

# Next Steps

# Advanced

Link to IOI patching
Link to function vectors

## IOI Patching

## Mamba Attention

# NOTES

In Graph, check module_proxy.proxy_value for tracer.invoker. If None, return node.value anyway. either its done or error!

Pass in input and immediately trace like .invoke from before.
No more tracer.output. Just access the output of the model as a proxy.

describe attribute access to get modules and call .output and .input

describe model is run only at end of with block and proxies are populated

describe you need .save() or else value is destroyed before you could access and it throws an error.

(note that .input is a tuple of (<args as tuple>, <kwargs as dict>))

In [None]:
with model.trace(input1):

  l2_input = model.layer2.input.save()

  output = model.output.save()

print(l2_input)
print(output)

Notice no .value. You could still call it if you want to.
It's still a proxy, but when you do an operation on a completed proxy, it doesn't create a new proxy; it just returns the resultant value.

In [None]:
with model.trace(input1):

  output = model.output.save()

  print(type(output))
  print(output)

output = output * 2

print(type(output))
print(output)

The Tracer object also allows attribute access for the model. This is nice when you want to use NNsight inline to quickly trace a model.

In [None]:
with NNsight(net).trace(input1) as tracer:

  output = tracer.output.save()

print(output)

Can still do the full-featured multi-invoke version of trace like .forward before

Can use proxy variable in between invokes
It handles batching them for you and the shapes of values in them will be only of the size of the batch.

In [None]:
with model.trace() as tracer:

  with tracer.invoke(input2):

    l2_input = model.layer2.input

  with tracer.invoke(input1):

    model.layer2.input = l2_input

    output = model.layer2.output.save()

print(output)




If you just want to run the model and get the output, do trace=False

To tracing context, just output

(note if you pass input directly to .trace, you can pass a dict of invoker_args for input preprocessing options like from tracer.invoke(...))

---



In [None]:
output = model.trace(input1, trace=False)

print(output)

The NNsight class is designed to be extended in order to enable tracing functionality for more complex models.

LanguageModel is one such extension.

Use hf repo_id to load and trace model

device_map is an underlying HF transformer arg to auto assign parameters to gpu

In [None]:
from nnsight import LanguageModel

In [None]:
model = LanguageModel('gpt2', device_map="auto")

print(model)

Works just like the simple case

In [None]:
with model.trace("Hello World!"):


  logits = model.lm_head.output.save()

One extension LanguageModel adds is to have the exact same tracing features as before, but instead using the .generate method on HF transformer models

You can call .next() on modules to move the "pointer" to which iterations of .output and .input refer to.

In [None]:
with model.generate("The Eiffel Tower is in the city of", max_new_tokens=3) as tracer:

  logits_1 = model.lm_head.output.save()

  logits_2 = model.lm_head.next().output.save()

  logits_3 = model.lm_head.next().output.save()

# Variable names must match the ones saved above (logits_1, not logits1).
print(logits_1)
print(logits_2)
print(logits_3)