<img src="https://nnsight.net/_images/nnsight_logo.svg" alt="drawing" width="200"/>

# **NNsight**
## The API for transparent science on black-box AI

In this era of large-scale deep learning, the most interesting AI models are massive black boxes that are hard to run. Ordinary commercial inference service APIs let you interact with huge models, but they do not let you access model internals.

The nnsight library is different: it gives you (yes you) full access to all the neural network internals. When used together with a remote service like the [National Deep Inference Facility](https://thevisible.net/docs/NDIF-proposal.pdf) (NDIF), it lets you run complex experiments on huge open source models easily, with fully transparent access.

Our team wants to enable entire labs and independent researchers alike, as we believe a large, passionate, and collaborative community will produce the next big insights into a profoundly important field.




# But first, let's start small





## The Tracing Context

To demonstrate the core functionality and syntax of nnsight, we'll define and use a tiny two-layer neural network.

In [2]:
# # Install nnsight
# !pip install git+https://github.com/JadenFiotto-Kaufman/nnsight.git@dev

# from IPython.display import clear_output

# clear_output()

Our little model here is composed of two sub-modules, the linear layers 'layer1' and 'layer2'. We specify the size of each of these modules and create a matching example input.

In [3]:
from collections import OrderedDict
import torch

input_size = 5
hidden_dims = 10
output_size = 2

net = torch.nn.Sequential(OrderedDict([
    ('layer1', torch.nn.Linear(input_size, hidden_dims)),
    ('layer2', torch.nn.Linear(hidden_dims, output_size)),
])).requires_grad_(False)

input = torch.rand((1, input_size))

The core object of the nnsight package is `NNsight`. This wraps around a given PyTorch model to enable the capabilities nnsight provides.

In [4]:
from nnsight import NNsight

model = NNsight(net)


PyTorch models, when printed, show a named hierarchy of modules, which is very useful when accessing sub-components directly. NNsight models work just the same.

In [5]:
print(model)

Sequential(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)


Before we actually get to using the model we just created, let's talk about what a `context` is in Python.

Often enough when coding, you want to create some object, or initiate some logic, that you later want to destroy or conclude.

The most common application is opening files like the following example:

```python
with open('myfile.txt', 'r') as file:
  text = file.read()
```

Python uses the `with` keyword to enter a context-like object. This object defines logic to be run at the start of the `with` block, and logic to be run when exiting it. In this case, entering the context opens a file and exiting the context closes it. Being within the context means we can read from the file. Simple enough! Now we can discuss how `nnsight` uses contexts to enable powerful and intuitive access into the internals of model computation.
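To make the enter/exit mechanics concrete, here is a minimal sketch of a hand-written context manager. The `Recorder` class and its `events` list are illustrative names chosen for this example; they are not part of `nnsight`.

```python
class Recorder:
    """Toy context manager that logs when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Runs at the start of the with block.
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, even if it raised an exception.
        self.events.append("exit")
        return False  # do not swallow exceptions


with Recorder() as rec:
    rec.events.append("inside")

print(rec.events)  # ['enter', 'inside', 'exit']
```

Any object that defines `__enter__` and `__exit__` can be used after `with`, which is all a library needs in order to run its own setup and teardown logic around your code.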


Introducing the tracing context. Just like before, something happens upon entering the tracing context, something happens when exiting, and being inside enables some functionality.

We enter the tracing context by calling `.trace(<input>)` on the `NNsight` model we created before. Entering it denotes that we want to run the model on our input... but not yet! The model is only run upon exiting the tracing context.

In [6]:
with model.trace(input) as tracer:
  pass
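The "run on exit" behaviour can be imitated in a few lines of plain Python. The sketch below is only an analogy: `ToyTrace` and its `request` method are hypothetical names, and this is not how `nnsight` is implemented internally.

```python
class ToyTrace:
    """Toy stand-in for a tracing context: collects requests, runs on exit."""

    def __init__(self, fn, x):
        self.fn = fn
        self.x = x
        self.requests = []  # callbacks registered inside the with block
        self.ran = False

    def __enter__(self):
        return self

    def request(self, callback):
        # Nothing is computed yet; we only remember what was asked for.
        self.requests.append(callback)

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            result = self.fn(self.x)  # the "model" runs here
            self.ran = True
            for callback in self.requests:
                callback(result)
        return False


seen = []
with ToyTrace(lambda x: x * 2, 21) as trace:
    trace.request(seen.append)
    ran_inside = trace.ran  # still False: the function has not run yet

print(ran_inside, trace.ran, seen)  # False True [42]
```

Inside the block we can only describe what we want; the result exists only after the block has closed.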

But where's the output? To get that, we'll have to learn how to request it from within the tracing context.

## Getting

Earlier, when we wrapped our little net with the `NNsight` class, it added a couple of properties to each module in the model (including the root model itself). The ones we care about are `.input` and `.output`.

```python
model.input
model.output
```

The names are pretty self-explanatory. They correspond to the inputs and outputs of their respective modules during a forward pass of an input through the model. These are what we're going to interact with in the tracing context.

However, remember that the model isn't executed until the end of the tracing context. So how can we access inputs and outputs during computation from within the context? Well, we can't.

`.input` and `.output` are Proxies for the eventual inputs and outputs of a module. In other words, when you access `model.output`, what you're communicating to `nnsight` is "When you compute the output of `model`, please grab it for me and put the value into its corresponding Proxy object's `.value` attribute." Let's try just that:

In [7]:
# with model.trace(input) as tracer:

#   output = model.output

# print(output.value)

Oh no an error! "Accessing Proxy value before it's been set."

If `.value` isn't filled in after leaving the tracing context, accessing the value will give you this error. In reality, however, the value was filled in; it was just immediately removed. Why?

Proxy objects track their listeners (that is, other Proxy objects that rely on them), and when all of a Proxy's listeners are complete, the `.value` associated with that Proxy is deleted in order to save memory. To prevent this, we call `.save()` on the Proxy objects we want to access outside of the tracing context:

In [8]:
with model.trace(input) as tracer:

  output = model.output.save()

print(output.value)

tensor([[-0.3527, -0.3015]])


Success! We now have the model output, meaning you just completed your first intervention request using Proxies.

These requests are handled at the soonest possible moment they can be completed. In this case, right after the model's output was computed. We call this process `interleaving`.
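The save-or-free rule described earlier can be pictured with a small reference-counting toy. This is a sketch under the assumption that a value is dropped once no listener needs it; `ToyProxy` and its methods are hypothetical and do not mirror `nnsight`'s actual classes.

```python
class ToyProxy:
    """Toy proxy: holds a value until every listener has consumed it."""

    def __init__(self):
        self.value = None
        self.remaining_listeners = 0
        self.saved = False

    def add_listener(self):
        self.remaining_listeners += 1

    def save(self):
        self.saved = True
        return self

    def set_value(self, value):
        self.value = value
        self._maybe_free()

    def listener_done(self):
        self.remaining_listeners -= 1
        self._maybe_free()

    def _maybe_free(self):
        # Free the value once nobody needs it, unless the user saved it.
        if self.remaining_listeners <= 0 and not self.saved:
            self.value = None


unsaved = ToyProxy()
unsaved.set_value([1.0, 2.0])  # no listeners, not saved: freed at once

saved = ToyProxy().save()
saved.set_value([1.0, 2.0])  # saved: survives

shared = ToyProxy()
shared.add_listener()
shared.set_value([3.0])
alive_while_needed = shared.value is not None
shared.listener_done()  # last listener finished: freed

print(unsaved.value, saved.value, alive_while_needed, shared.value)
# None [1.0, 2.0] True None
```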

What else can we request? There's nothing special about the model itself versus its submodules. Just like we saved the output of the model as a whole, we can save the output of any of its submodules. To get to them we use normal Python attribute syntax, and we know where the modules are because we printed out the model earlier:

In [9]:
print(model)

Sequential(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)


In [10]:
with model.trace(input) as tracer:

  l1_output = model.layer1.output.save()

print(l1_output.value)

tensor([[ 0.2428,  0.8037, -0.5060, -0.5668, -0.0290,  0.8558, -0.5593, -0.1622,
          0.1340,  0.1576]])


Let's do the same for the input of layer2. While we're at it, let's also drop the `as tracer`, as we won't be needing the tracer object itself for a few sections:

In [11]:
with model.trace(input):

  l2_input = model.layer2.input.save()

print(l2_input.value)

((tensor([[ 0.2428,  0.8037, -0.5060, -0.5668, -0.0290,  0.8558, -0.5593, -0.1622,
          0.1340,  0.1576]]),), {})


<details>
  <summary>On module inputs</summary>

  ---

  Notice how the value for l2_input was not just a single tensor.
  The type/shape of values from .input is in the form of:

      tuple(tuple(args), dictionary(kwargs))

  Where the first index of the tuple is itself a tuple of all positional arguments, and the second index is a dictionary of the keyword arguments.

  ---

</details>
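The `(args, kwargs)` layout is the same one Python itself uses for `*args` and `**kwargs`. The short sketch below uses a hypothetical helper, `capture_call`, purely to show how such a value is unpacked; no `nnsight` call is involved.

```python
def capture_call(*args, **kwargs):
    """Return the call's arguments in the same (args, kwargs) layout."""
    return (args, kwargs)


captured = capture_call([0.2, 0.8], scale=2.0)

args, kwargs = captured   # first element: positional args, second: keyword args
first_positional = args[0]

print(captured)          # (([0.2, 0.8],), {'scale': 2.0})
print(first_positional)  # [0.2, 0.8]
```

By the same indexing, the tensor that was passed to `layer2` above sits at position `[0][0]` of the saved input value.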


Now that we can access activations, we also want to do some post-processing on them. Let's find out which dimension of layer1's output has the highest value.

## Functions, Methods, and Operations

We could do this by calling `torch.argmax(...)` after the tracing context... or we can just leverage the fact that `nnsight` handles functions and methods within the tracing context, by creating a Proxy request for it:

In [12]:
with model.trace(input):

  # Note we don't need to call .save() on the output,
  # as we're only using its value within the tracing context.
  l1_output = model.layer1.output

  l1_amax = torch.argmax(l1_output, dim=1).save()

print(l1_amax[0])

tensor(5)


Nice! That worked seamlessly, but hold on, how come we didn't need to call `.value[0]` on the result? In previous sections, we were just being explicit to get an understanding of Proxies and their values. In practice, however, `nnsight` knows that outside of the tracing context we only care about the actual value, and so printing, indexing, and applying functions all immediately return and reflect the data in `.value`. So for the rest of the tutorial we won't use it.

The same principles work for methods and operations as well:

In [13]:
with model.trace(input):

  value = (model.layer1.output.sum() + model.layer2.output.sum()).save()

print(value)

tensor(-0.2837)
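One way to picture how an expression such as `a.sum() + b.sum()` can be written before any value exists is a deferred-computation object. The `Lazy` class below is a hypothetical toy, not `nnsight`'s Proxy: each operation returns another deferred object, and nothing is computed until `evaluate()` is called.

```python
class Lazy:
    """Toy deferred value: records an operation now, computes it later."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument function producing the value

    def __add__(self, other):
        # Adding two Lazy objects yields another Lazy, not a number.
        return Lazy(lambda: self.evaluate() + other.evaluate())

    def sum(self):
        return Lazy(lambda: sum(self.evaluate()))

    def evaluate(self):
        return self._compute()


layer1_out = Lazy(lambda: [1.0, 2.0, 3.0])
layer2_out = Lazy(lambda: [0.5, -0.5])

total = layer1_out.sum() + layer2_out.sum()  # still nothing computed
is_deferred = isinstance(total, Lazy)

print(is_deferred, total.evaluate())  # True 6.0
```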


Getting and analyzing the activations from various points in a model can be really insightful, and a number of ML techniques do exactly that. However, we often want not only to view the computation of a model, but to influence it as well.

## Setting

To demonstrate editing the flow of information through the model, let's set the first dimension of the first layer's output to 0. `NNsight` makes this really easy using the familiar assignment `=` operator:

In [14]:
with model.trace(input):

  # Save the output before the edit to compare.
  # Notice we call .clone() before .save().
  l1_output_before = model.layer1.output.clone().save()

  # Access the 0th index of the hidden state dimension and set it to 0.
  model.layer1.output[:, 0] = 0

  # Save the output after to see our edit.
  l1_output_after = model.layer1.output.save()

print(l1_output_before.value)
print(l1_output_after.value)

tensor([[ 0.2428,  0.8037, -0.5060, -0.5668, -0.0290,  0.8558, -0.5593, -0.1622,
          0.1340,  0.1576]])
tensor([[ 0.0000,  0.8037, -0.5060, -0.5668, -0.0290,  0.8558, -0.5593, -0.1622,
          0.1340,  0.1576]])


Note the use of `.clone()` to save the output of layer 1 before we applied an edit. Because `.save()` returns a reference to values *after* computation, we use `.clone()` to copy the value before edits are applied.
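The reference-versus-copy distinction is ordinary Python behaviour and can be reproduced with nested lists. In this sketch, `copy.deepcopy` plays the role that `.clone()` plays for tensors; the numbers are taken from the output above only for illustration.

```python
import copy

activations = [[0.2428, 0.8037, -0.5060]]

reference = activations                # same object, like saving without a clone
snapshot = copy.deepcopy(activations)  # independent copy, like .clone()

# In-place edit, analogous to `output[:, 0] = 0`.
for row in activations:
    row[0] = 0.0

print(reference[0][0], snapshot[0][0])  # 0.0 0.2428
```

The reference sees the edit because it points at the same data; only the snapshot preserves the value from before the edit.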

## Gradients

Awesome! Now we know how to access and intervene on a simple model's forward pass. How about the backward pass?

Let's declare the same toy model again, this time without `.requires_grad_(False)`, so its parameters require gradients (the PyTorch default).

In [15]:
net = torch.nn.Sequential(OrderedDict([
    ('layer1', torch.nn.Linear(input_size, hidden_dims)),
    ('layer2', torch.nn.Linear(hidden_dims, output_size)),
]))

model = NNsight(net)

As a simple example, let's do a backward pass on the sum of the logits. We can save gradients just as we do activations, calling `.save()` on the gradients `.grad`.

In [18]:
with model.trace(input):
    l1_output = model.layer1.output.save()
    
    l1_gradients = model.layer1.output.grad.save()

    model.output.sum().backward()

print(l1_output)
print(l1_gradients)

tensor([[ 0.1573,  0.8419, -0.1930,  0.2304, -0.2569,  0.1245,  0.2125, -0.4922,
         -0.2421, -0.2142]], grad_fn=<AddmmBackward0>)
tensor([[ 0.3589,  0.2165,  0.5089, -0.5849,  0.0326,  0.1254,  0.0956, -0.2620,
          0.1203,  0.2614]])


Note two things about the example above:
1. Inference mode is off by default.
2. We can call `.backward()` on a value within the tracing context just as we normally would.
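For a purely linear second layer, the gradient of the summed logits with respect to the hidden vector can be worked out by hand: each entry is the sum of one column of the second layer's weight matrix, independent of the input. The sketch below checks this with a finite difference in plain Python; the weights `W2`, bias `b2`, and hidden vector `h` are made-up values, not those of the model above.

```python
# Hypothetical layer2 parameters: 2 outputs, 3 hidden units.
W2 = [[0.5, -1.0, 2.0],
      [1.5, 0.25, -0.5]]
b2 = [0.1, -0.2]
h = [0.3, -0.7, 1.2]  # stands in for layer1's output


def loss(hidden):
    """Sum of layer2's outputs for a given hidden vector."""
    logits = [sum(w * x for w, x in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return sum(logits)


# Analytic gradient: d loss / d h_j is the sum of column j of W2.
analytic = [sum(row[j] for row in W2) for j in range(len(h))]

# Finite-difference estimate of the same quantity.
eps = 1e-6
numeric = []
for j in range(len(h)):
    bumped = list(h)
    bumped[j] += eps
    numeric.append((loss(bumped) - loss(h)) / eps)

print(analytic)  # [2.0, -0.75, 1.5]
```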

With that, we've learned the basics of getting and setting activations and gradients.

# Bigger

## LanguageModel

*I thought you said one trillion parameter models!*

Yes! NNsight provides the `LanguageModel` class which allows us to load models by their HuggingFace repo-id.

<details>
  <summary>On custom models</summary>

  ---

  You can also pass in custom models! Note that you'll have to pass in a tokenizer as well as the model. 

    model = LanguageModel(<CUSTOM_MODEL>, tokenizer=<TOKENIZER>, device_map="auto")

  ---

</details>

In [19]:
from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map="auto")

We can access the model's tokenizer with `.tokenizer`, and it will return the default HuggingFace tokenizer for that `AutoModel` class.

In [24]:
tokenizer = model.tokenizer

text = "NNsight is an API for transparent science on black box AI."

tokens = tokenizer.encode(text)
str_tokens = tokenizer.convert_ids_to_tokens(tokens)
decoded = tokenizer.decode(tokens)

print(tokens)
print(str_tokens)
print(decoded)

[6144, 18627, 318, 281, 7824, 329, 13245, 3783, 319, 2042, 3091, 9552, 13]
['NN', 'sight', 'Ġis', 'Ġan', 'ĠAPI', 'Ġfor', 'Ġtransparent', 'Ġscience', 'Ġon', 'Ġblack', 'Ġbox', 'ĠAI', '.']
NNsight is an API for transparent science on black box AI.


By default, the `LanguageModel` class will tokenize text inputs. Note that it does not prepend a `BOS` token. Let's test it out, and decode logits into tokens.

In [28]:
with model.trace(text):
    logits = model.output.logits
    tokens = logits.softmax(-1).argmax(-1).save()

decoded = tokenizer.decode(tokens[0])
print(decoded)

.. a open for thely. the holes..
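A small aside on the decoding step: softmax is monotonic, so taking the argmax of the probabilities picks the same token as taking the argmax of the raw logits. The plain-Python sketch below illustrates this with made-up scores; `softmax` and `argmax` here are local helper functions, not library calls.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [-1.2, 3.4, 0.7, 3.1]
probs = softmax(logits)

print(argmax(logits), argmax(probs))  # 1 1
```

The softmax is still useful when the probabilities themselves are needed, for example to report how confident the model is in its top token.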



Everything that we've done so far works exactly the same with LanguageModel - and across any NNsight model.

In [17]:
print(model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
  (generator): WrapperModule()
)


In [19]:
input = "NNsight is an API for transparent science on black box AI."

with model.trace(input):
    
    # Grab the attention output of block 2 (the third block).
    l2_out = model.transformer.h[2].attn.output[0].save()

    # Overwrite the attention input of block 0 with l2_out.
    model.transformer.h[0].attn.input[0][0][:] = l2_out


Note how we saved the output of a later layer, and set it as the value of an earlier layer's input. You can think of the operations within a context block as an unordered set of instructions that execute on a model's forward pass.

There are limits to this, however. Let's say we want to compare the effect of an intervention on the model's logits.

In [40]:
with model.trace("test"):
    out_one = model.lm_head.output.clone().save()
    model.transformer.h[0].attn.output[0][:] += 0.5
    out_two = model.lm_head.output.save()

Even though we called `.clone()` on `model.lm_head.output`, the intervention is applied to `out_one` anyway, because the unembedding, `lm_head`, runs after the transformer layers.
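The limitation is easy to see in a toy pipeline: a single forward pass produces a single final output, so a clean result and an edited result require two separate runs. The `run` function below is a hypothetical two-stage stand-in for a model, not `nnsight` code.

```python
def run(x, edit_hidden=None):
    """Two-stage toy forward pass with an optional edit between the stages."""
    hidden = x + 1  # stands in for an early layer
    if edit_hidden is not None:
        hidden = edit_hidden(hidden)
    return hidden * 10  # stands in for lm_head


clean = run(2)
edited = run(2, edit_hidden=lambda h: h + 0.5)

print(clean, edited)  # 30 35.0
```

Whatever is read after the final stage of one run already includes any edit made upstream in that run.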

We can use multiple invokes instead!

## Batching

Remember that `tracer` object that we dropped earlier? Well, we can call `.invoke()` on the `tracer` object within a context, and run multiple forward passes. 

This solves the problem from before! 

In [43]:
with model.trace() as tracer:
    with tracer.invoke("input") as invoker:
        out_one = model.lm_head.output.save()

    with tracer.invoke("input_tw") as invoker:
        model.transformer.h[0].attn.output[0][:] += 0.5
        out_two = model.lm_head.output.save()

print(out_one)
print(out_two)

tensor([[[-12.5967, -11.0741, -14.2588,  ..., -19.6397, -19.2086, -11.7618],
         [-16.2394, -11.5920, -14.8595,  ..., -24.8357, -23.7196, -16.7910],
         [-38.9201, -34.3693, -40.0272,  ..., -46.5929, -47.3829, -37.1189]]],
       device='cuda:0', grad_fn=<SliceBackward0>)
tensor([[[-31.6922, -30.7873, -34.1027,  ..., -39.5088, -39.5361, -32.2880],
         [-69.0204, -68.2261, -68.7214,  ..., -80.1794, -77.7888, -71.0367],
         [-84.8164, -82.7565, -86.1548,  ..., -95.8974, -94.3171, -85.1944]]],
       device='cuda:0', grad_fn=<SliceBackward0>)


The ability to call multiple invokes is actually pretty powerful. Tracer handles all operations as proxies, so we can freely pass values between `invoke` contexts.

In [29]:
input_one = "Three word phrase."
input_two = "Two words."

print(tokenizer.encode(input_one))
print(tokenizer.encode(input_two))

[12510, 1573, 9546, 13]
[7571, 2456, 13]
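These two prompts have different token counts, so they have to be brought to a common length before they can share one batch. The sketch below pads the shorter one in plain Python. The choice of left padding and of `50256` as the pad id are assumptions made for illustration; the source does not state how the batch is padded internally.

```python
PAD_ID = 50256  # assumption: GPT-2's end-of-text id reused as padding

seq_one = [12510, 1573, 9546, 13]  # "Three word phrase."
seq_two = [7571, 2456, 13]         # "Two words."


def pad_batch(sequences, pad_id):
    """Left-pad every sequence to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    batch = [[pad_id] * (width - len(seq)) + seq for seq in sequences]
    # 1 marks a real token, 0 marks padding.
    mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in sequences]
    return batch, mask


batch, mask = pad_batch([seq_one, seq_two], PAD_ID)

print(batch[1])  # [50256, 7571, 2456, 13]
print(mask[1])   # [0, 1, 1, 1]
```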


In [30]:
with model.trace() as tracer:

  with tracer.invoke(input_one):

    l2_output = model.transformer.h[2].attn.output[0]
    
    print(l2_output.shape)

  with tracer.invoke(input_two):

    model.transformer.h[2].attn.output[0][:] = l2_output
    
    print(model.transformer.h[2].attn.output[0].shape)

    output = model.output.save()

print(output.logits[:,-1,:])

torch.Size([1, 4, 768])
torch.Size([1, 3, 768])
tensor([[-115.6758, -115.0759, -115.5405,  ..., -123.3979, -123.5300,
         -109.5017],
        [  11.8462,   13.1548,    8.6907,  ...,   -1.0914,    0.4976,
           12.3454]], device='cuda:0', grad_fn=<SliceBackward0>)


## .next()

## Logit Lens

# I thought you said huge models?

## Remote execution

# Next Steps

# Advanced

Link to IOI patching
Link to function vectors

## IOI Patching

## Mamba Attention

# NOTES

In Graph, check module_proxy.proxy_value for tracer.invoker. If None, return node.value anyway. either its done or error!

Pass in input and immediately trace like .invoke from before.
No more tracer.output. Just access the output of the model as a proxy.

describe attribute access to get modules and call .output and .input

describe model is run only at end of with block and proxies are populated

describe you need .save() or else value is destroyed before you could access and it throws an error.

(note that .input is a tuple of (<args as tuple>, <kwargs as dict>))

In [None]:
with model.trace(input1):

  l2_input = model.layer2.input.save()

  output = model.output.save()

print(l2_input)
print(output)

((tensor([[0.4196, 0.4998, 0.6265, 0.3975, 0.6541, 0.6097, 0.4248, 0.3506, 0.6187,
         0.4576]], grad_fn=<SigmoidBackward0>),), {})
tensor([[0.6444, 0.4773]], grad_fn=<SigmoidBackward0>)


Notice no .value. You could still call it if you want to.
It's still a proxy, but when you do an operation on a completed proxy, it doesn't create a new proxy; it just returns the resultant value.

In [None]:
with model.trace(input1):

  output = model.output.save()

  print(type(output))
  print(output)

output = output * 2

print(type(output))
print(output)

<class 'nnsight.intervention.InterventionProxy'>
InterventionProxy (argument_1): FakeTensor(..., size=(1, 2), grad_fn=<SigmoidBackward0>)
<class 'torch.Tensor'>
tensor([[1.2888, 0.9547]], grad_fn=<MulBackward0>)


The Tracer object also allows attribute access for the model. This is nice when you want to use NNsight inline to quickly trace a model.

In [None]:
with NNsight(net).trace(input1) as tracer:

  output = tracer.output.save()

print(output)

tensor([[0.6444, 0.4773]], grad_fn=<SigmoidBackward0>)


Can still do the full-featured multi-invoke version of trace like .forward before

Can use proxy variable in between invokes
It handles batching them for you and the shapes of values in them will be only of the size of the batch.

In [None]:
with model.trace() as tracer:

  with tracer.invoke(input2):

    l2_input = model.layer2.input

  with tracer.invoke(input1):

    model.layer2.input = l2_input

    output = model.layer2.output.save()

print(output)




tensor([[ 0.5843, -0.0754]], grad_fn=<SliceBackward0>)


If you just want to run the model and get the output, do trace=False

No tracing context, just output

(note if you pass input directly to .trace, you can pass a dict of invoker_args for input preprocessing options like from tracer.invoke(...))

---



In [None]:
output = model.trace(input1, trace=False)

print(output)

tensor([[0.6444, 0.4773]], grad_fn=<SigmoidBackward0>)


The NNsight class is designed to be extended in order to enable tracing functionality for more complex models.

LanguageModel is one such extension.

Use hf repo_id to load and trace model

device_map is an underlying HF transformer arg to auto assign parameters to gpu

In [None]:
from nnsight import LanguageModel

In [None]:
model = LanguageModel('gpt2', device_map="auto")

print(model)


GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)


Works just like the simple case

In [None]:
with model.trace("Hello World!"):


  logits = model.lm_head.output.save()

One extension LanguageModel adds is to have the exact same tracing features as before, but using the .generate method on HF transformer models.

You can call .next() on modules to move the "pointer" to which iterations of .output and .input refer to.

In [None]:
with model.generate("The Eiffel Tower is in the city of", max_new_tokens=3) as tracer:

  logits_1 = model.lm_head.output.save()

  logits_2 = model.lm_head.next().output.save()

  logits_3 = model.lm_head.next().output.save()

print(logits_1)
print(logits_2)
print(logits_3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


IndexError: list index out of range