<a href="https://colab.research.google.com/github/ndif-team/nnsight/blob/main/NNsight_Walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p align="center">
  <img src="https://nnsight.net/_static/images/nnsight_logo.svg" alt="nnsight" width="300"/>
</p>

# **NNsight Walkthrough**

## Interpret and Manipulate the Internals of Deep Learning Models

**nnsight** is a Python library that gives you full access to the internals of neural networks during inference. Whether you're running models locally or remotely via [NDIF](https://ndif.us/), nnsight lets you:

- **Access activations** at any layer during forward passes
- **Modify activations** to study causal effects  
- **Compute gradients** with respect to intermediate values
- **Batch interventions** across multiple inputs efficiently

This walkthrough will teach you nnsight from the ground up, starting with the core mental model and building to advanced features.

## Table of Contents

1. [Getting Started](#getting-started) - Setup and your first trace
2. [Accessing Activations](#accessing-activations) - Reading module outputs
3. [Modifying Activations](#modifying-activations) - Intervention basics
4. [Invokers and Batching](#invokers-and-batching) - Multiple inputs, serial interventions
5. [Multi-Token Generation](#multi-token-generation) - Iterating over generation steps
6. [Gradients](#gradients) - Accessing and modifying gradients
7. [Advanced Features](#advanced-features) - Source tracing, caching, barriers, and more
8. [Model Editing](#model-editing) - Persistent modifications
9. [Remote Execution](#remote-execution) - Running on NDIF

<a name="getting-started"></a>
# 1. Getting Started

Let's set up nnsight and run our first trace.

## Installation

In [None]:
# Install nnsight
!pip install nnsight
!pip install --upgrade transformers torch

from IPython.display import clear_output
clear_output()

## A Tiny Model

We'll start with a simple two-layer neural network to demonstrate the core concepts clearly.

In [None]:
from collections import OrderedDict
import torch

input_size = 5
hidden_dims = 10
output_size = 2

net = torch.nn.Sequential(
    OrderedDict([
        ("layer1", torch.nn.Linear(input_size, hidden_dims)),
        ("layer2", torch.nn.Linear(hidden_dims, output_size)),
    ])
).requires_grad_(False)

## Wrapping with NNsight

The `NNsight` class wraps any PyTorch model to enable intervention on its internals. It reflects the same module hierarchy as the original model:

In [None]:
import nnsight
from nnsight import NNsight

model = NNsight(net)

Printing a PyTorch model shows its named hierarchy of modules, which tells you how to address each sub-component directly. Printing the wrapped model shows the same hierarchy:


In [None]:
print(model)


## Python Contexts

Before we actually get to using the model, let's talk about Python contexts.

Python contexts define a scope using the `with` statement and are often used to create some object, or initiate some logic, that you later want to destroy or conclude.

The most common application is opening files:

```python
with open('myfile.txt', 'r') as file:
    text = file.read()
```

Python uses the `with` keyword to enter a context-like object. This object defines logic to be run at the start of the `with` block, as well as logic to be run when exiting. When using `with` for a file, entering the context opens the file and exiting the context closes it. Being within the context means we can read from the file.
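
To make the enter/exit logic concrete, here is a minimal sketch of a hand-written context manager. The `Recorder` class is a hypothetical name made up for illustration and is not part of nnsight; it only logs when the block starts and ends.

```python
class Recorder:
    """Toy context manager that logs when its block is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Runs at the start of the `with` block
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends, even if the body raised an exception
        self.events.append("exit")
        return False  # do not suppress exceptions


recorder = Recorder()
with recorder as r:
    r.events.append("body")

print(recorder.events)  # ['enter', 'body', 'exit']
```

The exit hook is the natural place for deferred work, since it always runs after the body.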

Simple enough! Now we can discuss how nnsight uses contexts to enable intuitive access into the internals of a neural network.


<a name="accessing-activations"></a>
# 2. Accessing Activations

Now let's access the model's internals using the tracing context.

## The Tracing Context

The main tool in nnsight is a context for tracing. We enter the tracing context by calling `model.trace(<input>)` on an NNsight model, which defines how we want to run the model. Inside the context, we will be able to customize how the neural network runs. The model is actually run upon exiting the tracing context:

In [None]:
input = torch.rand((1, input_size))

with model.trace(input):
    # Your intervention code goes here
    # The model runs when the context exits
    pass

But where's the output? To get that, we'll have to learn how to request it from within the tracing context.

## The `.input` and `.output` Properties

When we wrapped our neural network with the `NNsight` class, this added a couple of properties to each module in the model (including the root model itself). The two most important ones are `.input` and `.output`:

```python
model.input   # The input to the model
model.output  # The output from the model
```

The names are self-explanatory. They correspond to the inputs and outputs of their respective modules during a forward pass. We can use these attributes inside the `with` block to access values at any point in the network.

Let's try accessing the model's output:

In [None]:
with model.trace(input):
    output = model.output

print(output)

Oh no, an error! "Accessing value before it's been set."

Why doesn't our `output` have a value? Values accessed inside a trace only exist during the trace. They will only persist after the context if we call `.save()` on them. This helps reduce memory costs - we only keep what we explicitly ask for.

## Saving Values with `.save()`

Adding `.save()` fixes the error:

In [None]:
with model.trace(input):
    output = model.output.save()

print(output)


Success! We now have the model output. We just completed our first intervention using nnsight.

The `.save()` method tells nnsight "I want to use this value after the trace ends."

> **💡 Tip:** There's also `nnsight.save(value)` which is the preferred alternative. It works on any value and doesn't require the object to have a `.save()` method:
> ```python
> output = nnsight.save(model.output)
> ```
> Both approaches work, but `nnsight.save()` is more explicit and works in more cases.


## Accessing Submodule Outputs

Just like we saved the model's output, we can access any submodule's output. Remember when we printed the model earlier? That showed us `layer1` and `layer2` - we can access those directly:

In [None]:
with model.trace(input):
    layer1_output = model.layer1.output.save()
    layer2_output = model.layer2.output.save()

print("Layer 1 output:", layer1_output)
print("Layer 2 output:", layer2_output)

## Accessing Module Inputs

We can also access the inputs to any module using `.input`:

| Property | Returns |
|----------|---------|
| `.output` | The module's return value |
| `.input` | The first positional argument to the module |
| `.inputs` | All inputs as `(args_tuple, kwargs_dict)` |

In [None]:
with model.trace(input):
    layer2_input = model.layer2.input.save()

print("Layer 2 input:", layer2_input)
print("(Notice it equals layer1 output!)")
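
The `(args_tuple, kwargs_dict)` layout from the table can be illustrated in plain Python. This is only a sketch of the shape: `split_call` is a hypothetical helper, not an nnsight function, and the string arguments stand in for tensors.

```python
def split_call(*args, **kwargs):
    """Hypothetical helper: return a call's arguments in the (args, kwargs) layout."""
    return args, kwargs


# Pretend a module was called as module("hidden", "mask", use_cache=True)
inputs = split_call("hidden", "mask", use_cache=True)
args, kwargs = inputs
first_input = args[0]  # analogous to `.input`: the first positional argument

print(inputs)       # (('hidden', 'mask'), {'use_cache': True})
print(first_input)  # hidden
```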

## Operations on Values

Since you're working with real tensors, you can apply any PyTorch operation:

In [None]:
with model.trace(input):
    layer1_out = model.layer1.output
    
    # Apply operations - these are real tensor operations!
    max_idx = torch.argmax(layer1_out, dim=1).save()
    total = (model.layer1.output.sum() + model.layer2.output.sum()).save()

print("Max index:", max_idx)
print("Total:", total)

## The Core Paradigm: Interleaving

When you write intervention code inside a `with model.trace(...)` block, here's what actually happens:

1. **Your code is captured** - nnsight extracts the code inside the `with` block
2. **The code is compiled** into an executable function  
3. **Your code runs alongside the model** - as the model executes its forward pass, your intervention code runs in step with it
4. **Your code waits for values** - when you access `.output`, your code pauses until the model reaches that point
5. **The model provides values via hooks** - PyTorch hooks inject values into your waiting code
6. **Your code can modify values** - before the forward pass continues, you can change activations

This process is called **interleaving** - your intervention code and the model's forward pass take turns executing, synchronized at specific points (module inputs and outputs).

```
┌─────────────────────────────────────────────────────────────────────┐
│  Forward Pass (main)              Intervention Code (your code)     │
│  ─────────────────────            ─────────────────────────────     │
│                                                                     │
│  model(input)                     # Your code starts                │
│       │                                    │                        │
│       ▼                                    ▼                        │
│  layer1.forward()                 hs = model.layer1.output          │
│       │                                    │                        │
│       │──── hook provides value ──────────►│                        │
│       │                                    │                        │
│       │◄─── your code continues ────────── │                        │
│       │     (can modify value)             │                        │
│       ▼                                    ▼                        │
│  layer2.forward()                 out = model.layer2.output         │
│       │                                    │                        │
│       ▼                                    ▼                        │
│  return output                    # Your code finishes              │
└─────────────────────────────────────────────────────────────────────┘
```
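The turn-taking described above can be sketched with the standard library alone. This is a toy model of the idea, not nnsight's implementation: the two "layers" are plain functions, and the `ready` / `released` events are hypothetical names standing in for the hooks that hand values back and forth.

```python
import threading

LAYERS = [("layer1", lambda v: v + 1), ("layer2", lambda v: v * 10)]

values = {}  # hook point name -> current activation
ready = {name: threading.Event() for name, _ in LAYERS}     # model -> user
released = {name: threading.Event() for name, _ in LAYERS}  # user -> model
log = []


def forward(x):
    """Toy forward pass: pauses at each hook point until the user code releases it."""
    for name, fn in LAYERS:
        values[name] = fn(x)
        log.append(f"model: computed {name}")
        ready[name].set()      # hand the value to the intervention code
        released[name].wait()  # wait until the intervention code is done with it
        x = values[name]       # pick up any modification
    return x


def intervention():
    """Toy intervention code: waits for each value in execution order."""
    ready["layer1"].wait()
    values["layer1"] = 0       # modify the activation
    log.append("user: zeroed layer1")
    released["layer1"].set()

    ready["layer2"].wait()
    log.append(f"user: saw layer2 = {values['layer2']}")
    released["layer2"].set()


worker = threading.Thread(target=intervention)
worker.start()
result = forward(5)
worker.join()

print(result)  # 0
for line in log:
    print(line)
```

Because each side blocks until the other signals, the log always alternates between the model and the user code, and zeroing `layer1` changes what `layer2` computes.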

**Key insight:** 

Because your code waits for values as the forward pass progresses, you **must access modules in the order they execute**.

✅ **Correct:** Access layer 0, then layer 5
```python
with model.trace("Hello"):
    layer0_out = model.layers[0].output.save()  # Waits for layer 0
    layer5_out = model.layers[5].output.save()  # Then waits for layer 5
```

❌ **Wrong:** Access layer 5, then layer 0
```python
with model.trace("Hello"):
    layer5_out = model.layers[5].output.save()  # Waits for layer 5
    layer0_out = model.layers[0].output.save()  # ERROR! Layer 0 already executed
    # Raises OutOfOrderError
```

When you try to access a module that has already executed, nnsight raises an `OutOfOrderError`. This is because the forward pass has already moved past that point - you missed your chance to intercept that value.
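
The ordering rule can be reproduced with a small toy that walks through an execution order exactly once. `MissedValueError` and `run` are hypothetical names used only for this sketch (the real exception is nnsight's `OutOfOrderError`), and every requested name is assumed to exist in the execution order.

```python
class MissedValueError(Exception):
    """Hypothetical stand-in for nnsight's OutOfOrderError."""


def run(execution_order, requests):
    """Serve requests while moving forward once through execution_order."""
    position = 0
    served = []
    for name in requests:
        try:
            # Search only from the current position onward; the pass never rewinds
            position = execution_order.index(name, position) + 1
        except ValueError:
            raise MissedValueError(f"{name} already executed") from None
        served.append(name)
    return served


order = ["layer0", "layer1", "layer5"]
print(run(order, ["layer0", "layer5"]))  # ['layer0', 'layer5']

try:
    run(order, ["layer5", "layer0"])
except MissedValueError as err:
    print("Error:", err)  # Error: layer0 already executed
```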

<a name="modifying-activations"></a>
# 3. Modifying Activations

Now for the powerful part - modifying activations during the forward pass.

## In-Place Modification

Use indexing with `[:]` for in-place modifications:

In [None]:
with model.trace(input):
    # Save original (clone first since we'll modify in-place)
    before = model.layer1.output.clone().save()
    
    # Zero out the first dimension
    model.layer1.output[:, 0] = 0
    
    # Save modified
    after = model.layer1.output.save()

print("Before:", before)
print("After: ", after)

## Replacement

You can also replace an output entirely:

In [None]:
with model.trace(input):
    original = model.layer1.output.clone()
    
    # Add noise to the activation
    noise = 0.1 * torch.randn_like(original)
    model.layer1.output = original + noise
    
    modified = model.layer1.output.save()

print("Modified output:", modified)

## Error Handling

If you make an error (like invalid indexing), nnsight provides clear error messages with line numbers:

In [None]:
# This will fail because hidden_dims=10, so valid indices are 0-9
try:
    with model.trace(input):
        model.layer1.output[:, hidden_dims] = 0  # Index 10 is out of bounds!
except IndexError as e:
    print("Caught error:", e)

You can toggle detailed error messages with `nnsight.CONFIG.APP.DEBUG`:

```python
nnsight.CONFIG.APP.DEBUG = True  # Enable (default)
nnsight.CONFIG.APP.DEBUG = False  # Disable
```

<a name="invokers-and-batching"></a>
# 4. Invokers and Batching

So far we've traced single inputs. But often we want to:
1. Process multiple inputs efficiently in one forward pass (batching)
2. Define separate sets of intervention logic that run serially

Both are achieved with **invokers**.

## The Invoker Context

When you call `.trace(input)`, nnsight actually creates two contexts:
1. The **tracer context** - manages the overall trace
2. The **invoker context** - defines the input and interventions for that input

You can create multiple invokers explicitly:

In [None]:
with model.trace() as tracer:  # No input here
    
    with tracer.invoke(input):  # First invoker with input
        out1 = model.output.save()
    
    with tracer.invoke(input * 2):  # Second invoker with different input
        out2 = model.output.save()

print("Output 1:", out1)
print("Output 2:", out2)

## How Invokers Work: Serial Execution

Each invoker runs its intervention code **serially** - one after another. They wait for values and execute in order:

```
┌──────────────────────────────────────────────────────────────────┐
│  Invoke 1 starts           │  Invoke 2 starts (after 1 finishes) │
│       │                    │       │                             │
│       ▼                    │       ▼                             │
│  Wait for layer1.output    │  Wait for layer1.output             │
│       │                    │       │                             │
│       ▼                    │       ▼                             │
│  Wait for layer2.output    │  Wait for layer2.output             │
│       │                    │       │                             │
│       ▼                    │       ▼                             │
│  Invoke 1 finishes         │  Invoke 2 finishes                  │
└──────────────────────────────────────────────────────────────────┘
```

This serial execution means you can **reference values from earlier invokes**:

In [None]:
with model.trace() as tracer:
    
    with tracer.invoke(input):
        # Save layer1 output from first input
        layer1_out = model.layer1.output.save()
    
    with tracer.invoke(input):
        # Use it in the second invoker!
        model.layer1.output[:] = layer1_out * 0  # Zero it out
        modified_out = model.output.save()
    
    with tracer.invoke(input):
        original_out = model.output.save()

print("Original output:", original_out)
print("Modified output:", modified_out)

## Batching: Efficient Multi-Input Processing

When you have multiple invokers, their inputs are **batched together** into one forward pass. This is much more efficient than running separate traces:

```python
# Inefficient: 2 forward passes
with model.trace(input1):
    out1 = model.output.save()
with model.trace(input2):
    out2 = model.output.save()

# Efficient: 1 forward pass (batched)
with model.trace() as tracer:
    with tracer.invoke(input1):
        out1 = model.output.save()
    with tracer.invoke(input2):
        out2 = model.output.save()
```

Each invoker's intervention code only sees its own slice of the batch.
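
As a rough picture of the bookkeeping this implies, the sketch below concatenates per-invoke inputs into one batch and records which rows belong to which invoke. It uses plain lists instead of tensors, and `batch_invokes` is a hypothetical helper, so treat it as an illustration of the idea and not as nnsight's batching code.

```python
def batch_invokes(invoke_inputs):
    """Concatenate per-invoke inputs and record each invoke's (start, stop) rows."""
    batch, slices = [], []
    for rows in invoke_inputs:
        start = len(batch)
        batch.extend(rows)
        slices.append((start, len(batch)))
    return batch, slices


# Two invokes: the first contributes one row, the second contributes two
batch, slices = batch_invokes([[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]])

print(len(batch))  # 3
print(slices)      # [(0, 1), (1, 3)]

# The second invoke's code would only see its own rows of the batch
start, stop = slices[1]
print(batch[start:stop])  # [[3.0, 4.0], [5.0, 6.0]]
```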

## Prompt-less Invokers: Operating on the Full Batch

You can call `.invoke()` with no arguments to create an invoker that operates on **all inputs**:

In [None]:
with model.trace() as tracer:
    with tracer.invoke(input):
        pass  # First input
    
    with tracer.invoke(input * 2):
        pass  # Second input
    
    # No-arg invoke: sees ALL inputs
    with tracer.invoke():
        all_outputs = model.output.save()

print("All outputs shape:", all_outputs.shape)  # Batch size = 2

Prompt-less invokers are useful when you want to:
- Apply the same intervention to all inputs
- Collect all outputs together
- Add another serial intervention set without changing the input

<a name="multi-token-generation"></a>
# 5. Multi-Token Generation

Let's scale up to a real language model and handle multi-token generation.

## Loading a Language Model

`LanguageModel` is a subclass of `NNsight` with built-in tokenization and generation support:

In [None]:
from nnsight import LanguageModel

llm = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)

print(llm)

With `LanguageModel`, you can pass strings directly - tokenization happens automatically:

In [None]:
with llm.trace("The Eiffel Tower is in the city of"):
    hidden_states = llm.transformer.h[-1].output[0].save()
    logits = llm.lm_head.output.save()

print("Hidden states shape:", hidden_states.shape)
print("Predicted token:", llm.tokenizer.decode(logits[0, -1].argmax()))

## Generation with `.generate()`

For multi-token generation, use `.generate()` instead of `.trace()`:

In [None]:
with llm.generate("The Eiffel Tower is in", max_new_tokens=3) as tracer:
    output = llm.generator.output.save()

print(llm.tokenizer.decode(output[0]))

## Iterating Over Generation Steps

During generation, the same modules are called multiple times (once per token). Use `tracer.iter` to iterate over these steps:

In [None]:
with llm.generate("The Eiffel Tower is in", max_new_tokens=3) as tracer:
    tokens_per_step = list().save()
    
    # Iterate over ALL generation steps
    with tracer.iter[:]:
        token = llm.lm_head.output[0, -1].argmax(dim=-1)
        tokens_per_step.append(token)

print("Tokens:", llm.tokenizer.batch_decode(tokens_per_step))

## Iteration Patterns

`tracer.iter` accepts slices, indices, or lists:

| Pattern | Meaning |
|---------|---------|
| `tracer.iter[:]` | All steps |
| `tracer.iter[0]` | First step only |
| `tracer.iter[1:3]` | Steps 1 and 2 |
| `tracer.iter[::2]` | Every other step |
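
Assuming `tracer.iter` follows ordinary Python indexing, as the table suggests, you can preview which steps a pattern selects by applying it to a list of step indices. `selected_steps` is a hypothetical helper written for this sketch, and the five-step generation is an assumed example.

```python
def selected_steps(pattern, num_steps):
    """Return the generation step indices that a slice, int, or list selects."""
    steps = list(range(num_steps))
    if isinstance(pattern, slice):
        return steps[pattern]
    if isinstance(pattern, int):
        return [steps[pattern]]
    return [steps[i] for i in pattern]


num_steps = 5  # assumed: a generation with five steps

print(selected_steps(slice(None), num_steps))           # [0, 1, 2, 3, 4]
print(selected_steps(0, num_steps))                     # [0]
print(selected_steps(slice(1, 3), num_steps))           # [1, 2]
print(selected_steps(slice(None, None, 2), num_steps))  # [0, 2, 4]
print(selected_steps([0, 3], num_steps))                # [0, 3]
```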

## Conditional Per-Step Interventions

Use `as step_idx` to get the current step index for conditional logic:

In [None]:
with llm.generate("Hello", max_new_tokens=5) as tracer:
    hidden_states = list().save()
    
    with tracer.iter[:] as step_idx:
        # Only intervene on step 2
        if step_idx == 2:
            llm.transformer.h[0].output[0][:] = 0
        
        hidden_states.append(llm.transformer.h[-1].output[0].clone())

print(f"Collected {len(hidden_states)} hidden states")

## Shorthand: `tracer.all()`

`tracer.all()` is equivalent to `tracer.iter[:]`:

In [None]:
with llm.generate("Hello", max_new_tokens=3) as tracer:
    tokens = list().save()
    
    with tracer.all():
        tokens.append(llm.lm_head.output[0, -1].argmax(dim=-1))

print("All tokens:", llm.tokenizer.batch_decode(tokens))

## Manual Step Advancement: `.next()`

For finer control, use `.next()` to manually advance through steps:

In [None]:
with llm.generate("Hello", max_new_tokens=3) as tracer:
    # Step 0 (automatic)
    hs0 = llm.transformer.h[-1].output[0].save()
    
    tracer.next()  # Move to step 1
    
    hs1 = llm.transformer.h[-1].output[0].save()

print("Step 0 shape:", hs0.shape)
print("Step 1 shape:", hs1.shape)

<a name="gradients"></a>
# 6. Gradients

nnsight supports gradient access and modification through a backward tracing context.

## The `with tensor.backward():` Context

To access gradients, use the tensor's `.backward()` as a context manager:

In [None]:
with llm.trace("Hello"):
    # Get hidden states and enable gradients
    hs = llm.transformer.h[-1].output[0]
    hs.requires_grad_(True)
    
    # Compute loss
    logits = llm.lm_head.output
    loss = logits.sum()
    
    # Access gradients inside backward context
    with loss.backward():
        grad = hs.grad.save()

print("Gradient shape:", grad.shape)

**Important rules for gradients:**

1. `.grad` is only accessible **inside** a `with tensor.backward():` context
2. `.grad` is a property of **tensors**, not modules
3. Access the module's `.output` (the tensor whose gradient you want) **before** entering the backward context

## Modifying Gradients

You can modify gradients just like activations:

In [None]:
with llm.trace("Hello"):
    hs = llm.transformer.h[-1].output[0]
    hs.requires_grad_(True)
    
    logits = llm.lm_head.output
    loss = logits.sum()
    
    with loss.backward():
        # Save original gradient
        original_grad = hs.grad.clone().save()
        
        # Modify gradient
        hs.grad[:] = 0
        
        # Save modified
        modified_grad = hs.grad.save()

print("Original grad mean:", original_grad.mean().item())
print("Modified grad mean:", modified_grad.mean().item())

<a name="advanced-features"></a>
# 7. Advanced Features

Let's explore some powerful advanced features.

## 7.1 Source Tracing

Sometimes you need to access values **inside** a module's forward pass, not just its inputs and outputs. The `.source` property enables this by rewriting the forward method to hook every operation.

In [None]:
# Print source to discover available operations
print(llm.transformer.h[0].attn.source)

In [None]:
with llm.trace("Hello"):
    # Access an internal operation by name
    attn_output = llm.transformer.h[0].attn.source.attention_interface_0.output.save()

print("Attention output type:", type(attn_output))

Source operations have the same interface as modules: `.output`, `.input`, `.inputs`. You can even trace into nested calls with recursive `.source`.

## 7.2 Caching Activations

Use `tracer.cache()` to automatically save all module outputs:

In [None]:
with llm.trace("Hello") as tracer:
    cache = tracer.cache()

# Access cached values after the trace
print("Layer 0 output shape:", cache['model.transformer.h.0'].output[0].shape)

# Attribute-style access also works
print("Same thing:", cache.model.transformer.h[0].output[0].shape)

Cache options:
```python
cache = tracer.cache(
    include_inputs=True,   # Also cache inputs
    include_output=True,   # Cache outputs (default)
    modules=[model.layer1, model.layer2],  # Specific modules only
)
```

## 7.3 Barriers for Synchronization

When sharing values between invokes, sometimes you need to ensure one invoke has computed a value before another uses it. Use `tracer.barrier()`:

In [None]:
with llm.trace() as tracer:
    barrier = tracer.barrier(2)  # Create barrier for 2 participants
    
    with tracer.invoke("The Eiffel Tower is in"):
        embeddings = llm.transformer.wte.output
        barrier()  # Wait here
    
    with tracer.invoke("_ _ _ _ _ _ _"):
        barrier()  # Wait here too
        llm.transformer.wte.output = embeddings  # Now safe to use
        out = llm.lm_head.output.argmax(dim=-1).save()

print("Prediction:", llm.tokenizer.decode(out[0, -1]))
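
The synchronization idea is the same as a classic thread barrier. The sketch below uses Python's standard `threading.Barrier` as an analogy only; it is not how `tracer.barrier()` is implemented, and the `producer` / `consumer` functions are hypothetical stand-ins for the two invokes.

```python
import threading

barrier = threading.Barrier(2)  # two participants, like tracer.barrier(2)
shared = {}
result = {}


def producer():
    shared["embeddings"] = [0.1, 0.2, 0.3]  # compute the value first
    barrier.wait()                          # then signal that it is ready


def consumer():
    barrier.wait()                          # block until the producer arrives
    result["copied"] = list(shared["embeddings"])  # now safe to read


# Start the consumer first to show that it really waits
threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(result["copied"])  # [0.1, 0.2, 0.3]
```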

## 7.4 Getting the Trace Result

Use `tracer.result()` to get the final output of the traced function:

In [None]:
with llm.trace("Hello") as tracer:
    hidden = llm.transformer.h[-1].output[0].save()
    result = tracer.result().save()

print("Result keys:", result.keys() if hasattr(result, 'keys') else type(result))

## 7.5 Module Skipping

Skip a module's execution entirely and substitute your own value:

In [None]:
with llm.trace("Hello"):
    # Get layer 0 output
    layer0_out = llm.transformer.h[0].output
    
    # Skip layer 1 - use layer 0's output instead
    llm.transformer.h[1].skip(layer0_out)
    
    # Layer 1's output now equals layer 0's output
    layer1_out = llm.transformer.h[1].output.save()

print("Skipped layer 1 - output substituted")

## 7.6 Early Stopping

Stop execution early when you only need early layers:

In [None]:
with llm.trace("Hello") as tracer:
    layer0 = llm.transformer.h[0].output[0].save()
    tracer.stop()  # Don't execute remaining layers

print("Early stop - only ran first layer")
print("Layer 0 shape:", layer0.shape)

## 7.7 Scanning (Shape Inference)

Get shapes without running the full model:

In [None]:
with llm.scan("Hello"):
    dim = llm.transformer.h[0].output[0].shape[-1]

print("Hidden dimension:", dim)

<a name="model-editing"></a>
# 8. Model Editing

Create persistent model modifications that apply to all future traces.

In [None]:
# First, get hidden states that predict "Paris"
with llm.trace("The Eiffel Tower is in the city of"):
    paris_hidden = llm.transformer.h[-1].output[0][:, -1, :].save()

# Create an edited model that always uses these hidden states
with llm.edit() as llm_edited:
    llm.transformer.h[-1].output[0][:, -1, :] = paris_hidden

# Original model: normal prediction
with llm.trace("Vatican is in the city of"):
    original = llm.lm_head.output.argmax(dim=-1).save()

# Edited model: always predicts "Paris"!
with llm_edited.trace("Vatican is in the city of"):
    modified = llm.lm_head.output.argmax(dim=-1).save()

print("Original:", llm.tokenizer.decode(original[0, -1]))
print("Edited:  ", llm.tokenizer.decode(modified[0, -1]))

## In-Place Editing

For edits that modify the original model:

```python
with llm.edit(inplace=True):
    llm.transformer.h[0].output[0][:] = 0
```

Clear in-place edits with:
```python
llm.clear_edits()
```
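To show the difference between an edited copy and an in-place edit, here is a toy model that stores edit functions and applies them on every call. `ToyModel` and its methods are hypothetical and only mimic the behavior described above; they do not reflect how nnsight stores edits internally.

```python
class ToyModel:
    """Hypothetical stand-in: a model whose stored edits run on every call."""

    def __init__(self, edits=None):
        self.edits = list(edits or [])

    def edit(self, fn, inplace=False):
        # inplace=False returns an edited copy and leaves the original untouched
        target = self if inplace else ToyModel(self.edits)
        target.edits.append(fn)
        return target

    def clear_edits(self):
        self.edits.clear()

    def __call__(self, x):
        hidden = x + 1           # stand-in for a layer's output
        for fn in self.edits:
            hidden = fn(hidden)  # stored edits apply on every call
        return hidden * 2


model = ToyModel()
edited = model.edit(lambda h: 0)

print(model(3))   # 8
print(edited(3))  # 0

model.edit(lambda h: 0, inplace=True)
print(model(3))   # 0

model.clear_edits()
print(model(3))   # 8
```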

<a name="remote-execution"></a>
# 9. Remote Execution (NDIF)

nnsight can run interventions on large models hosted by the National Deep Inference Fabric (NDIF). Everything works the same - just add `remote=True`.

## Setup

Get your API key at https://login.ndif.us, then configure:

In [None]:
from nnsight import CONFIG

CONFIG.set_default_api_key("YOUR_API_KEY")

Check available models at https://nnsight.net/status/

## Remote Tracing

Load a large model and run remotely:

In [None]:
import os
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"

llama = LanguageModel("meta-llama/Meta-Llama-3.1-8B")

# Just add remote=True - everything else is the same!
with llama.trace("The Eiffel Tower is in the city of", remote=True):
    hidden_states = llama.model.layers[-1].output.save()
    output = llama.output.save()

print("Hidden states shape:", hidden_states[0].shape)

## Sessions

Group multiple traces for efficiency:

In [None]:
with llama.session(remote=True) as session:
    
    with llama.trace("The Eiffel Tower is in the city of"):
        hs = llama.model.layers[31].output[0][:, -1, :]
        t1_out = llama.lm_head.output.argmax(dim=-1).save()
    
    with llama.trace("Buckingham Palace is in the city of"):
        llama.model.layers[1].output[0][:, -1, :] = hs
        t2_out = llama.lm_head.output.argmax(dim=-1).save()

print("T1:", llama.tokenizer.decode(t1_out[0, -1]))
print("T2:", llama.tokenizer.decode(t2_out[0, -1]))

## Streaming with `nnsight.local()`

Execute operations locally during remote execution:

In [None]:
with llama.trace("Hello", remote=True) as tracer:
    
    with nnsight.local():
        # This runs on your machine, not the server
        hs = llama.model.layers[-1].output[0]
        print("Local computation:", hs[0, 0, 0])  # Runs locally
    
    out = llama.lm_head.output.save()

# Next Steps

Congratulations! You've learned the core concepts of nnsight:

- **The interleaving paradigm** - your code runs alongside the model, waiting for values
- **Accessing and modifying activations** - `.output`, `.input`, `.save()`
- **Invokers and batching** - efficient multi-input processing
- **Multi-token generation** - `.iter`, `.all()`, `.next()`
- **Gradients** - `with tensor.backward():`
- **Advanced features** - source tracing, caching, barriers, and more
- **Remote execution** - running on NDIF

For more tutorials implementing classic interpretability techniques, visit [nnsight.net/tutorials](https://nnsight.net/tutorials).

For deep technical details, see the [NNsight.md](https://github.com/ndif-team/nnsight/blob/main/NNsight.md) document.

# Getting Involved!

Both nnsight and NDIF are in active development. Join us:

- **Discord:** [discord.gg/6uFJmCSwW7](https://discord.gg/6uFJmCSwW7)
- **Forum:** [discuss.ndif.us](https://discuss.ndif.us/)
- **Twitter/X:** [@ndif_team](https://x.com/ndif_team)
- **LinkedIn:** [National Deep Inference Fabric](https://www.linkedin.com/company/national-deep-inference-fabric/)

We'd love to hear about your work using nnsight! 💟