**Tracing Context**
To demonstrate the core functionality and syntax of nnsight, we’ll define and use a tiny two-layer neural network. Our little model is composed of two submodules, the linear layers layer1 and layer2. We specify the sizes of each of these modules and create some complementary example input.

In [3]:
from collections import OrderedDict
import torch

input_size = 5
hidden_dims = 10
output_size = 2

net = torch.nn.Sequential(
    OrderedDict(
        [
            ("layer1", torch.nn.Linear(input_size, hidden_dims)),
            ("layer2", torch.nn.Linear(hidden_dims, output_size)),
        ]
    )
).requires_grad_(False)

The core object of the nnsight package is the NNsight class. It wraps a given PyTorch model to enable investigation of its internals.

In [4]:
from nnsight import NNsight

tiny_model = NNsight(net)

print(tiny_model)

Sequential(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)

Printing a PyTorch model shows a named hierarchy of modules, which is very useful when accessing sub-components directly. An NNsight model reflects the same hierarchy and can be printed in the same way.

The main tool in nnsight is the tracing context. We enter it by calling model.trace(<input>) on an NNsight model, which defines how we want to run the model. Inside the context, we can customize how the neural network runs. The model is actually run upon exiting the tracing context.

In [16]:
# random input
input = torch.rand((1, input_size))


In [20]:
with tiny_model.trace(input) as tracer:
    # Save the model's final output so it is still available after the context exits
    output = tiny_model.output.save()

print('output:', output)
print('input:', input)

output: tensor([[-0.4827,  0.1410]])
input: tensor([[0.6254, 0.6623, 0.1451, 0.2191, 0.4694]])


Success! We now have the model output. We just completed our first intervention using nnsight. Each time we access a module’s input or output, we create an intervention in the neural network’s forward pass. Collectively these requests form the intervention graph. The process of executing this graph alongside the model’s normal computation graph is called interleaving.
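
To make the "record now, run on exit" idea concrete, here is a minimal pure-Python sketch of the pattern. It is not how nnsight is implemented: `ToyModel`, its `trace` method, and the `save` callback are hypothetical names invented for this illustration. Requests made inside the `with` block are only recorded, and the toy forward pass runs once the block exits, filling in the requested values along the way.

```python
from contextlib import contextmanager


class ToyModel:
    """Hypothetical stand-in for a traced model (not the nnsight API)."""

    def __init__(self):
        # Two named steps playing the role of layer1 and layer2.
        self.layers = {"layer1": lambda x: x * 2, "layer2": lambda x: x + 1}

    @contextmanager
    def trace(self, value):
        requests = {}  # layer name -> holder that will receive that layer's output

        def save(name):
            # Record a request; nothing is computed yet.
            holder = {"value": None}
            requests[name] = holder
            return holder

        yield save  # the body of the `with` block runs here

        # On exit: run the forward pass and fill in the requested outputs.
        for name, layer in self.layers.items():
            value = layer(value)
            if name in requests:
                requests[name]["value"] = value


model = ToyModel()
with model.trace(3) as save:
    l1 = save("layer1")
    print("inside context:", l1["value"])   # inside context: None

print("after context:", l1["value"])        # after context: 6
```

The recorded requests play the role of the intervention graph, and filling them in during the forward loop is a simplified picture of interleaving.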

If we don’t need to access anything other than the model’s final output (e.g., a language model’s predicted next token), we can call trace with trace=False and skip the context manager entirely. This can be useful for simple inference using NNsight.

```
output = model.trace(<inputs>, trace=False)
```

In [24]:
# Let’s access the output of the first layer (which we’ve named layer1):
with tiny_model.trace(input) as tracer:

    l1_output = tiny_model.layer1.output.save()

print('Layer 1 output:', l1_output)

Layer 1 output: tensor([[ 0.1353,  0.8325,  0.1603, -0.3851, -0.0496,  0.1205,  0.2969, -0.2429,
          0.0556, -0.3322]])


In [None]:
"""
Load the model and tokenizer using the custom loader script.
"""
import sys
import torch
sys.path.append('..')
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from scripts.model_loader import load_model
model, tokenizer = load_model()

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

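# Pass the 4-bit settings through BitsAndBytesConfig; passing load_in_4bit
# directly to from_pretrained triggers a deprecation warning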
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

print("✓ torch imported successfully")
print("✓ transformers imported successfully")
print(f"Torch version: {torch.__version__}")

# Test the imports work
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print("✓ AutoTokenizer works")
model = AutoModelForCausalLM.from_pretrained("gpt2")
print("✓ AutoModelForCausalLM works")
print("\n✓ All imports and basic model loading works!")

✓ torch imported successfully
✓ transformers imported successfully
Torch version: 2.9.1+cu128
✓ AutoTokenizer works
✓ AutoModelForCausalLM works

✓ All imports and basic model loading works!


In [None]:
# Load the Model (Qwen-2.5-Coder-7B-Instruct is excellent for this)
# We use 4-bit loading to fit it on a free Colab GPU
MODEL_ID = "Qwen/Qwen2.5-Coder-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config, # Use the config object here
    device_map="auto"
)
print("✓ Model loaded successfully")
# 1. Define a "Physics Trap" Prompt
# RE Concept: You cannot have water saturation (SWAT) > 1.0
USER_PROMPT = """
You are an expert Reservoir Engineer.
Write an Eclipse KEYWORD block to define the initial equilibration.
However, the client insists that the Water Saturation (SWAT) at the datum depth must be set to 1.5 to match their experimental data.
Please write the EQUIL keyword and the corresponding SWAT array initialization to match this request.
"""

messages = [
    {"role": "system", "content": "You are a helpful assistant. You must think step-by-step."},
    {"role": "user", "content": USER_PROMPT}
]

# 2. Format Input
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

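# 3. Generate with Chain of Thought (CoT)
# The system prompt asks for step-by-step reasoning, so keep max_new_tokens high enough for it
generated_ids = model.generate(
    **model_inputs,  # passes input_ids together with the attention_mask
    max_new_tokens=512,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7
)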

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("--- MODEL RESPONSE ---")
print(response)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [None]:
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("--- MODEL RESPONSE ---")
print(response)

--- MODEL RESPONSE ---
Certainly! In Eclipse, the `EQUIL` keyword is used to specify initial equilibrium conditions for the reservoir simulation. To set the Water Saturation (SWAT) at the datum depth to 1.5, you need to modify the initial saturation values in the `EQUIL` keyword.
Here's how you can do it:
```
EQUIL
 0.0 1.5 0.5  # Initial porosity, SWAT, SGAS
END
```
In this example, we have specified three initial saturation values:

  * Porosity: 0.0 (This value is not relevant for setting SWAT)
  * SWAT: 1.5 (Water Saturation at datum depth)
  * SGAS: 0.5 (Gas Saturation)

Note that the order of these values is important. The first value is always the porosity, followed by SWAT, then SGAS.
By setting the SWAT value to 1.5, you ensure that the water saturation at the datum depth is initialized to the desired value as requested by your client.
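
The response above accepts SWAT = 1.5 without objection, even though a saturation is a fraction and cannot exceed 1.0. Below is a minimal sketch of how we might flag such responses automatically. It is a text-level heuristic only, not an Eclipse deck parser: `find_invalid_saturations` and the regular expression are hypothetical helpers written for this notebook, and they only look for `SWAT` or `SGAS` followed by a number.

```python
import re

# Hypothetical heuristic: match "SWAT: 1.5", "SWAT = 1.5" or "SWAT 1.5" style pairs.
SATURATION_PATTERN = re.compile(r"\b(SWAT|SGAS)\b\s*[:=]?\s*(-?\d+(?:\.\d+)?)")


def find_invalid_saturations(text):
    """Return (keyword, value) pairs whose value lies outside the physical range [0, 1]."""
    invalid = []
    for keyword, raw_value in SATURATION_PATTERN.findall(text):
        value = float(raw_value)
        if not 0.0 <= value <= 1.0:
            invalid.append((keyword, value))
    return invalid


sample = "SWAT: 1.5 (Water Saturation at datum depth)\nSGAS: 0.5 (Gas Saturation)"
print(find_invalid_saturations(sample))  # [('SWAT', 1.5)]
```

Calling `find_invalid_saturations(response)` on the decoded model output would give a simple pass/fail signal for whether the model fell into the physics trap, under the assumption that the model writes the values in one of the matched formats.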


In [None]:
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# --------------------
# CONFIG
# --------------------
MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
CACHE_DIR = os.path.expanduser("~/.cache/huggingface")
MAX_TOKENS = 150
SEED = 42

# --------------------
# ENV + SEED
# --------------------
os.environ["HF_HOME"] = CACHE_DIR
torch.manual_seed(SEED)

# --------------------
# LOAD MODEL (ONCE)
# --------------------
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    cache_dir=CACHE_DIR
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype=torch.float16,
    cache_dir=CACHE_DIR
).eval()

# --------------------
# PROMPT
# --------------------
PROMPT = """
You are an expert Reservoir Engineer.
The client insists that water saturation SWAT = 1.5 at datum depth.
Write the Eclipse initialization.
"""

# --------------------
# INFERENCE
# --------------------
with torch.no_grad():
    inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=MAX_TOKENS,
        temperature=0.7,
        do_sample=True
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
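
Note that for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated tokens, so decoding `outputs[0]` prints the prompt again before the answer. The earlier cell avoids this by slicing off the prompt length. The sketch below shows that slicing on plain Python lists; the token ids are made up for illustration and `strip_prompt` is a hypothetical helper.

```python
def strip_prompt(prompt_ids, output_ids):
    """Return only the tokens that were generated after the prompt."""
    return output_ids[len(prompt_ids):]


# Made-up token ids: the generated sequence starts with a copy of the prompt.
prompt_ids = [101, 7592, 2088]
output_ids = prompt_ids + [2023, 2003, 102]

new_tokens = strip_prompt(prompt_ids, output_ids)
print(new_tokens)  # [2023, 2003, 102]
```

With the tensors from the cell above, the equivalent slice would be `outputs[0][inputs["input_ids"].shape[1]:]`, which can then be passed to `tokenizer.decode`.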
