⚠️ **NOTE: This notebook is experimental (research tutorials) and was excluded from the final submission due to time constraints.**


**Tracing Context**
To demonstrate the core functionality and syntax of nnsight, we’ll define and use a tiny two-layer neural network. Our little model is composed of two submodules: the linear layers layer1 and layer2. We specify the sizes of each of these modules and create a matching example input.

In [1]:
from collections import OrderedDict
import torch

input_size = 5
hidden_dims = 10
output_size = 2

net = torch.nn.Sequential(
    OrderedDict(
        [
            ("layer1", torch.nn.Linear(input_size, hidden_dims)),
            ("layer2", torch.nn.Linear(hidden_dims, output_size)),
        ]
    )
).requires_grad_(False)

The core object of the NNsight package is NNsight. This wraps around a given PyTorch model to enable investigation of its internal parameters.

In [2]:
from nnsight import NNsight

tiny_model = NNsight(net)

print(tiny_model)

Sequential(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)


Printing a PyTorch model shows a named hierarchy of modules, which is very useful when accessing sub-components directly. An NNsight model reflects the same hierarchy and can be printed in the same way.

The main tool in nnsight is the tracing context. We enter it by calling model.trace(<input>) on an NNsight model, which defines how we want to run the model. Inside the context, we can customize how the neural network runs. The model is actually run upon exiting the tracing context.

In [3]:
# random input
input = torch.rand((1, input_size))


In [4]:
with tiny_model.trace(input) as tracer:

    output = tiny_model.output.save()

print('output:', output)
print('input:', input)

output: tensor([[0.2346, 0.3191]])
input: tensor([[0.5555, 0.5540, 0.5529, 0.1200, 0.9673]])


Success! We now have the model output. We just completed our first intervention using nnsight. Each time we access a module’s input or output, we create an intervention in the neural network’s forward pass. Collectively, these requests form the intervention graph. Executing that graph alongside the model’s normal computation graph is called interleaving.
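
To make the idea of an intervention graph concrete, here is a toy, pure-Python sketch of deferred execution. It is not how nnsight is implemented: `ToyTracer`, `request_output`, and the two lambda "layers" are hypothetical names invented for illustration. Requests are only recorded inside the `with` block, and the "model" runs once, on exit, filling in each requested value as the matching layer finishes.

```python
class ToyTracer:
    """Toy stand-in for a tracing context (hypothetical, not nnsight's API)."""

    def __init__(self, layers, x):
        self.layers = layers      # ordered mapping: name -> callable
        self.x = x                # input to feed through the layers
        self.requests = []        # names whose outputs were requested
        self.saved = {}           # filled in when the model actually runs

    def request_output(self, name):
        # Record the request only; nothing is computed yet.
        self.requests.append(name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False          # do not run the model if the block failed
        value = self.x
        for name, layer in self.layers.items():
            value = layer(value)
            # "Interleaving": serve recorded requests as each layer finishes.
            if name in self.requests:
                self.saved[name] = value
        return False


layers = {"layer1": lambda v: v * 2, "layer2": lambda v: v + 1}

with ToyTracer(layers, 3) as tracer:
    tracer.request_output("layer1")
    inside = dict(tracer.saved)   # still empty: the model has not run yet

after = tracer.saved              # populated once the block exits
print("inside the block:", inside)
print("after the block:", after)
```

The snapshot taken inside the block is empty, while the value requested for `layer1` is available after the block exits, which mirrors the behaviour described for the tracing context.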

If we don’t need to access anything other than the model’s final output (for a language model, the predicted next token), we can call trace with trace=False instead of using it as a context manager. This can be useful for simple inference with NNsight.

```
output = model.trace(<inputs>, trace=False)
```

In [24]:
# Let’s access the output of the first layer (which we’ve named layer1):
with tiny_model.trace(input) as tracer:

    l1_output = tiny_model.layer1.output.save()

print('Layer 1 output:', l1_output)

Layer 1 output: tensor([[ 0.1353,  0.8325,  0.1603, -0.3851, -0.0496,  0.1205,  0.2969, -0.2429,
          0.0556, -0.3322]])


In [None]:
import shutil
from pathlib import Path

# 1. Targets to Delete
targets = [
    # The default HF cache (where the 32B model likely failed)
    # "/home/azureuser/.cache/huggingface/hub/models--Qwen--QwQ-32B",
    "/home/azureuser/.cache/huggingface/hub/",
    
    # Any Qwen 32B blobs that might be hanging around
    # "/home/azureuser/.cache/huggingface/hub/models--Qwen--Qwen3-32B",
    
    # The Pip Cache (Safe to delete, just makes next install slightly slower)
    # "/home/azureuser/.cache/pip"
]

print("--- CLEANING UP ---")

for t in targets:
    path = Path(t)
    if path.exists():
        print(f"Deleting {t}...")
        try:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
            print("✅ Deleted.")
        except Exception as e:
            print(f"❌ Error deleting: {e}")
    else:
        print(f"Skipped (Not found): {t}")

# Check space again
free_space = shutil.disk_usage('/').free / (1024**3)
print(f"\n🎉 New Free Space on OS Disk: {free_space:.2f} GB")

In [None]:
import sys

# 1. Uninstall the broken versions first to be safe
# !{sys.executable} -m pip uninstall -y transformers tokenizers

# 2. Re-install the latest compatible versions
# --no-cache-dir: Saves disk space
# --force-reinstall: Fixes the broken links
# !{sys.executable} -m pip install --upgrade --no-cache-dir --force-reinstall transformers tokenizers accelerate bitsandbytes

In [None]:
import sys

# Downgrade Numpy to 1.26.4 (Stable) and repair Scipy
# This fixes the _ARRAY_API and _csr errors immediately.
# !{sys.executable} -m pip install "numpy<2.0" scipy --force-reinstall

In [None]:
# import shutil
# from pathlib import Path

# # Path found in your scan
# bad_folder = Path("/mnt/hf_cache/models--Qwen--QwQ-32B")

# if bad_folder.exists():
#     print(f"Deleting {bad_folder}...")
#     try:
#         shutil.rmtree(bad_folder)
#         print("✅ Deleted. You saved ~122 GB.")
#     except Exception as e:
#         print(f"❌ Error: {e}")
# else:
#     print("Folder already gone.")

In [None]:
import os
from pathlib import Path

# 1. Define the hub path
# HuggingFace stores models in the 'hub' subdirectory of your cache
cache_root = Path("/mnt/hf_cache")
hub_path = cache_root / "hub"

print(f"Scanning contents of: {hub_path}\n")

if not hub_path.exists():
    print(f"❌ Hub directory not found. Listing root {cache_root} instead:")
    target_dir = cache_root
else:
    target_dir = hub_path

# 2. Iterate and Calculate Sizes
if target_dir.exists():
    found_any = False
    for folder in target_dir.iterdir():
        if folder.is_dir():
            found_any = True
            # Calculate total size of directory
            total_size = sum(f.stat().st_size for f in folder.glob('**/*') if f.is_file())
            size_gb = total_size / (1024**3)
            
            print(f"📁 {folder.name}")
            print(f"   Size: {size_gb:.2f} GB")
            print("-" * 30)
            
    if not found_any:
        print("Directory is empty.")
else:
    print("❌ Cache directory does not exist yet.")
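
One caveat with the size scan above: the Hugging Face hub cache typically stores each file once under `blobs/` and exposes it again through symlinks under `snapshots/`, so summing every path for which `is_file()` is true can count the same bytes twice. Below is a minimal sketch that skips symlinks, assuming that layout. The helper name `dir_size_bytes` and the temporary demo directory are illustrative, and the demo assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, ignoring symlinks."""
    total = 0
    for path in root.rglob("*"):
        # is_symlink() is checked first because is_file() follows links.
        if not path.is_symlink() and path.is_file():
            total += path.stat().st_size
    return total


# Demo on a throwaway directory that mimics the blobs/snapshots layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "blobs").mkdir()
    (root / "snapshots").mkdir()
    blob = root / "blobs" / "abc123"
    blob.write_bytes(b"\0" * 2048)
    os.symlink(blob, root / "snapshots" / "model.bin")

    # Same expression as the scan above: follows the symlink and double counts.
    naive = sum(f.stat().st_size for f in root.glob("**/*") if f.is_file())
    deduped = dir_size_bytes(root)

print(f"naive: {naive} bytes, skipping symlinks: {deduped} bytes")
```

If the reported folder sizes look roughly twice as large as the free-space change after deletion, this double counting is a likely explanation.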

In [None]:


import torch
import matplotlib.pyplot as plt
import seaborn as sns

print("MAX_TOKENS =", MAX_TOKENS)

# 1. Re-run the SUCCESSFUL Trap to ensure variables are in memory
# We use the exact prompt that just worked.

# full_prompt = "<|im_start|>system\nYou are an expert Reservoir Engineer. You output only Eclipse simulation deck code.<|im_end|>\n<|im_start|>user\nWrite the Eclipse (.DATA) initialization section setting Water Saturation (SWAT) to 1.5. Use standard Eclipse keywords. Do NOT use Python.<|im_end|>\n<|im_start|>assistant\n"

full_prompt = (
    "<|im_start|>system\n"
    "You are an expert Reservoir Engineer. You output only Eclipse simulation deck code."
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "Write the Eclipse (.DATA) initialization section setting Water Saturation (SWAT) to 1.5. "
    "Use standard Eclipse keywords. Do NOT use Python."
    "<|im_end|>\n"
)


print("Regenerating the 'SWAT 1.5' trap for visualization...")
with model_Qwen3_8B.generate(full_prompt, max_new_tokens=MAX_TOKENS, temperature=0, do_sample=False) as generator:
    all_logits = model_Qwen3_8B.lm_head.output.save()
    output_tokens = model_Qwen3_8B.generator.output.save()


# DIAGNOSTIC: FIND THE REAL INDEX
tokens = output_tokens[0]
logits = all_logits[0]

# Calculate where generation starts
prompt_len = tokens.shape[0] - logits.shape[0]
gen_tokens = tokens[prompt_len:]
decoded = [model_Qwen3_8B.tokenizer.decode([t]) for t in gen_tokens]

print(f"--- GENERATION MAP (Offset by {prompt_len}) ---")
print("idx | token")
print("----|------")

# Print the first 100 generated tokens
for i, tok in enumerate(decoded[:100]):
    # Mark the interesting ones
    marker = "  <-- HERE?" if "1" in tok or "5" in tok or "." in tok else ""
    print(f"{i:3} | {repr(tok)}{marker}")

print("-" * 30)


# end




# Use the tokenizer of the model that produced gen_tokens so the ids are comparable
five_token_ids = model_Qwen3_8B.tokenizer.encode("5", add_special_tokens=False)
dot_token_ids  = model_Qwen3_8B.tokenizer.encode(".", add_special_tokens=False)
one_token_ids  = model_Qwen3_8B.tokenizer.encode("1", add_special_tokens=False)

# Identify the generation index of token "5"
five_id = five_token_ids[0]
five_positions = (gen_tokens == five_id).nonzero(as_tuple=True)[0]

if len(five_positions) == 0:
    # Stop here: the indexing below needs at least one match
    raise ValueError("'5' token not found in the generated tokens")
print(f"'5' token found at positions {five_positions.tolist()} (token ids: {five_token_ids})")

idx = five_positions[0].item()
step_logits = logits[idx - 1]
probs = torch.softmax(step_logits, dim=-1)

# Define semantic buckets
refusal_words = ["invalid", "cannot", "must", "range", "error"]
refusal_ids = set()
for w in refusal_words:
    refusal_ids.update(model_Qwen3_8B.tokenizer.encode(w, add_special_tokens=False))
# Deduplicate so a shared sub-token is not counted twice in the bucket sum
refusal_ids = sorted(refusal_ids)

numeric_ids = model_Qwen3_8B.tokenizer.encode("0123456789", add_special_tokens=False)

bucket_probs = {
    "Numeric continuation": probs[numeric_ids].sum().item(),
    "Refusal / Constraint": probs[refusal_ids].sum().item(),
    "Other": 1.0 - probs[numeric_ids].sum().item() - probs[refusal_ids].sum().item()
}

print("bucket_probs")
print(bucket_probs)

# Plot
plt.figure(figsize=(8, 4))
colors = ["#d62728", "#2ca02c", "gray"]  # red, green, neutral
plt.bar(bucket_probs.keys(), bucket_probs.values(), color=colors)
plt.title("Probability Mass When Choosing Invalid SWAT = 1.5")
plt.ylabel("Total Probability")
plt.show()


window = range(max(idx - 3, 0), min(idx + 2, len(decoded)))  # around "5", clamped to valid indices
timeline = [
    (i, decoded[i], gen_tokens[i].item())
    for i in window
]


steps = [t[0] for t in timeline]
window_tokens = [t[1] for t in timeline]  # avoid shadowing the `tokens` tensor

plt.figure(figsize=(8, 2))
plt.scatter(steps, [1]*len(steps))

for s, tok in zip(steps, window_tokens):
    color = "red" if tok == "5" else "black"
    plt.text(s, 1.02, tok, ha='center', color=color, fontsize=12)

plt.yticks([])
plt.xlabel("Generation Step")
plt.title("Token-Level Decision Leading to Invalid SWAT = 1.5")
plt.show()
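
The bucket computation in the cell above can be sketched in plain Python on a toy vocabulary. The logits and token ids below are made up for illustration, and the buckets are assumed to be disjoint. The point is the mechanics: apply a softmax to one step's logits, deduplicate each bucket's ids so no token is counted twice, and treat whatever is left as "Other".

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bucket_mass(probs, buckets):
    """Sum probability per bucket; ids are deduplicated within each bucket."""
    mass = {name: sum(probs[i] for i in set(ids)) for name, ids in buckets.items()}
    # Whatever the named buckets do not cover is reported as "Other".
    mass["Other"] = 1.0 - sum(mass.values())
    return mass


# Hypothetical 6-token vocabulary; ids 0-2 play the role of digits.
toy_logits = [2.0, 1.0, 0.5, -1.0, 0.0, -2.0]
toy_buckets = {
    "Numeric continuation": [0, 1, 2],
    "Refusal / Constraint": [3, 3, 4],  # duplicate id on purpose
}

probs = softmax(toy_logits)
mass = bucket_mass(probs, toy_buckets)
for name, value in mass.items():
    print(f"{name}: {value:.3f}")
```

Without the `set(...)` step, the repeated id 3 would inflate the refusal bucket and could push "Other" below zero on a real vocabulary where several words share sub-tokens.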
