# Motif Instance Visualization Demo

This notebook demonstrates how to find motif instances and draw them in a Neuronpedia-style graph layout.
We find specific motif instances in attribution graphs and draw the full graph
with the motif highlighted, showing clerp (auto-interp) labels on the key nodes.

**Layout mimics the [Neuronpedia graph viewer](https://www.neuronpedia.org/gemma-2-2b/graph):**
- X-axis = token position, Y-axis = transformer layer
- Embeddings (squares) at bottom, logits (pentagons) at top
- Layer labels on the left axis, prompt tokens along the bottom
- Motif nodes highlighted with role colors, clerp annotations
- Green edges = excitatory, red edges = inhibitory
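
As a rough illustration of the layout rules above, the sketch below maps a node to plot coordinates, a matplotlib marker code, and an edge color. The helper names (`node_position`, `edge_color`), the node-type strings, and the convention that a positive weight means excitatory are assumptions made for this example; they are not taken from `src.visualization`.

```python
# Hypothetical helpers illustrating the layout convention; not the project's API.

# matplotlib marker codes: "s" = square, "o" = circle, "p" = pentagon
NODE_MARKERS = {"embedding": "s", "feature": "o", "logit": "p"}


def node_position(token_pos, layer, node_type, n_layers):
    """Return (x, y): x is the token position, y is the row for the layer.

    Embeddings sit on row 0, transformer layers 0..n_layers-1 on rows
    1..n_layers, and logits on the top row n_layers + 1.
    """
    if node_type == "embedding":
        y = 0
    elif node_type == "logit":
        y = n_layers + 1
    else:
        y = layer + 1
    return (token_pos, y)


def edge_color(weight):
    """Assumed sign convention: positive = excitatory (green), negative = inhibitory (red)."""
    return "green" if weight >= 0 else "red"


# Example with a toy 4-layer model
print(node_position(2, 0, "embedding", n_layers=4))  # bottom row
print(node_position(2, 3, "feature", n_layers=4))    # last transformer layer
print(node_position(5, 0, "logit", n_layers=4))      # top row
print(edge_color(-0.7))
```
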

In [None]:
import sys
sys.path.insert(0, "..")

from src.graph_loader import load_attribution_graph, graph_summary
from src.motif_census import (
    find_motif_instances,
    MOTIF_FFL, MOTIF_CHAIN, MOTIF_FAN_IN, MOTIF_FAN_OUT,
    TRIAD_LABELS, MOTIF_ROLES,
)
from src.visualization import plot_graph_with_motif, plot_top_motif

import matplotlib.pyplot as plt
%matplotlib inline

## 1. Load the Dallas multi-hop graph

In [None]:
g = load_attribution_graph("../data/raw/multihop/capital-state-dallas.json")

s = graph_summary(g)
print(f"Prompt: {s['prompt']}")
print(f"Nodes:  {s['n_nodes']}, Edges: {s['n_edges']}")
print(f"Node types: {s['node_type_counts']}")

## 2. Find feedforward loop (030T) instances

The feedforward loop (A->B, A->C, B->C) is one of the most studied motifs in biological
regulatory networks. In attribution graphs, it can be read as convergent evidence:
a regulator (A) influences the target (C) both directly and through a mediator (B).
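
To make the pattern concrete, here is a minimal pure-Python sketch that enumerates feedforward loops in a small directed edge list. It is not the implementation behind `find_motif_instances`; the function name `find_ffls` is hypothetical, and the graph is assumed to be acyclic, so reverse edges are not checked.

```python
def find_ffls(edges):
    """Return sorted (regulator, mediator, target) triples with A->B, B->C and A->C.

    `edges` is an iterable of (source, target) pairs. The graph is assumed
    to be acyclic, so no check for reverse edges is made.
    """
    successors = {}
    for src, dst in set(edges):
        successors.setdefault(src, set()).add(dst)

    ffls = []
    for a, a_out in successors.items():
        for b in a_out:
            if b == a:
                continue  # ignore self-loops
            for c in successors.get(b, ()):
                # The direct edge A->C closes the loop.
                if c not in (a, b) and c in a_out:
                    ffls.append((a, b, c))
    return sorted(ffls)


toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(find_ffls(toy_edges))  # [('A', 'B', 'C')]
```
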

In [None]:
ffl_instances = find_motif_instances(g, MOTIF_FFL, sort_by="weight")

print(f"Found {len(ffl_instances)} feedforward loop instances")
print("\nTop 5 by total edge weight:")
for i, inst in enumerate(ffl_instances[:5]):
    nodes_str = ", ".join(
        f"{g.vs[n]['clerp'][:30]} (L{g.vs[n]['layer']})"
        for n in inst.node_indices
    )
    print(f"  #{i+1}: weight={inst.total_weight:.1f}  [{nodes_str}]")

## 3. Plot the highest-weight FFL — full graph

The full graph has 1,700+ edges, so context edges form a dense cloud.
Motif nodes and edges are drawn on top with higher z-order.

In [None]:
fig, top_ffl = plot_top_motif(
    g, MOTIF_FFL, rank=0,
    # Compute the counts from the graph so the title cannot drift from the data.
    title=f"Highest-Weight FFL: Full Graph ({g.vcount()} nodes, {g.ecount()} edges)",
    figsize=(18, 14),
)

print(f"Motif: {top_ffl.label}, weight: {top_ffl.total_weight:.1f}")
for node_idx, role in top_ffl.node_roles.items():
    print(f"  {role:12s}: {g.vs[node_idx]['clerp']} (layer {g.vs[node_idx]['layer']})")

plt.show()

## 4. Pruned graph — clearer motif visibility

Like Neuronpedia's pruning threshold slider, we can load the graph with
an edge weight threshold to remove weak connections. This makes the
motif structure much more visible.
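
The effect of the threshold can be sketched on a plain edge list. The example below assumes that pruning keeps an edge when its absolute weight meets the threshold, which matches the `|weight| >= 2.0` comment in the next cell; how `load_attribution_graph` treats nodes left without edges is not shown here, and `prune_edges` is a hypothetical name.

```python
def prune_edges(edges, weight_threshold):
    """Keep (source, target, weight) edges whose |weight| >= weight_threshold.

    Using the absolute value keeps strong inhibitory (negative) edges
    as well as strong excitatory ones.
    """
    return [(s, d, w) for s, d, w in edges if abs(w) >= weight_threshold]


toy_edges = [
    ("a", "b", 3.1),
    ("a", "c", -2.5),  # strong inhibitory edge: kept
    ("b", "c", 0.4),   # weak edge: dropped
    ("c", "d", 2.0),   # exactly at the threshold: kept
]
print(prune_edges(toy_edges, 2.0))
```
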

In [None]:
# Prune to edges with |weight| >= 2.0
g_pruned = load_attribution_graph(
    "../data/raw/multihop/capital-state-dallas.json",
    weight_threshold=2.0,
)
print(f"Pruned graph: {g_pruned.vcount()} nodes, {g_pruned.ecount()} edges")

fig, inst_pruned = plot_top_motif(
    g_pruned, MOTIF_FFL, rank=0,
    title='Highest-Weight FFL — Pruned (weight threshold=2.0)',
    figsize=(18, 14),
)

print(f"Motif: {inst_pruned.label}, weight: {inst_pruned.total_weight:.1f}")
for node_idx, role in inst_pruned.node_roles.items():
    print(f"  {role:12s}: {g_pruned.vs[node_idx]['clerp']} (layer {g_pruned.vs[node_idx]['layer']})")

plt.show()

## 5. Chain (021C) instance

The chain motif (A->B->C) represents sequential processing — the
step-by-step information flow characteristic of multi-hop reasoning.
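
The difference from the feedforward loop is the missing shortcut: in a 021C triad there is no direct edge between A and C. The sketch below enumerates such chains in a small edge list. It treats the motif as an induced subgraph (no extra edges among the three nodes), which is an assumption about how `find_motif_instances` counts; `find_chains` is a hypothetical name.

```python
def find_chains(edges):
    """Return sorted (A, B, C) triples where A->B and B->C are the only edges among them."""
    edge_set = set(edges)
    successors = {}
    for src, dst in edge_set:
        successors.setdefault(src, set()).add(dst)

    chains = []
    for a, a_out in successors.items():
        for b in a_out:
            if b == a:
                continue  # ignore self-loops
            for c in successors.get(b, ()):
                if c in (a, b):
                    continue
                # Any other edge among the three nodes makes it a different triad.
                extra = [(a, c), (c, a), (b, a), (c, b)]
                if any(e in edge_set for e in extra):
                    continue
                chains.append((a, b, c))
    return sorted(chains)


toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
# A->B->C is excluded because the shortcut A->C makes it a feedforward loop.
print(find_chains(toy_edges))  # [('A', 'C', 'D'), ('B', 'C', 'D')]
```
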

In [None]:
chain_instances = find_motif_instances(g_pruned, MOTIF_CHAIN, sort_by="weight")
print(f"Found {len(chain_instances)} chain instances in pruned graph")

fig, top_chain = plot_top_motif(
    g_pruned, MOTIF_CHAIN, rank=0,
    title='Highest-Weight Chain (021C) — Pruned',
    figsize=(18, 14),
)

print(f"Weight: {top_chain.total_weight:.1f}")
for node_idx, role in top_chain.node_roles.items():
    print(f"  {role:12s}: {g_pruned.vs[node_idx]['clerp']} (layer {g_pruned.vs[node_idx]['layer']})")

plt.show()

## 6. Safety refusal circuit

Safety refusal circuits may show different structural patterns from the multi-hop graph above.
The bomb-baseline graph is sparser, making motif structure easier to see.

In [None]:
g_safety = load_attribution_graph("../data/raw/safety/bomb-baseline.json")
print(f"Safety graph: {g_safety.vcount()} nodes, {g_safety.ecount()} edges")
print(f"Prompt: {g_safety['prompt']}")

safety_ffls = find_motif_instances(g_safety, MOTIF_FFL, sort_by="weight")
print(f"FFL instances: {len(safety_ffls)}")

if safety_ffls:
    fig, inst = plot_top_motif(
        g_safety, MOTIF_FFL, rank=0,
        title="Highest-Weight FFL in Safety Refusal Circuit",
        figsize=(18, 14),
    )
    print(f"\nTop FFL weight: {inst.total_weight:.1f}")
    for node_idx, role in inst.node_roles.items():
        print(f"  {role:12s}: {g_safety.vs[node_idx]['clerp']} (layer {g_safety.vs[node_idx]['layer']})")
    plt.show()
else:
    print("No FFL instances found.")

## 7. Motif instance counts across task categories

In [None]:
graphs = {
    "multihop/dallas": "../data/raw/multihop/capital-state-dallas.json",
    "factual/opposite-small": "../data/raw/factual_recall/opposite_of_small.json",
    "safety/bomb-baseline": "../data/raw/safety/bomb-baseline.json",
    "creative/rabbit-poem": "../data/raw/creative/rabbit-poem.json",
}

print(f"{'Graph':<25s}  {'Nodes':>5s}  {'Edges':>5s}  {'FFLs':>5s}  {'Chains':>6s}  {'Fan-in':>6s}  {'Fan-out':>7s}")
print("-" * 75)

for name, path in graphs.items():
    try:
        gi = load_attribution_graph(path)
        n_ffl = len(find_motif_instances(gi, MOTIF_FFL))
        n_chain = len(find_motif_instances(gi, MOTIF_CHAIN))
        n_fan_in = len(find_motif_instances(gi, MOTIF_FAN_IN))
        n_fan_out = len(find_motif_instances(gi, MOTIF_FAN_OUT))
        print(f"{name:<25s}  {gi.vcount():>5d}  {gi.ecount():>5d}  {n_ffl:>5d}  {n_chain:>6d}  {n_fan_in:>6d}  {n_fan_out:>7d}")
    except Exception as e:
        print(f"{name:<25s}  Error: {e}")

## 8. Second-ranked FFL — comparing instances

In [None]:
fig, second_ffl = plot_top_motif(
    g_pruned, MOTIF_FFL, rank=1,
    title="Second-Highest-Weight FFL in Dallas Graph (pruned)",
    figsize=(18, 14),
)

print(f"Weight: {second_ffl.total_weight:.1f} (vs top: {inst_pruned.total_weight:.1f})")
for node_idx, role in second_ffl.node_roles.items():
    print(f"  {role:12s}: {g_pruned.vs[node_idx]['clerp']}")

plt.show()