# PropFlow Analyzer: Complete Tutorial

This notebook provides a comprehensive guide to using PropFlow's analyzer module for capturing, analyzing, and visualizing belief propagation dynamics.

## Table of Contents
1. [Setup and Imports](#setup)
2. [Basic Snapshot Recording](#basic-recording)
3. [Creating a Factor Graph](#factor-graph)
4. [Recording BP Execution](#recording-execution)
5. [Analyzing Snapshots](#analyzing-snapshots)
6. [Visualizing Argmin Trajectories](#visualization)
7. [Advanced Analysis: Message Flow](#message-flow)
8. [Convergence Detection](#convergence)
9. [Saving and Loading Snapshots](#persistence)
10. [Comparative Analysis](#comparative-analysis)

## 1. Setup and Imports <a name="setup"></a>

First, let's import all the necessary modules from PropFlow and the analyzer.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import json
from pathlib import Path
import pandas as pd


# PropFlow core imports
from propflow import (
    FactorGraph,
    VariableAgent,
    FactorAgent,
    BPEngine,
    DampingEngine,
    SplitEngine,
    FGBuilder,
    CTFactory,
)

# Analyzer imports
from analyzer.snapshot_recorder import EngineSnapshotRecorder, MessageSnapshot
from analyzer.snapshot_visualizer import SnapshotVisualizer

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100

print("✓ All imports successful!")

## 2. Basic Snapshot Recording <a name="basic-recording"></a>

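Let's start with a small two-variable problem built by hand. Nothing is recorded in this cell; the `EngineSnapshotRecorder` is first attached to an engine in Section 4. A graph this small is useful for checking costs and messages manually.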

In [None]:
# Create a simple 2-variable problem
v1 = VariableAgent("X1", domain=3)
v2 = VariableAgent("X2", domain=3)

# Define a simple cost table that prefers matching values
cost_table = np.array([
    [0, 5, 10],   # X1=0 with X2=0,1,2
    [5, 0, 5],    # X1=1 with X2=0,1,2
    [10, 5, 0]    # X1=2 with X2=0,1,2
])

def fixed_table(**kwargs):
    return cost_table

factor = FactorAgent("F_12", domain=3, ct_creation_func=fixed_table)
fg_simple = FactorGraph([v1, v2], [factor], edges={factor: [v1, v2]})

print("Simple factor graph created:")
print(f"  Variables: {[v.name for v in fg_simple.variables]}")
print(f"  Factors: {[f.name for f in fg_simple.factors]}")
print(f"\nCost table (prefers matching values):")
print(cost_table)
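
As a quick check on the table above, the following sketch enumerates every joint assignment and lists the cheapest ones. It uses only NumPy and re-declares the table so that it runs on its own; it is a brute-force check, not part of PropFlow's API.

```python
import numpy as np

cost_table = np.array([
    [0, 5, 10],
    [5, 0, 5],
    [10, 5, 0],
])

# Find the lowest cost and every (X1, X2) pair that achieves it.
min_cost = int(cost_table.min())
best_pairs = [(int(i), int(j)) for i, j in np.argwhere(cost_table == min_cost)]

print(f"Minimum cost: {min_cost}")
print(f"Cheapest (X1, X2) assignments: {best_pairs}")
```

Every cheapest pair lies on the diagonal, which is what "prefers matching values" means here.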

## 3. Creating a Factor Graph <a name="factor-graph"></a>

Next, let's build a larger factor graph with loops: a cycle (ring) of 6 variables, each with a domain of 4 values.

In [None]:
# Build a cycle (ring) graph
num_vars = 6
domain_size = 4

fg_cycle = FGBuilder.build_cycle_graph(
    num_vars=num_vars,
    domain_size=domain_size,
    ct_factory=CTFactory.random_int.fn,
    ct_params={"low": 0, "high": 20}
)

print(f"Cycle graph created:")
print(f"  Number of variables: {len(fg_cycle.variables)}")
print(f"  Number of factors: {len(fg_cycle.factors)}")
print(f"  Domain size: {domain_size}")
print(f"\nVariables: {[v.name for v in fg_cycle.variables]}")
print(f"Factors: {[f.name for f in fg_cycle.factors]}")
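
A cycle over n variables has n pairwise factors, one per edge, which is why the two counts printed above should match. The sketch below spells out that wiring with plain index pairs. The `x{i}` and `f{i}{j}` names are hypothetical and need not match what `FGBuilder` generates.

```python
from collections import Counter

num_vars = 6

# Factor i links variable i to its neighbour i + 1; the modulo closes the ring.
ring_edges = [(i, (i + 1) % num_vars) for i in range(num_vars)]

# Count how many factors touch each variable.
degree = Counter(v for edge in ring_edges for v in edge)

for a, b in ring_edges:
    print(f"f{a}{b}: x{a} - x{b}")  # hypothetical names, for illustration only
print(f"Factors per variable: {dict(degree)}")
```

Each variable sits in exactly two factors, so every message can travel around the loop and return to its origin. This is what makes belief propagation on a cycle approximate rather than exact.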

## 4. Recording BP Execution <a name="recording-execution"></a>

Now let's run belief propagation and record every step of the execution.

In [None]:
# Create a BP engine
engine = BPEngine(factor_graph=fg_cycle)

# Wrap the engine with the recorder
recorder = EngineSnapshotRecorder(engine)

# Record 20 iterations
max_steps = 20
snapshots = recorder.record_run(max_steps=max_steps)

print(f"✓ Recorded {len(snapshots)} iterations")
print(f"\nFirst snapshot structure:")
print(f"  Keys: {list(snapshots[0].keys())}")
print(f"  Step: {snapshots[0]['step']}")
print(f"  Number of messages: {len(snapshots[0]['messages'])}")
print(f"  Assignments: {snapshots[0]['assignments']}")
print(f"  Cost: {snapshots[0]['cost']:.2f}")
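
The fields read throughout this notebook suggest a particular snapshot layout. The sketch below builds a hand-written snapshot with those fields and reports any that are missing. The field list is inferred from the cells in this tutorial, not taken from the recorder's source, so treat it as an assumption.

```python
# Field names inferred from how this tutorial reads snapshots (an assumption).
SNAPSHOT_FIELDS = {"step", "messages", "assignments", "cost", "neutral_messages"}
MESSAGE_FIELDS = {"flow", "sender", "recipient", "values", "argmin_index", "neutral"}


def missing_fields(snapshot):
    """Return the names of expected fields that a snapshot dict lacks."""
    missing = SNAPSHOT_FIELDS - set(snapshot)
    for i, msg in enumerate(snapshot.get("messages", [])):
        missing |= {f"messages[{i}].{name}" for name in MESSAGE_FIELDS - set(msg)}
    return missing


# Hand-written example with made-up values.
example_snapshot = {
    "step": 0,
    "cost": 17.0,
    "assignments": {"X1": 2, "X2": 2},
    "neutral_messages": 0,
    "messages": [
        {
            "flow": "factor_to_variable",
            "sender": "F_12",
            "recipient": "X2",
            "values": [4.0, 2.0, 0.0],
            "argmin_index": 2,
            "neutral": False,
        }
    ],
}

print(missing_fields(example_snapshot))
print(missing_fields({"step": 1, "messages": [{"flow": "variable_to_factor"}]}))
```

A check like this is handy before running the analysis cells on snapshots loaded from an older file.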

## 5. Analyzing Snapshots <a name="analyzing-snapshots"></a>

Let's explore the captured data in detail.

In [None]:
# Extract cost trajectory
costs = [snap['cost'] for snap in snapshots]
steps = [snap['step'] for snap in snapshots]

# Plot cost convergence
plt.figure(figsize=(10, 5))
plt.plot(steps, costs, marker='o', linewidth=2, markersize=6)
plt.xlabel('Iteration', fontsize=12)
plt.ylabel('Global Cost', fontsize=12)
plt.title('Cost Convergence Over Time', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Initial cost: {costs[0]:.2f}")
print(f"Final cost: {costs[-1]:.2f}")
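# Guard the percentage against a zero initial cost.
cost_reduction = costs[0] - costs[-1]
cost_reduction_pct = 100 * cost_reduction / costs[0] if costs[0] else float('nan')
print(f"Cost reduction: {cost_reduction:.2f} ({cost_reduction_pct:.1f}%)")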

### 5.1 Message Statistics

Analyze the message flow throughout the execution.

In [None]:
# Collect message statistics
message_counts = [len(snap['messages']) for snap in snapshots]
neutral_counts = [snap['neutral_messages'] for snap in snapshots]
neutral_ratio = [n/m if m > 0 else 0 for n, m in zip(neutral_counts, message_counts)]

# Create a DataFrame for easy viewing
df_stats = pd.DataFrame({
    'Step': steps,
    'Cost': costs,
    'Total Messages': message_counts,
    'Neutral Messages': neutral_counts,
    'Neutral Ratio': neutral_ratio
})

print("Message Statistics Summary:")
print(df_stats.head(10))
print("\n...\n")
print(df_stats.tail(5))

In [None]:
# Plot neutral message ratio
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Neutral messages over time
axes[0].plot(steps, neutral_counts, marker='o', color='coral', label='Neutral Messages')
axes[0].plot(steps, message_counts, marker='s', color='skyblue', alpha=0.6, label='Total Messages')
axes[0].set_xlabel('Iteration')
axes[0].set_ylabel('Message Count')
axes[0].set_title('Message Counts Over Time')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Neutral ratio
axes[1].plot(steps, neutral_ratio, marker='o', color='purple')
axes[1].set_xlabel('Iteration')
axes[1].set_ylabel('Neutral Message Ratio')
axes[1].set_title('Proportion of Neutral Messages')
axes[1].grid(True, alpha=0.3)
axes[1].set_ylim([0, 1.05])

plt.tight_layout()
plt.show()

### 5.2 Assignment Trajectory

Track how variable assignments change over iterations.

In [None]:
# Extract assignment trajectories
variables = sorted(snapshots[0]['assignments'].keys())
assignment_trajectories = {var: [] for var in variables}

for snap in snapshots:
    for var in variables:
        assignment_trajectories[var].append(snap['assignments'].get(var, None))

# Plot assignment changes
plt.figure(figsize=(12, 6))
for var in variables:
    plt.plot(steps, assignment_trajectories[var], marker='o', label=var, linewidth=2, markersize=5)

plt.xlabel('Iteration', fontsize=12)
plt.ylabel('Assignment Value', fontsize=12)
plt.title('Variable Assignments Over Time', fontsize=14, fontweight='bold')
plt.legend(loc='best', ncol=2)
plt.grid(True, alpha=0.3)
plt.yticks(range(domain_size))
plt.tight_layout()
plt.show()

# Check for convergence (stable assignments)
stable_from = None
for i in range(1, len(snapshots)):
    if all(assignment_trajectories[var][i] == assignment_trajectories[var][i-1] for var in variables):
        if stable_from is None:
            stable_from = i
    else:
        stable_from = None

if stable_from is not None:
    print(f"\n✓ Assignments stabilized at iteration {stable_from}")
else:
    print(f"\n⚠ Assignments did not stabilize within {max_steps} iterations")
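
The stabilization check above can be packaged as a small function and tried on toy data. This is a standalone sketch of the same logic; the function name `stable_from_index` and the example trajectories are made up for illustration.

```python
def stable_from_index(trajectories):
    """Return the first index of the final unchanged run, or None.

    `trajectories` maps a variable name to its list of assignments per step.
    Index i counts as stable when every variable has the same value at
    steps i and i - 1.
    """
    length = len(next(iter(trajectories.values())))
    start = None
    for i in range(1, length):
        unchanged = all(series[i] == series[i - 1] for series in trajectories.values())
        if unchanged:
            if start is None:
                start = i
        else:
            start = None  # a change resets the run
    return start


settling = {"X1": [0, 1, 1, 1], "X2": [2, 2, 0, 0]}
oscillating = {"X1": [0, 1, 0, 1]}

print(stable_from_index(settling))
print(stable_from_index(oscillating))
```

Resetting on every change means a temporary plateau in the middle of a run is not reported as convergence.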

## 6. Visualizing Argmin Trajectories <a name="visualization"></a>

Use the `SnapshotVisualizer` to visualize belief argmin trajectories.

In [None]:
# Create visualizer from snapshots
visualizer = SnapshotVisualizer.from_object(snapshots)

# Get list of variables
viz_variables = visualizer.variables()
print(f"Variables available for visualization: {viz_variables}")

# Get argmin series
argmin_series = visualizer.argmin_series()

print("\nArgmin trajectories:")
for var, series in argmin_series.items():
    print(f"  {var}: {series}")
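
An argmin trajectory is the index of the lowest-cost belief entry at each step. The sketch below computes one from hand-written belief vectors. The `beliefs_per_step` data is made up, and how `SnapshotVisualizer` obtains beliefs internally is not shown in this tutorial.

```python
import numpy as np

# Made-up belief (cost) vectors: one array per step for each variable.
beliefs_per_step = {
    "X1": [np.array([3.0, 1.0, 2.0]), np.array([2.0, 1.5, 0.5]), np.array([2.0, 1.5, 0.5])],
    "X2": [np.array([0.0, 4.0, 4.0]), np.array([5.0, 1.0, 3.0]), np.array([5.0, 1.0, 3.0])],
}

# For each variable, record which domain value has the lowest cost at each step.
toy_argmin_series = {
    var: [int(np.argmin(belief)) for belief in beliefs]
    for var, beliefs in beliefs_per_step.items()
}

for var, series in toy_argmin_series.items():
    print(f"{var}: {series}")
```

Note that `np.argmin` returns the first index on ties, so a flat belief vector always maps to value 0.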

In [None]:
# Plot argmin trajectories per variable
visualizer.plot_argmin_per_variable(show=True, figsize=(12, 8))

### 6.1 Selective Variable Visualization

You can also visualize specific variables of interest.

In [None]:
# Visualize only first 3 variables
selected_vars = viz_variables[:3]
print(f"Visualizing: {selected_vars}")

visualizer.plot_argmin_per_variable(
    vars_filter=selected_vars,
    show=True,
    figsize=(12, 6)
)

## 7. Advanced Analysis: Message Flow <a name="message-flow"></a>

Dive deeper into the message passing dynamics.

In [None]:
# Analyze a specific iteration in detail
iteration = 5
snap = snapshots[iteration]

print(f"\n=== Detailed Analysis of Iteration {iteration} ===")
print(f"\nGlobal Cost: {snap['cost']:.2f}")
print(f"Assignments: {snap['assignments']}")
print(f"\nMessage Flow:")

# Group messages by flow direction
v2f_messages = [msg for msg in snap['messages'] if msg['flow'] == 'variable_to_factor']
f2v_messages = [msg for msg in snap['messages'] if msg['flow'] == 'factor_to_variable']

print(f"  Variable → Factor: {len(v2f_messages)} messages")
print(f"  Factor → Variable: {len(f2v_messages)} messages")

# Show sample messages
print(f"\nSample Factor → Variable messages:")
for i, msg in enumerate(f2v_messages[:3]):
    print(f"\n  Message {i+1}:")
    print(f"    {msg['sender']} → {msg['recipient']}")
    print(f"    Values: {[f'{v:.2f}' for v in msg['values']]}")
    print(f"    Argmin: {msg['argmin_index']} (value: {msg['values'][msg['argmin_index']]:.2f})")
    print(f"    Neutral: {msg['neutral']}")
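
To make the factor-to-variable values less opaque, here is a hand computation of one such message on the two-variable problem from Section 2. It assumes the min-sum update rule, which matches the cost and argmin terminology used in this notebook but is not confirmed by it. The incoming message is made up.

```python
import numpy as np

cost_table = np.array([
    [0, 5, 10],
    [5, 0, 5],
    [10, 5, 0],
], dtype=float)

# Hypothetical message X1 -> F_12: X1 currently favours value 0.
incoming_from_x1 = np.array([0.0, 2.0, 4.0])

# Min-sum update for F_12 -> X2: add the incoming cost for each X1 value
# (rows), then minimise over X1 for every X2 value (columns).
f_to_x2 = (cost_table + incoming_from_x1[:, None]).min(axis=0)
argmin_index = int(np.argmin(f_to_x2))

print(f"F_12 -> X2 values: {f_to_x2.tolist()}")
print(f"Argmin: {argmin_index}")
```

Because the table rewards matching values, X1's preference for value 0 is passed on to X2 unchanged.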

In [None]:
# Visualize message value distributions at a specific iteration
iteration = 10
snap = snapshots[iteration]

fig, axes = plt.subplots(2, 1, figsize=(12, 8))

# Variable to Factor messages
v2f_msgs = [msg for msg in snap['messages'] if msg['flow'] == 'variable_to_factor']
for i, msg in enumerate(v2f_msgs[:4]):  # Show first 4 messages
    axes[0].plot(msg['values'], marker='o', label=f"{msg['sender']}→{msg['recipient']}")
axes[0].set_xlabel('Domain Value')
axes[0].set_ylabel('Message Value')
axes[0].set_title(f'Variable → Factor Messages (Iteration {iteration})')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Factor to Variable messages
f2v_msgs = [msg for msg in snap['messages'] if msg['flow'] == 'factor_to_variable']
for i, msg in enumerate(f2v_msgs[:4]):  # Show first 4 messages
    axes[1].plot(msg['values'], marker='s', label=f"{msg['sender']}→{msg['recipient']}")
axes[1].set_xlabel('Domain Value')
axes[1].set_ylabel('Message Value')
axes[1].set_title(f'Factor → Variable Messages (Iteration {iteration})')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 8. Convergence Detection <a name="convergence"></a>

Analyze convergence patterns and detect when the algorithm stabilizes.

In [None]:
# Detect cost convergence
threshold = 0.01
cost_changes = [abs(costs[i] - costs[i-1]) for i in range(1, len(costs))]

plt.figure(figsize=(12, 5))
# Note: a change of exactly zero has no position on a log axis, so steps
# where the cost did not move are not drawn at a meaningful height.
plt.semilogy(range(1, len(costs)), cost_changes, marker='o', linewidth=2)
plt.xlabel('Iteration', fontsize=12)
plt.ylabel('|Cost Change| (log scale)', fontsize=12)
plt.title('Cost Change Magnitude Over Time', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, which='both')
plt.axhline(y=threshold, color='r', linestyle='--', label=f'Threshold = {threshold}')
plt.legend()
plt.tight_layout()
plt.show()

# Find the first step whose cost change falls below the threshold.
# This is a first-crossing test: the cost may still move again afterwards.
converged_at = None
for i, change in enumerate(cost_changes):
    if change < threshold:
        converged_at = i + 1
        break

if converged_at is not None:
    print(f"\n✓ Cost change first dropped below {threshold} at iteration {converged_at}")
else:
    print(f"\n⚠ Cost change never dropped below threshold {threshold}")
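
A single small change can be a temporary plateau rather than true convergence. One common refinement is to require several consecutive small changes. The sketch below shows that idea; the function name and the `patience` parameter are illustrative and are not part of the analyzer.

```python
def converged_at_step(costs, threshold=0.01, patience=3):
    """Return the first step of a run of `patience` consecutive small changes.

    A change is small when |costs[i] - costs[i-1]| < threshold.
    Returns None if no such run exists.
    """
    run_length = 0
    for i in range(1, len(costs)):
        if abs(costs[i] - costs[i - 1]) < threshold:
            run_length += 1
            if run_length >= patience:
                return i - patience + 1
        else:
            run_length = 0  # any larger change restarts the count
    return None


# Made-up cost trajectory with a brief plateau at steps 1-2.
toy_costs = [10, 8, 8, 6, 5, 5, 5, 5]

print(converged_at_step(toy_costs, patience=1))  # first crossing only
print(converged_at_step(toy_costs, patience=3))  # sustained plateau
```

With `patience=1` this reduces to a first-crossing check and reports the early plateau; with `patience=3` it skips that plateau and reports the start of the final flat stretch.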

## 9. Saving and Loading Snapshots <a name="persistence"></a>

Persist snapshots to disk for later analysis.

In [None]:
# Create results directory
results_dir = Path("notebook_results")
results_dir.mkdir(exist_ok=True)

# Save snapshots to JSON
snapshot_path = results_dir / "cycle_snapshots.json"
recorder.to_json(snapshot_path)
print(f"✓ Saved snapshots to: {snapshot_path}")

# Load snapshots back
with open(snapshot_path, 'r') as f:
    loaded_snapshots = json.load(f)

print(f"✓ Loaded {len(loaded_snapshots)} snapshots from disk")
print(f"  First step: {loaded_snapshots[0]['step']}")
print(f"  Last step: {loaded_snapshots[-1]['step']}")
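
If you build your own snapshot-like structures, NumPy arrays and scalars must be converted before `json.dump` will accept them. The sketch below shows one way to do that and round-trips the result through a temporary file. The helper `to_jsonable` is hypothetical and does not describe how `recorder.to_json` works internally.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def to_jsonable(obj):
    """Recursively convert NumPy arrays and scalars to plain Python types."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, dict):
        return {str(key): to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    return obj


# Made-up snapshot-like data containing NumPy types.
demo_snapshots = [{
    "step": np.int64(0),
    "cost": np.float64(12.5),
    "assignments": {"X1": np.int64(2)},
    "messages": [{"values": np.array([0.0, 1.5])}],
}]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_snapshots.json"
    demo_path.write_text(json.dumps(to_jsonable(demo_snapshots)))
    restored = json.loads(demo_path.read_text())

print(restored)
```

After the round trip all values are plain lists, ints, and floats, so array operations need an explicit `np.asarray` on the loaded data.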

In [None]:
# Create visualizer from loaded JSON
viz_from_file = SnapshotVisualizer.from_json(snapshot_path)

# Save visualization plots
plot_path = results_dir / "argmin_trajectories.png"
combined_path = results_dir / "argmin_combined.png"

viz_from_file.plot_argmin_per_variable(
    show=False,
    savepath=str(plot_path),
    combined_savepath=str(combined_path)
)

print(f"✓ Saved plots:")
print(f"  Per-variable: {plot_path}")
print(f"  Combined: {combined_path}")

## 10. Comparative Analysis <a name="comparative-analysis"></a>

Compare different BP variants using the analyzer.

In [None]:
# Build a test problem
np.random.seed(42)
test_fg = FGBuilder.build_cycle_graph(
    num_vars=8,
    domain_size=5,
    ct_factory=CTFactory.random_int.fn,
    ct_params={"low": 0, "high": 30}
)

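# Define engine configurations to compare.
# Each engine gets its own deep copy of the graph, in case engines keep
# message state on the graph's agents; otherwise one run could influence
# the next. (Assumes the graph supports copy.deepcopy.)
import copy

engine_configs = [
    ("Standard BP", BPEngine(factor_graph=copy.deepcopy(test_fg))),
    ("Damping (0.9)", DampingEngine(factor_graph=copy.deepcopy(test_fg), damping_factor=0.9)),
    ("Damping (0.5)", DampingEngine(factor_graph=copy.deepcopy(test_fg), damping_factor=0.5)),
]

# Run and record each engine
max_iters = 30
comparison_results = {}

# Loop names carry a cmp_ prefix so they do not overwrite `engine`,
# `recorder`, and `snapshots` from Section 4.
for name, cmp_engine in engine_configs:
    cmp_recorder = EngineSnapshotRecorder(cmp_engine)
    cmp_snapshots = cmp_recorder.record_run(max_steps=max_iters)
    comparison_results[name] = {
        'snapshots': cmp_snapshots,
        'costs': [s['cost'] for s in cmp_snapshots],
        'recorder': cmp_recorder
    }
    print(f"✓ Recorded {name}: {len(cmp_snapshots)} iterations")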

print("\nComparison complete!")
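
Damping blends each newly computed message with the one sent in the previous iteration, which tends to suppress oscillation on loopy graphs such as this cycle. Below is a minimal sketch, assuming the convention that `damping_factor` weights the previous message. PropFlow's `DampingEngine` may define the factor the other way round, so check its documentation before interpreting the results.

```python
import numpy as np


def damp(previous, computed, damping_factor):
    """Blend messages: a higher damping_factor keeps more of the previous one.

    Assumed convention: damped = factor * previous + (1 - factor) * computed.
    """
    return damping_factor * previous + (1.0 - damping_factor) * computed


# Made-up messages whose preferred value flips between iterations.
previous_msg = np.array([0.0, 4.0])
computed_msg = np.array([10.0, 0.0])

heavy = damp(previous_msg, computed_msg, 0.9)
light = damp(previous_msg, computed_msg, 0.5)

print(f"damping 0.9: {heavy}, argmin {int(np.argmin(heavy))}")
print(f"damping 0.5: {light}, argmin {int(np.argmin(light))}")
```

Under this convention, heavy damping keeps the previous preference (argmin 0) while light damping lets the new message flip it (argmin 1). That trade-off between stability and responsiveness is what the comparison above probes.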

In [None]:
# Plot cost comparison
plt.figure(figsize=(12, 6))

for name, result in comparison_results.items():
    plt.plot(range(len(result['costs'])), result['costs'], 
             marker='o', linewidth=2, markersize=4, label=name)

plt.xlabel('Iteration', fontsize=12)
plt.ylabel('Global Cost', fontsize=12)
plt.title('Cost Convergence Comparison', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Print final costs
print("\nFinal Costs:")
for name, result in comparison_results.items():
    final_cost = result['costs'][-1]
    initial_cost = result['costs'][0]
    # Guard against a zero initial cost before dividing.
    reduction = 100 * (initial_cost - final_cost) / initial_cost if initial_cost else float('nan')
    print(f"  {name:20s}: {final_cost:7.2f} (reduction: {reduction:5.1f}%)")

In [None]:
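# Compare argmin trajectories for one variable.
# Take the name from the recorded data instead of hard-coding it, since the
# builder's naming scheme is not shown in this notebook.
first_result = next(iter(comparison_results.values()))
target_var = SnapshotVisualizer.from_object(first_result['snapshots']).variables()[0]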

plt.figure(figsize=(12, 6))

for name, result in comparison_results.items():
    viz = SnapshotVisualizer.from_object(result['snapshots'])
    series = viz.argmin_series([target_var])
    plt.plot(range(len(series[target_var])), series[target_var], 
             marker='o', linewidth=2, markersize=5, label=name)

plt.xlabel('Iteration', fontsize=12)
plt.ylabel('Argmin Value', fontsize=12)
plt.title(f'Argmin Trajectory Comparison for {target_var}', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Summary

This notebook demonstrated the complete workflow for using PropFlow's analyzer module:

1. **Recording**: Capture every iteration of BP execution with `EngineSnapshotRecorder`
2. **Analysis**: Extract and analyze costs, assignments, and message statistics
3. **Visualization**: Plot argmin trajectories using `SnapshotVisualizer`
4. **Message Flow**: Dive deep into message passing dynamics
5. **Convergence**: Detect and analyze convergence patterns
6. **Persistence**: Save and load snapshots for reproducible analysis
7. **Comparison**: Compare different BP variants side-by-side

### Key Takeaways

- The analyzer is **non-invasive** - it wraps engines without modifying them
- **Complete capture** - every message, assignment, and cost is recorded
- **Flexible analysis** - snapshots are JSON-compatible for integration with other tools
- **Visual insights** - built-in visualization for quick understanding of dynamics

### Next Steps

- Experiment with different graph topologies (random, grid, tree)
- Try other engine variants (SplitEngine, CostReductionEngine)
- Analyze larger problems with snapshot sampling
- Build custom analysis tools using the snapshot data