[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/swarm-ai-safety/swarm/blob/main/examples/mvp_demo.ipynb)

# SWARM MVP Demo: 5-Agent Simulation

This notebook demonstrates the core SWARM simulation loop with **5 agents** (3 honest, 1 opportunistic, 1 deceptive) interacting over 10 epochs. It validates the MVP v0 success criteria from the implementation plan.

## What you'll learn

| Step | What happens | Key concept |
|------|-------------|-------------|
| 1 | Create 5 agents of different behavioral types | Agent archetypes: honest, opportunistic, deceptive |
| 2 | Run a 10-epoch simulation with soft probabilistic labels | Toxicity, welfare, quality gap metrics |
| 3 | Visualize ecosystem health over time | Per-agent payoffs and time-series diagnostics |
| 4 | Validate MVP success criteria | Automated pass/fail checks |

**No API keys required. Runs entirely locally (or in Colab).**

## Key terms

- **p**: Probability an interaction is beneficial, always in [0, 1]
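- **Toxicity rate**: Expected fraction of accepted interactions that are harmful: `E[1-p | accepted]`
- **Quality gap**: Difference in average quality between accepted and rejected interactions, `E[p | accepted] - E[p | rejected]`; a negative value means the system preferentially accepts low-quality interactions (adverse selection)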
- **Welfare**: System-wide surplus minus costs
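To make the definitions concrete, here is a minimal pure-Python sketch that computes toxicity, quality gap, and acceptance rate from a list of interactions. It assumes each interaction is summarized as a hypothetical `(p, accepted)` pair; this layout and the `soft_metrics` helper are illustrative only and are not SWARM's internal representation (the notebook itself reads these values from the orchestrator's per-epoch metrics).

```python
from statistics import mean


def soft_metrics(interactions):
    """Compute soft-label metrics from hypothetical (p, accepted) pairs.

    p is the probability the interaction is beneficial (in [0, 1]);
    accepted is a bool saying whether the system accepted it.
    """
    accepted = [p for p, ok in interactions if ok]
    rejected = [p for p, ok in interactions if not ok]

    # Toxicity: E[1 - p | accepted]; undefined (None) if nothing was accepted
    toxicity = mean(1 - p for p in accepted) if accepted else None

    # Quality gap: E[p | accepted] - E[p | rejected]; needs both groups
    if accepted and rejected:
        quality_gap = mean(accepted) - mean(rejected)
    else:
        quality_gap = None

    # Guard against division by zero when there are no interactions
    acceptance_rate = len(accepted) / max(len(interactions), 1)

    return {
        "toxicity": toxicity,
        "quality_gap": quality_gap,
        "acceptance_rate": acceptance_rate,
    }


# Made-up interactions: two good ones accepted, two weaker ones rejected
example = [(0.9, True), (0.7, True), (0.2, False), (0.4, False)]
print(soft_metrics(example))  # toxicity ~0.2, quality_gap ~0.5, acceptance_rate 0.5
```

Returning `None` instead of `0.0` for empty groups keeps "no data" distinguishable from "perfectly clean"; a real metrics pipeline may choose a different convention.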

In [None]:
# --- Setup ---
# This cell handles installation automatically.
# In Colab: clones the repo and installs SWARM.
# Locally: assumes you've already run `pip install -e ".[runtime]"`.
import os

if os.getenv("COLAB_RELEASE_TAG"):
    !git clone --depth 1 https://github.com/swarm-ai-safety/swarm.git /content/swarm
    %pip install -q -e "/content/swarm[runtime]"
    os.chdir("/content/swarm")
    print("Installed SWARM from GitHub. Ready to go!")
else:
    print("Local environment detected \u2014 using existing install.")

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig
from swarm.core.payoff import PayoffConfig

# Configure simulation
config = OrchestratorConfig(
    n_epochs=10,
    steps_per_epoch=10,
    seed=42,
    payoff_config=PayoffConfig(
        s_plus=2.0,
        s_minus=1.0,
        h=2.0,
        theta=0.5,
        rho_a=0.0,
        rho_b=0.0,
        w_rep=1.0,
    ),
)

# Create orchestrator and register 5 agents
orchestrator = Orchestrator(config=config)

agents = [
    HonestAgent(agent_id="honest_1"),
    HonestAgent(agent_id="honest_2"),
    HonestAgent(agent_id="honest_3"),
    OpportunisticAgent(agent_id="opportunistic_1"),
    DeceptiveAgent(agent_id="deceptive_1"),
]

print("Registering agents...")
for agent in agents:
    state = orchestrator.register_agent(agent)
    print(f"  - {agent.agent_id} ({agent.agent_type.value}): reputation={state.reputation:.2f}")

print(f"\nReady: {len(agents)} agents, {config.n_epochs} epochs x {config.steps_per_epoch} steps")

## Run the Simulation

The orchestrator runs the simulation loop:

1. Each epoch, agents are scheduled to interact in pairs
2. The **ProxyComputer** converts observable signals into `p = P(v = +1)`
3. The **SoftPayoffEngine** computes payoffs using soft labels: `S_soft = p * s_plus - (1-p) * s_minus`
4. **SoftMetrics** aggregates toxicity, quality gap, and welfare per epoch
5. Agent reputations and resources update based on interaction outcomes
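Steps 2 and 3 can be sketched in a few lines of plain Python. The soft payoff formula and the `s_plus=2.0`, `s_minus=1.0` values come from the `PayoffConfig` above; the logistic `proxy_p` function is a hypothetical stand-in for the ProxyComputer, whose actual mapping from signals to `p` may be quite different. With these settings the expected surplus is zero at `p = s_minus / (s_plus + s_minus) = 1/3`, so any interaction with `p` above one third has positive expected surplus.

```python
import math

# Surplus parameters taken from the PayoffConfig used in this notebook
S_PLUS, S_MINUS = 2.0, 1.0


def proxy_p(signal: float) -> float:
    """Hypothetical proxy: squash a real-valued signal into p in [0, 1].

    A logistic sigmoid is used purely for illustration; it is not
    necessarily how SWARM's ProxyComputer works.
    """
    return 1.0 / (1.0 + math.exp(-signal))


def soft_surplus(p: float, s_plus: float = S_PLUS, s_minus: float = S_MINUS) -> float:
    """Expected surplus under soft labels: p * s_plus - (1 - p) * s_minus."""
    return p * s_plus - (1.0 - p) * s_minus


# Break-even probability where the expected surplus is exactly zero
break_even = S_MINUS / (S_PLUS + S_MINUS)

for signal in (-2.0, 0.0, 2.0):
    p = proxy_p(signal)
    print(f"signal={signal:+.1f}  p={p:.3f}  S_soft={soft_surplus(p):+.3f}")
print(f"break-even p = {break_even:.3f}")
```

Because `S_soft` is linear in `p`, it interpolates between the pure outcomes: `-s_minus` when `p = 0` and `s_plus` when `p = 1`.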

The deceptive agent tries to exploit the system while appearing cooperative. The opportunistic agent takes advantage when possible but does not actively deceive.

In [None]:
# Run simulation
metrics_history = orchestrator.run()

# Build a metrics DataFrame
df = pd.DataFrame([
    {
        "epoch": m.epoch,
        "interactions": m.total_interactions,
        "accepted": m.accepted_interactions,
        "toxicity": m.toxicity_rate,
        "quality_gap": m.quality_gap,
        "welfare": m.total_welfare,
        "acceptance_rate": m.accepted_interactions / max(m.total_interactions, 1),
    }
    for m in metrics_history
])

# Per-agent payoff summary
agent_data = []
for agent in agents:
    state = orchestrator.state.get_agent(agent.agent_id)
    agent_data.append({
        "agent_id": state.agent_id,
        "type": state.agent_type.value,
        "payoff": state.total_payoff,
        "reputation": state.reputation,
        "resources": state.resources,
    })
df_agents = pd.DataFrame(agent_data)

print("=== Epoch-by-Epoch Metrics ===")
print(df.to_string(index=False))
print()
print("=== Summary ===")
print(f"  Total interactions: {df['interactions'].sum()}")
print(f"  Total accepted:     {df['accepted'].sum()}")
print(f"  Avg toxicity:       {df['toxicity'].mean():.4f}")
print(f"  Avg quality gap:    {df['quality_gap'].mean():.4f}")
print(f"  Total welfare:      {df['welfare'].sum():.2f}")

## Metrics Explained

| Metric | Formula | Healthy range | What it means |
|--------|---------|---------------|---------------|
| **Toxicity** | `E[1-p \| accepted]` | < 0.30 | Low = most accepted interactions are beneficial |
| **Quality gap** | `E[p \| accepted] - E[p \| rejected]` | > 0 | Positive = system correctly accepts better interactions |
| **Welfare** | Sum of agent payoffs | Growing | Indicates overall ecosystem health |
| **Acceptance rate** | `accepted / total` | 0.5 - 1.0 | Too low = over-filtering; near 1.0 while toxicity rises = under-filtering |
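The heuristic ranges in this table can be turned into simple boolean checks. The sketch below is an illustration, not part of SWARM: the `check_health` name is hypothetical, and reading "Growing" as "the last epoch's welfare exceeds the first epoch's" is an assumption, since the table does not define a trend test.

```python
def check_health(toxicity, quality_gap, acceptance_rate, welfare_series):
    """Compare summary metrics against the heuristic ranges in the table.

    welfare_series is a plain list of per-epoch welfare values.
    """
    return {
        "toxicity < 0.30": toxicity < 0.30,
        "quality_gap > 0": quality_gap > 0,
        "0.5 <= acceptance_rate <= 1.0": 0.5 <= acceptance_rate <= 1.0,
        # Assumption: "Growing" means the final epoch beats the first epoch
        "welfare growing": len(welfare_series) >= 2
        and welfare_series[-1] > welfare_series[0],
    }


# In this notebook, after the simulation cell has run, one could call:
#   check_health(df["toxicity"].mean(), df["quality_gap"].mean(),
#                df["acceptance_rate"].mean(), df["welfare"].tolist())
```

Passing `df["welfare"].tolist()` rather than the Series itself keeps `[-1]` as positional indexing.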

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(10, 7))
fig.suptitle("MVP Demo: 5-Agent Simulation (3 honest, 1 opportunistic, 1 deceptive)", fontsize=14)

# Toxicity over time
axes[0, 0].plot(df["epoch"], df["toxicity"], "o-", color="#d62728")
axes[0, 0].set_ylabel("Toxicity rate")
axes[0, 0].set_xlabel("Epoch")
axes[0, 0].set_title("Toxicity")
axes[0, 0].set_ylim(0, 1)

# Welfare over time
axes[0, 1].plot(df["epoch"], df["welfare"], "s-", color="#2ca02c")
axes[0, 1].set_ylabel("Total welfare")
axes[0, 1].set_xlabel("Epoch")
axes[0, 1].set_title("Welfare")

# Acceptance rate over time
axes[1, 0].plot(df["epoch"], df["acceptance_rate"], "^-", color="#1f77b4")
axes[1, 0].set_ylabel("Acceptance rate")
axes[1, 0].set_xlabel("Epoch")
axes[1, 0].set_title("Acceptance Rate")
axes[1, 0].set_ylim(0, 1.05)

# Per-agent payoff bar chart (colored by type)
type_colors = {
    "honest": "#2ca02c",
    "opportunistic": "#ff7f0e",
    "deceptive": "#d62728",
}
colors = [type_colors.get(t, "#999999") for t in df_agents["type"]]
axes[1, 1].bar(df_agents["agent_id"], df_agents["payoff"], color=colors)
axes[1, 1].set_ylabel("Total payoff")
axes[1, 1].set_title("Per-Agent Payoff")
axes[1, 1].tick_params(axis="x", rotation=45)

# Add legend for agent types
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=c, label=t) for t, c in type_colors.items()]
axes[1, 1].legend(handles=legend_elements, loc="upper right", fontsize=8)

plt.tight_layout()
plt.show()

## Agent State Analysis

After the simulation, we can inspect each agent's final state: reputation, accumulated resources, and total payoff. This reveals whether the system correctly differentiates between honest and deceptive behavior over time.
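One way to check whether honest agents end up ahead of the deceptive one is to average final payoff and reputation per agent type. This is a stdlib-only sketch; `summarize_by_type` and `honest_advantage` are hypothetical helpers, and they assume rows shaped like the `agent_data` dictionaries built in the simulation cell (keys `"type"`, `"payoff"`, `"reputation"`).

```python
from collections import defaultdict
from statistics import mean


def summarize_by_type(rows):
    """Average final payoff and reputation per agent type.

    rows: list of dicts with "type", "payoff", and "reputation" keys.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["type"]].append(row)
    return {
        agent_type: {
            "n": len(members),
            "mean_payoff": mean(r["payoff"] for r in members),
            "mean_reputation": mean(r["reputation"] for r in members),
        }
        for agent_type, members in grouped.items()
    }


def honest_advantage(summary):
    """Mean honest payoff minus mean deceptive payoff (positive = honesty pays)."""
    return summary["honest"]["mean_payoff"] - summary["deceptive"]["mean_payoff"]
```

In the notebook this would be called as `honest_advantage(summarize_by_type(agent_data))`. The pandas equivalent of the first helper is `df_agents.groupby("type")[["payoff", "reputation"]].mean()`.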

In [None]:
# Display final agent states
print("=== Final Agent States ===")
print(df_agents.to_string(index=False))
print()

# MVP v0 Success Criteria Validation
print("=" * 60)
print("MVP v0 Success Criteria Validation")
print("=" * 60)

success = True

# Criterion 1: 5 agents interact over 10+ epochs
if len(agents) >= 5 and len(metrics_history) >= 10:
    print("[PASS] 5 agents completed 10+ epochs")
else:
    print(f"[FAIL] Only {len(agents)} agents or {len(metrics_history)} epochs")
    success = False

# Criterion 2: Toxicity and conditional loss metrics computed per epoch
# (checked here via the per-epoch toxicity rate and quality gap)
if all(isinstance(m.toxicity_rate, float) for m in metrics_history):
    print("[PASS] Toxicity metrics computed per epoch")
else:
    print("[FAIL] Toxicity metrics not computed")
    success = False

if all(isinstance(m.quality_gap, float) for m in metrics_history):
    print("[PASS] Quality gap metrics computed per epoch")
else:
    print("[FAIL] Quality gap metrics not computed")
    success = False

# Criterion 3: Observable coordination patterns
# (proxy check: at least one interaction occurred during the run)
total_interactions = df["interactions"].sum()
if total_interactions > 0:
    print(f"[PASS] Observable interactions: {total_interactions}")
else:
    print("[WARN] No interactions observed (may need more steps)")

print()
if success:
    print("MVP v0 Success Criteria: ALL PASSED")
else:
    print("MVP v0 Success Criteria: SOME FAILED")

## Summary

This notebook demonstrated the core SWARM simulation loop:

1. **Agent creation** -- 5 agents with different behavioral archetypes (honest, opportunistic, deceptive)
2. **Simulation execution** -- 10 epochs of interactions with soft probabilistic labels
3. **Metrics collection** -- Per-epoch toxicity, welfare, quality gap, and acceptance rate
4. **Visualization** -- Time-series diagnostics and per-agent payoff comparison
5. **Validation** -- Automated MVP success criteria checks

## Next Steps

| Notebook / Example | What it does |
|---|---|
| [`quickstart.ipynb`](quickstart.ipynb) | Baseline vs adversarial scenario comparison with governance |
| `illusion_delta_minimal.py` | Replay-based incoherence detection with 3 agents |
| `run_scenario.py` | Run any YAML scenario from the CLI |
| `parameter_sweep.py` | Sweep governance parameters and compare outcomes |
| `run_redteam.py` | Red-team evaluation across 8 attack vectors |

**Repository:** [github.com/swarm-ai-safety/swarm](https://github.com/swarm-ai-safety/swarm)