[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/swarm-ai-safety/swarm/blob/main/examples/quickstart.ipynb)

# SWARM Quickstart: Multi-Agent Safety in 5 Minutes

**SWARM** (System-Wide Assessment of Risk in Multi-agent systems) is a simulation framework for studying distributional safety in multi-agent AI ecosystems. Instead of binary safe/unsafe labels, SWARM assigns each interaction a **soft probabilistic label** `p = P(v = +1)`, the probability that the interaction's outcome `v` is beneficial, to capture the continuous, uncertain nature of interaction quality.

**No API keys required. Runs entirely locally (or in Colab).**

## What you'll learn

| Step | What happens | Key concept |
|------|-------------|-------------|
| 1 | Run a **baseline** cooperative scenario | Soft metrics: toxicity, welfare, acceptance |
| 2 | Visualize ecosystem health | Per-agent payoffs and time-series diagnostics |
| 3 | Inject **adversaries** and watch collapse | Critical threshold between 37.5% and 50% adversarial fraction |

## Prerequisites

- **Python 3.10+** (Colab satisfies this automatically)
- Basic familiarity with multi-agent systems or AI safety concepts is helpful but not required

## Key terms

- **p**: Probability an interaction is beneficial, always in [0, 1]
- **Toxicity rate**: Expected harm among accepted interactions — `E[1-p | accepted]`
- **Quality gap**: Whether the system preferentially accepts low-quality interactions (adverse selection)
- **Welfare**: System-wide surplus minus costs
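
As a minimal sketch of how these terms relate, the snippet below computes two of them by hand for five made-up interactions. The `(p, accepted)` pairs are illustrative and are not SWARM output. The quality-gap formula `E[p | accepted] - E[p | rejected]` is an assumption about how the term is operationalized, since the definition above is only qualitative.

```python
from statistics import mean

# Hypothetical interactions as (p, accepted) pairs; values are made up.
interactions = [
    (0.9, True),
    (0.8, True),
    (0.4, True),
    (0.7, False),
    (0.2, False),
]

accepted = [p for p, ok in interactions if ok]
rejected = [p for p, ok in interactions if not ok]

# Toxicity rate: expected harm among accepted interactions, E[1 - p | accepted].
toxicity_rate = mean(1 - p for p in accepted)

# Quality gap (assumed form): E[p | accepted] - E[p | rejected].
# A negative value would mean the rejected interactions were better on
# average than the accepted ones, i.e. adverse selection.
quality_gap = mean(accepted) - mean(rejected)

print(f"toxicity rate: {toxicity_rate:.2f}")
print(f"quality gap:   {quality_gap:+.2f}")
```

With these numbers the toxicity rate is 0.30 and the quality gap is +0.25, so this toy system accepts better interactions than it rejects.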

In [None]:
# --- Setup ---
# This cell handles installation automatically.
# In Colab: clones the repo and installs SWARM.
# Locally: assumes you've already run `pip install -e ".[runtime]"`.
import os

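if os.getenv("COLAB_RELEASE_TAG"):
    # Skip the clone if an earlier run of this cell already fetched the repo.
    if not os.path.isdir("/content/swarm"):
        !git clone --depth 1 https://github.com/swarm-ai-safety/swarm.git /content/swarm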
    %pip install -q -e "/content/swarm[runtime]"
    os.chdir("/content/swarm")
    print("Installed SWARM from GitHub. Ready to go!")
else:
    print("Local environment detected — using existing install.")

In [None]:
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

from swarm.scenarios import load_scenario, build_orchestrator

# Locate scenario files — works from repo root (Colab) or examples/ (local)
SCENARIOS_DIR = Path("scenarios") if Path("scenarios").is_dir() else Path("../scenarios")
print("Available scenarios:", sorted(p.stem for p in SCENARIOS_DIR.glob("*.yaml")))

## 1. Run a Baseline Scenario

The **baseline** scenario has 5 agents (3 honest, 1 opportunistic, 1 deceptive) interacting over 10 epochs with no governance. This establishes a reference for what a mostly-cooperative ecosystem looks like.

**What to look for:** Toxicity should stay low (~0.2-0.3) and welfare should grow steadily. The deceptive agent may earn slightly more by exploiting trust — that's expected and is exactly the kind of emergent dynamic SWARM is designed to surface.
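
The two expectations above can be turned into a quick sanity check. The helper below is a hypothetical sketch and not part of SWARM. It assumes per-epoch records with `epoch`, `toxicity`, and `welfare` keys, reads "low" as at most 0.3, and reads "grows steadily" as never decreasing from one epoch to the next. The sample records are made up.

```python
def check_baseline(records, max_toxicity=0.3):
    """Return human-readable warnings for a list of per-epoch metric dicts."""
    warnings = []
    # Welfare should not fall between consecutive epochs (assumed criterion).
    for prev, cur in zip(records, records[1:]):
        if cur["welfare"] < prev["welfare"]:
            warnings.append(f"welfare fell at epoch {cur['epoch']}")
    # Toxicity should stay at or below the chosen ceiling.
    for rec in records:
        if rec["toxicity"] > max_toxicity:
            warnings.append(
                f"toxicity {rec['toxicity']:.2f} above {max_toxicity} at epoch {rec['epoch']}"
            )
    return warnings


# Made-up records: the last epoch breaks both expectations.
records = [
    {"epoch": 0, "toxicity": 0.25, "welfare": 1.0},
    {"epoch": 1, "toxicity": 0.28, "welfare": 2.1},
    {"epoch": 2, "toxicity": 0.35, "welfare": 1.9},
]

for message in check_baseline(records):
    print(message)
```

Once `df_baseline` exists, the same check could be applied with `check_baseline(df_baseline.to_dict("records"))`, since that DataFrame uses the same column names.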

In [None]:
# Load and run baseline
scenario = load_scenario(SCENARIOS_DIR / "baseline.yaml")
orchestrator = build_orchestrator(scenario)
baseline_history = orchestrator.run()

# Build a metrics DataFrame
df_baseline = pd.DataFrame([
    {
        "epoch": m.epoch,
        "toxicity": m.toxicity_rate,
        "welfare": m.total_welfare,
        "acceptance_rate": m.accepted_interactions / max(m.total_interactions, 1),
        "interactions": m.total_interactions,
    }
    for m in baseline_history
])

# Per-agent payoff summary
agent_data = []
for agent in orchestrator.get_all_agents():
    state = orchestrator.state.get_agent(agent.agent_id)
    agent_data.append({
        "agent_id": state.agent_id,
        "type": state.agent_type.value,
        "payoff": state.total_payoff,
        "reputation": state.reputation,
    })
df_agents_baseline = pd.DataFrame(agent_data)

print("=== Epoch Metrics ===")
print(df_baseline.to_string(index=False))
print()
print("=== Agent Summary ===")
print(df_agents_baseline.to_string(index=False))

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(10, 7))
fig.suptitle("Baseline Scenario (5 agents, no governance)", fontsize=14)

# Toxicity over time
axes[0, 0].plot(df_baseline["epoch"], df_baseline["toxicity"], "o-", color="#d62728")
axes[0, 0].set_ylabel("Toxicity rate")
axes[0, 0].set_xlabel("Epoch")
axes[0, 0].set_title("Toxicity")
axes[0, 0].set_ylim(0, 1)

# Welfare over time
axes[0, 1].plot(df_baseline["epoch"], df_baseline["welfare"], "s-", color="#2ca02c")
axes[0, 1].set_ylabel("Total welfare")
axes[0, 1].set_xlabel("Epoch")
axes[0, 1].set_title("Welfare")

# Acceptance rate over time
axes[1, 0].plot(df_baseline["epoch"], df_baseline["acceptance_rate"], "^-", color="#1f77b4")
axes[1, 0].set_ylabel("Acceptance rate")
axes[1, 0].set_xlabel("Epoch")
axes[1, 0].set_title("Acceptance Rate")
axes[1, 0].set_ylim(0, 1.05)

# Per-agent payoff bar chart
colors = ["#2ca02c" if t == "honest" else "#ff7f0e" if t == "opportunistic" else "#d62728"
          for t in df_agents_baseline["type"]]
axes[1, 1].bar(df_agents_baseline["agent_id"], df_agents_baseline["payoff"], color=colors)
axes[1, 1].set_ylabel("Total payoff")
axes[1, 1].set_title("Per-Agent Payoff")
axes[1, 1].tick_params(axis="x", rotation=45)

plt.tight_layout()
plt.show()

## 2. Add Adversaries

Now let's run the **adversarial red-team** scenario: 8 agents (4 honest, 2 adversarial, 2 adaptive adversaries) with governance enabled (staking, circuit breakers, audits, collusion detection). The adversarial fraction is 50%, the upper edge of the 37.5-50% critical range our research identified.

**What to look for:** Compare the welfare and acceptance curves to the baseline above. Even with governance enabled, the system should collapse around epoch 12-14. This demonstrates a key finding: governance *delays* collapse but cannot prevent it once the adversarial fraction reaches the critical threshold.
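
To make "delays collapse" concrete, the sketch below isolates the collapse criterion used in this notebook (welfare at or below zero) as a small helper and compares two welfare traces. The helper name and both traces are hypothetical; the numbers are made up to show the calculation and are not simulation results.

```python
from typing import Optional, Sequence


def first_collapse_epoch(welfare: Sequence[float]) -> Optional[int]:
    """Return the index of the first epoch with welfare <= 0, or None."""
    for epoch, value in enumerate(welfare):
        if value <= 0.0:
            return epoch
    return None


# Made-up welfare traces, one value per epoch, indexed from 0.
ungoverned = [4.0, 3.1, 2.0, 0.8, 0.0, 0.0]
governed = [4.0, 3.5, 2.9, 2.0, 1.1, 0.3, 0.0]
healthy = [1.0, 1.5, 2.0]

t_ungoverned = first_collapse_epoch(ungoverned)
t_governed = first_collapse_epoch(governed)

print("ungoverned collapse epoch:", t_ungoverned)
print("governed collapse epoch:  ", t_governed)
print("delay from governance:    ", t_governed - t_ungoverned, "epochs")
print("healthy run:              ", first_collapse_epoch(healthy))
```

The helper returns a list position rather than an epoch label, so the two agree only if epochs are numbered from 0.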

In [None]:
# Load and run adversarial scenario
adv_scenario = load_scenario(SCENARIOS_DIR / "adversarial_redteam.yaml")
adv_orchestrator = build_orchestrator(adv_scenario)
adv_history = adv_orchestrator.run()

df_adv = pd.DataFrame([
    {
        "epoch": m.epoch,
        "toxicity": m.toxicity_rate,
        "welfare": m.total_welfare,
        "acceptance_rate": m.accepted_interactions / max(m.total_interactions, 1),
    }
    for m in adv_history
])

# Overlay comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
fig.suptitle("Baseline vs Adversarial Red-Team", fontsize=14)

# Welfare comparison
ax1.plot(df_baseline["epoch"], df_baseline["welfare"], "s-", label="Baseline (20% adv)", color="#2ca02c")
ax1.plot(df_adv["epoch"], df_adv["welfare"], "v-", label="Red-Team (50% adv)", color="#d62728")
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Total welfare")
ax1.set_title("Welfare Collapse")
ax1.legend()
ax1.axhline(y=0, color="gray", linestyle="--", alpha=0.5)

# Acceptance rate comparison
ax2.plot(df_baseline["epoch"], df_baseline["acceptance_rate"], "^-", label="Baseline", color="#1f77b4")
ax2.plot(df_adv["epoch"], df_adv["acceptance_rate"], "v-", label="Red-Team", color="#d62728")
ax2.set_xlabel("Epoch")
ax2.set_ylabel("Acceptance rate")
ax2.set_title("Acceptance Rate Decline")
ax2.set_ylim(0, 1.05)
ax2.legend()

plt.tight_layout()
plt.show()

# Find collapse epoch (first epoch where welfare is at or below 0)
collapse = df_adv[df_adv["welfare"] <= 0.0]
if not collapse.empty:
    print(f"\nCollapse detected at epoch {collapse.iloc[0]['epoch']:.0f}")
    print("Governance delayed but did not prevent ecosystem failure.")
else:
    print("\nNo collapse detected -- governance held.")

## What Just Happened?

The two runs above sit in the first and last of the **three regimes** that emerge from SWARM simulations:

| Regime | Adversarial % | Acceptance | Toxicity | Outcome |
|--------|--------------|------------|----------|---------|
| **Cooperative** | 0-20% | > 0.93 | < 0.30 | Stable welfare growth |
| **Contested** | 20-37.5% | 0.42-0.94 | 0.33-0.37 | Declining but survivable |
| **Collapse** | 50% | < 0.56 | ~0.30 | Welfare hits zero by epoch 12-14 |
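
The adversarial-fraction column of the table can be read as a lookup, sketched below. `regime_for` is a hypothetical helper and not a SWARM function. Two choices in it are assumptions: a fraction of exactly 20% is counted as cooperative (the table lists 20% in both of the first two rows), and fractions above 50% are treated as collapse even though the table only reports 50%.

```python
def regime_for(adversarial_fraction: float) -> str:
    """Map an adversarial fraction in [0, 1] to a regime label from the table."""
    if not 0.0 <= adversarial_fraction <= 1.0:
        raise ValueError("adversarial_fraction must be between 0 and 1")
    if adversarial_fraction <= 0.20:   # assumption: 20% counts as cooperative
        return "cooperative"
    if adversarial_fraction <= 0.375:
        return "contested"
    if adversarial_fraction >= 0.50:   # assumption: extrapolated beyond 50%
        return "collapse"
    # The table has no row for fractions strictly between 37.5% and 50%.
    return "untested"


print(regime_for(1 / 5))  # baseline: 1 deceptive agent out of 5
print(regime_for(4 / 8))  # red-team: 4 adversarial agents out of 8
print(regime_for(0.45))   # inside the gap between contested and collapse
```

Returning `"untested"` for the 37.5-50% gap keeps the lookup from claiming more than the table reports.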

Key findings from 11 scenario runs:
- **Critical threshold** at 37.5-50% adversarial fraction separates survival from collapse
- **Governance tuning** (audits, staking, circuit breakers) delays collapse by 2 epochs but doesn't prevent it
- **Collusion detection** is the critical differentiator: at 37.5% adversarial, it prevents collapse entirely
- **Welfare scales super-linearly** with cooperative agents (3 agents: 1.0, 6 agents: 5.7, 10 agents: 21.3)
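
One way to quantify the super-linear claim is to fit a power law `welfare ≈ c · n^k` to the three figures quoted above and look at the exponent `k`. The sketch below does this with an ordinary least-squares fit in log-log space. It is only a rough check on three data points, and the power-law form is an assumption rather than something established in this notebook.

```python
import math

# Welfare figures quoted above: (cooperative agents, welfare).
points = [(3, 1.0), (6, 5.7), (10, 21.3)]

# Fit log(welfare) = log(c) + k * log(n) by least squares.
xs = [math.log(n) for n, _ in points]
ys = [math.log(w) for _, w in points]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)

print(f"estimated scaling exponent k: {slope:.2f}")
print("super-linear" if slope > 1 else "not super-linear")
```

The fitted exponent comes out at roughly 2.5. Linear scaling would give an exponent of 1, so the quoted figures grow much faster than linearly.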

## Next Steps

**Try these examples** (each runs standalone, no API keys needed):

| Example | What it does | How to run |
|---------|-------------|------------|
| `illusion_delta_minimal.py` | Replay-based incoherence detection with 3 agents | `python examples/illusion_delta_minimal.py` |
| `mvp_demo.py` | Full 5-agent simulation with metrics printout | `python examples/mvp_demo.py` |
| `run_scenario.py` | Run any YAML scenario from the CLI | `python examples/run_scenario.py scenarios/baseline.yaml` |
| `parameter_sweep.py` | Sweep governance parameters and compare outcomes | `python examples/parameter_sweep.py` |
| `run_redteam.py` | Red-team evaluation across 8 attack vectors | `python examples/run_redteam.py --mode quick` |

**Go deeper:**
- Read the full paper: `docs/papers/distributional_agi_safety.md`
- Write your own scenario YAML — see `scenarios/baseline.yaml` as a template
- Explore 55 built-in scenarios: `swarm list`
- Try LLM-backed agents (requires API key): `python examples/llm_demo.py`

**Repository:** [github.com/swarm-ai-safety/swarm](https://github.com/swarm-ai-safety/swarm)