[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/swarm-ai-safety/swarm/blob/main/examples/governance_mvp_sweep.ipynb)

# Governance MVP Sweep: 12 Mechanism Comparison

This notebook runs a **12-configuration governance sweep**, isolating each governance mechanism against a shared baseline to measure its effect on toxicity, welfare, quality gap, and per-agent-type payoffs.

Apart from the baseline, which enables none, each run turns on a single governance lever (or, in the final run, a layered combination), so you can compare each mechanism's effect directly against the control.

**No API keys required. Runs entirely locally (or in Colab).**

| | |
|---|---|
| **Difficulty** | Intermediate |
| **Runtime** | ~2-5 minutes (Colab) |
| **Prerequisites** | Familiarity with the quickstart notebook |

In [None]:
# --- Setup ---
# This cell handles installation automatically.
# In Colab: clones the repo and installs SWARM.
# Locally: assumes you've already run `pip install -e ".[runtime]"`.
import os

if os.getenv("COLAB_RELEASE_TAG"):
    !git clone --depth 1 https://github.com/swarm-ai-safety/swarm.git /content/swarm
    %pip install -q -e "/content/swarm[runtime]"
    os.chdir("/content/swarm")
    print("Installed SWARM from GitHub. Ready to go!")
else:
    print("Local environment detected \u2014 using existing install.")

## The 12 Governance Mechanisms

| # | Mechanism | What it does |
|---|---|---|
| 1 | **Baseline (none)** | No governance; the control group |
| 2 | **Reputation decay** | Reputations decay 10% per epoch, forcing agents to continuously earn trust |
| 3 | **Vote normalization** | Caps vote influence per agent to prevent concentration of power |
| 4 | **Bandwidth caps** | Limits interactions per step, throttling high-volume exploiters |
| 5 | **Transparency ledger** | Public visibility of interaction quality, with a bonus for good actors |
| 6 | **Random audits** | Probabilistic quality checks with penalties for low-quality interactions |
| 7 | **Circuit breaker** | Freezes agents that exceed toxicity thresholds |
| 8 | **Transaction tax** | Per-interaction tax on surplus, redistributed to the ecosystem |
| 9 | **Staking / bonding** | Minimum stake to participate; stake slashed for bad behavior |
| 10 | **Moderator agent** | Continuous oversight with moderate penalties for flagged interactions |
| 11 | **Collusion detection** | Detects and penalizes coordinated adversarial pairs |
| 12 | **Combined defense-in-depth** | Multiple mechanisms layered together |
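
To give a sense of scale for the reputation-decay row, the sketch below compounds a 10% per-epoch decay over the 5 epochs used in this notebook. It assumes the decay is multiplicative and that the agent earns no new reputation in the meantime. `decayed_reputation` is a hypothetical helper for illustration, not part of SWARM, whose actual update rule may differ.

```python
def decayed_reputation(initial: float, decay_rate: float, n_epochs: int) -> float:
    """Reputation left after n_epochs of multiplicative decay (assumed rule)."""
    return initial * (1.0 - decay_rate) ** n_epochs


# Trajectory for an agent that starts at 1.0 and earns nothing new.
trajectory = [round(decayed_reputation(1.0, 0.10, t), 4) for t in range(6)]
print(trajectory)  # [1.0, 0.9, 0.81, 0.729, 0.6561, 0.5905]
```

Under these assumptions, an idle agent keeps about 59% of its reputation after five epochs.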

In [None]:
from pathlib import Path
import sys

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

from swarm.scenarios import load_scenario

# Add examples/ to path so we can import the sweep runner
examples_dir = Path("examples") if Path("examples").is_dir() else Path("../examples")
sys.path.insert(0, str(examples_dir))
from governance_mvp_sweep import run_governance_sweep, print_summary

# Locate scenario files
SCENARIOS_DIR = Path("scenarios") if Path("scenarios").is_dir() else Path("../scenarios")
print("Scenarios directory:", SCENARIOS_DIR.resolve())

In [None]:
# Load base scenario and run the 12-configuration sweep.
# Using n_epochs=5 for faster execution in Colab; increase for more stable results.
base_scenario = load_scenario(SCENARIOS_DIR / "baseline.yaml")

results = run_governance_sweep(
    base_scenario,
    runs_per_config=1,
    seed_base=42,
    n_epochs=5,
)
print(f"\nCompleted {len(results)} runs.")

In [None]:
# Print the formatted summary table
print_summary(results)

In [None]:
# Convert results to a DataFrame for plotting
rows = []
for r in results:
    sr = r.sweep_result
    rows.append({
        "mechanism": r.label,
        "welfare": sr.total_welfare,
        "toxicity": sr.avg_toxicity,
        "quality_gap": sr.avg_quality_gap,
        "n_frozen": sr.n_frozen,
        "honest_payoff": sr.honest_avg_payoff,
        "opportunistic_payoff": sr.opportunistic_avg_payoff,
        "deceptive_payoff": sr.deceptive_avg_payoff,
        "adversarial_payoff": sr.adversarial_avg_payoff,
        "avg_reputation": sr.avg_reputation,
        "acceptance_rate": sr.accepted_interactions / max(sr.total_interactions, 1),
    })

df = pd.DataFrame(rows)
df

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle("Governance MVP Sweep: 12 Mechanism Comparison", fontsize=15, fontweight="bold")

# --- 1. Welfare by mechanism (horizontal bar) ---
ax = axes[0, 0]
colors = ["#d62728" if m == "baseline" else "#2ca02c" if m == "combined_defense_in_depth" else "#1f77b4"
          for m in df["mechanism"]]
ax.barh(df["mechanism"], df["welfare"], color=colors)
ax.set_xlabel("Total Welfare")
ax.set_title("Welfare by Mechanism")
ax.invert_yaxis()
ax.axvline(x=df.loc[df["mechanism"] == "baseline", "welfare"].values[0],
           color="#d62728", linestyle="--", alpha=0.6, label="Baseline")
ax.legend(fontsize=9)

# --- 2. Toxicity comparison ---
ax = axes[0, 1]
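# Look up the baseline by label rather than assuming it is row 0.
baseline_toxicity = df.loc[df["mechanism"] == "baseline", "toxicity"].values[0]
ax.barh(df["mechanism"], df["toxicity"],
        color=["#d62728" if t > baseline_toxicity else "#2ca02c" for t in df["toxicity"]])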
ax.set_xlabel("Average Toxicity Rate")
ax.set_title("Toxicity by Mechanism")
ax.invert_yaxis()
ax.axvline(x=df.loc[df["mechanism"] == "baseline", "toxicity"].values[0],
           color="#d62728", linestyle="--", alpha=0.6, label="Baseline")
ax.legend(fontsize=9)

# --- 3. Honest vs Adversarial payoff ratio ---
ax = axes[1, 0]
# Use the larger of the deceptive and adversarial payoffs as the "adversarial" benchmark
adv_payoff = df[["deceptive_payoff", "adversarial_payoff"]].max(axis=1)
honest_payoff = df["honest_payoff"]
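# Avoid division by zero: a zero adversarial payoff gives NaN, which is then
# plotted as 0 (and so drawn red), not as an infinite honest advantage.
ratio = honest_payoff / adv_payoff.replace(0, np.nan)
ratio = ratio.fillna(0)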
bar_colors = ["#2ca02c" if r > 1 else "#d62728" for r in ratio]
ax.barh(df["mechanism"], ratio, color=bar_colors)
ax.set_xlabel("Honest / Adversarial Payoff Ratio")
ax.set_title("Honest vs Adversarial Advantage")
ax.axvline(x=1.0, color="gray", linestyle="--", alpha=0.6, label="Parity")
ax.invert_yaxis()
ax.legend(fontsize=9)

# --- 4. Per-type payoff grouped bar ---
ax = axes[1, 1]
x = np.arange(len(df))
width = 0.2
ax.barh(x - 1.5*width, df["honest_payoff"], width, label="Honest", color="#2ca02c")
ax.barh(x - 0.5*width, df["opportunistic_payoff"], width, label="Opportunistic", color="#ff7f0e")
ax.barh(x + 0.5*width, df["deceptive_payoff"], width, label="Deceptive", color="#d62728")
ax.barh(x + 1.5*width, df["adversarial_payoff"], width, label="Adversarial", color="#9467bd")
ax.set_yticks(x)
ax.set_yticklabels(df["mechanism"])
ax.set_xlabel("Average Payoff")
ax.set_title("Per-Type Payoff by Mechanism")
ax.invert_yaxis()
ax.legend(fontsize=8, loc="lower right")

plt.tight_layout()
plt.show()

## Key Findings

**What to look for in the plots above:**

- **Welfare (top-left):** Which mechanisms increase total ecosystem welfare above the baseline? The combined defense-in-depth (green) typically outperforms any single mechanism.
- **Toxicity (top-right):** Lower is better. Mechanisms like circuit breakers and random audits directly target toxic interactions. Green bars are at or below the baseline toxicity (the baseline bar itself is green); red bars are above it.
- **Honest vs Adversarial advantage (bottom-left):** A ratio above 1.0 (green) means honest agents earn more than adversarial ones -- the system rewards good behavior. Ratios below 1.0 (red) indicate the mechanism fails to deter exploitation.
- **Per-type payoffs (bottom-right):** The ideal governance mechanism increases honest payoffs while decreasing deceptive/adversarial payoffs.
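
The comparisons described above can also be read off numerically. The sketch below computes each mechanism's change relative to the baseline for one metric. It works on a list of dicts shaped like the `rows` list built earlier. `delta_vs_baseline` is a hypothetical helper, and the mechanism labels and numbers are made-up placeholders, not sweep results.

```python
def delta_vs_baseline(rows, metric, baseline_label="baseline"):
    """Return {mechanism: value - baseline value} for one metric."""
    by_label = {row["mechanism"]: row[metric] for row in rows}
    if baseline_label not in by_label:
        raise ValueError(f"no row labelled {baseline_label!r}")
    base = by_label[baseline_label]
    return {
        label: value - base
        for label, value in by_label.items()
        if label != baseline_label
    }


# Placeholder values for illustration only (not real sweep output).
example_rows = [
    {"mechanism": "baseline", "welfare": 100.0, "toxicity": 0.30},
    {"mechanism": "random_audits", "welfare": 104.0, "toxicity": 0.25},
    {"mechanism": "transaction_tax", "welfare": 96.0, "toxicity": 0.28},
]

welfare_delta = delta_vs_baseline(example_rows, "welfare")
print(welfare_delta)  # {'random_audits': 4.0, 'transaction_tax': -4.0}
```

In the notebook, calling it with `rows` and `"toxicity"` would give negative values for mechanisms that reduce toxicity.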

### General patterns

1. **No single mechanism is sufficient.** Each lever addresses a different attack vector -- reputation gaming, bandwidth flooding, collusion, etc.
2. **Defense-in-depth works best.** Combining multiple mechanisms provides broader coverage and makes it harder for adversaries to find a single exploit path.
3. **Some mechanisms have trade-offs.** Transaction taxes reduce welfare for everyone (including honest agents). Circuit breakers can over-freeze in edge cases. Tuning matters.
4. **Collusion detection is a critical differentiator.** In scenarios with coordinated adversaries, collusion detection is often the mechanism that prevents ecosystem collapse.
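
One way to make the trade-offs in point 3 concrete is to label each mechanism by whether it improves both welfare and toxicity, only one of them, or neither, relative to the baseline. This is a sketch under assumptions: `classify_tradeoff` and its category names are hypothetical, and the example labels and numbers are placeholders, not results.

```python
def classify_tradeoff(welfare, toxicity, base_welfare, base_toxicity):
    """Label a run against the baseline (higher welfare and lower toxicity are better)."""
    better_welfare = welfare > base_welfare
    better_toxicity = toxicity < base_toxicity
    if better_welfare and better_toxicity:
        return "improves both"
    if better_welfare or better_toxicity:
        return "trade-off"
    return "no improvement"


# Placeholder (welfare, toxicity) pairs; the mechanism labels are illustrative.
baseline = (100.0, 0.30)
candidates = {
    "random_audits": (104.0, 0.25),
    "transaction_tax": (96.0, 0.28),
    "vote_normalization": (99.0, 0.31),
}

labels = {
    name: classify_tradeoff(w, t, *baseline) for name, (w, t) in candidates.items()
}
print(labels)
```

A mechanism labelled "trade-off" is a candidate for parameter tuning, not an outright rejection.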

## Next Steps

- **Increase `runs_per_config`** to 3-5 for more statistically robust results (averages over multiple seeds).
- **Increase `n_epochs`** to 10-20 for longer-horizon dynamics (some mechanisms only show effects after several epochs).
- **Try different base scenarios** -- run the sweep against `adversarial_redteam.yaml` to test mechanisms under stress.
- **Export results to CSV** for further analysis:

```python
from governance_mvp_sweep import export_csv
export_csv(results, Path("governance_sweep_results.csv"))
```

- **Run the full sweep from the CLI** with more seeds:

```bash
python examples/governance_mvp_sweep.py --runs-per-config 5 --epochs 10
```
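
With more than one run per configuration, it helps to report the spread as well as the mean. The sketch below groups per-run values by mechanism and computes the mean and sample standard deviation with the standard library. It assumes you have one `(mechanism, welfare)` pair per run. Whether `run_governance_sweep` returns per-run or pre-aggregated results is not shown here, so check before adapting it. `summarize_runs` is a hypothetical helper and the values are placeholders.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_runs(records):
    """Map mechanism -> (mean, sample standard deviation) over its per-run values."""
    grouped = defaultdict(list)
    for mechanism, value in records:
        grouped[mechanism].append(value)
    summary = {}
    for mechanism, values in grouped.items():
        # stdev needs at least two samples; report 0.0 for a single run.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[mechanism] = (mean(values), spread)
    return summary


# Placeholder per-seed welfare values (not real sweep output).
records = [
    ("baseline", 10.0), ("baseline", 12.0), ("baseline", 14.0),
    ("circuit_breaker", 11.0),
]
summary = summarize_runs(records)
print(summary)  # {'baseline': (12.0, 2.0), 'circuit_breaker': (11.0, 0.0)}
```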

**Repository:** [github.com/swarm-ai-safety/swarm](https://github.com/swarm-ai-safety/swarm)