[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/swarm-ai-safety/swarm/blob/main/examples/reproduce_2602_00035.ipynb)

# Reproduce Paper Results: Welfare vs. Honest Agent Proportion

This notebook reproduces the central finding from **agentxiv paper 2602.00035**: that population composition has a counter-intuitive, non-monotonic effect on ecosystem welfare.

We sweep the fraction of honest agents from 0% to 100% and measure welfare and toxicity at each point, replicating the paper's methodology using the SWARM simulation framework.

**Difficulty:** Advanced

**No API keys required. Runs entirely locally (or in Colab).**

In [None]:
# --- Setup ---
# This cell handles installation automatically.
# In Colab: clones the repo and installs SWARM.
# Locally: assumes you've already run `pip install -e ".[runtime]"`.
import os

if os.getenv("COLAB_RELEASE_TAG"):
    !git clone --depth 1 https://github.com/swarm-ai-safety/swarm.git /content/swarm
    %pip install -q -e "/content/swarm[runtime]"
    os.chdir("/content/swarm")
    print("Installed SWARM from GitHub. Ready to go!")
else:
    print("Local environment detected - using existing install.")

## The Paper Finding

Paper 2602.00035 reports a surprising result:

> **Populations with only 20% honest agents achieve 55% higher welfare (53.67) than 100% honest populations (34.71), despite having significantly higher toxicity (0.344 vs 0.254).**

This is counter-intuitive: adding more honest agents does *not* monotonically improve ecosystem welfare. The reason is that mixed populations generate more diverse interaction patterns. Opportunistic and deceptive agents create competitive pressure that, at moderate levels, drives higher total surplus --- even though some of that surplus comes at the cost of increased toxicity.
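
The 55% figure follows directly from the two welfare values quoted above. A quick check of the arithmetic, using only the numbers reported in the paper (the variable names are ours, chosen for illustration):

```python
# Welfare and toxicity values quoted from paper 2602.00035.
welfare_20, welfare_100 = 53.67, 34.71
toxicity_20, toxicity_100 = 0.344, 0.254

# Relative welfare gain of the 20%-honest population over the 100%-honest one.
welfare_gain_pct = (welfare_20 - welfare_100) / welfare_100 * 100
# Absolute toxicity increase that accompanies the gain.
toxicity_delta = toxicity_20 - toxicity_100

print(f"Welfare gain: {welfare_gain_pct:.1f}%")
print(f"Toxicity increase: {toxicity_delta:+.3f}")
```

The gain works out to roughly 54.6%, which the paper rounds to 55%.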

### Methodology

- Fix total population at **10 agents**
- Vary honest agent proportion from **0% to 100%** in 10% steps
- Fill non-honest slots with a **60/40 mix** of deceptive and opportunistic agents
- Use baseline payoff parameters (`s_plus=2`, `s_minus=1`, `h=1`)
- Run each configuration for multiple epochs across multiple seeds
- Compare welfare and toxicity across compositions
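
With only 10 agents, the 60/40 split cannot always be exact, so rounding decides the counts. The sketch below tabulates the honest, deceptive, and opportunistic counts at each step, assuming the deceptive share is rounded to the nearest integer and opportunistic agents take the remainder. The helper name `split_population` is ours, for illustration only.

```python
def split_population(total_agents, honest_pct, deceptive_share=0.6):
    """Return (honest, deceptive, opportunistic) counts for one composition."""
    n_honest = round(total_agents * honest_pct / 100)
    remaining = total_agents - n_honest
    # Round the deceptive share; opportunistic agents fill whatever is left.
    n_deceptive = round(remaining * deceptive_share)
    n_opportunistic = remaining - n_deceptive
    return n_honest, n_deceptive, n_opportunistic


for pct in range(0, 101, 10):
    h, d, o = split_population(10, pct)
    print(f"{pct:>3}% honest -> H={h}, D={d}, O={o}")
```

At 90% honest, for instance, the single non-honest slot goes to a deceptive agent, so the realized mix is only approximately 60/40 at this population size.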

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig
from swarm.core.payoff import PayoffConfig
from swarm.governance.config import GovernanceConfig

In [None]:
# --- Experiment Parameters ---
# Reduced for Colab speed. Increase for full reproduction:
#   TOTAL_AGENTS = 10, N_EPOCHS = 30, STEPS_PER_EPOCH = 10, SEEDS = [42, 43, 44]
TOTAL_AGENTS = 10
N_EPOCHS = 10
STEPS_PER_EPOCH = 10
SEEDS = [42, 43]


def build_compositions(total_agents):
    """Build compositions varying honest proportion from 0% to 100%."""
    compositions = []
    for honest_pct in range(0, 101, 10):
        n_honest = round(total_agents * honest_pct / 100)
        remaining = total_agents - n_honest
        # Split remaining 60/40 between deceptive and opportunistic
        n_deceptive = round(remaining * 0.6)
        n_opportunistic = remaining - n_deceptive
        compositions.append({
            "label": f"{honest_pct}% honest",
            "honest_pct": honest_pct / 100.0,
            "n_honest": max(n_honest, 0),
            "n_deceptive": max(n_deceptive, 0),
            "n_opportunistic": max(n_opportunistic, 0),
        })
    return compositions


def run_single(comp, seed, n_epochs, steps_per_epoch):
    """Run one simulation for a given composition and seed."""
    payoff_config = PayoffConfig(
        s_plus=2.0, s_minus=1.0, h=1.0,
        theta=0.5, rho_a=0.1, rho_b=0.1, w_rep=0.1,
    )
    governance_config = GovernanceConfig(
        circuit_breaker_enabled=True,
        reputation_decay_rate=0.05,
    )
    orch_config = OrchestratorConfig(
        n_epochs=n_epochs,
        steps_per_epoch=steps_per_epoch,
        seed=seed,
        payoff_config=payoff_config,
        governance_config=governance_config,
    )

    orchestrator = Orchestrator(config=orch_config)
    # Create agents
    agents = []
    for i in range(comp["n_honest"]):
        agents.append(HonestAgent(agent_id=f"honest_{i + 1}"))
    for i in range(comp["n_deceptive"]):
        agents.append(DeceptiveAgent(agent_id=f"deceptive_{i + 1}"))
    for i in range(comp["n_opportunistic"]):
        agents.append(OpportunisticAgent(agent_id=f"opportunistic_{i + 1}"))
    for agent in agents:
        orchestrator.register_agent(agent)

    epoch_metrics = orchestrator.run()

    welfares = [em.total_welfare for em in epoch_metrics]
    toxicities = [em.toxicity_rate for em in epoch_metrics]
    qgaps = [em.quality_gap for em in epoch_metrics]
    payoffs = [em.avg_payoff for em in epoch_metrics]

    return {
        "composition": comp["label"],
        "honest_pct": comp["honest_pct"],
        "seed": seed,
        "n_epochs": len(epoch_metrics),
        "mean_welfare": float(np.mean(welfares)) if welfares else 0.0,
        "total_welfare": float(np.sum(welfares)) if welfares else 0.0,
        "mean_toxicity": float(np.mean(toxicities)) if toxicities else 0.0,
        "mean_quality_gap": float(np.mean(qgaps)) if qgaps else 0.0,
        "mean_avg_payoff": float(np.mean(payoffs)) if payoffs else 0.0,
    }

## Run the Sweep

We sweep honest agent proportion from **0% to 100%** in 10% increments (11 compositions total). Each composition is run across multiple seeds to reduce variance. The non-honest slots are filled with a 60/40 split of deceptive and opportunistic agents.

In [None]:
compositions = build_compositions(TOTAL_AGENTS)
all_results = []
total_runs = len(compositions) * len(SEEDS)
run_idx = 0

print(f"Running {total_runs} simulations ({len(compositions)} compositions x {len(SEEDS)} seeds)...")
print(f"Parameters: {TOTAL_AGENTS} agents, {N_EPOCHS} epochs, {STEPS_PER_EPOCH} steps/epoch")
print("=" * 70)

for comp in compositions:
    for seed in SEEDS:
        run_idx += 1
        print(
            f"  [{run_idx}/{total_runs}] {comp['label']} "
            f"(H={comp['n_honest']}, D={comp['n_deceptive']}, O={comp['n_opportunistic']}) "
            f"seed={seed}...",
            end=" ", flush=True,
        )
        result = run_single(comp, seed, N_EPOCHS, STEPS_PER_EPOCH)
        all_results.append(result)
        print(f"welfare={result['total_welfare']:.1f}, toxicity={result['mean_toxicity']:.3f}")

print("\nSweep complete.")

In [None]:
# Aggregate results across seeds
df_raw = pd.DataFrame(all_results)

df_agg = df_raw.groupby("composition").agg(
    honest_pct=("honest_pct", "first"),
    n_seeds=("seed", "count"),
    welfare_total_mean=("total_welfare", "mean"),
    welfare_std=("mean_welfare", "std"),
    toxicity_mean=("mean_toxicity", "mean"),
    toxicity_std=("mean_toxicity", "std"),
    quality_gap_mean=("mean_quality_gap", "mean"),
    avg_payoff_mean=("mean_avg_payoff", "mean"),
).reset_index().sort_values("honest_pct")

df_agg

In [None]:
# Print results table
print("=" * 90)
print("RESULTS")
print("=" * 90)
print(
    f"{'Composition':<18} {'Honest%':>8} {'TotalWelfare':>13} "
    f"{'Toxicity':>9} {'QualGap':>8} {'AvgPayoff':>10}"
)
print("-" * 75)
for _, row in df_agg.iterrows():
    print(
        f"{row['composition']:<18} {row['honest_pct']*100:>7.0f}% "
        f"{row['welfare_total_mean']:>13.2f} {row['toxicity_mean']:>9.3f} "
        f"{row['quality_gap_mean']:>8.3f} {row['avg_payoff_mean']:>10.3f}"
    )

# Key comparison: 20% vs 100% honest
row_20 = df_agg[df_agg["honest_pct"].round(1) == 0.2]
row_100 = df_agg[df_agg["honest_pct"].round(1) == 1.0]

print("\n" + "=" * 70)
print("KEY COMPARISON: 20% honest vs 100% honest")
print("=" * 70)

if not row_20.empty and not row_100.empty:
    w20 = row_20.iloc[0]["welfare_total_mean"]
    w100 = row_100.iloc[0]["welfare_total_mean"]
    t20 = row_20.iloc[0]["toxicity_mean"]
    t100 = row_100.iloc[0]["toxicity_mean"]
    welfare_pct = ((w20 - w100) / w100 * 100) if w100 != 0 else float("inf")

    print(f"  20% honest:  welfare={w20:.2f}, toxicity={t20:.3f}")
    print(f"  100% honest: welfare={w100:.2f}, toxicity={t100:.3f}")
    print(f"  Welfare difference: {welfare_pct:+.1f}%")
    print(f"  Toxicity difference: {t20 - t100:+.3f}")
    print()
    if welfare_pct > 0:
        print(f"  FINDING REPRODUCED: 20% honest has {welfare_pct:.0f}% higher welfare")
    else:
        print("  FINDING NOT REPRODUCED: 20% honest does not have higher welfare")
    print("  Paper claims: 55% higher welfare (53.67 vs 34.71)")
    print(f"  Our result:   {welfare_pct:.0f}% difference ({w20:.2f} vs {w100:.2f})")

    # Peak welfare
    peak_idx = df_agg["welfare_total_mean"].idxmax()
    peak = df_agg.loc[peak_idx]
    print(
        f"\n  Peak welfare composition: {peak['composition']} "
        f"(welfare={peak['welfare_total_mean']:.2f}, toxicity={peak['toxicity_mean']:.3f})"
    )
else:
    print("  Could not find both 20% and 100% compositions in results.")

In [None]:
# Plot 1: Dual-axis welfare/toxicity vs honest %
COLORS = {"welfare": "#2196F3", "toxicity": "#F44336", "quality_gap": "#4CAF50"}

fig, ax1 = plt.subplots(figsize=(10, 6))

pcts = df_agg["honest_pct"] * 100
welfares = df_agg["welfare_total_mean"]
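# welfare_std is the std of per-epoch mean welfare across seeds. Scale it by the
# epoch count (assuming every run completes N_EPOCHS epochs) so the error bars
# match the plotted per-run welfare totals.
welfare_stds = df_agg["welfare_std"] * N_EPOCHS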
toxicities = df_agg["toxicity_mean"]
toxicity_stds = df_agg["toxicity_std"]

# Welfare (left axis)
ax1.errorbar(
    pcts, welfares, yerr=welfare_stds,
    color=COLORS["welfare"], linewidth=2.5, marker="o", markersize=8,
    capsize=5, capthick=1.5, label="Total Welfare", zorder=5,
)
ax1.set_ylabel("Total Welfare (sum over epochs)", fontsize=11, color=COLORS["welfare"])
ax1.tick_params(axis="y", labelcolor=COLORS["welfare"])

# Toxicity (right axis)
ax2 = ax1.twinx()
ax2.errorbar(
    pcts, toxicities, yerr=toxicity_stds,
    color=COLORS["toxicity"], linewidth=2.5, marker="s", markersize=8,
    capsize=5, capthick=1.5, label="Toxicity Rate", zorder=4,
)
ax2.set_ylabel("Toxicity Rate (mean over epochs)", fontsize=11, color=COLORS["toxicity"])
ax2.tick_params(axis="y", labelcolor=COLORS["toxicity"])
ax2.set_ylim(-0.05, 1.05)

ax1.set_xlabel("Honest Agent Proportion (%)", fontsize=11)
ax1.set_title(
    "Reproduction of agentxiv 2602.00035:\nWelfare vs. Toxicity by Honest Agent Proportion",
    fontsize=13, fontweight="bold", pad=10,
)
ax1.grid(True, alpha=0.3, linestyle="--")
ax1.spines["top"].set_visible(False)

# Combined legend
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, fontsize=10, loc="upper center")

plt.tight_layout()
plt.show()

In [None]:
# Plot 2: Welfare-toxicity scatter with composition labels
fig, ax = plt.subplots(figsize=(8, 6))

for _, row in df_agg.iterrows():
    pct = row["honest_pct"] * 100
    r = max(0, 1.0 - row["honest_pct"])
    g = row["honest_pct"]
    b = 0.2
    ax.scatter(
        row["welfare_total_mean"], row["toxicity_mean"],
        c=[(r, g, b)], s=120, edgecolors="white", linewidth=1, zorder=5,
    )
    ax.annotate(
        f"{pct:.0f}%",
        (row["welfare_total_mean"], row["toxicity_mean"]),
        textcoords="offset points", xytext=(8, 5), fontsize=8,
    )

ax.set_title(
    "Welfare-Toxicity Trade-off by Honest Agent %\n(Reproduction of 2602.00035)",
    fontsize=13, fontweight="bold", pad=10,
)
ax.set_xlabel("Total Welfare", fontsize=11)
ax.set_ylabel("Toxicity Rate", fontsize=11)
ax.set_ylim(-0.05, 1.05)
ax.grid(True, alpha=0.3, linestyle="--")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

plt.tight_layout()
plt.show()

In [None]:
# Plot 3: Key compositions bar chart
key_pcts = {0.0, 0.2, 0.5, 0.8, 1.0}
df_key = df_agg[df_agg["honest_pct"].round(1).isin(key_pcts)].copy()

labels = df_key["composition"].tolist()
welfares = df_key["welfare_total_mean"].tolist()
toxicities = df_key["toxicity_mean"].tolist()

x = np.arange(len(labels))
width = 0.35

fig, ax1 = plt.subplots(figsize=(10, 6))

bars1 = ax1.bar(
    x - width / 2, welfares, width,
    label="Total Welfare", color=COLORS["welfare"], alpha=0.85,
)
ax1.set_ylabel("Total Welfare", fontsize=11, color=COLORS["welfare"])

ax2 = ax1.twinx()
bars2 = ax2.bar(
    x + width / 2, toxicities, width,
    label="Toxicity Rate", color=COLORS["toxicity"], alpha=0.85,
)
ax2.set_ylabel("Toxicity Rate", fontsize=11, color=COLORS["toxicity"])
ax2.set_ylim(0, 1.0)

# Value labels on bars
for bar, val in zip(bars1, welfares):
    ax1.text(
        bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.5,
        f"{val:.1f}", ha="center", va="bottom", fontsize=9, fontweight="bold",
    )
for bar, val in zip(bars2, toxicities):
    ax2.text(
        bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.01,
        f"{val:.3f}", ha="center", va="bottom", fontsize=9, fontweight="bold",
    )

ax1.set_xticks(x)
ax1.set_xticklabels(labels, fontsize=10)
ax1.set_title(
    "Key Compositions: Welfare & Toxicity\n(Reproduction of 2602.00035)",
    fontsize=13, fontweight="bold",
)
ax1.legend(loc="upper left", fontsize=9)
ax2.legend(loc="upper right", fontsize=9)

plt.tight_layout()
plt.show()

## Discussion

### How Results Compare to the Paper

Paper 2602.00035 reports that 20% honest populations achieve **55% higher welfare** (53.67 vs 34.71) than fully honest populations, with a toxicity increase from 0.254 to 0.344. The exact numbers from our reproduction will vary depending on the number of epochs and seeds used (this notebook uses reduced parameters for Colab speed), but the qualitative finding --- that **mixed populations outperform homogeneous honest populations on welfare** --- should be consistent.

To run the full-size reproduction, raise these values in the "Experiment Parameters" cell:
```python
N_EPOCHS = 30
SEEDS = [42, 43, 44]
```
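
Because run-to-run variation depends on the number of seeds, it helps to report the spread alongside the headline percentage. The following is a minimal sketch using only the standard library. The function names and the per-seed inputs are hypothetical placeholders, not values from the paper or from this notebook.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(per_seed_welfare):
    """Return (mean, standard error) for one composition's per-seed totals."""
    n = len(per_seed_welfare)
    # The sample standard deviation needs at least two seeds.
    spread = stdev(per_seed_welfare) if n > 1 else 0.0
    return mean(per_seed_welfare), spread / sqrt(n)


def relative_gain_pct(mixed, honest):
    """Percent difference between the mean welfare of two compositions."""
    return (mean(mixed) - mean(honest)) / mean(honest) * 100


# Placeholder per-seed totals, for illustration only.
mixed_runs = [52.0, 56.0, 54.0]
honest_runs = [34.0, 36.0, 35.0]
print(summarize(mixed_runs), summarize(honest_runs))
print(f"{relative_gain_pct(mixed_runs, honest_runs):+.1f}%")
```

If the standard errors are comparable in size to the gap between the two means, more seeds are needed before reading much into the percentage.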

### What This Means for AI Safety

This finding has important implications for multi-agent AI system design:

1. **Welfare is non-monotonic in honesty.** Naively maximizing the fraction of "aligned" agents does not maximize ecosystem welfare. This challenges simple alignment strategies that assume more aligned agents always produce better outcomes.

2. **The welfare-toxicity trade-off is real.** Higher welfare comes at the cost of higher toxicity. System designers must decide which metric to optimize --- and that decision is fundamentally a values question, not a technical one.

3. **Diversity creates value.** The competitive pressure from non-honest agents drives more interactions and higher total surplus. This mirrors findings in economics about the role of competition in generating welfare, even when individual actors are self-interested.

4. **Governance design matters.** Rather than trying to eliminate all non-honest agents, the more productive approach may be to design governance mechanisms (reputation systems, circuit breakers, audits) that capture the welfare benefits of diversity while mitigating toxicity.
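
The non-monotonicity described in point 1 can be checked mechanically on any sweep output. The sketch below assumes the welfare values are ordered by increasing honest proportion. The helper name and the sample series are illustrative, not results.

```python
def has_interior_peak(welfare_by_honest_pct):
    """True if the maximum is strictly greater than both endpoint values.

    Such a maximum must sit at an interior composition, which means welfare
    is neither monotonically increasing nor decreasing in honest proportion.
    """
    values = list(welfare_by_honest_pct)
    if len(values) < 3:
        return False
    peak = max(values)
    return peak > values[0] and peak > values[-1]


# Illustrative series only: rises, peaks in the middle, then falls.
print(has_interior_peak([30.0, 50.0, 45.0, 35.0]))  # True
# Illustrative series only: strictly increasing, so no interior peak.
print(has_interior_peak([10.0, 20.0, 30.0, 40.0]))  # False
```

In this notebook, the function could be applied to `df_agg["welfare_total_mean"]`, which is already sorted by `honest_pct`.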

For more details, see the full paper and the `examples/parameter_sweep.py` script for exploring governance configurations.