[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/swarm-ai-safety/swarm/blob/main/examples/ldt_composition_study.ipynb)

# LDT Composition Study: Logical Decision Theory Agents

This notebook compares **LDT (Logical Decision Theory)** agents against **honest** agents
to test whether LDT-style reasoning yields higher welfare and lower toxicity
across a range of population compositions.

We run two parallel sweeps:
- **LDT sweep**: Vary the share of LDT agents from 0% to 100%, filling the remaining slots with deceptive (60%) and opportunistic (40%) agents
- **Honest sweep**: Same structure, but with honest agents as the focal type

This allows direct comparison of how each agent type performs as its proportion in the population changes.

**Difficulty:** Advanced

**No API keys required. Runs entirely locally (or in Colab).**

In [None]:
# --- Setup ---
# This cell handles installation automatically.
# In Colab: clones the repo and installs SWARM.
# Locally: assumes you've already run `pip install -e ".[runtime]"`.
import os

if os.getenv("COLAB_RELEASE_TAG"):
    !git clone --depth 1 https://github.com/swarm-ai-safety/swarm.git /content/swarm
    %pip install -q -e "/content/swarm[runtime]"
    os.chdir("/content/swarm")
    print("Installed SWARM from GitHub. Ready to go!")
else:
    print("Local environment detected -- using existing install.")

## LDT Agents and Study Methodology

**Logical Decision Theory (LDT)** agents treat their choice as the output of their decision
algorithm and ask what follows logically from that algorithm producing each possible action.
Unlike agents based on causal decision theory (CDT) or evidential decision theory (EDT),
LDT agents reason about every world in which agents running their decision algorithm make a
given choice. This can lead to more cooperative behavior even without direct communication
or repeated interaction.
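As a toy illustration (independent of SWARM, and not how `LDTAgent` is implemented), consider a one-shot prisoner's dilemma against an exact copy of yourself. A CDT-style agent treats the opponent's move as fixed and finds that defecting dominates; an LDT-style agent notes that the copy runs the same algorithm, so only the symmetric outcomes (C, C) and (D, D) are reachable, and it picks cooperation. The payoffs are the textbook values T=5, R=3, P=1, S=0, chosen purely for illustration, and the function names are hypothetical.

```python
# Toy twin prisoner's dilemma: NOT the SWARM LDTAgent implementation.
# PAYOFF[(my_move, their_move)] -> my payoff (textbook T=5, R=3, P=1, S=0).
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}
MOVES = ("C", "D")


def cdt_choice():
    """Causal reasoning: the opponent's move is fixed and independent of mine.

    Return a move that is a best response to every opponent move
    (a dominant strategy); in the prisoner's dilemma that is defection.
    """
    for mine in MOVES:
        if all(
            PAYOFF[(mine, theirs)] >= PAYOFF[(other, theirs)]
            for theirs in MOVES
            for other in MOVES
        ):
            return mine
    raise ValueError("no dominant strategy")


def ldt_choice_vs_twin():
    """Logical reasoning against an exact copy: whatever my algorithm outputs,
    the copy outputs too, so only symmetric outcomes are possible."""
    return max(MOVES, key=lambda move: PAYOFF[(move, move)])


print("CDT vs twin:", cdt_choice())          # CDT vs twin: D
print("LDT vs twin:", ldt_choice_vs_twin())  # LDT vs twin: C
```

In the study itself, LDT agents meet both copies of themselves and deceptive or opportunistic agents, so whether this style of reasoning pays off overall is an empirical question that the sweeps are designed to probe.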

### Methodology

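- **Population size**: Fixed at `TOTAL_AGENTS` agents per run (6 in this notebook's quick configuration, 10 for the fuller study)
- **Focal agent sweep**: Vary the focal agent type (LDT or honest) from 0% to 100% in nominal 10% increments; counts are rounded to whole agents, so in small populations several nominal levels share the same realized composition
- **Background agents**: Remaining slots are filled with a mix of deceptive (60%) and opportunistic (40%) agents
- **Two parallel sweeps**: One sweeping LDT agents, one sweeping honest agents, for direct comparison
- **Payoff parameters**: Baseline settings (s_plus=2, s_minus=1, h=1; also theta=0.5, rho_a=0.1, rho_b=0.1, w_rep=0.1)
- **Governance**: Circuit breaker enabled, reputation decay rate 0.05
- **Multiple seeds**: Each configuration is run with several random seeds (two by default, three for the fuller study), and results are averaged across seeds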

Because both sweeps share the same background mix and settings, differences between them can be attributed to the focal agent type.
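Because agent counts must be whole numbers, the nominal 10% steps are rounded. The sketch below reproduces the counting arithmetic of `build_compositions` (defined later in this notebook) without importing SWARM, so you can see the realized focal fractions for a given population size. The helper name `realized_compositions` is hypothetical; note that Python's built-in `round` uses round-half-to-even.

```python
def realized_compositions(total_agents):
    """Mirror build_compositions' arithmetic: focal count first, then a
    60/40 deceptive/opportunistic split of the remaining slots.

    Returns (nominal_pct, n_focal, n_deceptive, n_opportunistic, realized_fraction).
    """
    rows = []
    for focal_pct in range(0, 101, 10):
        n_focal = round(total_agents * focal_pct / 100)  # round-half-to-even
        remaining = total_agents - n_focal
        n_deceptive = round(remaining * 0.6)
        n_opportunistic = remaining - n_deceptive
        rows.append((focal_pct, n_focal, n_deceptive, n_opportunistic,
                     n_focal / total_agents))
    return rows


for total in (6, 10):
    print(f"TOTAL_AGENTS={total}")
    for nominal, n_f, n_d, n_o, realized in realized_compositions(total):
        print(f"  nominal {nominal:3d}% -> focal={n_f}, deceptive={n_d}, "
              f"opportunistic={n_o}, realized {realized:.0%}")
```

With `TOTAL_AGENTS=6`, for example, the nominal 10% and 20% levels both contain a single focal agent (about 17% realized), whereas `TOTAL_AGENTS=10` maps every step to a distinct count.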

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from collections import defaultdict

from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.honest import HonestAgent
from swarm.agents.ldt_agent import LDTAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig
from swarm.core.payoff import PayoffConfig
from swarm.governance.config import GovernanceConfig

In [None]:
# --- Experiment Parameters ---
# These are reduced for Colab speed. For a full study, increase to:
#   TOTAL_AGENTS=10, N_EPOCHS=30, STEPS_PER_EPOCH=10, SEEDS=[42, 43, 44]
TOTAL_AGENTS = 6
N_EPOCHS = 10
STEPS_PER_EPOCH = 5
SEEDS = [42, 43]


def build_compositions(total_agents):
    """Build compositions for both LDT and honest sweeps.

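    Each sweep varies the focal agent type from 0% to 100% in nominal 10% steps.
    Remaining slots are filled with deceptive (60%) + opportunistic (40%).
    Counts are rounded to whole agents with Python's round() (half-to-even), so
    for small total_agents several labels can share the same realized mix.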
    """
    compositions = []
    for sweep_type in ("ldt", "honest"):
        for focal_pct in range(0, 101, 10):
            n_focal = round(total_agents * focal_pct / 100)
            remaining = total_agents - n_focal
            n_deceptive = round(remaining * 0.6)
            n_opportunistic = remaining - n_deceptive

            if sweep_type == "ldt":
                compositions.append({
                    "label": f"LDT {focal_pct}%",
                    "sweep_type": "ldt",
                    "n_ldt": max(n_focal, 0),
                    "n_honest": 0,
                    "n_deceptive": max(n_deceptive, 0),
                    "n_opportunistic": max(n_opportunistic, 0),
                })
            else:
                compositions.append({
                    "label": f"Honest {focal_pct}%",
                    "sweep_type": "honest",
                    "n_ldt": 0,
                    "n_honest": max(n_focal, 0),
                    "n_deceptive": max(n_deceptive, 0),
                    "n_opportunistic": max(n_opportunistic, 0),
                })
    return compositions


def create_agents_from_config(comp):
    """Create agent instances from a composition dict."""
    agents = []
    for i in range(comp["n_ldt"]):
        agents.append(LDTAgent(agent_id=f"ldt_{i + 1}"))
    for i in range(comp["n_honest"]):
        agents.append(HonestAgent(agent_id=f"honest_{i + 1}"))
    for i in range(comp["n_deceptive"]):
        agents.append(DeceptiveAgent(agent_id=f"deceptive_{i + 1}"))
    for i in range(comp["n_opportunistic"]):
        agents.append(OpportunisticAgent(agent_id=f"opportunistic_{i + 1}"))
    return agents


def _mean_or_zero(vals):
    return float(np.mean(vals)) if vals else 0.0


def run_single(comp, seed, n_epochs, steps_per_epoch):
    """Run one simulation for a given composition and seed."""
    payoff_config = PayoffConfig(
        s_plus=2.0, s_minus=1.0, h=1.0,
        theta=0.5, rho_a=0.1, rho_b=0.1, w_rep=0.1,
    )
    governance_config = GovernanceConfig(
        circuit_breaker_enabled=True,
        reputation_decay_rate=0.05,
    )
    orch_config = OrchestratorConfig(
        n_epochs=n_epochs,
        steps_per_epoch=steps_per_epoch,
        seed=seed,
        payoff_config=payoff_config,
        governance_config=governance_config,
    )

    orchestrator = Orchestrator(config=orch_config)
    agents = create_agents_from_config(comp)
    for agent in agents:
        orchestrator.register_agent(agent)

    epoch_metrics = orchestrator.run()

    welfares = [em.total_welfare for em in epoch_metrics]
    toxicities = [em.toxicity_rate for em in epoch_metrics]
    qgaps = [em.quality_gap for em in epoch_metrics]
    payoffs = [em.avg_payoff for em in epoch_metrics]

    # Extract per-class payoffs
    class_payoffs = {"ldt": [], "honest": [], "deceptive": [], "opportunistic": []}
    for agent in orchestrator.get_all_agents():
        state = orchestrator.state.get_agent(agent.agent_id)
        if state is None:
            continue
        if isinstance(agent, LDTAgent):
            class_payoffs["ldt"].append(state.total_payoff)
        elif isinstance(agent, HonestAgent):
            class_payoffs["honest"].append(state.total_payoff)
        elif isinstance(agent, DeceptiveAgent):
            class_payoffs["deceptive"].append(state.total_payoff)
        elif isinstance(agent, OpportunisticAgent):
            class_payoffs["opportunistic"].append(state.total_payoff)

    total = comp["n_ldt"] + comp["n_honest"] + comp["n_deceptive"] + comp["n_opportunistic"]
    if comp["sweep_type"] == "ldt":
        focal_pct = comp["n_ldt"] / total if total else 0.0
    else:
        focal_pct = comp["n_honest"] / total if total else 0.0

    return {
        "composition": comp["label"],
        "sweep_type": comp["sweep_type"],
        "focal_pct": focal_pct,
        "seed": seed,
        "n_epochs": len(epoch_metrics),
        "mean_welfare": float(np.mean(welfares)) if welfares else 0.0,
        "total_welfare": float(np.sum(welfares)) if welfares else 0.0,
        "mean_toxicity": float(np.mean(toxicities)) if toxicities else 0.0,
        "mean_quality_gap": float(np.mean(qgaps)) if qgaps else 0.0,
        "mean_avg_payoff": float(np.mean(payoffs)) if payoffs else 0.0,
        "final_welfare": float(welfares[-1]) if welfares else 0.0,
        "final_toxicity": float(toxicities[-1]) if toxicities else 0.0,
        "ldt_avg_payoff": _mean_or_zero(class_payoffs["ldt"]),
        "honest_avg_payoff": _mean_or_zero(class_payoffs["honest"]),
        "deceptive_avg_payoff": _mean_or_zero(class_payoffs["deceptive"]),
        "opportunistic_avg_payoff": _mean_or_zero(class_payoffs["opportunistic"]),
    }


def aggregate_results(results):
    """Group by composition and compute mean/std across seeds."""
    groups = defaultdict(list)
    for r in results:
        groups[r["composition"]].append(r)

    aggs = []
    for label, runs in sorted(groups.items(), key=lambda x: (x[1][0]["sweep_type"], x[1][0]["focal_pct"])):
        welfares = [r["mean_welfare"] for r in runs]
        welfare_totals = [r["total_welfare"] for r in runs]
        toxicities = [r["mean_toxicity"] for r in runs]
        qgaps = [r["mean_quality_gap"] for r in runs]
        payoffs_all = [r["mean_avg_payoff"] for r in runs]

        aggs.append({
            "label": label,
            "sweep_type": runs[0]["sweep_type"],
            "focal_pct": runs[0]["focal_pct"],
            "n_seeds": len(runs),
            "welfare_mean": float(np.mean(welfares)),
            "welfare_std": float(np.std(welfares)),
            "welfare_total_mean": float(np.mean(welfare_totals)),
            "toxicity_mean": float(np.mean(toxicities)),
            "toxicity_std": float(np.std(toxicities)),
            "quality_gap_mean": float(np.mean(qgaps)),
            "avg_payoff_mean": float(np.mean(payoffs_all)),
            "ldt_payoff_mean": float(np.mean([r["ldt_avg_payoff"] for r in runs])),
            "honest_payoff_mean": float(np.mean([r["honest_avg_payoff"] for r in runs])),
            "deceptive_payoff_mean": float(np.mean([r["deceptive_avg_payoff"] for r in runs])),
            "opportunistic_payoff_mean": float(np.mean([r["opportunistic_avg_payoff"] for r in runs])),
        })
    return aggs


print(f"Configuration: {TOTAL_AGENTS} agents, {N_EPOCHS} epochs, {STEPS_PER_EPOCH} steps/epoch, {len(SEEDS)} seeds")

## Running the Study

We now run all compositions across both sweeps (LDT and honest) with multiple seeds.
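This produces 22 compositions (11 focal proportions x 2 sweeps), each run once per seed: `22 x len(SEEDS)` runs in total, or 44 with the default two seeds.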
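When results are aggregated across seeds, it helps to know how the spread of total welfare relates to the spread of per-epoch mean welfare. For seed $s$, a run's total welfare is the sum of its $T$ per-epoch welfare values, i.e. $T$ times its per-epoch mean. Assuming every run completes all $T$ = `N_EPOCHS` epochs, the across-seed standard deviation therefore scales by $T$, and the number of seeds does not enter:

$$
W^{\mathrm{total}}_s = \sum_{t=1}^{T} W_{s,t} = T\,\bar{W}_s
\quad\Longrightarrow\quad
\operatorname{sd}_s\big(W^{\mathrm{total}}_s\big) = T\,\operatorname{sd}_s\big(\bar{W}_s\big)
$$

Here $W_{s,t}$ is the welfare of epoch $t$ in the run with seed $s$ (each epoch's `total_welfare`), and $\bar{W}_s$ is that run's `mean_welfare`.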

In [None]:
compositions = build_compositions(TOTAL_AGENTS)
all_results = []
total_runs = len(compositions) * len(SEEDS)
run_idx = 0

print(f"Running {total_runs} simulations ({len(compositions)} compositions x {len(SEEDS)} seeds)...")
print()

for comp in compositions:
    for seed in SEEDS:
        run_idx += 1
        print(
            f"  [{run_idx}/{total_runs}] {comp['label']} "
            f"(L={comp['n_ldt']}, H={comp['n_honest']}, "
            f"D={comp['n_deceptive']}, O={comp['n_opportunistic']}) "
            f"seed={seed}...",
            end=" ", flush=True,
        )
        result = run_single(comp, seed, N_EPOCHS, STEPS_PER_EPOCH)
        all_results.append(result)
        print(f"welfare={result['total_welfare']:.1f}, toxicity={result['mean_toxicity']:.3f}")

print(f"\nCompleted {len(all_results)} runs.")

In [None]:
# Aggregate results across seeds
aggs = aggregate_results(all_results)

# Display as a table
df_aggs = pd.DataFrame(aggs)
display_cols = [
    "label", "sweep_type", "focal_pct", "welfare_total_mean", "welfare_std",
    "toxicity_mean", "toxicity_std", "ldt_payoff_mean", "honest_payoff_mean",
    "deceptive_payoff_mean", "opportunistic_payoff_mean",
]
print("=" * 80)
print("AGGREGATED RESULTS")
print("=" * 80)
print(df_aggs[display_cols].to_string(index=False))

# Key comparisons
ldt_aggs = [a for a in aggs if a["sweep_type"] == "ldt"]
honest_aggs = [a for a in aggs if a["sweep_type"] == "honest"]

print("\n" + "=" * 80)
print("KEY COMPARISONS")
print("=" * 80)

if ldt_aggs:
    ldt_peak = max(ldt_aggs, key=lambda a: a["welfare_total_mean"])
    print(f"  LDT peak welfare: {ldt_peak['label']} "
          f"(welfare={ldt_peak['welfare_total_mean']:.2f}, toxicity={ldt_peak['toxicity_mean']:.3f})")
if honest_aggs:
    honest_peak = max(honest_aggs, key=lambda a: a["welfare_total_mean"])
    print(f"  Honest peak welfare: {honest_peak['label']} "
          f"(welfare={honest_peak['welfare_total_mean']:.2f}, toxicity={honest_peak['toxicity_mean']:.3f})")

for pct in [0.2, 0.5, 1.0]:
    if not (ldt_aggs and honest_aggs):
        break
    # Realized focal fractions are rounded to whole agents (e.g. 1/6 ~ 0.17 with 6 agents),
    # so compare the compositions whose realized fraction is closest to each target.
    ldt_match = min(ldt_aggs, key=lambda a: abs(a["focal_pct"] - pct))
    hon_match = min(honest_aggs, key=lambda a: abs(a["focal_pct"] - pct))
    w_diff = ldt_match["welfare_total_mean"] - hon_match["welfare_total_mean"]
    t_diff = ldt_match["toxicity_mean"] - hon_match["toxicity_mean"]
    print(f"  Target {pct*100:.0f}% (realized {ldt_match['focal_pct']*100:.0f}%): "
          f"LDT welfare={ldt_match['welfare_total_mean']:.2f} vs "
          f"Honest welfare={hon_match['welfare_total_mean']:.2f} "
          f"(diff={w_diff:+.2f}), toxicity diff={t_diff:+.3f}")

In [None]:
# --- Plot 1: LDT vs Honest Welfare Comparison ---

COLORS = {
    "ldt": "#9C27B0",
    "honest": "#2196F3",
    "deceptive": "#F44336",
    "opportunistic": "#FF9800",
}

ldt_sorted = sorted([a for a in aggs if a["sweep_type"] == "ldt"], key=lambda a: a["focal_pct"])
hon_sorted = sorted([a for a in aggs if a["sweep_type"] == "honest"], key=lambda a: a["focal_pct"])

fig, ax = plt.subplots(figsize=(10, 6))

# LDT sweep
ldt_pcts = [a["focal_pct"] * 100 for a in ldt_sorted]
ldt_welfares = [a["welfare_total_mean"] for a in ldt_sorted]
# Error bars: across-seed std of total welfare. Each run's total_welfare = n_epochs * mean_welfare,
# so (assuming every run completes N_EPOCHS epochs) its std is welfare_std scaled by N_EPOCHS.
ldt_stds = [a["welfare_std"] * N_EPOCHS for a in ldt_sorted]
ax.errorbar(
    ldt_pcts, ldt_welfares, yerr=ldt_stds,
    color=COLORS["ldt"], linewidth=2.5, marker="D", markersize=8,
    capsize=5, capthick=1.5, label="LDT agents", zorder=5,
)

# Honest sweep
hon_pcts = [a["focal_pct"] * 100 for a in hon_sorted]
hon_welfares = [a["welfare_total_mean"] for a in hon_sorted]
hon_stds = [a["welfare_std"] * N_EPOCHS for a in hon_sorted]
ax.errorbar(
    hon_pcts, hon_welfares, yerr=hon_stds,
    color=COLORS["honest"], linewidth=2.5, marker="o", markersize=8,
    capsize=5, capthick=1.5, label="Honest agents", zorder=4,
)

ax.set_title("LDT vs Honest: Total Welfare by Focal Agent Proportion", fontsize=13, fontweight="bold", pad=10)
ax.set_xlabel("Focal Agent Proportion (%)", fontsize=11)
ax.set_ylabel("Total Welfare (sum over epochs)", fontsize=11)
ax.grid(True, alpha=0.3, linestyle="--")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=10, loc="best")
plt.tight_layout()
plt.show()

In [None]:
# --- Plot 2: LDT vs Honest Toxicity Comparison ---

fig, ax = plt.subplots(figsize=(10, 6))

# LDT sweep
ldt_tox = [a["toxicity_mean"] for a in ldt_sorted]
ldt_tox_stds = [a["toxicity_std"] for a in ldt_sorted]
ax.errorbar(
    ldt_pcts, ldt_tox, yerr=ldt_tox_stds,
    color=COLORS["ldt"], linewidth=2.5, marker="D", markersize=8,
    capsize=5, capthick=1.5, label="LDT agents", zorder=5,
)

# Honest sweep
hon_tox = [a["toxicity_mean"] for a in hon_sorted]
hon_tox_stds = [a["toxicity_std"] for a in hon_sorted]
ax.errorbar(
    hon_pcts, hon_tox, yerr=hon_tox_stds,
    color=COLORS["honest"], linewidth=2.5, marker="o", markersize=8,
    capsize=5, capthick=1.5, label="Honest agents", zorder=4,
)

ax.set_title("LDT vs Honest: Toxicity Rate by Focal Agent Proportion", fontsize=13, fontweight="bold", pad=10)
ax.set_xlabel("Focal Agent Proportion (%)", fontsize=11)
ax.set_ylabel("Toxicity Rate (mean over epochs)", fontsize=11)
ax.set_ylim(-0.05, 1.05)
ax.grid(True, alpha=0.3, linestyle="--")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=10, loc="best")
plt.tight_layout()
plt.show()

In [None]:
# --- Plot 3: Welfare-Toxicity Scatter ---

fig, ax = plt.subplots(figsize=(10, 7))

for a in aggs:
    pct = a["focal_pct"] * 100
    if a["sweep_type"] == "ldt":
        color = COLORS["ldt"]
        marker = "D"
    else:
        color = COLORS["honest"]
        marker = "o"

    ax.scatter(
        a["toxicity_mean"], a["welfare_total_mean"],
        c=color, s=120, marker=marker, alpha=0.85,
        edgecolors="white", linewidth=1, zorder=5,
    )
    ax.annotate(
        f"{pct:.0f}%",
        (a["toxicity_mean"], a["welfare_total_mean"]),
        textcoords="offset points", xytext=(8, 5), fontsize=8,
    )

# Legend entries
ax.scatter([], [], c=COLORS["ldt"], s=100, marker="D", label="LDT sweep")
ax.scatter([], [], c=COLORS["honest"], s=100, marker="o", label="Honest sweep")

ax.set_title("Welfare-Toxicity Trade-off: LDT vs Honest Compositions", fontsize=13, fontweight="bold", pad=10)
ax.set_xlabel("Toxicity Rate", fontsize=11)
ax.set_ylabel("Total Welfare", fontsize=11)
ax.grid(True, alpha=0.3, linestyle="--")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(fontsize=10, loc="best")
plt.tight_layout()
plt.show()

In [None]:
# --- Plot 4: Per-Class Payoff Breakdown at Key Compositions ---

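# Select by nominal label so each key level appears exactly once, even when rounding
# gives two labels (e.g. "LDT 10%" and "LDT 20%" with 6 agents) the same realized fraction.
key_pcts = [0, 20, 50, 80, 100]

ldt_selected = [a for a in ldt_sorted if a["label"] in {f"LDT {p}%" for p in key_pcts}]
hon_selected = [a for a in hon_sorted if a["label"] in {f"Honest {p}%" for p in key_pcts}]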

fig, axes = plt.subplots(1, 2, figsize=(16, 7), sharey=True)

for ax, selected, title_prefix in [
    (axes[0], ldt_selected, "LDT Sweep"),
    (axes[1], hon_selected, "Honest Sweep"),
]:
    if not selected:
        continue
    labels = [a["label"] for a in selected]
    x = np.arange(len(labels))
    width = 0.18

    bars_data = [
        ("LDT", [a["ldt_payoff_mean"] for a in selected], COLORS["ldt"]),
        ("Honest", [a["honest_payoff_mean"] for a in selected], COLORS["honest"]),
        ("Deceptive", [a["deceptive_payoff_mean"] for a in selected], COLORS["deceptive"]),
        ("Opportunistic", [a["opportunistic_payoff_mean"] for a in selected], COLORS["opportunistic"]),
    ]

    for idx, (name, vals, color) in enumerate(bars_data):
        offset = (idx - 1.5) * width
        bars = ax.bar(x + offset, vals, width, label=name, color=color, alpha=0.85)
        for bar, val in zip(bars, vals):
            if val != 0.0:
                ax.text(
                    bar.get_x() + bar.get_width() / 2,
                    val + 0.2 if val >= 0 else val - 0.2,  # keep label outside the bar
                    f"{val:.1f}",
                    ha="center", va="bottom" if val >= 0 else "top",
                    fontsize=7, fontweight="bold",
                )

    ax.set_xticks(x)
    ax.set_xticklabels(labels, fontsize=9, rotation=15, ha="right")
    ax.set_title(f"{title_prefix}: Per-Class Payoffs", fontsize=12, fontweight="bold")
    if ax == axes[0]:
        ax.set_ylabel("Average Total Payoff", fontsize=11)
    ax.legend(fontsize=8, loc="best")
    ax.grid(True, alpha=0.3, linestyle="--", axis="y")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

fig.suptitle(
    "Per-Class Payoff Breakdown at Key Compositions",
    fontsize=14, fontweight="bold", y=1.02,
)
plt.tight_layout()
plt.show()

## Key Findings and Next Steps

### What to look for in the results

- **Welfare curves**: Does the LDT sweep produce higher total welfare than the honest sweep at equivalent focal agent proportions? LDT agents reason about logical consequences of their decision algorithm, which may lead to more cooperative equilibria.
- **Toxicity curves**: Does LDT reasoning reduce ecosystem toxicity more effectively than honest behavior?
- **Welfare-toxicity trade-off**: Which sweep's compositions lie on or near the Pareto frontier (high welfare and low toxicity at the same time)?
- **Per-class payoffs**: Do LDT agents earn more than honest agents? Do they also reduce deceptive/opportunistic agent payoffs (indicating better resistance to exploitation)?
- **0% sanity check**: At 0% focal agents, both sweeps should produce identical results (pure deceptive + opportunistic population).
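To make the Pareto question concrete, the helper below (hypothetical, not part of SWARM) returns the non-dominated points: a point is dominated if another point has welfare at least as high and toxicity at least as low, and is strictly better on at least one of the two. It expects dicts with the `welfare_total_mean` and `toxicity_mean` keys produced by `aggregate_results`; the demo points are made up and are not study results.

```python
def pareto_front(points, welfare_key="welfare_total_mean", tox_key="toxicity_mean"):
    """Return the points not dominated on (higher welfare, lower toxicity)."""
    def dominates(a, b):
        no_worse = a[welfare_key] >= b[welfare_key] and a[tox_key] <= b[tox_key]
        strictly_better = a[welfare_key] > b[welfare_key] or a[tox_key] < b[tox_key]
        return no_worse and strictly_better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up points for illustration only.
demo_points = [
    {"label": "A", "welfare_total_mean": 10.0, "toxicity_mean": 0.30},
    {"label": "B", "welfare_total_mean": 12.0, "toxicity_mean": 0.40},
    {"label": "C", "welfare_total_mean": 9.0, "toxicity_mean": 0.35},  # dominated by A
]
print([p["label"] for p in pareto_front(demo_points)])  # ['A', 'B']
```

In the notebook, `pareto_front(aggs)` would list which compositions from either sweep sit on the frontier; counting how many come from each `sweep_type` is one simple way to compare the sweeps.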

### Comparisons

| Metric | Where to check | Hypothesis |
|--------|----------------|------------|
| Welfare | Welfare curves (Plot 1) | LDT may coordinate better via logical reasoning |
| Toxicity | Toxicity curves (Plot 2) | LDT may avoid harmful interactions more reliably |
| Exploitation resistance | Per-class payoff breakdown (Plot 4) | LDT may be harder for deceptive agents to exploit |
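One way to quantify the exploitation-resistance row is to fit a straight line to the deceptive agents' mean payoff against the focal share within each sweep and compare slopes: a more negative slope means deceptive agents earn less as the focal type becomes more common. This sketch uses `numpy.polyfit` and the `focal_pct` and `deceptive_payoff_mean` keys from `aggregate_results`; the function name is hypothetical, and 100% compositions are dropped because they contain no deceptive agents (their payoff is recorded as a placeholder 0.0). The demo numbers are made up.

```python
import numpy as np


def deceptive_payoff_slope(sweep_aggs):
    """Least-squares slope of deceptive mean payoff vs. focal fraction.

    Compositions with a 100% focal share are excluded because they contain
    no deceptive agents.
    """
    pts = [(a["focal_pct"], a["deceptive_payoff_mean"])
           for a in sweep_aggs if a["focal_pct"] < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two compositions with deceptive agents")
    x, y = np.array(pts).T
    slope, _intercept = np.polyfit(x, y, deg=1)
    return float(slope)


# Made-up numbers for illustration only: payoff falls by 2 per unit of focal share.
demo_aggs = [{"focal_pct": f, "deceptive_payoff_mean": 5.0 - 2.0 * f}
             for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(round(deceptive_payoff_slope(demo_aggs), 6))  # -2.0
```

In the notebook you would call `deceptive_payoff_slope(ldt_aggs)` and `deceptive_payoff_slope(honest_aggs)`. At the default `TOTAL_AGENTS=6`, several nominal levels share a composition and are counted more than once in the fit, so treat the slope as a rough indicator.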

### Next steps

- **Increase parameters**: For a fuller study, use `TOTAL_AGENTS=10`, `N_EPOCHS=30`, `STEPS_PER_EPOCH=10`, `SEEDS=[42, 43, 44]`; with 10 agents, every nominal 10% step maps to a distinct whole-agent count
- **Run the full script**: `python examples/ldt_composition_study.py` for full-scale results with CSV export and saved plots
- **Try other agent types**: Swap LDT for other decision-theoretic types to compare CDT, EDT, etc.
- **Add governance**: Vary governance parameters (staking, audits) to see how LDT agents interact with institutional mechanisms

**Repository:** [github.com/swarm-ai-safety/swarm](https://github.com/swarm-ai-safety/swarm)