Releases: swarm-ai-research/swarm
v1.9.0
What's Changed
- ablation: τ* sweep — binary's 'blindness' is mostly threshold placement by @rsavitt in #460
- ablation: threshold-drift — once-calibrated alarms fail silently, AUROC can't see it by @rsavitt in #461
- blog: add Limitations section — what the τ*/drift ablations revised by @rsavitt in #462
- blog: add adaptive-adversary finding to Limitations (four ablations) by @rsavitt in #464
- ablation: adaptive adversary — soft constrains the mean, thresholds can be shaped around by @rsavitt in #463
- fix(api): deterministic 404 in compare endpoint (fixes flaky test_compare_nonexistent_run_404) by @rsavitt in #465
- test(api): isolate default API DB to temp for the session (close get_store real-DB hazard) by @rsavitt in #466
- Add resource negotiation game handler with multi-round bargaining by @rsavitt in #415
- feat(bridges): add Aeon agent-first ledger bridge by @rsavitt in #467
- Release v1.9.0: finalize CHANGELOG, bump version by @rsavitt in #469
- fix(release): install full test extras so PyPI publish isn't skipped by @rsavitt in #470
- feat(agentgit): capability enforcement from delegation + OS-level isolation (7ge5) by @rsavitt in #468
- fix(packaging): drop direct git dep so PyPI publish succeeds by @rsavitt in #473
- feat(agentgit): enriched, tamper-evident provenance block (8ll9) by @rsavitt in #474
- fix(release): run release tests in parallel with a timeout by @rsavitt in #475
Full Changelog: v1.8.0...v1.9.0
v1.8.0: Soft-vs-binary detection, platform bridges, governance studies
489 commits since v1.7.0. Highlights:
Soft-vs-binary detection framework (swarm/detection/)
Turns the self-optimizing-agent vignette into a real experiment: every soft metric paired with its thresholded binary twin, scored as a classifier. AUROC / AUPRC / partial-AUROC, time-to-detection at fixed FPR, market-level adverse selection, calibration, and paired significance testing. Adds a 2D sensitivity-grid runner (run_detection_sensitivity_2d.py --preset heterogeneous) and a heterogeneous "informative" regime that avoids the AUROC=1.0 generator ceiling, plus a companion blog post.
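To make the soft-versus-binary pairing concrete, here is a minimal sketch. It is not the swarm/detection/ implementation. It scores a hypothetical soft metric and its thresholded binary twin with a rank-based AUROC. The function names, the toy data, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_twin(scores: Sequence[float], threshold: float) -> List[float]:
    """Collapse a soft score to 0/1 at a fixed threshold (hypothetical helper)."""
    return [1.0 if s >= threshold else 0.0 for s in scores]


# Toy data: label 1 marks a harmful interaction, 0 a benign one.
labels = [0, 0, 0, 1, 1, 1]
soft = [0.10, 0.35, 0.60, 0.55, 0.80, 0.90]

soft_auroc = auroc(soft, labels)                      # 8/9, about 0.889
binary_auroc = auroc(binary_twin(soft, 0.5), labels)  # 7.5/9, about 0.833
print(f"soft={soft_auroc:.3f} binary={binary_auroc:.3f}")
```

On this toy data the binary twin loses ranking information because every score above the threshold is tied, which is the kind of gap the paired comparison is meant to measure.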
External-platform bridges
MiroShark (social-cascade sim + SoftMetrics judging), LangChain, AutoGPT, CrewAI, Mesa ABM, RAG/LEANN, Hyperspace DAG domain, LabOS Toolmaker→Critic.
Governance & misalignment studies
Adaptive governance controller, governance parameter/sensitivity sweeps, misalignment module + sweeps, Tierra artificial-life scenario + hardening, evolutionary game handler, capability–safety Pareto frontiers, causal-credit propagation, and the triangle (misalignment × causal credit × toxicity) study. Plus escalation-sandbox LLM studies (temperature, prompt framing, model size, cooperation window).
New agent types & mechanisms
ThresholdDancer adversary, behavioral agent types, hyperagent self-modification, dynamic toxicity feedback, artifact registry + cascade-risk governance, PerformanceTracker, net-social-welfare metric.
On-chain
SwarmGym safety auditor CLI + SafetyAttestation contract (Base) + web3 client.
Other
Orchestrator pipeline/middleware refactor (god-object → middleware pipeline + handler factory + scheduler); numerous case-study blog posts.
See CHANGELOG.md for the full itemized list.
Quick Start
python -m pip install -e ".[dev,runtime]"
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10
v1.7.0: Contract screening, viz game, llama.cpp, 164 commits
Highlights
- Contract screening system for separating equilibrium analysis with multi-seed sweep, collusion/sybil detection, and red-team blog posts (#234)
- Interactive isometric visualization game — browser-based SWARM simulation with Gemini Imagen 4 sprites, compare mode, sweep, leaderboard, and governance intervention controls (#182, #212)
- llama.cpp local inference provider with server setup, health checks, and SSRF hardening (#232)
- LangGraph governed handoff study — 4-agent Claude swarm, 32-config sweep
- Memori semantic memory middleware for LLM agents (#217)
- Loop detector governance lever with graduated enforcement (#198)
- Agent API Phase 1–3: scoped permissions, trace IDs, approval workflows
- SQLite persistence for simulations, governance, and scenarios (lazy-init to fix CI xdist contention)
- SciAgentGym bridge restored — tool substrate integration for scientific workflow agents (9 modules, 44 tests)
- Multiple SSRF/security fixes (#223, #225, #230, #236, #238, #239, #242)
Added
- SciAgentGym bridge restored — tool substrate integration for scientific workflow agents with environment management, workspace isolation, toolkit, governance hooks, and provider abstraction (reverts removal from #209)
- Contract screening system for separating equilibrium analysis with lock-in semantics, welfare metric, multi-seed sweep (10 seeds), collusion detection, sybil detection, and plot script (#234)
- LangGraph governed handoff study with 4-agent Claude swarm, 32-config sweep (seed 42), and sweep overview plot
- Hodoscope trajectory analysis bridge for agent trace inspection
- SQLite persistence for simulations, governance state, and scenarios with lazy-init singletons
- SoftMetrics wired into Web API /api/v1/metrics endpoint
- llama.cpp local inference provider with server setup script, health checks, seed validation, and SSRF/path-traversal hardening (#232)
- Interactive isometric visualization game (viz/): Next.js browser-based SWARM simulation with client-side engine, Gemini Imagen 4 sprite assets, compare mode, parameter sweep, leaderboard, governance intervention controls, preset scenarios, narrative annotations, and data export (#182, #212)
- Memori semantic memory middleware for LLM agents with persistent fact recall, SQLite-backed storage, and OpenRouter scenario variant (#217)
- Loop detector governance lever with graduated enforcement (#198)
- Agent API Phase 1–3: scoped permissions, trace IDs, structured errors, PATCH endpoints, filtering, validation, agent approval workflow
- SciAgentBench harness with topology matrix support (#200)
- Evaluation metrics suite for success rate, efficiency, and detection (#201)
- SciForge-style trace-to-task synthesis with replay verification (#203)
- Parameter validation and clamping diagnostics for proxy computation (#176)
- MetricsAggregator wired into CLI and example export (#212)
- Reproducibility documentation with one-command run workflow (#204)
- Integration tests for runtime environment lifecycle with leak detection (#197)
- EPIC tracking infrastructure for bridge integrations (#194)
- Collaborative chemistry under budget and audits scenario (#202)
- E2E integration tests for Web API simulation lifecycle
- Blog posts: Qwen3-30B SWARM Economy v0.2, contract screening separating equilibrium, multi-seed results, red-team findings
- Slash commands: /build_game, /obsidian, /sync_artifacts, /security-review, /audit_docs, /check_nav, /bump_version
- Streamlit Cloud deployment and HF Spaces sandbox link
- Social preview image (1280x640)
Changed
- README audit: Updated all counts to match codebase (4603 tests, 78 scenarios, 29 agent modules, 27 governance modules, 95 bridge files)
- LLM provider list expanded to all 9 supported providers
- Consolidated slash commands: merged related commands into /ship, /merge_session, /sync, /fix_pr, /analyze_experiment
- Moved pytest from pre-commit to pre-push hook (#177)
- Removed abs() from ProxyWeights.normalize() (#178)
- Updated crewai to >=0.80.0,<2.0 (#221), bumped action-download-artifact to 15 (#220)
- Pinned langgraph and langchain-core to exact versions
Fixed
- SQLite lock contention in CI: Lazy-init store singletons to prevent "database is locked" errors under pytest-xdist
- SSRF hardening: 4 separate fixes (#223, #225, #230, #236, #238, #242)
- Information exposure in AWM adapter (#239)
- 7 security vulnerabilities in contract screening
- mypy method-assign error in simulations router
- SkillRL refinement governance bypass (#214)
- 77 Ruff linting errors (#218), mypy errors across multiple modules
- Flaky test stabilized with deterministic RNG seeds
- Static asset paths for viz game deployment
- 8 missing blog posts in mkdocs nav
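The lazy-init fix for SQLite lock contention listed above can be sketched roughly as follows. This is an illustration of the general pattern, not the project's store code: get_example_store and the in-memory database are hypothetical stand-ins. The point is that the connection is opened on first use rather than at import time, so test workers that merely import the module never touch the database.

```python
import sqlite3
from functools import lru_cache

# Assumption: a real store would point at a file; ":memory:" keeps the sketch self-contained.
EXAMPLE_DB = ":memory:"


@lru_cache(maxsize=1)
def get_example_store() -> sqlite3.Connection:
    """Create the connection on the first call and reuse it afterwards."""
    conn = sqlite3.connect(EXAMPLE_DB)
    conn.execute("CREATE TABLE IF NOT EXISTS runs (id TEXT PRIMARY KEY)")
    return conn


# Nothing is opened until this first call; later calls return the same object.
store = get_example_store()
store.execute("INSERT INTO runs (id) VALUES (?)", ("run-1",))
row_count = get_example_store().execute("SELECT COUNT(*) FROM runs").fetchone()[0]
print(row_count)
```

Caching with lru_cache is one of several ways to get a process-wide singleton; a module-level variable guarded by a lock would work equally well.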
Full Changelog: v1.6.0...v1.7.0
166 commits
v1.6.0: Artifacts migration, 6 new bridges, visual upgrade
Highlights
- Agent sandbox with retry/failover, CrewAI adapter, PettingZoo/AWM/AI-Scientist bridges
- Recursive subagent spawning, self-modification governance, Team-of-Rivals review
- 12 visual analysis modules with dark/light theme system
- Artifacts repo migration (~5 GB removed from main → swarm-artifacts)
- 12 new slash commands, research integrity auditor agent
- Multiple critical fixes: unseeded RNG, EventLog.clear(), security hardening
Added
- Agent sandbox with exponential backoff retry, async failover, virtual filesystem, and checkpoint isolation (#152, #157)
- CrewAI adapter for integrating SWARM agent policies into CrewAI workflows (#167)
- PettingZoo bridge for multi-agent RL environment interop
- AWM (Agent World Model) bridge — database-backed task environment with MCP server lifecycle (Phase 1 + 2)
- AI-Scientist bridge for autonomous research pipeline integration
- LangGraph Swarm bridge with governance-aware agent orchestration (#151)
- Concordia entity agent with entity sweep, run logger, and governance report
- Gather-Trade-Build domain with bilevel tax policy and adversarial agents (#164)
- Self-modification governance lever — Two-Gate policy for agent self-edit control (#165)
- Recursive subagent spawning infrastructure with spawn metrics, scenario loader, and red-team evaluation
- Team-of-Rivals adversarial review pipeline with Lean proof modules
- Visual upgrade: 12 analysis modules with dark/light theme system, KPI cards, gradient fills, and multi-scenario dashboard (#163)
- Agent API with runs, posts, persistence, and security hardening (#156)
- Slash commands: /rename_symbol, /session_guard, /audit_fix, /fix_commit, /load_keys, /render_promo, /council_review, /scrub_id, /deploy_blog, /cherry_pick_pr, /post_skillevolve, /refine_study
- Research papers: AI Economist GTB multi-seed, deeper acausality, collusion tax effect
- Blog posts: Self-optimizer distributional safety, Claude Code subagents, AI Economist GTB, SkillRL dynamics
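The agent sandbox entry above mentions exponential backoff retry. As a rough sketch of that general technique (not the sandbox's actual code), the helper below doubles the wait after each failed attempt. The function name, default values, and the injectable sleep parameter are assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, waiting base_delay * 2**k after the k-th failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage: a hypothetical operation that fails twice before succeeding.
calls = {"n": 0}
delays = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Passing the sleep function in makes the schedule easy to inspect in tests without actually waiting.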
Changed
- Artifacts repo migration: Moved runs/, lean/, promo/, research/, docs/papers/ to swarm-artifacts, reducing clone size by ~5 GB
- Lean toolchain upgraded to v4.28.0; all sorry eliminated from proofs
- EventBus initialization simplified across all handlers
- swarm.analysis lazy-loads matplotlib so it works without display dependencies
Fixed
- Critical: Unseeded RNG and destructive EventLog.clear()
- 18 security audit findings in agent sandbox
- Circuit breaker, cost tracking, Holm-Bonferroni correction (#158)
- Governed swarm: cycle threshold, composite redirect, handoff counter (#159)
- GasTown bridge: branch fallback, CI-fail grep pattern (#160)
- 5 flaky tests stabilized with seeds and constrained inputs
- 5 mypy errors and lint issues
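For readers unfamiliar with the Holm-Bonferroni correction named in the fixes above (#158), here is a minimal sketch of the standard step-down procedure. It is written from the textbook definition, not taken from the project's code, and the function name and example p-values are hypothetical.

```python
from typing import List, Sequence


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, per hypothesis, whether it is rejected under Holm's step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (0-based rank) is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(decisions)
```

With these four p-values the two smallest (0.005 and 0.01) pass their thresholds of 0.0125 and about 0.0167, while 0.03 fails against 0.025 and stops the procedure.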
Full Changelog: v1.5.0...v1.6.0
108 commits from 8 contributors
v1.5.0: GasTown governance cost study
New
- GasTown governance cost study: 42-run study (7 compositions x 2 regimes x 3 seeds, 1,260 total epochs) revealing a governance cost paradox — safety levers reduce toxicity at every adversarial level (mean reduction 0.071) but impose welfare costs that exceed the safety benefit at all tested proportions
- Research paper: "The Cost of Safety: Governance Overhead vs. Toxicity Reduction in GasTown Multi-Agent Workspaces" with 5 figures (toxicity, welfare, payoff breakdown, adverse selection, governance protection)
- Pre-commit private infra scan: Blocks accidental commit of Prime Intellect dashboard URLs and run IDs in public-facing files
Improvements
- IMPLEMENTATION_PLAN.md updated to reflect current stats (2,922 tests, 55 scenarios, 12 domain handlers, 22 agent modules, CSM and Council sections added)
Key Finding
At 0% adversarial, governance costs 216 welfare units (-57.6%) for only 0.066 toxicity reduction. The cost narrows as adversarial pressure increases, converging at 86% rogue. This suggests governance levers are most cost-effective when targeted rather than applied uniformly.
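The figures quoted above imply a few derived quantities, worked through in the short calculation below. Two assumptions are made here that the notes do not state: that the 57.6% is measured against ungoverned welfare, and that the 1,260 epochs are split evenly across the 42 runs.

```python
# Figures quoted in the release notes for the 0% adversarial condition.
welfare_cost = 216.0        # welfare units lost under governance
relative_cost = 0.576       # the same loss expressed as a fraction (57.6%)
toxicity_reduction = 0.066  # absolute drop in toxicity

# Assumption: the percentage is relative to ungoverned welfare.
ungoverned_welfare = welfare_cost / relative_cost
governed_welfare = ungoverned_welfare - welfare_cost

# Welfare given up for each 0.01 of toxicity removed.
cost_per_point = welfare_cost / (toxicity_reduction * 100)

# Study grid: 7 compositions x 2 regimes x 3 seeds.
runs = 7 * 2 * 3
# Assumption: every run has the same number of epochs.
epochs_per_run = 1260 // runs

print(f"implied ungoverned welfare: {ungoverned_welfare:.0f}")
print(f"implied governed welfare:   {governed_welfare:.0f}")
print(f"welfare per 0.01 toxicity:  {cost_per_point:.1f}")
print(f"runs: {runs}, epochs per run: {epochs_per_run}")
```

Under those assumptions, ungoverned welfare comes out near 375 units and governed welfare near 159, with roughly 33 welfare units spent per 0.01 of toxicity reduction.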
Quick Start
pip install swarm-safety
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10
v1.4.0: Handler extraction, decision theory studies, event bus
New
- Handler extraction: 8 core actions extracted from Orchestrator into FeedHandler (POST/REPLY/VOTE), CoreInteractionHandler (PROPOSE/ACCEPT/REJECT), and TaskHandler (CLAIM/SUBMIT); _handle_core_action reduced from 130 lines to 5
- Decision theory studies: Full studies comparing TDT vs FDT vs UDT at population scales up to 21 agents, including UDT precommitment advantage analysis
- Prime Intellect bridge: external_run_id column in scenario_runs for cross-platform run tracking
- Event bus: TypedDict schemas for event payloads and metadata, generalizing the WorktreeEvent pattern to the core framework
- GasTown bridge: Branch-based observation support for multi-branch governance
- CHANGELOG auto-update: /release command now automatically converts [Unreleased] to versioned entry
Improvements
- SoftInteraction.to_dict() → model_dump(mode='json') and from_dict() → model_validate() (DRY)
- Reputation delta formula (p - 0.5) - c_a documented with full derivation in InteractionFinalizer
- Comprehensive CHANGELOG covering all releases from v0.1.0 through v1.3.1
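The reputation delta formula mentioned above, (p - 0.5) - c_a, can be written out as a small function. This is only a sketch of the formula as quoted. The notes do not define the symbols, so the reading of p as a probability in [0, 1] and of c_a as a per-agent cost term is an assumption, as is the function name.

```python
def reputation_delta(p: float, c_a: float) -> float:
    """Evaluate (p - 0.5) - c_a.

    Assumptions: p is a probability-like score in [0, 1];
    c_a is a cost attributed to the acting agent.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p is assumed to lie in [0, 1]")
    return (p - 0.5) - c_a


# Under this reading, scores above 0.5 raise reputation unless the cost outweighs them.
print(reputation_delta(0.8, 0.1))   # positive: good interaction, small cost
print(reputation_delta(0.5, 0.0))   # neutral
print(reputation_delta(0.6, 0.3))   # negative: cost exceeds the benefit
```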
Fixes
- 87 pre-existing mypy errors across tests/ and scripts/
- CAPTCHA solver dash deobfuscation and multiply detection
- Submission author normalization to SWARM Research Collective
Stats
- 274 files changed, 36,426 insertions, 1,736 deletions since v1.3.1
- 2,922 tests passing
Quick Start
pip install swarm-safety
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10
v1.3.1
What's Changed
- Codex-generated pull request by @rsavitt in #111
- docs: Add security & integration review for abhi-arya1/wt by @rsavitt in #105
- Potential fix for code scanning alert no. 14: Clear-text logging of sensitive information by @rsavitt in #112
- Claude/swarm csm benchmark y f0 do by @rsavitt in #115
- Fix MCP config: use portable uvx path instead of hardcoded user path by @rsavitt in #116
- Add initial SWARM ↔ Ralph bridge with JSONL event ingestion by @rsavitt in #109
- Claude/swarm evolving skills f bcn k by @rsavitt in #114
- Implement Logical Decision Theory (LDT) agent with updateless cooperation by @rsavitt in #117
- Add LDT vs honest agent composition study by @rsavitt in #118
- Potential fix for code scanning alert no. 16: Clear-text logging of sensitive information by @rsavitt in #119
- Prepare package for PyPI publishing by @rsavitt in #120
Full Changelog: v1.3.0...v1.3.1
v1.2.0: Paper Completion, Smarter Pre-Commit Hook
New
- Paper: Related Work section — positions SWARM against market microstructure, multi-agent safety, and mechanism design literature
- Paper: Conclusion section — summarizes three-regime findings, governance implications, and future directions
- Paper: Appendix data — fills scenario parameter tables and detailed per-epoch breakdowns
Improvements
- Pre-commit hook skips pytest for non-code changes: staging only .md, .yaml, or other non-code files no longer triggers the full 2200-test suite, cutting commit time from ~30s to <1s for docs-only changes
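The skip rule described above amounts to a check on the staged file list. The sketch below shows that decision in Python as an illustration only; the actual hook is a shell script whose contents are not shown in these notes. The function name and the choice of .py as the only code suffix are assumptions.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Assumption: only Python sources count as code for the purpose of this sketch.
CODE_SUFFIXES = {".py"}


def should_run_tests(staged_files: Iterable[str]) -> bool:
    """Return True if at least one staged file looks like code."""
    return any(PurePosixPath(name).suffix in CODE_SUFFIXES for name in staged_files)


docs_only = ["README.md", "scenarios/baseline.yaml"]
mixed = ["README.md", "swarm/core.py"]  # hypothetical paths

print(should_run_tests(docs_only))  # False: the test suite can be skipped
print(should_run_tests(mixed))      # True: a code file is staged
```

In a real hook the staged list would typically come from git diff --cached --name-only.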
Quick Start
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb
Full Changelog: v1.1.2...v1.2.0
v1.1.2: Tmux Multi-Session Launcher
New
- /tmux command: hotkey reference for tmux multi-session workflows
- scripts/claude-tmux.sh: launcher script for running parallel Claude Code sessions in tmux panes
Quick Start
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb
Full Changelog: v1.1.1...v1.1.2
v1.1.1: Hook Fix, New Slash Commands, Paper Expansion
Fixes
- Pre-commit hook exit code handling: capture pytest exit code explicitly and add exit 0 to prevent bash from misinterpreting trailing output under set -e
- Missing agent frontmatter: add name field to research_scout agent
New
- /warmup command: session opening sequence for fast orientation
- /check-ignore command: verify gitignore coverage for sensitive files
- /lint-fix command: auto-fix linting issues on staged files
Improvements
- Paper expanded — formal model section, marketplace/network results tables
- Hot mess theory reference — added Anthropic's variance-dominated failure framing to incoherence scaling section
Quick Start
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb
Full Changelog: v1.1.0...v1.1.1