Releases: swarm-ai-research/swarm

v1.9.0

29 May 02:08
0ff48a8

What's Changed

  • ablation: τ* sweep — binary's 'blindness' is mostly threshold placement by @rsavitt in #460
  • ablation: threshold-drift — once-calibrated alarms fail silently, AUROC can't see it by @rsavitt in #461
  • blog: add Limitations section — what the τ*/drift ablations revised by @rsavitt in #462
  • blog: add adaptive-adversary finding to Limitations (four ablations) by @rsavitt in #464
  • ablation: adaptive adversary — soft constrains the mean, thresholds can be shaped around by @rsavitt in #463
  • fix(api): deterministic 404 in compare endpoint (fixes flaky test_compare_nonexistent_run_404) by @rsavitt in #465
  • test(api): isolate default API DB to temp for the session (close get_store real-DB hazard) by @rsavitt in #466
  • Add resource negotiation game handler with multi-round bargaining by @rsavitt in #415
  • feat(bridges): add Aeon agent-first ledger bridge by @rsavitt in #467
  • Release v1.9.0: finalize CHANGELOG, bump version by @rsavitt in #469
  • fix(release): install full test extras so PyPI publish isn't skipped by @rsavitt in #470
  • feat(agentgit): capability enforcement from delegation + OS-level isolation (7ge5) by @rsavitt in #468
  • fix(packaging): drop direct git dep so PyPI publish succeeds by @rsavitt in #473
  • feat(agentgit): enriched, tamper-evident provenance block (8ll9) by @rsavitt in #474
  • fix(release): run release tests in parallel with a timeout by @rsavitt in #475
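
The threshold-drift ablation (#461) rests on a property of AUROC that is easy to check by hand: AUROC depends only on how scores are ranked, so a uniform shift in the score distribution leaves it unchanged, while an alarm with a fixed, once-calibrated threshold starts firing at a different rate. The following is a minimal sketch with made-up scores; the function names and numbers are illustrative assumptions, not code or data from the swarm repository.

```python
def auroc(pos, neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def false_positive_rate(neg, threshold):
    """Fraction of benign scores that trip a fixed alarm threshold."""
    return sum(s >= threshold for s in neg) / len(neg)


# Made-up scores for illustration only.
benign = [0.10, 0.20, 0.30, 0.40]
harmful = [0.35, 0.60, 0.70, 0.90]
threshold = 0.5  # calibrated once on the original distribution

# Simulate drift: every score moves up by the same amount.
drift = 0.25
benign_drifted = [s + drift for s in benign]
harmful_drifted = [s + drift for s in harmful]

print("AUROC before/after:", auroc(harmful, benign), auroc(harmful_drifted, benign_drifted))
print("FPR before/after:  ", false_positive_rate(benign, threshold),
      false_positive_rate(benign_drifted, threshold))
```

The ranking, and therefore AUROC, is identical before and after the shift, but the false-positive rate of the fixed alarm changes, which is the sense in which a ranking metric cannot see the failure.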

Full Changelog: v1.8.0...v1.9.0

v1.8.0: Soft-vs-binary detection, platform bridges, governance studies

26 May 01:43
d60b9be

489 commits since v1.7.0. Highlights:

Soft-vs-binary detection framework (swarm/detection/)

Turns the self-optimizing-agent vignette into a real experiment: every soft metric paired with its thresholded binary twin, scored as a classifier. AUROC / AUPRC / partial-AUROC, time-to-detection at fixed FPR, market-level adverse selection, calibration, and paired significance testing. Adds a 2D sensitivity-grid runner (run_detection_sensitivity_2d.py --preset heterogeneous) and a heterogeneous "informative" regime that avoids the AUROC=1.0 generator ceiling, plus a companion blog post.
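
To make the "soft metric versus thresholded binary twin" pairing concrete, the sketch below scores one soft signal and two binary versions of it with the same AUROC function. It is an illustration under assumptions: the scores, the thresholds, and the helper names are invented here and are not taken from swarm/detection/.

```python
def auroc(pos, neg):
    """Rank-based AUROC: P(positive score > negative score), ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_twin(scores, tau):
    """Collapse a soft score into a 0/1 alarm at threshold tau."""
    return [1 if s >= tau else 0 for s in scores]


# Made-up soft scores for harmful (positive) and benign (negative) cases.
harmful = [0.55, 0.65, 0.80, 0.95]
benign = [0.10, 0.30, 0.45, 0.60]

soft_auroc = auroc(harmful, benign)
twin_mid = auroc(binary_twin(harmful, 0.5), binary_twin(benign, 0.5))
twin_high = auroc(binary_twin(harmful, 0.9), binary_twin(benign, 0.9))

print(f"soft={soft_auroc}  binary@0.5={twin_mid}  binary@0.9={twin_high}")
```

In this toy data the binary twin loses little when the threshold sits between the two groups and loses a lot when it is badly placed, so any comparison of soft against binary detection depends on how the threshold was chosen.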

External-platform bridges

MiroShark (social-cascade sim + SoftMetrics judging), LangChain, AutoGPT, CrewAI, Mesa ABM, RAG/LEANN, Hyperspace DAG domain, LabOS Toolmaker→Critic.

Governance & misalignment studies

Adaptive governance controller, governance parameter/sensitivity sweeps, misalignment module + sweeps, Tierra artificial-life scenario + hardening, evolutionary game handler, capability–safety Pareto frontiers, causal-credit propagation, and the triangle (misalignment × causal credit × toxicity) study. Plus escalation-sandbox LLM studies (temperature, prompt framing, model size, cooperation window).

New agent types & mechanisms

ThresholdDancer adversary, behavioral agent types, hyperagent self-modification, dynamic toxicity feedback, artifact registry + cascade-risk governance, PerformanceTracker, net-social-welfare metric.

On-chain

SwarmGym safety auditor CLI + SafetyAttestation contract (Base) + web3 client.

Other

Orchestrator pipeline/middleware refactor (god-object → middleware pipeline + handler factory + scheduler); numerous case-study blog posts.

See CHANGELOG.md for the full itemized list.

Quick Start

python -m pip install -e ".[dev,runtime]"
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10

v1.7.0: Contract screening, viz game, llama.cpp, 164 commits

21 Feb 16:55

Highlights

  • Contract screening system for separating-equilibrium analysis, with multi-seed sweep, collusion/sybil detection, and red-team blog posts (#234)
  • Interactive isometric visualization game — browser-based SWARM simulation with Gemini Imagen 4 sprites, compare mode, sweep, leaderboard, and governance intervention controls (#182, #212)
  • llama.cpp local inference provider with server setup, health checks, and SSRF hardening (#232)
  • LangGraph governed handoff study — 4-agent Claude swarm, 32-config sweep
  • Memori semantic memory middleware for LLM agents (#217)
  • Loop detector governance lever with graduated enforcement (#198)
  • Agent API Phase 1–3: scoped permissions, trace IDs, approval workflows
  • SQLite persistence for simulations, governance, and scenarios (lazy-init to fix CI xdist contention)
  • SciAgentGym bridge restored — tool substrate integration for scientific workflow agents (9 modules, 44 tests)
  • Multiple SSRF/security fixes (#223, #225, #230, #236, #238, #239, #242)

Added

  • SciAgentGym bridge restored — tool substrate integration for scientific workflow agents with environment management, workspace isolation, toolkit, governance hooks, and provider abstraction (reverts removal from #209)
  • Contract screening system for separating-equilibrium analysis, with lock-in semantics, welfare metric, multi-seed sweep (10 seeds), collusion detection, sybil detection, and plot script (#234)
  • LangGraph governed handoff study with 4-agent Claude swarm, 32-config sweep (seed 42), and sweep overview plot
  • Hodoscope trajectory analysis bridge for agent trace inspection
  • SQLite persistence for simulations, governance state, and scenarios with lazy-init singletons
  • SoftMetrics wired into Web API /api/v1/metrics endpoint
  • llama.cpp local inference provider with server setup script, health checks, seed validation, and SSRF/path-traversal hardening (#232)
  • Interactive isometric visualization game (viz/): Next.js browser-based SWARM simulation with client-side engine, Gemini Imagen 4 sprite assets, compare mode, parameter sweep, leaderboard, governance intervention controls, preset scenarios, narrative annotations, and data export (#182, #212)
  • Memori semantic memory middleware for LLM agents with persistent fact recall, SQLite-backed storage, and OpenRouter scenario variant (#217)
  • Loop detector governance lever with graduated enforcement (#198)
  • Agent API Phase 1–3: scoped permissions, trace IDs, structured errors, PATCH endpoints, filtering, validation, agent approval workflow
  • SciAgentBench harness with topology matrix support (#200)
  • Evaluation metrics suite for success rate, efficiency, and detection (#201)
  • SciForge-style trace-to-task synthesis with replay verification (#203)
  • Parameter validation and clamping diagnostics for proxy computation (#176)
  • MetricsAggregator wired into CLI and example export (#212)
  • Reproducibility documentation with one-command run workflow (#204)
  • Integration tests for runtime environment lifecycle with leak detection (#197)
  • EPIC tracking infrastructure for bridge integrations (#194)
  • Collaborative chemistry under budget and audits scenario (#202)
  • E2E integration tests for Web API simulation lifecycle
  • Blog posts: Qwen3-30B SWARM Economy v0.2, contract screening separating equilibrium, multi-seed results, red-team findings
  • Slash commands: /build_game, /obsidian, /sync_artifacts, /security-review, /audit_docs, /check_nav, /bump_version
  • Streamlit Cloud deployment and HF Spaces sandbox link
  • Social preview image (1280x640)
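
The loop detector entry above (#198) mentions graduated enforcement without giving details. One plausible reading is an escalation ladder: the more often an agent repeats the same action within a recent window, the stronger the response. The sketch below implements that reading only; the class name, window size, and the warn/throttle/block levels are all assumptions made for illustration and may not match the actual lever.

```python
from collections import Counter, deque

# Hypothetical escalation ladder: (minimum repeats in window, response).
LEVELS = [(4, "block"), (3, "throttle"), (2, "warn")]


class LoopDetector:
    """Toy detector: counts repeats of the latest action in a sliding window."""

    def __init__(self, window=5):
        self.recent = deque(maxlen=window)

    def observe(self, action):
        """Record an action and return the enforcement level it triggers."""
        self.recent.append(action)
        repeats = Counter(self.recent)[action]
        for minimum, response in LEVELS:
            if repeats >= minimum:
                return response
        return "allow"


detector = LoopDetector()
responses = [detector.observe(a) for a in ["post", "post", "vote", "post", "post"]]
print(responses)
```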

Changed

  • README audit: Updated all counts to match codebase (4603 tests, 78 scenarios, 29 agent modules, 27 governance modules, 95 bridge files)
  • LLM provider list expanded to all 9 supported providers
  • Consolidated slash commands: merged related commands into /ship, /merge_session, /sync, /fix_pr, /analyze_experiment
  • Moved pytest from pre-commit to pre-push hook (#177)
  • Removed abs() from ProxyWeights.normalize() (#178)
  • Updated the crewai requirement to >=0.80.0,<2.0 (#221); bumped action-download-artifact to 15 (#220)
  • Pinned langgraph and langchain-core to exact versions

Fixed

  • SQLite lock contention in CI: Lazy-init store singletons to prevent "database is locked" errors under pytest-xdist
  • SSRF hardening: 6 separate fixes (#223, #225, #230, #236, #238, #242)
  • Information exposure in AWM adapter (#239)
  • 7 security vulnerabilities in contract screening
  • mypy method-assign error in simulations router
  • SkillRL refinement governance bypass (#214)
  • 77 Ruff linting errors (#218), mypy errors across multiple modules
  • Flaky test stabilized with deterministic RNG seeds
  • Static asset paths for viz game deployment
  • 8 missing blog posts in mkdocs nav
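
The SQLite fix above relies on lazy initialization: the store is created on first use rather than at import time, so merely importing the module (as every pytest-xdist worker does) does not open the database. A minimal sketch of the pattern follows; the function name, table, and in-memory path are assumptions for illustration, not the project's actual store code.

```python
import sqlite3
import threading

_store = None                    # no connection is opened at import time
_store_lock = threading.Lock()


def get_store(path=":memory:"):
    """Return the process-wide connection, creating it on first call."""
    global _store
    if _store is None:
        with _store_lock:
            if _store is None:   # re-check once the lock is held
                conn = sqlite3.connect(path, check_same_thread=False)
                conn.execute("CREATE TABLE IF NOT EXISTS runs (id TEXT PRIMARY KEY)")
                _store = conn
    return _store


first = get_store()
second = get_store()
print(first is second)           # both calls share one connection
```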

Full Changelog: v1.6.0...v1.7.0

166 commits

v1.6.0: Artifacts migration, 6 new bridges, visual upgrade

21 Feb 16:53

Highlights

  • Agent sandbox with retry/failover, CrewAI adapter, PettingZoo/AWM/AI-Scientist bridges
  • Recursive subagent spawning, self-modification governance, Team-of-Rivals review
  • 12 visual analysis modules with dark/light theme system
  • Artifacts repo migration (~5 GB removed from main → swarm-artifacts)
  • 12 new slash commands, research integrity auditor agent
  • Multiple critical fixes: unseeded RNG, EventLog.clear(), security hardening

Added

  • Agent sandbox with exponential backoff retry, async failover, virtual filesystem, and checkpoint isolation (#152, #157)
  • CrewAI adapter for integrating SWARM agent policies into CrewAI workflows (#167)
  • PettingZoo bridge for multi-agent RL environment interop
  • AWM (Agent World Model) bridge — database-backed task environment with MCP server lifecycle (Phase 1 + 2)
  • AI-Scientist bridge for autonomous research pipeline integration
  • LangGraph Swarm bridge with governance-aware agent orchestration (#151)
  • Concordia entity agent with entity sweep, run logger, and governance report
  • Gather-Trade-Build domain with bilevel tax policy and adversarial agents (#164)
  • Self-modification governance lever — Two-Gate policy for agent self-edit control (#165)
  • Recursive subagent spawning infrastructure with spawn metrics, scenario loader, and red-team evaluation
  • Team-of-Rivals adversarial review pipeline with Lean proof modules
  • Visual upgrade: 12 analysis modules with dark/light theme system, KPI cards, gradient fills, and multi-scenario dashboard (#163)
  • Agent API with runs, posts, persistence, and security hardening (#156)
  • Slash commands: /rename_symbol, /session_guard, /audit_fix, /fix_commit, /load_keys, /render_promo, /council_review, /scrub_id, /deploy_blog, /cherry_pick_pr, /post_skillevolve, /refine_study
  • Research papers: AI Economist GTB multi-seed, deeper acausality, collusion tax effect
  • Blog posts: Self-optimizer distributional safety, Claude Code subagents, AI Economist GTB, SkillRL dynamics
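
For readers unfamiliar with the retry pattern named in the agent sandbox entry (#152, #157), the sketch below shows exponential backoff combined with failover to a second provider. It is synchronous for brevity, whereas the release notes describe async failover, and every name and default here is a hypothetical stand-in rather than the sandbox's API.

```python
def backoff_delays(retries, base=0.5, cap=8.0):
    """Delays that double on each attempt, capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_failover(providers, retries=3, sleep=lambda seconds: None):
    """Try each provider in order, retrying with backoff before failing over.

    `sleep` defaults to a no-op so the sketch runs instantly; pass
    time.sleep to actually wait between attempts.
    """
    errors = []
    for provider in providers:
        for delay in backoff_delays(retries):
            try:
                return provider()
            except Exception as exc:
                errors.append(exc)
                sleep(delay)
    raise RuntimeError(f"all providers failed after {len(errors)} attempts")


def primary():
    raise TimeoutError("primary down")   # always fails in this demo


def backup():
    return "ok"


result = call_with_failover([primary, backup])
print(result, backoff_delays(5))
```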

Changed

  • Artifacts repo migration: Moved runs/, lean/, promo/, research/, docs/papers/ to swarm-artifacts — reduces clone size by ~5 GB
  • Lean toolchain upgraded to v4.28.0; all sorry placeholders eliminated from proofs
  • EventBus initialization simplified across all handlers
  • swarm.analysis lazy-loads matplotlib so it works without display dependencies

Fixed

  • Critical: Unseeded RNG and destructive EventLog.clear()
  • 18 security audit findings in agent sandbox
  • Circuit breaker, cost tracking, Holm-Bonferroni correction (#158)
  • Governed swarm: cycle threshold, composite redirect, handoff counter (#159)
  • GasTown bridge: branch fallback, CI-fail grep pattern (#160)
  • 5 flaky tests stabilized with seeds and constrained inputs
  • 5 mypy errors and lint issues
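
The Holm-Bonferroni correction mentioned in #158 is a standard step-down procedure for multiple comparisons. The release notes do not say what the fix changed, so the sketch below shows only the textbook procedure, with invented p-values, as background.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one reject (True) / retain (False) decision per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is tested at alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # the first failure retains this and all larger p-values
    return reject


decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(decisions)
```

Here 0.005 and 0.01 clear their thresholds (0.0125 and about 0.0167), while 0.03 fails against 0.025, so it and 0.04 are retained.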

Full Changelog: v1.5.0...v1.6.0

108 commits from 8 contributors

v1.5.0: GasTown governance cost study

13 Feb 17:40

New

  • GasTown governance cost study: 42-run study (7 compositions x 2 regimes x 3 seeds, 1,260 total epochs) revealing a governance cost paradox — safety levers reduce toxicity at every adversarial level (mean reduction 0.071) but impose welfare costs that exceed the safety benefit at all tested proportions
  • Research paper: "The Cost of Safety: Governance Overhead vs. Toxicity Reduction in GasTown Multi-Agent Workspaces" with 5 figures (toxicity, welfare, payoff breakdown, adverse selection, governance protection)
  • Pre-commit private infra scan: Blocks accidental commit of Prime Intellect dashboard URLs and run IDs in public-facing files

Improvements

  • IMPLEMENTATION_PLAN.md updated to reflect current stats (2,922 tests, 55 scenarios, 12 domain handlers, 22 agent modules, CSM and Council sections added)

Key Finding

At 0% adversarial agents, governance costs 216 welfare units (-57.6%) for a toxicity reduction of only 0.066. The cost narrows as adversarial pressure increases, converging at 86% rogue agents. This suggests governance levers are most cost-effective when targeted rather than applied uniformly.
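
The figures quoted for this study can be cross-checked with a few lines of arithmetic. The run count follows directly from the stated design. The implied baseline welfare is a derived number that assumes the -57.6% is measured relative to ungoverned welfare at the same composition, which the release notes do not state explicitly.

```python
# Study design as stated in the release notes.
compositions, regimes, seeds = 7, 2, 3
runs = compositions * regimes * seeds        # 42 runs
epochs_per_run = 1260 // runs                # 30 epochs per run

# Key finding at 0% adversarial agents.
welfare_cost = 216.0
relative_cost = 0.576                        # the quoted -57.6%
toxicity_reduction = 0.066

# Assumption: the percentage is relative to the ungoverned baseline.
implied_baseline_welfare = welfare_cost / relative_cost      # about 375

# Welfare given up per 0.01 of toxicity removed.
cost_per_toxicity_point = welfare_cost / (toxicity_reduction * 100)

print(runs, epochs_per_run, round(implied_baseline_welfare, 1),
      round(cost_per_toxicity_point, 1))
```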

Quick Start

pip install swarm-safety
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10

v1.4.0: Handler extraction, decision theory studies, event bus

13 Feb 16:56

New

  • Handler extraction: 8 core actions extracted from Orchestrator into FeedHandler (POST/REPLY/VOTE), CoreInteractionHandler (PROPOSE/ACCEPT/REJECT), and TaskHandler (CLAIM/SUBMIT) — _handle_core_action reduced from 130 lines to 5
  • Decision theory studies: Full studies comparing TDT vs FDT vs UDT at population scales up to 21 agents, including UDT precommitment advantage analysis
  • Prime Intellect bridge: external_run_id column in scenario_runs for cross-platform run tracking
  • Event bus: TypedDict schemas for event payloads and metadata, generalizing the WorktreeEvent pattern to the core framework
  • GasTown bridge: Branch-based observation support for multi-branch governance
  • CHANGELOG auto-update: /release command now automatically converts [Unreleased] to versioned entry
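
As an illustration of the event bus entry above, typed event payloads can be declared with TypedDict so that a type checker catches missing or misspelled keys while the runtime objects stay plain dicts. The schema below is hypothetical: the class names and fields are invented for this sketch and are not the framework's actual event definitions.

```python
from typing import TypedDict


class EventMetadata(TypedDict):
    epoch: int
    step: int


class VotePayload(TypedDict):
    voter_id: str
    post_id: str
    direction: int  # +1 for an upvote, -1 for a downvote


class VoteEvent(TypedDict):
    event_type: str
    payload: VotePayload
    metadata: EventMetadata


def make_vote_event(voter_id: str, post_id: str, direction: int,
                    epoch: int, step: int) -> VoteEvent:
    """Build a vote event as a plain dict that matches the VoteEvent schema."""
    return {
        "event_type": "vote",
        "payload": {"voter_id": voter_id, "post_id": post_id, "direction": direction},
        "metadata": {"epoch": epoch, "step": step},
    }


event = make_vote_event("agent_1", "post_7", 1, epoch=0, step=3)
print(event["payload"]["direction"], event["metadata"]["step"])
```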

Improvements

  • SoftInteraction.to_dict() → model_dump(mode='json') and from_dict() → model_validate() (DRY)
  • Reputation delta formula (p - 0.5) - c_a documented with full derivation in InteractionFinalizer
  • Comprehensive CHANGELOG covering all releases from v0.1.0 through v1.3.1
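
The reputation delta formula quoted above, (p - 0.5) - c_a, can be read off directly in code. The interpretation in the comments is an assumption: p is taken to be the soft probability that an interaction was beneficial and c_a a cost charged to agent a. The release notes give only the formula, and the full derivation lives in InteractionFinalizer.

```python
def reputation_delta(p: float, c_a: float) -> float:
    """Reputation change for agent a: (p - 0.5) - c_a.

    Assumed meaning: p in [0, 1] is the probability the interaction was
    beneficial, so p = 0.5 is the neutral point; c_a is a cost term.
    """
    return (p - 0.5) - c_a


# Illustrative inputs, not values from any SWARM run.
gain = reputation_delta(p=0.8, c_a=0.1)     # likely-beneficial interaction
neutral = reputation_delta(p=0.5, c_a=0.0)  # coin-flip interaction, no cost
loss = reputation_delta(p=0.3, c_a=0.1)     # likely-harmful interaction

print(round(gain, 3), neutral, round(loss, 3))
```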

Fixes

  • 87 pre-existing mypy errors across tests/ and scripts/
  • CAPTCHA solver dash deobfuscation and multiply detection
  • Submission author normalization to SWARM Research Collective

Stats

  • 274 files changed, 36,426 insertions, 1,736 deletions since v1.3.1
  • 2,922 tests passing

Quick Start

pip install swarm-safety
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10

v1.3.1

11 Feb 06:39

What's Changed

  • Codex-generated pull request by @rsavitt in #111
  • docs: Add security & integration review for abhi-arya1/wt by @rsavitt in #105
  • Potential fix for code scanning alert no. 14: Clear-text logging of sensitive information by @rsavitt in #112
  • Claude/swarm csm benchmark y f0 do by @rsavitt in #115
  • Fix MCP config: use portable uvx path instead of hardcoded user path by @rsavitt in #116
  • Add initial SWARM ↔ Ralph bridge with JSONL event ingestion by @rsavitt in #109
  • Claude/swarm evolving skills f bcn k by @rsavitt in #114
  • Implement Logical Decision Theory (LDT) agent with updateless cooperation by @rsavitt in #117
  • Add LDT vs honest agent composition study by @rsavitt in #118
  • Potential fix for code scanning alert no. 16: Clear-text logging of sensitive information by @rsavitt in #119
  • Prepare package for PyPI publishing by @rsavitt in #120

Full Changelog: v1.3.0...v1.3.1

v1.2.0: Paper Completion, Smarter Pre-Commit Hook

10 Feb 05:11

New

  • Paper: Related Work section — positions SWARM against market microstructure, multi-agent safety, and mechanism design literature
  • Paper: Conclusion section — summarizes three-regime findings, governance implications, and future directions
  • Paper: Appendix data — fills scenario parameter tables and detailed per-epoch breakdowns

Improvements

  • Pre-commit hook skips pytest for non-code changes — staging only .md, .yaml, or other non-code files no longer triggers the full 2200-test suite, cutting commit time from ~30s to <1s for docs-only changes
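
The decision the hook makes can be sketched as a filter over the staged file list: run the suite only if at least one staged path looks like code. The actual hook is presumably a shell script, so this Python version only illustrates the logic. Treating .py as the sole code suffix, and the example paths, are assumptions.

```python
from pathlib import PurePosixPath

# Assumption: only Python sources count as "code" for this sketch.
CODE_SUFFIXES = {".py"}


def needs_tests(staged_paths):
    """True if any staged file has a code suffix, so pytest should run."""
    return any(PurePosixPath(path).suffix in CODE_SUFFIXES for path in staged_paths)


docs_only = ["README.md", "scenarios/baseline.yaml"]
with_code = ["README.md", "swarm/example_module.py"]  # hypothetical path

print(needs_tests(docs_only), needs_tests(with_code))
```

In a real hook the staged list would typically come from git diff --cached --name-only.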

Quick Start

git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb

Full Changelog: v1.1.2...v1.2.0

v1.1.2: Tmux Multi-Session Launcher

10 Feb 04:58

New

  • /tmux command — hotkey reference for tmux multi-session workflows
  • scripts/claude-tmux.sh — launcher script for running parallel Claude Code sessions in tmux panes

Quick Start

git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb

Full Changelog: v1.1.1...v1.1.2

v1.1.1: Hook Fix, New Slash Commands, Paper Expansion

10 Feb 04:52

Fixes

  • Pre-commit hook exit code handling — capture pytest exit code explicitly and add exit 0 to prevent bash from misinterpreting trailing output under set -e
  • Missing agent frontmatter — add name field to research_scout agent

New

  • /warmup command — session opening sequence for fast orientation
  • /check-ignore command — verify gitignore coverage for sensitive files
  • /lint-fix command — auto-fix linting issues on staged files

Improvements

  • Paper expanded — formal model section, marketplace/network results tables
  • Hot mess theory reference — added Anthropic's variance-dominated failure framing to incoherence scaling section

Quick Start

git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb

Full Changelog: v1.1.0...v1.1.1