Releases · swarm-ai-research/swarm

29 May 02:08

github-actions

v1.9.0

0ff48a8

v1.9.0 Latest

What's Changed

ablation: τ* sweep — binary's 'blindness' is mostly threshold placement by @rsavitt in #460
ablation: threshold-drift — once-calibrated alarms fail silently, AUROC can't see it by @rsavitt in #461
blog: add Limitations section — what the τ*/drift ablations revised by @rsavitt in #462
blog: add adaptive-adversary finding to Limitations (four ablations) by @rsavitt in #464
ablation: adaptive adversary — soft constrains the mean, thresholds can be shaped around by @rsavitt in #463
fix(api): deterministic 404 in compare endpoint (fixes flaky test_compare_nonexistent_run_404) by @rsavitt in #465
test(api): isolate default API DB to temp for the session (close get_store real-DB hazard) by @rsavitt in #466
Add resource negotiation game handler with multi-round bargaining by @rsavitt in #415
feat(bridges): add Aeon agent-first ledger bridge by @rsavitt in #467
Release v1.9.0: finalize CHANGELOG, bump version by @rsavitt in #469
fix(release): install full test extras so PyPI publish isn't skipped by @rsavitt in #470
feat(agentgit): capability enforcement from delegation + OS-level isolation (7ge5) by @rsavitt in #468
fix(packaging): drop direct git dep so PyPI publish succeeds by @rsavitt in #473
feat(agentgit): enriched, tamper-evident provenance block (8ll9) by @rsavitt in #474
fix(release): run release tests in parallel with a timeout by @rsavitt in #475
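The τ* and threshold-drift ablations (#460, #461) turn on a general property. AUROC depends only on how scores are ranked, so it cannot register a monotone shift that pushes every score below a fixed alarm threshold. The minimal, self-contained sketch below shows that effect. The function names (`auroc`, `tpr_fpr`, `best_threshold`) and the toy data are hypothetical illustrations, not the ablation code. Here τ* is chosen by Youden's J (TPR minus FPR), which may differ from the criterion the ablations actually use.

```python
def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_fpr(scores, labels, tau):
    """True- and false-positive rates of the binary alarm `score >= tau`."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tpr = sum(s >= tau for s in pos) / len(pos)
    fpr = sum(s >= tau for s in neg) / len(neg)
    return tpr, fpr


def best_threshold(scores, labels):
    """tau*: the observed score that maximizes Youden's J = TPR - FPR (a simple sweep)."""
    best_tau, best_j = None, float("-inf")
    for tau in sorted(set(scores)):
        tpr, fpr = tpr_fpr(scores, labels, tau)
        if tpr - fpr > best_j:
            best_tau, best_j = tau, tpr - fpr
    return best_tau, best_j


# Calibration period: the soft score separates the classes perfectly.
labels = [0, 0, 0, 1, 1, 1]
calibration = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
tau_star, _ = best_threshold(calibration, labels)  # tau* = 0.6

# Later, every score drifts down by 0.35; the ranking is unchanged.
drifted = [s - 0.35 for s in calibration]
print(auroc(drifted, labels))              # still 1.0
print(tpr_fpr(drifted, labels, tau_star))  # (0.0, 0.0): the alarm never fires
```

AUROC reports a perfect detector after the drift, while the once-calibrated threshold catches nothing. This is the silent-failure mode the drift ablation describes.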

Full Changelog: v1.8.0...v1.9.0

Contributors

rsavitt

26 May 01:43

rsavitt

v1.8.0

d60b9be

v1.8.0: Soft-vs-binary detection, platform bridges, governance studies

v1.8.0 — Soft-vs-binary detection, platform bridges, governance studies

489 commits since v1.7.0. Highlights:

Soft-vs-binary detection framework (`swarm/detection/`)

Turns the self-optimizing-agent vignette into a real experiment: every soft metric is paired with its thresholded binary twin, and both are scored as classifiers. Scoring covers AUROC, AUPRC, and partial AUROC, time-to-detection at a fixed FPR, market-level adverse selection, calibration, and paired significance testing. Also adds a 2D sensitivity-grid runner (`run_detection_sensitivity_2d.py --preset heterogeneous`), a heterogeneous "informative" regime that avoids the generator's AUROC = 1.0 ceiling, and a companion blog post.
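The sketch below shows the core comparison. It assumes per-item soft scores in [0, 1] with ground-truth labels. The soft metric and its thresholded twin are both scored by AUROC. Time-to-detection is the first step at which a score crosses a threshold calibrated to a target FPR on benign data. The helper names and toy numbers are hypothetical; this is not the `swarm/detection/` API.

```python
def auroc(scores, labels):
    """Mann-Whitney AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_twin(scores, tau):
    """Threshold a soft metric into its 0/1 twin (alarm when score >= tau)."""
    return [1 if s >= tau else 0 for s in scores]


def threshold_at_fpr(neg_scores, target_fpr):
    """Lowest observed threshold whose empirical FPR on benign scores is <= target_fpr."""
    for tau in sorted(set(neg_scores)) + [float("inf")]:
        fpr = sum(s >= tau for s in neg_scores) / len(neg_scores)
        if fpr <= target_fpr:
            return tau


def time_to_detection(series, tau):
    """First time step at which a per-step score crosses tau, or None if it never does."""
    for t, s in enumerate(series):
        if s >= tau:
            return t
    return None


scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
print(auroc(scores, labels))                    # soft metric: 8/9
print(auroc(binary_twin(scores, 0.5), labels))  # binary twin: 7.5/9

benign = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.9]
tau = threshold_at_fpr(benign, 0.1)             # 0.9: only the top benign score alarms
print(time_to_detection([0.2, 0.5, 0.95, 0.97], tau))  # 2
```

The twin takes only two values, so its AUROC collapses to (TPR + TNR) / 2 at the chosen τ. Here that is (2/3 + 1) / 2 = 5/6, which is why threshold placement dominates binary results.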

External-platform bridges

MiroShark (social-cascade sim + SoftMetrics judging), LangChain, AutoGPT, CrewAI, Mesa ABM, RAG/LEANN, Hyperspace DAG domain, LabOS Toolmaker→Critic.

Governance & misalignment studies

Adaptive governance controller, governance parameter/sensitivity sweeps, misalignment module + sweeps, Tierra artificial-life scenario + hardening, evolutionary game handler, capability–safety Pareto frontiers, causal-credit propagation, and the triangle (misalignment × causal credit × toxicity) study. Plus escalation-sandbox LLM studies (temperature, prompt framing, model size, cooperation window).

New agent types & mechanisms

ThresholdDancer adversary, behavioral agent types, hyperagent self-modification, dynamic toxicity feedback, artifact registry + cascade-risk governance, PerformanceTracker, net-social-welfare metric.

On-chain

SwarmGym safety auditor CLI + SafetyAttestation contract (Base) + web3 client.

Other

Orchestrator pipeline/middleware refactor (god-object → middleware pipeline + handler factory + scheduler); numerous case-study blog posts.

See CHANGELOG.md for the full itemized list.

Quick Start

python -m pip install -e ".[dev,runtime]"
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10

21 Feb 16:55

rsavitt

v1.7.0

7dd4a12

v1.7.0: Contract screening, viz game, llama.cpp, 164 commits

Highlights

Contract screening system for separating equilibrium analysis with multi-seed sweep, collusion/sybil detection, and red-team blog posts (#234)
Interactive isometric visualization game — browser-based SWARM simulation with Gemini Imagen 4 sprites, compare mode, sweep, leaderboard, and governance intervention controls (#182, #212)
llama.cpp local inference provider with server setup, health checks, and SSRF hardening (#232)
LangGraph governed handoff study — 4-agent Claude swarm, 32-config sweep
Memori semantic memory middleware for LLM agents (#217)
Loop detector governance lever with graduated enforcement (#198)
Agent API Phase 1–3: scoped permissions, trace IDs, approval workflows
SQLite persistence for simulations, governance, and scenarios (lazy-init to fix CI xdist contention)
SciAgentGym bridge restored — tool substrate integration for scientific workflow agents (9 modules, 44 tests)
Multiple SSRF/security fixes (#223, #225, #230, #236, #238, #239, #242)

Added

SciAgentGym bridge restored — tool substrate integration for scientific workflow agents with environment management, workspace isolation, toolkit, governance hooks, and provider abstraction (reverts removal from #209)
Contract screening system for separating equilibrium analysis with lock-in semantics, welfare metric, multi-seed sweep (10 seeds), collusion detection, sybil detection, and plot script (#234)
LangGraph governed handoff study with 4-agent Claude swarm, 32-config sweep (seed 42), and sweep overview plot
Hodoscope trajectory analysis bridge for agent trace inspection
SQLite persistence for simulations, governance state, and scenarios with lazy-init singletons
SoftMetrics wired into Web API /api/v1/metrics endpoint
llama.cpp local inference provider with server setup script, health checks, seed validation, and SSRF/path-traversal hardening (#232)
Interactive isometric visualization game (viz/): Next.js browser-based SWARM simulation with client-side engine, Gemini Imagen 4 sprite assets, compare mode, parameter sweep, leaderboard, governance intervention controls, preset scenarios, narrative annotations, and data export (#182, #212)
Memori semantic memory middleware for LLM agents with persistent fact recall, SQLite-backed storage, and OpenRouter scenario variant (#217)
Loop detector governance lever with graduated enforcement (#198)
Agent API Phase 1–3: scoped permissions, trace IDs, structured errors, PATCH endpoints, filtering, validation, agent approval workflow
SciAgentBench harness with topology matrix support (#200)
Evaluation metrics suite for success rate, efficiency, and detection (#201)
SciForge-style trace-to-task synthesis with replay verification (#203)
Parameter validation and clamping diagnostics for proxy computation (#176)
MetricsAggregator wired into CLI and example export (#212)
Reproducibility documentation with one-command run workflow (#204)
Integration tests for runtime environment lifecycle with leak detection (#197)
EPIC tracking infrastructure for bridge integrations (#194)
Collaborative chemistry under budget and audits scenario (#202)
E2E integration tests for Web API simulation lifecycle
Blog posts: Qwen3-30B SWARM Economy v0.2, contract screening separating equilibrium, multi-seed results, red-team findings
Slash commands: /build_game, /obsidian, /sync_artifacts, /security-review, /audit_docs, /check_nav, /bump_version
Streamlit Cloud deployment and HF Spaces sandbox link
Social preview image (1280x640)

Changed

README audit: Updated all counts to match codebase (4603 tests, 78 scenarios, 29 agent modules, 27 governance modules, 95 bridge files)
LLM provider list expanded to all 9 supported providers
Consolidated slash commands: merged related commands into /ship, /merge_session, /sync, /fix_pr, /analyze_experiment
Moved pytest from pre-commit to pre-push hook (#177)
Removed abs() from ProxyWeights.normalize() (#178)
Updated crewai to >=0.80.0,<2.0 (#221); bumped action-download-artifact to 15 (#220)
Pinned langgraph and langchain-core to exact versions

Fixed

SQLite lock contention in CI: lazy-init store singletons to prevent "database is locked" errors under pytest-xdist
SSRF hardening across several PRs (#223, #225, #230, #236, #238, #242)
Information exposure in AWM adapter (#239)
7 security vulnerabilities in contract screening
mypy method-assign error in simulations router
SkillRL refinement governance bypass (#214)
77 Ruff linting errors (#218), mypy errors across multiple modules
Flaky test stabilized with deterministic RNG seeds
Static asset paths for viz game deployment
8 missing blog posts in mkdocs nav
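The lazy-init fix follows a common pattern: open the SQLite connection on first use rather than at import time. Workers that never touch the store then never open the file. A busy timeout lets connections that do open it wait for a lock instead of failing at once. The sketch below uses only the standard library. `LazyStore` and its methods are hypothetical names, not SWARM's store classes.

```python
import sqlite3
import threading


class LazyStore:
    """Holds a DB path and opens the SQLite connection only on first use."""

    def __init__(self, path: str, timeout: float = 5.0) -> None:
        self._path = path
        self._timeout = timeout  # seconds to wait on a locked database before erroring
        self._conn = None
        self._lock = threading.Lock()

    @property
    def initialized(self) -> bool:
        return self._conn is not None

    def connection(self) -> sqlite3.Connection:
        # Double-checked locking: cheap fast path once the connection exists.
        if self._conn is None:
            with self._lock:
                if self._conn is None:
                    self._conn = sqlite3.connect(self._path, timeout=self._timeout)
        return self._conn


# Creating the singleton at import time is now free: no file is opened yet.
store = LazyStore(":memory:")
print(store.initialized)                                  # False
print(store.connection().execute("SELECT 1").fetchone())  # (1,)
print(store.initialized)                                  # True
```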

Full Changelog: v1.6.0...v1.7.0

166 commits

21 Feb 16:53

rsavitt

v1.6.0

9edc0de

v1.6.0: Artifacts migration, 6 new bridges, visual upgrade

Highlights

Agent sandbox with retry/failover, CrewAI adapter, PettingZoo/AWM/AI-Scientist bridges
Recursive subagent spawning, self-modification governance, Team-of-Rivals review
12 visual analysis modules with dark/light theme system
Artifacts repo migration (~5 GB removed from main → swarm-artifacts)
12 new slash commands, research integrity auditor agent
Multiple critical fixes: unseeded RNG, EventLog.clear(), security hardening

Added

Agent sandbox with exponential backoff retry, async failover, virtual filesystem, and checkpoint isolation (#152, #157)
CrewAI adapter for integrating SWARM agent policies into CrewAI workflows (#167)
PettingZoo bridge for multi-agent RL environment interop
AWM (Agent World Model) bridge — database-backed task environment with MCP server lifecycle (Phase 1 + 2)
AI-Scientist bridge for autonomous research pipeline integration
LangGraph Swarm bridge with governance-aware agent orchestration (#151)
Concordia entity agent with entity sweep, run logger, and governance report
Gather-Trade-Build domain with bilevel tax policy and adversarial agents (#164)
Self-modification governance lever — Two-Gate policy for agent self-edit control (#165)
Recursive subagent spawning infrastructure with spawn metrics, scenario loader, and red-team evaluation
Team-of-Rivals adversarial review pipeline with Lean proof modules
Visual upgrade: 12 analysis modules with dark/light theme system, KPI cards, gradient fills, and multi-scenario dashboard (#163)
Agent API with runs, posts, persistence, and security hardening (#156)
Slash commands: /rename_symbol, /session_guard, /audit_fix, /fix_commit, /load_keys, /render_promo, /council_review, /scrub_id, /deploy_blog, /cherry_pick_pr, /post_skillevolve, /refine_study
Research papers: AI Economist GTB multi-seed, deeper acausality, collusion tax effect
Blog posts: Self-optimizer distributional safety, Claude Code subagents, AI Economist GTB, SkillRL dynamics
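The sandbox entry names exponential backoff retry. In that pattern, the delay doubles after each failed attempt up to a cap, and the last error is re-raised once attempts run out. The sketch below is a generic version. `retry_with_backoff` and its parameters are hypothetical. It omits jitter, async failover, and the sandbox's actual retry policy.

```python
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call fn() until it succeeds, sleeping base_delay * 2**attempt (capped) between tries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(max_delay, base_delay * 2 ** attempt))


# Usage with a flaky call that fails twice, recording delays instead of sleeping.
delays = []
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(retry_with_backoff(flaky, sleep=delays.append))  # ok
print(delays)                                          # [0.1, 0.2]
```

Injecting `sleep` keeps the function testable without real waits. A production version would usually catch only transient error types rather than every `Exception`.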

Changed

Artifacts repo migration: Moved runs/, lean/, promo/, research/, docs/papers/ to swarm-artifacts — reduces clone size by ~5 GB
Lean toolchain upgraded to v4.28.0; all `sorry` placeholders eliminated from proofs
EventBus initialization simplified across all handlers
swarm.analysis lazy-loads matplotlib so it works without display dependencies

Fixed

Critical: Unseeded RNG and destructive EventLog.clear()
18 security audit findings in agent sandbox
Circuit breaker, cost tracking, Holm-Bonferroni correction (#158)
Governed swarm: cycle threshold, composite redirect, handoff counter (#159)
GasTown bridge: branch fallback, CI-fail grep pattern (#160)
5 flaky tests stabilized with seeds and constrained inputs
5 mypy errors and lint issues
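The Holm-Bonferroni correction fixed in #158 is a standard step-down procedure for multiple comparisons. It sorts the m p-values and compares the k-th smallest (0-indexed) with α/(m − k). It stops rejecting at the first comparison that fails. The sketch below implements the textbook procedure. `holm_bonferroni` is a hypothetical name, and this is not the code from #158.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject/keep decision per hypothesis, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # step-down: once one test fails, all larger p-values are kept
    return reject


print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```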

Full Changelog: v1.5.0...v1.6.0

108 commits from 8 contributors

13 Feb 17:40

rsavitt

v1.5.0

34d230e

v1.5.0: GasTown governance cost study

New

GasTown governance cost study: 42 runs (7 compositions × 2 regimes × 3 seeds, 1,260 total epochs) revealing a governance cost paradox: safety levers reduce toxicity at every adversarial level (mean reduction 0.071) but impose welfare costs that exceed the safety benefit at all tested proportions
Research paper: "The Cost of Safety: Governance Overhead vs. Toxicity Reduction in GasTown Multi-Agent Workspaces" with 5 figures (toxicity, welfare, payoff breakdown, adverse selection, governance protection)
Pre-commit private infra scan: blocks accidental commits of Prime Intellect dashboard URLs and run IDs in public-facing files

Improvements

IMPLEMENTATION_PLAN.md updated to reflect current stats (2,922 tests, 55 scenarios, 12 domain handlers, 22 agent modules, CSM and Council sections added)

Key Finding

At 0% adversarial agents, governance costs 216 welfare units (-57.6%) for only 0.066 toxicity reduction. The cost narrows as adversarial pressure increases, converging at 86% rogue agents. This suggests governance levers are most cost-effective when targeted rather than applied uniformly.
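Two quantities follow from the study's figures. They are worked out here for convenience rather than reported in the paper. Each of the 42 runs covers 30 epochs. And 216 welfare units is 57.6% of the ungoverned value at 0% adversarial, so that baseline is roughly 375 units, leaving about 159 under governance. The baseline is approximate because the percentage is rounded.

```latex
\begin{aligned}
\text{runs} &= 7 \times 2 \times 3 = 42, & \frac{1260\ \text{epochs}}{42\ \text{runs}} &= 30\ \text{epochs per run},\\
W_{\text{ungoverned}} &\approx \frac{216}{0.576} = 375, & W_{\text{governed}} &\approx 375 - 216 = 159.
\end{aligned}
```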

Quick Start

pip install swarm-safety
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10

13 Feb 16:56

rsavitt

v1.4.0

8512e25

v1.4.0: Handler extraction, decision theory studies, event bus

New

Handler extraction: 8 core actions extracted from Orchestrator into FeedHandler (POST/REPLY/VOTE), CoreInteractionHandler (PROPOSE/ACCEPT/REJECT), and TaskHandler (CLAIM/SUBMIT) — _handle_core_action reduced from 130 lines to 5
Decision theory studies: Full studies comparing TDT vs FDT vs UDT at population scales up to 21 agents, including UDT precommitment advantage analysis
Prime Intellect bridge: new external_run_id column in scenario_runs for cross-platform run tracking
Event bus: TypedDict schemas for event payloads and metadata, generalizing the WorktreeEvent pattern to the core framework
GasTown bridge: Branch-based observation support for multi-branch governance
CHANGELOG auto-update: the /release command now automatically converts [Unreleased] into a versioned entry
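The handler extraction amounts to replacing a long if/elif chain with a lookup table from action type to handler. The sketch below shows that dispatch pattern. The three handler class names and the action groupings come from the entry above. `ActionType`, the `handles` attribute, the `handle()` signature, and the string return values are hypothetical stand-ins for SWARM's real interfaces.

```python
from enum import Enum, auto


class ActionType(Enum):
    POST = auto()
    REPLY = auto()
    VOTE = auto()
    PROPOSE = auto()
    ACCEPT = auto()
    REJECT = auto()
    CLAIM = auto()
    SUBMIT = auto()


class Handler:
    name = "base"
    handles = frozenset()

    def handle(self, action_type, payload):
        # Real handlers would mutate simulation state; this just reports the route taken.
        return f"{self.name}:{action_type.name}"


class FeedHandler(Handler):
    name = "feed"
    handles = frozenset({ActionType.POST, ActionType.REPLY, ActionType.VOTE})


class CoreInteractionHandler(Handler):
    name = "interaction"
    handles = frozenset({ActionType.PROPOSE, ActionType.ACCEPT, ActionType.REJECT})


class TaskHandler(Handler):
    name = "task"
    handles = frozenset({ActionType.CLAIM, ActionType.SUBMIT})


class Orchestrator:
    def __init__(self, handlers):
        # Build the action -> handler table once, rejecting ambiguous registrations.
        self._dispatch = {}
        for handler in handlers:
            for action in handler.handles:
                if action in self._dispatch:
                    raise ValueError(f"two handlers claim {action.name}")
                self._dispatch[action] = handler

    def _handle_core_action(self, action_type, payload):
        handler = self._dispatch.get(action_type)
        if handler is None:
            raise KeyError(f"no handler registered for {action_type.name}")
        return handler.handle(action_type, payload)


orch = Orchestrator([FeedHandler(), CoreInteractionHandler(), TaskHandler()])
print(orch._handle_core_action(ActionType.VOTE, {}))   # feed:VOTE
print(orch._handle_core_action(ActionType.CLAIM, {}))  # task:CLAIM
```

With this structure, adding an action means registering it with one handler; the orchestrator's dispatch method stays a few lines long.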

Improvements

SoftInteraction.to_dict() → model_dump(mode='json') and from_dict() → model_validate() (DRY)
Reputation delta formula (p - 0.5) - c_a documented with full derivation in InteractionFinalizer
Comprehensive CHANGELOG covering all releases from v0.1.0 through v1.3.1
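The reputation delta formula fits in a one-line function. The entry does not define p or c_a, so the sketch assumes p is the interaction's soft probability of being beneficial (in [0, 1]) and c_a is a non-negative per-interaction cost charged to agent a. `reputation_delta` is a hypothetical name; the full derivation lives in `InteractionFinalizer`.

```python
def reputation_delta(p: float, c_a: float) -> float:
    """(p - 0.5) - c_a: positive when the interaction beats a coin flip by more than its cost."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p must be a probability, got {p}")
    return (p - 0.5) - c_a


print(reputation_delta(0.9, 0.1))   # about 0.3: clearly beneficial, small cost
print(reputation_delta(0.5, 0.05))  # -0.05: a coin-flip interaction still pays its cost
```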

Fixes

87 pre-existing mypy errors across tests/ and scripts/
CAPTCHA solver dash deobfuscation and multiply detection
Submission author normalization to SWARM Research Collective

Stats

274 files changed, 36,426 insertions, 1,736 deletions since v1.3.1
2,922 tests passing

Quick Start

pip install swarm-safety
python -m swarm run scenarios/baseline.yaml --seed 42 --epochs 10 --steps 10

11 Feb 06:39

github-actions

v1.3.1

8e27b1d

v1.3.1

What's Changed

Codex-generated pull request by @rsavitt in #111
docs: Add security & integration review for abhi-arya1/wt by @rsavitt in #105
Potential fix for code scanning alert no. 14: Clear-text logging of sensitive information by @rsavitt in #112
Claude/swarm csm benchmark y f0 do by @rsavitt in #115
Fix MCP config: use portable uvx path instead of hardcoded user path by @rsavitt in #116
Add initial SWARM ↔ Ralph bridge with JSONL event ingestion by @rsavitt in #109
Claude/swarm evolving skills f bcn k by @rsavitt in #114
Implement Logical Decision Theory (LDT) agent with updateless cooperation by @rsavitt in #117
Add LDT vs honest agent composition study by @rsavitt in #118
Potential fix for code scanning alert no. 16: Clear-text logging of sensitive information by @rsavitt in #119
Prepare package for PyPI publishing by @rsavitt in #120

Full Changelog: v1.3.0...v1.3.1

Contributors

rsavitt

10 Feb 05:11

rsavitt

v1.2.0

c6474e9

v1.2.0: Paper Completion, Smarter Pre-Commit Hook

New

Paper: Related Work section — positions SWARM against market microstructure, multi-agent safety, and mechanism design literature
Paper: Conclusion section — summarizes three-regime findings, governance implications, and future directions
Paper: Appendix data — fills scenario parameter tables and detailed per-epoch breakdowns

Improvements

Pre-commit hook skips pytest for non-code changes — staging only .md, .yaml, or other non-code files no longer triggers the full 2200-test suite, cutting commit time from ~30s to <1s for docs-only changes
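The skip logic reduces to one question: does any staged path look like code? The sketch below answers it in Python. It assumes `.py` and `.pyi` are the only suffixes that should trigger pytest; the hook's real rule may differ, and the hook itself may be written in bash. `needs_tests` and `staged_paths` are hypothetical helpers. `git diff --cached --name-only` is the standard way to list staged files.

```python
import subprocess
from pathlib import PurePosixPath

CODE_SUFFIXES = frozenset({".py", ".pyi"})  # assumption: which changes warrant a test run


def staged_paths():
    """Paths staged for the next commit, as reported by git."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def needs_tests(paths):
    """True if any staged path has a code suffix; docs/config-only commits skip pytest."""
    return any(PurePosixPath(p).suffix in CODE_SUFFIXES for p in paths)


print(needs_tests(["README.md", "scenarios/baseline.yaml"]))  # False
print(needs_tests(["docs/index.md", "swarm/core.py"]))        # True
```

Inside a hook, the two helpers combine as `needs_tests(staged_paths())`, and pytest runs only when that returns True.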

Quick Start

git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb

Full Changelog: v1.1.2...v1.2.0

10 Feb 04:58

rsavitt

v1.1.2

f905211

v1.1.2: Tmux Multi-Session Launcher

New

/tmux command — hotkey reference for tmux multi-session workflows
scripts/claude-tmux.sh — launcher script for running parallel Claude Code sessions in tmux panes

Quick Start

git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb

Full Changelog: v1.1.1...v1.1.2

10 Feb 04:52

rsavitt

v1.1.1

731074f

v1.1.1: Hook Fix, New Slash Commands, Paper Expansion

Fixes

Pre-commit hook exit code handling — capture pytest exit code explicitly and add exit 0 to prevent bash from misinterpreting trailing output under set -e
Missing agent frontmatter — add name field to research_scout agent

New

/warmup command — session opening sequence for fast orientation
/check-ignore command — verify gitignore coverage for sensitive files
/lint-fix command — auto-fix linting issues on staged files

Improvements

Paper expanded — formal model section, marketplace/network results tables
Hot mess theory reference — added Anthropic's variance-dominated failure framing to incoherence scaling section

Quick Start

git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"
jupyter notebook examples/quickstart.ipynb

Full Changelog: v1.1.0...v1.1.1
