v1.7.0: Contract screening, viz game, llama.cpp, 164 commits

@rsavitt rsavitt released this 21 Feb 16:55
· 572 commits to main since this release

Highlights

  • Contract screening system for separating-equilibrium analysis, with multi-seed sweep, collusion/sybil detection, and red-team blog posts (#234)
  • Interactive isometric visualization game — browser-based SWARM simulation with Gemini Imagen 4 sprites, compare mode, sweep, leaderboard, and governance intervention controls (#182, #212)
  • llama.cpp local inference provider with server setup, health checks, and SSRF hardening (#232)
  • LangGraph governed handoff study — 4-agent Claude swarm, 32-config sweep
  • Memori semantic memory middleware for LLM agents (#217)
  • Loop detector governance lever with graduated enforcement (#198)
  • Agent API Phase 1–3: scoped permissions, trace IDs, approval workflows
  • SQLite persistence for simulations, governance, and scenarios (lazy-init to fix CI xdist contention)
  • SciAgentGym bridge restored — tool substrate integration for scientific workflow agents (9 modules, 44 tests)
  • Multiple SSRF/security fixes (#223, #225, #230, #236, #238, #239, #242)
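
As an illustration of the kind of check SSRF hardening for a local inference provider usually involves, here is a minimal sketch that accepts only loopback URLs. The function name `is_safe_local_url` and the loopback-only policy are assumptions made for this example; the actual checks in the linked PRs may differ.

```python
import ipaddress
from urllib.parse import urlparse


def is_safe_local_url(url: str) -> bool:
    """Hypothetical check: accept only http(s) URLs that point at loopback."""
    parsed = urlparse(url)
    # Reject non-HTTP schemes (file://, gopher://, ...) and URLs with no host.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    # Reject embedded credentials, which are often used to disguise the host.
    if parsed.username or parsed.password:
        return False
    host = parsed.hostname
    if host == "localhost":
        return True
    try:
        # Literal IPs must be loopback (127.0.0.0/8 or ::1).
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is rejected rather than resolved.
        return False
```

Rejecting unknown hostnames outright, instead of resolving them, sidesteps DNS rebinding at the cost of flexibility; a deployment that needs remote hosts would need an explicit allowlist instead.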

Added

  • SciAgentGym bridge restored — tool substrate integration for scientific workflow agents with environment management, workspace isolation, toolkit, governance hooks, and provider abstraction (reverts removal from #209)
  • Contract screening system for separating-equilibrium analysis, with lock-in semantics, welfare metric, multi-seed sweep (10 seeds), collusion detection, sybil detection, and plot script (#234)
  • LangGraph governed handoff study with 4-agent Claude swarm, 32-config sweep (seed 42), and sweep overview plot
  • Hodoscope trajectory analysis bridge for agent trace inspection
  • SQLite persistence for simulations, governance state, and scenarios with lazy-init singletons
  • SoftMetrics wired into Web API /api/v1/metrics endpoint
  • llama.cpp local inference provider with server setup script, health checks, seed validation, and SSRF/path-traversal hardening (#232)
  • Interactive isometric visualization game (viz/): Next.js browser-based SWARM simulation with client-side engine, Gemini Imagen 4 sprite assets, compare mode, parameter sweep, leaderboard, governance intervention controls, preset scenarios, narrative annotations, and data export (#182, #212)
  • Memori semantic memory middleware for LLM agents with persistent fact recall, SQLite-backed storage, and OpenRouter scenario variant (#217)
  • Loop detector governance lever with graduated enforcement (#198)
  • Agent API Phase 1–3: scoped permissions, trace IDs, structured errors, PATCH endpoints, filtering, validation, agent approval workflow
  • SciAgentBench harness with topology matrix support (#200)
  • Evaluation metrics suite for success rate, efficiency, and detection (#201)
  • SciForge-style trace-to-task synthesis with replay verification (#203)
  • Parameter validation and clamping diagnostics for proxy computation (#176)
  • MetricsAggregator wired into CLI and example export (#212)
  • Reproducibility documentation with one-command run workflow (#204)
  • Integration tests for runtime environment lifecycle with leak detection (#197)
  • EPIC tracking infrastructure for bridge integrations (#194)
  • Collaborative chemistry under budget and audits scenario (#202)
  • E2E integration tests for Web API simulation lifecycle
  • Blog posts: Qwen3-30B SWARM Economy v0.2, contract screening separating equilibrium, multi-seed results, red-team findings
  • Slash commands: /build_game, /obsidian, /sync_artifacts, /security-review, /audit_docs, /check_nav, /bump_version
  • Streamlit Cloud deployment and HF Spaces sandbox link
  • Social preview image (1280x640)
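
To make "graduated enforcement" in the loop detector entry concrete, the sketch below escalates from warning to throttling to blocking as an agent repeats the same action within a sliding window. The class name, thresholds, and enforcement levels are all hypothetical and chosen for illustration; they are not taken from #198.

```python
from collections import Counter, deque
from enum import Enum


class Enforcement(Enum):
    ALLOW = "allow"
    WARN = "warn"
    THROTTLE = "throttle"
    BLOCK = "block"


class LoopDetector:
    """Hypothetical lever: escalate as one action repeats in a recent window."""

    def __init__(self, window=10, warn_at=3, throttle_at=5, block_at=8):
        # Only the last `window` actions are considered.
        self.recent = deque(maxlen=window)
        # Ordered from strictest to mildest so the first match wins.
        self.thresholds = [
            (block_at, Enforcement.BLOCK),
            (throttle_at, Enforcement.THROTTLE),
            (warn_at, Enforcement.WARN),
        ]

    def observe(self, action: str) -> Enforcement:
        """Record an action and return the enforcement level it triggers."""
        self.recent.append(action)
        repeats = Counter(self.recent)[action]
        for threshold, level in self.thresholds:
            if repeats >= threshold:
                return level
        return Enforcement.ALLOW
```

With the default thresholds, the third identical action in the window yields a warning, the fifth a throttle, and the eighth a block, while an unrelated action is still allowed.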

Changed

  • README audit: Updated all counts to match codebase (4603 tests, 78 scenarios, 29 agent modules, 27 governance modules, 95 bridge files)
  • LLM provider list expanded to all 9 supported providers
  • Consolidated slash commands: merged related commands into /ship, /merge_session, /sync, /fix_pr, /analyze_experiment
  • Moved pytest from pre-commit to pre-push hook (#177)
  • Removed abs() from ProxyWeights.normalize() (#178)
  • Updated crewai version constraint to >=0.80.0,<2.0 (#221); bumped action-download-artifact to 15 (#220)
  • Pinned langgraph and langchain-core to exact versions

Fixed

  • SQLite lock contention in CI: Lazy-init store singletons to prevent "database is locked" errors under pytest-xdist
  • SSRF hardening: 6 separate fixes (#223, #225, #230, #236, #238, #242)
  • Information exposure in AWM adapter (#239)
  • 7 security vulnerabilities in contract screening
  • mypy method-assign error in simulations router
  • SkillRL refinement governance bypass (#214)
  • 77 Ruff linting errors (#218), mypy errors across multiple modules
  • Flaky test stabilized with deterministic RNG seeds
  • Static asset paths for viz game deployment
  • 8 missing blog posts in mkdocs nav

Full Changelog: v1.6.0...v1.7.0

166 commits