Releases: scottconverse/AgentSuiteLocal
AgentSuiteLocal v1.0.0
Fixed
- PDF export replaced WeasyPrint with reportlab: WeasyPrint required the GTK3 native runtime (libgobject, libpango, libcairo), which is not bundled and must be installed separately on Windows. PDF export silently failed with a 501 on most fresh installs. Replaced with reportlab (pure Python, no native runtime). PDF export now works out of the box on all platforms. The PyInstaller spec hiddenimports block is updated accordingly; the ~2 MB of GTK-dependent weasyprint/cairocffi/tinycss2 packages are replaced by reportlab.
- approve_run state guard corrected: the guard previously accepted runs in "done" state (not in ("waiting", "done")); individual runs never reach "done" (only pipeline steps do), so the guard was wrong in principle. Changed to != "waiting" for symmetry with reject_run.
- ApprovalGateView Approve button tooltip: the tooltip now shows a distinct message when the button is disabled by a missing/failed QA score ("QA score unavailable — run QA evaluation or use Override & Approve") vs. a below-threshold score ("Score X/10 is below your Y gate"). Previously it always showed the score message, which rendered null/10 when qa_score was absent.
- qa_score was silently None on every real-LLM run (V4). agentsuitelocal/api/execution.py (both call sites at L358-363 and L449-454) read the per-run qa_scores.json looking for fields named weighted_score / overall_score / score / overall, none of which are in agentsuite's QAReport schema. The canonical field is average (agentsuite/kernel/qa.py:21). Result: qa_score was always None on every successful real run, masked by the test's xfail strict=False marker until A3 removed it. Added average to the field-lookup chain at both sites (kept legacy field names as forward-compat fallbacks). Added tests/test_qa_score_schema_contract.py (4 contract tests) so the field-name agreement with agentsuite is now enforced by the suite.
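The V4 field-lookup fix can be illustrated with a minimal sketch. The helper name extract_qa_score and the constant QA_SCORE_FIELDS are hypothetical; only the field names come from the notes above, and the real code in execution.py may be structured differently.

```python
import json
from typing import Any, Optional

# Field names tried in order. "average" is the canonical QAReport field;
# the rest are the legacy names kept as fallbacks. (Ordering is an assumption.)
QA_SCORE_FIELDS = ("average", "weighted_score", "overall_score", "score", "overall")


def extract_qa_score(report: dict[str, Any]) -> Optional[float]:
    """Return the first numeric score found in the report, or None."""
    for name in QA_SCORE_FIELDS:
        value = report.get(name)
        # Test for None explicitly so a legitimate score of 0.0 is preserved.
        if value is not None and isinstance(value, (int, float)):
            return float(value)
    return None


def load_qa_score(raw_json: str) -> Optional[float]:
    """Parse the text of a qa_scores.json file and extract the score."""
    return extract_qa_score(json.loads(raw_json))
```

With this shape, a report that carries only average yields a score, a report with average equal to 0.0 yields 0.0 rather than None, and a report with none of the known fields yields None.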
Added (Sprint A — v0.9 milestone)
- agentsuite repinned to v1.1.1 (V1 + V2 closed at the source).
- tests/test_real_founder_run.py xfail removed (A3). The test now hard-asserts that a real founder run produces approve-able artifacts end-to-end. After the V4 fix this test is the active gate for qa_score.
- tests/test_qa_score_schema_contract.py (NEW, 4 tests). Schema contract guard against the V4 regression: asserts agentsuite's QAReport.average field exists, round-trips JSON, preserves 0.0, and uses the dimensions/scores shape AgentSuiteLocal reads.
- docs/MOCKING_AUDIT.md (A4). Classification-only audit of all 48 real mock call sites in tests/. 23 BOUNDARY-OK, 16 INTERNAL-JUSTIFIED, 9 INTERNAL-SUSPECT-REFACTOR (_save_state / _log_telemetry / _send_notification / _load_settings should become DI in Sprint B), 0 INTERNAL-SUSPECT-DELETE. Sprint B will action the recommendations.
- One-run-per-session limitation declared (A5). README Known issues, docs/user-manual.md FAQ, and docs/FAQ.md "Running agents" all state v1.0 supports one active run at a time per session; concurrent runs land in v1.1.
- a11y Bar 1, code-only (A6). Sidebar (top + bottom nav) sets aria-current="page" on the active item. ApprovalGateView override dialog now has role="dialog" + aria-modal="true" + aria-label, and a window keydown effect closes it on Escape. Vitest tests cover all of the above (Sidebar.test.jsx 4 tests, styles.test.js 2 tests, ApprovalGateView.test.jsx 2 new tests).
- Bundle smoke CI on macOS + Windows (A7). build-macos job appends a step that launches the .app, polls ~/.agentsuitelocal/launcher.port.json for ≤30s, GETs /api/health, and verifies clean exit. New build-windows job mirrors this for windows-latest. Catches v0.8.7-class regressions where the bundle ships missing a hidden import. Both jobs gate on main || tags || release/*.
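The port-file polling step in the bundle smoke job (A7) could look roughly like the sketch below. The function name wait_for_port and the assumption that the JSON file carries an integer "port" key are illustrative only; the notes do not state the file's schema or how the CI step is actually implemented.

```python
import json
import time
from pathlib import Path
from typing import Optional


def wait_for_port(port_file: Path, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> Optional[int]:
    """Poll port_file until it holds a usable port or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            data = json.loads(port_file.read_text(encoding="utf-8"))
            # ASSUMPTION: the file is a JSON object with an integer "port" key.
            if isinstance(data, dict) and isinstance(data.get("port"), int):
                return data["port"]
        except (FileNotFoundError, json.JSONDecodeError):
            # Not written yet, or caught mid-write: retry on the next tick.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

A caller would then issue the health request against the returned port and treat None as a failed launch.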
Changed (Sprint A — v0.9 milestone)
- Removed dead RunRequest.constraints field (A1, D1). Field was unused everywhere; deleted from agentsuitelocal/api/schemas.py. Wire-compat preserved (Pydantic v2 default extra="ignore" accepts old clients sending constraints).
- E2E "Run failed within 3s" assertion restored (A2). The assertion was previously commented out; A2 restores it. The mock-provider prose-vs-extract failure exposed by this change is correctly classified as evidence FOR A4 (mocking audit), not a regression: production correctly rejects non-JSON at the extract stage.
AgentSuiteLocal v0.8.9
[0.8.9] — 2026-05-06
Fixed
- QA-DD-001 (Critical): Trust/Risk agent slug drift fixed. v0.8.8 advertised seven agents in the picker but web/src/data.js used id: "trust" while launcher.py / cli.py used trust_risk. The kernel registry only knows trust_risk, so every Trust/Risk run errored 3 s after launch with Agent 'trust' is not enabled or not registered. Fixed by aligning data.js (id and mock-run reference) and the _SETTINGS_DEFAULTS["enabled_agents"] default in agentsuitelocal/api/config.py to the canonical trust_risk.
- TEST-CRIT-001 (Critical, Test discipline): tests/test_execution.py restructured. The file mocked every dependency it claimed to integrate (5 of 5 tests patched _resolve_llm, the agent class, _save_state, telemetry, notifications, _workspace), the same pattern that shipped v0.8.7's missing-ollama-SDK regression. Renamed to tests/test_execution_state_machine.py with a corrected docstring stating what the file actually covers (run-status state machine, dispatch, SSE wiring) and what it does NOT cover (resolver path, agent class). Added a new tests/test_execution_integration.py that uses AGENTSUITE_LLM_PROVIDER_FACTORY to exercise the real resolver path with no patching: a MockLLMProvider from agentsuite.llm.mock, a per-test AGENTSUITE_WORKSPACE tmpdir, and unmocked _save_state / _log_telemetry / _send_notification.
- DOC-V088-001 (Critical, Documentation): In-app ManualView.jsx refreshed to v0.8.9. Six stale items the round-1 / round-3 doc fixes had missed: (1) Smoke step described "five quick checks" (actually four since v0.8.8); updated and notes that v0.8.8 added the kernel-inference check. (2) Kernel section claimed "you can't delete from the Kernel through the UI in v0.1"; the v0.1-vintage caveat is replaced with current behaviour: read-only by design, demote via file system. (3) Troubleshooting note about smoke-test failures referenced a "Phase 2 will surface these errors" roadmap promise that's been closed for months; replaced with a description of the current per-check fix-card UX. (4) "My run disappeared" answer pointed users at ~/.agentsuitelocal/runs.json (replaced by SQLite in v0.8.0); now describes the WAL-mode state.db and notes the legacy file is migrated on first launch. (5) Added a "Manual version: v0.8.9 · matches docs/user-manual.md" stamp at the top so drift is now visible. (6) The recommended-models table: already in sync from QA-DD-002.
- ENG-088-002 (Critical, Performance/Data): run["events"] and pipeline["events"] capped at 200 entries. The lists were unbounded and serialized to SQLite on every _save_state(), so disk write size grew linearly with run length. Long pipeline runs amplified the cost noticeably. Chose Option A (cap; drop the dead deque) over Option B (wire deque into SSE replay; drop persistent events): smaller diff, simpler invariant, no SSE protocol change. Replaced direct ["events"].append(evt) calls in execution.py (run + pipeline emit), routers/runs.py (cancellation), and routers/pipelines.py (rejection) with a single _append_event(container, evt) helper in api.state that FIFO-evicts beyond _MAX_EVENTS_PER_RUN = 200. Removed the dead _run_event_buffers dict and _SSE_BUFFER_SIZE constant from state.py. Lifecycle markers (agent_start, agent_done, approval, error) are <10 events, well within the cap; ~190 stage_progress events of recent history fit alongside.
- UX-V088-001 (Critical, UX): Settings save errors no longer silently show "Saved". SettingsView.jsx's save handler did .catch(() => {}); setSaved(true) regardless of fetch outcome, giving users false confirmation when the backend was unreachable or returned non-2xx. Now: optimistic update is rolled back on failure, the topbar shows Couldn't save: <reason> (red), and saved is not set to true. Distinguishes 5xx (detail from response body), 4xx, and network errors. Affects every toggle, the API key save, run timeout, QA gate threshold, and model tier (every edit-and-save in Settings).
- ENG-088-001 (Critical, Security/Correctness): PDF export now HTML-escapes artifact content. agentsuitelocal/api/routers/runs.py:333-344 was interpolating run_id, file paths, and artifact bodies directly into HTML inside <pre> blocks. LLM-produced artifacts routinely contain <, >, &, or literal </pre> (markdown-with-embedded-HTML, code blocks); without escaping, weasyprint parsed them as live HTML and the PDF rendered incorrectly. With a malicious artifact, injected <style> or <a href="javascript:"> would execute against the rendering context. Extracted the HTML-construction logic into _build_pdf_html(run_id, outputs_dir) and applied html.escape() to every interpolated value.
- QA-DD-002 (Critical): Pro-tier model name fixed. _TIER_MODEL_MAP["pro"] was gemma4:26b-moe, which 404s from https://registry.ollama.ai/v2/library/gemma4/manifests/26b-moe. Fresh installs that selected the Pro tier failed to pull. The wrong suffix was the entire bug: bare gemma4:26b (and gemma4:31b, gemma4:latest) all exist on Ollama Hub. Fixed to gemma4:26b (the closest real tag to the original 26B intent; same gemma4 family as light/balanced for consistency). Fanned out to web/src/data.js, docs/user-manual.md, docs/architecture.md, README, both discussion seeds, ManualView.jsx, and ModelView.test.jsx. The gemma4:e2b and gemma4:e4b entries (flagged by the audit as also missing) actually do exist; left unchanged.
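The escaping fix described under ENG-088-001 can be sketched as follows. The helper name render_artifact_section is hypothetical and stands in for one piece of what _build_pdf_html might do; only the use of html.escape() on every interpolated value is taken from the notes.

```python
import html


def render_artifact_section(path: str, body: str) -> str:
    """Wrap one artifact in a heading and a <pre> block, escaping both values."""
    # html.escape converts &, <, > (and quotes by default) to entities, so a
    # literal "</pre>" or "<style>" in the artifact cannot alter the markup.
    safe_path = html.escape(path)
    safe_body = html.escape(body)
    return f"<h2>{safe_path}</h2>\n<pre>{safe_body}</pre>"
```

After escaping, the only closing pre tag in the output is the one the template itself emits, regardless of what the artifact body contains.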
Changed (CI test-environment alignment)
- .github/workflows/ci.yml: Playwright job now pulls gemma4:e4b instead of gemma2:2b. The smoke endpoint verifies the configured model is installed locally before running the kernel-inference probe; _SETTINGS_DEFAULTS["model_name"] is gemma4:e4b, so CI must have that model present or the smoke step rejects with "Model not installed" and the installer walk fails on Step 5 (Continue stays disabled). v0.8.8's audit-round-1 added the smoke check; the CI workflow was never updated to match. v0.8.8 Playwright hung at "Install Playwright browsers" so this regression was masked. v0.8.9 was the first run to actually surface it.
Documentation (cross-surface currency sweep)
- CONTRIBUTING.md reconciled with v0.8.0+ reality: dev port description now points at launcher.port.json (was the legacy launcher.log plaintext file); E2E test instructions reference gemma4:e4b (was gemma2:2b); test-count claim "108+ tests" replaced with "160+" + "see CHANGELOG for an exact figure"; the long-stale "keep main.py the single source of truth" instruction replaced with the actual v0.8.0 router-per-domain layout and the policy that new routes go in the closest existing router; bug-report instructions reference both launcher.log and launcher.port.json correctly.
- docs/architecture.md: doc-currency stamp bumped from "as of v0.8.8" to "as of v0.8.9"; the hard-coded "Full suite as of v0.8.7: 135 passing" line replaced with a release-by-release approximate table that points at CHANGELOG for exact figures.
- README "Updated in" hero: the trailing block stopped at v0.8.0–v0.8.2. Added paragraphs for v0.8.0–v0.8.4, v0.8.5–v0.8.7, v0.8.8, and v0.8.9 so a reader can see the full release shape from the top of the README without opening the CHANGELOG.
- Discussion seeds and reddit launch post: bumped current-version line and the download filename (AgentSuiteLocal-0.8.9-setup.exe); the latter sed missed because the filename has no v prefix.
- README known-issues + architecture.md test-tree note: previously claimed "E2E test suite uses gemma2:2b (Gemma 2 family), not a Gemma 4 model" and pointed readers at tests/e2e/conftest.py for the documentation of that choice. Both became stale the moment CI was bumped to gemma4:e4b in this same release; the conftest pointer was also factually wrong (conftest contains zero gemma references). Both surfaces now correctly cite .github/workflows/ci.yml as the source-of-truth and describe the smoke-step model-installed check that motivates the CI choice.
- docs/user-manual.md per-agent artifact totals: closes the audit's DOC-V088-004 (landing-page agent cards advertised 17–18 artifacts per non-Founder agent while the manual listed 5 named categories; these are different views of the same agent output). Added a "What you'll get back: ~N artifacts" line to each non-Founder agent matching web/src/data.js exactly: Design 18, Product 17, Engineering 17, Marketing 18, Trust/Risk 17, CIO 17. Reader sees both the named categories AND the total file count, so the numbers can't read as contradictory.
Added
- tests/test_execution_integration.py: real-path integration coverage for TEST-CRIT-001. Two tests: a resolver smoke-test that catches v0.8.7-class regressions directly (passes, closing the main concern of TEST-CRIT-001), and a full _execute_run end-to-end that exposed a test-fixture limitation rather than a production bug (the substring-router mock provider returns prose for the extract stage; production correctly rejects it as invalid JSON). The full-flow test is xfail-marked with an explicit pointer to the fixture follow-up; the resolver test is the active regression guard.
Watchlist follow-up
- The xfail on test_execute_run_real_path_against_factory_provider belongs on the next-sprint audit watchlist (W-1: sweep over-mocking). Hardening the mock provider to return canonical JSON for stages that demand it (extract / qa) closes the gap. Recommended approach: switch to a RecordingMockProvider keyed by stage name with explicit JSON shapes, or an agentsuite.testing.fixtures.founder_smoke_provider() factory.
- tests/test_event_cap.py: regression test for ENG-088-002. Asserts the helper caps at _MAX_EVENTS_PER_RUN, FIFO-evicts oldest first, initialises a missing events key, works on pipelines, and grep-checks pro...
AgentSuiteLocal v0.8.8
This release started life as a CHANGELOG-correction patch and grew into a substantial bug-fix release. Three audit rounds produced 28 Critical/Major fixes plus the v0.8.7 broken-bundle remediation. All fixes were authored, reviewed, and validated within a single sprint window; per-finding detail lives in audit-AgentSuiteLocal-2026-05-05/.
Fixed (broken-v0.8.7-bundle remediation, bf74eb3)
- ollama SDK was missing from runtime dependencies (57ab097): SDK was assumed-imported in installer/model-management code paths but never declared in pyproject.toml's runtime deps, so wheel installs and frozen builds failed on first import ollama outside the dev environment. This is the headline regression; every other remediation in this round exists because v0.8.7's structural gaps allowed it to ship.
- Installer flow re-adds Smoke as Step 5 (was dead code in v0.8.7): web/src/App.jsx TOTAL_STEPS 5→6; STEP_FIX_MAP keys re-aligned to the labels actually emitted by /api/smoke (old keys were stale, so failed users saw no fix guidance). E2E walks all 6 steps.
- Smoke now exercises the real Python kernel path: /api/smoke constructs an OllamaProvider via the same _resolve_llm New Run uses, then issues a 1-token completion via provider.complete. Until v0.8.7 the smoke test verified the environment (Ollama daemon healthy) but never the app (Python bundle can resolve and call a provider), which is exactly why a build with a missing ollama SDK passed install and broke on first New Run.
- Ollama install starts the daemon explicitly + 90s wait + actionable error: the Windows installer auto-launches a desktop GUI but does not reliably start the API daemon. We now Popen ollama serve ourselves, then poll for 90s (was 30s, too tight on first boot with AV scan + GPU detection + tray handshake). Failure message points to the exact PowerShell command instead of a vague "Try launching Ollama manually."
- WeasyPrint PDF export: graceful "PDF unavailable in this build": the bundled distributable doesn't ship GTK runtime libs (cairo/pango/gdk-pixbuf). Telling end users to pip install weasyprint is advice they can't act on (no pip in a PyInstaller bundle, and the native libs are still missing). Now returns a clear "use ZIP/Markdown instead" error, with both ImportError and OSError (missing native libs) branches handled.
- Resolver stops swallowing real failures into silent None: _resolve_llm previously had except Exception: return None, which hid both the missing-ollama-SDK bug AND a separate OllamaProvider(model=…) → OllamaProvider(default_model=…) kwarg mismatch. Now logs the failure (traceback at ERROR level) and stores it in a module-level snapshot retrievable via get_last_resolver_error().
- SSE keepalive comments no longer break installer fetch-stream parsers (b5fc36b): four installer screens (ScreenModelDownload, ScreenOllama, two paths in ScreenOllamaModel) consume server-sent-event streams via fetch + ReadableStream. sse-starlette periodically emits : ping - N keepalive comments, which the hand-rolled parsers were treating as malformed event data. Fixed by skipping any line beginning with : (per the SSE spec for comments).
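The comment-skipping rule behind the SSE keepalive fix can be shown with a small sketch. The installer parsers themselves are JavaScript; this Python version, with the hypothetical name iter_sse_data, only illustrates the rule and is deliberately simplified (it does not join multi-line data fields or track event names).

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each data: line, ignoring SSE comment lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            # Blank lines delimit events; lines starting with ":" are
            # comments (for example keepalive pings) and carry no data.
            continue
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            if payload.startswith(" "):
                payload = payload[1:]
            yield payload
```

Fed a stream that interleaves ": ping - N" comments with data lines, the generator yields only the data payloads.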
Fixed (audit round 1 — 12 Criticals + 8 Majors, 7d3a24a)
- UX-001: strip CLI exposure from macOS install fallback copy. ScreenOllama.jsx and ScreenOllamaModel.jsx no longer tell Mac users to run brew install ollama in Terminal. Both screens now route to the same osascript-with-admin install path used by the Windows .exe runner. User-manual / FAQ / architecture docs rewritten in the same pass.
- DOC-001 / DOC-002 / DOC-004 / DOC-005: replace stale 11-step / 5-step installer descriptions with the actual 6-step flow. ManualView.jsx, docs/user-manual.md, docs/architecture.md, docs/FAQ.md all updated to match App.jsx TOTAL_STEPS=6. ManualView trailing note now points to Settings for cloud key / agent selection.
- DOC-003: rewrite SECURITY.md to reflect OS-keychain reality. Old doc claimed API keys live in settings.json; they have actually been stored in Windows Credential Manager / macOS Keychain / Secret Service since v0.7.1.
- TEST-002: document cleanroom proxy limitation in start.sh so future maintainers know cold-pull / SSE-keepalive bug classes are architecturally invisible to cleanroom.
- TEST-003: new tests/e2e/test_new_run.py walks 6-step installer → Dashboard → New Run → asserts orchestrator dispatches without immediate failure. Honors AGENTSUITE_LLM_PROVIDER_FACTORY for mock injection. Closes the gap where the agent code path was untested at the UI level, exactly what the v0.8.7 missing-SDK bug crashed.
- QA-001: stop hardcoding port 8765 in places the launcher's free-port fallback breaks. launcher.py writes ~/.agentsuitelocal/launcher.port.json (single-purpose JSON, separate from the plaintext log being corrupted by overlapping writes). Inno uninstall hook reads it via PowerShell instead of POSTing to a hardcoded :8765/api/uninstall. execution.py notification action_url uses _read_launcher_port().
- Plus 12 additional Critical/Major findings closed in this round; full IDs in audit-AgentSuiteLocal-2026-05-05/.
Fixed (audit round 2 — 5 Criticals + 8 Majors, 2445268)
- Major: Windows console-flicker bug: subprocess.run(["ollama", "--version"]) from /api/ollama/status flashed a console window on every poll because the --windowed PyInstaller bundle has no parent console. Frontend polls every few seconds, and the flicker was indistinguishable from malware to non-technical users. Added creationflags=CREATE_NO_WINDOW to that call and to the uninstall ollama rm call.
- In-app uninstall discoverability: added "Uninstall" entry to sidebar with red treatment, scrolls Settings to Danger zone on click. Settings panel was already correct; users couldn't find it without scrolling.
- QA-202: Inno [UninstallRun] dead-socket: InitializeUninstall was killing the process before the hook fired. Reordered so the hook POSTs graceful-shutdown first, waits 3s, then force-kills as fallback. Workspace cleanup now actually runs.
- Inno unins000.exe path discovery: also checks Program Files (x86), LocalAppData\Programs, and the running .exe's dir.
- ENG-R2-001: /api/run/{id}/retry state-guarded: only retryable from error / timeout / cancelled / failed.
- ENG-R2-002: E2E conftest reads the structured launcher.port.json (was reading legacy plaintext launcher.log).
- ENG-R2-003: AGENTSUITE_LLM_PROVIDER_FACTORY restricted to tests.* / agentsuite.testing.* / agentsuite.llm.mock prefixes; closes RCE-via-env-var primitive.
- ENG-R2-005: launcher.port.json written atomically (os.replace) AFTER server bind.
- QA-201: LiveRunView Retry / Open Settings now use proper setView callbacks (App.jsx has no hash router; the buttons were dead).
- QA-203: /api/smoke calls raise_for_status() after /api/generate; a 5xx no longer marks probes green.
- QA-204: "Open Ollama" button checks response.ok; 404 (Ollama not installed) no longer treated as success.
- QA-205: _resolve_llm serialized via _resolver_lock; concurrent callers can't race on scoped env restoration.
- TEST2-001: mock-factory env vars set in conftest before backend import + in CI workflow Start-backend step. New sentinel-file assertion in test_new_run.py proves mock ran in CI.
- UX2-001: added <Icon name="open" /> definition. Mac smoke recovery button no longer has phantom gap.
- UX-004: Live Run no longer fakes a token counter (was setTokens(t => t + 18) per stage_update). Cost line is "Local — no cloud cost".
- UX-005: Run-failed dead-end replaced with Retry / Open Settings / Diagnostic / Back. Retry uses the new state-guarded endpoint.
- DOC2-001: docs/user-manual.md tier→model table was wrong (gemma2:2b / llama3.1:8b); aligned to canonical map (gemma4:e2b / gemma4:e4b / gemma4:26b-moe).
- DOC2-003 / DOC2-004: README architecture section updated: main.py no longer described as 2000-line monolith; installer screens reflect 6-screen active flow.
- CLI exposure removed from user-manual.md: "pull custom models from the terminal" rewritten; regression from round-1 doc rewrite.
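The atomic write mentioned under ENG-R2-005 follows a common write-to-temp-then-rename pattern, sketched below. The function name write_port_file and the {"port": N} payload are assumptions for illustration; only the use of os.replace comes from the notes.

```python
import json
import os
import tempfile
from pathlib import Path


def write_port_file(path: Path, port: int) -> None:
    """Write the port file so readers never observe a partial document."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so the final rename stays
    # on one filesystem, which is what makes os.replace atomic.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"port": port}, handle)  # ASSUMED payload shape
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Calling this only after the server has bound its socket means any reader that sees the file can rely on the port being live.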
Fixed (audit round 3 — 3 Criticals + 5 Majors, 1a433ec)
- ENG-R3-001 (Critical): threading.Lock in async event loop: the QA-205 lock was acquired sync from inside 5 async route handlers and 1 async smoke endpoint; a contended sync lock blocks the FastAPI event loop while one resolver waits on another. Converted _resolve_llm to async, replaced threading.Lock with asyncio.Lock, ran the sync constructor body in a threadpool via asyncio.to_thread. All 5 call sites in execution.py + 1 in routers/ollama.py updated to await.
- DOC3-001 (Critical): tier→model fan-out: DOC2-001 only landed in docs/user-manual.md. Searched the whole repo for gemma2:2b / llama3.1:8b and found 13 references. Updated docs/architecture.md tier diagram, both discussion seeds, the ManualView recommended-models table, and the SettingsView uninstall-path fallback. CI workflow / CONTRIBUTING / known-issues notes left alone (legitimate test references).
- QA3-301 (Critical): in-app uninstall now re-elevates: /api/uninstall/phase3 was launching unins000.exe via plain subprocess.Popen, inheriting the backend's non-admin token, so the uninstaller silently failed to remove Program Files entries and registry keys. Now uses ctypes ShellExecuteW with the runas verb to prompt UAC. Falls back to plain Popen for LocalAppData installs where elevation isn't required.
- UX3-001 (Major): retryError state set but never rendered: QA-201 added 3 setRetryError() branches with no JSX referencing them, so 409 / non-OK HTTP / network errors failed silently. Inline error display added.
- **ENG-R3-002 (Majo...
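The lock conversion described under ENG-R3-001 can be sketched as follows. All names here (resolve_llm, _build_provider_sync, demo) are hypothetical stand-ins, and the lock is passed in explicitly to keep the example self-contained; the actual resolver presumably holds its lock differently.

```python
import asyncio
import time


def _build_provider_sync(name: str) -> str:
    """Stand-in for the blocking provider constructor."""
    time.sleep(0.05)
    return f"provider:{name}"


async def resolve_llm(name: str, lock: asyncio.Lock) -> str:
    # Waiting on an asyncio.Lock suspends only this coroutine; the event
    # loop stays free to serve other requests in the meantime.
    async with lock:
        # The blocking body runs in a worker thread, not on the loop thread.
        return await asyncio.to_thread(_build_provider_sync, name)


async def demo() -> list[str]:
    lock = asyncio.Lock()
    # Two concurrent callers are serialized by the lock without blocking the loop.
    return await asyncio.gather(resolve_llm("a", lock), resolve_llm("b", lock))
```

By contrast, acquiring a threading.Lock directly inside an async handler would stall the whole loop for as long as the lock is contended.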
AgentSuiteLocal v0.8.7
⚠️ Looking for the installer? Scroll down to Assets. Don't use GitHub's Source code (zip) links; those are the source tree, not the app.
- Windows: download AgentSuiteLocal-0.8.7-setup.exe and double-click it.
- macOS: download AgentSuiteLocal-v0.8.7.dmg and open it.
Issue #19 — migrate _execute_pipeline_step to PipelineOrchestrator for K1 cross-stage context accumulation.
Changed
- _execute_pipeline_step (step 0) now routes through PipelineOrchestrator.run(): each agent receives StageContext.cross_stage_context from all preceding stages. The old direct BaseAgent.run() path is preserved as a fallback for the resume-from-error flow (step_idx > 0).
- _advance_pipeline now routes through PipelineOrchestrator.approve(): approval promotes artifacts at the kernel level and drives the next step with accumulated context. Falls back to direct execution if no orchestrator state is found on disk (resume/recovery path).
- Extracted _collect_step_artifacts(run_id, output_root): eliminates duplicated artifact + QA-score collection that was copy-pasted in _execute_run, _execute_pipeline_step, and _advance_pipeline.
- on_progress / kernel_progress_callback forwarded: agent_start, agent_done, and stage_update SSE events are emitted through the orchestrator's callback hooks, preserving real-time stream behavior.
- Closes Issue #19.
Added
- _execute_pipeline_step_direct: extracted legacy direct-agent path for resume; keeps the recovery flow working without requiring orchestrator state on disk.
Test changes
- test_execute_pipeline_step_dispatches_non_founder_agent: updated to mock PipelineOrchestrator.run instead of DesignAgent.run. Verifies step["run_id"] and step["status"] == "awaiting_approval" via the orchestrator code path.
- test_execute_pipeline_step_emits_progress_events: updated to verify kernel_progress_callback is wired through orch.run(), not agent.run() directly.
Test metrics (v0.8.7)
- Backend tests: 129 passing (same test count as v0.8.6; 6 ollama tests deselected in non-E2E run)
- execution.py coverage: 62%
- Repo-wide coverage: 65% (floor 58%)
AgentSuiteLocal v0.8.6
Sprint 2 close-out — regression-guard tests for progress_callback wire-up; fix step key collision in pipeline SSE events.
Added
- test_execute_run_emits_progress_events: regression guard that turns red if progress_callback=progress_callback is removed from agent.run() in _execute_run. Uses side_effect to invoke the callback synchronously from the executor thread; await asyncio.sleep(0) flushes the call_soon_threadsafe queue before asserting on run["events"].
- test_execute_pipeline_step_emits_progress_events: same guard for the _execute_pipeline_step path. Asserts ≥1 stage_update event in pipeline["events"] and that step carries the pipeline step index (not the AgentSuite internal stage step).
- Issue #19 filed: migrate _execute_pipeline_step to PipelineOrchestrator to enable K1 cross-stage context accumulation (currently bypassed; each pipeline step runs as an isolated single-agent call).
Fixed
- step key collision in pipeline progress_callback: stage_progress events emitted by BaseAgent.run() include a "step" field (intra-stage step counter). Forwarding the full dict while also passing step=step_idx to _emit_pipeline caused TypeError: got multiple values for keyword argument 'step' at runtime, silently swallowing all pipeline stage_update events. Now strips "step" from the forwarded payload; step=step_idx (pipeline step index) is authoritative.
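The collision and its fix can be reproduced in a few lines. The functions emit_pipeline and forward_progress below are simplified stand-ins, not the project's _emit_pipeline or its callback; they only demonstrate why a "step" key in the forwarded dict clashes with an explicit step= keyword.

```python
from typing import Any


def emit_pipeline(event_type: str, *, step: int, **payload: Any) -> dict[str, Any]:
    """Stand-in emitter: returns the event it would publish."""
    return {"type": event_type, "step": step, **payload}


def forward_progress(step_idx: int, progress: dict[str, Any]) -> dict[str, Any]:
    # Drop the agent's intra-stage "step" so it cannot collide with the
    # pipeline step index passed as a keyword argument.
    payload = {k: v for k, v in progress.items() if k != "step"}
    return emit_pipeline("stage_update", step=step_idx, **payload)


def forward_progress_buggy(step_idx: int, progress: dict[str, Any]) -> dict[str, Any]:
    # Raises TypeError when progress contains "step": the keyword is
    # supplied twice, once explicitly and once via ** unpacking.
    return emit_pipeline("stage_update", step=step_idx, **progress)
```

If the buggy variant runs inside a callback whose exceptions are caught and discarded, every stage_update event disappears without a visible error, which matches the symptom described above.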
Test metrics (v0.8.6)
- Backend tests: 135 passing (was 127 in v0.8.5; +8 net including 2 new progress-event guards)
- execution.py coverage: 72% (was 70% against v0.8.5 tests; +2pp, covers progress_callback closure bodies)
- Repo-wide coverage: 67% (floor 60%)
AgentSuiteLocal v0.8.5
Sprint 2 — wire AgentSuite v1.1.0 intra-stage progress events to SSE stream.
Changed
- AgentSuite pin bumped @v1.0.11 → @v1.1.0: brings in K1 cross-stage context accumulator and K2 intra-stage progress callbacks (BaseAgent.run(progress_callback=...) + PipelineOrchestrator(kernel_progress_callback=...)).
- Real progress_callback wired in _execute_run: replaces no-op stubs. Uses loop.call_soon_threadsafe to safely push stage_update SSE events from the thread-pool executor thread to the asyncio event loop. The frontend LiveRunView.jsx already handles stage_update events.
- Real progress_callback wired in _execute_pipeline_step: same pattern; events arrive as stage_update with an additional step field carrying the pipeline step index.
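The thread-to-loop handoff described above can be sketched as below. The function names and the event payload shape are illustrative assumptions; the only detail taken from the notes is that the callback uses loop.call_soon_threadsafe to move events from the executor thread onto the event loop.

```python
import asyncio
from typing import Any


async def run_with_progress() -> list[dict[str, Any]]:
    loop = asyncio.get_running_loop()
    events: list[dict[str, Any]] = []

    def progress_callback(payload: dict[str, Any]) -> None:
        # Runs on the worker thread. Touching loop-owned state directly would
        # be unsafe, so the append is scheduled onto the loop thread instead.
        loop.call_soon_threadsafe(events.append, {"type": "stage_update", **payload})

    def blocking_agent_run() -> None:
        # Stand-in for a blocking agent run that reports progress as it goes.
        for pct in (25, 50, 100):
            progress_callback({"pct": pct})

    await loop.run_in_executor(None, blocking_agent_run)
    # Yield once so any callbacks still queued on the loop get to run.
    await asyncio.sleep(0)
    return events
```

In the real application the scheduled function would publish to the SSE stream rather than append to a list.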
Fixed
- Closes Issue #10: intra-stage SSE events were blocked on AgentSuite v1.1.0 shipping PipelineOrchestrator. That work is now tagged and the no-op stubs are removed.
AgentSuiteLocal v0.8.4
Fixed
- softprops/action-gh-release node24 migration: release.yml was pinned to 3bb12739 (v2, node20). Updated to b4309332 (v3.0.0, node24). This completes the Sprint 0 node24 migration: all five actions/* pins were already on node24; this was the one missed action.
- SHA-pins comment block: updated to @v3 notation and added (node24) annotation to every entry so future audits can verify compatibility without an API call.
Note: Sprint 0 was declared complete after migrating actions/checkout, actions/setup-python, actions/setup-node, actions/upload-artifact, and actions/download-artifact. softprops/action-gh-release was listed in the same comment block but its pinned SHA was not checked for node20/24 status. The deprecation warning appeared on the v0.8.3 release run. Node.js 20 forced-default deadline: 2026-06-02. Node.js 20 hard-removal from runners: 2026-09-16.
AgentSuiteLocal v0.8.3
Added
- tests/test_launcher.py: two tests, test_primes_enabled_agents_env and test_does_not_override_operator_env. Regression guard: removing the setdefault from launcher.main() causes these to fail immediately.
- TestEnabledAgents class in tests/test_cli.py: same two assertions for cli.main().
Changed
- pyproject.toml: replaced static version = "0.8.2" with dynamic = ["version"] + [tool.setuptools.dynamic] pointing to agentsuitelocal.__version__.__version__. __version__.py is now the single source of truth; pyproject.toml has no independent version string that can drift.
- tests/test_execution.py: removed inline os.environ.setdefault(...) calls from test_execute_run_dispatches_non_founder_agent and test_execute_pipeline_step_dispatches_non_founder_agent. Replaced with _all_agents_enabled pytest fixture (uses monkeypatch). Failure signals are now clean: entry-point tests fail for entry-point regressions; execution tests fail for execution regressions.
- README.md: bumped version header to v0.8.2, updated installer filename references, updated /api/version example, updated data-flow description (PipelineOrchestrator → BaseAgent.run()), updated Known Issues header, removed stale v0.1.2 commit-SHA bullet, added Recent releases table.
AgentSuiteLocal v0.8.2
Fixed
- Version metadata: pyproject.toml and agentsuitelocal/__version__.py bumped to 0.8.2. v0.8.0 and v0.8.1 shipped with version 0.7.1 in package metadata, so pip show and /api/version reported the wrong value. Fixed going forward; see note below.
- CI version gate: release.yml verify-ci job now checks that the package version in __version__.py matches the git tag before any build starts. Prevents this class of drift from recurring.
Note: v0.8.0 and v0.8.1 wheels report version 0.7.1 from pip show due to the metadata bump being missed in those releases. The git tags and release assets are unaffected. v0.8.2 fixes this and adds CI enforcement to prevent recurrence.
AgentSuiteLocal v0.8.1
v0.8.1 — enable all 7 agents at launch, footgun fix