Releases: scottconverse/AgentSuiteLocal

AgentSuiteLocal v1.0.0

08 May 20:59

Fixed

  • PDF export replaced WeasyPrint with reportlab — WeasyPrint required the GTK3 native runtime (libgobject, libpango, libcairo) which is not bundled and must be installed separately on Windows. PDF export silently failed with a 501 on most fresh installs. Replaced with reportlab (pure Python, no native runtime). PDF export now works out of the box on all platforms. The PyInstaller spec hiddenimports block is updated accordingly; the ~2 MB of GTK-dependent weasyprint/cairocffi/tinycss2 packages are replaced by reportlab.
  • approve_run state guard corrected — the guard previously accepted runs in "done" state (not in ("waiting", "done")); individual runs never reach "done" (only pipeline steps do), so the guard was wrong in principle. Changed to != "waiting" for symmetry with reject_run.
  • ApprovalGateView Approve button tooltip — the tooltip now shows a distinct message when the button is disabled by missing/failed QA score ("QA score unavailable — run QA evaluation or use Override & Approve") vs. below-threshold score ("Score X/10 is below your Y gate"). Previously always showed the score message, which rendered null/10 when qa_score was absent.
  • qa_score was silently None on every real-LLM run (V4). agentsuitelocal/api/execution.py (both call sites at L358-363 and L449-454) read the per-run qa_scores.json looking for fields named weighted_score / overall_score / score / overall — none of which are in agentsuite's QAReport schema. The canonical field is average (agentsuite/kernel/qa.py:21). Result: qa_score was always None on every successful real run, masked by the test's xfail strict=False marker until A3 removed it. Added average to the field-lookup chain at both sites (kept legacy field names as forward-compat fallbacks). Added tests/test_qa_score_schema_contract.py (4 contract tests) so the field-name agreement with agentsuite is now enforced by the suite.
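The qa_score field-lookup fix above can be illustrated with a minimal sketch. The function name extract_qa_score and the position of average at the front of the chain are assumptions for illustration; the changelog only states that average was added and the legacy names were kept as fallbacks.

```python
from typing import Any, Optional

# Field names tried in order. "average" is the canonical QAReport field per the
# entry above; the remaining names are the legacy fallbacks. Putting "average"
# first is an assumption of this sketch.
QA_SCORE_FIELDS = ("average", "weighted_score", "overall_score", "score", "overall")


def extract_qa_score(report: dict[str, Any]) -> Optional[float]:
    """Return the first numeric score found, or None if no known field is present."""
    for name in QA_SCORE_FIELDS:
        value = report.get(name)
        # Type-check rather than truthiness-check so a legitimate 0.0 survives.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return None
```

A report shaped like {"average": 0.0} yields 0.0 rather than None, which is the "preserves 0.0" property the contract tests are described as guarding.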

Added (Sprint A — v0.9 milestone)

  • agentsuite repinned to v1.1.1 (V1 + V2 closed at the source).
  • tests/test_real_founder_run.py xfail removed (A3). The test now hard-asserts that a real founder run produces approve-able artifacts end-to-end. After the V4 fix this test is the active gate for qa_score.
  • tests/test_qa_score_schema_contract.py (NEW, 4 tests). Schema contract guard against the V4 regression — asserts agentsuite's QAReport.average field exists, round-trips JSON, preserves 0.0, and uses the dimensions/scores shape AgentSuiteLocal reads.
  • docs/MOCKING_AUDIT.md (A4). Classification-only audit of all 48 real mock call sites in tests/. 23 BOUNDARY-OK, 16 INTERNAL-JUSTIFIED, 9 INTERNAL-SUSPECT-REFACTOR (_save_state / _log_telemetry / _send_notification / _load_settings should become DI in Sprint B), 0 INTERNAL-SUSPECT-DELETE. Sprint B will action the recommendations.
  • One-run-per-session limitation declared (A5). README Known issues, docs/user-manual.md FAQ, and docs/FAQ.md "Running agents" all state v1.0 supports one active run at a time per session; concurrent runs land in v1.1.
  • a11y Bar 1, code-only (A6). Sidebar (top + bottom nav) sets aria-current="page" on the active item. ApprovalGateView override dialog now has role="dialog" + aria-modal="true" + aria-label, and a window keydown effect closes it on Escape. Vitest tests cover all of the above (Sidebar.test.jsx 4 tests, styles.test.js 2 tests, ApprovalGateView.test.jsx 2 new tests).
  • Bundle smoke CI on macOS + Windows (A7). build-macos job appends a step that launches the .app, polls ~/.agentsuitelocal/launcher.port.json for ≤30s, GETs /api/health, and verifies clean exit. New build-windows job mirrors this for windows-latest. Catches v0.8.7-class regressions where the bundle ships missing a hidden import. Both jobs gate on main || tags || release/*.
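The port-file polling step of the bundle smoke job (A7) might look roughly like the sketch below. The helper name wait_for_port and the assumption that launcher.port.json holds a JSON object with an integer "port" key are illustrative; the file's actual schema is not given in these notes.

```python
import json
import time
from pathlib import Path
from typing import Optional


def wait_for_port(port_file: Path, timeout_s: float = 30.0, interval_s: float = 0.5) -> Optional[int]:
    """Poll port_file until it holds {"port": <int>} (assumed shape), or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            data = json.loads(port_file.read_text(encoding="utf-8"))
            port = data.get("port") if isinstance(data, dict) else None
            if isinstance(port, int) and not isinstance(port, bool):
                return port
        except (OSError, ValueError):
            pass  # file missing or not yet valid JSON; retry until the deadline
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

A CI step would call this with the path under ~/.agentsuitelocal/ and fail the job when it returns None, before attempting the /api/health request.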

Changed (Sprint A — v0.9 milestone)

  • Removed dead RunRequest.constraints field (A1, D1). Field was unused everywhere; deleted from agentsuitelocal/api/schemas.py. Wire-compat preserved (Pydantic v2 default extra="ignore" accepts old clients sending constraints).
  • E2E "Run failed within 3s" assertion restored (A2). The assertion was previously commented out; A2 restores it. The mock-provider prose-vs-extract failure exposed by this change is correctly classified as evidence FOR A4 (mocking audit), not a regression — production correctly rejects non-JSON at the extract stage.

AgentSuiteLocal v0.8.9

06 May 08:33
cadd021

[0.8.9] — 2026-05-06

Fixed

  • QA-DD-001 (Critical) — Trust/Risk agent slug drift fixed. v0.8.8 advertised seven agents in the picker but web/src/data.js used id: "trust" while launcher.py / cli.py used trust_risk. The kernel registry only knows trust_risk, so every Trust/Risk run errored 3 s after launch with Agent 'trust' is not enabled or not registered. Fixed by aligning data.js (id and mock-run reference) and the _SETTINGS_DEFAULTS["enabled_agents"] default in agentsuitelocal/api/config.py to the canonical trust_risk.

  • TEST-CRIT-001 (Critical, Test discipline): tests/test_execution.py restructured. The file mocked every dependency it claimed to integrate (5 of 5 tests patched _resolve_llm, the agent class, _save_state, telemetry, notifications, _workspace), the same pattern that shipped v0.8.7's missing-ollama-SDK regression. Renamed to tests/test_execution_state_machine.py with a corrected docstring stating what the file actually covers (run-status state machine, dispatch, SSE wiring) and what it does NOT cover (resolver path, agent class). Added a new tests/test_execution_integration.py that uses AGENTSUITE_LLM_PROVIDER_FACTORY to exercise the real resolver path with no patching: a MockLLMProvider from agentsuite.llm.mock, a per-test AGENTSUITE_WORKSPACE tmpdir, and unmocked _save_state / _log_telemetry / _send_notification.

  • DOC-V088-001 (Critical, Documentation) — In-app ManualView.jsx refreshed to v0.8.9. Six stale items the round-1 / round-3 doc fixes had missed: (1) Smoke step described "five quick checks" — actually four since v0.8.8; updated and notes that v0.8.8 added the kernel-inference check. (2) Kernel section claimed "you can't delete from the Kernel through the UI in v0.1" — v0.1-vintage caveat replaced with current behaviour: read-only by design, demote via file system. (3) Troubleshooting note about smoke-test failures referenced a "Phase 2 will surface these errors" roadmap promise that's been closed for months — replaced with a description of the current per-check fix-card UX. (4) "My run disappeared" answer pointed users at ~/.agentsuitelocal/runs.json (replaced by SQLite in v0.8.0); now describes the WAL-mode state.db and notes the legacy file is migrated on first launch. (5) Added a "Manual version: v0.8.9 · matches docs/user-manual.md" stamp at the top so drift is now visible. (6) The recommended-models table — already in sync from QA-DD-002.

  • ENG-088-002 (Critical, Performance/Data) — run["events"] and pipeline["events"] capped at 200 entries. The lists were unbounded and serialized to SQLite on every _save_state(), so disk write size grew linearly with run length. Long pipeline runs amplified the cost noticeably. Chose Option A (cap; drop the dead deque) over Option B (wire deque into SSE replay; drop persistent events) — smaller diff, simpler invariant, no SSE protocol change. Replaced direct ["events"].append(evt) calls in execution.py (run + pipeline emit), routers/runs.py (cancellation), and routers/pipelines.py (rejection) with a single _append_event(container, evt) helper in api.state that FIFO-evicts beyond _MAX_EVENTS_PER_RUN = 200. Removed the dead _run_event_buffers dict and _SSE_BUFFER_SIZE constant from state.py. Lifecycle markers (agent_start, agent_done, approval, error) are <10 events — well within the cap; ~190 stage_progress events of recent history fit alongside.

  • UX-V088-001 (Critical, UX) — Settings save errors no longer silently show "Saved". SettingsView.jsx's save handler did .catch(() => {}); setSaved(true) regardless of fetch outcome, giving users false confirmation when the backend was unreachable or returned non-2xx. Now: optimistic update is rolled back on failure, the topbar shows Couldn't save: <reason> (red), and saved is not set to true. Distinguishes 5xx (detail from response body), 4xx, and network errors. Affects every toggle, the API key save, run timeout, QA gate threshold, model tier — every edit-and-save in Settings.

  • ENG-088-001 (Critical, Security/Correctness) — PDF export now HTML-escapes artifact content. agentsuitelocal/api/routers/runs.py:333-344 was interpolating run_id, file paths, and artifact bodies directly into HTML inside <pre> blocks. LLM-produced artifacts routinely contain <, >, &, or literal </pre> (markdown-with-embedded-HTML, code blocks); without escaping, weasyprint parsed them as live HTML and the PDF rendered incorrectly. With a malicious artifact, injected <style> or <a href="javascript:"> would execute against the rendering context. Extracted the HTML-construction logic into _build_pdf_html(run_id, outputs_dir) and applied html.escape() to every interpolated value.

  • QA-DD-002 (Critical) — Pro-tier model name fixed. _TIER_MODEL_MAP["pro"] was gemma4:26b-moe, which 404s from https://registry.ollama.ai/v2/library/gemma4/manifests/26b-moe. Fresh installs that selected the Pro tier failed to pull. The wrong suffix was the entire bug — bare gemma4:26b (and gemma4:31b, gemma4:latest) all exist on Ollama Hub. Fixed to gemma4:26b (the closest real tag to the original 26B intent; same gemma4 family as light/balanced for consistency). Fanned out to web/src/data.js, docs/user-manual.md, docs/architecture.md, README, both discussion seeds, ManualView.jsx, and ModelView.test.jsx. The gemma4:e2b and gemma4:e4b entries — flagged by the audit as also missing — actually do exist; left unchanged.

Changed (CI test-environment alignment)

  • .github/workflows/ci.yml: Playwright job now pulls gemma4:e4b instead of gemma2:2b. The smoke endpoint verifies the configured model is installed locally before running the kernel-inference probe; _SETTINGS_DEFAULTS["model_name"] is gemma4:e4b, so CI must have that model present or the smoke step rejects with "Model not installed" and the installer walk fails on Step 5 (Continue stays disabled). v0.8.8's audit-round-1 added the smoke check; the CI workflow was never updated to match. v0.8.8 Playwright hung at "Install Playwright browsers" so this regression was masked. v0.8.9 was the first run to actually surface it.

Documentation (cross-surface currency sweep)

  • CONTRIBUTING.md reconciled with v0.8.0+ reality: dev port description now points at launcher.port.json (was the legacy launcher.log plaintext file); E2E test instructions reference gemma4:e4b (was gemma2:2b); test-count claim "108+ tests" replaced with "160+" + "see CHANGELOG for an exact figure"; the long-stale "keep main.py the single source of truth" instruction replaced with the actual v0.8.0 router-per-domain layout and the policy that new routes go in the closest existing router; bug-report instructions reference both launcher.log and launcher.port.json correctly.
  • docs/architecture.md: doc-currency stamp bumped from "as of v0.8.8" to "as of v0.8.9"; the hard-coded "Full suite as of v0.8.7: 135 passing" line replaced with a release-by-release approximate table that points at CHANGELOG for exact figures.
  • README "Updated in" hero: the trailing block stopped at v0.8.0–v0.8.2. Added paragraphs for v0.8.0–v0.8.4, v0.8.5–v0.8.7, v0.8.8, and v0.8.9 so a reader can see the full release shape from the top of the README without opening the CHANGELOG.
  • Discussion seeds and Reddit launch post: bumped the current-version line and the download filename (AgentSuiteLocal-0.8.9-setup.exe); the sed pass had missed the filename because it has no v prefix.
  • README known-issues + architecture.md test-tree note: previously claimed "E2E test suite uses gemma2:2b (Gemma 2 family), not a Gemma 4 model" and pointed readers at tests/e2e/conftest.py for the documentation of that choice. Both became stale the moment CI was bumped to gemma4:e4b in this same release; the conftest pointer was also factually wrong (conftest contains zero gemma references). Both surfaces now correctly cite .github/workflows/ci.yml as the source-of-truth and describe the smoke-step model-installed check that motivates the CI choice.
  • docs/user-manual.md per-agent artifact totals: closes the audit's DOC-V088-004 (landing-page agent cards advertised 17–18 artifacts per non-Founder agent while the manual listed 5 named categories — different views of the same agent output). Added a "What you'll get back: ~N artifacts" line to each non-Founder agent matching web/src/data.js exactly: Design 18, Product 17, Engineering 17, Marketing 18, Trust/Risk 17, CIO 17. Reader sees both the named categories AND the total file count, so the numbers can't read as contradictory.

Added

  • tests/test_execution_integration.py: real-path integration coverage for TEST-CRIT-001. Two tests: a resolver smoke-test that catches v0.8.7-class regressions directly (passes — closes the main concern of TEST-CRIT-001), and a full _execute_run end-to-end that exposed a test-fixture limitation rather than a production bug (the substring-router mock provider returns prose for the extract stage; production correctly rejects it as invalid JSON). The full-flow test is xfail-marked with an explicit pointer to the fixture follow-up; the resolver test is the active regression guard.

Watchlist follow-up

  • The xfail on test_execute_run_real_path_against_factory_provider belongs on the next-sprint audit watchlist (W-1: sweep over-mocking). Hardening the mock provider to return canonical JSON for stages that demand it (extract / qa) closes the gap. Recommended approach: switch to a RecordingMockProvider keyed by stage name with explicit JSON shapes, or an agentsuite.testing.fixtures.founder_smoke_provider() factory.

  • tests/test_event_cap.py: regression test for ENG-088-002. Asserts the helper caps at _MAX_EVENTS_PER_RUN, FIFO-evicts oldest first, initialises a missing events key, works on pipelines, and grep-checks pro...

AgentSuiteLocal v0.8.8

05 May 22:52

This release started life as a CHANGELOG-correction patch and grew into a substantial bug-fix release. Three audit rounds produced 28 Critical/Major fixes plus the v0.8.7 broken-bundle remediation. All fixes were authored, reviewed, and validated within a single sprint window; per-finding detail lives in audit-AgentSuiteLocal-2026-05-05/.

Fixed (broken-v0.8.7-bundle remediation, bf74eb3)

  • ollama SDK was missing from runtime dependencies (57ab097): SDK was assumed-imported in installer/model-management code paths but never declared in pyproject.toml's runtime deps, so wheel installs and frozen builds failed on first import ollama outside the dev environment. This is the headline regression — every other remediation in this round exists because v0.8.7's structural gaps allowed it to ship.
  • Installer flow re-adds Smoke as Step 5 (was dead code in v0.8.7): web/src/App.jsx TOTAL_STEPS 5→6; STEP_FIX_MAP keys re-aligned to the labels actually emitted by /api/smoke (old keys were stale, so failed users saw no fix guidance). E2E walks all 6 steps.
  • Smoke now exercises the real Python kernel path: /api/smoke constructs an OllamaProvider via the same _resolve_llm New Run uses, then issues a 1-token completion via provider.complete. Until v0.8.7 the smoke test verified the environment (Ollama daemon healthy) but never the app (Python bundle can resolve and call a provider) — exactly why a build with a missing ollama SDK passed install and broke on first New Run.
  • Ollama install starts the daemon explicitly + 90s wait + actionable error: the Windows installer auto-launches a desktop GUI but does not reliably start the API daemon. We now Popen ollama serve ourselves, then poll for 90s (was 30s — too tight on first boot with AV scan + GPU detection + tray handshake). Failure message points to the exact PowerShell command instead of a vague "Try launching Ollama manually."
  • WeasyPrint PDF export: graceful "PDF unavailable in this build": the bundled distributable doesn't ship GTK runtime libs (cairo/pango/gdk-pixbuf). Telling end users to pip install weasyprint is advice they can't act on (no pip in a PyInstaller bundle, and the native libs are still missing). Now returns a clear "use ZIP/Markdown instead" error, with both ImportError and OSError (missing native libs) branches handled.
  • Resolver stops swallowing real failures into silent None: _resolve_llm previously had except Exception: return None, which hid both the missing-ollama-SDK bug AND a separate OllamaProvider(model=…) vs. OllamaProvider(default_model=…) kwarg mismatch. Now logs the failure (traceback at ERROR level) and stores it in a module-level snapshot retrievable via get_last_resolver_error().
  • SSE keepalive comments no longer break installer fetch-stream parsers (b5fc36b): four installer screens (ScreenModelDownload, ScreenOllama, two paths in ScreenOllamaModel) consume server-sent-event streams via fetch + ReadableStream. sse-starlette periodically emits : ping - N keepalive comments, which the hand-rolled parsers were treating as malformed event data. Fixed by skipping any line beginning with : (per the SSE spec for comments).
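The keepalive fix (b5fc36b) lives in the JavaScript installer screens; the same rule is sketched here in Python purely for illustration. The function name iter_sse_data is hypothetical, and the sketch handles only data: fields, ignoring event:, id: and retry:.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event, skipping comment lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, e.g. a ": ping - N" keepalive
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # One leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other field names are ignored in this sketch.
```

Feeding it a stream that interleaves ": ping - 1" lines with data lines yields only the data payloads, so a JSON parser downstream never sees the keepalive text.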

Fixed (audit round 1 — 12 Criticals + 8 Majors, 7d3a24a)

  • UX-001: strip CLI exposure from macOS install fallback copy. ScreenOllama.jsx and ScreenOllamaModel.jsx no longer tell Mac users to run brew install ollama in Terminal. Both screens now route to the same osascript-with-admin install path used by the Windows .exe runner. User-manual / FAQ / architecture docs rewritten in the same pass.
  • DOC-001 / DOC-002 / DOC-004 / DOC-005: replace stale 11-step / 5-step installer descriptions with the actual 6-step flow. ManualView.jsx, docs/user-manual.md, docs/architecture.md, docs/FAQ.md all updated to match App.jsx TOTAL_STEPS=6. ManualView trailing note now points to Settings for cloud key / agent selection.
  • DOC-003: rewrite SECURITY.md to reflect OS-keychain reality. Old doc claimed API keys live in settings.json — they have actually been stored in Windows Credential Manager / macOS Keychain / Secret Service since v0.7.1.
  • TEST-002: document cleanroom proxy limitation in start.sh so future maintainers know cold-pull / SSE-keepalive bug classes are architecturally invisible to cleanroom.
  • TEST-003: new tests/e2e/test_new_run.py walks 6-step installer → Dashboard → New Run → asserts orchestrator dispatches without immediate failure. Honors AGENTSUITE_LLM_PROVIDER_FACTORY for mock injection. Closes the gap where the agent code path was untested at the UI level — exactly what the v0.8.7 missing-SDK bug crashed.
  • QA-001: stop hardcoding port 8765 in places the launcher's free-port fallback breaks. launcher.py writes ~/.agentsuitelocal/launcher.port.json (single-purpose JSON, separate from the plaintext log being corrupted by overlapping writes). Inno uninstall hook reads it via PowerShell instead of POSTing to a hardcoded :8765/api/uninstall. execution.py notification action_url uses _read_launcher_port().
  • Plus 12 additional Critical/Major findings closed in this round; full IDs in audit-AgentSuiteLocal-2026-05-05/.

Fixed (audit round 2 — 5 Criticals + 8 Majors, 2445268)

  • Major (Windows console-flicker bug): subprocess.run(["ollama", "--version"]) from /api/ollama/status flashed a console window on every poll because the --windowed PyInstaller bundle has no parent console, and the frontend polls every few seconds. To non-technical users the flicker was indistinguishable from malware. Added creationflags=CREATE_NO_WINDOW to that call and to the uninstall ollama rm call.
  • In-app uninstall discoverability: added "Uninstall" entry to sidebar with red treatment, scrolls Settings to Danger zone on click. Settings panel was already correct — users couldn't find it without scrolling.
  • QA-202: Inno [UninstallRun] dead-socket: InitializeUninstall was killing the process before the hook fired. Reordered so the hook POSTs graceful-shutdown first, waits 3s, then force-kills as fallback. Workspace cleanup now actually runs.
  • Inno unins000.exe path discovery: also checks Program Files (x86), LocalAppData\Programs, and the running .exe's dir.
  • ENG-R2-001: /api/run/{id}/retry state-guarded — only retryable from error/timeout/cancelled/failed.
  • ENG-R2-002: E2E conftest reads the structured launcher.port.json (was reading legacy plaintext launcher.log).
  • ENG-R2-003: AGENTSUITE_LLM_PROVIDER_FACTORY restricted to tests.* / agentsuite.testing.* / agentsuite.llm.mock prefixes — closes RCE-via-env-var primitive.
  • ENG-R2-005: launcher.port.json written atomically (os.replace) AFTER server bind.
  • QA-201: LiveRunView Retry / Open Settings now use proper setView callbacks (App.jsx has no hash router; the buttons were dead).
  • QA-203: /api/smoke calls raise_for_status() after /api/generate — a 5xx no longer marks probes green.
  • QA-204: "Open Ollama" button checks response.ok — 404 (Ollama not installed) no longer treated as success.
  • QA-205: _resolve_llm serialized via _resolver_lock — concurrent callers can't race on scoped env restoration.
  • TEST2-001: mock-factory env vars set in conftest before backend import + in CI workflow Start-backend step. New sentinel-file assertion in test_new_run.py proves mock ran in CI.
  • UX2-001: added <Icon name="open" /> definition. Mac smoke recovery button no longer has phantom gap.
  • UX-004: Live Run no longer fakes a token counter (was setTokens(t => t + 18) per stage_update). Cost line is "Local — no cloud cost".
  • UX-005: Run-failed dead-end replaced with Retry / Open Settings / Diagnostic / Back. Retry uses the new state-guarded endpoint.
  • DOC2-001: docs/user-manual.md tier→model table was wrong (gemma2:2b / llama3.1:8b); aligned to canonical map (gemma4:e2b / gemma4:e4b / gemma4:26b-moe).
  • DOC2-003 / DOC2-004: README architecture section updated — main.py no longer described as 2000-line monolith; installer screens reflect 6-screen active flow.
  • CLI exposure removed from user-manual.md: "pull custom models from the terminal" rewritten — regression from round-1 doc rewrite.
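ENG-R2-005 above describes writing launcher.port.json atomically via os.replace. A minimal sketch of that pattern follows; the function name and the {"port": N} payload are assumptions, and the real launcher additionally waits for the server bind before writing.

```python
import json
import os
import tempfile
from pathlib import Path


def write_port_file_atomically(path: Path, port: int) -> None:
    """Write {"port": port} (assumed shape) so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"port": port}, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # swaps the finished file in, replacing any old one
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader polling the file either finds the previous complete content or the new complete content, which is the property the E2E conftest and uninstall hook depend on.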

Fixed (audit round 3 — 3 Criticals + 5 Majors, 1a433ec)

  • ENG-R3-001 (Critical) — threading.Lock in async event loop: the QA-205 lock was acquired sync from inside 5 async route handlers and 1 async smoke endpoint — a contended sync lock blocks the FastAPI event loop while one resolver waits on another. Converted _resolve_llm to async, replaced threading.Lock with asyncio.Lock, ran the sync constructor body in a threadpool via asyncio.to_thread. All 5 call sites in execution.py + 1 in routers/ollama.py updated to await.
  • DOC3-001 (Critical) — tier→model fan-out: DOC2-001 only landed in docs/user-manual.md. Searched the whole repo for gemma2:2b / llama3.1:8b — found 13 references. Updated docs/architecture.md tier diagram, both discussion seeds, the ManualView recommended-models table, and the SettingsView uninstall-path fallback. CI workflow / CONTRIBUTING / known-issues notes left alone (legitimate test references).
  • QA3-301 (Critical) — in-app uninstall now re-elevates: /api/uninstall/phase3 was launching unins000.exe via plain subprocess.Popen, inheriting the backend's non-admin token, so the uninstaller silently failed to remove Program Files entries and registry keys. Now uses ctypes ShellExecuteW with the runas verb to prompt UAC. Falls back to plain Popen for LocalAppData installs where elevation isn't required.
  • UX3-001 (Major) — retryError state set but never rendered: QA-201 added 3 setRetryError() branches with no JSX referencing them — silent failure on 409 / non-OK HTTP / network errors. Inline error display added.
  • **ENG-R3-002 (Majo...
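The ENG-R3-001 change in this round (asyncio.Lock plus asyncio.to_thread in place of a threading.Lock) can be sketched as follows. SerializedResolver and _build_provider_sync are hypothetical stand-ins, not the project's _resolve_llm; the ticker task exists only to show that the event loop keeps running while resolvers wait on each other.

```python
import asyncio
import time


def _build_provider_sync(model: str) -> dict[str, str]:
    """Hypothetical stand-in for a blocking provider constructor."""
    time.sleep(0.05)
    return {"model": model}


class SerializedResolver:
    """Serializes resolution without blocking the event loop."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def resolve(self, model: str) -> dict[str, str]:
        async with self._lock:  # waiting here yields to the loop
            # The blocking body runs in a worker thread.
            return await asyncio.to_thread(_build_provider_sync, model)


async def demo() -> tuple[list[dict[str, str]], int]:
    resolver = SerializedResolver()  # built inside the running loop
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    results = await asyncio.gather(resolver.resolve("a"), resolver.resolve("b"))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return list(results), ticks
```

Running asyncio.run(demo()) returns both results in call order along with a non-zero tick count. With a contended threading.Lock acquired directly in the coroutine, the ticker would stall while one resolver waited on the other.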

AgentSuiteLocal v0.8.7

05 May 06:02
6db68c9

⚠️ Looking for the installer? Scroll down to Assets.

Don't use GitHub's Source code (zip) links — those are the source tree, not the app.

  • Windows: download AgentSuiteLocal-0.8.7-setup.exe and double-click it.
  • macOS: download AgentSuiteLocal-v0.8.7.dmg and open it.

Issue #19 — migrate _execute_pipeline_step to PipelineOrchestrator for K1 cross-stage context accumulation.

Changed

  • _execute_pipeline_step (step 0) now routes through PipelineOrchestrator.run(): each agent receives StageContext.cross_stage_context from all preceding stages. The old direct BaseAgent.run() path is preserved as a fallback for the resume-from-error flow (step_idx > 0).
  • _advance_pipeline now routes through PipelineOrchestrator.approve(): approval promotes artifacts at the kernel level and drives the next step with accumulated context. Falls back to direct execution if no orchestrator state is found on disk (resume/recovery path).
  • Extracted _collect_step_artifacts(run_id, output_root): eliminates duplicated artifact + QA-score collection that was copy-pasted in _execute_run, _execute_pipeline_step, and _advance_pipeline.
  • on_progress / kernel_progress_callback forwarded: agent_start, agent_done, and stage_update SSE events are emitted through the orchestrator's callback hooks, preserving real-time stream behavior.
  • Closes Issue #19.

Added

  • _execute_pipeline_step_direct: extracted legacy direct-agent path for resume; keeps the recovery flow working without requiring orchestrator state on disk.

Test changes

  • test_execute_pipeline_step_dispatches_non_founder_agent: updated to mock PipelineOrchestrator.run instead of DesignAgent.run. Verifies step["run_id"] and step["status"] == "awaiting_approval" via the orchestrator code path.
  • test_execute_pipeline_step_emits_progress_events: updated to verify kernel_progress_callback is wired through orch.run(), not agent.run() directly.

Test metrics (v0.8.7)

  • Backend tests: 129 passing (same test count as v0.8.6; 6 ollama tests deselected in non-E2E run)
  • execution.py coverage: 62%
  • Repo-wide coverage: 65% (floor 58%)

AgentSuiteLocal v0.8.6

05 May 05:18
d5911dd

Sprint 2 close-out — regression-guard tests for progress_callback wire-up; fix step key collision in pipeline SSE events.

Added

  • test_execute_run_emits_progress_events: regression guard that turns red if progress_callback=progress_callback is removed from agent.run() in _execute_run. Uses side_effect to invoke the callback synchronously from the executor thread; await asyncio.sleep(0) flushes the call_soon_threadsafe queue before asserting on run["events"].
  • test_execute_pipeline_step_emits_progress_events: same guard for the _execute_pipeline_step path. Asserts ≥1 stage_update event in pipeline["events"] and that step carries the pipeline step index (not the AgentSuite internal stage step).
  • Issue #19 filed: migrate _execute_pipeline_step to PipelineOrchestrator to enable K1 cross-stage context accumulation (currently bypassed; each pipeline step runs as an isolated single-agent call).

Fixed

  • step key collision in pipeline progress_callback: stage_progress events emitted by BaseAgent.run() include a "step" field (intra-stage step counter). Forwarding the full dict while also passing step=step_idx to _emit_pipeline caused TypeError: got multiple values for keyword argument 'step' at runtime, silently swallowing all pipeline stage_update events. Now strips "step" from the forwarded payload; step=step_idx (pipeline step index) is authoritative.
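The collision described above is easy to reproduce in isolation. In this sketch _emit_pipeline is a hypothetical stand-in with a keyword-only step parameter (the real signature is not shown in these notes), and the stage name is made up for the example.

```python
from typing import Any


def _emit_pipeline(event_type: str, *, step: int, **payload: Any) -> dict[str, Any]:
    """Hypothetical emitter: returns the event it would have sent."""
    return {"type": event_type, "step": step, **payload}


def forward_buggy(evt: dict[str, Any], step_idx: int) -> dict[str, Any]:
    # Raises TypeError when evt already carries a "step" key, because the
    # keyword is then supplied twice.
    return _emit_pipeline("stage_update", step=step_idx, **evt)


def forward_fixed(evt: dict[str, Any], step_idx: int) -> dict[str, Any]:
    # Drop the intra-stage "step" so the pipeline step index is authoritative.
    payload = {k: v for k, v in evt.items() if k != "step"}
    return _emit_pipeline("stage_update", step=step_idx, **payload)
```

If the buggy call sits inside a callback whose exceptions are caught and discarded, every event is lost without any visible error, which matches the silent-swallowing symptom in the entry.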

Test metrics (v0.8.6)

  • Backend tests: 135 passing (was 127 in v0.8.5; +8 net including 2 new progress-event guards)
  • execution.py coverage: 72% (was 70% against v0.8.5 tests; +2pp, covers progress_callback closure bodies)
  • Repo-wide coverage: 67% (floor 60%)

AgentSuiteLocal v0.8.5

04 May 23:28
3a18e13

Sprint 2 — wire AgentSuite v1.1.0 intra-stage progress events to SSE stream.

Changed

  • AgentSuite pin bumped @v1.0.11 → @v1.1.0: brings in K1 cross-stage context accumulator and K2 intra-stage progress callbacks (BaseAgent.run(progress_callback=...) + PipelineOrchestrator(kernel_progress_callback=...)).
  • Real progress_callback wired in _execute_run: replaces no-op stubs. Uses loop.call_soon_threadsafe to safely push stage_update SSE events from the thread-pool executor thread to the asyncio event loop. The frontend LiveRunView.jsx already handles stage_update events.
  • Real progress_callback wired in _execute_pipeline_step: same pattern; events arrive as stage_update with an additional step field carrying the pipeline step index.
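The thread-to-loop handoff described in these entries can be sketched as below. blocking_agent_run stands in for a blocking BaseAgent.run call, the stage names are invented, and a plain list stands in for the SSE stream; only loop.call_soon_threadsafe and the stage_update event type come from the notes.

```python
import asyncio
from typing import Any, Callable


def blocking_agent_run(progress_callback: Callable[[dict[str, Any]], None]) -> str:
    """Hypothetical blocking agent run that reports progress from a worker thread."""
    for stage in ("intake", "extract", "qa"):
        progress_callback({"stage": stage})
    return "done"


async def execute_run() -> list[dict[str, Any]]:
    loop = asyncio.get_running_loop()
    events: list[dict[str, Any]] = []

    def progress_callback(evt: dict[str, Any]) -> None:
        # Called on the executor thread: schedule the append on the loop
        # thread instead of touching loop-owned state directly.
        loop.call_soon_threadsafe(events.append, {"type": "stage_update", **evt})

    await loop.run_in_executor(None, blocking_agent_run, progress_callback)
    await asyncio.sleep(0)  # give any still-queued callbacks a turn
    return events
```

asyncio.run(execute_run()) returns three stage_update events in the order the worker thread reported them.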

Fixed

  • Closes Issue #10 — intra-stage SSE events were blocked on AgentSuite v1.1.0 shipping PipelineOrchestrator. That work is now tagged and the no-op stubs are removed.

AgentSuiteLocal v0.8.4

04 May 21:55
f8ca1bc

Fixed

  • softprops/action-gh-release node24 migration: release.yml was pinned to 3bb12739 (v2, node20). Updated to b4309332 (v3.0.0, node24). This completes the Sprint 0 node24 migration — all five actions/* pins were already on node24; this was the one missed action.
  • SHA-pins comment block: updated to @v3 notation and added (node24) annotation to every entry so future audits can verify compatibility without an API call.

Note: Sprint 0 was declared complete after migrating actions/checkout, actions/setup-python, actions/setup-node, actions/upload-artifact, and actions/download-artifact. softprops/action-gh-release was listed in the same comment block but its pinned SHA was not checked for node20/24 status. The deprecation warning appeared on the v0.8.3 release run. Node.js 20 forced-default deadline: 2026-06-02. Node.js 20 hard-removal from runners: 2026-09-16.

AgentSuiteLocal v0.8.3

04 May 18:55
8ebf540

Added

  • tests/test_launcher.py — two tests: test_primes_enabled_agents_env and test_does_not_override_operator_env. Regression guard: removing the setdefault from launcher.main() causes these to fail immediately.
  • TestEnabledAgents class in tests/test_cli.py — same two assertions for cli.main().
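The two behaviours those tests pin down (prime the variable when unset, never override an operator's value) follow directly from setdefault semantics. In this sketch the variable name AGENTSUITE_ENABLED_AGENTS and every agent slug other than trust_risk are assumptions; neither is spelled out in these notes.

```python
import os
from typing import MutableMapping

ENV_VAR = "AGENTSUITE_ENABLED_AGENTS"  # hypothetical variable name
# Slugs other than trust_risk are assumed for illustration.
ALL_AGENTS = "founder,design,product,engineering,marketing,trust_risk,cio"


def prime_enabled_agents(environ: MutableMapping[str, str] = os.environ) -> None:
    """Set the default agent list only when the operator has not set one."""
    environ.setdefault(ENV_VAR, ALL_AGENTS)
```

Passing a plain dict instead of os.environ keeps a test from leaking state into the process environment, which is the same isolation the monkeypatch-based fixture provides.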

Changed

  • pyproject.toml: replaced static version = "0.8.2" with dynamic = ["version"] + [tool.setuptools.dynamic] pointing to agentsuitelocal.__version__.__version__. __version__.py is now the single source of truth; pyproject.toml has no independent version string that can drift.
  • tests/test_execution.py: removed inline os.environ.setdefault(...) calls from test_execute_run_dispatches_non_founder_agent and test_execute_pipeline_step_dispatches_non_founder_agent. Replaced with _all_agents_enabled pytest fixture (uses monkeypatch). Failure signals are now clean: entry-point tests fail for entry-point regressions; execution tests fail for execution regressions.
  • README.md: bumped version header to v0.8.2, updated installer filename references, updated /api/version example, updated data-flow description (PipelineOrchestrator → BaseAgent.run()), updated Known Issues header, removed stale v0.1.2 commit-SHA bullet, added Recent releases table.

AgentSuiteLocal v0.8.2

04 May 18:03
c098d1b

Fixed

  • Version metadata: pyproject.toml and agentsuitelocal/__version__.py bumped to 0.8.2. v0.8.0 and v0.8.1 shipped with version 0.7.1 in package metadata — pip show and /api/version reported the wrong value. Fixed going forward; see note below.
  • CI version gate: release.yml verify-ci job now checks that the package version in __version__.py matches the git tag before any build starts. Prevents this class of drift from recurring.
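The comparison at the heart of such a gate is small. A sketch follows, assuming tags take the form vX.Y.Z (optionally with a refs/tags/ prefix, as in a CI ref); the function name is hypothetical and the actual workflow step may be written differently.

```python
def version_matches_tag(package_version: str, git_tag: str) -> bool:
    """True when the tag, minus any refs/tags/ prefix and leading v, equals the version."""
    tag = git_tag.removeprefix("refs/tags/").removeprefix("v")
    return tag == package_version
```

Under this check, metadata of 0.7.1 against a v0.8.0 tag fails before any build starts, which is the drift described in the note below.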

Note: v0.8.0 and v0.8.1 wheels report version 0.7.1 from pip show due to the metadata bump being missed in those releases. The git tags and release assets are unaffected. v0.8.2 fixes this and adds CI enforcement to prevent recurrence.

AgentSuiteLocal v0.8.1

04 May 17:13
1c21203

v0.8.1 — enable all 7 agents at launch, footgun fix