AgentSuiteLocal v1.0.0

Latest

scottconverse released this 08 May 20:59

0509f34

Fixed

PDF export replaced WeasyPrint with reportlab. WeasyPrint required the GTK3 native runtime (libgobject, libpango, libcairo), which is not bundled and must be installed separately on Windows. As a result, PDF export silently failed with a 501 on most fresh installs. reportlab is pure Python and needs no native runtime, so PDF export now works out of the box on all platforms. The PyInstaller spec hiddenimports block is updated accordingly; the ~2 MB of GTK-dependent weasyprint/cairocffi/tinycss2 packages are replaced by reportlab.
approve_run state guard corrected. The previous check, not in ("waiting", "done"), let runs in the "done" state through. Individual runs never reach "done" (only pipeline steps do), so the guard was wrong in principle. Changed to != "waiting" for symmetry with reject_run.
ApprovalGateView Approve button tooltip. The tooltip now shows a distinct message when the button is disabled by a missing or failed QA score ("QA score unavailable — run QA evaluation or use Override & Approve") versus a below-threshold score ("Score X/10 is below your Y gate"). Previously it always showed the score message, which rendered null/10 when qa_score was absent.
qa_score was silently None on every real-LLM run (V4). agentsuitelocal/api/execution.py (both call sites at L358-363 and L449-454) read the per-run qa_scores.json looking for fields named weighted_score / overall_score / score / overall, none of which are in agentsuite's QAReport schema. The canonical field is average (agentsuite/kernel/qa.py:21). Result: qa_score was None on every successful real run, a failure masked by the test's xfail strict=False marker until A3 removed it. Added average to the field-lookup chain at both sites (kept legacy field names as forward-compat fallbacks). Added tests/test_qa_score_schema_contract.py (4 contract tests) so the field-name agreement with agentsuite is now enforced by the suite.
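
The V4 field-lookup fix above can be sketched roughly as follows. This is a minimal illustration, not the code in agentsuitelocal/api/execution.py: the names SCORE_FIELDS, extract_qa_score, and read_qa_score are hypothetical, and the lookup order (canonical average first, legacy names after) is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Optional

# "average" is the canonical QAReport field per the notes above; the others are
# the legacy names kept as fallbacks. The ordering is an assumption.
SCORE_FIELDS = ("average", "weighted_score", "overall_score", "score", "overall")


def extract_qa_score(report: dict[str, Any]) -> Optional[float]:
    """Return the first numeric score found in the report, or None."""
    for name in SCORE_FIELDS:
        value = report.get(name)
        # Type-check instead of truthiness-check so a legitimate 0.0 survives.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return None


def read_qa_score(path: Path) -> Optional[float]:
    """Load a qa_scores.json file and extract the score; None if unreadable."""
    try:
        report = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    return extract_qa_score(report) if isinstance(report, dict) else None
```

Checking the value's type rather than its truthiness matters here: a score of 0.0 is falsy in Python, and the contract tests described in these notes explicitly require that 0.0 is preserved.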

Added (Sprint A — v0.9 milestone)

agentsuite repinned to v1.1.1 (V1 + V2 closed at the source).
tests/test_real_founder_run.py xfail removed (A3). The test now hard-asserts that a real founder run produces approve-able artifacts end-to-end. After the V4 fix this test is the active gate for qa_score.
tests/test_qa_score_schema_contract.py (NEW, 4 tests). Schema contract guard against the V4 regression — asserts agentsuite's QAReport.average field exists, round-trips JSON, preserves 0.0, and uses the dimensions/scores shape AgentSuiteLocal reads.
docs/MOCKING_AUDIT.md (A4). Classification-only audit of all 48 real mock call sites in tests/. 23 BOUNDARY-OK, 16 INTERNAL-JUSTIFIED, 9 INTERNAL-SUSPECT-REFACTOR (_save_state / _log_telemetry / _send_notification / _load_settings should become DI in Sprint B), 0 INTERNAL-SUSPECT-DELETE. Sprint B will action the recommendations.
One-run-per-session limitation declared (A5). README Known issues, docs/user-manual.md FAQ, and docs/FAQ.md "Running agents" all state v1.0 supports one active run at a time per session; concurrent runs land in v1.1.
a11y Bar 1, code-only (A6). Sidebar (top + bottom nav) sets aria-current="page" on the active item. ApprovalGateView override dialog now has role="dialog" + aria-modal="true" + aria-label, and a window keydown effect closes it on Escape. Vitest tests cover all of the above (Sidebar.test.jsx 4 tests, styles.test.js 2 tests, ApprovalGateView.test.jsx 2 new tests).
Bundle smoke CI on macOS + Windows (A7). build-macos job appends a step that launches the .app, polls ~/.agentsuitelocal/launcher.port.json for ≤30s, GETs /api/health, and verifies clean exit. New build-windows job mirrors this for windows-latest. Catches v0.8.7-class regressions where the bundle ships missing a hidden import. Both jobs gate on main || tags || release/*.
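
The poll-then-health-check sequence in the bundle smoke step (A7) might look something like the sketch below. It is not the actual CI script: the function names are hypothetical, and the "port" key inside launcher.port.json is an assumption about that file's layout.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Callable, Optional

PORT_FILE = Path.home() / ".agentsuitelocal" / "launcher.port.json"


def wait_for_port(port_file: Path = PORT_FILE, timeout: float = 30.0,
                  interval: float = 0.5,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Poll the port file until it holds a port or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        try:
            data = json.loads(port_file.read_text(encoding="utf-8"))
            # The "port" key is an assumed layout, not confirmed by the notes.
            port = data.get("port") if isinstance(data, dict) else None
            if isinstance(port, int):
                return port
        except (OSError, json.JSONDecodeError):
            pass  # File missing or half-written; try again.
        if clock() >= deadline:
            return None
        sleep(interval)


def health_ok(port: int, timeout: float = 5.0) -> bool:
    """GET /api/health on localhost and report whether it returned 200."""
    url = f"http://127.0.0.1:{port}/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses.
        return False
```

A smoke step built on this would launch the bundle, call wait_for_port(), fail the job if it returns None or if health_ok() is false, and then check that the process exits cleanly.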

Changed (Sprint A — v0.9 milestone)

Removed dead RunRequest.constraints field (A1, D1). Field was unused everywhere; deleted from agentsuitelocal/api/schemas.py. Wire-compat preserved (Pydantic v2 default extra="ignore" accepts old clients sending constraints).
E2E "Run failed within 3s" assertion restored (A2). The assertion had been commented out. Restoring it exposed a mock-provider prose-vs-extract failure, in which the mock returns prose where the extract stage expects JSON. This is classified as evidence for A4 (mocking audit), not a regression, because production correctly rejects non-JSON at the extract stage.
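
The wire-compat claim for the removed constraints field (A1) can be illustrated with a small sketch, assuming Pydantic v2 is installed. The agent and prompt fields below are hypothetical stand-ins; the real RunRequest schema is not shown in these notes.

```python
from pydantic import BaseModel


class RunRequest(BaseModel):
    # Hypothetical remaining fields; only the removal of "constraints" is
    # taken from the release notes.
    agent: str
    prompt: str


# Payload from an older client that still sends the removed field.
legacy_payload = {
    "agent": "founder",
    "prompt": "Draft a plan",
    "constraints": {"max_steps": 3},
}

# With the default extra="ignore", the unknown key is dropped, not rejected.
request = RunRequest.model_validate(legacy_payload)
print(request.model_dump())  # {'agent': 'founder', 'prompt': 'Draft a plan'}
```

If the model were instead configured with extra="forbid", the same payload would raise a validation error, so the compatibility guarantee depends on leaving the default in place.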
