v2.3.0 — GAN-harness design-loop integration (E1–E5)

@lasswellt released this 31 May 18:36
· 46 commits to main since this release

[2.3.0] — 2026-05-31 · GAN-harness design-loop integration (E1–E5)

Closes the five deltas between blitz's design loop and the planner/generator/evaluator harness in anthropic.com/engineering/harness-design-long-running-apps. Blitz already had the architecture (sprint-plan → ui-build/sprint-dev → design-critic/critic); these are the deltas, not a rebuild. Specs: docs/integrations/harness-design/.

Added

  • skills/_shared/design-criteria.md — single-source 5-dimension design rubric, shared by the generator (steering) and evaluator (scoring). The criteria themselves steer the model off generic defaults before any evaluator cycle.
  • E1 criteria-as-steering: ui-build Phase 3.0.1.1 carries the 5 dims ("museum quality") into generation, not just into the evaluator. Tone-conditional phrasing for informal tones.
  • E2 live-navigating evaluator: agents/design-critic.md granted the Playwright navigation subset and navigates the live page before scoring (click primaries, exercise states, resize for responsive, read console). New coverage_boundary reply field; static-screenshot path retained as fallback (never silently passes interaction dims). maxTurns 15→30. browser_run_code_unsafe/browser_evaluate deliberately NOT granted (threat-model §5 posture).
  • E3 iterate + pivot: ui-build Phase 5.4.2 flat-3 cap replaced with ceiling = min(10, budget); refine-vs-pivot strategic decision after each evaluation (pivot space = the 13-tone menu).
  • E4 sprint-contract negotiation (sprint-dev Phase 0.6): generator↔evaluator negotiate testable acceptance before code; persisted as co-owned scope.acceptance. Registered in state-handoff.md.
  • E5 capability-relative trigger: ui-build standard tier evaluates only on edge-of-capability signals (novel aesthetic / interaction complexity / low generator confidence / deterministic-lane hits); high always evaluates. Re-examine per model release; cites the v1.16.0/cohesion/det-20 detector re-justification precedent.
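
As an illustration only, the E3 ceiling and the E5 trigger can be sketched in a few lines of Python. This is not blitz's implementation: the names `iteration_ceiling`, `EdgeSignals`, and `should_evaluate` are hypothetical, and `budget` is assumed to be an integer count of evaluation cycles, which the notes do not specify. The refine-vs-pivot rule is left out because the notes do not describe how that decision is made.

```python
from dataclasses import dataclass

# Upper bound taken from the release note: ceiling = min(10, budget).
MAX_ITERATIONS = 10


def iteration_ceiling(budget: int) -> int:
    """Return the iteration cap (E3). `budget` is assumed to count cycles."""
    return min(MAX_ITERATIONS, budget)


@dataclass
class EdgeSignals:
    """The four edge-of-capability signals listed under E5 (hypothetical type)."""
    novel_aesthetic: bool = False
    interaction_complexity: bool = False
    low_generator_confidence: bool = False
    deterministic_lane_hits: bool = False

    def any_set(self) -> bool:
        return any((
            self.novel_aesthetic,
            self.interaction_complexity,
            self.low_generator_confidence,
            self.deterministic_lane_hits,
        ))


def should_evaluate(tier: str, signals: EdgeSignals) -> bool:
    """Decide whether to run the evaluator (E5)."""
    if tier == "high":
        return True  # high always evaluates
    if tier == "standard":
        return signals.any_set()  # standard evaluates only on edge signals
    # The notes name only these two tiers, so anything else is rejected
    # rather than guessed at.
    raise ValueError(f"unknown tier: {tier!r}")
```

Under these assumptions, a budget of 3 caps the loop at 3 iterations and a budget of 25 caps it at 10; a standard-tier build with no signals set skips evaluation, while any single signal triggers it.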

Changed

  • agents/design-critic.md — "read screenshots, not source" → "read the rendered app, not the source" (input surface expands to live DOM; the source prohibition stands).