feat(v4): cursor-driven engine refactor + UX hardening #323
Skip archetype generation when the designer already has finished designs. The picker now opens with no archetypes: the user either uploads files (`mode: "upload"`, written to `stages/<stage>/artifacts/design-direction/uploads/` and surfaced via a new `design_direction_uploaded` action) or signals the agent to generate variants (`mode: "generate"`). Generation only happens after the explicit ask, so we don't burn tokens producing variants the user is going to throw away.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the pre-v4 phase/status FSM with a cursor that walks per-unit
frontmatter (`iterations[]` / `reviews{}` / `approvals{}`) instead of
per-stage `state.json`. Adds 9 numbered priorities (P1–P9) on top of
the v4 base, each scoped to a single class of bug or UX pain surfaced
in real session logs.
Engine
- v0→v4 soft-scrub migrator (BFS-graph migrate-registry). Strips
deprecated FM, stamps `plugin_version`, synthesizes terminal
approvals for v3-completed units, relocates v3 FBs whose
`upstream_stage` pointed elsewhere, preserves `replies[]` (caught
silent data-loss bug), tolerates malformed YAML per-file.
- New `field-hygiene.ts` cruft detector for post-migration audit.
- Cursor walks 26 lifecycle scenarios (was 13). Adds:
• design_direction_required hard gate (studio/stage-conditional)
• clarify_required gate at elaborate entry, sourced from per-stage
`clarify/*.md`
• discovery_required (finished the cursor.ts:419 stub via
readStageArtifactDefs + per-unit `fm.discovery.<agent>` records)
• classifier-first dispatch on unclassified user feedback
• merge_stage transition, cross-stage FB priority, mid-wave noop,
closed-FB invalidation re-route, reject_hat re-entry
- New `haiku_feedback_set_targets` MCP tool. Classifier hat lands the
user-FB triage decision (target_unit, target_invalidates,
reasoning) once and only once — immutable post-classification.
- Reply-on-closure: terminal `haiku_feedback_advance_hat` requires a
`reply` string; surfaces in SPA as a "Resolved" card with dismiss.
Filter chip for unread replies in the FB summary bar.
- Resolution dropdown removed from SPA. Agent classifies, not user.
- File-backed dispatch is now the DEFAULT for every haiku_run_next
tick (P1). Skip set is just noop/sealed/error. Cuts the 47K-char
tool-result blobs that ate context.
- Self-contained dispatch prompts (P2): `start_feedback_hat` emits
one per-FB subagent block with the canonical FB ID inlined into
every tool call. No more `<FB-NN>` placeholders. Kills the 6-retry
ID-guessing loop pattern.
- Lock-aware branch enforcement (P9): `ensureOnStageBranch` refuses
to checkout under a locked worktree. Closes the hijack class that
caused the 2026-05-06 incident where this very worktree got
switched to a different intent's branch.
- Edit-auto-read-hint hook (P6): PostToolUse hook on Edit/MultiEdit
detects "file not read yet" and surfaces an actionable Read+retry
message with the file path inlined.
Studios
- Classifier hat distributed to all 23 non-software studios
(114 stages, 114 fix_hats updated). software/* stages already had
it. fix_hats sequence: [classifier, <implementer>, feedback-assessor].
- design-stage opted in to `requires_design_direction: true`.
- Discovery template uniqueness validation (P8): studio-loader
refuses to load when two discovery templates within a stage share
the same `location:` field.
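A minimal sketch of the P8 uniqueness check, assuming each discovery template exposes a `location` string. The `DiscoveryTemplate` shape, its `name` field, and `findDuplicateLocations` are hypothetical names for illustration, not the studio-loader's actual API.

```typescript
// Hypothetical template shape; only `location` is named in the commit message.
interface DiscoveryTemplate {
  name: string; // assumed field, used here only to report which templates collide
  location: string;
}

// Returns every location claimed by more than one template, with the claimants.
function findDuplicateLocations(
  templates: DiscoveryTemplate[],
): Map<string, string[]> {
  const byLocation = new Map<string, string[]>();
  for (const template of templates) {
    const names = byLocation.get(template.location) ?? [];
    names.push(template.name);
    byLocation.set(template.location, names);
  }
  const duplicates = new Map<string, string[]>();
  byLocation.forEach((names, location) => {
    if (names.length > 1) duplicates.set(location, names);
  });
  return duplicates;
}
```

A loader built this way could refuse to load whenever the returned map is non-empty and list the colliding template names in the error.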
SPA
- Dual-pathing for v3↔v4 status / phase / completion across 9 files.
- New helpers: `isIntentTerminal` (sealed_at + v3 fallback),
`deriveUnitStatus` (iterations[]-based with v3 status fallback),
`resolveWalkthroughForDetail` (tab-scoped fallback for off-tab
browsing — fixes Chris Downard's advance/back UX bug).
- Schema indicator chip (v4 / v3) on IntentReview.
- Migrated breadcrumb banner when intent or unit approvals carry
`migrated: true`.
- await_gate UX: 30min → 4h timeout, isError: false on timeout
(continuation, not fault), per-session `announced_at` so retries
don't re-post the URL.
Website (browse)
- Dual-paths for v4 in `parseIntentFromRaw`, all three providers
(local, github, gitlab), search index inherits derivation via
`intent.status`.
- New helpers: `deriveStageStatusFromUnits`, `deriveV4ActiveStage`.
- Schema indicator chip on IntentDetailView header.
- 20 new derivation/integration tests via `npx tsx`.
Tests
- Engine: 1244/1244 (was ~1130 at session start). 73 → 74 test files.
- SPA: 453/453 (was 441).
- New test files (this session): cursor-walk (26 scenarios),
v0-to-v4-realistic-scenario (11), v0-to-v4-migrator (4),
feedback-set-targets (7), haiku-reflect-dual-path (8),
field-hygiene (4), skill-tools-smoke (14), announcement-contract (7),
await-gate-timeout-classifier (9), await-gate-announce-dedup (4),
worktree-lock-guard (4), closure-reply-filter (6), and SPA-side
walkthrough/review-helpers/UnitsTable derivation tests.
- run-all.mjs: silent-test-loss guard + node:test output parsing
(caught a hidden-coverage bug where 12 tests were invisible).
Skill audit
- Refresh + dual-path on `haiku_dashboard`, `haiku_capacity`,
`haiku_reflect`. Smoke tests for previously-untested
`haiku_intent_archive/unarchive`, `haiku_backlog`, `haiku_seed`,
`haiku_release_notes`, `haiku_review`, `haiku_version_info`.
- 4 skill docs updated (refine, revisit, adopt, repair) for v4 wording.
- repair: documented split-purpose under v4 (drift-baseline rebuilds
+ worktree relocation are still relevant; metadata-cleanup paths
are obsolete and no-op on v4 intents).
Known gaps (deferred):
- No real-world `.haiku/intents/` fixture run-through. All migrator
+ cursor fixtures are synthesized.
- No end-to-end "create → run → seal" pipeline test.
- Drift sweep cursor-emit path not scenario-tested (the sweep itself
has unit tests).
- intent_completion_review / intent_completion_fix cursor scenarios
are stubbed.
- P5 channels / Monitor pattern researched, not built. Channels
(https://code.claude.com/docs/en/channels.md) is the recommended
path for SPA-decision push when we get to it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @jwaldrip's task in 15m 3s
PR Review: feat(v4): cursor-driven engine refactor + UX hardening
This is a large, architecturally significant PR. The cursor model is cleaner than the handler FSM it replaces. Several issues found range from a guaranteed post-ship breakage to data-loss-class migration bugs. Listed in priority order.
CRITICAL
1. Version bump after ship permanently blocks all v4 intents
File:
    if (sourceVersion !== target) {               // exact string comparison
      migrateIntent(ctx, sourceVersion, target)   // throws if no edge
The registry (
Fix: Change line 71 to compare majors only, not full semver:
    const sourceMajor = Number(sourceVersion.split(".")[0] ?? "0")
    if (sourceMajor !== targetMajor) {
2.
| # | Severity | File | Description |
|---|---|---|---|
| 1 | Critical | run-tick.ts:71 | Exact semver comparison → all v4 intents break on next CI bump |
| 2 | Critical | v0-to-v4.ts, cursor.ts:287 | "rejected" not normalized → migrated units stuck as in-progress |
| 3 | Critical | state-tools.ts:62 | withIntentMainLock imported but never called; stage merges unserialised |
| 4 | High | cursor.ts:425, prompts/index.ts | close_feedback has no prompt builder → "Unknown Action" dump |
| 5 | High | locks.ts:61 | isAlive catches EPERM as dead → premature lock-stealing on Windows |
| 6 | High | cursor.ts:737–759 | Stage[0] fallback when all merged blocks intent-level approvals |
| 7 | Medium | cursor.ts:363–365 | Unknown stage → walks entire stage list for FBs |
| 8 | Medium | drift-sweep.ts | Repo-relative output paths resolved against wrong base |
| 9 | Medium | cursor.ts:438–441 | Non-numeric FB filenames produce unresolvable dispatch IDs |
| 10 | Minor | run-tick.ts:87 | Migration error returns misleading track: "drift" |
| 11 | Minor | v0-to-v4.ts:243 | No-op started_at double-write |
Issues 1, 2, and 3 should be resolved before merging. Issue 1 in particular is a ticking clock — it activates on the first CI bump after this lands.
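The major-only comparison proposed for issue 1 can be sketched as a standalone helper. `majorOf` and `needsMigration` are hypothetical names; only the `split(".")[0]` comparison itself comes from the review, and the fallback to `0` for malformed input is an assumption.

```typescript
// Hypothetical helpers illustrating the suggested fix; not the code in run-tick.ts.
function majorOf(version: string): number {
  // "4.0.1" -> 4. Empty or malformed input falls back to 0 (assumed behaviour).
  const parsed = Number(version.split(".")[0] ?? "0");
  return Number.isFinite(parsed) ? parsed : 0;
}

// A migration is needed only when the major version changes, so a patch or
// minor bump (4.0.0 -> 4.0.1) never asks the registry for a missing edge.
function needsMigration(sourceVersion: string, targetVersion: string): boolean {
  return majorOf(sourceVersion) !== majorOf(targetVersion);
}
```

With this shape, an intent stamped `4.0.0` passes straight through after a bump to `4.0.1`, while a `3.x` intent still routes to the migrator.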
Checkpoint of in-flight v4 engine work before merging the design-direction intake/upload PR (#322).
What's in this checkpoint:
- Cursor-driven engine refactor: derivePosition + Track A/B/C walks, firstUnmergedStage with strict-ahead semantic, body-sha256 witness model replacing baseline.json, drift-sweep dedup by source_ref.
- Filesystem-mode persistence first-class: stages_merged in intent.md as canonical merge signal when no git repo.
- All MCP elicitation removed; studio/mode/stage selection routes through SPA picker (createPickerSession + runPicker helper).
- Tick-driven select_*: run-tick.ts emits select_studio / select_mode / select_stage when fields missing; haiku_run_next intercepts and runs the picker inline so the agent sees a blocking tick, never "Call haiku_select_*" instructions.
- close_feedback / merge_stage / merge_intent auto-execute in haiku_run_next instead of bouncing back to the agent.
- E2E suites: software studio, multi-mode, feedback mid-flight, drift mid-flight, filesystem mode, squash-merge fallback, picker wire round-trip, SPA wire round-trip.
Next up (not yet done):
- Consolidate URL+await pairs (gate_review, visual question, design direction) into single blocking tool calls so the agent never sees "post URL + call await_*" two-step instructions.
- Pull in PR #322 (intake-first design picker + upload mode) and align the new wire shapes with the engine-side blocking pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in the intake-first design picker (PR #322) and aligns the new wire shapes with the v4 cursor-driven engine.
What carried over verbatim:
- haiku-api schemas: DirectionUploadFile + DirectionUploadModeSchema + DirectionGenerateModeSchema discriminator branches.
- haiku-ui DirectionPage split into IntakePage + ArchetypePage with drag/drop, per-file caption, generate fallback.
- HTTP /direction/:id/select route handles upload mode via persistDesignDirectionUploads (decodes data URLs into stages/<stage>/artifacts/design-direction/uploads/).
- Studio docs (ARCHITECTURE.md, software/design ELABORATION.md) + ModalRouter blurb describe intake-first.
- New prompts: design_direction_required (rewritten for intake-first) + design_direction_uploaded.
What had to be ported (PR #322 modified handlers/elaborate.ts; v4 deleted that file because the cursor itself now emits dispatch actions):
- cursor.ts: design-direction gate is now two-phase. After selection, the cursor's surface-once branch emits ONE `design_direction_complete` (archetype mode) or `design_direction_uploaded` (upload mode) action so the agent can Read screenshots / uploaded files before elaborate. Surfaced state tracked by `surfaced_at` on the same intent.md record.
- haiku_run_next.ts: stamps `surfaced_at` on intent.md right before returning the surface-once action so the next tick falls through to elaborate (mirrors the PR's design_direction_surfaced flag, but on intent.md FM instead of stage state.json).
- state-tools.ts persistDesignDirectionUploads: also stamps `design_directions[<stage>] = { mode: "upload", uploads, … }` on intent.md so the v4 cursor's gate check sees the selection (state.json is no longer authoritative in v4).
- state-tools.ts persistDesignDirectionSelection (archetype mode): the intent.md stamp now carries `mode: "archetype"`, comments, annotations, and `at` so the cursor can render the full payload on surface-once.
Tests:
- cursor-walk.test.mjs: existing fixtures stamping design_directions[stage] now also stamp `surfaced_at` so the walk tests assert on downstream gates (clarify, discovery) instead of the new surface-once branch.
- New cursor-walk cases: archetype-mode without surfaced_at → design_direction_complete; upload-mode without surfaced_at → design_direction_uploaded.
- haiku-api test suite: 182 passed.
- haiku-ui test suite: 456 passed.
- haiku focused tests (sad-paths + cursor-walk + multi-tick + picker-wire-round-trip + select-mode-stage-constraints + select-studio-prompt): 50 passed.
Still open (queued for the URL+await consolidation pass):
- pick_design_direction → haiku_await_design_direction is still a two-step (URL + await) pair. Will collapse into a single blocking call alongside gate_review and ask_user_visual_question.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent used to see "post URL + call await tool" pairs for every interactive surface: gate review, visual question, design direction. Each pair was three turns minimum: tool 1 returns URL, agent posts it, agent calls tool 2 which blocks. Half the time the agent forgot step 2 and the user clicked Approve into the void.
Now every interactive surface is a single blocking tool call. The engine creates the session, launches the browser best-effort, and blocks until the user submits. The await tools stay as resume entry points for the case where the original blocking call timed out.
## ask_user_visual_question + haiku_await_visual_answer
- ask_user_visual_question now creates the session, launches the browser, and blocks on awaitVisualAnswerSession inline. Returns the answer + screenshot annotations directly.
- haiku_await_visual_answer becomes a thin resume wrapper around the same helper.
- Extracted awaitVisualAnswerSession() to share the wait + response builder between the two paths.
## pick_design_direction + haiku_await_design_direction
Same shape as visual: pick_design_direction inlines the await; the old await tool is the resume entry. Extracted awaitDesignDirectionSession() that handles all four submission modes (select / regenerate / generate / upload).
## gate_review + haiku_await_gate
haiku_run_next's gate_review block used to return the URL and ask the agent to call haiku_await_gate. Now it:
1. Prepares the session (creates it, writes the resume pointers to intent.md, stamps announced_at on first prepare).
2. Calls haiku_await_gate inline. This is the same code path as the resume entry, so post-decision side effects (stampGateApproval, workflowAdvancePhase/Stage, writeReviewFeedbackFiles, sealIntentState, etc.) all go through the canonical handler.
3. Inspects the returned action name. For advance cases (advance_phase / advance_stage / intent_approved) it re-ticks so the cursor surfaces the natural-next workflow action. For terminal/feedback cases (intent_complete, external_review_requested, changes_requested, revise_unit_specs, stage_revisit) it returns the await response directly.
The gate_review handling is now a `while` loop, so a chain of gates (e.g. final stage approval → intent_review pre-completion gate) can all process in the same blocking tick; the agent sees one tool call that returns the post-everything action.
Added extractActionFromAwaitResponse() helper that parses the action name out of haiku_await_gate's rendered response.
## Tests
- Added design_direction_complete / design_direction_uploaded handlers to the e2e test scaffolding (e2e-software-studio + real-intent-dry-run). The scaffolding mirrors haiku_run_next's surface-once stamp by writing surfaced_at on the design_directions record.
- All 1340 haiku tests, 182 haiku-api tests, and 456 haiku-ui tests pass (1978 total). Type-check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
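The gate_review loop described in this commit can be sketched roughly as follows. This is a simplified, synchronous illustration: `runGateChain`, `tick`, and `awaitGate` are hypothetical stand-ins for the cursor tick and the inline haiku_await_gate call, and only the action names are taken from the commit message.

```typescript
// Action names from the commit message; everything else here is assumed.
const ADVANCE_ACTIONS = new Set(["advance_phase", "advance_stage", "intent_approved"]);

// tick() stands in for one cursor tick and returns the next action name.
// awaitGate() stands in for blocking on the user's gate decision.
function runGateChain(tick: () => string, awaitGate: () => string): string {
  let action = tick();
  while (action === "gate_review") {
    const decision = awaitGate();
    // Terminal/feedback decisions go straight back to the agent.
    if (!ADVANCE_ACTIONS.has(decision)) return decision;
    // Advance decisions re-tick so the cursor surfaces the next workflow action,
    // which may itself be another gate.
    action = tick();
  }
  return action;
}
```

The real handler is asynchronous and returns rendered responses rather than bare action names; the sketch only shows why a chain of gates resolves inside one tool call.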
Picker UI
- New /picker/:sessionId route in haiku-ui rendering studio (card grid + stage-chain preview), mode (cards with mini-timeline showing where the engine pauses), stage (simple list), confirm (two-button decision), and url_input (free-text input for the new external-review URL collection path).
- haiku-api carries PickerSessionPayloadSchema in the SessionPayload discriminated union plus PickerSelectRequest/Response schemas and /picker/:id GET + /picker/:id/select POST routes.
- session-api projects picker fields (kind / title / prompt / options / selection) into the wire response. ApiClient gained submitPicker().
External review URL: engine direction
- haiku_run_next's external_review_requested response now tells the agent to open the change request via gh / glab directly (the agent has CLI access; no need to ask the user). Fallback: ask the user in chat if gh/glab isn't reachable. The url_input picker stays available as a future fallback path.
- FeedbackSidebar gained gateAcceptsLocalApprove(): pure-external gates no longer render Approve (the merge IS the approval signal). Compound [external, ask] still renders both Approve + External buttons so the user can pick either path.
Dead code + migration cleanup
- Deleted prompts/gate_review.ts and its companion test (gate-review-prompt.test.mjs). Under v4 the gate_review action never reaches the agent; haiku_run_next inlines the await + post-decision processing in one tool call.
- Removed the gate_review entry from the prompt registry.
- v0-to-v4 migrator now scrubs pre-v4 baseline noise: deletes baseline.json, drift-markers.json, and baseline-content/ at both intent and per-stage scope. Added a migrator test covering both scopes.
Stale tool descriptions reframed
- haiku_await_gate, haiku_await_visual_answer, and haiku_await_design_direction descriptions now read as resume entry points, not "Call this AFTER X" two-step instructions. Canonical flow blocks inside haiku_run_next / ask_user_visual_question / pick_design_direction.
Sync surfaces refreshed (per CLAUDE.md sync rule)
- ArchitectureMap orchestrator-actor + webui-actor blurbs rewritten for engine-side blocking (no more URL+await two-step language). The actor talks_to lists now mention the SPA picker and engine-side gate review explicitly.
- plugin/studios/ARCHITECTURE.md action table: select_studio / select_mode / select_stage / gate_review entries describe engine-side blocking, not "Call haiku_select_*" / "Call haiku_await_gate".
Tests
- haiku 1336 / 1336 (added migrator baseline-cleanup case)
- haiku-api 182 / 182
- haiku-ui 456 / 456 (test mocks for ApiClient gained submitPicker)
- All type-checks clean
- Stray packages/haiku/.haiku/intents/picker-mockups/ scaffolding removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @jwaldrip's task in 11m 22s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
The new commits (31c6ee1 → 580e77d) close two of the eleven prior issues (Issues #4 and #6), introduce the SPA picker flow and engine-side
Status of previously-flagged issues
NEW issues from the latest commits
N1 (HIGH):
| Issue | Location | What breaks |
|---|---|---|
| #1 | run-tick.ts:82 | All v4 intents fail on the first CI bump after landing (4.0.0 → 4.0.1 → "no migration path" error) |
| #2 | v0-to-v4.ts + cursor.ts:312 | Migrated units with v3-style result:"rejected" treated as in-progress forever |
| #3 | haiku_run_next.ts:531, side-effects.ts:129/333 | Stage-into-main merges are unserialised; concurrent haiku_run_next calls can clobber each other's merge commit |
| N1 | haiku_run_next.ts:392 | Production FB fix loops never close; close_feedback loops infinitely (test coverage only works by accident) |
| N2 | haiku_run_next.ts:302 | Picker timeout causes infinite re-open loop; effectively hangs any session where the user doesn't pick |
Issues #1, #2, #3, N1, and N2 should all be resolved before merging.
Two real-world footguns in the discrete / external-review path that
broke pickup for User B (the "Pedro issue"):
1. Stage PRs landing on the repo default branch instead of
`haiku/<slug>/main`. Agents would run `gh pr create` without
--base, so the PR opened against `main`. Once merged, the cursor's
`firstUnmergedStage` (which checks against intent main, not the
repo default) kept the stage pinned and pickup wedged.
2. No graceful recovery when the misroute had already happened. User
B fetches, sees the stage commits on `main` but not on
`haiku/<slug>/main`, and the engine has no way to reconcile.
## Engine-side stage PR opening
`openStagePullRequest(slug, stage)` in `git-worktree.ts`:
- Pushes `haiku/<slug>/<stage>` to origin.
- Calls `gh pr create --base haiku/<slug>/main` (or
`glab mr create --target-branch haiku/<slug>/main`).
- If gh/glab CLI fails or isn't on PATH, builds a provider-specific
"open MR" URL the user can click:
GitHub → /compare/<base>...<head>?expand=1 (auto-opens create form)
GitLab → /-/merge_requests/new?merge_request[source_branch]=…&[target_branch]=… (pre-fills source + target)
- Returns a structured result with createdUrl / compareUrl / pushed /
errors so callers can surface whichever path succeeded.
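The fallback URL construction in the list above might look like the sketch below. `buildOpenRequestUrl`, `Provider`, and the `repoUrl` parameter are hypothetical; the path shapes (`/compare/<base>...<head>?expand=1` and `/-/merge_requests/new` with `merge_request[source_branch]` / `merge_request[target_branch]`) are the ones named in the commit message. The sketch assumes `repoUrl` is the repository's https web URL without a trailing slash.

```typescript
type Provider = "github" | "gitlab";

// Encode each path segment but keep the slashes that branch names contain.
function encodeBranchPath(branch: string): string {
  return branch.split("/").map(encodeURIComponent).join("/");
}

// Hypothetical helper: builds the click-to-open URL used when gh/glab is unavailable.
function buildOpenRequestUrl(
  provider: Provider,
  repoUrl: string,
  base: string,
  head: string,
): string {
  if (provider === "github") {
    // expand=1 opens the "create pull request" form directly.
    return `${repoUrl}/compare/${encodeBranchPath(base)}...${encodeBranchPath(head)}?expand=1`;
  }
  const params = new URLSearchParams({
    "merge_request[source_branch]": head,
    "merge_request[target_branch]": base,
  });
  return `${repoUrl}/-/merge_requests/new?${params.toString()}`;
}
```

Passing `haiku/<slug>/main` as `base` in both branches is what keeps the request off the repository default branch.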
Wired into `haiku_await_gate.ts`'s external_review handling: when the
user clicks External Review at the gate, the engine opens the MR
itself with the right base, persists `external_review_url` on
intent.md so downstream pickups see it, and surfaces the URL to the
agent. The agent's instruction set no longer relies on the agent
remembering to pass `--base`.
`haiku_run_next.ts` external_review_requested handling does the same
when the action arrives without a URL already set (the cursor path,
not the gate-await path).
## Misrouted-merge reconciliation
`reconcileMisroutedStageMerges(slug, stages)` in `git-worktree.ts`:
- Walks each stage branch.
- Detects when the branch's commits exist on the repo mainline (local
or `origin/<mainline>`) but not on `haiku/<slug>/main`.
- Fast-forwards intent main to mainline when safe (i.e. intent main
is itself an ancestor of mainline). Pushes the FF to origin so
other clones see the fix.
- Refuses to FF when intent main has divergent commits — surfaces a
structured error with the exact `git checkout && git merge` recipe
the operator needs to run.
Pre-cursor reconciliation step in `haiku_run_next.ts.handle()`: when
the intent has `external_review_url` set and we're in a git repo,
fetch from origin and run reconciliation BEFORE the cursor walks.
That's the "User B opens a fresh checkout, runs /haiku:pickup,
expects the engine to figure it out" path. Cleanly reconciled stages
are invisible to the agent (which is the right UX); divergent ones
surface a hard error with a fix recipe.
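The reconciliation decision can be sketched as a pure function over an ancestry predicate. This is an illustration under assumptions: `planReconciliation` and the `Reconciliation` type are hypothetical, "commits exist on mainline" is simplified to plain ancestry (so squash merges are not covered), and the recipe string is a guess at the shape of the manual fix. `isAncestor(a, b)` is assumed to answer what `git merge-base --is-ancestor a b` reports through its exit code.

```typescript
type Reconciliation =
  | { kind: "none" }
  | { kind: "fast_forward" }
  | { kind: "manual"; recipe: string };

// Hypothetical planner; the real reconcileMisroutedStageMerges also fetches and pushes.
function planReconciliation(
  stageBranch: string,
  intentMain: string,
  mainline: string,
  isAncestor: (ancestor: string, descendant: string) => boolean,
): Reconciliation {
  // Happy path: the stage already landed on intent main.
  if (isAncestor(stageBranch, intentMain)) return { kind: "none" };
  // Stage is not on mainline either, so nothing was misrouted.
  if (!isAncestor(stageBranch, mainline)) return { kind: "none" };
  // Misrouted. A fast-forward is safe only if intent main is behind mainline.
  if (isAncestor(intentMain, mainline)) return { kind: "fast_forward" };
  // Divergent intent main: refuse and hand the operator a recipe (assumed wording).
  return {
    kind: "manual",
    recipe: `git checkout ${intentMain} && git merge ${mainline}`,
  };
}
```

Keeping the decision separate from the git calls makes the three paths in the test list below easy to pin without a real repository.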
## Tests
Added `misrouted-stage-merge.test.mjs` covering all three paths:
- Happy path (merged into intent main → no reconciliation needed).
- Misroute + FF-safe (engine fast-forwards intent main, stage now
appears merged to the cursor).
- Misroute + divergent intent main (engine refuses to FF, surfaces a
manual-fix error message).
All 1339 haiku tests pass across 94 test files. Type-check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @jwaldrip's task in 7m 1s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
The latest commit (
Status of previously-flagged issues
CRITICAL: unchanged, still blocking
1. Exact semver comparison still breaks all v4 intents on CI bump
File:
    if (sourceVersion !== target) {               // ← "4.0.0" !== "4.0.1" → triggers
      migrateIntent(ctx, sourceVersion, target)   // no path → throws
The registry has exactly one edge:
Fix: Change line 82 to compare majors:
    const sourceMajor = Number(sourceVersion.split(".")[0] ?? "0") || 0
    if (sourceMajor !== targetMajor) {
2.
| Issue | Severity | Status | File |
|---|---|---|---|
| #1 | Critical | OPEN | run-tick.ts:82 — semver exact compare → all v4 intents break on next CI bump |
| #2 | Critical | OPEN | v0-to-v4.ts + cursor.ts:312 — "rejected" not normalized → migrated units stuck |
| #3 | Critical | OPEN | haiku_run_next.ts:601 — withIntentMainLock never called |
| N1 | High | OPEN | haiku_run_next.ts:462 — close_feedback never finds files; production infinite loop |
| #5 | High | OPEN | locks.ts:61 — isAlive bare catch steals live locks on Windows |
| N5 | Medium | NEW | git-worktree.ts:732 — restore after checkout not in finally; wrong-branch writes on throw |
| #7 | Medium | OPEN | cursor.ts:495 — cutoff === -1 walks all stages for FBs |
| #8 | Medium | OPEN | drift-sweep.ts:224 — repo-relative outputs joined against intentDir |
| #9 | Medium | OPEN | cursor.ts:568 — raw-basename fallback in parseFbIdFromFilename |
| N3 | Medium | OPEN | clarify_required.ts:56 — writes via blocked path; field not in schema |
| #4 | High | FIXED ✓ | close_feedback prompt builder added |
| #6 | High | FIXED ✓ | Stage[0] fallback removed |
| N2 | High | FIXED ✓ | Picker timeout returns isError: true |
| #10–11 | Minor | OPEN | Cosmetic |
Issues #1, #2, #3, and N1 must be resolved before merging. #1 activates on the first CI bump after this lands. N1 affects every production fix-loop closure.
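For issue #5, a liveness probe that does not treat a permission error as a dead process could look like the sketch below. It is a hypothetical reimplementation, since the actual `isAlive` in locks.ts is not shown in this thread. It relies on documented Node.js behaviour: `process.kill(pid, 0)` sends no signal and only tests for existence, and an `EPERM` error means the process exists but cannot be signalled by the caller.

```typescript
// Hypothetical replacement for a bare-catch liveness check.
function isAlive(pid: number): boolean {
  try {
    // Signal 0 performs the existence/permission check without delivering a signal.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it, so the lock holder is alive.
    // Any other code (typically ESRCH) is treated as "no such process".
    return (err as { code?: string }).code === "EPERM";
  }
}
```

A lock-stealing path guarded by this check would only take over a lock when the holder is confirmed gone, not merely unsignalable.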
…boundary
Matt (designer) tried to upload `Travel_Delivery_Example_Header_Only.html` to the knowledge uploader and got "File type not accepted." That allowlist (.md .pdf .png .jpg .jpeg .svg .txt) rejected real artifacts: designers exporting .html mockups from Sketch / Figma, researchers attaching .docx / .xlsx / .csv notes, anyone pasting a .json data bundle.
The upload-side allowlist + extension blocklist on the knowledge route were defense against V-01 (stored XSS via uploaded HTML rendering in the privileged tunnel origin). That defense moved to serve time when serveFile (http/path-safety.ts) gained the inverted MIME map: anything not in SAFE_INLINE_MIME_TYPES gets `application/octet-stream` + `Content-Disposition: attachment`, so the reviewer's browser downloads unknown types instead of executing them. Once that landed, the upload-side rejection became redundant friction.
What this commit does:
- KnowledgeDropZone DEFAULT_ACCEPT changes from a fixed extension list to "*" (accepts any file). The size cap (10 MB) is the only real limit. The "drop files here" caption renders "any file type · max 10 MB each" in this mode.
- Knowledge upload route in http/upload-routes.ts drops both the BLOCKED_EXTENSIONS check and the ALLOWED_MIMES_KNOWLEDGE allowlist. The route accepts any file; serveFile downgrades dangerous MIMEs at serve time.
- Stage-output uploads keep their existing allowlist: those files are intentionally rendered inline in the SPA review UI and the inverted MIME map alone isn't enough cover.
Tests:
- upload-routes.test.mjs: replaced 4 "knowledge: X rejected with 415" cases with positive "X uploads successfully and lands on disk" assertions covering .html (Sketch export shape), .svg, .js (snippet reference), and octet-stream payloads.
- red-team-unit-01-upload-bypass.test.mjs: R-05 was the V-01 bypass via .js + octet-stream on the knowledge route. Same conversion: upload now succeeds (200) and the test documents that V-01 closure moved to serveFile, with a comment guarding against reintroducing the upload-side guard.
- 1336 / 1336 haiku tests green; UI tests 456 / 456; type-check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
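The serve-time downgrade that replaces the upload-side allowlist can be sketched like this. The contents of `SAFE_INLINE_MIME_TYPES` below are illustrative only (the real set in http/path-safety.ts is not shown in this thread), and `serveHeaders` is a hypothetical helper rather than the actual serveFile signature.

```typescript
// Illustrative entries only; the real SAFE_INLINE_MIME_TYPES is not shown in this thread.
const SAFE_INLINE_MIME_TYPES = new Set([
  "text/plain",
  "image/png",
  "image/jpeg",
  "application/pdf",
]);

// Hypothetical helper: picks response headers for a stored file of the given MIME type.
function serveHeaders(detectedMime: string): Record<string, string> {
  if (SAFE_INLINE_MIME_TYPES.has(detectedMime)) {
    return { "Content-Type": detectedMime };
  }
  // Inverted map: anything not explicitly safe is forced to download,
  // so uploaded HTML or JS never renders in the privileged origin.
  return {
    "Content-Type": "application/octet-stream",
    "Content-Disposition": "attachment",
  };
}
```

Because the default branch is the restrictive one, a new or unknown MIME type is downloaded rather than rendered unless someone adds it to the safe set.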
Claude finished @jwaldrip's task in 4m 6s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
Two new commits reviewed (
Six of eleven previously-flagged issues are still open, including all three criticals and the highest-impact high (N1). Status below.
Status of all previously-flagged issues
CRITICAL: still blocking merge
1. Exact semver compare still breaks all v4 intents on first CI bump
    if (sourceVersion !== target) {               // "4.0.0" !== "4.0.1" → triggers
      migrateIntent(ctx, sourceVersion, target)   // no path → throws
The registry has one edge:
Fix: Change line 82 to:
    const sourceMajor = Number(sourceVersion.split(".")[0] ?? "0") || 0
    if (sourceMajor !== targetMajor) {
2.
…te repair
Hardens the v4 engine against failure modes surfaced by debugging real sessions (Matt's designer pickup, the overtime-ac product stage, Mike's quality-gate loop). Single bundle because the fixes touch the same set of dispatch / cursor / lifecycle paths.
ENGINE
- Cursor Track B walks merged stages too. Pre-fix: once every stage merged, FBs filed on prior stages were silently ignored and the pipeline sealed over them. New e2e regression pins both interpretations of "user on stage 4 leaves FB on stage 1."
- Drift sweep is FM-aware. New `outputSha256` body-hashes markdown/text outputs (FM stripped) and full-file-hashes binaries. `buildOutputWitnesses` and the comparator both route through it. Backward-compat: dual-strategy comparator accepts legacy whole-file witnesses so in-flight intents don't false-positive on first sweep.
- Model routing cascade wired into every dispatch site that emits an LLM subagent: start_unit_hat, start_feedback_hat, dispatch_review, dispatch_approval, discovery_required, review_fix. dispatch_quality_gates is intentionally exempt (engine tool, no LLM). Studio default_model: sonnet now flows to fix-hat dispatches instead of being silently overridden by the parent's Opus default.
- Feedback hat-rejection escalates the model tier (sonnet→opus) on the FB FM, mirroring the unit-hat reject path. Next bolt picks up the bumped tier automatically.
- haiku_unit_set lifecycle exemption for `quality_gates`. Mirrors `outputs`: gate definitions are check specs, not workflow state, so letting completed units repair them doesn't violate forward-only. Closes Mike's loop: agent had been telling users "edit the file outside Claude Code" because no MCP path existed.
- fix_quality_gates prompt rewritten to point agents at the haiku_unit_set repair path with the exact call signature, splitting "code issue" vs "gate definition issue" failure shapes.
REVIEW SPA
- FeedbackItem renders iterations[] as a collapsed disclosure under the closure_reply card. Reviewers see the chain of hats that fired, each with result + reason + truncated commit, instead of just a bare "closed" status.
- session-api scrubs body_sha256 + witnesses[] from the FM projection before the SPA sees it. The cursor needs them on disk for drift; reviewers don't need to see "scary sha artifacts" (Matt's session).
- Click-to-flash boundary documented + tested. deriveExistingAnchors exported with a comment block making the persistent-vs-flash split explicit. Closed FBs stay out of the saved highlight layer (no clutter) but still flash on click.
- haiku_feedback MCP tool accepts inline_anchor at create time. Adversarial-review and studio-review hats can now attach an excerpt the SPA flashes on click, the same flow as user-authored anchored comments. Strict sub-schema gates the shape at the input layer.
TESTS
- 1352 haiku tests, 465 SPA tests, all green (was 1336 + 462).
- New: cross-stage-fb-rewalk (2 e2e), feedback-model-routing (4), unit-hat-model-routing (2), scrub-witness-fields (1), FeedbackItem.iterations (4), deriveExistingAnchors (5).
- Existing test files extended: drift-no-false-positives (3 backward-compat scenarios), feedback (2 inline_anchor cases), state-tools-handlers (3 lifecycle exemption + YAML round-trip).
Followup tracked: #324 (inline file diff in review pane, deferred per scope discussion).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
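An FM-aware hash along the lines of `outputSha256` might be sketched as below. The signature, the extension list, and the `stripFrontmatter` helper are assumptions; the commit message only states that markdown/text outputs are hashed with frontmatter stripped and binaries are hashed whole. Frontmatter is assumed to be a leading block delimited by `---` lines.

```typescript
import { createHash } from "node:crypto";

// Assumed frontmatter shape: a leading block between two `---` lines.
function stripFrontmatter(text: string): string {
  const match = /^---\r?\n[\s\S]*?\r?\n---\r?\n?/.exec(text);
  return match ? text.slice(match[0].length) : text;
}

// Hypothetical signature; the real outputSha256 may take different arguments.
function outputSha256(filename: string, content: Buffer): string {
  // Assumed set of "markdown/text" extensions.
  const isText = /\.(md|markdown|txt)$/i.test(filename);
  const payload = isText
    ? Buffer.from(stripFrontmatter(content.toString("utf8")), "utf8")
    : content; // binaries are hashed byte-for-byte
  return createHash("sha256").update(payload).digest("hex");
}
```

Under this scheme an engine-written frontmatter stamp does not change a markdown output's witness, while any byte change in a binary output does.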
…B + units
Two surfaces, same migration shape. Both keep working for in-flight intents authored with the legacy 2-digit width.
feedback_id (the bigger change):
- Schema: `feedback_id: integer (1..999)`. No more `FB-NN` string guessing on input. The handler returns canonical `FB-NNN` in responses.
- New helpers: `formatFeedbackId(num)` and `deriveFeedbackIdFromFilename(filename)`, the single source of truth for the on-disk → wire-format translation.
- Lookup is numeric-prefix match against `^(\d+)-`, so legacy `08-foo.md` files resolve when the agent passes `feedback_id: 8`. New writes pad to 3 digits (`008-foo.md`).
- Cross-stage `haiku_feedback_move` (panda's v3 regression) lands with a regression test pinning the cross-stage relocation contract.
- All 8 FB handlers unified on `feedbackId` local var (was a mix of `feedbackId` / `fbId`).
Units (smaller, narrower scope; slug stays human-readable for the depends_on graph):
- `unitPath()` now resolves width-flexibly: if the requested name doesn't match an exact file, fall back to `(numeric prefix, slug)` match. Both `state-tools.ts` and `state/shared.ts` carry the identical implementation since both export `unitPath`.
- `validateUnitFrontmatter` depends_on resolver uses the same width-flexible match: a fresh 3-digit unit can declare `depends_on: [unit-001-foo]` against a 2-digit on-disk sibling (and vice versa) and validate cleanly.
- Engine prompts now teach `unit-NNN-slug.md` (3-digit pad, max 999) with an explicit migration note that legacy 2-digit names still resolve.
Tests:
- `numeric-id-migration.test.mjs`: pins FB read with 2-digit and 3-digit on-disk forms, FB write produces NNN-padded files, and unitPath resolves both directions across the width boundary. Includes a depends_on width-flexible cross-reference test.
- `cross-stage-feedback-move.test.mjs`: locks in panda's v3 cross-stage move regression as fixed under v4.
- `_v4-fixtures.mjs`: `makeFeedback` now accepts numeric / digit-string / FB-NN ids, derives the trailing integer, and writes 3-digit padded filenames. Permissive parser supports legacy `FB-DRIFT-NN` test ids by extracting trailing digits.
- All 38 affected test files updated (perl bulk-substitution then per-file fixups for assertions on the canonical `FB-NNN` shape).
Sweep: 1362 passed, 0 failed across 100 test files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
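The numeric-ID translation described in this commit can be illustrated with a minimal sketch. The names `formatFeedbackId` and `deriveFeedbackIdFromFilename` come from the commit message, but the bodies below are assumptions rather than the engine's actual code, and `findFeedbackFilename` is a hypothetical helper added only for illustration.

```typescript
// Sketch only: bodies are assumed, not copied from the engine source.

/** Wire format: integer 1..999 -> canonical "FB-NNN" display form. */
function formatFeedbackId(num: number): string {
  if (!Number.isInteger(num) || num < 1 || num > 999) {
    throw new RangeError(`feedback_id must be an integer in 1..999, got ${num}`);
  }
  return `FB-${String(num).padStart(3, "0")}`;
}

/** On-disk format: "08-foo.md" or "008-foo.md" -> 8; null without a numeric prefix. */
function deriveFeedbackIdFromFilename(filename: string): number | null {
  const match = /^(\d+)-/.exec(filename);
  return match ? Number.parseInt(match[1], 10) : null;
}

/** Hypothetical lookup: compares numeric prefixes, so 2-digit and 3-digit files both resolve. */
function findFeedbackFilename(files: string[], feedbackId: number): string | undefined {
  return files.find((f) => deriveFeedbackIdFromFilename(f) === feedbackId);
}
```

Comparing parsed integers instead of string prefixes is what lets a legacy `08-foo.md` and a new `008-foo.md` answer to the same `feedback_id: 8`.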
…related intent dirs
The 2026-05-07 incident: the user noticed `.haiku/intents/cowork-mcp-apps-integration/`
files (`action-log.jsonl`, `.last_action.json`, `stages/development/`) showing up
as untracked in `git status` even on `main` branch. The intent dir wasn't
hardcoded anywhere, no hook or migration script was at fault — the engine
itself was targeting it.
Root cause (`haiku_run_next` auto-resolve):
1. Caller passes `intent` directly ← explicit
2. Current git branch is `haiku/<slug>/main` or
`haiku/<slug>/<stage>` ← explicit
3. Else: pick the sole active intent on disk ← BUG
Step 3 violated the user's invariant: "if I'm not on an intent's branch and
I haven't said 'intent: foo', the engine has no business writing to any
intent dir." The cowork intent had `status: active` (committed by PRs
#174/#180/#238), and it was the only such intent on disk, so every workflow
tick — including ones triggered by hooks the user didn't initiate — picked
it up via step 3 and wrote runtime journals into its dir.
The fix: in git mode, drop step 3 entirely. If the user isn't on an intent
branch and didn't pass `intent`, the engine refuses to auto-target. The
error message is explicit: "Pass `intent` explicitly, or `git switch
haiku/<slug>/main` to scope the engine to a specific intent."
Filesystem mode (no git) has no branch signal — keep the "sole active
intent on disk" fallback there since it's the only auto-resolve available.
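The resolution order after the fix can be sketched as a pure function. This is a minimal illustration under stated assumptions: `resolveIntent`, `ResolveInput`, and the filesystem-mode error text are hypothetical, and only the git-mode refusal message is taken from the commit text.

```typescript
// Sketch only: type names and function shape are assumptions for illustration.
type Resolution =
  | { ok: true; intent: string; source: "explicit" | "branch" | "sole-active" }
  | { ok: false; error: string };

interface ResolveInput {
  explicitIntent?: string;       // caller passed `intent`
  gitBranch?: string;            // undefined means filesystem mode (no git)
  activeIntentsOnDisk: string[]; // slugs with status: active
}

function resolveIntent(input: ResolveInput): Resolution {
  // 1. An explicit `intent` argument always wins, regardless of branch.
  if (input.explicitIntent) {
    return { ok: true, intent: input.explicitIntent, source: "explicit" };
  }
  if (input.gitBranch !== undefined) {
    // 2. Git mode: only haiku/<slug>/main or haiku/<slug>/<stage> scopes the engine.
    const m = /^haiku\/([^/]+)\/[^/]+$/.exec(input.gitBranch);
    if (m) return { ok: true, intent: m[1], source: "branch" };
    // No fallback in git mode; the error never names an intent found on disk.
    return {
      ok: false,
      error:
        "Pass `intent` explicitly, or `git switch haiku/<slug>/main` to scope the engine to a specific intent.",
    };
  }
  // 3. Filesystem mode keeps the sole-active fallback (no branch signal exists).
  if (input.activeIntentsOnDisk.length === 1) {
    return { ok: true, intent: input.activeIntentsOnDisk[0], source: "sole-active" };
  }
  return { ok: false, error: "No unambiguous active intent; pass `intent` explicitly." };
}
```

With this shape, a tick on `main` with one active intent on disk is refused instead of silently targeting that intent.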
Three companion changes ship in the same commit:
- `.gitignore` patterns for the engine's runtime journals
(`.haiku/**/action-log.jsonl`, `.last_action.json`, `write-audit.jsonl`).
Even when the engine is correctly scoped, these are WAL-style journals
that don't belong in git. Adding them to the project's gitignore protects
every plugin user from the same pollution at the source.
- `intent-scope-isolation.test.mjs` — pins the contract:
* non-intent branch + no explicit `intent` → refuse, don't name the
stranger intent in the error
* intent branch → auto-resolve from branch
* explicit `intent` arg → flow through regardless of branch
- Remove the tracked `.haiku/intents/cowork-mcp-apps-integration/` files.
These are zombies from PRs #174/#180/#238 that should never have been
checked in. The engine fix prevents the dir from getting recreated.
Sweep: 1365 passed, 0 failed across 101 test files.
Note: the bundled `plugin/bin/haiku.mjs` is from v3 (last touched at
`ffd22c3f3`) and still writes `.last_action.json` — that's a stale-bundle
issue, not an engine bug. Will resolve on the next plugin release when CI
rebuilds the bundle from current source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @jwaldrip's task in 8m 57s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
Three new commits reviewed since the last review: …
The intent scope isolation fix (…
Status of all previously-flagged issues
CRITICAL: new issue from …
| Fix | Commit |
|---|---|
| N2 — picker timeout loops | 580e77d |
| Cross-stage FB walk blocks after all stages merged | b877e52 |
| Intent auto-resolve scope pollution | 1ccae8d |
| Drift sweep FM-aware (no false-positives on markdown FM edits) | b877e52 |
| Model routing cascade in all dispatch sites | b877e52 |
Summary
Must fix before merge:
| Issue | File | What breaks |
|---|---|---|
| N7 (NEW) | start_feedback_hat.ts:153,159,163 | Fix loop broken: every agent attempt to advance/reject a hat gets schema-rejected because prompt emits "FB-001" string but schema requires integer |
| #1 | run-tick.ts:82 | All v4 intents break on the first CI bump (4.0.0→4.0.1 → "no migration path" on every tick) |
| #2 | v0-to-v4.ts:247 | Migrated units with v3-style result:"rejected" are permanently stuck as in-flight |
| #3 | haiku_run_next.ts:628 | Stage→main merges unserialised; concurrent ticks can race and corrupt merge state |
| N1 | haiku_run_next.ts:486 | Every production fix-loop closure loops infinitely; close_feedback never finds NNN-*.md files |
N7 is new and was not in any prior review — it was introduced by 37ec691 and means the fix loop is currently broken for all production feedback dispatch.
Three reviews flagged the same issues across the engine. Closing them all in one commit.
CRITICAL:
#1 (run-tick.ts:82): Exact semver compare on plugin_version would have broken every v4 intent on the first CI bump after release (4.0.0 → 4.0.1 → migrateIntent throws "no migration path"). Compare majors only; the FM stamp marks schema generation, not the build that touched it. Same edit also fixes #10: the migration-failure error path now returns `track: "intent"` instead of the misleading `track: "drift"` ghost event.
#2 (v0-to-v4.ts): v3 wrote `result: "rejected"` / `"advanced"`; v4's cursor only matches `"reject"` / `"advance"`. A migrated unit's last iteration with a past-tense result fell through both checks and was treated as in-flight on the current hat, so the wave never progressed. Added a normalization pass in both migrateUnitFile and migrateFeedbackFile so the drift dies at the source. The v0-to-v4-realistic-scenario test that asserted the buggy preserve-as-rejected behavior now asserts the correct normalized form. Same edit removes the no-op `started_at` double-write (issue #11).
#3 (haiku_run_next.ts:628 + side-effects.ts): `withIntentMainLock` was imported in state-tools.ts and promised to the agent in merge_stage.ts / merge_intent.ts but never actually called. Stage→intent-main merges ran unserialised; two concurrent ticks (autopilot retry overlapping a manual run) could race on the merge commit. Wrapped all three call sites (haiku_run_next merge_stage loop + side-effects' pre-stage cleanup + finalize-stage paths).
HIGH:
#5 (locks.ts:61): `isAlive` bare-catch returned false for every error including EPERM. On Windows / cross-user Linux, `process.kill(pid, 0)` throws EPERM (not ESRCH) when the target process is alive but unsignalable. Lock-stealing kicked in against live holders. Differentiate: ESRCH = dead, EPERM = alive-but-unpingable.
N1 (haiku_run_next.ts:484): close_feedback handler scanned the FB dir with `f.startsWith("FB-01-")` but real files are named `01-slug.md`. The match never fired → fbFile stayed empty → break → cursor re-emits close_feedback every tick → infinite loop. Replaced with the existing `findFeedbackFile` helper (single source of truth that already handles both wire-form FB-NNN and on-disk NNN-slug.md). Tests passed accidentally because fixtures used non-standard `FB-DRIFT-NN-` naming.
MEDIUM:
#7 (cursor.ts:495): walkFeedbackTrack fell back to the entire stages list when currentStage wasn't found. Walking unknown future stages was the opposite of what "walk up to current" meant. Fall back to `[]` instead.
#8 (drift-sweep.ts:282 + sign-slot.ts:104): Output paths come in two shapes: intent-relative (`stages/design/foo.md`) and repo-relative (`src/components/Button.tsx`). Both were joined against intentDir, so repo-relative paths resolved to non-existent locations and silently skipped drift detection; the most important code artifacts had no witness. Distinguish by leading `stages/`: intent-relative goes to intentDir, everything else to repoRoot. Applied at both sign time and check time.
#9 (cursor.ts:574): `parseFbIdFromFilename` fallback returned the raw basename for non-numeric filenames; downstream `findFeedbackFile` rejected the unparseable ID with `feedback_not_found`, which the cursor ticked into infinite re-dispatch. Return null instead, and skip the FB in `nextActionForFeedback` when the ID can't be parsed. Also bumped the padStart to 3 (matches the 2026-05-07 numeric-ID refactor).
N3 (intent.ts + clarify_required.ts:56): The clarify gate instructed agents to write `intent.md` via `haiku_human_write` (which refuses intent.md) and the field `clarifications` wasn't in the intent schema's properties so `haiku_intent_set` rejected it too. Both paths blocked. Add `clarifications` as `Type.Record(Type.String(), Type.Unknown())` to the schema (so it's in AGENT_AUTHORABLE_INTENT_FIELDS) and rewrite the prompt to use `haiku_intent_set` with a merge note (the field overwrites, so the agent must read prior stages first).
N5 (git-worktree.ts:732): `restoreBranch` checkout-restore was inside the same try block as the operation. An unhandled throw between `git checkout intentMain` and the restore would skip the restore and leave the worktree on the wrong branch, so every subsequent agent write lands on intentMain instead of the stage branch. Moved restore into a `finally` block so it always runs when restoreBranch is set.
Sweep: 1365 passed, 0 failed across 101 test files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
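The normalization pass from fix #2 can be sketched as follows. This is an illustration under assumptions: the `Iteration` shape and the function name `normalizeIterationResults` are hypothetical, and only the two past-tense mappings are taken from the commit message.

```typescript
// Sketch only: the Iteration shape and function name are assumed for illustration.
interface Iteration {
  hat: string;
  result?: string;
}

// v3 past-tense values mapped to the forms the v4 cursor matches.
const PAST_TENSE_TO_V4 = new Map<string, string>([
  ["rejected", "reject"],
  ["advanced", "advance"],
]);

/** Returns a copy of the iterations with v3 past-tense results rewritten. */
function normalizeIterationResults(iterations: Iteration[]): Iteration[] {
  return iterations.map((it) => {
    const mapped = it.result !== undefined ? PAST_TENSE_TO_V4.get(it.result) : undefined;
    // Unknown or already-normalized results pass through untouched.
    return mapped !== undefined ? { ...it, result: mapped } : it;
  });
}
```

Running the rewrite at migration time means the cursor never has to recognise both spellings.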
Claude finished @jwaldrip's task in 11m 30s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
Three issues remain blocking merge: one carry-forward (N7, not in the 12) and two new CI failures introduced by the latest commit.
Status of all previously-flagged issues
CRITICAL: not in the 12, still open
N7.
| Issue | File | What breaks |
|---|---|---|
| N7 | start_feedback_hat.ts:153,159,163 | Fix loop completely broken: agents pass "FB-001" (string) to tools that require integer feedback_id; every advance/reject/read call is schema-rejected |
| C1 | Multiple files (42 errors) | CI Biome Lint fails; merge blocked |
| C2 | scripts/check-ci-covers-drift-tests.mjs | CI MCP Tests fails; two required test files deleted in v4 refactor, guard not updated |
All 12 issues from prior reviews are resolved. The three items above are what stands between this PR and a clean CI.
…uards
Three blockers from the latest review (2026-05-08):
N7 (Critical): start_feedback_hat / review_fix / intent_completion_fix emitted `feedback_id: "FB-001"` (string) into every tool-call template. The 37ec691 numeric-id refactor flipped the schema to `Type.Integer({ minimum: 1, maximum: 999 })` with explicit description *"Just a number; no FB- prefix, no zero-padding, no string form."* Agents following the prompt verbatim hit `<tool>_input_invalid` on every advance/reject/read call, so the entire fix loop was dead. Fix: derive `fbNum` once at dispatch time (parse the digit suffix off `FB-NNN`) and embed it as a JSON integer literal: `feedback_id: 1`, no quotes. Headings keep the `FB-NNN` display form for human readability; only the JSON args change. Updated all three prompt builders that compose fix-loop subagent blocks.
C1: 42 Biome lint errors + 12 warnings introduced by the prior commit's hand-edits (import ordering, optional-chain folding, etc.). Ran `biome check --write` then `biome check --write --unsafe` to clear the unused-imports backlog. Net effect across 35 files: pure formatting / dead-code removal, no logic changes. Spot-checked the files I'd actually touched; the rest are auto-formatter passes on files that just hadn't been re-formatted since their last edit.
C2: `scripts/check-ci-covers-drift-tests.mjs` required four named test files: `drift-detection-gate.test.mjs` and `upstream-reconciliation.test.mjs` were deleted in the v4 cursor refactor (b743524) when the standalone pre-tick gates were absorbed into the cursor walk; `drift-baseline.test.mjs` and `drift-markers.test.mjs` survived. CI failed on every push since v4 landed. Replaced REQUIRED_FILES with the v4-equivalent coverage: `drift-baseline`, `drift-markers`, `drift-mid-flight-e2e`, `drift-scenarios`, and `cursor-walk`, the files that actually exercise the cursor's drift behavior, marker bookkeeping, and track-walk priorities. Updated the header docstring + failure message to match.
Sweep: 1365 passed, 0 failed across 101 test files. Biome: 0 errors, 0 warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
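The N7 fix (derive the integer once, embed it unquoted) can be sketched like this. Both function names are hypothetical and the argument object is reduced to the single `feedback_id` field; the real prompt builders presumably carry more context.

```typescript
// Sketch only: parseFeedbackNumber and buildFeedbackArgs are hypothetical names.

/** Parses the digit suffix off the "FB-NNN" display form into an integer. */
function parseFeedbackNumber(displayId: string): number {
  const m = /^FB-(\d{1,3})$/.exec(displayId);
  if (!m) throw new Error(`not a canonical feedback id: ${displayId}`);
  const fbNum = Number.parseInt(m[1], 10);
  if (fbNum < 1) throw new RangeError(`feedback id out of range: ${displayId}`);
  return fbNum;
}

/** Builds the JSON args a prompt template would inline for a tool call. */
function buildFeedbackArgs(displayId: string): string {
  const fbNum = parseFeedbackNumber(displayId);
  // JSON.stringify emits a bare integer literal, so the schema's integer check passes.
  return JSON.stringify({ feedback_id: fbNum });
}
```

Serializing a number rather than interpolating the display string is what keeps the quotes out of the tool-call template.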
Claude finished @jwaldrip's task in 7m 57s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
Status of all previously-flagged issues
CI FAILURES: blocking merge
C3 (Medium):
| Fix | Verified |
|---|---|
| N7: feedback_id: ${fbNum} integer in all three prompt builders | ✓ |
| C1: Biome lint 42 errors cleared | ✓ |
| C2: drift-test guard references v4-equivalent test files | ✓ |
| Normalization pass in migrator (issue #2) | ✓ |
| withIntentMainLock wraps merge call-sites (issue #3) | ✓ |
| restoreBranch in finally block (N5) | ✓ |
| clarifications in INTENT_FRONTMATTER_SCHEMA (N3) | ✓ |
| findFeedbackFile in close_feedback handler (N1) | ✓ |
| EPERM-aware isAlive (#5) | ✓ |
| walkFeedbackTrack cutoff === -1 returns [] (#7) | ✓ |
| Repo-relative output paths against repoRoot (#8) | ✓ |
Summary
Two CI failures must be resolved before merging:
| Issue | File | What breaks |
|---|---|---|
| C3 | packages/haiku-api/src/schemas/session.ts | 61 exports > 60-export budget → Biome Lint job fails on every push |
| C4 | packages/haiku/test/squash-merge-fallback.test.mjs:245 | glab fallback test assumes gh not in /usr/bin; fails on GitHub Actions where gh is at /usr/bin/gh |
All prior critical/high issues are resolved. The two CI failures are straightforward to fix. Once cleared, this PR is in good shape to merge.
…k, god-file budget
Three CI failures from the latest claude-review automation run, all landing in `bb08b27`:
silent-test parser (run-all.mjs): the aggregator regex matched `ℹ pass N` / `ℹ fail N` (node:test default reporter), but Linux runners default to TAP-style output (`# pass N` / `# fail N`). 30 of 101 test files fell through both regex sets and reported as 0/0. The silent-test-loss guard (added 2026-05-06 after the v0-to-v4 migrator tests went silent) then failed the run. Locally on macOS the `ℹ` glyph form fires, which is why this didn't reproduce. Fix: extend each regex pair to also match the TAP form. Same handler for the success-exit and crash-exit paths.
squash-merge-fallback test (CI gh-on-PATH leak): the `provider fallback success: glab reports merged MR` test stubbed glab and relied on `withStubbedPath` to hide gh. The PATH stub kept `/usr/bin` for `which` itself, but GitHub Actions ships `gh` at `/usr/bin/gh`, so `detectPrTool` ran `which gh` against the real binary, returned "gh", queried the un-stubbed real CLI, got nothing, and isBranchMerged returned false. Test passed locally on macOS (no /usr/bin/gh) and failed on Linux runners. Fix: inject a stub `which` script alongside the other stubs that only resolves tools the test deliberately stubbed. Uses `[ -x ... ]` against the stub dir so `which gh` returns nothing in this test (only glab was stubbed) regardless of what /usr/bin contains.
god-file budget (haiku-api/src/schemas/session.ts): the file has 61 exports, 1 over the 60-export budget. It's a single concern (SPA session/wire-payload schemas) and every shape gets a `Schema` const + inferred `Type` export; the export count doubles by design. Splitting it would force every consumer to import from three barrels for one wire shape. Fix: add an ALLOWLIST entry with a written reason. Documented the revisit trigger (a sub-protocol like design-direction growing its own object graph) so the entry isn't a permanent waiver.
Sweep: 1365 passed, 0 failed across 101 test files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
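The dual-format summary parsing described for `run-all.mjs` can be sketched as below. The regexes and the `parseCounts` name are assumptions for illustration (the real aggregator may structure its patterns differently); the two summary line shapes are the ones quoted in the commit message.

```typescript
// Sketch only: patterns and names are assumed, not copied from run-all.mjs.

// Glyph form ("ℹ pass N") and TAP form ("# pass N") share one pattern each.
const PASS_RE = /^(?:ℹ|#) pass (\d+)$/m;
const FAIL_RE = /^(?:ℹ|#) fail (\d+)$/m;

/** Extracts pass/fail totals; null means neither summary form was found. */
function parseCounts(output: string): { pass: number; fail: number } | null {
  const pass = PASS_RE.exec(output);
  const fail = FAIL_RE.exec(output);
  // A null result lets the caller flag the file instead of recording 0/0.
  if (!pass || !fail) return null;
  return { pass: Number(pass[1]), fail: Number(fail[1]) };
}
```

Returning `null` for unrecognised output, rather than zero counts, keeps a reporter mismatch visible as a failure instead of a silent loss of coverage.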
Claude finished @jwaldrip's task in 4m 2s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
C3, C4, + TAP parser (eb9b36d): all fixed
The TAP gap is worth noting: 30 of 101 test files were silently reporting 0/0 on Linux CI because …
Complete issue history: all resolved
Assessment
All 21 issues across 6 review iterations are resolved. The fixes are thorough and well-implemented; the normalization pass covers both …
CI is running now for this commit. No new issues identified. This PR is ready to merge once CI passes.
Response to Brian Suh's "Agents Need Control Flow, Not More Prompts." Frames the v4 engine refactor (PR #323) as the lived version of his argument: prompts no longer carry workflow guarantees, the surface area refuses invalid actions, the cursor decides workflow position without a model in the loop. Grounded in Mike's quality_gates session, Panda's silent cross-stage move, today's intent-scope auto-resolve fix, and the four-round review cycle that landed the bundle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @jwaldrip's task in 5m 2s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
One new commit since the last review: …
Status of all previously-flagged issues
New commit: blog post (…
Three docs were lagging behind the v4 cursor refactor (commit b743524):
ARCHITECTURE.md (canonical structural reference): updated frontmatter spec to drop per-stage state.json, switch to NNN-padded filenames, and document the FM witness fields the cursor walks. Forward-only lifecycle table now derives status from FM (started_at + iterations[-1]) instead of a stored enum, with a new paragraph clarifying that stages aren't sealed (only intents are) and firstUnmergedStage rewinds the cursor when corrective work commits to an earlier stage. Phases section renamed "cursor-derived" with the v4 five-row table (elaborate, execute, review, approve/gate, merge). §5 "Workflow tick semantics" rewritten end-to-end: cursor model with three tracks (drift / feedback / intent), priority order, derivation function semantics, the actual cursor `kind` set, and a new §5.8 documenting the v0→v4 migrator (strips deprecated FM, normalizes past-tense iteration results "rejected"→"reject" / "advanced"→"advance", deletes per-stage state.json, synthesizes user approvals on completed units). FB lifecycle section dropped v3 revisit actions and added classifier-hat triage via haiku_feedback_set_targets. Hook boundary section gets a new §7.1 covering the MCP input gates (TypeBox + AJV with stable named error codes).
haiku-method.md (paper): five modes now (continuous / discrete / discrete-hybrid / autopilot / quick); was "two." Stage loop reframed as five conceptual phases derived by the cursor (was a "fixed six-step loop"); pre-execution adversarial review removed (v4 runs review AFTER execute, not before). Drift detection section rewritten: hashes embed in witness records on FM, no baseline.json, no human-via-mcp / human-implicit authorship classes. "Two commands" became "Commands and the Workflow Tool": the agent's contract is haiku_run_next, end of story. Studio count nudged from "fifteen" to "two-dozen-plus" with the catalog framed as open-ended. Glossary picked up Workflow Cursor and Feedback Classification entries; Bolt redefined off iterations[] history; Spec Review reframed as the engine-built first reviewer.
one-instruction-at-a-time.md (blog): response to Brian Suh's "Agents Need Control Flow, Not More Prompts." Frames v4 as the lived version of his argument: H·AI·K·U has always been workflow-first; v4 is where that lands cleanly. Foregrounds the cursor (derivePosition, nextHatForUnit), the one-instruction-at-a-time loop semantic, drift detection (sweep + content hashes), and the per-finding fix loop dispatch. Slug renamed to match new title.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @jwaldrip's task in 4m 44s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
One new commit reviewed: …
Status of all previously-flagged issues
What landed in …
| Check | Status |
|---|---|
| Biome Lint | ✅ pass |
| MCP Tests | ✅ pass |
Assessment
The code is clean. All 21 issues from six prior review rounds are resolved. CI is green. The 222a863 docs commit is factually accurate and well-aligned to the v4 implementation.
The one remaining gap is the tickSemantics modal in the architecture map — it's user-facing documentation that now describes a model the engine no longer uses. The project's own sync rules require it to be updated when the tick contract changes, and the v4 cursor is the largest tick-contract change in the PR's history. This can land as a follow-up or be addressed before merge, but it's worth calling out explicitly.
The interactive arch map at /studios/<slug>/architecture and the
auto-generated per-studio Mermaid diagrams were lagging behind the v4
cursor refactor. Per .claude/rules/architecture-prototype-sync.md, the
map is canonical — divergence is a bug.
Map data files (`website/app/studios/[slug]/architecture/_data/`):
- `payload-for.ts` — every TransitionKey now maps to a real cursor
action (`start_unit_hat`, `start_feedback_hat`, `dispatch_review`,
`dispatch_approval`, `dispatch_quality_gates`, `user_gate`,
`close_feedback`, `merge_stage`, `merge_intent`, `drift_detected`).
Removed v3-only `manual-change-assessment`,
`coverage-review-required`, `output-liveness-review-required`. Mode
shaping wired in (autopilot collapses reviews to `[spec]` and
approvals to `[spec, quality_gates]`). Cross-stage feedback routing
now framed as "purely by file location, no `upstream_stage:`
field." `withIntentMainLock` noted on merge actions. Visual
TransitionKey names (`hat-to-hat`, `wave-to-wave`,
`gate-to-next-stage`) retained — they describe map UI positions,
not cursor actions. The prose under each makes the v4 reality
explicit: hats advance via subagent `haiku_unit_advance_hat` calls,
waves emerge from the cursor's wave-ready predicate.
- `actors.ts` — five-actor section rewritten. User mode taxonomy
updated to continuous/discrete/discrete-hybrid/autopilot/quick;
HITL/OHOTL/AHOTL labeled deprecated. Orchestrator notes rewritten
around cursor model + `firstUnmergedStage` invariant + numeric
`feedback_id` at the wire. Web UI reframed around `user_gate`
variants.
- `hooks.ts` — 12 entries → 8. Removed nonexistent ones (`track-
outputs`, `ensure-deps`, `inject-context`, `subagent-context`,
`subagent-hook`, `quality-gate`); added `stamp-agent-write` and
`edit-auto-read-hint`. Verified against `plugin/hooks/hooks.json`
and `packages/haiku/src/hooks/`.
- `types.ts` — `ModalKind` variant `"preTickTriage"` →
`"cursorTracks"`.
Map components (`_components/`):
- `ArchitectureMap.tsx` — gate badge label updated
("pre-tick triage gate" → "cursor track walk: Track C drift →
Track B feedback → Track A intent"); spec gate description
rewritten around `dispatch_review` per role; reject branch points at
`start_feedback_hat (Track B)`; fix-loop section drops v3
`feedback_revisit` / `integrate_fix_chains` callouts; pre-intent
card orientation chain updated.
- `ModalRouter.tsx` — `tickSemantics` modal rewritten around the
three-track cursor walk and the v4 action surface (round 8 review
flag); `cursorTracks` modal (was `preTickTriage`) describes Track
C/B/A; `gateDetail` rewritten; `subagent` modal tool list updated;
`validation` modal now points at TypeBox + AJV input gate.
Auto-generated diagrams: regenerated all 24 `website/public/workflow-
diagrams/*.mmd` files via `bun run --cwd packages/haiku export:
workflow-diagrams`. Last regeneration was pre-v4
(`df4a4758c`); the .mmd files now match the v4 cursor structure.
Type check (`cd website && npx tsc --noEmit`): clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @jwaldrip's task in 4m 37s
PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening
One new commit since the last review: …
What landed in …
| Prior round flag | Old (v3) | New (v4) |
|---|---|---|
| state.json reference | "Engine reads … stage state.json…" | "Engine reads … derives cursor position via derivePosition" |
| active_stage invariant entry | "pre-tick consistency — repairs active_stage drift, state.json invariants" | "Track C — drift sweep. Re-hashes signed witnesses…" |
| revisited sideline action | "…emit revisited targeting the earliest such stage" | "Track B — feedback. Walks stages 0..active… Cross-stage routing is purely by file location" |
The new modal adds: pre-cursor selection gates section, complete v4 action surface, three-track priority walk (C → B → A), four v4 principles ("State on disk is the truth", "Composition is pure", etc.), and a canonical reference pointer to ARCHITECTURE.md §5.
actors.ts — accurate ✓
HITL/OHOTL/AHOTL terminology correctly labeled as deprecated. v4 mode taxonomy (continuous / discrete / discrete-hybrid / autopilot / quick) matches intent.ts. Orchestrator notes rewritten around firstUnmergedStage invariant and numeric feedback_id. All five-actor section verified.
hooks.ts — accurate ✓
Trimmed from 12 to 8 entries. Verified against packages/haiku/src/hooks/ — every entry maps to a real file:
redirect-plan-mode.ts, guard-workflow-fields.ts, workflow-guard.ts, prompt-guard.ts, inject-state-file.ts, context-monitor.ts, stamp-agent-write.ts, edit-auto-read-hint.ts. The six removed entries (track-outputs, ensure-deps, inject-context, subagent-context, subagent-hook, quality-gate) do not exist in the codebase.
payload-for.ts — accurate ✓
All v3-only actions (manual-change-assessment, coverage-review-required, output-liveness-review-required, feedback_dispatch, feedback_revisit, integrate_fix_chains) removed. Every mapped cursor action verified against cursor.ts: start_unit_hat, start_feedback_hat, dispatch_review, dispatch_approval, dispatch_quality_gates, user_gate, close_feedback, merge_stage, merge_intent, drift_detected — all real. withIntentMainLock noted on merge actions and verified against the actual call at haiku_run_next.ts:597. select_studio / select_mode / select_stage pre-cursor gates documented and verified against run-tick.ts:137-163.
types.ts — accurate ✓
ModalKind variant "preTickTriage" → "cursorTracks". Both ModalRouter.tsx (line 639) and ArchitectureMap.tsx (line 358) updated to use cursorTracks. No dangling references to the old name.
Workflow diagrams — accurate ✓
All 24 .mmd files regenerated via bun run --cwd packages/haiku export:workflow-diagrams. Spot-checked compliance/scope: STAGE.md declares fix_hats: [classifier, compliance-analyst, feedback-assessor]; the generated diagram shows scope_review_fix_bolt_1_classifier → scope_review_fix_bolt_1_compliance_analyst → scope_review_fix_bolt_1_feedback_assessor. ✓
Status of all previously-flagged issues
| # | Severity | Status |
|---|---|---|
| 1 | Critical | FIXED ✓ — sourceMajor !== targetMajor in run-tick.ts |
| 2 | Critical | FIXED ✓ — normalization pass in migrateUnitFile + migrateFeedbackFile |
| 3 | Critical | FIXED ✓ — withIntentMainLock wraps all three merge call-sites |
| 4 | High | FIXED ✓ (prior) |
| 5 | High | FIXED ✓ — catch(err) { return err.code === "EPERM" } |
| 6 | High | FIXED ✓ (prior) |
| 7 | Medium | FIXED ✓ — cutoff === -1 returns [] |
| 8 | Medium | FIXED ✓ — repo-relative paths joined against repoRoot |
| 9 | Medium | FIXED ✓ — parseFbIdFromFilename returns null on non-numeric filename |
| 10 | Minor | FIXED ✓ — migration error returns track: "intent" |
| 11 | Minor | FIXED ✓ — no-op started_at double-write removed |
| N1 | High | FIXED ✓ — findFeedbackFile replaces broken prefix scan |
| N2 | High | FIXED ✓ (prior) |
| N3 | Medium | FIXED ✓ — clarifications in schema; prompt uses haiku_intent_set |
| N5 | Medium | FIXED ✓ — restoreBranch in finally block |
| N7 | Critical | FIXED ✓ — fbNum integer literal in all prompt builders |
| C1 | CI | FIXED ✓ — Biome 0 errors |
| C2 | CI | FIXED ✓ — drift-test guard updated |
| C3 | CI | FIXED ✓ — god-file allowlist entry |
| C4 | CI | FIXED ✓ — stub which in glab fallback test |
| TAP | CI | FIXED ✓ — run-all.mjs handles both output formats |
| stale arch map | Medium | FIXED ✓ — tickSemantics modal, cursorTracks modal, payload-for.ts, actors.ts, hooks.ts, all 24 .mmd diagrams updated |
CI Status
| Check | Status |
|---|---|
| Biome Lint | ✅ pass |
| MCP Tests | ⏳ in progress (test suite running) |
Assessment
All 22 issues across 8 review rounds are resolved. The code is clean, the architecture map is accurate, the workflow diagrams are freshly regenerated, and the blog post cites real artifacts with verifiable line numbers. No new issues identified in d35c79d.
This PR is ready to merge once MCP Tests completes. Based on the clean run of Biome Lint and the drift-test coverage verification passing, the test suite is in good shape.
The 3.17.0 auto-bump (commit b3c09f1) was the bumper bot misclassifying PR #323: the squash commit subject was `feat(v4): cursor-driven engine refactor + UX hardening` (no `!:` and no `BREAKING CHANGE:` footer), so the regex pass returned `minor` and the bumper short-circuited before Claude pass 2 could escalate. The cursor refactor is a major release: the migrator gate at `run-tick.ts:77` keys on `targetMajor >= 4`, so shipping under a 3.x stamp meant every v0/v3 intent's auto-migration to v4 silently no-op'd.
Cleanup of the bad release:
- v3.17.0 GitHub release deleted
- v3.17.0 git tag deleted (local + remote)
- haiku-method@3.17.0 unpublished from npm
- This commit bumps to 4.0.0 explicitly; CHANGELOG rewrites the entry as a major-release announcement covering the breaking changes (per-stage state.json removed, numeric feedback_id at the wire, auto-resolve refusal in git mode, upstream_stage:/triaged_at: gone, manual_change_assessment action removed).
Bumper-script fix lands in a follow-up commit so the workflow run on that push correctly bumps 4.0.0 → 4.0.1 (patch) and publishes the fixed bumper alongside the now-current bundle. Until then this commit ships v4.0.0 metadata + a fresh bundle built locally with v4.0.0 baked into MCP_VERSION.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
)
The 2026-05-08 v3.17.0-was-supposed-to-be-v4.0.0 incident: PR #323's squash subject was `feat(v4): cursor-driven engine refactor + UX hardening` (`feat:` not `feat!:`, no `BREAKING CHANGE:` footer). The regex pass returned `minor` and short-circuited before Claude pass 2 could escalate. The cursor refactor + numeric-id schema flip + per-stage state.json removal shipped under a 3.x stamp, which silently disabled the v0→v4 migrator (gate keys on `targetMajor >= 4`).
Two changes:
1. Pass 2 (Claude haiku) now runs whenever pass 1 returned anything below `major`, not just `patch`. The model can upgrade `minor → major` if the diff smells major (engine source rewrites, schema wire-type flips, ARCHITECTURE.md structural rewrites, migrator-gated version comparisons).
2. Pass 2's result is rank-floored against pass 1: it can upgrade (patch → minor → major) but never downgrade. Pass 1 stays the floor; an explicit `BREAKING CHANGE:` marker still beats Claude.
Prompt updated with concrete major-signal hints (engine workflow file renames, schema wire-type flips, ARCHITECTURE/paper structural rewrites, `targetMajor >= N` migrator gates) so future cursor-class refactors don't slip through under a `feat:` prefix again.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
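The two bumper changes (run pass 2 below `major`, then rank-floor its answer against pass 1) can be sketched as follows. The names `Bump`, `shouldRunPass2`, and `combineBumps` are hypothetical; the actual bumper script is not shown in this thread.

```typescript
// Sketch only: names and structure are assumed for illustration.
type Bump = "patch" | "minor" | "major";

const RANK: Record<Bump, number> = { patch: 0, minor: 1, major: 2 };

/** Pass 2 runs whenever pass 1 returned anything below `major`. */
function shouldRunPass2(pass1: Bump): boolean {
  return pass1 !== "major";
}

/** Pass 2 may upgrade the bump level but never downgrade it. */
function combineBumps(pass1: Bump, pass2: Bump): Bump {
  return RANK[pass2] > RANK[pass1] ? pass2 : pass1;
}
```

Under this rule a `feat:` subject classified as `minor` can still be escalated to `major` by pass 2, while a pass 2 answer of `patch` leaves the `minor` floor in place.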
Summary
Replaces the pre-v4 phase/status FSM with a cursor that walks per-unit frontmatter (`iterations[]` / `reviews{}` / `approvals{}`) instead of per-stage `state.json`. On top of the v4 base, this PR adds 9 numbered priorities (P1–P9), each scoped to a single class of bug or UX pain surfaced in real session logs from the 2026-05-04 → 2026-05-06 run (588M tokens, 52h wall-clock, retry loops + branch hijacks + missing collaboration moments).
What landed
Engine
- v0→v4 soft-scrub migrator (BFS-graph migrate-registry). Strips deprecated FM, stamps `plugin_version`, synthesizes terminal approvals for v3-completed units, relocates v3 FBs whose `upstream_stage` pointed elsewhere, preserves `replies[]` (caught a silent data-loss bug where the migrator's denylist included `replies` while v4 still reads them), tolerates malformed YAML per-file with structured warnings.
- New `field-hygiene.ts` cruft detector for post-migration audit.
- Cursor walks 26 lifecycle scenarios (was 13). Adds: `design_direction_required` hard gate (studio/stage-conditional), `clarify_required` gate at elaborate entry sourced from per-stage `clarify/*.md`, finished `discovery_required` (cursor.ts:419 was a stub) via `readStageArtifactDefs` + per-unit `fm.discovery.<agent>` records, classifier-first dispatch on unclassified user feedback, `merge_stage` transition, cross-stage FB priority, mid-wave noop, closed-FB invalidation re-route, `reject_hat` re-entry.
- New `haiku_feedback_set_targets` MCP tool. Classifier hat lands user-FB triage (target_unit, target_invalidates, reasoning) once and only once; immutable post-classification.
- Reply-on-closure: terminal `haiku_feedback_advance_hat` requires a `reply` string. Surfaces in SPA as a "Resolved" card with dismiss action; filter chip for unread replies.
P1–P9 (this session's priorities)
- `haiku_run_next` tick. Skip set is `noop`/`sealed`/`error`. Cuts 47K-char tool-result blobs.
- `start_feedback_hat` emits one per-FB subagent block with the canonical FB ID inlined. Kills the 6-retry ID-guessing loop.
- `requires_design_direction: true`. `pick_design_direction` stamps `design_directions.<stage>` on intent.md. software/design opted in.
- `<studio>/stages/<stage>/clarify/*.md`. Records `clarifications.<stage>` on intent.md. Stage-conditional. No autopilot bypass.
- `location:` fields.
- `ensureOnStageBranch` refuses checkout under a locked worktree. Closes the hijack class: this very worktree got switched mid-session by a stray run_next from a different intent. The recovery walked through `git stash pop` of `WIP on haiku/v4-engine-refactor`.
Studios
- `[classifier, <implementer>, feedback-assessor]`.
- `requires_design_direction: true`.
SPA
- `isIntentTerminal` (sealed_at + v3 fallback), `deriveUnitStatus` (iterations[]-based), `resolveWalkthroughForDetail` (tab-scoped fallback; fixes the advance/back UX bug Chris flagged).
- `migrated: true`.
- `isError: false` on timeout (continuation, not fault), per-session `announced_at` so retries don't re-post the URL.
Website (browse)
- `parseIntentFromRaw`, all three providers (local, github, gitlab).
- `deriveStageStatusFromUnits`, `deriveV4ActiveStage`.
- `npx tsx`.
Tests
- `run-all.mjs`: silent-test-loss guard + `node:test` output parsing (caught a hidden-coverage bug where 12 tests were invisible to the aggregator).
Skill audit
- `haiku_dashboard`, `haiku_capacity`, `haiku_reflect`. Smoke tests for previously-untested `haiku_intent_archive`/`unarchive`, `haiku_backlog`, `haiku_seed`, `haiku_release_notes`, `haiku_review`, `haiku_version_info`.
- `repair`: documented split-purpose under v4.
Known gaps (deferred)
- `.haiku/intents/` fixture run-through. All migrator + cursor fixtures are synthesized.
- `intent_completion_review` / `intent_completion_fix` cursor scenarios stubbed.
Test plan
- `cd packages/haiku && node test/run-all.mjs` shows 1244/1244
- `cd packages/haiku-ui && npx vitest run` shows 453/453
- `cd website && npx tsx lib/browse/__tests__/v4-derivation.test.ts` shows 20/20
- `cd packages/haiku && npx tsc --noEmit` clean (pre-existing v4-base errors notwithstanding)
- `cd packages/haiku-ui && npx tsc --noEmit` clean (modulo `routeTree.gen` codegen)
- `.haiku/intents/` from a current production repo) and verify the field-hygiene report is empty afterwards
- `migrated: true`
- `haiku_run_next` for a different intent from inside it, confirm refusal with `worktree_locked` block
🤖 Generated with Claude Code