Skip to content

feat(v4): cursor-driven engine refactor + UX hardening#323

Merged
jwaldrip merged 17 commits into
mainfrom
haiku/v4-engine-refactor
May 8, 2026
Merged

feat(v4): cursor-driven engine refactor + UX hardening#323
jwaldrip merged 17 commits into
mainfrom
haiku/v4-engine-refactor

Conversation

@jwaldrip
Copy link
Copy Markdown
Contributor

@jwaldrip jwaldrip commented May 7, 2026

Summary

Replaces the pre-v4 phase/status FSM with a cursor that walks per-unit frontmatter (iterations[] / reviews{} / approvals{}) instead of per-stage state.json. On top of the v4 base, this PR adds 9 numbered priorities (P1–P9) — each scoped to a single class of bug or UX pain surfaced in real session logs from the 2026-05-04 → 2026-05-06 run (588M tokens, 52h wall-clock, retry loops + branch hijacks + missing collaboration moments).

What landed

Engine

  • v0→v4 soft-scrub migrator (BFS-graph migrate-registry). Strips deprecated FM, stamps plugin_version, synthesizes terminal approvals for v3-completed units, relocates v3 FBs whose upstream_stage pointed elsewhere, preserves replies[] (caught a silent data-loss bug where the migrator's denylist included replies while v4 still reads them), tolerates malformed YAML per-file with structured warnings.
  • New field-hygiene.ts cruft detector for post-migration audit.
  • Cursor walks 26 lifecycle scenarios (was 13). Adds: design_direction_required hard gate (studio/stage-conditional), clarify_required gate at elaborate entry sourced from per-stage clarify/*.md, finished discovery_required (cursor.ts:419 was a stub) via readStageArtifactDefs + per-unit fm.discovery.<agent> records, classifier-first dispatch on unclassified user feedback, merge_stage transition, cross-stage FB priority, mid-wave noop, closed-FB invalidation re-route, reject_hat re-entry.
  • New haiku_feedback_set_targets MCP tool. Classifier hat lands user-FB triage (target_unit, target_invalidates, reasoning) once and only once — immutable post-classification.
  • Reply-on-closure: terminal haiku_feedback_advance_hat requires a reply string. Surfaces in SPA as a "Resolved" card with dismiss action; filter chip for unread replies.
  • Resolution dropdown removed from SPA. Agent classifies, not user.

P1–P9 (this session's priorities)

  • P1: file-backed dispatch DEFAULT for every haiku_run_next tick. Skip set is noop/sealed/error. Cuts 47K-char tool-result blobs.
  • P2: self-contained dispatch prompts. start_feedback_hat emits one per-FB subagent block with the canonical FB ID inlined. Kills the 6-retry ID-guessing loop.
  • P3: design-direction hard gate. STAGE.md requires_design_direction: true. pick_design_direction stamps design_directions.<stage> on intent.md. software/design opted in.
  • P4: clarify gate at elaborate entry. Sources questions from <studio>/stages/<stage>/clarify/*.md. Records clarifications.<stage> on intent.md. Stage-conditional. No autopilot bypass.
  • P5: research only (channels = recommended for server-push, Monitor + file polling = lean fallback). Citations in research subagent output.
  • P6: edit-auto-read-hint hook. PostToolUse on Edit/MultiEdit detects "file not read yet" and surfaces actionable Read+retry message.
  • P7: discovery_required wiring (was stubbed in v4 base).
  • P8: discovery template uniqueness validation. Studio loader throws on colliding location: fields.
  • P9: lock-aware branch enforcement. ensureOnStageBranch refuses checkout under a locked worktree. Closes the hijack class — this very worktree got switched mid-session by a stray run_next from a different intent. The recovery walked through git stash pop of WIP on haiku/v4-engine-refactor.

Studios

  • Classifier hat distributed to all 23 non-software studios (114 stages, 114 fix_hats updated). All software/* stages already had it. Sequence: [classifier, <implementer>, feedback-assessor].
  • design-stage opted in to requires_design_direction: true.

SPA

  • Dual-path for v3↔v4 status / phase / completion across 9 files.
  • New helpers: isIntentTerminal (sealed_at + v3 fallback), deriveUnitStatus (iterations[]-based), resolveWalkthroughForDetail (tab-scoped fallback — fixes the advance/back UX bug Chris flagged).
  • Schema indicator chip (v4 / v3) on IntentReview.
  • Migrated breadcrumb banner when intent or unit approvals carry migrated: true.
  • await_gate UX: 30min → 4h timeout, isError: false on timeout (continuation, not fault), per-session announced_at so retries don't re-post the URL.

Website (browse)

  • Dual-paths in parseIntentFromRaw, all three providers (local, github, gitlab).
  • New helpers: deriveStageStatusFromUnits, deriveV4ActiveStage.
  • Schema indicator chip on IntentDetailView.
  • 20 new derivation/integration tests via npx tsx.

Tests

  • Engine: 1244/1244 (was ~1130 at session start). 73 → 74 test files.
  • SPA: 453/453 (was 441).
  • New test files: cursor-walk (26 scenarios), v0-to-v4-realistic-scenario (11), v0-to-v4-migrator (4), feedback-set-targets (7), haiku-reflect-dual-path (8), field-hygiene (4), skill-tools-smoke (14), announcement-contract (7), await-gate-timeout-classifier (9), await-gate-announce-dedup (4), worktree-lock-guard (4), closure-reply-filter (6), plus SPA-side walkthrough/review-helpers/UnitsTable derivation.
  • run-all.mjs: silent-test-loss guard + node:test output parsing (caught a hidden-coverage bug where 12 tests were invisible to the aggregator).

Skill audit

  • Dual-path on haiku_dashboard, haiku_capacity, haiku_reflect. Smoke tests for previously-untested haiku_intent_archive/unarchive, haiku_backlog, haiku_seed, haiku_release_notes, haiku_review, haiku_version_info.
  • 4 skill docs updated (refine, revisit, adopt, repair).
  • repair: documented split-purpose under v4.

Known gaps (deferred)

  • No real-world .haiku/intents/ fixture run-through. All migrator + cursor fixtures are synthesized.
  • No end-to-end "create → run → seal" pipeline test.
  • Drift sweep cursor-emit path not scenario-tested.
  • intent_completion_review / intent_completion_fix cursor scenarios stubbed.
  • P5 channels / Monitor pattern researched, not built.

Test plan

  • cd packages/haiku && node test/run-all.mjs shows 1244/1244
  • cd packages/haiku-ui && npx vitest run shows 453/453
  • cd website && npx tsx lib/browse/__tests__/v4-derivation.test.ts shows 20/20
  • Type-check: cd packages/haiku && npx tsc --noEmit clean (pre-existing v4-base errors notwithstanding)
  • Type-check: cd packages/haiku-ui && npx tsc --noEmit clean (modulo routeTree.gen codegen)
  • Run a real v3 intent through the migrator (point at an existing .haiku/intents/ from a current production repo) and verify the field-hygiene report is empty afterwards
  • Open the SPA against a v4 intent — verify schema chip shows v4, no v3-only fields render
  • Open the SPA against a migrated v3 intent — verify migrated-from-v3 banner renders if any approval carries migrated: true
  • Verify worktree lock guard: lock a worktree, try to haiku_run_next for a different intent from inside it, confirm refusal with worktree_locked block

🤖 Generated with Claude Code

jwaldrip and others added 2 commits May 6, 2026 15:38
Skip archetype generation when the designer already has finished
designs. The picker now opens with no archetypes — the user either
uploads files (`mode: "upload"`, written to
`stages/<stage>/artifacts/design-direction/uploads/` and surfaced via
a new `design_direction_uploaded` action) or signals the agent to
generate variants (`mode: "generate"`). Generation only happens after
the explicit ask, so we don't burn tokens producing variants the user
is going to throw away.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the pre-v4 phase/status FSM with a cursor that walks per-unit
frontmatter (`iterations[]` / `reviews{}` / `approvals{}`) instead of
per-stage `state.json`. Adds 9 numbered priorities (P1–P9) on top of
the v4 base, each scoped to a single class of bug or UX pain surfaced
in real session logs.

Engine

- v0→v4 soft-scrub migrator (BFS-graph migrate-registry). Strips
  deprecated FM, stamps `plugin_version`, synthesizes terminal
  approvals for v3-completed units, relocates v3 FBs whose
  `upstream_stage` pointed elsewhere, preserves `replies[]` (caught
  silent data-loss bug), tolerates malformed YAML per-file.
- New `field-hygiene.ts` cruft detector for post-migration audit.
- Cursor walks 26 lifecycle scenarios (was 13). Adds:
  • design_direction_required hard gate (studio/stage-conditional)
  • clarify_required gate at elaborate entry, sourced from per-stage
    `clarify/*.md`
  • discovery_required (finished the cursor.ts:419 stub via
    readStageArtifactDefs + per-unit `fm.discovery.<agent>` records)
  • classifier-first dispatch on unclassified user feedback
  • merge_stage transition, cross-stage FB priority, mid-wave noop,
    closed-FB invalidation re-route, reject_hat re-entry
- New `haiku_feedback_set_targets` MCP tool. Classifier hat lands the
  user-FB triage decision (target_unit, target_invalidates,
  reasoning) once and only once — immutable post-classification.
- Reply-on-closure: terminal `haiku_feedback_advance_hat` requires a
  `reply` string; surfaces in SPA as a "Resolved" card with dismiss.
  Filter chip for unread replies in the FB summary bar.
- Resolution dropdown removed from SPA. Agent classifies, not user.
- File-backed dispatch is now the DEFAULT for every haiku_run_next
  tick (P1). Skip set is just noop/sealed/error. Cuts the 47K-char
  tool-result blobs that ate context.
- Self-contained dispatch prompts (P2): `start_feedback_hat` emits
  one per-FB subagent block with the canonical FB ID inlined into
  every tool call. No more `<FB-NN>` placeholders. Kills the 6-retry
  ID-guessing loop pattern.
- Lock-aware branch enforcement (P9): `ensureOnStageBranch` refuses
  to checkout under a locked worktree. Closes the hijack class that
  caused the 2026-05-06 incident where this very worktree got
  switched to a different intent's branch.
- Edit-auto-read-hint hook (P6): PostToolUse hook on Edit/MultiEdit
  detects "file not read yet" and surfaces an actionable Read+retry
  message with the file path inlined.

Studios

- Classifier hat distributed to all 23 non-software studios
  (114 stages, 114 fix_hats updated). software/* stages already had
  it. fix_hats sequence: [classifier, <implementer>, feedback-assessor].
- design-stage opted in to `requires_design_direction: true`.
- Discovery template uniqueness validation (P8): studio-loader
  refuses to load when two discovery templates within a stage share
  the same `location:` field.

SPA

- Dual-pathing for v3↔v4 status / phase / completion across 9 files.
- New helpers: `isIntentTerminal` (sealed_at + v3 fallback),
  `deriveUnitStatus` (iterations[]-based with v3 status fallback),
  `resolveWalkthroughForDetail` (tab-scoped fallback for off-tab
  browsing — fixes Chris Downard's advance/back UX bug).
- Schema indicator chip (v4 / v3) on IntentReview.
- Migrated breadcrumb banner when intent or unit approvals carry
  `migrated: true`.
- await_gate UX: 30min → 4h timeout, isError: false on timeout
  (continuation, not fault), per-session `announced_at` so retries
  don't re-post the URL.

Website (browse)

- Dual-paths for v4 in `parseIntentFromRaw`, all three providers
  (local, github, gitlab), search index inherits derivation via
  `intent.status`.
- New helpers: `deriveStageStatusFromUnits`, `deriveV4ActiveStage`.
- Schema indicator chip on IntentDetailView header.
- 20 new derivation/integration tests via `npx tsx`.

Tests

- Engine: 1244/1244 (was ~1130 at session start). 73 → 74 test files.
- SPA: 453/453 (was 441).
- New test files (this session): cursor-walk (26 scenarios),
  v0-to-v4-realistic-scenario (11), v0-to-v4-migrator (4),
  feedback-set-targets (7), haiku-reflect-dual-path (8),
  field-hygiene (4), skill-tools-smoke (14), announcement-contract (7),
  await-gate-timeout-classifier (9), await-gate-announce-dedup (4),
  worktree-lock-guard (4), closure-reply-filter (6), and SPA-side
  walkthrough/review-helpers/UnitsTable derivation tests.
- run-all.mjs: silent-test-loss guard + node:test output parsing
  (caught a hidden-coverage bug where 12 tests were invisible).

Skill audit

- Refresh + dual-path on `haiku_dashboard`, `haiku_capacity`,
  `haiku_reflect`. Smoke tests for previously-untested
  `haiku_intent_archive/unarchive`, `haiku_backlog`, `haiku_seed`,
  `haiku_release_notes`, `haiku_review`, `haiku_version_info`.
- 4 skill docs updated (refine, revisit, adopt, repair) for v4 wording.
- repair: documented split-purpose under v4 (drift-baseline rebuilds
  + worktree relocation are still relevant; metadata-cleanup paths
  are obsolete and no-op on v4 intents).

Known gaps (deferred):

- No real-world `.haiku/intents/` fixture run-through. All migrator
  + cursor fixtures are synthesized.
- No end-to-end "create → run → seal" pipeline test.
- Drift sweep cursor-emit path not scenario-tested (the sweep itself
  has unit tests).
- intent_completion_review / intent_completion_fix cursor scenarios
  are stubbed.
- P5 channels / Monitor pattern researched, not built. Channels
  (https://code.claude.com/docs/en/channels.md) is the recommended
  path for SPA-decision push when we get to it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

Claude finished @jwaldrip's task in 15m 3s —— View job


PR Review: feat(v4): cursor-driven engine refactor + UX hardening

  • Gather PR context and metadata
  • Review core engine changes (cursor.ts, run-tick.ts)
  • Review migration logic (v0-to-v4.ts, migrate-registry.ts)
  • Review new MCP tools and state-tools changes
  • Review SPA changes
  • Review security / correctness issues
  • Post final review summary

This is a large, architecturally significant PR. The cursor model is cleaner than the handler FSM it replaces. Several issues found range from a guaranteed post-ship breakage to data-loss-class migration bugs. Listed in priority order.


CRITICAL

1. Version bump after ship permanently blocks all v4 intents

File: packages/haiku/src/orchestrator/workflow/run-tick.ts:71

if (sourceVersion !== target) {   // exact string comparison
    migrateIntent(ctx, sourceVersion, target)   // throws if no edge

The registry (migrate-registry.ts) has exactly one edge: "0" → "4.0.0". CI auto-bumps the plugin version on every merge to main. The first merge after this PR ships (say 4.0.1) causes every intent with plugin_version: "4.0.0" to hit migrateIntent("4.0.0", "4.0.1"). findChain finds no edge → throws "no migration path from 4.0.0 to 4.0.1" → every tick on every v4 intent returns action: "error".

Fix: Change line 71 to compare majors only, not full semver:

const sourceMajor = Number(sourceVersion.split(".")[0] ?? "0")
if (sourceMajor !== targetMajor) {

Fix this →


2. result: "rejected" (v3 form) not normalized — migrated units re-dispatch forever

File: packages/haiku/src/orchestrator/migrations/v0-to-v4.ts, packages/haiku/src/orchestrator/workflow/cursor.ts:287

The realistic scenario test at test/v0-to-v4-realistic-scenario.test.mjs:176 shows v3 units writing result: "rejected" (past tense). The v4 cursor at cursor.ts:287 checks last.result === "reject" (present tense). The migrator preserves iterations[] as-is with no normalization pass.

A migrated unit whose last iteration has result: "rejected" falls through both "advance" and "reject" checks and hits return null (line 300) — treated as in-progress on the current hat. walkStageUnitsTrack returns null for this unit at step 3 (in-flight), preventing the wave from progressing. Every tick for this stage returns a noop. The unit is never re-dispatched but also never completed.

Fix (two options):

  1. In migrateUnitFile, add a normalization pass over iterations[]: if (iter.result === "rejected") iter.result = "reject".
  2. Or widen cursor.ts:287 to last.result === "reject" || last.result === "rejected" as a compat shim.

Option 1 is cleaner because it eliminates the drift at the source.

Fix this →


3. withIntentMainLock is imported and documented but never called — stage→main merges are unserialised

File: packages/haiku/src/state-tools.ts:62, packages/haiku/src/orchestrator/prompts/merge_stage.ts:28

$ grep -rn "withIntentMainLock" packages/haiku/src/
packages/haiku/src/state-tools.ts:62:          import { withIntentMainLock, ... }
packages/haiku/src/locks.ts:202:          export function withIntentMainLock<T>(...)
packages/haiku/src/orchestrator/prompts/merge_stage.ts:28:  "... the engine will perform the stage→main merge under `withIntentMainLock`..."

withIntentMainLock is imported in state-tools.ts, defined in locks.ts, and promised to the agent in merge_stage.ts:28 — but it is never called anywhere. The stage-branch-into-main merge runs without serialisation. Two parallel merge_stage ticks (e.g. from concurrent autopilot sessions or a retry during a slow merge) can both call mergeStageBranchIntoMain simultaneously, producing a merge in progress git error or silently clobbering each other's work.

Fix: Wrap mergeStageBranchIntoMain call-sites (wherever that function is invoked in run-tick.ts or the merge handler) in withIntentMainLock(slug, () => mergeStageBranchIntoMain(...)).


HIGH

4. close_feedback has no prompt builder — agent receives "Unknown Action" JSON dump

File: packages/haiku/src/orchestrator/workflow/cursor.ts:425, packages/haiku/src/orchestrator/prompts/index.ts:61–108

The cursor emits { kind: "close_feedback" } when a terminal fix-hat advances (cursor.ts:425). cursorActionToOrchestratorAction maps this to action: "close_feedback". The actionPromptBuilders map has no entry for "close_feedback". buildRunInstructions (orchestrator.ts:255–258) falls through to:

## Unknown Action: close_feedback
{ "kind": "close_feedback", "stage": "...", "feedback_id": "..." }

The agent has no explicit instruction on which tool to call. The cursor re-emits close_feedback on every tick until closed_at is stamped. The agent must self-recover from raw JSON — possible but fragile. Under retry pressure (P2 sessions), this becomes a multi-turn stall.

Fix: Register a "close_feedback" entry in actionPromptBuilders that instructs the agent to call haiku_feedback_update { intent, feedback_id, status: "closed" }.


5. isAlive(pid) catches EPERM as dead — premature lock-stealing on Windows and cross-user Linux

File: packages/haiku/src/locks.ts:57–64

function isAlive(pid: number): boolean {
    try {
        process.kill(pid, 0)
        return true
    } catch {          // ← catches ALL errors, including EPERM
        return false
    }
}

On Windows, process.kill(pid, 0) throws EPERM (not ESRCH) when the target process is alive but belongs to a different user account. The catch-all treats this as "dead" → isStale returns true → steal() deletes the lock while the holder is still running. Two different user sessions sharing a machine (common in CI environments with multiple agents) will steal each other's stage-merge locks.

Fix: Differentiate errors: ESRCH means dead, EPERM means alive-but-unpingable:

} catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM"
}

Fix this →


6. Stage[0] fallback when all stages merged runs drift/feedback tracks on completed work

File: packages/haiku/src/orchestrator/workflow/cursor.ts:737–759

const activeStage = firstUnmergedStage(slug, studio)
const stage = activeStage ?? resolveStudioStages(studio)[0] ?? ""

if (stage) {
    const drift = runDriftSweep({ intentDir, stage, studio })  // runs on stage[0]
    ...
}
if (stage) {
    const fbAction = walkFeedbackTrack({ intentDir, studio, currentStage: stage })  // walks 0..0
    ...
}
// "All stages merged" path is never reached if an open FB exists on stage[0]

When firstUnmergedStage returns null (all stages merged), stage falls back to the first stage. Track C runs a drift sweep on stage[0]'s outputs. Track B walks feedback from stages 0..0. Any lingering open FB on stage[0] (e.g. from a user-chat FB that was filed but not triaged) preempts the intent-level approvals track forever — the cursor never reaches line 774 (// All stages merged → intent-level approvals).

Fix: When activeStage === null, skip Tracks C and B entirely. The intent is past all stage work; only the intent-level approval walk should run.


MEDIUM

7. walkFeedbackTrack slices all stages when currentStage is unknown

File: packages/haiku/src/orchestrator/workflow/cursor.ts:363–365

When currentStage is not found in the studio's configured stage list (cutoff === -1), the slice falls back to stages.slice() — the entire stage list. The comment says it covers "stages 0..currentStage" but the fallback covers stages 0..N. This surfaces FBs from future stages (not yet started) if currentStage is an unknown value (e.g., after a studio reconfiguration with a renamed stage).

Fix: When cutoff === -1, return [] (walk no stages) rather than walking all stages.


8. Drift sweep resolves all output paths relative to intentDir — repo-relative outputs never checked

File: packages/haiku/src/orchestrator/workflow/drift-sweep.ts:205 (approx)

Output paths in unit.outputs[] come in two forms: intent-relative (stages/design/foo.md) and repo-relative (src/components/Button.tsx). The drift sweep does join(intentDir, out) for all paths. For repo-relative paths this resolves to <intentDir>/src/components/Button.tsx — a path that doesn't exist. The file is not found, no drift event is emitted, and the drift check silently passes for the most important artifacts (actual code files).

Fix: Distinguish intent-relative from repo-relative paths before resolving. Repo-relative paths (those that don't start with stages/) should be joined against repoRoot, not intentDir.


9. parseFbIdFromFilename fallback returns non-parseable IDs — causes infinite start_feedback_hat dispatch

File: packages/haiku/src/orchestrator/workflow/cursor.ts:438–441

function parseFbIdFromFilename(fbPath: string): string {
    const m = basename(fbPath).match(/^(\d+)-/)
    return m ? `FB-${m[1].padStart(2, "0")}` : basename(fbPath, ".md")

When a feedback file doesn't start with digits, the fallback returns the raw basename (e.g., orphan-note). The downstream tools (haiku_feedback_advance_hat, haiku_feedback_reject_hat) call findFeedbackFile which matches only ^(?:FB-)?(\d+)$. A bare orphan-note ID fails this → feedback_not_found error → the agent retries → cursor re-emits start_feedback_hat every tick.

Fix: In nextActionForFeedback, validate the parsed ID before returning an action. If parseFbIdFromFilename returns a value that doesn't match the expected pattern, skip the file (return null) rather than emitting an unresolvable action.


MINOR

10. Migration error path returns misleading track: "drift" with empty events

File: packages/haiku/src/orchestrator/workflow/run-tick.ts:87–89

When migration fails, the tick returns position: { track: "drift", action: { kind: "drift_detected", events: [] } }. A monitor or SPA reading position.track sees "drift with no events" — a benign no-op. The actual error is in action.action: "error". Consider returning track: "intent" (or a new "error" track) so the position reflects the error state, not a ghost drift.


11. migrateUnitFile line 243–245: no-op double-write of started_at

File: packages/haiku/src/orchestrator/migrations/v0-to-v4.ts:243

started_at is not in DEPRECATED_UNIT_FIELDS, so strip() preserves it unchanged. The if (next.started_at == null) { next.started_at = data.started_at ?? null } block is a no-op — next.started_at already equals data.started_at. Dead code that misleads readers into thinking started_at needs re-application after stripping.


Summary

# Severity File Description
1 Critical run-tick.ts:71 Exact semver comparison → all v4 intents break on next CI bump
2 Critical v0-to-v4.ts, cursor.ts:287 "rejected" not normalized → migrated units stuck as in-progress
3 Critical state-tools.ts:62 withIntentMainLock imported but never called — stage merges unserialised
4 High cursor.ts:425, prompts/index.ts close_feedback has no prompt builder → "Unknown Action" dump
5 High locks.ts:61 isAlive catches EPERM as dead → premature lock-stealing on Windows
6 High cursor.ts:737–759 Stage[0] fallback when all merged blocks intent-level approvals
7 Medium cursor.ts:363–365 Unknown stage → walks entire stage list for FBs
8 Medium drift-sweep.ts Repo-relative output paths resolved against wrong base
9 Medium cursor.ts:438–441 Non-numeric FB filenames produce unresolvable dispatch IDs
10 Minor run-tick.ts:87 Migration error returns misleading track: "drift"
11 Minor v0-to-v4.ts:243 No-op started_at double-write

Issues 1, 2, and 3 should be resolved before merging. Issue 1 in particular is a ticking clock — it activates on the first CI bump after this lands.

jwaldrip and others added 4 commits May 7, 2026 09:13
Checkpoint of in-flight v4 engine work before merging the
design-direction intake/upload PR (#322).

What's in this checkpoint:
- Cursor-driven engine refactor: derivePosition + Track A/B/C walks,
  firstUnmergedStage with strict-ahead semantic, body-sha256 witness
  model replacing baseline.json, drift-sweep dedup by source_ref.
- Filesystem-mode persistence first-class: stages_merged in
  intent.md as canonical merge signal when no git repo.
- All MCP elicitation removed; studio/mode/stage selection routes
  through SPA picker (createPickerSession + runPicker helper).
- Tick-driven select_*: run-tick.ts emits select_studio /
  select_mode / select_stage when fields missing; haiku_run_next
  intercepts and runs the picker inline so the agent sees a
  blocking tick, never "Call haiku_select_*" instructions.
- close_feedback / merge_stage / merge_intent auto-execute in
  haiku_run_next instead of bouncing back to the agent.
- E2E suites: software studio, multi-mode, feedback mid-flight,
  drift mid-flight, filesystem mode, squash-merge fallback,
  picker wire round-trip, SPA wire round-trip.

Next up (not yet done):
- Consolidate URL+await pairs (gate_review, visual question,
  design direction) into single blocking tool calls so the agent
  never sees "post URL + call await_*" two-step instructions.
- Pull in PR #322 (intake-first design picker + upload mode) and
  align the new wire shapes with the engine-side blocking pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in the intake-first design picker (PR #322) and aligns the
new wire shapes with the v4 cursor-driven engine.

What carried over verbatim:
- haiku-api schemas: DirectionUploadFile + DirectionUploadModeSchema
  + DirectionGenerateModeSchema discriminator branches.
- haiku-ui DirectionPage split into IntakePage + ArchetypePage with
  drag/drop, per-file caption, generate fallback.
- HTTP /direction/:id/select route handles upload mode via
  persistDesignDirectionUploads (decodes data URLs into
  stages/<stage>/artifacts/design-direction/uploads/).
- Studio docs (ARCHITECTURE.md, software/design ELABORATION.md) +
  ModalRouter blurb describe intake-first.
- New prompts: design_direction_required (rewritten for intake-first)
  + design_direction_uploaded.

What had to be ported (PR #322 modified handlers/elaborate.ts; v4
deleted that file because the cursor itself now emits dispatch
actions):

- cursor.ts: design-direction gate is now two-phase. After selection,
  the cursor's surface-once branch emits ONE
  `design_direction_complete` (archetype mode) or
  `design_direction_uploaded` (upload mode) action so the agent can
  Read screenshots / uploaded files before elaborate. Surfaced
  state tracked by `surfaced_at` on the same intent.md record.
- haiku_run_next.ts: stamps `surfaced_at` on intent.md right before
  returning the surface-once action so the next tick falls through
  to elaborate (mirrors the PR's design_direction_surfaced flag,
  but on intent.md FM instead of stage state.json).
- state-tools.ts persistDesignDirectionUploads: also stamps
  `design_directions[<stage>] = { mode: "upload", uploads, … }` on
  intent.md so the v4 cursor's gate check sees the selection
  (state.json is no longer authoritative in v4).
- state-tools.ts persistDesignDirectionSelection (archetype mode):
  the intent.md stamp now carries `mode: "archetype"`, comments,
  annotations, and `at` so the cursor can render the full payload
  on surface-once.

Tests:
- cursor-walk.test.mjs: existing fixtures stamping
  design_directions[stage] now also stamp `surfaced_at` so the
  walk tests assert on downstream gates (clarify, discovery)
  instead of the new surface-once branch.
- New cursor-walk cases: archetype-mode without surfaced_at →
  design_direction_complete; upload-mode without surfaced_at →
  design_direction_uploaded.
- haiku-api test suite: 182 passed.
- haiku-ui test suite: 456 passed.
- haiku focused tests (sad-paths + cursor-walk + multi-tick +
  picker-wire-round-trip + select-mode-stage-constraints +
  select-studio-prompt): 50 passed.

Still open (queued for the URL+await consolidation pass):
- pick_design_direction → haiku_await_design_direction is still a
  two-step (URL + await) pair. Will collapse into a single blocking
  call alongside gate_review and ask_user_visual_question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent used to see "post URL + call await tool" pairs for every
interactive surface — gate review, visual question, design direction.
Each pair was three turns minimum: tool 1 returns URL, agent posts it,
agent calls tool 2 which blocks. Half the time the agent forgot step 2
and the user clicked Approve into the void.

Now every interactive surface is a single blocking tool call. The
engine creates the session, launches the browser best-effort, and
blocks until the user submits. The await tools stay as resume entry
points for the case where the original blocking call timed out.

## ask_user_visual_question + haiku_await_visual_answer

- ask_user_visual_question now creates the session, launches the
  browser, and blocks on awaitVisualAnswerSession inline. Returns the
  answer + screenshot annotations directly.
- haiku_await_visual_answer becomes a thin resume wrapper around the
  same helper.
- Extracted awaitVisualAnswerSession() to share the wait + response
  builder between the two paths.

## pick_design_direction + haiku_await_design_direction

Same shape as visual: pick_design_direction inlines the await; the
old await tool is the resume entry. Extracted
awaitDesignDirectionSession() that handles all four submission modes
(select / regenerate / generate / upload).

## gate_review + haiku_await_gate

haiku_run_next's gate_review block used to return the URL and ask the
agent to call haiku_await_gate. Now it:

1. Prepares the session (creates it, writes the resume pointers to
   intent.md, stamps announced_at on first prepare).
2. Calls haiku_await_gate inline — same code path as the resume entry,
   so post-decision side effects (stampGateApproval,
   workflowAdvancePhase/Stage, writeReviewFeedbackFiles,
   sealIntentState, etc.) all go through the canonical handler.
3. Inspects the returned action name. For advance cases
   (advance_phase / advance_stage / intent_approved) it re-ticks so
   the cursor surfaces the natural-next workflow action. For
   terminal/feedback cases (intent_complete, external_review_requested,
   changes_requested, revise_unit_specs, stage_revisit) it returns
   the await response directly.

The gate_review handling is now a `while` loop, so a chain of gates
(e.g. final stage approval → intent_review pre-completion gate) can
all process in the same blocking tick — the agent sees one tool call
that returns the post-everything action.

Added extractActionFromAwaitResponse() helper that parses the action
name out of haiku_await_gate's rendered response.

## Tests

- Added design_direction_complete / design_direction_uploaded handlers
  to the e2e test scaffolding (e2e-software-studio +
  real-intent-dry-run). The scaffolding mirrors haiku_run_next's
  surface-once stamp by writing surfaced_at on the design_directions
  record.
- All 1340 haiku tests, 182 haiku-api tests, and 456 haiku-ui tests
  pass (1978 total). Type-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picker UI
- New /picker/:sessionId route in haiku-ui rendering studio (card grid
  + stage-chain preview), mode (cards with mini-timeline showing where
  the engine pauses), stage (simple list), confirm (two-button
  decision), and url_input (free-text input for the new external-review
  URL collection path).
- haiku-api carries PickerSessionPayloadSchema in the SessionPayload
  discriminated union plus PickerSelectRequest/Response schemas and
  /picker/:id GET + /picker/:id/select POST routes.
- session-api projects picker fields (kind / title / prompt / options /
  selection) into the wire response. ApiClient gained submitPicker().

External review URL — engine direction
- haiku_run_next's external_review_requested response now tells the
  agent to open the change request via gh / glab directly (the agent
  has CLI access; no need to ask the user). Fallback: ask the user in
  chat if gh/glab isn't reachable. The url_input picker stays
  available as a future fallback path.
- FeedbackSidebar gained gateAcceptsLocalApprove() — pure-external
  gates no longer render Approve (the merge IS the approval signal).
  Compound [external, ask] still renders both Approve + External
  buttons so the user can pick either path.

Dead code + migration cleanup
- Deleted prompts/gate_review.ts and its companion test
  (gate-review-prompt.test.mjs). Under v4 the gate_review action never
  reaches the agent — haiku_run_next inlines the await + post-decision
  processing in one tool call.
- Removed the gate_review entry from the prompt registry.
- v0-to-v4 migrator now scrubs pre-v4 baseline noise: deletes
  baseline.json, drift-markers.json, and baseline-content/ at both
  intent and per-stage scope. Added a migrator test covering both
  scopes.

Stale tool descriptions reframed
- haiku_await_gate, haiku_await_visual_answer, and
  haiku_await_design_direction descriptions now read as resume entry
  points, not "Call this AFTER X" two-step instructions. Canonical
  flow blocks inside haiku_run_next / ask_user_visual_question /
  pick_design_direction.

Sync surfaces refreshed (per CLAUDE.md sync rule)
- ArchitectureMap orchestrator-actor + webui-actor blurbs rewritten
  for engine-side blocking (no more URL+await two-step language). The
  actor talks_to lists now mention the SPA picker and engine-side
  gate review explicitly.
- plugin/studios/ARCHITECTURE.md action table: select_studio /
  select_mode / select_stage / gate_review entries describe
  engine-side blocking, not "Call haiku_select_*" / "Call
  haiku_await_gate".

Tests
- haiku 1336 / 1336 (added migrator baseline-cleanup case)
- haiku-api 182 / 182
- haiku-ui 456 / 456 (test mocks for ApiClient gained submitPicker)
- All type-checks clean
- Stray packages/haiku/.haiku/intents/picker-mockups/ scaffolding
  removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

Claude finished @jwaldrip's task in 11m 22s —— View job


PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening

  • Gather PR context and metadata
  • Check status of previously-flagged critical issues (1–3)
  • Review new commits since last review (4 new commits: picker SPA, URL+await collapse, design-direction port, select_* engine dispatch)
  • Check high/medium issues (4–9) from prior review
  • Full re-review: engine, migrations, MCP tools, SPA, security
  • Post final review summary

The new commits (31c6ee1580e77d) close two of the eleven prior issues (Issues #4 and #6), introduce the SPA picker flow and engine-side select_* dispatch, and introduce two new high-priority bugs. The three critical issues from the first review remain unaddressed.

Status of previously-flagged issues

# Severity Status Note
1 Critical STILL OPEN Exact semver comparison on run-tick.ts:82
2 Critical STILL OPEN "rejected" not normalized in migrator
3 Critical STILL OPEN withIntentMainLock never called (comment added on line 471 claims it is)
4 High FIXED ✓ close_feedback.ts added and registered in prompts/index.ts:80
5 High STILL OPEN isAlive bare catch on locks.ts:61
6 High FIXED ✓ Stage[0] fallback removed ("2026-05-06 P11 bug fix" comment in cursor.ts:927)
7 Medium STILL OPEN walkFeedbackTrack cutoff === -1 fallback to stages.slice()
8 Medium STILL OPEN Repo-relative output paths resolved against intentDir (now also in new sign-slot.ts:65)
9 Medium STILL OPEN parseFbIdFromFilename raw-basename fallback → unresolvable dispatch ID
10 Minor STILL OPEN Migration error returns track: "drift"
11 Minor STILL OPEN No-op started_at double-write in migrator

NEW issues from the latest commits

N1 (HIGH): close_feedback handler never finds feedback files

File: packages/haiku/src/tools/orchestrator/haiku_run_next.ts:392

for (const f of readdirSync(fbDir)) {
    if (!f.endsWith(".md")) continue
    if (!f.startsWith(`${fbId}-`) && !f.startsWith(`${fbId}.`)) continue  // ← bug

fbId is "FB-01" (from parseFbIdFromFilename). Real feedback files on disk are named 01-some-title.md (state-tools.ts:4972: const filename = \${nn}-${fileSlug}.md`). "01-some-title.md".startsWith("FB-01-")is alwaysfalse. fbFilestays empty →if (!fbFile) break→ feedback is never closed → cursor emitsclose_feedback` again on the next tick → infinite loop.

The tests in drift-mid-flight-e2e.test.mjs accidentally avoid this because they create files like FB-DRIFT-01-drift.md (the raw-basename fallback path in parseFbIdFromFilename), which do satisfy startsWith("FB-DRIFT-01-drift"). Production files created via haiku_feedback MCP will all hit the broken path.

Fix: Replace the manual readdirSync scan with the existing findFeedbackFile(slug, stage, fbId) from state-tools.ts:5335, which correctly normalises "FB-01" → numeric 1 → finds 01-*.md.

Fix this →


N2 (HIGH): select_* picker infinite loop on timeout

File: packages/haiku/src/tools/orchestrator/haiku_run_next.ts:100–133

async function runSelectionPicker(...): Promise<{ ok: true } | { ok: false; message: string }> {
    ...
    if (result.isError) { return { ok: false, ... } }
    return { ok: true }   // ← fires on timeout too
}

When a studio/mode/stage picker times out, haiku_select_studio.handle() returns text(JSON.stringify({action:"cancelled",...}))isError is undefined (the text() helper at _text.ts:7 doesn't set it). runSelectionPicker returns { ok: true }. The while (result.action === "select_studio" || ...) loop at line 302 re-dispatches, the cursor still emits select_studio (nothing was written), a new 30-minute picker session opens, and the loop runs again — indefinitely until the AbortSignal fires.

Fix: In runSelectionPicker, parse the response text for "action":"cancelled" and return { ok: false }:

const responseText = result.content?.find(c => c.type === "text")?.text ?? ""
if (responseText.includes('"action":"cancelled"')) {
    return { ok: false, message: "Picker timed out or was cancelled." }
}

Or, simpler: have the select tools return isError: true on timeout (consistent with the if (!selectedStudio) path at line ~162 of haiku_select_studio.ts which already returns isError: true).

Fix this →


N3 (MEDIUM): clarify_required prompt instructs an impossible write path

File: packages/haiku/src/orchestrator/prompts/clarify_required.ts:56

"3. Stamp the answers on `intent.md` via `haiku_human_write` ..."

haiku_human_write (tool-defs.ts:225 description) explicitly refuses intent.md: "Workflow-managed files (units, feedback, intent.md, state.json) are refused." And haiku_intent_set rejects clarifications because it's not in AGENT_AUTHORABLE_INTENT_FIELDS (intent.ts:105: Object.keys(INTENT_FRONTMATTER_SCHEMA.properties ?? {})). The P4 clarify gate cannot be satisfied — agents hitting it get an error on whichever tool they try.

Fix: Add clarifications to INTENT_FRONTMATTER_SCHEMA as an optional Type.Unknown() field so it passes the AJV schema and is included in AGENT_AUTHORABLE_INTENT_FIELDS. Then update the prompt to instruct haiku_intent_set { field: "clarifications", value: { [stage]: { answers, at } } }.


Carry-forward: still critical before merge

The three critical items from the prior review are still blocking. The comment at haiku_run_next.ts:471 even says "under the intent-main lock" but no lock is called — that's now actively misleading documentation on top of the underlying bug.

Issue Location What breaks
#1 run-tick.ts:82 All v4 intents fail on the first CI bump after landing (4.0.0 → 4.0.1 → "no migration path" error)
#2 v0-to-v4.ts + cursor.ts:312 Migrated units with v3-style result:"rejected" treated as in-progress forever
#3 haiku_run_next.ts:531, side-effects.ts:129/333 Stage-into-main merges are unserialised; concurrent haiku_run_next calls can clobber each other's merge commit
N1 haiku_run_next.ts:392 Production FB fix loops never close; close_feedback loops infinitely (test coverage only works by accident)
N2 haiku_run_next.ts:302 Picker timeout causes infinite re-open loop; effectively hangs any session where the user doesn't pick

Issues #1, #2, #3, N1, and N2 should all be resolved before merging.

Two real-world footguns in the discrete / external-review path that
broke pickup for User B (the "Pedro issue"):

1. Stage PRs landing on the repo default branch instead of
   `haiku/<slug>/main`. Agents would run `gh pr create` without
   --base, so the PR opened against `main`. Once merged, the cursor's
   `firstUnmergedStage` (which checks against intent main, not the
   repo default) kept the stage pinned and pickup wedged.

2. No graceful recovery when the misroute had already happened. User
   B fetches, sees the stage commits on `main` but not on
   `haiku/<slug>/main`, and the engine has no way to reconcile.

## Engine-side stage PR opening

`openStagePullRequest(slug, stage)` in `git-worktree.ts`:
- Pushes `haiku/<slug>/<stage>` to origin.
- Calls `gh pr create --base haiku/<slug>/main` (or
  `glab mr create --target-branch haiku/<slug>/main`).
- If gh/glab CLI fails or isn't on PATH, builds a provider-specific
  "open MR" URL the user can click:
    GitHub → /compare/<base>...<head>?expand=1 (auto-opens create form)
    GitLab → /-/merge_requests/new?merge_request[source_branch]=…&[target_branch]=… (pre-fills source + target)
- Returns a structured result with createdUrl / compareUrl / pushed /
  errors so callers can surface whichever path succeeded.

Wired into `haiku_await_gate.ts`'s external_review handling: when the
user clicks External Review at the gate, the engine opens the MR
itself with the right base, persists `external_review_url` on
intent.md so downstream pickups see it, and surfaces the URL to the
agent. The agent's instruction set no longer relies on the agent
remembering to pass `--base`.

`haiku_run_next.ts` external_review_requested handling does the same
when the action arrives without an URL already set (the cursor path,
not the gate-await path).

## Misrouted-merge reconciliation

`reconcileMisroutedStageMerges(slug, stages)` in `git-worktree.ts`:
- Walks each stage branch.
- Detects when the branch's commits exist on the repo mainline (local
  or `origin/<mainline>`) but not on `haiku/<slug>/main`.
- Fast-forwards intent main to mainline when safe (i.e. intent main
  is itself an ancestor of mainline). Pushes the FF to origin so
  other clones see the fix.
- Refuses to FF when intent main has divergent commits — surfaces a
  structured error with the exact `git checkout && git merge` recipe
  the operator needs to run.

Pre-cursor reconciliation step in `haiku_run_next.ts.handle()`: when
the intent has `external_review_url` set and we're in a git repo,
fetch from origin and run reconciliation BEFORE the cursor walks.
That's the "User B opens a fresh checkout, runs /haiku:pickup,
expects the engine to figure it out" path. Cleanly reconciled stages
are invisible to the agent (which is the right UX); divergent ones
surface a hard error with a fix recipe.

## Tests

Added `misrouted-stage-merge.test.mjs` covering all three paths:
- Happy path (merged into intent main → no reconciliation needed).
- Misroute + FF-safe (engine fast-forwards intent main, stage now
  appears merged to the cursor).
- Misroute + divergent intent main (engine refuses to FF, surfaces a
  manual-fix error message).

All 1339 haiku tests pass across 94 test files. Type-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

Claude finished @jwaldrip's task in 7m 1s —— View job


Claude finished @jwaldrip's task in ~15m ——


PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening


The latest commit (cf1142b) adds misrouted-stage-merge recovery and engine-side PR opening in haiku_await_gate. Three of the prior eleven flagged issues are now fixed. Eight remain open, including all three criticals. One new medium issue is introduced by the latest commit.

Status of previously-flagged issues

# Severity Status Note
1 Critical STILL OPEN run-tick.ts:82 exact semver compare
2 Critical STILL OPEN "rejected" not normalized in migrator
3 Critical STILL OPEN withIntentMainLock still never called
4 High FIXED ✓ close_feedback.ts prompt builder added
5 High STILL OPEN isAlive bare catch in locks.ts:61
6 High FIXED ✓ Stage[0] fallback removed from cursor
7 Medium STILL OPEN walkFeedbackTrack cutoff === -1 walks all stages
8 Medium STILL OPEN Drift sweep resolves repo-relative paths against intentDir
9 Medium STILL OPEN parseFbIdFromFilename raw-basename fallback → unresolvable ID
10 Minor STILL OPEN Migration error returns track: "drift"
11 Minor STILL OPEN No-op started_at double-write in migrator
N1 High STILL OPEN close_feedback handler never finds feedback files
N2 High FIXED ✓ Picker timeout now returns isError: true correctly
N3 Medium STILL OPEN clarify_required prompt writes to haiku_human_write + field not in AGENT_AUTHORABLE_INTENT_FIELDS

CRITICAL — unchanged, still blocking

1. Exact semver comparison still breaks all v4 intents on CI bump

File: packages/haiku/src/orchestrator/workflow/run-tick.ts:82

targetMajor was computed on line 75 but is only used for the outer if (targetMajor >= 4) gate (line 77). The migration guard on line 82 is still an exact string compare:

if (sourceVersion !== target) {          // ← "4.0.0" !== "4.0.1" → triggers
    migrateIntent(ctx, sourceVersion, target)  // no path → throws

The registry has exactly one edge: "0" → "4.0.0". The first CI auto-bump after this PR lands (e.g. 4.0.0 → 4.0.1) makes every v4 intent throw "no migration path from 4.0.0 to 4.0.1" on every tick. This is a ticking clock.

Fix: Change line 82 to compare majors:

const sourceMajor = Number(sourceVersion.split(".")[0] ?? "0") || 0
if (sourceMajor !== targetMajor) {

Fix this →


2. result: "rejected" (v3 form) still not normalized — migrated units stuck forever

File: packages/haiku/src/orchestrator/migrations/v0-to-v4.ts, packages/haiku/src/orchestrator/workflow/cursor.ts:312

cursor.ts:312 checks last.result === "reject" (present tense). v3 units used result: "rejected" (past tense). migrateUnitFile copies iterations[] as-is (next.iterations = data.iterations ?? []) with no normalization pass. A migrated unit whose last iteration has result: "rejected" falls through both checks and returns null — treated as in-flight, blocking the wave every tick.

Fix: In migrateUnitFile (after the iterations copy), add:

for (const iter of next.iterations as Array<Record<string, unknown>>) {
    if (iter.result === "rejected") iter.result = "reject"
    if (iter.result === "advanced") iter.result = "advance"  // belt+suspenders
}

Fix this →


3. withIntentMainLock still never called — stage→main merges unserialised

File: packages/haiku/src/tools/orchestrator/haiku_run_next.ts:601-604

withIntentMainLock is defined in locks.ts:202, imported in state-tools.ts:62, and promised to the agent in two prompt files (merge_stage.ts:28, merge_intent.ts:28). The while (result.action === "merge_stage") loop at line 568 calls:

const mergeOutcome = mergeStageBranchIntoMain(slug, stageToMerge)

…with no lock. Two concurrent haiku_run_next calls targeting the same intent produce a race on the stage→main merge.

Fix: In haiku_run_next.ts, import withIntentMainLock from ../../locks.js and wrap the mergeStageBranchIntoMain call:

const mergeOutcome = withIntentMainLock(slug, () =>
    mergeStageBranchIntoMain(slug, stageToMerge)
)

HIGH — still blocking

N1. close_feedback handler never finds FB files — infinite loop on every production FB

File: packages/haiku/src/tools/orchestrator/haiku_run_next.ts:460-462

parseFbIdFromFilename (cursor.ts:568) produces "FB-01" from a file named 01-slug.md. The close handler matches:

if (!f.startsWith(`${fbId}-`) && !f.startsWith(`${fbId}.`)) continue

"01-slug.md".startsWith("FB-01-")false. fbFile stays empty → break exits without closing → cursor re-emits close_feedback on the next tick → infinite loop. Tests pass accidentally because test fixtures use non-standard FB-DRIFT-01-... naming which happens to satisfy the prefix check.

Fix: Replace the manual readdirSync scan with the existing findFeedbackFile(slug, stage, fbId) from state-tools.ts, which correctly normalises "FB-01" → numeric 1 → matches 01-*.md. Also update the drift test fixtures to use production-standard naming so this stays caught.

Fix this →


5. isAlive catches EPERM as dead — premature lock-stealing on Windows and cross-user Linux

File: packages/haiku/src/locks.ts:57-63

function isAlive(pid: number): boolean {
    try {
        process.kill(pid, 0)
        return true
    } catch {      // ← catches EPERM as dead
        return false
    }
}

Fix: ESRCH = dead, EPERM = alive-but-unpingable:

} catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM"
}

Fix this →


NEW — introduced by the latest commit (cf1142b)

N5. reconcileMisroutedStageMerges may leave worktree on wrong branch on unexpected throw

File: packages/haiku/src/git-worktree.ts:732-785

After execFileSync("git", ["checkout", intentMain]) succeeds and restoreBranch is captured, any unhandled exception inside the outer try block before the if (restoreBranch) restore block causes the outer catch to fire — but restoreBranch goes out of scope and git checkout <originalBranch> never runs. The agent's next operation then runs on haiku/<slug>/main instead of the stage branch, where all writes land on the wrong branch.

In practice this requires an uncaught throw after checkout, which is rare (push errors return result objects), but the pattern should use finally to guarantee the restore:

if (restoreBranch) {
    try {
        execFileSync("git", ["checkout", restoreBranch], { stdio: "pipe" })
    } catch { /* non-fatal */ }
}

…should be in a finally {} wrapping the whole operation, not inside the try block.

Fix this →


MEDIUM — carry-forward open

7. walkFeedbackTrack walks all stages when currentStage is not found

File: packages/haiku/src/orchestrator/workflow/cursor.ts:493-495

const cutoff = stages.indexOf(currentStage)
const stagesUpToCurrent =
    cutoff >= 0 ? stages.slice(0, cutoff + 1) : stages.slice()  // walks ALL stages

When currentStage is unrecognised (renamed stage, misconfigured studio), stages.slice() returns the full list and the feedback walk surfaces FBs from future stages.

Fix: Return [] when cutoff === -1 — walk no stages when the current stage can't be located.

Fix this →


8. Drift sweep joins repo-relative output paths against intentDir — check silently passes

File: packages/haiku/src/orchestrator/workflow/drift-sweep.ts:224

const outAbs = join(args.intentDir, outRel)

Unit outputs like src/components/Button.tsx resolve to <intentDir>/src/components/Button.tsx, which doesn't exist. fileSha256(outAbs) returns null → continue → no drift event → drift check silently passes for the most important code artifacts.

Fix: Paths that don't start with stages/ are repo-relative and should resolve against args.repoRoot, not args.intentDir.


9. parseFbIdFromFilename fallback returns raw basename — causes unresolvable dispatch IDs

File: packages/haiku/src/orchestrator/workflow/cursor.ts:568-571

function parseFbIdFromFilename(fbPath: string): string {
    const m = basename(fbPath).match(/^(\d+)-/)
    return m ? `FB-${m[1].padStart(2, "0")}` : basename(fbPath, ".md")
}

The raw-basename fallback (e.g., orphan-note) passes findFeedbackFile's ^(?:FB-)?(\d+)$ check and causes feedback_not_found on every tick. The agent retries, the cursor re-emits, infinite loop.

Fix: Return null when the filename doesn't match the expected pattern; in nextActionForFeedback, skip the file if parseFbIdFromFilename returns null.


N3. clarify_required prompt instructs a permanently-blocked write path

File: packages/haiku/src/orchestrator/prompts/clarify_required.ts:56

"3. Stamp the answers on `intent.md` via `haiku_human_write`..."

Two failures:

  1. haiku_human_write explicitly refuses intent.md (state-tools.ts comment line 32: "Workflow-managed files... are refused").
  2. clarifications is not in INTENT_FRONTMATTER_SCHEMA.properties and therefore not in AGENT_AUTHORABLE_INTENT_FIELDS (intent.ts:105). haiku_intent_set { field: "clarifications" } returns the "Field not in INTENT_FRONTMATTER_SCHEMA" error.

An agent hitting clarify_required cannot satisfy it. The gate is permanently stuck.

Fix (two parts):

  1. Add clarifications: Type.Optional(Type.Unknown()) to INTENT_FRONTMATTER_SCHEMA in packages/haiku/src/state/schemas/intent.ts.
  2. Update the prompt to use haiku_intent_set { field: "clarifications", value: { [stage]: { answers, at } } }.

Fix this →


Summary

Issue Severity Status File
#1 Critical OPEN run-tick.ts:82 — semver exact compare → all v4 intents break on next CI bump
#2 Critical OPEN v0-to-v4.ts + cursor.ts:312"rejected" not normalized → migrated units stuck
#3 Critical OPEN haiku_run_next.ts:601withIntentMainLock never called
N1 High OPEN haiku_run_next.ts:462close_feedback never finds files; production infinite loop
#5 High OPEN locks.ts:61isAlive bare catch steals live locks on Windows
N5 Medium NEW git-worktree.ts:732 — restore after checkout not in finally; wrong-branch writes on throw
#7 Medium OPEN cursor.ts:495 — cutoff === -1 walks all stages for FBs
#8 Medium OPEN drift-sweep.ts:224 — repo-relative outputs joined against intentDir
#9 Medium OPEN cursor.ts:568 — raw-basename fallback in parseFbIdFromFilename
N3 Medium OPEN clarify_required.ts:56 — writes via blocked path; field not in schema
#4 High FIXED ✓ close_feedback prompt builder added
#6 High FIXED ✓ Stage[0] fallback removed
N2 High FIXED ✓ Picker timeout returns isError: true
#10–11 Minor OPEN Cosmetic

Issues #1, #2, #3, and N1 must be resolved before merging. #1 activates on the first CI bump after this lands. N1 affects every production fix-loop closure.

…boundary

Matt (designer) tried to upload `Travel_Delivery_Example_Header_Only.html`
to the knowledge uploader and got "File type not accepted." That
allowlist (.md .pdf .png .jpg .jpeg .svg .txt) rejected real artifacts:
designers exporting .html mockups from Sketch / Figma, researchers
attaching .docx / .xlsx / .csv notes, anyone pasting a .json data
bundle.

The upload-side allowlist + extension blocklist on the knowledge route
were defense against V-01 (stored XSS via uploaded HTML rendering in
the privileged tunnel origin). That defense moved to serve time when
serveFile (http/path-safety.ts) gained the inverted MIME map: anything
not in SAFE_INLINE_MIME_TYPES gets `application/octet-stream` +
`Content-Disposition: attachment`, so the reviewer's browser downloads
unknown types instead of executing them. Once that landed, the
upload-side rejection became redundant friction.

What this commit does:
- KnowledgeDropZone DEFAULT_ACCEPT changes from a fixed extension list
  to "*" — accepts any file. The size cap (10 MB) is the only real
  limit. The "drop files here" caption renders "any file type · max
  10 MB each" in this mode.
- Knowledge upload route in http/upload-routes.ts drops both the
  BLOCKED_EXTENSIONS check and the ALLOWED_MIMES_KNOWLEDGE allowlist.
  The route accepts any file; serveFile downgrades dangerous MIMEs at
  serve time.
- Stage-output uploads keep their existing allowlist — those files are
  intentionally rendered inline in the SPA review UI and the inverted
  MIME map alone isn't enough cover.

Tests:
- upload-routes.test.mjs: replaced 4 "knowledge: X rejected with 415"
  cases with positive "X uploads successfully and lands on disk"
  assertions covering .html (Sketch export shape), .svg, .js (snippet
  reference), and octet-stream payloads.
- red-team-unit-01-upload-bypass.test.mjs: R-05 was the V-01 bypass
  via .js + octet-stream on the knowledge route. Same conversion —
  upload now succeeds (200) and the test documents that V-01 closure
  moved to serveFile, with a comment guarding against reintroducing
  the upload-side guard.
- 1336 / 1336 haiku tests green; UI tests 456 / 456; type-check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

Claude finished @jwaldrip's task in 4m 6s —— View job


PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening

  • Gather PR context and prior review history
  • Review new commits since last review (869c445, cf1142b)
  • Verify status of all previously-flagged issues
  • Full re-scan of engine, migration, MCP tools changes
  • Post final review summary

Two new commits reviewed (cf1142b misrouted-merge recovery, 869c445 knowledge upload filter drop). The upload security change is sound — serveFile's inverted MIME map is verified in packages/haiku/src/http/path-safety.ts:174-181 and the target_filename path-traversal guard is at upload-routes.ts:887-895. No new security issues introduced.

Six of eleven previously-flagged issues are still open, including all three criticals and the highest-impact high (N1). Status below.


Status of all previously-flagged issues

# Severity Status File
1 Critical STILL OPEN run-tick.ts:82 — exact semver compare
2 Critical STILL OPEN v0-to-v4.ts + cursor.ts:312"rejected" not normalized
3 Critical STILL OPEN haiku_run_next.ts:604withIntentMainLock never called
4 High FIXED ✓ close_feedback.ts prompt builder added
5 High STILL OPEN locks.ts:61isAlive bare catch steals live locks
6 High FIXED ✓ Stage[0] fallback removed from cursor
7 Medium STILL OPEN cursor.ts:495 — cutoff === -1 walks entire stage list
8 Medium STILL OPEN sign-slot.ts:65 + drift-sweep.ts:224 — repo-relative outputs never witnessed
9 Medium STILL OPEN cursor.ts:568 — raw-basename fallback in parseFbIdFromFilename
10 Minor STILL OPEN run-tick.ts:99 — migration error returns track: "drift"
11 Minor STILL OPEN v0-to-v4.ts:247 — no-op started_at double-write
N1 High STILL OPEN haiku_run_next.ts:462close_feedback never finds files
N2 High FIXED ✓ Picker timeout returns isError: true
N3 Medium STILL OPEN clarify_required.ts:56 — writes via haiku_human_write; field not in schema
N5 Medium STILL OPEN git-worktree.ts:775restoreBranch restore not in finally

CRITICAL — still blocking merge

1. Exact semver compare still breaks all v4 intents on first CI bump

packages/haiku/src/orchestrator/workflow/run-tick.ts:82

targetMajor is computed on line 75 but line 82 still compares full version strings:

if (sourceVersion !== target) {          // "4.0.0" !== "4.0.1" → triggers
    migrateIntent(ctx, sourceVersion, target)  // no path → throws

The registry has one edge: "0" → "4.0.0". The first CI bump after this lands (4.0.0 → 4.0.1) causes every v4 intent to throw "no migration path" on every tick. This is a ticking clock — it activates on the first merge to main after this PR.

Fix: Change line 82 to:

const sourceMajor = Number(sourceVersion.split(".")[0] ?? "0") || 0
if (sourceMajor !== targetMajor) {

Fix this →


2. result: "rejected" still not normalized — migrated units stuck as in-progress forever

packages/haiku/src/orchestrator/migrations/v0-to-v4.ts:247, packages/haiku/src/orchestrator/workflow/cursor.ts:312

cursor.ts:312 checks last.result === "reject" (present tense). v3 units used result: "rejected" (past tense). Lines 246–247 copy iterations[] as-is with no normalization pass. A migrated unit whose last iteration has result: "rejected" falls through both the "advance" and "reject" branches and returns null — treated as in-flight, blocking the wave every tick with no re-dispatch.

Fix: After next.iterations = data.iterations ?? [] in migrateUnitFile, add:

for (const iter of next.iterations as Array<Record<string, unknown>>) {
    if (iter.result === "rejected") iter.result = "reject"
    if (iter.result === "advanced") iter.result = "advance"
}

Fix this →


3. withIntentMainLock still never called — concurrent merge_stage ticks can clobber each other

packages/haiku/src/tools/orchestrator/haiku_run_next.ts:604

withIntentMainLock is defined in locks.ts:202, imported (unused) in state-tools.ts:62, and documented in merge_stage.ts and merge_intent.ts prompt files. The actual call at line 604 is:

const mergeOutcome = mergeStageBranchIntoMain(slug, stageToMerge)

No lock wraps this. Two concurrent haiku_run_next calls on the same intent trigger two simultaneous stage-into-main merges, producing a merge in progress git error or silently clobbering each other's work. The comment in the prompt file says "under withIntentMainLock" — that's false documentation.

Fix: Import withIntentMainLock from ../../locks.js in haiku_run_next.ts and wrap the call:

const mergeOutcome = await withIntentMainLock(slug, () =>
    mergeStageBranchIntoMain(slug, stageToMerge)
)

HIGH — still blocking

N1. close_feedback handler never finds files in production — infinite loop

packages/haiku/src/tools/orchestrator/haiku_run_next.ts:462

parseFbIdFromFilename (cursor.ts:568) produces "FB-01" from a file named 01-slug.md. The close handler at line 462 then checks:

if (!f.startsWith(`${fbId}-`) && !f.startsWith(`${fbId}.`)) continue
// "01-slug.md".startsWith("FB-01-") → false → always skips

fbFile stays empty → break exits without stamping closed_at → cursor re-emits close_feedback on the next tick → infinite loop. Every production fix-loop closure is stuck. The tests pass accidentally because their fixtures use non-standard FB-DRIFT-01-* naming that happens to match.

Fix: Replace the readdirSync scan with the existing findFeedbackFile(slug, stage, fbId) from state-tools.ts which correctly normalises "FB-01" → numeric 1 → matches 01-*.md. Also update the test fixtures to use production-standard 01-*.md naming so this doesn't silently regress.

Fix this →


MEDIUM — should fix before merge

5. isAlive bare catch steals live locks on Windows / cross-user Linux

packages/haiku/src/locks.ts:61

} catch {      // catches ESRCH (dead) AND EPERM (alive-but-cross-user)
    return false
}

On Windows and cross-user Linux, process.kill(pid, 0) throws EPERM when the target process is alive but owned by a different account. The bare catch treats this as dead → isStale() returns true → lock stolen while the holder is still running. Common in CI environments with multiple agent users.

Fix: } catch (err) { return (err as NodeJS.ErrnoException).code === "EPERM" }

Fix this →


N5. restoreBranch checkout not in finally — wrong-branch writes on unexpected throw

packages/haiku/src/git-worktree.ts:775

reconcileMisroutedStageMerges checks out intentMain at line 741 and captures restoreBranch. The restore at line 775–780 is inside the outer try (lines 732–782). If pushBranchToOrigin(intentMain) at line 769 throws an unhandled exception, the outer catch at line 782 fires and the restore is skipped — leaving the worktree on intentMain instead of the stage branch. The agent's next operation then writes to the wrong branch.

Fix: Move the if (restoreBranch) block into a finally {} that wraps the whole operation.

Fix this →


7. walkFeedbackTrack walks all stages when current stage is unknown

packages/haiku/src/orchestrator/workflow/cursor.ts:494–495

const cutoff = stages.indexOf(currentStage)
const toWalk = cutoff >= 0 ? stages.slice(0, cutoff + 1) : stages.slice()  // walks ALL stages

When currentStage is unrecognised (renamed stage, misconfigured studio), stages.slice() returns the full list and surfaces FBs from future stages not yet started.

Fix: Return [] when cutoff === -1.


8. Repo-relative outputs never witnessed — drift on real code files silently passes

packages/haiku/src/orchestrator/workflow/sign-slot.ts:65, packages/haiku/src/orchestrator/workflow/drift-sweep.ts:224

Both buildOutputWitnesses (sign-slot.ts:65) and the drift check (drift-sweep.ts:224) join output paths against intentDir. A unit with outputs: ["src/components/Button.tsx"] resolves to <intentDir>/src/components/Button.tsx — a path that doesn't exist. The file is never added to the witnesses map at sign time; the drift sweep finds no entry; drift in the actual src/components/Button.tsx is permanently invisible.

Fix: Distinguish intent-relative paths (those starting with stages/) from repo-relative paths. Repo-relative paths should be joined against repoRoot, not intentDir.


9. parseFbIdFromFilename raw-basename fallback creates unresolvable dispatch IDs

packages/haiku/src/orchestrator/workflow/cursor.ts:568–570

return m ? `FB-${m[1].padStart(2, "0")}` : basename(fbPath, ".md")
// non-numeric file → "orphan-note" → findFeedbackFile rejects → infinite dispatch

Fix: Return null when the filename doesn't match; skip in nextActionForFeedback.


N3. clarify_required prompt writes via permanently-blocked path

packages/haiku/src/orchestrator/prompts/clarify_required.ts:56

The prompt instructs agents to call haiku_human_write on intent.md. Two independent blockers:

  1. haiku_human_write explicitly refuses workflow-managed files including intent.md.
  2. clarifications is not in INTENT_FRONTMATTER_SCHEMA.properties (intent.ts:105), so haiku_intent_set { field: "clarifications" } also returns an error.

An agent hitting clarify_required cannot satisfy it — the gate is permanently stuck.

Fix (both parts required):

  1. Add clarifications: Type.Optional(Type.Record(Type.String(), Type.Unknown())) to INTENT_FRONTMATTER_SCHEMA in packages/haiku/src/state/schemas/intent.ts.
  2. Update the prompt to use haiku_intent_set { field: "clarifications", value: { [stage]: { answers, at } } }.

Fix this →


Summary

Issues #1, #2, #3, and N1 must be resolved before merging. #1 is a ticking clock — it activates on the first CI bump after this lands, at which point every v4 intent in every production repo returns action: "error" on every tick. N1 means every fix-loop closure loops infinitely in production.

The two latest commits (misrouted-merge recovery and upload filter drop) are sound. The upload change correctly moves the XSS boundary to serveFile and the path-traversal guard at upload-routes.ts:887 remains in place.

Issues #5, N5, #7, #8, #9, N3 are lower-urgency but each represents a real operational failure mode that will surface in practice.

jwaldrip and others added 3 commits May 7, 2026 15:21
…te repair

Hardens the v4 engine against failure modes surfaced by debugging real
sessions (Matt's designer pickup, the overtime-ac product stage, Mike's
quality-gate loop). Single bundle because the fixes touch the same set
of dispatch / cursor / lifecycle paths.

ENGINE
- Cursor Track B walks merged stages too. Pre-fix: once every stage
  merged, FBs filed on prior stages were silently ignored and the
  pipeline sealed over them. New e2e regression pins both
  interpretations of "user on stage 4 leaves FB on stage 1."
- Drift sweep is FM-aware. New `outputSha256` body-hashes
  markdown/text outputs (FM stripped) and full-file-hashes binaries.
  `buildOutputWitnesses` and the comparator both route through it.
  Backward-compat: dual-strategy comparator accepts legacy whole-file
  witnesses so in-flight intents don't false-positive on first sweep.
- Model routing cascade wired into every dispatch site that emits an
  LLM subagent: start_unit_hat, start_feedback_hat, dispatch_review,
  dispatch_approval, discovery_required, review_fix.
  dispatch_quality_gates is intentionally exempt (engine tool, no
  LLM). Studio default_model: sonnet now flows to fix-hat dispatches
  instead of being silently overridden by the parent's Opus default.
- Feedback hat-rejection escalates the model tier (sonnet→opus) on
  the FB FM, mirroring the unit-hat reject path. Next bolt picks up
  the bumped tier automatically.
- haiku_unit_set lifecycle exemption for `quality_gates`. Mirrors
  `outputs`: gate definitions are check specs, not workflow state, so
  letting completed units repair them doesn't violate forward-only.
  Closes Mike's loop: agent had been telling users "edit the file
  outside Claude Code" because no MCP path existed.
- fix_quality_gates prompt rewritten to point agents at the
  haiku_unit_set repair path with the exact call signature, splitting
  "code issue" vs "gate definition issue" failure shapes.

REVIEW SPA
- FeedbackItem renders iterations[] as a collapsed disclosure under
  the closure_reply card. Reviewers see the chain of hats that fired,
  each with result + reason + truncated commit, instead of just a
  bare "closed" status.
- session-api scrubs body_sha256 + witnesses[] from the FM projection
  before the SPA sees it. The cursor needs them on disk for drift;
  reviewers don't need to see "scary sha artifacts" (Matt's session).
- Click-to-flash boundary documented + tested. deriveExistingAnchors
  exported with a comment block making the persistent-vs-flash split
  explicit. Closed FBs stay out of the saved highlight layer (no
  clutter) but still flash on click.
- haiku_feedback MCP tool accepts inline_anchor at create time.
  Adversarial-review and studio-review hats can now attach an excerpt
  the SPA flashes on click — same flow as user-authored anchored
  comments. Strict sub-schema gates the shape at the input layer.

TESTS
- 1352 haiku tests, 465 SPA tests, all green (was 1336 + 462).
- New: cross-stage-fb-rewalk (2 e2e), feedback-model-routing (4),
  unit-hat-model-routing (2), scrub-witness-fields (1),
  FeedbackItem.iterations (4), deriveExistingAnchors (5).
- Existing test files extended: drift-no-false-positives (3 backward-
  compat scenarios), feedback (2 inline_anchor cases), state-tools-
  handlers (3 lifecycle exemption + YAML round-trip).

Followup tracked: #324 (inline file diff in review
pane, deferred per scope discussion).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…B + units

Two surfaces, same migration shape. Both keep working for in-flight
intents authored with the legacy 2-digit width.

feedback_id (the bigger change):
- Schema: `feedback_id: integer (1..999)`. No more `FB-NN` string
  guessing on input. The handler returns canonical `FB-NNN` in
  responses.
- New helpers: `formatFeedbackId(num)` and
  `deriveFeedbackIdFromFilename(filename)` — single source of truth
  for the on-disk → wire-format translation.
- Lookup is numeric-prefix match against `^(\d+)-`, so legacy
  `08-foo.md` files resolve when the agent passes `feedback_id: 8`.
  New writes pad to 3 digits (`008-foo.md`).
- Cross-stage `haiku_feedback_move` (panda's v3 regression) lands
  with a regression test pinning the cross-stage relocation
  contract.
- All 8 FB handlers unified on `feedbackId` local var (was a mix of
  `feedbackId` / `fbId`).

Units (smaller, narrower scope — slug stays human-readable for the
depends_on graph):
- `unitPath()` now resolves width-flexibly: if the requested name
  doesn't match an exact file, fall back to `(numeric prefix, slug)`
  match. Both `state-tools.ts` and `state/shared.ts` carry the
  identical implementation since both export `unitPath`.
- `validateUnitFrontmatter` depends_on resolver uses the same
  width-flexible match — a fresh 3-digit unit can declare
  `depends_on: [unit-001-foo]` against a 2-digit on-disk sibling
  (and vice versa) and validate cleanly.
- Engine prompts now teach `unit-NNN-slug.md` (3-digit pad, max 999)
  with an explicit migration note that legacy 2-digit names still
  resolve.

Tests:
- `numeric-id-migration.test.mjs` — pins FB read with 2-digit and
  3-digit on-disk forms, FB write produces NNN-padded files, and
  unitPath resolves both directions across the width boundary.
  Includes a depends_on width-flexible cross-reference test.
- `cross-stage-feedback-move.test.mjs` — locks in panda's v3
  cross-stage move regression as fixed under v4.
- `_v4-fixtures.mjs` — `makeFeedback` now accepts numeric / digit-
  string / FB-NN ids, derives the trailing integer, and writes
  3-digit padded filenames. Permissive parser supports legacy
  `FB-DRIFT-NN` test ids by extracting trailing digits.
- All 38 affected test files updated (perl bulk-substitution then
  per-file fixups for assertions on the canonical `FB-NNN` shape).

Sweep: 1362 passed, 0 failed across 100 test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…related intent dirs

The 2026-05-07 incident: the user noticed `.haiku/intents/cowork-mcp-apps-integration/`
files (`action-log.jsonl`, `.last_action.json`, `stages/development/`) showing up
as untracked in `git status` even on `main` branch. The intent dir wasn't
hardcoded anywhere, no hook or migration script was at fault — the engine
itself was targeting it.

Root cause (`haiku_run_next` auto-resolve):

  1. Caller passes `intent` directly                       ← explicit
  2. Current git branch is `haiku/<slug>/main` or
     `haiku/<slug>/<stage>`                                ← explicit
  3. Else: pick the sole active intent on disk             ← BUG

Step 3 violated the user's invariant: "if I'm not on an intent's branch and
I haven't said 'intent: foo', the engine has no business writing to any
intent dir." The cowork intent had `status: active` (committed by PRs
#174/#180/#238), and it was the only such intent on disk, so every workflow
tick — including ones triggered by hooks the user didn't initiate — picked
it up via step 3 and wrote runtime journals into its dir.

The fix: in git mode, drop step 3 entirely. If the user isn't on an intent
branch and didn't pass `intent`, the engine refuses to auto-target. The
error message is explicit: "Pass `intent` explicitly, or `git switch
haiku/<slug>/main` to scope the engine to a specific intent."

Filesystem mode (no git) has no branch signal — keep the "sole active
intent on disk" fallback there since it's the only auto-resolve available.

Three companion changes ship in the same commit:

- `.gitignore` patterns for the engine's runtime journals
  (`.haiku/**/action-log.jsonl`, `.last_action.json`, `write-audit.jsonl`).
  Even when the engine is correctly scoped, these are WAL-style journals
  that don't belong in git. Adding them to the project's gitignore protects
  every plugin user from the same pollution at the source.

- `intent-scope-isolation.test.mjs` — pins the contract:
   * non-intent branch + no explicit `intent` → refuse, don't name the
     stranger intent in the error
   * intent branch → auto-resolve from branch
   * explicit `intent` arg → flow through regardless of branch

- Remove the tracked `.haiku/intents/cowork-mcp-apps-integration/` files.
  These are zombies from PRs #174/#180/#238 that should never have been
  checked in. The engine fix prevents the dir from getting recreated.

Sweep: 1365 passed, 0 failed across 101 test files.

Note: the bundled `plugin/bin/haiku.mjs` is from v3 (last touched at
`ffd22c3f3`) and still writes `.last_action.json` — that's a stale-bundle
issue, not an engine bug. Will resolve on the next plugin release when CI
rebuilds the bundle from current source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @jwaldrip's task in 8m 57s —— View job


Claude finished @jwaldrip's task ——


PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening

Three new commits reviewed since the last review:

  • 37ec691 — numeric ids refactor (feedback_id: integer, 3-digit padding)
  • b877e52 — session debug gap fixes (cross-stage FB walk, drift FM-aware, model routing)
  • 1ccae8d — intent scope isolation (drop sole-active fallback in git mode)

The intent scope isolation fix (1ccae8d) is well-implemented with solid tests. The drift FM-awareness in b877e52 is a genuine improvement. The numeric id refactor (37ec691) is a good long-term fix but introduced a new critical breakage in the fix loop that blocks every fix-hat dispatch.


Status of all previously-flagged issues

# Severity Status File
1 Critical OPEN run-tick.ts:82 — exact semver compare
2 Critical OPEN v0-to-v4.ts + cursor.ts:312"rejected" not normalized
3 Critical OPEN haiku_run_next.ts:628withIntentMainLock never called
4 High FIXED ✓ close_feedback.ts prompt builder added
5 High OPEN locks.ts:61isAlive bare catch steals live locks
6 High FIXED ✓ Stage[0] fallback removed from cursor
7 Medium OPEN cursor.ts:495 — cutoff === -1 walks all stages
8 Medium OPEN sign-slot.ts:105 — repo-relative outputs joined against intentDir
9 Medium OPEN cursor.ts:570 — raw-basename fallback in parseFbIdFromFilename
10 Minor OPEN run-tick.ts:99 — migration error returns track: "drift"
11 Minor OPEN v0-to-v4.ts:247 — no-op started_at double-write
N1 High OPEN haiku_run_next.ts:486close_feedback uses "FB-NNN" as file prefix; never matches "NNN-*.md"
N2 High FIXED ✓ Picker timeout returns isError: true
N3 Medium OPEN clarify_required.ts:56 — instructs haiku_human_write on intent.md (blocked)
N5 Medium OPEN git-worktree.ts:775restoreBranch restore not in finally

CRITICAL — new issue from 37ec691

N7. start_feedback_hat emits feedback_id: "FB-001" (string) but every haiku_feedback_* tool now requires integer — fix loop broken for every dispatch

File: packages/haiku/src/orchestrator/prompts/start_feedback_hat.ts:153,159,163

The 37ec691 commit changed feedback_id to Type.Integer({ minimum: 1, maximum: 999 }) across all 8 haiku_feedback_* tools. The start_feedback_hat prompt builder was not updated. It still emits tool call templates with the old string form:

haiku_feedback_read { intent: "foo", stage: "bar", feedback_id: "FB-001" }
haiku_feedback_advance_hat { intent: "foo", stage: "bar", feedback_id: "FB-001" }
haiku_feedback_reject_hat { intent: "foo", stage: "bar", feedback_id: "FB-001" }

The schema description (state/schemas/inputs/feedback-variants.ts:37) explicitly says: "Just a number; no FB- prefix, no zero-padding, no string form." Every agent following this prompt verbatim hits haiku_feedback_advance_hat_input_invalid on every call, and the prompt additionally says "pass each feedback_id verbatim, no substitution required" — which locks the agent into the failing behavior.

The fix loop is completely non-functional in production as of 37ec691.

Fix: In start_feedback_hat.ts, extract the integer from fbId for tool-call templates:

// fbId is "FB-001" from parseFbIdFromFilename; strip prefix → integer
const fbNum = parseInt(fbId.replace(/^FB-/i, ""), 10)

Then replace the template strings:

  • Line 153: feedback_id: ${fbNum} (no quotes)
  • Line 159: feedback_id: ${fbNum} (no quotes)
  • Line 163: feedback_id: ${fbNum} (no quotes)

Also update the preamble to show the integer as the canonical form, consistent with the schema description.

Fix this →


CRITICAL — carry-forward (all three still blocking)

1. Exact semver compare breaks all v4 intents on first CI bump after merge

packages/haiku/src/orchestrator/workflow/run-tick.ts:82

if (sourceVersion !== target) {   // "4.0.0" !== "4.0.1" → triggers
    migrateIntent(ctx, sourceVersion, target)  // no path → throws

targetMajor is computed on line 75 but the migration guard on line 82 still uses exact string compare. The registry has one edge: "0" → "4.0.0". First CI bump after this lands (4.0.0 → 4.0.1) makes every v4 intent throw "no migration path" on every tick.

Fix: const sourceMajor = Number(sourceVersion.split(".")[0] ?? "0") || 0; if (sourceMajor !== targetMajor)

Fix this →


2. result: "rejected" still not normalized — migrated units stuck as in-progress forever

packages/haiku/src/orchestrator/migrations/v0-to-v4.ts:247

cursor.ts:312 checks last.result === "reject" (present tense). v3 units wrote result: "rejected" (past tense). The migrator at line 247 copies iterations[] as-is. Migrated units with result: "rejected" return null from walkUnitTrack — treated as in-flight, blocking the wave on every tick with no re-dispatch.

Fix: After next.iterations = data.iterations ?? [], add:

for (const iter of next.iterations as Array<Record<string, unknown>>) {
    if (iter.result === "rejected") iter.result = "reject"
    if (iter.result === "advanced") iter.result = "advance"
}

Fix this →


3. withIntentMainLock still never called — stage→main merges unserialised

packages/haiku/src/tools/orchestrator/haiku_run_next.ts:628

const mergeOutcome = mergeStageBranchIntoMain(slug, stageToMerge)

No lock. withIntentMainLock is defined in locks.ts:202, imported (unused) in state-tools.ts:62, and promised in merge_stage.ts:28 and merge_intent.ts:28. Concurrent haiku_run_next calls can race on the same stage→main merge.

Fix: Import withIntentMainLock in haiku_run_next.ts and wrap: await withIntentMainLock(slug, () => mergeStageBranchIntoMain(slug, stageToMerge))


HIGH — still open

N1. close_feedback handler never finds production feedback files — infinite loop

packages/haiku/src/tools/orchestrator/haiku_run_next.ts:486

parseFbIdFromFilename("001-foo.md") returns "FB-001". The close handler checks:

if (!f.startsWith(`${fbId}-`) && !f.startsWith(`${fbId}.`)) continue
// "001-foo.md".startsWith("FB-001-") → false → always skips

fbFile stays empty → break exits without stamping closed_at → cursor re-emits close_feedback on the next tick → infinite loop.

The drift test fixtures use FB-DRIFT-01.md (non-standard naming) which accidentally satisfies f.startsWith("FB-DRIFT-01.") — tests pass, production is broken.

After 37ec691, the state-tools.ts lookup path correctly uses numeric matching (^(\d+)-). The close_feedback handler needs the same treatment:

Fix: Replace the readdirSync loop with findFeedbackFile(slug, stage, fbId) from state-tools.ts, which normalises "FB-001" → numeric 1 → matches 001-*.md. Also update the drift-mid-flight-e2e test fixtures to standard NNN-*.md naming so this can't silently regress.

[Fix this →](https://claude.ai/code?q=In%20packages%2Fhaiku%2Fsrc%2Ftools%2Forchestrator%2Fhaiku_run_next.ts%20in%20the%20close_feedback%20while%20loop%20(around%20line%20480-495)%2C%20replace%20the%20readdirSync%20loop%20%60for%20(const%20f%20of%20readdirSync(fbDir))%20%7B%20if%20(!f.startsWith(%60%24%7BfbId%7D-%60)%20...%60%20with%20a%20call%20to%20findFeedbackFile(slug%2C%20stage%2C%20fbId)%20from%20state-tools.ts.%20parseFbIdFromFilename%20returns%20%22FB-001%22%20but%20files%20on%20disk%20are%20named%20%22001-foo.md%22%20so%20startsWith(%22FB-001-%22)%20is%20always%20false.%20Also%20update%20test%20fixtures%20in%20drift-mid-flight-e2e.test.mjs%20to%20use%20standard%20001-*.md%20naming%20so%20the%20bug%20can%27t%20silently%20regress.%20Branch%3A%20haiku%2Fv4-engine-refactor&repo=gigsmart%2Fhaiku-method)


5. isAlive bare catch steals live locks on Windows / cross-user Linux

packages/haiku/src/locks.ts:61

process.kill(pid, 0) throws EPERM (not ESRCH) when the target process is alive but owned by a different user. The bare catch treats EPERM as dead and steals the lock.

Fix: } catch (err) { return (err as NodeJS.ErrnoException).code === "EPERM" }

Fix this →


MEDIUM — carry-forward open

7. walkFeedbackTrack walks all stages when current stage is unknown

packages/haiku/src/orchestrator/workflow/cursor.ts:495

const toWalk = cutoff >= 0 ? stages.slice(0, cutoff + 1) : stages.slice()  // walks ALL on cutoff === -1

Note: the "all stages merged" path is now correct — derivePosition passes the last stage as currentStage in that case (cursor.ts:958). The remaining risk is a renamed/unknown stage: if stages.indexOf(currentStage) === -1, stages.slice() walks all stages including future ones.

Fix: stages.slice()[] when cutoff === -1.

8. Repo-relative output paths still joined against intentDir

packages/haiku/src/orchestrator/workflow/sign-slot.ts:105

buildOutputWitnesses at line 105 still does join(intentDir, out) for all output paths. A unit with outputs: ["src/components/Button.tsx"] resolves to <intentDir>/src/components/Button.tsx — a path that doesn't exist. The drift sweep at drift-sweep.ts:282 has the same pattern. The b877e52 commit added FM-awareness to the hash comparison but did not fix the base-dir selection for repo-relative paths.

Fix: Paths not starting with stages/ are repo-relative and should join against repoRoot.

N3. clarify_required prompt instructs a permanently-blocked write path

packages/haiku/src/orchestrator/prompts/clarify_required.ts:56

The prompt tells agents to call haiku_human_write on intent.md — the hook explicitly refuses workflow-managed files. haiku_intent_set { field: "clarifications" } also fails because clarifications is not in INTENT_FRONTMATTER_SCHEMA.properties (intent.ts:105). The clarify gate is permanently stuck.

Fix (both parts needed):

  1. Add clarifications: Type.Optional(Type.Record(Type.String(), Type.Unknown())) to INTENT_FRONTMATTER_SCHEMA in state/schemas/intent.ts
  2. Update the prompt to use haiku_intent_set { field: "clarifications", value: { [stage]: { answers, at } } }

N5. restoreBranch checkout not in finally — wrong-branch writes on unexpected throw

packages/haiku/src/git-worktree.ts:775

reconcileMisroutedStageMerges checks out intentMain at line 742 and restores at line 775, but the restore is inside the outer try. An unhandled exception between checkout and restore leaves the worktree on intentMain.

Fix: Move the if (restoreBranch) block into a finally {} that always runs.


Newly fixed (since last review)

Fix Commit
N2 — picker timeout loops 580e77d
Cross-stage FB walk blocks after all stages merged b877e52
Intent auto-resolve scope pollution 1ccae8d
Drift sweep FM-aware (no false-positives on markdown FM edits) b877e52
Model routing cascade in all dispatch sites b877e52

Summary

Must fix before merge:

Issue File What breaks
N7 (NEW) start_feedback_hat.ts:153,159,163 Fix loop broken — every agent attempt to advance/reject a hat gets schema-rejected because prompt emits "FB-001" string but schema requires integer
#1 run-tick.ts:82 All v4 intents break on the first CI bump (4.0.0→4.0.1 → "no migration path" on every tick)
#2 v0-to-v4.ts:247 Migrated units with v3-style result:"rejected" are permanently stuck as in-flight
#3 haiku_run_next.ts:628 Stage→main merges unserialised; concurrent ticks can race and corrupt merge state
N1 haiku_run_next.ts:486 Every production fix-loop closure loops infinitely; close_feedback never finds NNN-*.md files

N7 is new and was not in any prior review — it was introduced by 37ec691 and means the fix loop is currently broken for all production feedback dispatch.

Three reviews flagged the same issues across the engine. Closing them all
in one commit.

CRITICAL:

#1 (run-tick.ts:82) — Exact semver compare on plugin_version would have
broken every v4 intent on the first CI bump after release (4.0.0 → 4.0.1
→ migrateIntent throws "no migration path"). Compare majors only; the FM
stamp marks schema generation, not the build that touched it. Same edit
also fixes #10 — the migration-failure error path now returns
`track: "intent"` instead of the misleading `track: "drift"` ghost event.

#2 (v0-to-v4.ts) — v3 wrote `result: "rejected"` / `"advanced"`; v4's
cursor only matches `"reject"` / `"advance"`. A migrated unit's last
iteration with a past-tense result fell through both checks and was
treated as in-flight on the current hat — the wave never progressed.
Added a normalization pass in both migrateUnitFile and migrateFeedbackFile
so the drift dies at the source. The v0-to-v4-realistic-scenario test
that asserted the buggy preserve-as-rejected behavior now asserts the
correct normalized form. Same edit removes the no-op `started_at`
double-write (issue #11).

#3 (haiku_run_next.ts:628 + side-effects.ts) — `withIntentMainLock` was
imported in state-tools.ts and promised to the agent in merge_stage.ts /
merge_intent.ts but never actually called. Stage→intent-main merges ran
unserialised; two concurrent ticks (autopilot retry overlapping a manual
run) could race on the merge commit. Wrapped all three call sites
(haiku_run_next merge_stage loop + side-effects' pre-stage cleanup +
finalize-stage paths).

HIGH:

#5 (locks.ts:61) — `isAlive` bare-catch returned false for every error
including EPERM. On Windows / cross-user Linux, `process.kill(pid, 0)`
throws EPERM (not ESRCH) when the target process is alive but
unsignalable. Lock-stealing kicked in against live holders. Differentiate:
ESRCH = dead, EPERM = alive-but-unpingable.

N1 (haiku_run_next.ts:484) — close_feedback handler scanned the FB dir
with `f.startsWith("FB-01-")` but real files are named `01-slug.md`. The
match never fired → fbFile stayed empty → break → cursor re-emits
close_feedback every tick → infinite loop. Replaced with the existing
`findFeedbackFile` helper (single source of truth that already handles
both wire-form FB-NNN and on-disk NNN-slug.md). Tests passed accidentally
because fixtures used non-standard `FB-DRIFT-NN-` naming.

MEDIUM:

#7 (cursor.ts:495) — walkFeedbackTrack fell back to the entire stages
list when currentStage wasn't found. Walking unknown future stages was
the opposite of what "walk up to current" meant. Fall back to `[]`
instead.

#8 (drift-sweep.ts:282 + sign-slot.ts:104) — Output paths come in two
shapes: intent-relative (`stages/design/foo.md`) and repo-relative
(`src/components/Button.tsx`). Both were joined against intentDir, so
repo-relative paths resolved to non-existent locations and silently
skipped drift detection — the most important code artifacts had no
witness. Distinguish by leading `stages/`: intent-relative goes to
intentDir, everything else to repoRoot. Applied at both sign time and
check time.

#9 (cursor.ts:574) — `parseFbIdFromFilename` fallback returned the raw
basename for non-numeric filenames; downstream `findFeedbackFile`
rejected the unparseable ID with `feedback_not_found`, which the cursor
ticked into infinite re-dispatch. Return null instead, and skip the FB
in `nextActionForFeedback` when the ID can't be parsed. Also bumped the
padStart to 3 (matches the 2026-05-07 numeric-ID refactor).

N3 (intent.ts + clarify_required.ts:56) — The clarify gate instructed
agents to write `intent.md` via `haiku_human_write` (which refuses
intent.md) and the field `clarifications` wasn't in the intent schema's
properties so `haiku_intent_set` rejected it too. Both paths blocked. Add
`clarifications` as `Type.Record(Type.String(), Type.Unknown())` to the
schema (so it's in AGENT_AUTHORABLE_INTENT_FIELDS) and rewrite the
prompt to use `haiku_intent_set` with a merge note (the field overwrites,
so the agent must read prior stages first).

N5 (git-worktree.ts:732) — `restoreBranch` checkout-restore was inside
the same try block as the operation. An unhandled throw between
`git checkout intentMain` and the restore would skip the restore and
leave the worktree on the wrong branch — every subsequent agent write
lands on intentMain instead of the stage branch. Moved restore into a
`finally` block so it always runs when restoreBranch is set.

Sweep: 1365 passed, 0 failed across 101 test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @jwaldrip's task in 11m 30s —— View job


PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening


The 3938863 commit closes all 12 issues that prior reviews had flagged. Every fix was verified against the actual diff. Well done on the thoroughness — the migration normalizer, lock serialization, close_feedback lookup, and finally-based branch restore all land correctly.

Two issues remain blocking merge: one carry-forward (N7, not in the 12) and two new CI failures introduced by the latest commit.


Status of all previously-flagged issues

# Severity Status
1 Critical FIXED ✓ — major-only version compare (sourceMajor !== targetMajor)
2 Critical FIXED ✓ — normalization pass in both migrateUnitFile + migrateFeedbackFile
3 Critical FIXED ✓withIntentMainLock wraps all three merge call-sites in haiku_run_next + side-effects.ts
4 High FIXED ✓ (prior)
5 High FIXED ✓catch (err) { return err.code === "EPERM" }
6 High FIXED ✓ (prior)
7 Medium FIXED ✓cutoff === -1 now returns []
8 Medium FIXED ✓ — repo-relative paths (!startsWith("stages/")) joined against repoRoot in both drift-sweep.ts and sign-slot.ts
9 Medium FIXED ✓parseFbIdFromFilename returns null; nextActionForFeedback skips the file
10 Minor FIXED ✓ — migration error now returns track: "intent"
11 Minor FIXED ✓ — no-op started_at double-write removed
N1 High FIXED ✓findFeedbackFile replaces broken startsWith prefix scan
N2 High FIXED ✓ (prior)
N3 Medium FIXED ✓clarifications added to INTENT_FRONTMATTER_SCHEMA; prompt updated to use haiku_intent_set
N5 Medium FIXED ✓restoreBranch checkout moved into finally block
N7 Critical STILL OPEN — see below

CRITICAL — not in the 12, still open

N7. start_feedback_hat emits feedback_id: "FB-001" (string) — fix loop schema-rejected on every call

File: packages/haiku/src/orchestrator/prompts/start_feedback_hat.ts:153,159,163

The 37ec691 refactor changed all 8 haiku_feedback_* tool schemas to feedback_id: Type.Integer({ minimum: 1, maximum: 999 }). The description is explicit: "Just a number; no FB- prefix, no zero-padding, no string form."

start_feedback_hat.ts was not updated. The feedbackIds array comes from the cursor as strings like "FB-001". The prompt template on lines 153, 159, and 163 embeds them with quotes:

`feedback_id: "${fbId}"` // → feedback_id: "FB-001"

Line 128 also instructs agents: "pass each feedback_id verbatim, no substitution required." An agent following this verbatim passes "FB-001" (string) to every haiku_feedback_read, haiku_feedback_advance_hat, and haiku_feedback_reject_hat call. AJV rejects with haiku_feedback_read_input_invalid (type mismatch: string vs integer). The fix loop cannot proceed.

Fix:

for (const fbId of feedbackIds) {
    const fbNum = parseInt(fbId.replace(/^FB-/i, ""), 10)
    // ...
    `feedback_id: ${fbNum}` // no quotes — integer literal

Also update line 128 to clarify the agent should use the integer, not the "FB-NNN" display form.

Fix this →


CI FAILURES — blocking merge

C1 (Biome Lint — 42 errors)

The latest commit introduced 42 lint errors and 12 warnings across files touched by the fix pass. All are safe-fixable formatting / import-order issues (no logic changes required). Affected files:

  • packages/haiku/src/http/session-routes.ts — import ordering, formatting
  • packages/haiku/src/locks.ts — formatting (multi-line function signatures collapsed differently than Biome expects)
  • packages/haiku/src/orchestrator/migrate-registry.ts — formatting (throw new Error(...) line wrapping)
  • packages/haiku/src/orchestrator/migrations/field-hygiene.ts — import order (existsSync, readFileSync, readdirSyncexistsSync, readdirSync, readFileSync), formatting
  • packages/haiku/src/orchestrator/migrations/v0-to-v4.ts — import order, formatting
  • And others (42 total errors reported)

Run npx @biomejs/biome check --write packages/haiku/src/ to auto-fix all of them, then verify and commit.

Fix this →

C2 (MCP Tests — drift coverage guard)

scripts/check-ci-covers-drift-tests.mjs requires four named test files. The v4 engine refactor (b743524) deleted drift-detection-gate.test.mjs and upstream-reconciliation.test.mjs because the pre-tick gates (drift-detection-gate.ts, upstream-reconciliation.ts) were absorbed into the v4 cursor. But the CI guard was not updated. Result: CI fails on every push.

The two missing files were added in commit 7af2c54 (which predates the v4 branch) and deleted in b743524. Neither exists anywhere in the current branch's git history.

Two options:

  1. Update the guard — remove the two deleted filenames from REQUIRED_FILES in check-ci-covers-drift-tests.mjs and add the v4-equivalent coverage files that actually test the cursor's drift and feedback-track paths (e.g. cursor-walk.test.mjs, drift-scenarios.test.mjs).
  2. Recreate the test files — write new drift-detection-gate.test.mjs and upstream-reconciliation.test.mjs that test the equivalent v4 cursor scenarios (the cursor now handles what those two gate modules did).

Option 1 is faster and more honest — the v4 cursor distributes that logic into the main cursor walk rather than standalone gate modules, so pointing the guard at the files that actually test the cursor is the right call.

Fix this →


Summary

Must fix before merge:

Issue File What breaks
N7 start_feedback_hat.ts:153,159,163 Fix loop completely broken — agents pass "FB-001" (string) to tools that require integer feedback_id; every advance/reject/read call is schema-rejected
C1 Multiple files (42 errors) CI Biome Lint fails; merge blocked
C2 scripts/check-ci-covers-drift-tests.mjs CI MCP Tests fails; two required test files deleted in v4 refactor, guard not updated

All 12 issues from prior reviews are resolved. The three items above are what stands between this PR and a clean CI.

…uards

Three blockers from the latest review (2026-05-08):

N7 (Critical) — start_feedback_hat / review_fix / intent_completion_fix
emitted `feedback_id: "FB-001"` (string) into every tool-call template.
The 37ec691 numeric-id refactor flipped the schema to
`Type.Integer({ minimum: 1, maximum: 999 })` with explicit description
*"Just a number; no FB- prefix, no zero-padding, no string form."* —
agents following the prompt verbatim hit `<tool>_input_invalid` on
every advance/reject/read call, so the entire fix loop was dead.

Fix: derive `fbNum` once at dispatch time (parse the digit suffix off
`FB-NNN`) and embed it as a JSON integer literal — `feedback_id: 1`,
no quotes. Headings keep the `FB-NNN` display form for human
readability; only the JSON args change. Updated all three prompt
builders that compose fix-loop subagent blocks.

C1 — 42 Biome lint errors + 12 warnings introduced by the prior
commit's hand-edits (import ordering, optional-chain folding, etc.).
Ran `biome check --write` then `biome check --write --unsafe` to
clear the unused-imports backlog. Net effect across 35 files: pure
formatting / dead-code removal, no logic changes. Spot-checked the
files I'd actually touched; the rest are auto-formatter passes on
files that just hadn't been re-formatted since their last edit.

C2 — `scripts/check-ci-covers-drift-tests.mjs` required four named
test files: `drift-detection-gate.test.mjs` and
`upstream-reconciliation.test.mjs` were deleted in the v4 cursor
refactor (b743524) when the standalone pre-tick gates were absorbed
into the cursor walk; `drift-baseline.test.mjs` and
`drift-markers.test.mjs` survived. CI failed on every push since v4
landed.

Replaced REQUIRED_FILES with the v4-equivalent coverage:
`drift-baseline`, `drift-markers`, `drift-mid-flight-e2e`,
`drift-scenarios`, and `cursor-walk` — the files that actually
exercise the cursor's drift behavior, marker bookkeeping, and
track-walk priorities. Updated the header docstring + failure
message to match.

Sweep: 1365 passed, 0 failed across 101 test files.
Biome: 0 errors, 0 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @jwaldrip's task in 7m 57s —— View job


Claude finished @jwaldrip's task in ~15m ——


PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening

The bb08b27 commit addressed all three previously-open blockers (N7 integer feedback_id in prompt templates, C1 Biome lint, C2 drift-test CI guard). Every issue from prior reviews is now resolved. Two new CI failures were introduced.


Status of all previously-flagged issues

# Severity Status
1 Critical FIXED ✓ — major-only semver compare (sourceMajor !== targetMajor)
2 Critical FIXED ✓ — normalization pass in both migrateUnitFile + migrateFeedbackFile
3 Critical FIXED ✓withIntentMainLock wraps all three merge call-sites
4 High FIXED ✓ (prior)
5 High FIXED ✓catch (err) { return err.code === "EPERM" }
6 High FIXED ✓ (prior)
7 Medium FIXED ✓cutoff === -1 returns []
8 Medium FIXED ✓ — repo-relative paths joined against repoRoot in both drift-sweep.ts and sign-slot.ts
9 Medium FIXED ✓parseFbIdFromFilename returns null; nextActionForFeedback skips the file
10 Minor FIXED ✓ — migration error returns track: "intent"
11 Minor FIXED ✓ — no-op started_at double-write removed
N1 High FIXED ✓findFeedbackFile replaces broken startsWith prefix scan
N2 High FIXED ✓ (prior)
N3 Medium FIXED ✓clarifications added to INTENT_FRONTMATTER_SCHEMA; prompt updated to haiku_intent_set
N5 Medium FIXED ✓restoreBranch checkout in finally block
N7 Critical FIXED ✓fbNum integer literal in all three prompt builders (start_feedback_hat.ts, review_fix.ts, intent_completion_fix.ts)
C1 CI FIXED ✓ — 0 Biome errors/warnings
C2 CI FIXED ✓ — check-ci-covers-drift-tests.mjs updated to reference v4-equivalent files

CI FAILURES — blocking merge

C3 (Medium): packages/haiku-api/src/schemas/session.ts — 61 exports exceeds 60-export budget

Failure: Biome Lint job → "Check for god files" step

The PR added PickerSelectRequestSchema and PickerSelectResponseSchema to session.ts (packages/haiku-api/src/schemas/session.ts), pushing it from 59 to 61 exports — one over the 60-export budget enforced by scripts/check-no-god-files.mjs.

Fix (two options):

  1. Extract picker schemas to packages/haiku-api/src/schemas/picker.ts (preferred — matches the per-concern split pattern).
  2. Add an allowlist entry to scripts/check-no-god-files.mjs with a written reason if the file is cohesive.

Fix this →


C4 (Medium): squash-merge-fallback.test.mjs — glab fallback test fails on GitHub Actions

Failure: MCP Tests job → "Run MCP test suite" step
Test: "provider fallback success: glab reports merged MR → returns true" (squash-merge-fallback.test.mjs:245)

The test stubs only glab (not gh) and uses withStubbedPath(stub, fn) which sets process.env.PATH = "${stub}:/usr/bin:/bin". The test author expected that without a gh stub, detectPrTool() would return "glab". But GitHub Actions runners have gh at /usr/bin/gh — which is still in the restricted PATH. So which gh finds the real gh, detectPrTool() returns "gh", and isBranchMerged queries the real GitHub CLI (which returns nothing useful), so the function returns false instead of true.

Fix: In the glab-specific test case, add a stub gh script that returns [] (empty JSON — simulating no merged PRs) so the function falls through to glab:

const stub = makeStubDir({
    gh: "#!/bin/sh\necho '[]'\n",  // gh finds no merged PR → fall through to glab
    glab: "#!/bin/sh\necho '!42  Some merged MR  haiku/sq/foo  merged'\n",
})

Fix this →


MINOR — not blocking

feedback-variants.ts:14 — comment says maximum: 9999, code has maximum: 999

File: packages/haiku/src/state/schemas/inputs/feedback-variants.ts:14

The comment block at the top says Type.Integer({ minimum: 1, maximum: 9999 }). The actual schema on line 38 uses maximum: 999 (consistent with the description string "NNN-…md 3-digit padding (max 999)"). Pure comment typo — the schema value is correct.


What changed since last review (bb08b27)

Fix Verified
N7 — feedback_id: ${fbNum} integer in all three prompt builders
C1 — Biome lint 42 errors cleared
C2 — drift-test guard references v4-equivalent test files
Normalization pass in migrator (issues #2)
withIntentMainLock wraps merge call-sites (issue #3)
restoreBranch in finally block (N5)
clarifications in INTENT_FRONTMATTER_SCHEMA (N3)
findFeedbackFile in close_feedback handler (N1)
EPERM-aware isAlive (#5)
walkFeedbackTrack cutoff === -1 returns [] (#7)
Repo-relative output paths against repoRoot (#8)

Summary

Two CI failures must be resolved before merging:

Issue File What breaks
C3 packages/haiku-api/src/schemas/session.ts 61 exports > 60-export budget → Biome Lint job fails on every push
C4 packages/haiku/test/squash-merge-fallback.test.mjs:245 glab fallback test assumes gh not in /usr/bin; fails on GitHub Actions where gh is at /usr/bin/gh

All prior critical/high issues are resolved. The two CI failures are straightforward to fix. Once cleared, this PR is in good shape to merge.

…k, god-file budget

Three CI failures from the latest claude-review automation run, all
landing in `bb08b27`:

silent-test parser (run-all.mjs): the aggregator regex matched
`ℹ pass N` / `ℹ fail N` (node:test default reporter), but Linux
runners default to TAP-style output (`# pass N` / `# fail N`). 30 of
101 test files fell through both regex sets and reported as 0/0.
The silent-test-loss guard (added 2026-05-06 after the v0-to-v4
migrator tests went silent) then failed the run. Locally on macOS
the `ℹ` glyph form fires, which is why this didn't reproduce.

Fix: extend each regex pair to also match the TAP form. Same handler
for the success-exit and crash-exit paths.

squash-merge-fallback test (CI gh-on-PATH leak): the `provider
fallback success: glab reports merged MR` test stubbed glab and
relied on `withStubbedPath` to hide gh. The PATH stub kept
`/usr/bin` for `which` itself, but GitHub Actions ships `gh` at
`/usr/bin/gh` — so `detectPrTool` ran `which gh` against the real
binary, returned "gh", queried the un-stubbed real CLI, got nothing,
and isBranchMerged returned false. Test passed locally on macOS
(no /usr/bin/gh) and failed on Linux runners.

Fix: inject a stub `which` script alongside the other stubs that
only resolves tools the test deliberately stubbed. Uses `[ -x ... ]`
against the stub dir so `which gh` returns nothing in this test
(only glab was stubbed) regardless of what /usr/bin contains.

god-file budget (haiku-api/src/schemas/session.ts): the file has 61
exports, 1 over the 60-export budget. It's a single concern (SPA
session/wire-payload schemas) and every shape gets a `Schema` const
+ inferred `Type` export — the export count doubles by design.
Splitting it would force every consumer to import from three barrels
for one wire shape.

Fix: add an ALLOWLIST entry with a written reason. Documented the
revisit trigger (a sub-protocol like design-direction growing its own
object graph) so the entry isn't a permanent waiver.

Sweep: 1365 passed, 0 failed across 101 test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @jwaldrip's task in 4m 2s —— View job


Claude finished @jwaldrip's task ——

  • Gather PR context and prior review history
  • Review latest commit (eb9b36d: "green up merge gate")
  • Verify status of C3 (god-file budget) and C4 (glab fallback test)
  • Full re-scan of engine, migration, MCP tools, SPA
  • Post final review summary

PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening

The eb9b36d commit closes the final three CI failures flagged in the previous review (C3, C4, and a newly-discovered TAP parser gap). Every issue raised across all prior review rounds is now resolved.


C3, C4, + TAP parser (eb9b36d) — all fixed

Issue Fix
C3session.ts 61 exports over 60-export budget ALLOWLIST entry added to check-no-god-files.mjs with a written reason (single-concern schema file where Schema + Type doubles the count by design)
C4 — glab fallback test fails on GH Actions (gh at /usr/bin/gh leaks into test PATH) Stub which script injected by withStubbedPath — only resolves tools the test explicitly stubbed, so which gh returns nothing unless gh was stubbed
New — TAP output format (# pass N / # fail N) not parsed by run-all.mjs Both regex pairs extended to match TAP (^# pass (\d+)) alongside the existing glyph form (ℹ pass N). Applied to both success-exit and crash-exit paths

The TAP gap is worth noting: 30 of 101 test files were silently reporting 0/0 on Linux CI because tsx defaults to TAP output on Linux runners while emitting the glyph form on macOS. The fix is correct (both forms now matched) and the silent-test-loss guard provides the right backstop so this class of bug stays caught.


Complete issue history — all resolved

# Severity Status Summary
1 Critical FIXED ✓ run-tick.ts exact semver compare → major-only sourceMajor !== targetMajor
2 Critical FIXED ✓ v3 "rejected"/"advanced" not normalized in migrator → normalization pass added to both migrateUnitFile and migrateFeedbackFile
3 Critical FIXED ✓ withIntentMainLock never called → wraps all three merge call-sites in haiku_run_next + side-effects.ts
4 High FIXED ✓ close_feedback had no prompt builder → close_feedback.ts added and registered
5 High FIXED ✓ isAlive bare catch stole live locks on Windows → catch(err) { return err.code === "EPERM" }
6 High FIXED ✓ Stage[0] fallback when all stages merged blocked intent-level approvals → fallback removed
7 Medium FIXED ✓ walkFeedbackTrack cutoff === -1 walked entire stage list → returns []
8 Medium FIXED ✓ Repo-relative output paths joined against intentDir → distinguished by startsWith("stages/"), joined against repoRoot in both drift-sweep.ts and sign-slot.ts
9 Medium FIXED ✓ parseFbIdFromFilename raw-basename fallback caused unresolvable dispatch IDs → returns null; callers skip
10 Minor FIXED ✓ Migration error returned track: "drift" → now track: "intent"
11 Minor FIXED ✓ No-op started_at double-write in unit migrator → removed with explanatory comment
N1 High FIXED ✓ close_feedback handler used "FB-NNN" prefix scan; files are NNN-*.md → replaced with findFeedbackFile()
N2 High FIXED ✓ Picker timeout caused infinite re-open loop → select tools return isError: true on timeout
N3 Medium FIXED ✓ clarify_required instructed haiku_human_write on blocked path → clarifications added to INTENT_FRONTMATTER_SCHEMA; prompt updated to haiku_intent_set
N5 Medium FIXED ✓ reconcileMisroutedStageMerges restore not in finally → moved into finally block
N7 Critical FIXED ✓ start_feedback_hat emitted "FB-001" string; tools require integer → fbNum integer literal in all three prompt builders
C1 CI FIXED ✓ Biome lint 42 errors → 0
C2 CI FIXED ✓ Drift-test guard referenced deleted files → updated to v4-equivalent cursor test files
C3 CI FIXED ✓ God-file budget 61/60 on session.ts → ALLOWLIST entry with written reason
C4 CI FIXED ✓ glab fallback test failed on Linux (real gh leaked in) → stub which injected
TAP CI FIXED ✓ run-all.mjs missed TAP output format → both regex pairs extended

Assessment

All 21 issues across 6 review iterations are resolved. The fixes are thorough and well-implemented — the normalization pass covers both migrateUnitFile and migrateFeedbackFile, the lock serialization wraps all three merge call-sites, the findFeedbackFile replacement in the close handler is correct, and the restoreBranch is now guaranteed by finally.

CI is running now for this commit. No new issues identified. This PR is ready to merge once CI passes.

Response to Brian Suh's "Agents Need Control Flow, Not More Prompts."
Frames the v4 engine refactor (PR #323) as the lived version of his
argument: prompts no longer carry workflow guarantees, the surface
area refuses invalid actions, the cursor decides workflow position
without a model in the loop.

Grounded in Mike's quality_gates session, Panda's silent cross-stage
move, today's intent-scope auto-resolve fix, and the four-round
review cycle that landed the bundle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @jwaldrip's task in 5m 2s —— View job


Claude finished @jwaldrip's task ——

  • Gather PR context and prior review history
  • Check CI status for latest commits
  • Review full PR diff (engine, migration, cursor, MCP tools, SPA, tests)
  • Verify status of all previously-flagged issues
  • Post final review summary

PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening

One new commit since the last review: 26bac7b — a blog post ("A Guardrail Isn't a Sign"). All code fixes from the six prior review rounds are still in place and verified.


Status of all previously-flagged issues

# Severity Status
1 Critical FIXED ✓sourceMajor !== targetMajor in run-tick.ts:85
2 Critical FIXED ✓ — normalization pass in both migrateUnitFile (line 253) and migrateFeedbackFile (line 364)
3 Critical FIXED ✓withIntentMainLock wraps mergeStageBranchIntoMain at haiku_run_next.ts:597
4 High FIXED ✓ (prior round)
5 High FIXED ✓catch(err) { return err.code === "EPERM" } in locks.ts:67
6 High FIXED ✓ (prior round)
7 Medium FIXED ✓cutoff === -1 returns [] in cursor.ts:515
8 Medium FIXED ✓ — repo-relative paths branch on startsWith("stages/") in both drift-sweep.ts:295 and sign-slot.ts:115
9 Medium FIXED ✓parseFbIdFromFilename returns null on non-numeric filename; nextActionForFeedback skips with cursor.ts:554 guard
10 Minor FIXED ✓ — migration error returns track: "intent"
11 Minor FIXED ✓ — no-op started_at double-write removed
N1 High FIXED ✓findFeedbackFile(slug, stage, fbId) replaces broken startsWith prefix scan in haiku_run_next.ts:465
N2 High FIXED ✓ (prior round)
N3 Medium FIXED ✓clarifications added to INTENT_FRONTMATTER_SCHEMA at intent.ts:98; prompt uses haiku_intent_set
N5 Medium FIXED ✓restoreBranch checkout moved into finally block at git-worktree.ts:775
N7 Critical FIXED ✓fbNum integer literal in all prompt builders (start_feedback_hat.ts:154,160,164)
C1 CI FIXED ✓ — 0 Biome errors
C2 CI FIXED ✓ — drift-test guard updated to v4-equivalent files
C3 CI FIXED ✓ — god-file allowlist entry with written reason
C4 CI FIXED ✓ — withStubbedPath now stubs which so real /usr/bin/gh doesn't leak in
TAP CI FIXED ✓ — run-all.mjs handles both TAP (# pass N) and glyph (ℹ pass N) output

New commit: blog post (26bac7b)

website/content/blog/a-guardrail-isnt-a-sign.md — a post building on Brian Suh's "Agents Need Control Flow, Not More Prompts". The core argument and the concrete grounding (file paths, commit SHAs, tool names) are all accurate. Two minor factual issues worth correcting before publication:

1. Review round count is understated

Line 35: "Four rounds of automated Claude review on the action runner."
Line 41: "all four review rounds went green."

There have been six prior review rounds (six "Claude finished @jwaldrip's task" entries in the PR comments), with this current run being round seven. "Four" understates the actual iteration count, which is part of the post's point about the review rigor.

2. Test count appears to be from an intermediate snapshot

Line 35: "1365 tests across 101 files"

The PR description (written at the start of the 52-hour session) reports 1244 haiku engine tests + 453 SPA tests + 20 website tests = 1717 total. The current branch has 102 haiku test files alone. 1365 looks like an intermediate count from mid-session. The exact current number is the one that can be verified (node packages/haiku/test/run-all.mjs + npx vitest run totals). The phrase "all green at merge" is also premature since CI is still running at time of this review — though given the trajectory, likely accurate once it finishes.

These are editorial; no code changes needed.


Assessment

All 21 code issues across six review rounds are resolved. The engine, migrator, MCP tool schemas, lock serialization, prompt builders, SPA dual-paths, and CI guards are all in the correct state. The only open items are two factual inaccuracies in the blog post that can be corrected before the post goes live.

CI is running — pending outcome of run #25536096645. Based on the fix history and test coverage, this PR is in good shape to merge once CI passes.

Three docs were lagging behind the v4 cursor refactor (commit b743524):

ARCHITECTURE.md (canonical structural reference): updated frontmatter
spec to drop per-stage state.json, switch to NNN-padded filenames,
and document the FM witness fields the cursor walks. Forward-only
lifecycle table now derives status from FM (started_at + iterations[-1])
instead of a stored enum, with a new paragraph clarifying that stages
aren't sealed (only intents are) and firstUnmergedStage rewinds the
cursor when corrective work commits to an earlier stage. Phases section
renamed "cursor-derived" with the v4 five-row table (elaborate, execute,
review, approve/gate, merge). §5 "Workflow tick semantics" rewritten
end-to-end: cursor model with three tracks (drift / feedback / intent),
priority order, derivation function semantics, the actual cursor `kind`
set, and a new §5.8 documenting the v0→v4 migrator (strips deprecated
FM, normalizes past-tense iteration results "rejected"→"reject" /
"advanced"→"advance", deletes per-stage state.json, synthesizes user
approvals on completed units). FB lifecycle section dropped v3 revisit
actions and added classifier-hat triage via haiku_feedback_set_targets.
Hook boundary section gets a new §7.1 covering the MCP input gates
(TypeBox + AJV with stable named error codes).

haiku-method.md (paper): five modes now (continuous / discrete /
discrete-hybrid / autopilot / quick) — was "two." Stage loop reframed
as five conceptual phases derived by the cursor (was a "fixed six-step
loop"); pre-execution adversarial review removed (v4 runs review AFTER
execute, not before). Drift detection section rewritten: hashes embed
in witness records on FM, no baseline.json, no human-via-mcp /
human-implicit authorship classes. "Two commands" became "Commands and
the Workflow Tool" — the agent's contract is haiku_run_next, end of
story. Studio count nudged from "fifteen" to "two-dozen-plus" with the
catalog framed as open-ended. Glossary picked up Workflow Cursor and
Feedback Classification entries; Bolt redefined off iterations[]
history; Spec Review reframed as the engine-built first reviewer.

one-instruction-at-a-time.md (blog): response to Brian Suh's "Agents
Need Control Flow, Not More Prompts." Frames v4 as the lived version
of his argument — H·AI·K·U has always been workflow-first; v4 is where
that lands cleanly. Foregrounds the cursor (derivePosition,
nextHatForUnit), the one-instruction-at-a-time loop semantic, drift
detection (sweep + content hashes), and the per-finding fix loop
dispatch. Slug renamed to match new title.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @jwaldrip's task in 4m 44s —— View job


Claude finished @jwaldrip's task ——


PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening

One new commit reviewed: 222a863 docs(v4): align paper + arch doc, ship blog post. All 21 code issues from prior review rounds remain resolved. CI is green (Biome Lint ✅, MCP Tests ✅). One new medium issue: the architecture map's tick semantics modal was not updated as part of the v4 cursor model change.


Status of all previously-flagged issues

# Severity Status
1–11, N1–N7 Critical/High/Medium ALL FIXED ✓ — verified unchanged
C1–C4, TAP CI ALL FIXED ✓

What landed in 222a863

plugin/studios/ARCHITECTURE.md — accurate ✓

The full §5 rewrite correctly describes the v4 cursor model. Three-track architecture (drift / feedback / intent), per-stage walk table, §5.8 migrator documentation, v4 action surface table — all verified against the actual cursor.ts implementation. The forward-only lifecycle table now correctly derives status from started_at + iterations[-1].result rather than a stored enum. The new "stages are not sealed, only intents are" paragraph is accurate to firstUnmergedStage behavior.

website/content/papers/haiku-method.md — accurate ✓

Five modes updated (was two). Stage loop reframed as five cursor-derived phases. Drift detection section correctly describes body-sha256 in witness FM with no baseline.json. "Two commands" → "Commands and the Workflow Tool" with haiku_run_next as the single workflow verb. Studio count updated to "two-dozen-plus." Glossary entries for Workflow Cursor and Feedback Classification are accurate.

website/content/blog/one-instruction-at-a-time.md — accurate ✓

The blog cites real artifacts and all claims check out:

  • derivePosition is in packages/haiku/src/orchestrator/workflow/cursor.ts
  • nextHatForUnit at line 291 of the same file ✓ (confirmed with grep)
  • sign-slot.ts:79 is outputSha256, which dispatches to the two hash helpers ✓
  • Commit b743524ab is the v4 base commit ✓
  • The description of nextHatForUnit behavior ("advance → next hat, reject → walk back one") is accurate to the code ✓
  • Word count: ~1,240 words, within the 800–1200 target ✓

Previous review (2026-05-08T04:31:04Z) noted two factual inaccuracies in the old a-guardrail-isnt-a-sign.md post (understated review round count, stale test count). Both are resolved by replacing the post entirely with the new one-instruction-at-a-time.md, which makes no such claims.


MEDIUM — new issue

Architecture map tickSemantics modal is stale — describes v3 pre-advance check model

File: website/app/studios/[slug]/architecture/_components/ModalRouter.tsx:503,534,540,693

The tickSemantics modal (accessible via the "⏱ tick semantics" button on the architecture map page) still describes the v3 pre-advance check architecture:

  • Line 503: "Engine reads on-disk state (intent.md frontmatter, stage state.json, unit/feedback frontmatter)…"state.json doesn't exist in v4.
  • Line 534: "layer 1 — pre-advance checks (run-tick.ts, every tick)" — v4 replaced this with the three-track cursor model.
  • Line 540: "pre-tick consistency — repairs cached active_stage drift, state.json invariants"active_stage and state.json are both gone in v4.
  • Line 693: "…but ≥ 1 sits on a stage earlier than active_stage → emit revisited" — the revisited sideline action was replaced by cursor Track B's cross-stage FB walk.

The .claude/rules/architecture-prototype-sync.md rule at line 57 is explicit: "New tick / pre-advance check / sideline action → Update the tickSemantics modal in ModalRouter.tsx; sync with plugin/studios/ARCHITECTURE.md §5." The v4 cursor model is a fundamental replacement of the pre-advance check system — the modal should describe the three-track cursor (drift / feedback / intent) and the cursor action surface from §5.5 of the newly-updated ARCHITECTURE.md.

Fix this →


CI Status

Check Status
Biome Lint ✅ pass
MCP Tests ✅ pass

Assessment

The code is clean. All 21 issues from six prior review rounds are resolved. CI is green. The 222a863 docs commit is factually accurate and well-aligned to the v4 implementation.

The one remaining gap is the tickSemantics modal in the architecture map — it's user-facing documentation that now describes a model the engine no longer uses. The project's own sync rules require it to be updated when the tick contract changes, and the v4 cursor is the largest tick-contract change in the PR's history. This can land as a follow-up or be addressed before merge, but it's worth calling out explicitly.

The interactive arch map at /studios/<slug>/architecture and the
auto-generated per-studio Mermaid diagrams were lagging behind the v4
cursor refactor. Per .claude/rules/architecture-prototype-sync.md, the
map is canonical — divergence is a bug.

Map data files (`website/app/studios/[slug]/architecture/_data/`):

- `payload-for.ts` — every TransitionKey now maps to a real cursor
  action (`start_unit_hat`, `start_feedback_hat`, `dispatch_review`,
  `dispatch_approval`, `dispatch_quality_gates`, `user_gate`,
  `close_feedback`, `merge_stage`, `merge_intent`, `drift_detected`).
  Removed v3-only `manual-change-assessment`,
  `coverage-review-required`, `output-liveness-review-required`. Mode
  shaping wired in (autopilot collapses reviews to `[spec]` and
  approvals to `[spec, quality_gates]`). Cross-stage feedback routing
  now framed as "purely by file location, no `upstream_stage:`
  field." `withIntentMainLock` noted on merge actions. Visual
  TransitionKey names (`hat-to-hat`, `wave-to-wave`,
  `gate-to-next-stage`) retained — they describe map UI positions,
  not cursor actions. The prose under each makes the v4 reality
  explicit: hats advance via subagent `haiku_unit_advance_hat` calls,
  waves emerge from the cursor's wave-ready predicate.

- `actors.ts` — five-actor section rewritten. User mode taxonomy
  updated to continuous/discrete/discrete-hybrid/autopilot/quick;
  HITL/OHOTL/AHOTL labeled deprecated. Orchestrator notes rewritten
  around cursor model + `firstUnmergedStage` invariant + numeric
  `feedback_id` at the wire. Web UI reframed around `user_gate`
  variants.

- `hooks.ts` — 12 entries → 8. Removed nonexistent ones (`track-
  outputs`, `ensure-deps`, `inject-context`, `subagent-context`,
  `subagent-hook`, `quality-gate`); added `stamp-agent-write` and
  `edit-auto-read-hint`. Verified against `plugin/hooks/hooks.json`
  and `packages/haiku/src/hooks/`.

- `types.ts` — `ModalKind` variant `"preTickTriage"` →
  `"cursorTracks"`.

Map components (`_components/`):

- `ArchitectureMap.tsx` — gate badge label updated
  ("pre-tick triage gate" → "cursor track walk: Track C drift →
  Track B feedback → Track A intent"); spec gate description
  rewritten around `dispatch_review` per role; reject branch points at
  `start_feedback_hat (Track B)`; fix-loop section drops v3
  `feedback_revisit` / `integrate_fix_chains` callouts; pre-intent
  card orientation chain updated.

- `ModalRouter.tsx` — `tickSemantics` modal rewritten around the
  three-track cursor walk and the v4 action surface (round 8 review
  flag); `cursorTracks` modal (was `preTickTriage`) describes Track
  C/B/A; `gateDetail` rewritten; `subagent` modal tool list updated;
  `validation` modal now points at TypeBox + AJV input gate.

Auto-generated diagrams: regenerated all 24 `website/public/workflow-
diagrams/*.mmd` files via `bun run --cwd packages/haiku export:
workflow-diagrams`. Last regeneration was pre-v4
(`df4a4758c`); the .mmd files now match the v4 cursor structure.

Type check (`cd website && npx tsc --noEmit`): clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 8, 2026

Claude finished @jwaldrip's task in 4m 37s —— View job


Claude finished @jwaldrip's task ——

  • Gather PR context and prior review history
  • Review new commit (d35c79d: arch prototype map + workflow diagrams)
  • Verify tickSemantics modal update (flagged in last review as stale)
  • Full re-scan: engine, migration, MCP tools, SPA, architecture map
  • Check CI status
  • Post final review summary

PR Re-Review: feat(v4): cursor-driven engine refactor + UX hardening

One new commit since the last review: d35c79d docs(arch): align prototype map + workflow diagrams with v4 cursor. This commit directly addresses the one remaining gap flagged in the prior review — the tickSemantics modal and broader architecture map were still describing the v3 pre-advance check model.


What landed in d35c79d

The commit updates five data files and two components in website/app/studios/[slug]/architecture/, plus regenerates all 24 website/public/workflow-diagrams/*.mmd files.

ModalRouter.tsxtickSemantics modal fully rewritten ✓

The three specific stale lines called out in the prior review are gone:

Prior round flag Old (v3) New (v4)
state.json reference "Engine reads … stage state.json…" "Engine reads … derives cursor position via derivePosition"
active_stage invariant entry "pre-tick consistency — repairs active_stage drift, state.json invariants" "Track C — drift sweep. Re-hashes signed witnesses…"
revisited sideline action "…emit revisited targeting the earliest such stage" "Track B — feedback. Walks stages 0..active… Cross-stage routing is purely by file location"

The new modal adds: pre-cursor selection gates section, complete v4 action surface, three-track priority walk (C → B → A), four v4 principles ("State on disk is the truth", "Composition is pure", etc.), and a canonical reference pointer to ARCHITECTURE.md §5.

actors.ts — accurate ✓

HITL/OHOTL/AHOTL terminology correctly labeled as deprecated. v4 mode taxonomy (continuous / discrete / discrete-hybrid / autopilot / quick) matches intent.ts. Orchestrator notes rewritten around firstUnmergedStage invariant and numeric feedback_id. All five-actor section verified.

hooks.ts — accurate ✓

Trimmed from 12 to 8 entries. Verified against packages/haiku/src/hooks/ — every entry maps to a real file:
redirect-plan-mode.ts, guard-workflow-fields.ts, workflow-guard.ts, prompt-guard.ts, inject-state-file.ts, context-monitor.ts, stamp-agent-write.ts, edit-auto-read-hint.ts. The four removed entries (track-outputs, ensure-deps, inject-context, subagent-context, subagent-hook, quality-gate) do not exist in the codebase.

payload-for.ts — accurate ✓

All v3-only actions (manual-change-assessment, coverage-review-required, output-liveness-review-required, feedback_dispatch, feedback_revisit, integrate_fix_chains) removed. Every mapped cursor action verified against cursor.ts: start_unit_hat, start_feedback_hat, dispatch_review, dispatch_approval, dispatch_quality_gates, user_gate, close_feedback, merge_stage, merge_intent, drift_detected — all real. withIntentMainLock noted on merge actions and verified against the actual call at haiku_run_next.ts:597. select_studio / select_mode / select_stage pre-cursor gates documented and verified against run-tick.ts:137-163.

types.ts — accurate ✓

ModalKind variant "preTickTriage""cursorTracks". Both ModalRouter.tsx (line 639) and ArchitectureMap.tsx (line 358) updated to use cursorTracks. No dangling references to the old name.

Workflow diagrams — accurate ✓

All 24 .mmd files regenerated via bun run --cwd packages/haiku export:workflow-diagrams. Spot-checked compliance/scope: STAGE.md declares fix_hats: [classifier, compliance-analyst, feedback-assessor]; the generated diagram shows scope_review_fix_bolt_1_classifier → scope_review_fix_bolt_1_compliance_analyst → scope_review_fix_bolt_1_feedback_assessor. ✓


Status of all previously-flagged issues

# Severity Status
1 Critical FIXED ✓sourceMajor !== targetMajor in run-tick.ts
2 Critical FIXED ✓ — normalization pass in migrateUnitFile + migrateFeedbackFile
3 Critical FIXED ✓withIntentMainLock wraps all three merge call-sites
4 High FIXED ✓ (prior)
5 High FIXED ✓catch(err) { return err.code === "EPERM" }
6 High FIXED ✓ (prior)
7 Medium FIXED ✓cutoff === -1 returns []
8 Medium FIXED ✓ — repo-relative paths joined against repoRoot
9 Medium FIXED ✓parseFbIdFromFilename returns null on non-numeric filename
10 Minor FIXED ✓ — migration error returns track: "intent"
11 Minor FIXED ✓ — no-op started_at double-write removed
N1 High FIXED ✓findFeedbackFile replaces broken prefix scan
N2 High FIXED ✓ (prior)
N3 Medium FIXED ✓clarifications in schema; prompt uses haiku_intent_set
N5 Medium FIXED ✓restoreBranch in finally block
N7 Critical FIXED ✓fbNum integer literal in all prompt builders
C1 CI FIXED ✓ — Biome 0 errors
C2 CI FIXED ✓ — drift-test guard updated
C3 CI FIXED ✓ — god-file allowlist entry
C4 CI FIXED ✓ — stub which in glab fallback test
TAP CI FIXED ✓ — run-all.mjs handles both output formats
stale arch map Medium FIXED ✓tickSemantics modal, cursorTracks modal, payload-for.ts, actors.ts, hooks.ts, all 24 .mmd diagrams updated

CI Status

Check Status
Biome Lint ✅ pass
MCP Tests ⏳ in progress (test suite running)

Assessment

All 22 issues across 8 review rounds are resolved. The code is clean, the architecture map is accurate, the workflow diagrams are freshly regenerated, and the blog post cites real artifacts with verifiable line numbers. No new issues identified in d35c79d.

This PR is ready to merge once MCP Tests completes. Based on the clean run of Biome Lint and the drift-test coverage verification passing, the test suite is in good shape.

@jwaldrip jwaldrip merged commit f977488 into main May 8, 2026
4 checks passed
@jwaldrip jwaldrip deleted the haiku/v4-engine-refactor branch May 8, 2026 10:20
jwaldrip added a commit that referenced this pull request May 8, 2026
The 3.17.0 auto-bump (commit b3c09f1) was the bumper bot misclassifying
PR #323 — the squash commit subject was `feat(v4): cursor-driven engine
refactor + UX hardening` (no `!:` and no `BREAKING CHANGE:` footer), so
the regex pass returned `minor` and the bumper short-circuited before
Claude pass 2 could escalate. The cursor refactor is a major release —
the migrator gate at `run-tick.ts:77` keys on `targetMajor >= 4`, so
shipping under a 3.x stamp meant every v0/v3 intent's auto-migration to
v4 silently no-op'd.

Cleanup of the bad release:
- v3.17.0 GitHub release deleted
- v3.17.0 git tag deleted (local + remote)
- haiku-method@3.17.0 unpublished from npm
- This commit bumps to 4.0.0 explicitly; CHANGELOG rewrites the entry
  as a major-release announcement covering the breaking changes (per-
  stage state.json removed, numeric feedback_id at the wire, auto-
  resolve refusal in git mode, upstream_stage:/triaged_at: gone,
  manual_change_assessment action removed).

Bumper-script fix lands in a follow-up commit so the workflow run on
that push correctly bumps 4.0.0 → 4.0.1 (patch) and publishes the
fixed bumper alongside the now-current bundle. Until then this commit
ships v4.0.0 metadata + a fresh bundle built locally with v4.0.0
baked into MCP_VERSION.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jwaldrip added a commit that referenced this pull request May 8, 2026
)

The 2026-05-08 v3.17.0-was-supposed-to-be-v4.0.0 incident: PR #323's
squash subject was `feat(v4): cursor-driven engine refactor + UX
hardening` — `feat:` not `feat!:`, no `BREAKING CHANGE:` footer. The
regex pass returned `minor` and short-circuited before Claude pass 2
could escalate. The cursor refactor + numeric-id schema flip + per-
stage state.json removal shipped under a 3.x stamp, which silently
disabled the v0→v4 migrator (gate keys on `targetMajor >= 4`).

Two changes:

1. Pass 2 (Claude haiku) now runs whenever pass 1 returned anything
   below `major` — not just `patch`. The model can upgrade
   `minor → major` if the diff smells major (engine source rewrites,
   schema wire-type flips, ARCHITECTURE.md structural rewrites,
   migrator-gated version comparisons).

2. Pass 2's result is rank-floored against pass 1: it can upgrade
   (patch → minor → major) but never downgrade. Pass 1 stays the
   floor — an explicit `BREAKING CHANGE:` marker still beats Claude.

Prompt updated with concrete major-signal hints (engine workflow
file renames, schema wire-type flips, ARCHITECTURE/paper structural
rewrites, `targetMajor >= N` migrator gates) so future cursor-class
refactors don't slip through under a `feat:` prefix again.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant