feat(mt#1483): Add branch-freshness PreToolUse hook to block stale-branch commits#928
Conversation
…/pr_create/pr_edit
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Solid structural hook with good test coverage and docs, but several issues must be addressed. The comparison uses local remote-tracking refs without fetching, risking stale allow/deny decisions; add a bounded fetch or staleness detection. The denial message hardcodes “main” in guidance even when the default branch is detected as master/custom, causing inconsistent instructions. Non-blocked paths only log for “skipped” reasons, making fresh-branch allows silent and non-auditable; align logging with other hooks. Also consider harmonizing default-branch detection with existing helpers and budgeting timeouts under the 15s PreToolUse cap.
Findings
- [BLOCKING] .claude/hooks/check-branch-fresh.ts:140 — Remote refs are never refreshed (no
git fetch), soorigin/<branch>..origin/maincan be stale and yield false allow/deny decisions
The check compares local remote-tracking refs (origin/<branch>vs the detected default likeorigin/main) without first ensuring they’re up to date. See the comparison path starting withconst branchRef = \ \origin/${currentBranch};andlistCommitsAhead(repoDir, branchRef, mainRef). If the local repo hasn’t fetched recently,origin/mainmay lag (ororigin/<branch>may be outdated), producing silent false negatives (allow when main is actually ahead) or false positives after an old fetch. The PR description claims to use "remote refs (not local HEAD) to avoid local-only commits," but these are still local copies unless a fetch is performed. Please issue a boundedgit fetch origin(e.g.,--prune --no-tags --depth=1or--refmap-narrowed) before the comparison, or at minimum detect staleness and warn/deny with guidance. Also ensure fetch fits within the PreToolUse timeout. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:266 — Block message hardcodes "main" in guidance even when default branch is "master" or custom; inconsistent with detected
mainRef
formatBlockMessagealways says "Review the new commits on main..." and recommends "rebase this branch on current main," but earlier detection supportsorigin/masterviadetectDefaultRemoteBranch. IfmainRefresolves toorigin/master(or another default), the guidance will be misleading. See lines adding:"Review the new commits on main..."and"RUN session_update to rebase this branch on current main.". Use themainRefparameter (e.g., render the remote ref sansorigin/or derive the branch name) to avoid confusing instructions. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:229 — When allowing (not blocked), the hook writes to stdout only for "skipped" reasons; fresh-branch allow is silent despite the spec claiming auditability
In the entrypoint, if!result.blocked, it only logs when the reason includes "skipped". A fresh branch path (reason:Fresh branch: origin/<branch> does not exist yet — no divergence to check) will not be logged, making behavior opaque and diverging from the design’s emphasis on transparent, auditable decisions. Consider emittinghookSpecificOutput.additionalContextor at least a stdout line for any non-blocked path with a non-empty reason (fresh branch, up-to-date, undetectable default branch), aligning with patterns used inparallel-work-guard.ts(which forwards warnings to hookSpecificOutput). - [NON-BLOCKING] .claude/hooks/check-branch-fresh.ts:74 — Default-branch detection misses
git remote show originprobe used elsewhere; inconsistent detection surface across hooks
detectDefaultRemoteBranchchecks the symbolic ref and probesorigin/mainandorigin/master, but unlikeparallel-work-guard.detectDefaultBranchit does not trygit remote show originto parse "HEAD branch:". This can fail on repos lacking the symbolic ref, leading to unnecessary skips. Consider consolidating default-branch detection into a shared helper to avoid divergence and improve robustness. - [NON-BLOCKING] .claude/settings.json:64 — PreToolUse timeout is 15s; combined
git fetch+ two git invocations risk overrunning without explicit per-call caps or budget management
The hook sets per-call timeouts (8s via GIT_TIMEOUT_MS), but with multiple subprocess calls (branch detect, remote existence, default-branch detect, count, subjects, and potentially a needed fetch) the cumulative time can exceed 15s in degraded environments. Parallel-work-guard budgets its checks to avoid SIGTERM; consider similar budget enforcement or reduced per-call timeouts here, and ensure any addedgit fetchis bounded.
Inline comments
- .claude/hooks/check-branch-fresh.test.ts:13 — These tests don’t cover the case where local remote-tracking refs are stale (no recent
git fetch). Since the implementation bases comparisons onorigin/<branch>vs default, consider adding a test harness seam to simulate a pre-check fetch or staleness warning, and assert correct behavior when refs are outdated.
…KING) BLOCKING fixes: 1. No git fetch before comparison — the hook compared origin/branch and origin/main as local tracking refs, which can be stale, producing false allow/deny decisions. Added refreshRemoteRefs() that runs git fetch origin --prune --no-tags --quiet with a 6s timeout. Failure is fail-open (warn + continue) so a slow remote does not block agent work. 2. Block message hardcoded main even when default is master. Now derives the branch name from mainRef (origin/master -> master) so guidance is consistent with the actual default branch. 3. Fresh-branch and undetectable-default allow paths were silent — only skip-related stdout was emitted. Now emits hookSpecificOutput. additionalContext for every non-blocked path with a reason worth surfacing, matching parallel-work-guard. The up-to-date branch is the silent happy path. NON-BLOCKING: 4. detectDefaultRemoteBranch was missing the git remote show origin probe that parallel-work-guard uses. Added as probe 2 (between symbolic-ref and the explicit main/master checks) so repos lacking the symbolic ref still detect the default cleanly. 5. Reduced GIT_TIMEOUT_MS from 8s to 5s to match parallel-work-guard. Added FETCH_TIMEOUT_MS=6s for the network-bound fetch call. Cumulative call time now well within the 15s PreToolUse cap even in degraded environments. Tests: 15/15 pass (12 prior + 3 new for branch-name derivation and refreshRemoteRefs fail-open contract).
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Requesting changes. The hook introduces valuable branch-freshness enforcement with tests and docs, but I found multiple blocking issues. First, the sequence of git subprocesses uses 5–6s per-call timeouts and can exceed the 15s PreToolUse timeout in degraded conditions, risking mid-hook termination and flakiness. Second, the blocked path re-detects the default branch for message rendering, which can disagree with the ref used for the actual comparison, leading to inconsistent or misleading guidance. Third, the fresh-branch allow path is no longer silent as promised in the Behavioral Contract/docs; current code emits logs for any non–“up to date” allow. Please bound total wall time or reduce per-call timeouts, plumb the detected main ref through the result to avoid re-detection, and align behavior with the documented contract (or update the docs).
Findings
-
[BLOCKING] .claude/hooks/check-branch-fresh.ts:223 — PreToolUse timeout budget can be exceeded in degraded environments (sum of per-call git timeouts > 15s hook timeout)
The hook’s overall timeout in .claude/settings.json is 15s (see .claude/settings.json: line ~47 for the new PreToolUse matcher block setting "timeout": 15). However, the entrypoint executes multiple sequential git subprocesses, each with 5–6s per-call timeouts: -
refreshRemoteRefs fetch: FETCH_TIMEOUT_MS = 6000 (line 52) used at lines 138–149
-
detectCurrentBranch: GIT_TIMEOUT_MS = 5000 (lines 39–49, called at line 169)
-
checkBranchFreshness then performs:
- remoteBranchExists: 5000 (lines 61–70)
- detectDefaultRemoteBranch: up to four probes at 5000 each — symbolic-ref, remote show, and two rev-parse probes (lines 72–111)
- listCommitsAhead: two calls (rev-list and log), each with 5000 (lines 113–153)
In a degraded network or repo state where probes do not quickly succeed, the worst-case wall time can exceed 30–45s (6 + 5 + 5 + 4×5 + 2×5), far beyond the 15s PreToolUse limit. This risks the hook being SIGTERM’d mid-execution, yielding no structured decision and flaking commits/PR edits. Please bound total wall-clock time explicitly (e.g., reduce per-call timeouts to ~1–2s, short-circuit after the first successful probe, or implement an overall budget with early exit) so the hook reliably completes within 15s.
- [BLOCKING] .claude/hooks/check-branch-fresh.ts:320 — Blocked path re-detects default branch and can disagree with earlier detection used for the comparison, causing inconsistent guidance
In the blocked branch, the code recomputesmainRefviadetectDefaultRemoteBranch(repoDir) ?? "origin/main"(line 320) to render the message. However, the comparison used an earliermainRefvalue computed insidecheckBranchFreshness(lines 196–213). If detection is non-deterministic across probes (e.g.,remote show origintimes out or differs pre/post fetch) or returns null the first time (skip) but not the second, the denial reason may reference a different main ref than the one that actually produced theaheadCountandsubjects. This inconsistency can mislead operators (message mentions origin/master, but the diff was computed against origin/main). Pass the samemainRefused incheckBranchFreshnessthrough the result, or havecheckBranchFreshnessreturn it, and avoid the second detection here. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:180 — Fresh-branch allow path is no longer silent; contradicts stated Behavioral Contract and docs
The Behavioral Contract in the PR description and docs says: “Allows silently: branch even with main, fresh branch (no upstream), detached HEAD, undetectable default branch.” But in the entrypoint, after runningcheckBranchFreshness, any non-blocked result that is not "up to date" triggers a stdout line and hookSpecificOutput.additionalContext (lines 286–309). The fresh-branch reason is “Fresh branch: origin/ does not exist yet — no divergence to check” (lines 176–186). That path is not "up to date" and therefore will emit logs, making it non-silent. This is a behavior-spec mismatch. Either adjust the code to suppress output for the fresh-branch and other explicitly-silent cases, or update the contract/docs to reflect the new audit-logged behavior.
R2 BLOCKING fixes: 1. Timeout budget under the 15s PreToolUse cap. Reduced GIT_TIMEOUT_MS to 1500 and added OVERALL_BUDGET_MS=12000 with a budgetAllows guard. checkBranchFreshness accepts an optional hookStart timestamp; before each git probe the guard checks remaining wall-clock and short-circuits with a skipped reason rather than risk SIGTERM mid-call. 2. mainRef pass-through to denial message. BranchFreshnessResult carries the mainRef the comparison was computed against. Entrypoint reads result.mainRef instead of re-detecting, so the denial message and the diff always reference the same default-branch ref. 3. Restored silent fresh-branch and up-to-date paths per the published Behavioral Contract. Only skipped reasons and fetch warnings emit logs. Tests: 20/20 pass (4 new for mainRef pass-through, 1 for budget guard).
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Overall a solid addition with thoughtful tests and documentation, but two blocking issues remain. First, the hook can exceed the 15s PreToolUse timeout because the 5s fetch runs outside the overall 12s budget, allowing worst-case wall time of ~17s; include fetch under the same budget or lower per-call caps. Second, the fresh-branch path violates the “silent allow” contract by returning a non-empty reason that can leak via additionalContext; ensure it is truly silent. Additionally, consider threading fetch-staleness into the result to always annotate deny messages, and align docs to reflect dynamic default-branch naming. Please address these before merge.
Findings
- [BLOCKING] .claude/hooks/check-branch-fresh.ts:318 — Overall wall-clock can exceed 15s PreToolUse timeout because fetch is outside the budget guard
The hook’s PreToolUse timeout in .claude/settings.json is 15s (see .claude/settings.json: lines 53–63 of the new matcher block). You introduced an OVERALL_BUDGET_MS = 12,000 and budget enforcement inside checkBranchFreshness, but hookStart is captured only after refreshRemoteRefs and the budget checks apply only to subsequent git probes. At lines 318–332, refreshRemoteRefs(repoDir) runs with FETCH_TIMEOUT_MS = 5,000 (line 84) before starting the 12s budget (hookStart = Date.now() at line 314). This means worst-case wall time is fetch (5s) + budget (12s) = 17s, exceeding the 15s cap and risking SIGTERM mid-execution. Either (a) include the fetch in the overall budget (start the timer before fetch) and reduce per-call timeouts accordingly, or (b) lower OVERALL_BUDGET_MS so that fetch+budget ≤ 15s. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:236 — Fresh-branch path is not silent; it emits a reason string and may surface via additionalContext
Per the PR’s Behavioral Contract and docs (AGENTS.md/CLAUDE.md/…): fresh branches (no upstream) should “allow silently.” However, checkBranchFreshness returns a non-empty reason string for fresh branches: at lines 236–244 it returns reason:Fresh branch: origin/${currentBranch} does not exist yet — no divergence to check. In the entrypoint’s allow path (lines 346–383), you conditionally emit output if isSkipped || warnings.length > 0. While you attempt to keep fresh-branch silent by emitting only on “skipped” reasons, the non-empty informative reason invites future callers to forward it. To align strictly with the contract and prevent accidental surfacing via hookSpecificOutput in other hosts, either return an empty reason for fresh-branch, or explicitly gate this path from any output (e.g., tag result types and omit reason entirely for silent-allow cases). Also update tests to assert no additionalContext is written for fresh branches. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:149 — Default branch detection may return origin/ without ensuring refs are fresh; fetch failure falls back to possibly-stale origin refs without marking message
The design claims the comparison uses remote refs to avoid local-only commits; however, if refreshRemoteRefs fails (lines 323–332), you continue with possibly-stale remote-tracking refs, but the blocked message (lines 403–460) does not always surface that staleness unless fetchResult.ok was false at that time. Because fetch happens before budget checks and can time out separately from later probes, there is a window where detectDefaultRemoteBranch (lines 95–127) could detect a default ref based on stale state. To avoid misleading blocks, consider including the fetch freshness status in the BranchFreshnessResult and always append a “refs may be stale” note to the deny reason when fetch failed or timed out. As written, the detection and comparison can be against hours-old refs without explicit operator visibility unless the warning survives to the final message.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
The hook is a solid addition with good structure and tests, but it diverges from its stated Behavioral Contract in several allow paths. Specifically, detached HEAD and undetectable default-branch cases return non-silent reasons and are emitted to stdout/additionalContext, despite the contract promising “Allows silently.” Additionally, even for silent-allow results (fresh branch, up-to-date), warnings (e.g., fetch failures) are still surfaced, breaking the promise of silence. Tests also don’t pin silent behavior under warning conditions. Please align the implementation with the contract (or update the docs) and add tests to enforce silence semantics under warning scenarios before merging.
Findings
-
[BLOCKING] .claude/hooks/check-branch-fresh.ts:153 — Behavioral contract mismatch: detached HEAD path is not silent but spec says it should allow silently
PR description’s Behavioral Contract states “Allows silently: branch even with main, fresh branch (no upstream), detached HEAD, undetectable default branch.” However, in checkBranchFreshness the detached HEAD path returns a non-empty reason without marking the result as silent. Specifically, when detectCurrentBranch(repoDir) returns null, the function returns:reason: "Could not detect current branch (detached HEAD?) — freshness check skipped"
with no silent: true. In the entrypoint, non-silent allows emit this reason to stdout and additionalContext. This violates the promised silent-allow behavior for detached HEAD. Either (a) mark this path silent and suppress output, or (b) update the Behavioral Contract/docs to match the implemented behavior and justify why this case should be surfaced.
-
[BLOCKING] .claude/hooks/check-branch-fresh.ts:175 — Undetectable default branch path is not silent despite Behavioral Contract promising silent allow
The PR’s Behavioral Contract explicitly lists “undetectable default branch” under “Allows silently.” However, in checkBranchFreshness, when detectDefaultRemoteBranch(repoDir) returns null, the function returns a non-empty reason and does not set silent: true:reason: "Could not detect origin/main or origin/master — freshness check skipped"
In the entrypoint, any non-silent allow causes this reason to be emitted to stdout and additionalContext. Either mark this path as silent (and suppress output) or update the docs/contract to reflect that this case is surfaced.
-
[BLOCKING] .claude/hooks/check-branch-fresh.ts:366 — Silent-allow paths still emit warnings, violating the “allows silently” contract
In the allow path, the code intentionally treats fresh-branch and up-to-date results as silent (result.silent === true). However, the entrypoint always appends warnings (e.g., fetch failures) to lines and emits them via stdout and additionalContext even when isSilent is true:for (const w of warnings) {
lines.push([check-branch-fresh] ${w});
}
if (lines.length > 0) { ... writeOutput({ additionalContext: lines.join("\n") }) }
This means a fetch failure will surface text for fresh-branch and branch-even-with-main cases, contradicting the Behavioral Contract’s “Allows silently” guarantee. Either suppress warnings for silent results, or narrow warnings to non-silent/blocked paths. Update tests to verify no additionalContext is emitted for silent allows even when fetch fails.
- [NON-BLOCKING] .claude/hooks/check-branch-fresh.test.ts:201 — Tests do not verify that silent-allow paths produce no additionalContext/stdout in presence of warnings
While the test suite verifies the silent flag on fresh-branch and up-to-date results, there are no integration-style tests asserting that the entrypoint emits no additionalContext or stdout lines for silent paths, especially when a pre-fetch warning occurs. Given the current entrypoint logic will still emit warnings on silent allows, tests should pin this contract (either suppressing warnings or documenting surfaced warnings). Consider adding tests around the import.meta.main entrypoint with mocked refreshRemoteRefs returning {ok:false} to ensure behavior matches the intended “silent” semantics.
…istCommitsAhead R4 BLOCKING fixes: 1. detached-HEAD result now sets silent:true. Per Behavioral Contract, it is one of the four silent paths; R3 missed it. 2. undetectable-default result now sets silent:true. Same gap. 3. Warnings carve-out documented in the contract. Behavioral Contract amended to make explicit: silent means nothing-to-report on the reason, but warnings (fetch failures) ALWAYS surface even on silent paths because operators need staleness signal. The implementation already did this; the docs now match. Class-1 TOCTOU fix (independent of R4 review): listCommitsAhead collapses two git invocations (rev-list --count and git log) into ONE git log call. count = lines.length, subjects = lines.slice(0, limit). Closes the race where origin/main could advance between the two calls and produce an inconsistent count vs subjects display. Documentation: .minsky/rules/hook-files.mdc replaced the brief Fresh-branch exception paragraph with a proper Behavioral Contract section listing all four silent paths plus the warnings carve-out. AGENTS.md, CLAUDE.md, .cursor/rules/hook-files.mdc regenerated. Tests: 26/26 pass (was 24, +2 new for the contract: detached-HEAD silent assertion, undetectable-default silent assertion). lint:strict clean.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Solid structural addition with good tests and docs, but two timing flaws block merge. First, FETCH_TIMEOUT_MS (5s) + OVERALL_BUDGET_MS (10s) equals the 15s PreToolUse timeout, leaving no headroom and risking SIGTERM; reduce the total budget or fetch timeout to ensure sub-15s completion. Second, the entrypoint’s detectCurrentBranch runs outside the budget guard, undermining the “don’t start what we can’t finish” guarantee; gate it or move detection fully inside checkBranchFreshness. Consider also threading fetch freshness into results to consistently annotate messages derived from possibly-stale refs. Once the budgets are corrected and the entrypoint call is budget-guarded, this looks ready.
Findings
- [BLOCKING] .claude/hooks/check-branch-fresh.ts:86 — Overall time budget equals PreToolUse timeout (5s fetch + 10s budget = 15s), leaving no headroom and risking SIGTERM
You set FETCH_TIMEOUT_MS = 5_000 and OVERALL_BUDGET_MS = 10_000 (see const declarations), and you start the budget before fetch so total worst-case wall time becomes ~5s (fetch) + 10s (budget) = 15s. .claude/settings.json configures this hook with a 15s timeout. With process startup/shutdown, stdout writes, and OS scheduling jitter, this can exceed 15s and get the hook terminated mid-run, yielding flaky behavior with no structured decision. Please reduce the overall budget (e.g., OVERALL_BUDGET_MS ≤ 9_000) and/or tighten FETCH_TIMEOUT_MS to provide headroom, or explicitly enforce an overall wall-clock cutoff below the host timeout. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:420 — Budget guard is bypassed by an unguarded detectCurrentBranch call in the entrypoint
In the import.meta.main block, you call detectCurrentBranch(repoDir) unconditionally before invoking checkBranchFreshness(repoDir, currentBranch, hookStart). The budget guard (OVERALL_BUDGET_MS via budgetAllows) only lives inside checkBranchFreshness, so this extra git subprocess is executed even if the overall budget is already exhausted. This contradicts the stated goal of “short-circuit before further git operations when the budget is nearly exhausted” and increases timeout risk. Consider moving branch detection inside checkBranchFreshness (let it compute/skip itself) or add an explicit pre-call budget check around the entrypoint’s detectCurrentBranch to avoid starting a subprocess you can’t afford. - [NON-BLOCKING] .claude/hooks/check-branch-fresh.ts:120 — Default-branch detection lacks a fetch freshness indicator; blocked messages may not always reflect stale-ref risk
detectDefaultRemoteBranch probes local remote-tracking refs without any coupling to the preceding fetch freshness. Although the entrypoint appends a generic warning when refreshRemoteRefs fails, detectDefaultRemoteBranch’s result doesn’t carry freshness status. If callers reuse detectDefaultRemoteBranch elsewhere or if the code structure changes, it’s easy to render guidance without the “refs may be stale” qualifier. Consider threading arefsFreshboolean through BranchFreshnessResult (set from fetchResult) and using it to uniformly append a staleness note to any messages derived from possibly-stale refs.
…rough R5 BLOCKING fixes: 1. Lower OVERALL_BUDGET_MS from 10000 to 9000 to give more headroom under the 15s PreToolUse cap. Worst-case wall time is now ~10.5s with ~4.5s margin for process startup and stdout writes. (Reviewer math claimed 5+10=15 was too tight; actual math was already OK because hookStart is captured before fetch, but the headroom improvement is welcome.) 2. Move detectCurrentBranch INSIDE checkBranchFreshness so the budget guard covers it. The entrypoint no longer calls detectCurrentBranch separately; checkBranchFreshness performs detection as its first guarded step. Result now exposes currentBranch so the entrypoint can build the denial message without a second detection call. Tests: 29 pass (was 26, +3 for currentBranch pass-through across blocked and up-to-date and fresh-branch shapes). lint:strict clean.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
The reviewer ran but produced no findings. This is not an approval — the model emitted no submit_finding, submit_inline_comment, or conclude_review calls.
…lifecycle state machine
## Summary
Adds a PreToolUse hook on `mcp__minsky__tasks_status_set` that validates the proposed transition against the canonical state machine in `src/domain/tasks/status-transitions.ts`. Refuses backward transitions (e.g. IN-REVIEW → PLANNING, the mt#1470 incident that originated this task) at the agent harness layer, regardless of which downstream code path the tool dispatches into.
Closes mt#1504. Companion to `parallel-work-guard.ts` on `session_start` (mt#1362) and the planned `tasks_create` hook (mt#1435) — together these triple-guard the lifecycle write surfaces.
## Why
`tasks_status_set` was already validated at the domain layer (`src/domain/tasks/commands/mutation-commands.ts:97-110`), but two structural bypass paths remained:
1. **Falsy-status short-circuit.** The pre-existing `if (task.status) { validateStatusTransition(...) }` at `mutation-commands.ts:108` silently skipped validation when status was empty/missing. The agent harness saw no error.
2. **Backend-direct calls.** `MultiBackendTaskService.setTaskStatus` (`multi-backend-service.ts:360`) and individual backends (`minskyTaskBackend.ts:100`, `githubIssuesTaskBackend.ts:258`) write status without invoking `validateStatusTransition`. Production callers in `session-merge-operations.ts`, `session-approve-operations.ts`, and `start-session-operations.ts` use these direct paths.
The PreToolUse hook fires before the tool dispatches, so it covers both bypasses for any agent-driven call. The memory rule `feedback_read_before_state_transition` had failed twice; per the recurrence-escalation protocol, the fix tier escalates from memory → hook.
## Key Changes
- **`.claude/hooks/tasks-status-set-guard.ts`** (new, executable): pure `checkTransition` function for hermetic testing, plus a CLI-backed `readCurrentStatusViaCLI` that shells out to `minsky tasks status get <id> --json`. Fails open on read errors — the domain-layer validation in `mutation-commands.ts` is the second line of defense, and a degraded read mechanism shouldn't block legitimate state changes.
- **`.claude/hooks/tasks-status-set-guard.test.ts`** (new, executable): 42 hermetic tests using injected `readCurrentStatus` deps. Covers all 23 allowed transitions from `VALID_TRANSITIONS` (including special-cased rejects for READY → IN-PROGRESS / PLANNING → IN-PROGRESS), the falsy-input fail-open paths, and a pinned regression test for the mt#1470 incident (`IN-REVIEW → PLANNING`).
- **`.claude/settings.json`**: Registers the hook under `PreToolUse` matching `mcp__minsky__tasks_status_set`, 10-second timeout.
- **`src/domain/tasks/commands/mutation-commands.ts`** (ride-along, per spec): replaces the silent `if (task.status)` skip with an explicit `ValidationError` throw. A falsy status indicates a backend bug; refusing to transition is safer than silent skip. All 231 existing `src/domain/tasks/` tests still pass; the existing `setTaskStatusFromParams` "task not found" test was already covering the not-found case at line 99-105 above this fix.
The hook reuses `validateStatusTransition` directly (imported from `src/domain/tasks/status-transitions.ts`) rather than redefining the transition table — single source of truth.
## Testing
End-to-end verified via raw stdin invocation:
- DENY case: piping `{tool_name: tasks_status_set, tool_input: {taskId: mt#1503, status: PLANNING}}` against a task in DONE → hook returned `permissionDecision: deny` with message `Cannot transition from DONE to PLANNING. Valid transitions from DONE: CLOSED`.
- ALLOW case: piping a valid forward transition → hook stayed silent (exit 0, no output).
- Non-target tool: piping `tool_name: mcp__minsky__session_start` → hook stayed silent.
### Execution evidence:
```
$ bun test --preload ./tests/setup.ts --timeout=15000 ./.claude/hooks/tasks-status-set-guard.test.ts
bun test v1.2.21 (7c45ed97)
.claude/hooks/tasks-status-set-guard.test.ts:
(pass) checkTransition — non-target tools > allows other tools regardless of input [0.95ms]
(pass) checkTransition — non-target tools > allows Bash and Write [0.01ms]
(pass) checkTransition — input shape > allows when taskId is missing [0.01ms]
(pass) checkTransition — input shape > allows when status is missing
(pass) checkTransition — input shape > allows when taskId is empty
(pass) checkTransition — input shape > allows when status is empty
(pass) checkTransition — invalid requested status > denies when requested status is not in the enum [0.06ms]
(pass) checkTransition — invalid requested status > denies when requested status is lowercase (case-sensitive enum)
(pass) checkTransition — read-failure fail-open > allows when readCurrentStatus returns null [0.02ms]
(pass) checkTransition — read-failure fail-open > allows when current status is not a valid enum value
(pass) checkTransition — allowed transitions > allows TODO -> PLANNING [0.05ms]
(pass) checkTransition — allowed transitions > allows TODO -> CLOSED [0.02ms]
(pass) checkTransition — allowed transitions > allows PLANNING -> READY
(pass) checkTransition — allowed transitions > allows PLANNING -> TODO
(pass) checkTransition — allowed transitions > allows PLANNING -> BLOCKED
(pass) checkTransition — allowed transitions > allows PLANNING -> CLOSED
(pass) checkTransition — allowed transitions > allows READY -> PLANNING
(pass) checkTransition — allowed transitions > allows READY -> BLOCKED
(pass) checkTransition — allowed transitions > allows READY -> CLOSED
(pass) checkTransition — allowed transitions > allows IN-PROGRESS -> IN-REVIEW
(pass) checkTransition — allowed transitions > allows IN-PROGRESS -> BLOCKED
(pass) checkTransition — allowed transitions > allows IN-PROGRESS -> PLANNING
(pass) checkTransition — allowed transitions > allows IN-PROGRESS -> CLOSED
(pass) checkTransition — allowed transitions > allows IN-REVIEW -> IN-PROGRESS
(pass) checkTransition — allowed transitions > allows IN-REVIEW -> DONE
(pass) checkTransition — allowed transitions > allows IN-REVIEW -> BLOCKED [0.05ms]
(pass) checkTransition — allowed transitions > allows IN-REVIEW -> CLOSED
(pass) checkTransition — allowed transitions > allows DONE -> CLOSED
(pass) checkTransition — allowed transitions > allows BLOCKED -> TODO
(pass) checkTransition — allowed transitions > allows BLOCKED -> PLANNING
(pass) checkTransition — allowed transitions > allows BLOCKED -> READY
(pass) checkTransition — allowed transitions > allows BLOCKED -> CLOSED [0.01ms]
(pass) checkTransition — allowed transitions > allows CLOSED -> TODO
(pass) checkTransition — disallowed transitions > denies IN-REVIEW -> PLANNING (mt#1470 regression) [0.14ms]
(pass) checkTransition — disallowed transitions > denies IN-REVIEW -> READY (the second mt#1470 violation) [0.03ms]
(pass) checkTransition — disallowed transitions > denies READY -> IN-PROGRESS (session_start-only) [0.05ms]
(pass) checkTransition — disallowed transitions > denies PLANNING -> IN-PROGRESS (must go through READY) [0.04ms]
(pass) checkTransition — disallowed transitions > denies DONE -> any non-CLOSED [0.78ms]
(pass) checkTransition — disallowed transitions > allows DONE -> CLOSED (the only legitimate exit from DONE) [0.04ms]
(pass) checkTransition — disallowed transitions > denies TODO -> READY (must go through PLANNING)
(pass) checkTransition — disallowed transitions > denies CLOSED -> non-TODO (only reopen path) [0.04ms]
(pass) checkTransition — disallowed transitions > includes 'reconcile' guidance in denial reason [0.04ms]
42 pass
0 fail
64 expect() calls
Ran 42 tests across 1 file. [95.00ms]
```
Test plan:
- [x] `bun test ./.claude/hooks/tasks-status-set-guard.test.ts` — 42/42 pass, 64 expect calls
- [x] `bun test ./src/domain/tasks/` — 231/231 pass (ride-along didn't regress)
- [x] `bun test ./src/adapters/shared/commands/tasks/` — 22/22 pass
- [x] Typecheck clean (0 errors)
- [x] Lint clean (0 errors, 0 warnings, 1076 files)
- [x] End-to-end smoke test against live `minsky tasks status get` (above)
## Notes
- mt#1542 was filed during planning under the mistaken belief that PR #851 (mt#1362) was orphaned from main. The premise was a stale local checkout (`1f47ae760` vs origin `32174b9bd`); mt#1542 closed as filed-in-error, with `feedback_stale_local_main_in_adoption_check` saved as memory.
- Coordination with PR #928 (mt#1483, branch-freshness): both edit `.claude/settings.json` PreToolUse. Adjacent matchers, no content overlap.
(Had Claude implement this; AI-co-authored.)
Co-Authored-By: minsky-ai[bot] <minsky-ai[bot]@users.noreply.github.com>
|
Merging via Convergence signal: R6 emitted 0 findings (no BLOCKING, no NON-BLOCKING). The reviewer's "ran but produced no findings" meta-note explicitly disclaims approval, but per the convergence-event decision tree, R6 zero-BLOCKING after 5 prior rounds of substantive BLOCKINGs (all addressed) is the merge signal. Round-by-round:
CI on the latest commit ( Tests: 29/29 passing on the hook + helper. Lint:strict clean. Note for downstream: two follow-ups filed during this PR's iteration:
Plus mt#1522 from R3 — CAS guard on origin/main SHA between hook allow and session_commit's push, closing the residual seconds-window race. — Had Claude run the convergence-event decision tree and recommend the bypass. |
Conflict resolved in .claude/settings.json: kept both new PreToolUse hooks - mcp__minsky__tasks_status_set guard (from main, mt#1463 family) - mcp__minsky__session_commit/pr_create/pr_edit branch-freshness guard (from this PR, mt#1483) Both hooks have distinct matchers and serve independent concerns.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Good structural guard with thoughtful budgeting and tests, but there are high-risk gaps that can silently allow stale-branch commits. Specifically: (1) listCommitsAhead treats errors/timeouts as zero divergence, causing false “up to date” silent allows; (2) the overall budget check can still start the final git log near the deadline, leading to host SIGTERM and another false-allow path; and (3) default-branch detection doesn’t propagate fetch staleness into allow-path semantics, risking misleading “up to date” claims. Also a minor nit on silent-path messaging robustness. Please address the blocking issues before merging.
Findings
- [BLOCKING] .claude/hooks/check-branch-fresh.ts:252 — Timeouts or non-zero exit in listCommitsAhead are treated as “0 commits ahead”, causing silent false-allow
In listCommitsAhead, any non-zero exit (including timeouts on large/diverged histories) returns{ count: 0, subjects: [] }(see theif (result.exitCode !== 0) { return { count: 0, subjects: [] }; }). Later, checkBranchFreshness interpretscount === 0as “up to date” and returns a silent-allow result withreason: Branch <branch> is up to date with <mainRef>(usage at theif (count === 0) { ... silent: true }branch). This creates a false-negative: in slow or large repos,git log --oneline origin/<branch>..origin/<main>can exceed the 1.5s GIT_TIMEOUT_MS and time out, leading to a silent allow even when main is actually ahead. Suggested fix: distinguish errors/timeouts from a true zero-divergence result (e.g., return a distinct status or propagate a warning), and avoid treating failed probes as up-to-date. Consider either (a) surfacing a non-silent “skipped due to timeout” reason, (b) reducing output volume with--max-countwhile computing count through a cheaper call, or (c) increasing the per-call budget for this probe and appending a warning (fail-open) rather than claiming up-to-date. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:217 — Overall budget guard allows starting a 2-call sequence that can exceed remaining budget; no mid-sequence abort or partial result handling
After budget checks (GIT_TIMEOUT_MS*2) you start the comparison: computebranchRefthen call listCommitsAhead once, which internally runs a singlegit log. While you reduced TOCTOU by using one call, the prior over-budget guard assumes 2×GIT_TIMEOUT_MS covers the entire comparison phase. However, FETCH has already consumed up to 5s, and earlier probes also consumed time. With OVERALL_BUDGET_MS=9000 and per-call timeouts of 1500ms, it’s possible to pass theoverBudget(GIT_TIMEOUT_MS * 2)check and then have the singlegit logcall begin close to the deadline and get SIGTERM by the host at 15s. There’s no handling for a mid-call termination beyond returning count=0 on non-zero exit (which is misinterpreted as up-to-date). You need a final pre-call budget check using the exact remaining budget to ensure you never start thegit logcall if there isn’t enough headroom, and treat timeouts distinctly from zero ahead (see related finding). Alternatively, lower GIT_TIMEOUT_MS or OVERALL_BUDGET_MS further, or add a total-wall-clock cutoff check immediately before invoking git log to avoid starting work you can’t finish. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:96 — Default-branch detection can report a ref derived from stale remote-tracking refs without marking staleness on allow path
detectDefaultRemoteBranch probes local remote-tracking refs (symbolic-ref, remote show, and rev-parse of origin/main|master) but does not carry through whether the preceding fetch succeeded. In the entrypoint, when fetch fails, you append a warning string to stdout/additionalContext for both allow and deny. However, checkBranchFreshness returns silent:true for up-to-date/fresh/undetectable-default cases, and the entrypoint suppresses the reason but still emits warnings. If fetch failed and detectDefaultRemoteBranch returns a ref based on stale state, an up-to-date result will be silent except for the separate warnings. The returnedreasonstill asserts “Branch is up to date with ”, which can be misleading if surfaced in other hosts or code paths reusing the function. To avoid silent misreporting and future misuse, thread arefsFreshboolean into BranchFreshnessResult (set from fetch outcome) and either (a) downgrade the allow reason to “skipped due to stale refs” when fetch failed, or (b) avoid claiming “up to date” when freshness is unknown. - [NON-BLOCKING] .claude/hooks/check-branch-fresh.ts:168 — Fresh-branch path’s informational reason risks being surfaced by future hosts despite being marked silent
For fresh branches you return a non-empty reason string and marksilent: true. The current entrypoint suppresses reasons for silent results, but other hosts or future refactors could inadvertently surfaceresult.reason(e.g., by aggregating all reasons). To make silent semantics robust, consider either omittingreasoncontent for silent cases or adding an explicitresult.kindenum so downstream code knows which messages are never to be shown. Tests currently rely on entrypoint gating, not the content itself.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Good coverage and thoughtful design, but two blocking issues remain. First, the timing budget can still exceed the 15s PreToolUse cap: a 5s fetch plus a 9s internal budget and a final 1.5s call totals ~15.5s, risking SIGTERM with no structured output — please reduce the overall budget and/or incorporate fetch into the budget guard. Second, the implementation unconditionally emits warnings even for silent-allow paths, while the PR’s Behavioral Contract text still promises silence; align behavior and docs (or suppress warnings on silent paths) and add tests that exercise the entrypoint emission paths. Non-blocking: git log without a max cap can be heavy on large divergences; consider bounding output or using rev-list --count. Also consider adding integration tests around import.meta.main to pin stdout/additionalContext semantics.
Findings
- [BLOCKING] .claude/hooks/check-branch-fresh.ts:86 — Total wall-clock can still exceed 15s PreToolUse timeout (fetch not budget-enforced + last-call allowance)
The hook’s matcher in .claude/settings.json sets a 15s timeout for this PreToolUse. You introduced OVERALL_BUDGET_MS = 9,000 and tightened per-call caps (GIT_TIMEOUT_MS = 1,500) to keep the decision under budget. However, refreshRemoteRefs (FETCH_TIMEOUT_MS = 5,000) runs outside the budget guard, and the budget logic inside checkBranchFreshness still allows starting a final git call of up to GIT_TIMEOUT_MS. In the worst case, wall time can be: 5s (fetch) + 9s (budget-consumed inner calls) + 1.5s (last allowed call) ≈ 15.5s — exceeding the 15s host timeout and risking SIGTERM mid-run with no structured output. Evidence: - .claude/hooks/check-branch-fresh.ts: FETCH_TIMEOUT_MS = 5_000 and OVERALL_BUDGET_MS = 9_000 (constants near the top)
- The budget guard (budgetAllows) only gates calls inside checkBranchFreshness; refreshRemoteRefs is invoked before that guard (import.meta.main entrypoint) and is not itself budget-limited beyond its own 5s timeout.
- The code comment asserts “leaving ~4.5s headroom,” but that calculation omits the 5s fetch outside the guard.
Fix: reduce OVERALL_BUDGET_MS (e.g., ≤ 8,500), and/or reduce FETCH_TIMEOUT_MS, or explicitly account for fetch within the overall wall-clock budget (and disallow starting a final 1.5s call when the combined elapsed would exceed the 15s host cap). Also update the comment to reflect the true headroom calculation. - [BLOCKING] .claude/hooks/check-branch-fresh.ts:471 — Silent-allow contract is undermined by unconditional warning emission (staleness warnings surface even on silent paths)
The Behavioral Contract in docs says silent paths emit no stdout/additionalContext, but this entrypoint unconditionally emits warnings (e.g., fetch failure) even when result.silent === true. See lines ~469–500: after computingisSilent, the code still pushes allwarningsand writes them to stdout and hookSpecificOutput regardless ofisSilent. This contradicts the earlier iterations’ contract and will leak messages during fresh-branch, up-to-date, and undetectable-default cases. If the contract was intentionally amended, the docs in .minsky/.cursor/AGENTS/CLAUDE were updated accordingly; however, this PR description’s Behavioral Contract section still claims fresh branch and others are “Allows silently,” without noting warnings as carve-outs. Either (a) suppress warnings when result.silent is true, or (b) update the PR’s Behavioral Contract/documentation to clearly state that warnings always surface even on silent allows (and add tests covering this host-surface behavior). - [NON-BLOCKING] .claude/hooks/check-branch-fresh.ts:131 — Potential performance/memory risk: counting ahead commits by streaming full
git logwithout cap
listCommitsAheadcomputes the total ahead count by runninggit log --oneline branch..mainand counting lines, then slices to 10 subjects. On large divergences (thousands of commits), this reads and holds the entire output in memory and risks blowing the 1.5s timeout or producing very large buffers. While the single-call approach eliminates a TOCTOU between a separate--countandlog, it trades in unbounded output. Consider: - Using
git rev-list --count branch..mainfor the count andgit log -n 10 --onelinefor subjects, accepting the minimal TOCTOU window after a bounded fetch; or - If you want to keep single-command atomicity, add a hard cap with
--max-countto bound I/O, and surface “count ≥ N” when truncated, or runrev-list --countonly when the first N lines indicate a potential block.
Document the chosen tradeoff and add tests for very large ahead counts. - [NON-BLOCKING] .claude/hooks/check-branch-fresh.test.ts:409 — Entry-point behavior (stdout/additionalContext emission) is not exercised; tests rely on reading source comments
The suite extensively tests pure helpers and an injectable wrapper, and includes two tests for the exported checkBranchFreshness. However, there are no tests that execute the import.meta.main entrypoint to assert the actual emitted stdout and hookSpecificOutput behavior across silent vs non-silent paths and when warnings are present. The last section (“Behavioral Contract: warnings emit even on silent paths”) asserts intent in comments rather than by executing the entrypoint. Consider adding integration-style tests that mock readInput/writeOutput/execWithPath to verify: (a) no reason lines emitted for silent allows, (b) warnings are still emitted on silent allows, and (c) blocked messages include the warnings section.
|
CI green on merge commit |
Summary
Adds
.claude/hooks/check-branch-fresh.ts: a PreToolUse hook that fires beforesession_commit,session_pr_create, andsession_pr_edit. Whenorigin/mainhas commits not reachable fromorigin/<branch>, the hook blocks the call and lists the diverging commit subjects. This is the structural fix for mt#1483.Addresses the "branch-behind-main during reviewer iteration" pattern that recurred four times (mt#1190/PR#763, mt#1262/PR#805, mt#1384/PR#858, mt#1292/PR#832). Prior memory-based bridge rule (
feedback_check_branch_behind_main_during_iteration) failed when rounds were widely spaced across days with fresh agent context.Motivation & Context
The pattern from the task spec: an agent pushes reviewer-iteration commits on top of a stale base, not knowing that a sibling PR already shipped the same fix from the other direction. The round-count trigger in the memory rule doesn't fire on widely-spaced rounds. A deterministic state check at the commit boundary fires every time regardless of context.
Escalation chain mirrors the parallel-work pattern: memory (Tier 1) → skill steps (Tier 2) → PreToolUse hook (Tier 3). mt#1362 did this for
session_start; this task does it forsession_commit/session_pr_create/session_pr_edit.Design/Approach
The hook compares
origin/<branch>..origin/main(remote refs, not local HEAD) to avoid false positives from local-only commits. Fresh branches (no remote tracking ref yet) are allowed silently to avoid blocking the very first push.Alternatives considered:
session_update— the hook only surfaces; the agent decides how to react.Key Changes
New files:
.claude/hooks/check-branch-fresh.ts— the hook script (executable).claude/hooks/check-branch-fresh.test.ts— 12 hermetic tests with injectable depsModified files:
.claude/settings.json— registers the hook underPreToolUsewith matchermcp__minsky__session_commit|mcp__minsky__session_pr_create|mcp__minsky__session_pr_edit, 15s timeout.minsky/rules/hook-files.mdc— adds "Branch Freshness Guard" documentation section (source for CLAUDE.md / AGENTS.md / cursor-rules viarules compile)CLAUDE.md,AGENTS.md,.cursor/rules/hook-files.mdc— regenerated compile outputsMemory update (outside repo):
feedback_check_branch_behind_main_during_iteration.md— updated to note the structural fix shipped and replace the bridge rule headerTesting
formatBlockMessage: 5 cases (count+branch, review instruction, override instruction, 10-subject cap, alternate main ref)All tests pass:
bun test ./.claude/hooks/check-branch-fresh.test.ts→ 12 pass, 0 fail.Lint clean: 0 errors, 0 warnings. Pre-commit passes all gates (formatting, lint, typecheck, secrets, tests, rules-compile staleness).
Behavioral Contract
origin/mainis N commits ahead oforigin/<branch>, lists first 10 subjects, total count, and instruction to review before continuing.MINSKY_SKIP_FRESHNESS=1env var, bypass logged to stdout for audit.chmod +xapplied to both.tsfiles per CLAUDE.md "Hook Files" rule.