feat(mt#1459): Add PreToolUse hook to require execution evidence before merging PRs with new tests#909
Conversation
…re merging PRs with new tests Adds require-execution-evidence-before-merge.ts hook that intercepts mcp__minsky__session_pr_merge and blocks merges when a PR introduces new test files (*.test.ts, *.integration.test.ts, *.spec.ts) but the PR body lacks an Execution evidence: block. Two escape hatches: - [unverified-tests] title prefix: allows merge with a warning - Execution evidence: heading in PR body: evidence of an actual test run Registers in .claude/settings.json alongside require-review-before-merge.ts on the same mcp__minsky__session_pr_merge matcher. 42 tests cover all four spec acceptance tests plus unit tests for each exported pure function (isTestFile, findNewTestFiles, hasExecutionEvidence, hasBypassPrefix, checkExecutionEvidence). Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
conclude_review call. Event derived from severity counts: REQUEST_CHANGES (4 BLOCKING / 1 NON-BLOCKING / 0 PRE-EXISTING findings). Executive summary unavailable.
Findings
- [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:255 — Hook emits multiple JSON outputs by calling writeOutput in a loop, violating the single-JSON contract
In the top-level entrypoint, warnings are surfaced via a for-loop that calls writeOutput once per warning (see the block starting withfor (const warning of result.warnings) { ... }). types.writeOutput writes a raw JSON object to stdout without a delimiter. Emitting multiple JSON objects concatenated produces invalid JSON for consumers expecting a single HookOutput object, risking parse errors or dropped outputs. Contrast with parallel-work-guard.ts, which aggregates warnings into a single additionalContext string and calls writeOutput once. Fix by aggregating warnings and emitting exactly one writeOutput (and one for a deny), or by printing warnings to stdout (plain text) and reserving writeOutput for a single structured payload. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:210 — Hardcoded owner/repo slug and PR resolution logic breaks on forks or non-edobry/minsky remotes
The hook resolves the PR number viagh pr list --repo edobry/minsky --head task/${task}and then usesmakeProdPrDepswith the same hardcoded owner/repo for subsequent API calls (lines ~210–245). This makes the hook non-portable to forks or when the working directory's origin is not edobry/minsky (e.g., a teammate's fork PR). Other hooks (parallel-work-guard.ts) derive the owner/repo slug fromgit remote get-url origin. Align with that pattern to avoid false negatives (fail-open) or querying the wrong repo. Derive repo from git and pass it through, with a fail-open warning if unavailable. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:167 — PR files fetch does not paginate GitHub API results — may miss added test files beyond first page
makeProdPrDeps.fetchPrFiles callsgh api repos/${repo}/pulls/${prNumber}/fileswithout--paginate. GitHub's PR files API is paginated (default 30 items/page). On PRs changing >30 files, added tests beyond the first page will be silently ignored, potentially allowing merges without required evidence. Use--paginateand aggregate pages, or switch togh pr view --json files --jq .files(which returns all files) to ensure complete coverage. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:119 — Evidence detection regex will produce false positives (e.g., "No Execution evidence:"), allowing merges without actual evidence
hasExecutionEvidence uses/execution evidence:/i(line ~119), which matches any occurrence of that substring, including negations like "No Execution evidence:" or template placeholders. This defeats the guard by allowing merges when the body explicitly states that there is no evidence. Tighten the parser to require a section heading (e.g., start-of-line/Markdown heading like^##\s*Execution evidence:\s*$followed by non-empty content), or at minimum require non-empty lines following the marker to reduce false positives. Add tests for negative phrases and empty blocks. - [NON-BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:37 — Duplicate exec helper instead of reusing shared execSync or established helper
This file introduces a bespoke execWithPath wrapper (lines 37–58) while types.ts already exports execSync and parallel-work-guard.ts has a similar execWithPath with timedOut semantics. The divergence (PATH augmentation here, but not in types.execSync) creates inconsistent behavior and duplicated code paths. Consider centralizing the PATH-augmented exec helper in types.ts (or reusing the existing one if available) to reduce maintenance risk and keep timeout/ENV behavior consistent across hooks.
…ivation, pagination, regex tightening BLOCKING #1: aggregate warnings into single additionalContext string; emit exactly one writeOutput call per code path (deny or allow-with-warnings). BLOCKING #2: derive owner/repo from https://github.com/edobry/minsky.git via parseGitHubRemoteUrl and deriveRepoFromGit; no more hardcoded edobry/minsky slug; fail-open with warning if derivation fails. BLOCKING #3: add --paginate to gh api repos/.../pulls/.../files call; parse multi-page output by splitting on newlines and flattening JSON arrays. BLOCKING #4: tighten hasExecutionEvidence to require line-start anchor (prevents mid-sentence matches) and negation guard (skips lines containing no before the marker) and require non-empty content after the heading. NON-BLOCKING #5: centralize execWithPath in types.ts; both hooks import from there. Tests: add negative cases for hasExecutionEvidence (negation, empty body, whitespace-only body); add parseGitHubRemoteUrl suite covering all URL forms and null cases. 59 pass.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Good structural addition with solid unit tests and prior JSON-output concerns addressed, but two blocking issues remain. First, the PreToolUse timeout is set to 20s while the hook can legitimately spend up to ~40s+ across sequential gh/git calls, risking premature termination and a silent allow. Second, PR resolution assumes a task/ branch naming convention; if branches differ, the guard exits early without warning, skipping enforcement. Please raise the hook timeout (or tighten per-call budgets) and make PR number derivation robust to alternate branch names (e.g., use current HEAD or search). Once addressed, this should be ready to merge.
Findings
- [BLOCKING] .claude/settings.json:19 — Hook timeout (20s) is lower than cumulative gh/git timeouts (10s + 15s + 15s), risking premature kill and silent allow
The PreToolUse entry for mcp__minsky__session_pr_merge sets timeout: 20 (line 19). However, the hook performs up to three sequential subprocesses: (1)gh pr listwith default execWithPath timeout 10s (types.ts: execWithPath defaults to 10000ms), (2)gh api .../pulls/.../fileswith timeout 15000ms (require-execution-evidence-before-merge.ts: makeProdPrDeps > fetchPrFiles), and (3)gh api .../pulls/...with timeout 15000ms (makeProdPrDeps > fetchPrMeta). In degraded conditions this can exceed 20s, causing the hook process to be terminated by the runner before it emits a single JSON output. Since the hook is fail-open (and termination yields no deny), this can silently allow merges that should be blocked. Fix by either increasing the settings.json timeout (e.g., 45–60s) or tightening per-call timeouts and short-circuiting earlier to keep worst-case under the hook budget. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:325 — Deriving PR number assumes branch naming
task/<taskId>; mismatch will skip enforcement without warning
At lines 325–343, the hook constructsbranch =task/${task.replace("#","-")}and then queriesgh pr list --head ... --jq '.[0].number'. This silently relies on the PR’s head ref exactly matching that pattern. Any repository using different branch conventions (e.g.,feature/mt-1459, forks withusername:branch, or GitHub UI-created branches) will fail to resolve a PR number (parseInt('')=> NaN) and the process exits(0) without emitting any warning or additionalContext, effectively disabling the guard. Given the stated goal is to enforce on mcp__minsky__session_pr_merge regardless of contributor branch naming, this is a silent behavior gap. Suggested fix: resolve PR by commit SHA (gh pr view --json) or by searching PRs with--searchfor the task token in title/body, or derive current branch viagit rev-parse --abbrev-ref HEADand use--head` .
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Requesting changes: while the feature is well-scoped with strong unit coverage, there are structural issues that create bypass and timeout risks. The PreToolUse timeout (20s) is lower than the cumulative gh/git timeouts in the hook, making premature termination likely; PR resolution relies on a brittle branch-name convention and exits silently when not found; and PR file fetch failures return an empty list without surfacing a warning, allowing merges when the check couldn’t run. Please raise or reconcile timeouts, make PR identification robust (and warn on failure), and propagate warnings for file-fetch errors. Non-blocking nits: broaden the negation guard and centralize remote-URL parsing to avoid duplication.
Findings
- [BLOCKING] .claude/settings.json:19 — PreToolUse timeout (20s) is lower than cumulative gh/git timeouts in the hook, risking premature kill and silent allow
The PreToolUse entry for mcp__minsky__session_pr_merge sets "timeout": 20 (see .claude/settings.json:19). In require-execution-evidence-before-merge.ts, the sequential subprocess calls can cumulatively exceed 20s under normal latency: deriveRepoFromGit uses a 5s timeout, the PR resolution step (gh pr list) uses the execWithPath default of 10s, and each of fetchPrFiles and fetchPrMeta uses 15s (see makeProdPrDeps). In a degraded network or paginated PR, this sequence can exceed 20s, causing the host to SIGTERM the hook mid-flight. Because the hook is designed fail-open only when it can emit a warning, a hard timeout will bypass enforcement without a structured message. Raise the hook timeout (or proportionally lower per-call budgets and ensure total fits under the configured ceiling) to prevent premature termination. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:290 — Brittle PR-number resolution assumes task/ branch naming and silently skips enforcement when not found
At lines ~290–319, the hook constructsconst branch =task/${task.replace("#","-")}and usesgh pr list --head --jq '.[0].number'to infer the PR number. If the active PR's branch does not follow this exact naming convention (e.g., a different prefix, or a teammate/fork uses a different branch name),gh pr listwill return no match and the codeif (!prNumber || isNaN(prNumber)) process.exit(0);exits early without any warning or audit trail. Additionally, for fork-based PRs,gh pr list --headtypically requiresowner:branch; using onlyagainst--repo owner/repooften yields no results, again causing a silent allow. This creates a reliable bypass path where the guard never runs. Please (a) add robust PR identification (e.g., resolve current HEAD’s associated PR viagh pr view --json number` or search by SHA), (b) support fork PR heads (owner:branch), and (c) emit a warning when PR resolution fails instead of exiting silently. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:142 — Silent allow when PR files fetch fails — returns [] without surfacing a warning to the top-level
In makeProdPrDeps.fetchPrFiles (around lines 142–180), non-zero exit or JSON parse errors return an empty array. The top-level enforcement then computesnewTestFilesfrom this empty list and may allow merge even when the API call failed. Unlike the PR metadata path (which fails open with a user-visible warning via writeOutput), file-fetch failures have no warning or audit. This creates a stealth bypass ifgh apiis unavailable or times out. Please propagate a warning (e.g., by returning a tuple with warnings or emitting a writeOutput additionalContext) and consider failing open with an explicit warning instead of silently returning [].
Inline comments
- .claude/hooks/require-execution-evidence-before-merge.ts:307 — Consider switching to
gh pr view --json number --jq .numberwithout--headto resolve the current PR from the checked-out branch, or usegh pr list --search sha:<HEAD>to avoid relying on an exacttask/<id>head name. Also, fork PRs requireowner:branchfor--headmatching.
…, fileFetch warning BLOCKING #1: raise require-execution-evidence hook timeout from 20s to 60s in .claude/settings.json to cover worst-case sequential subprocess budget (deriveRepoFromGit 5s + gh pr list 10s + fetchPrFiles 15s + fetchPrMeta 15s = 45s). BLOCKING #2: replace brittle branch-name PR resolution with a two-path resolvePrNumber() helper: primary path uses (current branch, handles forks and non-standard names); fallback uses 921 fix(mt#1486): mt#1486: widen prior-review parsers for bare [BLOCKING] markers task/mt-1486 OPEN 2026-04-30T22:04:07Z 920 feat(mt#1465): mt#1465: severity-monotonicity prototype + A/B replay measurement task/mt-1465 OPEN 2026-04-30T21:12:37Z 914 feat(mt#1071): Add attention accounting module, attention.report command, and --show-attention flag task/mt-1071 OPEN 2026-04-30T19:03:03Z 909 feat(mt#1459): Add PreToolUse hook to require execution evidence before merging PRs with new tests task/mt-1459 OPEN 2026-04-30T18:06:16Z 908 feat(mt#1460): Add execution evidence requirement for new test files to prepare-pr skill task/mt-1460 OPEN 2026-04-30T17:59:59Z 907 feat(mt#1348): Auto-apply review-state labels to PRs on review submission task/mt-1348 OPEN 2026-04-30T09:19:25Z 906 feat(mt#1341): Add suggestion field to ReviewComment for GitHub suggestion blocks task/mt-1341 OPEN 2026-04-30T09:13:52Z 900 feat(mt#1467): Wire mergeSessionPr to emit quality.review Ask before each merge task/mt-1467 OPEN 2026-04-30T07:31:42Z 898 feat(mt#1342): Add GitHub PR review thread resolve/unresolve via GraphQL task/mt-1342 OPEN 2026-04-30T07:16:28Z 896 feat(mt#1343): reviewThreads field in session_pr_review_context task/mt-1343 OPEN 2026-04-30T07:11:30Z 889 feat(mt#1346): GitHub Check Run annotations as parallel review surface task/mt-1346 OPEN 2026-04-30T07:02:20Z 887 feat(mt#1347): Add pre-flight diff anchor validation for PR review comments task/mt-1347 OPEN 2026-04-30T06:56:49Z 886 feat(mt#1340): Update reviewer prompt to emit line-anchored comments[] from parsedDiff task/mt-1340 OPEN 2026-04-30T06:51:22Z 882 feat(mt#1437): Typed TS env-var synthesizer for Railway services (minsky-mcp) task/mt-1437 OPEN 2026-04-28T22:03:56Z 872 feat(mt#1309): Merge-gate denies PRs with zero CI check_runs (PR #763 webhook-miss regression detection) task/mt-1309 OPEN 2026-04-28T07:36:22Z 858 fix(mt#1384): write canonical github-pr ref so reconciler can route Asks task/mt-1384 OPEN 2026-04-27T23:13:27Z 855 docs(mt#1311): Document merge-flow operational reality in /review-pr and /orchestrate skills task/mt-1311 OPEN 2026-04-27T21:00:05Z 849 feat(mt#1306): Reviewer convergence metrics: durable Postgres persistence task/mt-1306 OPEN 2026-04-27T19:25:19Z 809 feat(mt#1255): replace console.* with structured winston logger in reviewer service task/mt-1255 OPEN 2026-04-26T16:57:53Z 781 fix(mt#1274): [SUPERSEDED by mt#1281] propagate sessionId to internal session_update task/mt-1274 OPEN 2026-04-25T19:11:02Z 777 fix(mt#1261): [SUPERSEDED by mt#1281] session_pr_create sessionId-shape fix task/mt-1261 OPEN 2026-04-25T19:01:41Z 774 test(mt#1263): Add runReview sanitize wiring integration tests task/mt-1263 OPEN 2026-04-24T18:54:29Z 710 feat(mt#1028): KB activity source — GitHub Issues/PRs provider task/mt-1028 OPEN 2026-04-22T23:17:34Z 703 docs(mt#1109): E2E smoke artifact — add v1-deployed status to reviewer README task/mt-1109 OPEN 2026-04-22T22:07:14Z 684 feat(mt#1066): PostToolUse hook enforcing /review-pr after session_pr_create task/mt-1066 OPEN 2026-04-22T17:45:49Z 682 docs(mt#1050): surface attention-allocation frame in README task/mt-1050 OPEN 2026-04-22T17:44:30Z 538 chore(deps-dev): bump ts-morph from 26.0.0 to 28.0.0 dependabot/npm_and_yarn/ts-morph-28.0.0 OPEN 2026-04-20T00:32:48Z 537 chore(deps-dev): bump @types/node from 22.19.17 to 25.6.0 dependabot/npm_and_yarn/types/node-25.6.0 OPEN 2026-04-20T00:32:43Z 536 chore(deps): bump ai from 4.3.19 to 6.0.168 dependabot/npm_and_yarn/ai-6.0.168 OPEN 2026-04-20T00:32:34Z 535 chore(deps): bump @ai-sdk/openai from 1.3.24 to 3.0.53 dependabot/npm_and_yarn/ai-sdk/openai-3.0.53 OPEN 2026-04-20T00:32:16Z. Both failing now emits a writeOutput warning instead of silently calling process.exit(0). Added ExecFn injectable type for unit testing. BLOCKING #3: change fetchPrFiles return type from PrFile[] to FetchPrFilesResult { files, warning? } so API errors are surfaced in the audit trail via additionalContext rather than silently allowing merge. Warning is propagated through topLevelWarnings and merged with check-level warnings in the final writeOutput call. Tests: 8 new tests added (resolvePrNumber: 6, fetchPrFiles warning shape: 3); all 68 tests pass.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Solid structural guard with extensive unit tests and thoughtful fail-open posture, but several bypass and integration risks remain. The hook ignores renamed/copied files, allowing introduction of new *.test.ts via rename without evidence; the evidence parser can be spoofed via HTML comments; and the entry point exits early if task is absent instead of attempting PR resolution via gh pr view. Additionally, mixed stdout logging across hooks risks invalid JSON output for PreToolUse consumers. Please address these blocking issues before merge by expanding file status handling (including previous_filename), hardening evidence detection, making task handling robust (or independent), and standardizing JSON-only stdout for hooks.
Findings
- [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:78 — New test introduction via rename/copy is not detected — only status:"added" is considered
findNewTestFiles filters exclusively onstatus === "added"andisTestFile(filename), ignoring files withstatus: "renamed"or"copied". This creates a bypass: a PR can introduce a new*.test.tsor*.spec.tsby renaming/copying an existing non-test file, and the hook will treat it as "no new test files" and allow merge without evidence. Evidence: - At .claude/hooks/require-execution-evidence-before-merge.ts: the
PrFiletype includes multiple statuses, butfindNewTestFiles(around this line) filters strictly onstatus === "added". - At .claude/hooks/require-execution-evidence-before-merge.ts: in
makeProdPrDeps.fetchPrFiles, the--jqextracts only{filename, status}from the GitHub API, droppingprevious_filenamewhich would be needed to reason about renames into test-file patterns.
Remediation: Treatrenamed/copiedas "new test files" when the new filename matches a test pattern and (optionally) the previous filename did not. Update the jq to extract.previous_filenameand adjust detection accordingly. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:149 — Execution-evidence detection can be trivially spoofed by placing the marker inside an HTML comment
hasExecutionEvidence only examines visible lines and requires a line-start "Execution evidence:" marker, but it does not exclude content inside HTML comments. A PR body like:
will be treated as valid evidence due to the line-start match, even though it's invisible and clearly not an actual run. Evidence: regex at .claude/hooks/require-execution-evidence-before-merge.ts:149 matches any line beginning with optional # then execution evidence:; there is no guard for HTML comment blocks. Tests also lack this negative case.
Remediation: Before scanning, strip HTML comments (/<!--[\s\S]*?-->/g) or explicitly reject matches within comment blocks. Add unit tests to cover comment-wrapped markers and ensure they return false.
- [BLOCKING] .claude/hooks/types.ts:14 — Hook output shape may be incompatible: missing top-level event fields and inconsistent use across hooks
The writeOutput function serializes only a{ hookSpecificOutput: { ... } }object. However, other hooks in this repo (e.g., parallel-work-guard.ts) sometimes also write plain text to stdout prior to writeOutput, and the consumer contract for PreToolUse may expect a single JSON object with specific top-level fields (e.g., decision, reason) rather than arbitrary logs plus JSON. While you aggregated warnings into additionalContext in the new hook, parallel-work-guard still writes multiple log lines (see .claude/hooks/parallel-work-guard.ts lines ~196, 224, 258, 289) before its JSON, potentially producing mixed output. Since this PR centralizes execWithPath but leaves mixed-output behavior, the receiving system may fail JSON parsing or drop the decision.
Remediation: Ensure all PreToolUse hooks avoid printing non-JSON to stdout. Route warnings through writeOutput.additionalContext and remove direct process.stdout.write calls. Alternatively, write logs to stderr. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:399 — Hook silently skips enforcement when
tool_input.taskis missing instead of attempting robust PR resolution
At .claude/hooks/require-execution-evidence-before-merge.ts:399–404, the hook readsconst task = (input.tool_input.task as string | undefined) ?? ""; if (!task) process.exit(0);and exits immediately. This creates a bypass: if the invoking tool omitstask(or usestaskId, which this hook does not read), enforcement is skipped entirely, even thoughresolvePrNumber's primary path (gh pr view --json number) does not depend ontaskand could still resolve the PR. In contrast, parallel-work-guard supports bothtaskandtaskIdaliases. Remediation: accepttaskIdas a fallback, and if neither is present, still attempt thegh pr viewprimary resolution path (only use the branch-name fallback whentaskis available). Emit a warning rather than exiting silently when task context is absent.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Solid structural addition with thorough unit tests and improved robustness (timeouts, PR resolution, pagination, and single-output discipline). However, three blocking issues remain. First, the hook is only registered for mcp__minsky__session_pr_merge; merges via the GitHub MCP path (mcp__github__merge_pull_request) appear unguarded, creating a bypass. Second, hasExecutionEvidence’s regex admits malformed headings (e.g., '##Notes execution evidence:') due to allowing no-space after heading markers; tighten to require a space or pure label. Third, deriving the repo from origin targets the contributor’s fork in fork-based workflows, causing gh API calls to miss the upstream PR and fail-open, bypassing enforcement. Address these to close bypass vectors; a minor non-blocking note flags pagination parsing edge cases.
Findings
- [BLOCKING] .claude/settings.json:7 — Policy gap: hook only enforces on mcp__minsky__session_pr_merge; merges via mcp__github__merge_pull_request bypass enforcement
The new PreToolUse hook is registered only under the matcher "mcp__minsky__session_pr_merge" (lines 6–17). However, the repository also supports merges via the GitHub MCP path (see PostToolUse matcher entry for "mcp__github__merge_pull_request" at lines 54–66). There is no corresponding PreToolUse enforcement for "mcp__github__merge_pull_request", which means a PR that adds new tests could be merged via the GitHub MCP tool without triggering the execution-evidence guard. This creates a silent bypass of the rule the PR intends to enforce. Please register the same hook (or an equivalent guard) under a PreToolUse matcher that covers the GitHub MCP merge path as well, or otherwise document and disable that alternate path. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:129 — Regex allows mid-sentence "execution evidence:" matches despite spec claiming line-start anchor
In hasExecutionEvidence (lines ~113–160), the headingPattern is defined as /^(#{0,6}\s*)(execution evidence:\s*)(.)$/im. This matches any line where the literal text "execution evidence:" appears, even if preceded by arbitrary characters other than '#'/whitespace? Actually the first capture restricts only leading '#' and whitespace, but the regex anchors at start-of-line and then requires either 0-6 '#' and spaces followed immediately by "execution evidence:". However, your tests claim to reject mid-sentence occurrences, yet the implementation also accepts inline content case (match[3]) and counts it as evidence. More importantly, the negative guard checks "beforeMarker" by slicing up to indexOf("execution") on the entire line; if the phrase appears mid-line preceded by non-# characters (e.g., "Note: execution evidence: ..."), the headingPattern WILL NOT match, but lines like "##Notes execution evidence:" (no space between heading and text) would match due to #{0,6}\s permitting no space after '#'. This allows malformed headings like "##Notes execution evidence:" to count as evidence. To enforce a true line-start anchor with a proper heading or label, require either: (a) optional heading markers with at least one space before the label (e.g., /^#{1,6}\s+execution evidence:/i), or (b) no heading markers and no preceding content (^execution evidence:). As written, variants like "#execution evidence:" (no space) or "####execution evidence:" could be incorrectly accepted. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:286 — Fork workflow bypass: repo derived from origin causes all API calls to target the fork, failing open for upstream PRs
deriveRepoFromGit (lines 262–272) reads theoriginremote and parseGitHubRemoteUrl to produceowner/repo. All subsequent PR resolution and API calls use this slug (e.g., resolvePrNumber passes--repo ${repo}at lines 315–327; fetchPrFiles and fetchPrMeta callgh api repos/${repo}/pulls/...at lines 172–217). In a common fork-based workflow,originpoints to the contributor's fork while the PR is opened against the upstream repository. In that case,gh pr view --repo <fork>andgh api repos/<fork>/pulls/<n>will 404 or return empty, causing the hook to emit a warning and fail-open — silently allowing merges without enforcement. This creates a systematic bypass for fork-origin branches. To fix: detect and target the upstream repository that actually owns the PR (e.g., querygh repo view --json parentto prefer.parent.nameWithOwnerwhen present; or resolve the PR viagh pr viewwithout--repoto infer the correct base repo, then use that for subsequent API calls). - [NON-BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:201 — Pagination parsing path mismatch: use of --paginate with --jq should already flatten; fallback line-splitting may mask errors
fetchPrFiles invokesgh api ... --paginate --jq "[.[] | {filename: .filename, status: .status}]"(lines 182–197). With--paginate,ghtypically concatenates the JSON arrays for each page back-to-back without newlines; with--jq, the result is already flattened per page, but concatenation can yield an invalid single JSON document. Your code attempts a direct JSON.parse first, then splits by newline and parses each line as an array (lines 198–221). Ifghconcatenates arrays without newlines, the second path won't help and you'll return an empty list without warning, silently allowing. Consider using--paginate --templateor accumulating via--jq '.[]'and then wrapping in[], or usegh apiwithout--jqand parse paginated pages by invoking page-by-page. At minimum, log a warning when both JSON.parse and line-by-line fail but stdout is non-empty.
…marker, HTML comment strip, rename/copy detection Substantive #1 — hasBypassPrefix now detects [unverified-tests] anywhere in the title (was startsWith). The sibling skill mt#1460 tells agents to put the marker at the start of their input title, but prepare-pr composes the visible title as feat(mt#X): input title, placing the marker mid-string. Updated function docstring and renamed the conceptual role (marker, not prefix). Added tests for: tag at start, tag mid-title after conventional-commit prefix, tag at end, no tag, bare word without brackets does NOT match. Updated the checkExecutionEvidence bypass-prefix test that previously asserted mid-title does NOT bypass (it was testing the old startsWith behavior). Replaced with two tests: one confirming mid-title fires, one confirming bare-word-without-brackets does not. Substantive #2 — hasExecutionEvidence now strips HTML comments via body.replace before scanning. A marker inside HTML comments is invisible in rendered Markdown and must not count as evidence. Added tests for: marker inside single-line comment returns false; multi-line comment returns false; real marker outside comment returns true; mixed (commented + real) returns true. Substantive #3 — findNewTestFiles now counts renamed/copied files as new test files when the new filename matches a test pattern AND the previous_filename did not match a test pattern. Updated PrFile type to include optional previous_filename. Updated fetchPrFiles jq projection to include previous_filename from the GitHub API. Added tests for: rename non-test to test counts; rename test to test does not count (just relocated); copy non-test to test counts; rename non-test to non-test ignored; rename with no previous_filename conservatively included. Intentionally NOT fixed (bypass-merged per task instructions): - mcp__github__merge_pull_request matcher (already blocked by separate hook) - Fork workflow / upstream parent detection (out of scope) - Round-5 regex tightening for Notes execution evidence (self-reversal) - parallel-work-guard.ts mixed stdout (separate file, not in mt#1459 scope) - task/taskId alias fallback (gh pr view primary path handles it) - Pagination alternative --template vs --paginate (current impl sufficient)
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Strong structural guard with thorough tests, but several high-impact bypass vectors and an auditability gap remain. Deriving the repo from the local origin breaks fork-based PRs and causes a fail-open path; the evidence matcher still accepts malformed Markdown headings without a space after hashes; the hook is only wired for the Minsky merge path at PreToolUse, leaving the GitHub MCP merge path unguarded if the separate ban is disabled; and the entry point silently exits when task is missing without emitting a warning. Please address these issues to ensure consistent enforcement and fail-open-with-warning behavior before merging.
Findings
- [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:424 — Fork-based PRs bypass enforcement: repo slug derived from origin targets the fork, causing gh API calls to miss the upstream PR and fail-open
At import.meta.main, the hook derives the repo slug from the local git origin (const repo = deriveRepoFromGit(input.cwd);). In fork workflows,originpoints to the contributor's fork, while the open PR lives on the upstream base repo. Subsequent calls explicitly pass--repo ${repo}togh pr viewinresolvePrNumberand togh api repos/${repo}/pulls/${prNumber}.... Whenrepois the fork,gh pr view --repo <fork>typically returns no PR andgh api repos/<fork>/pulls/<#>will 404. Your logic then emits a warning and fails-open, allowing merges without evidence — a silent bypass vector. Evidence in file: - line ~424:
const repo = deriveRepoFromGit(input.cwd); - line ~362:
gh pr view --repo repo ... - lines ~219–260:
gh api repos/${repo}/pulls/${prNumber}...
Fix: resolve the PR first without forcing a repo (let gh infer from the current branch), or query gh pr view --json baseRepository to obtain the base repo slug and use that for subsequent gh api calls. Alternatively, detect an upstream remote and default to it when present. Do not fail-open due to fork/upstream mismatch.
- [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:154 — Evidence regex admits malformed headings without a space after Markdown hashes (e.g., '##Execution evidence:')
ThehasExecutionEvidencematcher usesheadingPattern = /^(#{0,6}\s*)(execution evidence:\s*)(.*)$/im. Because the first group allows#{0,6}followed by\s*, it also matches lines with no space after the heading markers (e.g., '##Execution evidence:'), and even zero hashes (plain text), which can produce false positives like '##Notes execution evidence:' when the token follows immediately after '#' characters or other text. A prior review explicitly called for requiring a space after heading markers to avoid matches like '##Notes execution evidence:'. Tighten the pattern to require either: (a) start-of-line with optional heading markers followed by at least one space, then the label; or (b) the label at line start with no hashes. For example:/^(?:#{1,6}\s+execution evidence:|execution evidence:)\s*(.*)$/iand ensure the '## Notes' case does not match. - [BLOCKING] .claude/settings.json:1 — Hook only registered for mcp__minsky__session_pr_merge PreToolUse; merges via mcp__github__merge_pull_request path are unguarded at PreToolUse
Settings register the new hook under PreToolUse for matcher "mcp__minsky__session_pr_merge" only (lines 6–20). There is a PostToolUse hook for "mcp__minsky__session_pr_merge|mcp__github__merge_pull_request" (lines 86–103), and a PreToolUse block banning GitHub MCP PR-write tools (lines 45–61), but that separate hook only denies GitHub MCP write attempts and doesn't ensure execution evidence on merges routed via GitHub MCP if that denial hook is disabled or bypassed. To close the bypass, the execution-evidence hook should also be attached to a PreToolUse matcher that includesmcp__github__merge_pull_request(or rely on the denial hook's presence by explicitly asserting dependency). As-is, a configuration drift that removes or disables the ban would re-open the GitHub MCP merge path without this guard. Register this hook for both merge tool names or make the ban hook a hard dependency with tests in settings. - [BLOCKING] .claude/hooks/require-execution-evidence-before-merge.ts:338 — Silent bypass when
taskis absent: hook exits without emitting any audit warning or context
In the entry point, the code readsconst task = (input.tool_input.task as string | undefined) ?? ""; if (!task) process.exit(0);(lines ~336–340). This creates a silent allow path when the tool input lackstask, with nowriteOutputwarning. That makes enforcement dependent on the calling tool always supplyingtaskand provides no audit trail when it doesn't, contradicting the "fail-open with a warning" posture described in the PR. Emit a structuredadditionalContextwarning instead of exiting silently so operators can detect misconfigurations and bypass attempts.
Summary
Implements mt#1459: a PreToolUse hook on
mcp__minsky__session_pr_mergethat blocks merge when a PR adds new test files but the PR body lacks anExecution evidence:block. This is the structural enforcement for thefeedback_behavior_detecting_artifacts_need_execution_evidencerule, which failed 4-for-4 at the memory tier.Motivation & Context
The mt#1205 workstream (2026-04-28) violated the rule three times in 8 hours, shipping three PRs where new tests were never run. The first live run found three real bugs in 326ms, including a production bug in
isPgPoolExhaustionErrorthat silently no-op'dwithPgPoolRetryfor ~3 days. Per the 2026-04-28 retrospective and the mitigation-tier inversion principle, repeated failures at memory tier mandate escalation to hook tier.Design/Approach
The hook exports pure testable functions (
isTestFile,findNewTestFiles,hasExecutionEvidence,hasBypassPrefix,checkExecutionEvidence) and wraps the entry point inimport.meta.mainso the file can be imported for testing without executing. This follows the same pattern asparallel-work-guard.ts.Two escape hatches are supported:
[unverified-tests]marker anywhere in the title — allows merge with a warning, for use when tests cannot be run yet (e.g., infrastructure not deployed) and a follow-up verification task is filedExecution evidence:heading in PR body — paste from an actual test runThe hook is fail-open: if PR metadata cannot be fetched via
gh api, it allows merge with a warning rather than blocking silently.Key Changes
.claude/hooks/require-execution-evidence-before-merge.ts— new PreToolUse hook.claude/hooks/require-execution-evidence-before-merge.test.ts— 81 tests covering all four acceptance tests from the spec plus negative cases.claude/hooks/types.ts—execWithPathhelper centralized here (NON-BLOCKING chore(deps-dev): bump eslint from 8.57.1 to 9.27.0 #5).claude/hooks/parallel-work-guard.ts— updated to importexecWithPathfromtypes.ts.claude/settings.json— hook registered in the existingmcp__minsky__session_pr_mergePreToolUse matcher entryRound 1 fixes (PR #909 review)
writeOutputcalls in a loop replaced with a single aggregated call. Warnings are joined intoadditionalContext; deny path prepends warnings topermissionDecisionReason. Only one JSON object is written to stdout per code path.edobry/minskyreplaced withderiveRepoFromGit+parseGitHubRemoteUrl. Hook fails-open with a warning if the git remote can't be parsed.fetchPrFilesnow uses--paginateso PRs with more than 30 files are not silently truncated. Multi-page output (multiple JSON arrays on stdout) is flattened via line-by-line parse.hasExecutionEvidencetightened to require line-start anchor (prevents mid-sentence matches), negation guard (\bno\bbefore marker), and non-empty content after the heading. Tests added for: negation phrases, empty body after heading, whitespace-only body after heading, mid-sentence occurrence.execWithPathcentralized intypes.ts; both hooks import from there.Round 2 fixes (PR #909 round 2 review)
.claude/settings.jsonfrom 20s to 60s. Worst-case sequential subprocess budget isderiveRepoFromGit(5s) +gh pr list(10s) +fetchPrFiles(15s) +fetchPrMeta(15s) = 45s; 60s gives margin for degraded conditions or paginated PRs. No other hooks' timeouts changed.task/<id>branch-name PR resolution with a two-pathresolvePrNumber()helper exported for testing. Primary path:gh pr view --json number --jq .number(current branch, handles forks and non-standard names naturally). Fallback:gh pr list --repo <repo> --head task/<id>. If both fail, emits awriteOutputwarning (audit trail) and fails-open — does NOT silentlyprocess.exit(0). AddedExecFninjectable type so the helper is unit-testable without module mocking.fetchPrFilesreturn type fromPrFile[]toFetchPrFilesResult { files: PrFile[], warning?: string }. API errors now produce a warning string instead of silently returning[]. The warning is collected intopLevelWarningsand merged intoadditionalContext(allowed path) or prepended topermissionDecisionReason(blocked path), ensuring every API failure is auditable.Round 5 fixes (PR #909 round 5 review — substantive findings only)
hasBypassPrefix(now semantically a marker detector, not a prefix detector) changed fromstartsWithtoincludes-style regex. The sibling skill mt#1460 tells agents to put[unverified-tests]at the start of their input title, butprepare-prcomposes the visible PR title asfeat(mt#X): <input title>, placing the marker mid-string. The oldstartsWithcheck silently missed this common case. Bracket delimiters provide natural boundary semantics — bareunverified-testswithout brackets does not match. Added tests for: tag at start, tag mid-title after conventional-commit prefix, tag at end, no tag, bare word without brackets does not match.hasExecutionEvidencenow strips HTML comments viabody.replace(/<!--[\s\S]*?-->/g, "")before scanning. A marker inside<!-- ... -->is invisible in rendered Markdown and must not count as evidence. Added tests for: marker inside single-line comment returns false; multi-line comment returns false; real marker outside comment returns true; mixed (commented + real) returns true.findNewTestFilesnow countsstatus === "renamed"orstatus === "copied"entries as new test files when the new filename matches a test pattern AND theprevious_filenamedid not match a test pattern. UpdatedPrFiletype with optionalprevious_filename. UpdatedfetchPrFiles--jqprojection to includeprevious_filenamefrom the GitHub API. Added tests for: rename non-test to test counts; rename test to test does not count (just relocated); copy non-test to test counts; rename non-test to non-test ignored; rename with noprevious_filenameconservatively included.Non-fixes (intentionally out of scope per task instructions):
mcp__github__merge_pull_requestmatcher, fork workflow detection, round-5 regex tightening for##Notes execution evidence:(self-reversal),parallel-work-guard.tsmixed stdout,task/taskIdalias fallback, pagination alternative.Testing
81 unit tests cover:
isTestFile: pattern matching for*.test.ts,*.integration.test.ts,*.spec.ts; negative casesfindNewTestFiles:status: "added"test files; renamed/copied non-test→test; renamed test→test excluded; modified and deleted ignoredhasExecutionEvidence: case-insensitive detection; HTML comment stripping; positive and negative cases including negation phrases, empty/whitespace-only body after heading, mid-sentence occurrencehasBypassPrefix:[unverified-tests]marker detection anywhere in title, case-insensitive; bracket-delimited match required; bare word without brackets does not matchcheckExecutionEvidence: all four spec acceptance tests (AT1–AT4) plus edge cases; mid-title bypass marker fires; bare-word does not fireparseGitHubRemoteUrl: SCP-style SSH, HTTPS, URL-style SSH, embedded tokens, trailing newline, null casesresolvePrNumber: primary path (gh pr view), fallback path (gh pr list), both fail → warning emitted, task-derived branch used in fallbackFetchPrFilesResultshape: no-warning case, warning case, downstreamcheckExecutionEvidencebehavior with empty filesExecution evidence:
Dogfooding note
This PR adds a new test file (
require-execution-evidence-before-merge.test.ts). The hook itself enforces the rule it implements: theExecution evidence:block above is the required evidence. The hook was not yet deployed when this PR was created (it activates after merge), so this is simultaneously the specification of the rule and the first instance of compliance with it.