feat(mt#1551): Fold audit into /review-pr; retire /verify-task as a verification surface#956
Conversation
…erification surface
Implements the option B architectural restructure locked on 2026-05-01:
the reviewer subagent owns spec verification + adoption sweep + smoke
test at review time, and /verify-task is reduced to a thin closeout
wrapper for the bypass-merge fallback path.
Background: /verify-task previously dispatched the auditor subagent
post-merge to re-verify spec criteria. This duplicated the work the
reviewer subagent already does at review time and introduced today
stale-local-main false-FAIL bug on mt#1485. The session_pr_merge tool
already auto-sets DONE on successful merge, so /verify-task only ever
fires on the bypass-merge path where session_pr_merge was not used.
Changes:
1. .claude/agents/reviewer.md
- Mode 2 step 5 expanded into 5a (Spec verification), 5b (Adoption
sweep with the 10-export cost-bounding rule), 5c (Smoke test).
- Reviewer subagent is now the canonical verification surface.
2. .claude/skills/review-pr/SKILL.md
- Step 6 expanded with substeps 6.1 (Spec verification), 6.2
(Adoption sweep), 6.3 (Smoke test).
- Step 9 review body format updated: Smoke pass/fail line in CI
status, Adoption sweep table sub-section in spec verification.
3. .claude/skills/verify-task/SKILL.md
- Full rewrite as a thin closeout wrapper.
- No auditor dispatch. No spec re-verification.
- New protocol: read PR via mcp__github__pull_request_read, confirm
pr.merged === true AND merge-commit body contains the canonical
bypass-merge audit-trail signature, then set DONE.
- Halts with surfacing if either condition fails.
4. .claude/agents/auditor.md
- Description narrowed to ad-hoc spec verification.
- New Source freshness preamble: when post-merge audit, prefer
mcp__github__get_file_contents with ref set to merge SHA over
local file paths. Cross-check against merge commit before
trusting any local read.
- New anti-pattern: never report FAIL based on a stale-local-main
read.
5. .minsky/rules/task-lifecycle.mdc + compile outputs
- Added Verification surfaces section documenting the reviewer
subagent ownership, the /verify-task new role, and the
concurrent-merge follow-up (mt#1592).
- Regenerated AGENTS.md, CLAUDE.md, .cursor/rules/task-lifecycle.mdc
via bun src/cli.ts rules compile.
Acceptance tests (all met against the in-session content):
- reviewer.md Mode 2 step 5b adoption sweep with cost-bounding rule
- SKILL.md step 6.2 + 6.3 + body format updated
- verify-task SKILL.md no auditor dispatch; two-condition closeout
- auditor.md description narrowed; freshness preamble exists
- task-lifecycle docs in CLAUDE.md and AGENTS.md reflect new architecture
- recurrence-test scenario walked in verify-task SKILL.md
- mt#1592 follow-up exists for concurrent-merge regression detection
Memory feedback_stale_local_main_in_adoption_check.md updated:
removed bridge-until-ships framing, replaced with shipped Status
section; the auditor freshness preamble lands in this PR.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
The architectural intent is implemented coherently: reviewer now owns spec verification, adoption sweep, and smoke test; verify-task is reduced to a closeout wrapper; lifecycle/docs updated. However, there’s a concrete break: auditor.md’s new freshness protocol mandates using a GitHub contents tool, but this agent’s tool list doesn’t include it, making the guidance unenforceable. Additionally, the review body’s new “Smoke” line lacks a specified “skipped” representation despite step 6.3 allowing skips, creating format ambiguity. Minor duplication risk exists around the hardcoded bypass-audit phrase across multiple files. Please add the missing tool (or adjust the protocol), define the smoke field’s allowed values including “skipped,” and consider centralizing the audit-trail phrase before merging.
Findings
- [BLOCKING] .claude/agents/auditor.md:8 — Auditor instructs use of mcp__github__get_file_contents but the tool is not declared/available in this agent
In the new "Source freshness preamble" you require usingmcp__github__get_file_contentswith arefto the merge SHA for post-merge audits. However, at the top of this file the declared tools are only:Read, Glob, Grep, Bash, mcp__minsky__tasks_get, mcp__minsky__tasks_spec_get— there is no GitHub contents tool exposed to this agent. As written, the agent cannot follow its own protocol and would fall back to local reads (the exact stale-source path this change aims to retire). Fix by either (a) adding the GitHub contents tool to this agent’s allowed tools list, or (b) adjusting the protocol to use a tool that actually exists for this agent (e.g., route reads through a parent with the tool, or add an explicit dispatch step). - [BLOCKING] .claude/skills/review-pr/SKILL.md:334 — Review body format now requires a "Smoke" line but Step 6.3 allows skipping; missing guidance on representing a skipped smoke in CI status
You added a required "Smoke: <pass/fail —<command run>on PR branch>" line to the review body (around lines 334–338). Step 6.3 states docs/prompt-only PRs may skip the smoke with rationale. The required body format does not define how to represent a skipped smoke (e.g., "skipped — rationale" vs pass/fail), and the CI-status summary above only covers N passed/M failed without integrating smoke. Tighten the format and step 6.3 to specify the exact allowed values for Smoke (pass/fail/skipped with rationale) and how it interacts with CI status to avoid reviewers being blocked by the pre-merge hook expecting a "Smoke:" line when a skip is legitimate. - [NON-BLOCKING] .claude/skills/verify-task/SKILL.md:63 — Canonical audit-trail phrase hardcoded without a single source of truth constant; risk of divergence from reviewer.md and AGENTS.md copies
This skill greps for the literal phrase "Bot self-approval bypass per feedback_self_authored_pr_merge_constraints" (around lines 63–73). The same phrase now appears in reviewer skill (step 8) and in multiple docs (AGENTS.md/CLAUDE.md/rules). Hardcoding the string in multiple places invites drift and false negatives if the wording changes. Consider extracting this phrase into a single constant in a shared rules file (or a memory-backed setting surfaced via MCP) and referencing it across skills/agents, or at least add a cross-file comment anchor so future edits update all copies.
Two BLOCKING findings + one NON-BLOCKING from minsky-reviewer[bot] R1 on commit 2548d1b. Both BLOCKING addressed here; NON-BLOCKING deferred to mt#1599 as a separate cross-file refactor. Substantive fixes: 1. R1 BLOCKING #1: auditor.md instructs use of mcp__github__get_file_contents in the freshness preamble but the agent tools field did not declare it, making the protocol unenforceable. Added mcp__github__get_file_contents to the tools list. The auditor can now actually follow its own protocol. 2. R1 BLOCKING #2: SKILL.md step 9 added a required Smoke line, but step 6.3 allowed skipping for docs/prompt-only PRs without specifying how to represent a skipped smoke. Added an explicit three-value enumeration (pass / fail / skipped) with format requirements in both step 6.3 and step 9. The pre-merge hook can now parse the line unambiguously. Intentionally deferred to mt#1599: - R1 NON-BLOCKING #3: canonical bypass-merge audit-trail phrase hardcoded across 5+ files (verify-task SKILL.md grep, review-pr SKILL.md, reviewer.md, task-lifecycle compile outputs, memory). Real drift risk but cross-file refactor + drift-detection mechanism is its own scope. Filed mt#1599 with full context. CI passed 2/2 on the prior commit; no logic changed in this commit (only prompt/skill text and one tools-field entry).
Both BLOCKING findings addressed in commit 64aab6c. R1 BLOCKING #1 (auditor.md missing mcp__github__get_file_contents in tools field): added the tool to the agent declaration. R1 BLOCKING #2 (review body Smoke line ambiguity for skipped case): added explicit three-value enumeration (pass / fail / skipped) with format requirements in step 6.3 and step 9. R1 NON-BLOCKING #3 (canonical audit phrase hardcoded across files): deferred to mt#1599 as a separate cross-file refactor. This review is stale on commit 2548d1b; the new HEAD 64aab6c supersedes it.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
The architectural shift is coherent: spec verification, adoption sweep, and smoke are now owned by the reviewer; verify-task is a thin closeout; and the auditor includes a solid source-freshness protocol. However, the reviewer’s new adoption sweep lacks a safe source/freshness mechanism or GitHub-contents tool, so grep likely targets a potentially stale local workspace, reintroducing the staleness risk this PR aims to retire. Also, the canonical bypass-merge audit phrase is hardcoded in multiple places, inviting drift (non-blocking). Please add a source-freshness protocol and provide the reviewer with a tool to read at the PR ref (or constrain the sweep accordingly). Once addressed, this should be ready to merge.
Findings
- [BLOCKING] .claude/agents/reviewer.md:110 — Adoption sweep (step 5b) assumes access to “post-PR codebase” but reviewer lacks a safe source/read tool for PR-branch-wide grep; high risk of stale-local-main reads
In step 5b you instruct the reviewer to “sweep the post-PR codebase” by grepping acrosssrc/,tests/, docs,CLAUDE.md, andAGENTS.mdfor consumers of new exports. However, therevieweragent’s declared tools are onlyRead, Glob, Grep, Bash, mcp__minsky__session_pr_review_context, mcp__minsky__session_pr_review_submit, mcp__minsky__tasks_spec_get— there is no GitHub-contents tool or session checkout tool exposed to guarantee the grep runs against the PR branch rather than a potentially stale localmain(the same staleness surface called out in mt#1485). Unlike the updatedauditor(which now hasmcp__github__get_file_contentsand a Source Freshness preamble), the reviewer has no freshness protocol and no means to fetch whole-repo contents at the PR ref.
As written, the reviewer is likely to run Grep/Glob against whatever is in the local workspace, which may be hours/days behind origin and will not reflect the PR branch’s full post-change state. That undermines the “canonical verification surface” claim for adoption and reintroduces the stale-source false-FAIL/false-PASS risk you just retired in /verify-task.
Fix options:
- Add a Source Freshness preamble to
.claude/agents/reviewer.mdmirroring the auditor’s, and expose a GitHub-contents tool (e.g.,mcp__github__get_file_contentsor an equivalent list-and-read-at-ref API) to this agent; require adoption-sweep reads atref: <PR head sha>. - Or, adjust the protocol to constrain the sweep to changed files plus consumers resolvable via
parsedDiffand the session’s PR workspace, and/or explicitly dispatch a helper with the needed tool.
Until the reviewer can deterministically read the PR branch’s codebase (not stale local), step 5b is not safely implementable and risks incorrect verification outcomes.
- [NON-BLOCKING] .claude/skills/verify-task/SKILL.md:63 — Hardcoded audit-trail phrase repeated across multiple files without a single source; adds drift risk
This skill greps the merge-commit body for the literal phrase:
Bot self-approval bypass per feedback_self_authored_pr_merge_constraints
Similar literal copies are introduced in .cursor/rules/task-lifecycle.mdc, .minsky/rules/task-lifecycle.mdc, AGENTS.md, CLAUDE.md, and referenced in .claude/skills/review-pr/SKILL.md step 8. Without a single source of truth (constant/memory/rule anchor), wording drift will cause false negatives or inconsistent guidance.
Suggestion: extract this string to a shared rule/memory key (e.g., feedback_self_authored_pr_merge_constraints.audit_phrase) or a central config file (rules/constants.mdc) and reference it across skills/docs. At minimum, add a prominent comment anchor in each location linking to the canonical definition to reduce divergence risk.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Solid architectural realignment that moves verification into the reviewer and simplifies /verify-task, but a few inconsistencies reintroduce staleness and gating ambiguity. Blocking: (1) the auditor still mandates running tests via local Bash without ensuring the execution workspace matches the verified Source ref, risking stale-local false FAILs; (2) the new Smoke: line’s relationship to the CI status N/M counts and the hook parser contract is undefined; (3) /verify-task contains contradictory guidance about proceeding to DONE when the audit-trail phrase is missing versus the stated invariant that both conditions are required. Non-blocking: add a concrete mcp__github__get_file_contents usage example, tighten PR lookup by head filter across open/closed, and add an anchor/constant for the canonical bypass-merge phrase. Address the blocking items before merge.
Findings
- [BLOCKING] .claude/agents/auditor.md:56 — Preamble prefers GitHub reads but baseline still requires running local tests without a freshness-sync protocol
At.claude/agents/auditor.md:56-69you require running the full test suite, typecheck, lint, and an E2E smoke viaBashas "Baseline" checks, while the new "Source freshness preamble" (lines 18-35) instructs preferringmcp__github__get_file_contentsover local reads because the local checkout may be stale. There is no explicit protocol for ensuring the local workspace used byBashactually matches the merge SHA (e.g.,git fetch && git checkout <merge-sha>or running in a clean workspace at that ref). As written, the agent may read code from GitHub at the merge SHA but then run tests against a stale local main — reintroducing the exact false-FAIL class (mt#1485) this change aims to retire.
Please add an explicit, enforceable step before any Bash execution that synchronizes the execution context to the verified source ref (e.g., fetch + checkout the merge commit, or run commands against a temp worktree at <merge-sha>). Alternatively, carve out guidance that baseline checks requiring execution MUST be skipped unless the local workspace is confirmed to match the Source ref. Without this, auditors can still produce FAIL based on stale-local execution artifacts.
- [BLOCKING] .claude/skills/review-pr/SKILL.md:367 — Body format change adds required
Smoke:line, but Step 6.3 doesn’t specify integration with CI-status count; potential hook mismatch
At.claude/skills/review-pr/SKILL.md:367-370the review body template now includes a mandatory**Smoke:** <one of: ...>line. Step 6.3 (lines 153-172) defines allowed values and notes that pre-merge hook treatsskippedas valid. However, theCI statusline above still uses the legacy "N checks passed, M failed" summary and doesn’t define whether the smoke test contributes to N/M or is orthogonal. If the hook parses N/M to gate merges, ambiguity remains about whether afailon Smoke must incrementMor is independently blocking. The prior review flagged this; the wording here adds allowed tokens but still leaves the accounting undefined.
Please explicitly define: (a) whether Smoke is included in the N/M counts, (b) how skipped affects those counts, and (c) the exact parser contract for the pre-merge hook (e.g., regex anchors). Without this, reviewers may produce bodies that pass local lint but fail the gate, or worse, the gate may ignore a failing smoke.
- [BLOCKING] .claude/skills/verify-task/SKILL.md:66 — Ambiguous instruction on bypass-merge without audit phrase contradicts "never set DONE without both conditions"
At.claude/skills/verify-task/SKILL.md:56-85you define two required closeout conditions (A:pr.merged === true; B: merge-commit body contains the canonical phrase). Under the B-missing branch (lines ~74-85), the guidance says to "halt and surface" but then adds: "post a follow-up comment or issue documenting the bypass), or accept that this merge bypassed the audit discipline. Surface to the user before proceeding." This implies proceeding to DONE despite the missing audit trail, which directly contradicts the later Key principle: "Never set DONE without both conditions."
Please make this consistent and unambiguous: either (1) strictly block DONE when the phrase is missing (recommended, matching the Key principle), or (2) explicitly define the exceptional path where DONE is allowed and update the Key principles and Step 4 accordingly. As written, an implementer could reasonably set DONE after "surface to the user," violating the stated invariant.
- [NON-BLOCKING] .claude/agents/auditor.md:1 — Tooling addition mentions
mcp__github__get_file_contentsbut lacks example usage and required args; risk of mis-invocation
You addedmcp__github__get_file_contentsto the tools list (line 9) and reference it throughout the freshness preamble, but the skill never shows a concrete call shape (owner/repo/path/ref, error handling). Given prior staleness bugs, providing a short snippet or explicit parameter contract here would reduce misuse (e.g., forgetting to setrefto the merge SHA). Consider adding an example call under the preamble or in the Protocol section item 3. - [NON-BLOCKING] .claude/skills/verify-task/SKILL.md:27 — PR location procedure omits the common
task/mt-<id>branch-to-PR resolution viaheadfilter; may be brittle
In Step 2 (lines 27-42), you mention searching by branch-name pattern usingmcp__github__list_pull_requestswithstate: "closed". Pragmatically, the most reliable lookup is filtering PRs byhead(owner:branch) andstatein bothopenandclosed, then falling back to session records. Consider documenting a deterministic resolution order and includingopento handle the rare sequence where/verify-taskis invoked just after merge but before GH marks the PR closed. - [NON-BLOCKING] .claude/skills/verify-task/SKILL.md:1 — Canonical audit-trail phrase duplicated across multiple files without a shared anchor; drift risk persists
This PR retains hardcoded uses of the phrase"Bot self-approval bypass per feedback_self_authored_pr_merge_constraints"here (lines 66-71) and adds/updates references in.cursor/rules/task-lifecycle.mdc,.minsky/rules/task-lifecycle.mdc,AGENTS.md, andCLAUDE.md. There is still no single source of truth or cross-file comment anchor. While prior review marked this as non-blocking, the blast radius just grew. Consider adding an explicitAnchor:comment tag next to each occurrence, or referencing a memory entry key, to reduce future drift.
Spec verification
| Criterion | Status | Evidence |
|---|---|---|
| Expand reviewer to own spec verification + adoption sweep + smoke test at review time (option B) | Met | .claude/agents/reviewer.md:79-98 expands step 5 into 5a/5b/5c; .claude/skills/review-pr/SKILL.md:118-176 adds 6.1/6.2/6.3 with details and body format updates at :367-370. |
| Retire /verify-task as a verification surface; reduce to thin closeout wrapper for bypass-merge path only | Met | .claude/skills/verify-task/SKILL.md:1-22 rewrites description to thin closeout; process removes auditor dispatch; step 3 enforces merged+audit-trail conditions before setting DONE (:56-85). |
| Auditor agent narrowed to ad-hoc use with source freshness preamble and anti-staleness pattern | Met | .claude/agents/auditor.md:1-12 narrows role; :18-35 adds Source freshness preamble; :93-100 reiterates anti-stale anti-pattern. Tools list now includes mcp__github__get_file_contents at line 9. |
R2 surfaced 4 BLOCKING + 5 NON-BLOCKING on merge commit acbdca2. All 4 BLOCKING + 2 NON-BLOCKING addressed here; remaining NON-BLOCKING (canonical phrase drift) is the deferred mt#1599 concern. Substantive fixes: 1. R2b BLOCKING #3 (verify-task SKILL.md contradiction): The audit-trail-missing branch had ambiguous language (halt and surface, then or accept that this merge bypassed audit discipline) that contradicted the Never set DONE without both conditions key principle. Tightened: missing audit phrase always halts unconditionally. Removed proceed-on-missing language. Recovery path is documented but does NOT proceed to DONE without manual user-reviewed exception. 2. R2b BLOCKING #2 (Smoke / CI-status integration ambiguity): The Smoke line was added to review body but its relationship to the CI status N/M counts was undefined. Clarified: Smoke is an independent gate. CI N/M counts only GitHub Actions check_runs; Smoke is its own review-body field that the pre-merge hook parses independently. Smoke=fail blocks regardless of CI; Smoke=skipped is a valid value; Smoke=pass is a positive signal but does not increment CI N. Added parser contract explanation in step 6.3 and step 9 body format. 3. R2a BLOCKING #1 (reviewer.md adoption sweep workspace): Added a Source freshness section to Mode 2 Protocol. The reviewer subagent dispatched by /review-pr typically runs in a session workspace (which is at PR HEAD by definition since sessions check out from origin), so local Grep is safe. When invoked outside a session workspace, the agent must constrain reads to parsedDiff only and explicitly note the limitation. Mirrors the auditor freshness preamble pattern. 4. R2b BLOCKING #1 (auditor.md baseline tests vs Source ref): Auditor preamble preferred GitHub reads but baseline checks still ran Bash tests against possibly-stale local workspace. Added explicit guidance: if running in session workspace (at PR HEAD), baseline checks are safe; if running in main workspace and head does not match Source ref, EITHER ask parent to dispatch via session at merge SHA OR skip baseline checks with rationale recorded in the report. Never report FAIL on stale-local test run. Worth-it NON-BLOCKING addressed: - R2b NON-BLOCKING #4 (auditor.md missing example): added concrete mcp__github__get_file_contents call shape under freshness preamble. - R2b NON-BLOCKING #5 (verify-task PR lookup): step 2 now searches both open and closed states to handle the race window between merge and GitHub finishing the close marker. Intentionally deferred (R2 reaffirmed): - R2 NON-BLOCKING #6 (canonical audit phrase drift): tracked in mt#1599 as a separate cross-file refactor. Real concern, but cross-file + drift-detection mechanism is its own scope. Spec verification table from R2: 3 of 3 success criteria reported Met. The architectural intent of the PR is unchanged; R2 BLOCKING items were all gap-filling refinements, not redirects.
All BLOCKING findings addressed in commit 4048e27. R2a BLOCKING #1 (reviewer adoption-sweep workspace freshness): added Source freshness section to Mode 2 Protocol clarifying that session-workspace dispatch is at PR HEAD; outside-session invocation must constrain to parsedDiff only. R2 NON-BLOCKING #2 (canonical audit phrase drift): deferred to mt#1599 per the deferral rationale documented in earlier review. This review is stale on commit acbdca2; the new HEAD 4048e27 supersedes it.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Good architectural shift toward option B: reviewer now owns spec verification, adoption sweep, and smoke; verify-task reduced to a strict closeout wrapper; auditor narrowed with source-freshness guidance. However, several inconsistencies and a literal-string typo introduce real risk. Reviewer.md still tells authors to put Smoke under CI status and its body template omits the new Smoke line and Adoption sweep sub-section defined in the SKILL. Adoption sweep still lacks a safe non-session read path (no GitHub-contents tool or hard requirement for session dispatch). Verify-task contains a typo in the canonical bypass-merge audit phrase. Please reconcile these before merge; a minor non-blocking nit suggests de-hardcoding owner/repo in the auditor’s example call.
Findings
- [BLOCKING] .claude/agents/reviewer.md:120 — Reviewer 5c instructs to put Smoke result under CI status, contradicting SKILL.md where Smoke is an independent gate/line
In step 5c, the text says: "Record pass/fail and include the result in the review body's CI-status section." However,.claude/skills/review-pr/SKILL.mdexplicitly defines Smoke as an independent gate with its own**Smoke:**line and clarifies that CI N/M counts only GitHub Actions check_runs. This inconsistency will cause reviewers to format bodies incorrectly and risks parser/gate mismatches. Alignreviewer.mdwith the SKILL: instruct to write a separateSmoke:line (and not fold it into CI status). - [BLOCKING] .claude/agents/reviewer.md:89 — Adoption sweep relies on local Grep/Glob without guaranteed fresh PR-branch source or a GitHub-contents tool; staleness risk remains
The new "Source freshness" note helps when dispatched into a session workspace, but the reviewer still has no GitHub-contents read/list tool to reliably sweep the post-PR codebase when not in a session. Step 5b mandates grepping acrosssrc/,tests/, docs,CLAUDE.md,AGENTS.md. In the rare but supported "Invoked outside a session" case, the spec instructs to constrain to diff-only — this silently weakens the canonical verification surface by skipping the whole-repo adoption sweep entirely. Either (a) expose a GitHub contents/list tool to the reviewer and require reads at the PR ref, or (b) make the session-dispatch path a hard requirement for Mode 2 so that adoption sweeps never run against potentially stale local workspaces or get skipped. - [BLOCKING] .claude/skills/verify-task/SKILL.md:89 — Typo in canonical audit phrase in guidance text breaks literal-grep invariant
Under Condition B’s missing-phrase branch, the quoted string is rendered as"Bot self-approval bypass per feedback*self_authored_pr_merge_constraints"(note the stray*betweenfeedbackandself_authored). Elsewhere in this file (line ~72) and across rules/docs, the canonical literal is"Bot self-approval bypass per feedback_self_authored_pr_merge_constraints". This typo will cause a reviewer to copy the wrong string, and any automated grep will fail, leading to false blocks. Fix the string to match the canonical literal exactly. - [BLOCKING] .claude/agents/reviewer.md:247 — Mode 2 review body template omits required
Smoke:line and Adoption sweep sub-section defined in SKILL.md
In.claude/skills/review-pr/SKILL.mdstep 9, the body format now requires a separate**Smoke:** ...line and an "Adoption sweep" table under the Spec verification block. The corresponding "Review body format — Mode 2" section here shows only**CI status:** ...and a Spec verification table without the Adoption sweep sub-section. This divergence will lead to reviewer outputs that fail the pre-merge hook or omit required content. Update the body template in this file to include theSmoke:line and the Adoption sweep sub-section to stay consistent with the canonical SKILL. - [NON-BLOCKING] .claude/agents/auditor.md:24 — Hardcodes owner/repo in example
mcp__github__get_file_contentscall; risks copy-paste misuse across forks
The example showsowner: "edobry", repo: "minsky"inline. While illustrative, agents often copy example blocks literally. Consider adding a comment or placeholder values (e.g.,<owner>,<repo>) and a note to derive owner/repo from the task/PR context to avoid attempts to read the wrong repository when auditing forks or different remotes.
R3 surfaced 4 BLOCKING + 1 NON-BLOCKING on commit 4048e27. All real inconsistencies my R2 fixes introduced (or failed to propagate). All addressed here. Substantive fixes: 1. R3 BLOCKING #1 (reviewer.md step 5c contradicts SKILL.md): reviewer.md said put Smoke result in CI-status section. R2-fixed SKILL.md defines Smoke as an independent gate with its own line. Updated step 5c text to instruct writing a separate Smoke: line parsed independently from CI status by the pre-merge hook. 2. R3 BLOCKING #2 (reviewer.md tools field missing GH-contents tool): The R2 freshness section said use mcp__github__get_file_contents when invoked outside a session workspace, but the agents tools declaration did not include that tool. Added it. Also tightened the freshness section: session-workspace dispatch is the canonical Mode 2 path; outside-session uses GH-contents reads at PR head SHA, not silent skip. 3. R3 BLOCKING #3 (verify-task SKILL.md markdown italic typo): The canonical phrase Bot self-approval bypass per feedback_self_authored_pr_merge_constraints inside the surfacing message was being interpreted by markdown italic, mangling the underscores into asterisks. Wrapped the phrase in backticks (inline code) so the literal underscores are preserved verbatim. The grep gate in step 3 already used backticks; this fixes the parallel error-message instance. 4. R3 BLOCKING #4 (reviewer.md Mode 2 body template incomplete): The Review body format Mode 2 section omitted the new Smoke line and Adoption sweep table that R2-fixed SKILL.md step 9 added. Synced the reviewer.md body template with the SKILL canonical version. 5. R3 NON-BLOCKING (auditor.md hardcoded owner/repo): Replaced edobry/minsky in the example call with placeholders plus a comment explaining how to derive owner/repo from PR context (pr.head.repo.owner.login / pr.head.repo.name). Reduces copy-paste misuse risk on forks or different remotes. R3 spec verification: not present (R3 focused on consistency findings, spec criteria treated as already covered by R1/R2 verification). Convergence note: R3 found real, substantive issues — not bikeshedding or self-reversal. Each finding traces to a gap or typo in my R2 fixes. The convergence-via-bypass pattern does not yet apply; this is normal gap-filling iteration.
All BLOCKING findings addressed in commit 4234055. R3 BLOCKING #1 (reviewer.md step 5c CI-status vs Smoke contradiction): updated step 5c to instruct a separate Smoke: line parsed independently from CI status. R3 BLOCKING #2 (reviewer.md tools field missing GH-contents): added mcp__github__get_file_contents to tools field; tightened freshness section. R3 BLOCKING #3 (verify-task markdown italic typo on canonical phrase): wrapped phrase in backticks. R3 BLOCKING #4 (reviewer.md Mode 2 body template): synced with SKILL.md step 9 (Smoke line + Adoption sweep table). R3 NON-BLOCKING (auditor.md hardcoded owner/repo): replaced with placeholders + derivation comment. Stale on commit 4048e27; new HEAD 4234055 supersedes.
There was a problem hiding this comment.
Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown
Overall solid restructuring: reviewer now owns spec verification + adoption sweep + smoke, and verify-task is a thin closeout wrapper. The earlier inconsistencies called out in Iteration 1 are addressed: the Smoke line is independent from CI status, and the Adoption sweep section appears in both reviewer guidance and SKILL.md. Two non-blocking nits remain: mixed numbering under step 6 in SKILL.md (6.1–6.3 then 6a/6b) and practical limits for adoption sweeps outside a session given only get_file_contents (consider a directory-listing tool or scope clarification). No blockers found; proceeding with comments only.
Findings
- [NON-BLOCKING] .claude/skills/review-pr/SKILL.md:200 — Mixed subsection numbering under step 6 is confusing (6.1–6.3 followed by 6a/6b)
Step 6 introduces decimal substeps6.1,6.2,6.3, but subsequent subsections use lettered suffixes6aand6b(e.g.,### 6a. Assess documentation impact,### 6b. Aggregate observations…). Mixing numbering styles in the same step can cause cross-references to drift or confuse implementers following the protocol. Suggest standardizing on one scheme (either decimal6.4,6.5or consistently lettered6a,6b) and updating cross-references accordingly (e.g.,.claude/agents/reviewer.mdreferencesstep 6b). - [NON-BLOCKING] .claude/agents/reviewer.md:84 — Adoption sweep outside a session lacks a directory-listing tool, making whole-repo sweeps impractical
The Source freshness section (step 2 context) instructs performing adoption-sweep reads viamcp__github__get_file_contentswhen invoked outside a session. However, without a complementary GitHub "list directory" capability, enumerating candidate files for cross-repo greps becomes manual and brittle. Suggest adding or referencing amcp__github__list_directory/get_tree-style tool, or explicitly scoping the outside-session sweep to changed files + known consumers to avoid implying a whole-repo sweep is feasible over single-file GETs.
Inline comments
- .claude/agents/auditor.md:33 — Nice improvement replacing hardcoded
owner/repowith placeholders and guidance to derive from PR context. This reduces copy-paste footguns when auditing forks.
Summary
Implements the option B architectural restructure locked on 2026-05-01: the reviewer subagent owns spec verification + adoption sweep + smoke test at review time, and
/verify-taskis reduced to a thin closeout wrapper for the bypass-merge fallback path.Motivation & Context
/verify-taskpreviously dispatched the auditor subagent post-merge to re-verify spec criteria. This duplicated the work the reviewer subagent already does at review time, introduced the stale-local-main false-FAIL bug on mt#1485 (auditor read pre-merge content from a local main that hadn't been pulled in days), and muddled the architecture (two verification surfaces with overlapping responsibility).session_pr_mergealready auto-sets DONE on successful merge (src/domain/session/session-merge-operations.ts:519-544), so/verify-taskonly ever fires on the bypass-merge path wheresession_pr_mergewas not used. In that fallback case, all/verify-taskactually needs to do is set DONE — the audit is redundant because the reviewer-bot has already verified, and the bypass-merge audit trail in the merge commit names the substantive fixes.Decision (locked 2026-05-01)
Option B (selected): Fold spec verification + adoption check into
/review-pr. Retire/verify-taskas a verification surface. Other options considered and rejected:/verify-taskto adoption-only): still leaves two verification surfaces; staleness-bug surface persists.Key Changes
.claude/agents/reviewer.md— Mode 2 step 5 expanded into 5a (Spec verification), 5b (Adoption sweep with the >10-export cost-bounding rule), 5c (Smoke test). Reviewer subagent is now the canonical verification surface..claude/skills/review-pr/SKILL.md— Step 6 expanded with substeps 6.1, 6.2, 6.3. Step 9 review body format gained a Smoke pass/fail line in CI status and an Adoption sweep table sub-section under spec verification..claude/skills/verify-task/SKILL.md— Full rewrite as a thin closeout wrapper. No auditor dispatch. No spec re-verification. New protocol:mcp__github__pull_request_read.pr.merged === trueAND merge-commit body contains the canonical bypass-merge audit-trail signature ("Bot self-approval bypass per feedback_self_authored_pr_merge_constraints").Halts with clear surfacing if either condition fails.
.claude/agents/auditor.md— Description narrowed to ad-hoc spec verification. New Source freshness preamble: when given a merge SHA, prefermcp__github__get_file_contentsover local file paths; cross-check against merge commit before trusting any local read. New anti-pattern: never report FAIL based on a stale-local-main read..minsky/rules/task-lifecycle.mdc+ compile outputs — New "Verification surfaces" section documents the reviewer subagent ownership,/verify-task's new role, and the concurrent-merge follow-up. RegeneratedAGENTS.md,CLAUDE.md,.cursor/rules/task-lifecycle.mdcviabun src/cli.ts rules compile.Out of scope
session_pr_merge's auto-DONE behavior — it works correctly./review-prraises the floor for PR-introduced regressions but does NOT cover concurrent-merge interactions; the right structural fix is a hook on main-advancement that runs the smoke against the new HEAD.Acceptance Tests
All met against the in-session content:
reviewer.mdMode 2 step 5b adoption sweep with cost-bounding ruleSKILL.mdstep 6.2 + 6.3 + body format updatedverify-taskSKILL.md no auditor dispatch; two-condition closeoutauditor.mddescription narrowed; freshness preamble existsCLAUDE.mdandAGENTS.mdreflect new architectureverify-taskSKILL.mdmt#1592follow-up filed for concurrent-merge regression detectionTesting
Prompt/skill/rule-text changes only. No unit tests apply. Verification expectation: the next
/verify-taskinvocation on a bypass-merged PR reads the merge commit body, confirms the audit-trail signature, and sets DONE without dispatching the auditor or reading any local files. The rules-compile pipeline regeneratesCLAUDE.md/AGENTS.md/.cursor/rules/from the updated.minsky/rules/task-lifecycle.mdc; pre-commit hook validates compile-output freshness.Memory
feedback_stale_local_main_in_adoption_check.mdupdated: removed "bridge until ships" framing, replaced with a shipped Status section pointing at this PR. Auditor freshness preamble (the residual fix surface) lands here.Ancillary Changes
None — confined to the prompt / skill / rule files listed above plus their compile outputs.
(Had Claude look into this — AI-assisted implementation of mt#1551.)