Skip to content

feat(mt#1551): Fold audit into /review-pr; retire /verify-task as a verification surface#956

Merged
edobry merged 6 commits into
mainfrom
task/mt-1551
May 6, 2026
Merged

feat(mt#1551): Fold audit into /review-pr; retire /verify-task as a verification surface#956
edobry merged 6 commits into
mainfrom
task/mt-1551

Conversation

@minsky-ai
Copy link
Copy Markdown
Contributor

@minsky-ai minsky-ai Bot commented May 5, 2026

Summary

Implements the option B architectural restructure locked on 2026-05-01: the reviewer subagent owns spec verification + adoption sweep + smoke test at review time, and /verify-task is reduced to a thin closeout wrapper for the bypass-merge fallback path.

Motivation & Context

/verify-task previously dispatched the auditor subagent post-merge to re-verify spec criteria. This duplicated the work the reviewer subagent already does at review time, introduced the stale-local-main false-FAIL bug on mt#1485 (auditor read pre-merge content from a local main that hadn't been pulled in days), and muddled the architecture (two verification surfaces with overlapping responsibility).

session_pr_merge already auto-sets DONE on successful merge (src/domain/session/session-merge-operations.ts:519-544), so /verify-task only ever fires on the bypass-merge path where session_pr_merge was not used. In that fallback case, all /verify-task actually needs to do is set DONE — the audit is redundant because the reviewer-bot has already verified, and the bypass-merge audit trail in the merge commit names the substantive fixes.

Decision (locked 2026-05-01)

Option B (selected): Fold spec verification + adoption check into /review-pr. Retire /verify-task as a verification surface. Other options considered and rejected:

  • Option A (narrow /verify-task to adoption-only): still leaves two verification surfaces; staleness-bug surface persists.
  • Option C (two-gate design — pre-merge correctness + post-merge adoption): cleaner conceptually than today, but still two surfaces and still needs the staleness fix.

Key Changes

.claude/agents/reviewer.md — Mode 2 step 5 expanded into 5a (Spec verification), 5b (Adoption sweep with the >10-export cost-bounding rule), 5c (Smoke test). Reviewer subagent is now the canonical verification surface.

.claude/skills/review-pr/SKILL.md — Step 6 expanded with substeps 6.1, 6.2, 6.3. Step 9 review body format gained a Smoke pass/fail line in CI status and an Adoption sweep table sub-section under spec verification.

.claude/skills/verify-task/SKILL.md — Full rewrite as a thin closeout wrapper. No auditor dispatch. No spec re-verification. New protocol:

  1. Check status (entry gate).
  2. Read PR via mcp__github__pull_request_read.
  3. Confirm pr.merged === true AND merge-commit body contains the canonical bypass-merge audit-trail signature ("Bot self-approval bypass per feedback_self_authored_pr_merge_constraints").
  4. Set DONE.

Halts with clear surfacing if either condition fails.

.claude/agents/auditor.md — Description narrowed to ad-hoc spec verification. New Source freshness preamble: when given a merge SHA, prefer mcp__github__get_file_contents over local file paths; cross-check against merge commit before trusting any local read. New anti-pattern: never report FAIL based on a stale-local-main read.

.minsky/rules/task-lifecycle.mdc + compile outputs — New "Verification surfaces" section documents the reviewer subagent ownership, /verify-task's new role, and the concurrent-merge follow-up. Regenerated AGENTS.md, CLAUDE.md, .cursor/rules/task-lifecycle.mdc via bun src/cli.ts rules compile.

Out of scope

  • Refactoring the Mode 1 sectioning protocol (just shipped via mt#1485; stable).
  • Changing session_pr_merge's auto-DONE behavior — it works correctly.
  • Concurrent-merge regression detection — tracked separately in mt#1592. Two PRs that pass CI individually but interact badly post-merge are a real failure mode the post-merge baseline used to catch. Folding the smoke into /review-pr raises the floor for PR-introduced regressions but does NOT cover concurrent-merge interactions; the right structural fix is a hook on main-advancement that runs the smoke against the new HEAD.

Acceptance Tests

All met against the in-session content:

  • reviewer.md Mode 2 step 5b adoption sweep with cost-bounding rule
  • SKILL.md step 6.2 + 6.3 + body format updated
  • verify-task SKILL.md no auditor dispatch; two-condition closeout
  • auditor.md description narrowed; freshness preamble exists
  • ✅ Task-lifecycle docs in CLAUDE.md and AGENTS.md reflect new architecture
  • ✅ Recurrence-test scenario walked in verify-task SKILL.md
  • mt#1592 follow-up filed for concurrent-merge regression detection

Testing

Prompt/skill/rule-text changes only. No unit tests apply. Verification expectation: the next /verify-task invocation on a bypass-merged PR reads the merge commit body, confirms the audit-trail signature, and sets DONE without dispatching the auditor or reading any local files. The rules-compile pipeline regenerates CLAUDE.md / AGENTS.md / .cursor/rules/ from the updated .minsky/rules/task-lifecycle.mdc; pre-commit hook validates compile-output freshness.

Memory

feedback_stale_local_main_in_adoption_check.md updated: removed "bridge until ships" framing, replaced with a shipped Status section pointing at this PR. Auditor freshness preamble (the residual fix surface) lands here.

Ancillary Changes

None — confined to the prompt / skill / rule files listed above plus their compile outputs.

(Had Claude look into this — AI-assisted implementation of mt#1551.)

…erification surface

Implements the option B architectural restructure locked on 2026-05-01:
the reviewer subagent owns spec verification + adoption sweep + smoke
test at review time, and /verify-task is reduced to a thin closeout
wrapper for the bypass-merge fallback path.

Background: /verify-task previously dispatched the auditor subagent
post-merge to re-verify spec criteria. This duplicated the work the
reviewer subagent already does at review time and introduced today
stale-local-main false-FAIL bug on mt#1485. The session_pr_merge tool
already auto-sets DONE on successful merge, so /verify-task only ever
fires on the bypass-merge path where session_pr_merge was not used.

Changes:

1. .claude/agents/reviewer.md
   - Mode 2 step 5 expanded into 5a (Spec verification), 5b (Adoption
     sweep with the 10-export cost-bounding rule), 5c (Smoke test).
   - Reviewer subagent is now the canonical verification surface.

2. .claude/skills/review-pr/SKILL.md
   - Step 6 expanded with substeps 6.1 (Spec verification), 6.2
     (Adoption sweep), 6.3 (Smoke test).
   - Step 9 review body format updated: Smoke pass/fail line in CI
     status, Adoption sweep table sub-section in spec verification.

3. .claude/skills/verify-task/SKILL.md
   - Full rewrite as a thin closeout wrapper.
   - No auditor dispatch. No spec re-verification.
   - New protocol: read PR via mcp__github__pull_request_read, confirm
     pr.merged === true AND merge-commit body contains the canonical
     bypass-merge audit-trail signature, then set DONE.
   - Halts with surfacing if either condition fails.

4. .claude/agents/auditor.md
   - Description narrowed to ad-hoc spec verification.
   - New Source freshness preamble: when post-merge audit, prefer
     mcp__github__get_file_contents with ref set to merge SHA over
     local file paths. Cross-check against merge commit before
     trusting any local read.
   - New anti-pattern: never report FAIL based on a stale-local-main
     read.

5. .minsky/rules/task-lifecycle.mdc + compile outputs
   - Added Verification surfaces section documenting the reviewer
     subagent ownership, the /verify-task new role, and the
     concurrent-merge follow-up (mt#1592).
   - Regenerated AGENTS.md, CLAUDE.md, .cursor/rules/task-lifecycle.mdc
     via bun src/cli.ts rules compile.

Acceptance tests (all met against the in-session content):
- reviewer.md Mode 2 step 5b adoption sweep with cost-bounding rule
- SKILL.md step 6.2 + 6.3 + body format updated
- verify-task SKILL.md no auditor dispatch; two-condition closeout
- auditor.md description narrowed; freshness preamble exists
- task-lifecycle docs in CLAUDE.md and AGENTS.md reflect new architecture
- recurrence-test scenario walked in verify-task SKILL.md
- mt#1592 follow-up exists for concurrent-merge regression detection

Memory feedback_stale_local_main_in_adoption_check.md updated:
removed bridge-until-ships framing, replaced with shipped Status
section; the auditor freshness preamble lands in this PR.
@minsky-ai minsky-ai Bot added the authorship/co-authored Co-authored by human and AI agent label May 5, 2026
Copy link
Copy Markdown

@minsky-reviewer minsky-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown


The architectural intent is implemented coherently: reviewer now owns spec verification, adoption sweep, and smoke test; verify-task is reduced to a closeout wrapper; lifecycle/docs updated. However, there’s a concrete break: auditor.md’s new freshness protocol mandates using a GitHub contents tool, but this agent’s tool list doesn’t include it, making the guidance unenforceable. Additionally, the review body’s new “Smoke” line lacks a specified “skipped” representation despite step 6.3 allowing skips, creating format ambiguity. Minor duplication risk exists around the hardcoded bypass-audit phrase across multiple files. Please add the missing tool (or adjust the protocol), define the smoke field’s allowed values including “skipped,” and consider centralizing the audit-trail phrase before merging.

Findings

  • [BLOCKING] .claude/agents/auditor.md:8 — Auditor instructs use of mcp__github__get_file_contents but the tool is not declared/available in this agent
    In the new "Source freshness preamble" you require using mcp__github__get_file_contents with a ref to the merge SHA for post-merge audits. However, at the top of this file the declared tools are only: Read, Glob, Grep, Bash, mcp__minsky__tasks_get, mcp__minsky__tasks_spec_get — there is no GitHub contents tool exposed to this agent. As written, the agent cannot follow its own protocol and would fall back to local reads (the exact stale-source path this change aims to retire). Fix by either (a) adding the GitHub contents tool to this agent’s allowed tools list, or (b) adjusting the protocol to use a tool that actually exists for this agent (e.g., route reads through a parent with the tool, or add an explicit dispatch step).
  • [BLOCKING] .claude/skills/review-pr/SKILL.md:334 — Review body format now requires a "Smoke" line but Step 6.3 allows skipping; missing guidance on representing a skipped smoke in CI status
    You added a required "Smoke: <pass/fail — <command run> on PR branch>" line to the review body (around lines 334–338). Step 6.3 states docs/prompt-only PRs may skip the smoke with rationale. The required body format does not define how to represent a skipped smoke (e.g., "skipped — rationale" vs pass/fail), and the CI-status summary above only covers N passed/M failed without integrating smoke. Tighten the format and step 6.3 to specify the exact allowed values for Smoke (pass/fail/skipped with rationale) and how it interacts with CI status to avoid reviewers being blocked by the pre-merge hook expecting a "Smoke:" line when a skip is legitimate.
  • [NON-BLOCKING] .claude/skills/verify-task/SKILL.md:63 — Canonical audit-trail phrase hardcoded without a single source of truth constant; risk of divergence from reviewer.md and AGENTS.md copies
    This skill greps for the literal phrase "Bot self-approval bypass per feedback_self_authored_pr_merge_constraints" (around lines 63–73). The same phrase now appears in reviewer skill (step 8) and in multiple docs (AGENTS.md/CLAUDE.md/rules). Hardcoding the string in multiple places invites drift and false negatives if the wording changes. Consider extracting this phrase into a single constant in a shared rules file (or a memory-backed setting surfaced via MCP) and referencing it across skills/agents, or at least add a cross-file comment anchor so future edits update all copies.

edobry added 2 commits May 5, 2026 20:31
Two BLOCKING findings + one NON-BLOCKING from minsky-reviewer[bot] R1
on commit 2548d1b. Both BLOCKING addressed here; NON-BLOCKING deferred
to mt#1599 as a separate cross-file refactor.

Substantive fixes:

1. R1 BLOCKING #1: auditor.md instructs use of mcp__github__get_file_contents
   in the freshness preamble but the agent tools field did not declare it,
   making the protocol unenforceable. Added mcp__github__get_file_contents
   to the tools list. The auditor can now actually follow its own protocol.

2. R1 BLOCKING #2: SKILL.md step 9 added a required Smoke line, but step 6.3
   allowed skipping for docs/prompt-only PRs without specifying how to
   represent a skipped smoke. Added an explicit three-value enumeration
   (pass / fail / skipped) with format requirements in both step 6.3 and
   step 9. The pre-merge hook can now parse the line unambiguously.

Intentionally deferred to mt#1599:

- R1 NON-BLOCKING #3: canonical bypass-merge audit-trail phrase hardcoded
  across 5+ files (verify-task SKILL.md grep, review-pr SKILL.md, reviewer.md,
  task-lifecycle compile outputs, memory). Real drift risk but cross-file
  refactor + drift-detection mechanism is its own scope. Filed mt#1599 with
  full context.

CI passed 2/2 on the prior commit; no logic changed in this commit (only
prompt/skill text and one tools-field entry).
@minsky-ai minsky-ai Bot dismissed minsky-reviewer[bot]’s stale review May 6, 2026 00:32

Both BLOCKING findings addressed in commit 64aab6c. R1 BLOCKING #1 (auditor.md missing mcp__github__get_file_contents in tools field): added the tool to the agent declaration. R1 BLOCKING #2 (review body Smoke line ambiguity for skipped case): added explicit three-value enumeration (pass / fail / skipped) with format requirements in step 6.3 and step 9. R1 NON-BLOCKING #3 (canonical audit phrase hardcoded across files): deferred to mt#1599 as a separate cross-file refactor. This review is stale on commit 2548d1b; the new HEAD 64aab6c supersedes it.

Copy link
Copy Markdown

@minsky-reviewer minsky-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown


The architectural shift is coherent: spec verification, adoption sweep, and smoke are now owned by the reviewer; verify-task is a thin closeout; and the auditor includes a solid source-freshness protocol. However, the reviewer’s new adoption sweep lacks a safe source/freshness mechanism or GitHub-contents tool, so grep likely targets a potentially stale local workspace, reintroducing the staleness risk this PR aims to retire. Also, the canonical bypass-merge audit phrase is hardcoded in multiple places, inviting drift (non-blocking). Please add a source-freshness protocol and provide the reviewer with a tool to read at the PR ref (or constrain the sweep accordingly). Once addressed, this should be ready to merge.

Findings

  • [BLOCKING] .claude/agents/reviewer.md:110 — Adoption sweep (step 5b) assumes access to “post-PR codebase” but reviewer lacks a safe source/read tool for PR-branch-wide grep; high risk of stale-local-main reads
    In step 5b you instruct the reviewer to “sweep the post-PR codebase” by grepping across src/, tests/, docs, CLAUDE.md, and AGENTS.md for consumers of new exports. However, the reviewer agent’s declared tools are only Read, Glob, Grep, Bash, mcp__minsky__session_pr_review_context, mcp__minsky__session_pr_review_submit, mcp__minsky__tasks_spec_get — there is no GitHub-contents tool or session checkout tool exposed to guarantee the grep runs against the PR branch rather than a potentially stale local main (the same staleness surface called out in mt#1485). Unlike the updated auditor (which now has mcp__github__get_file_contents and a Source Freshness preamble), the reviewer has no freshness protocol and no means to fetch whole-repo contents at the PR ref.

As written, the reviewer is likely to run Grep/Glob against whatever is in the local workspace, which may be hours/days behind origin and will not reflect the PR branch’s full post-change state. That undermines the “canonical verification surface” claim for adoption and reintroduces the stale-source false-FAIL/false-PASS risk you just retired in /verify-task.

Fix options:

  • Add a Source Freshness preamble to .claude/agents/reviewer.md mirroring the auditor’s, and expose a GitHub-contents tool (e.g., mcp__github__get_file_contents or an equivalent list-and-read-at-ref API) to this agent; require adoption-sweep reads at ref: <PR head sha>.
  • Or, adjust the protocol to constrain the sweep to changed files plus consumers resolvable via parsedDiff and the session’s PR workspace, and/or explicitly dispatch a helper with the needed tool.

Until the reviewer can deterministically read the PR branch’s codebase (not stale local), step 5b is not safely implementable and risks incorrect verification outcomes.

  • [NON-BLOCKING] .claude/skills/verify-task/SKILL.md:63 — Hardcoded audit-trail phrase repeated across multiple files without a single source; adds drift risk
    This skill greps the merge-commit body for the literal phrase:
Bot self-approval bypass per feedback_self_authored_pr_merge_constraints

Similar literal copies are introduced in .cursor/rules/task-lifecycle.mdc, .minsky/rules/task-lifecycle.mdc, AGENTS.md, CLAUDE.md, and referenced in .claude/skills/review-pr/SKILL.md step 8. Without a single source of truth (constant/memory/rule anchor), wording drift will cause false negatives or inconsistent guidance.

Suggestion: extract this string to a shared rule/memory key (e.g., feedback_self_authored_pr_merge_constraints.audit_phrase) or a central config file (rules/constants.mdc) and reference it across skills/docs. At minimum, add a prominent comment anchor in each location linking to the canonical definition to reduce divergence risk.

Copy link
Copy Markdown

@minsky-reviewer minsky-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown


Solid architectural realignment that moves verification into the reviewer and simplifies /verify-task, but a few inconsistencies reintroduce staleness and gating ambiguity. Blocking: (1) the auditor still mandates running tests via local Bash without ensuring the execution workspace matches the verified Source ref, risking stale-local false FAILs; (2) the new Smoke: line’s relationship to the CI status N/M counts and the hook parser contract is undefined; (3) /verify-task contains contradictory guidance about proceeding to DONE when the audit-trail phrase is missing versus the stated invariant that both conditions are required. Non-blocking: add a concrete mcp__github__get_file_contents usage example, tighten PR lookup by head filter across open/closed, and add an anchor/constant for the canonical bypass-merge phrase. Address the blocking items before merge.

Findings

  • [BLOCKING] .claude/agents/auditor.md:56 — Preamble prefers GitHub reads but baseline still requires running local tests without a freshness-sync protocol
    At .claude/agents/auditor.md:56-69 you require running the full test suite, typecheck, lint, and an E2E smoke via Bash as "Baseline" checks, while the new "Source freshness preamble" (lines 18-35) instructs preferring mcp__github__get_file_contents over local reads because the local checkout may be stale. There is no explicit protocol for ensuring the local workspace used by Bash actually matches the merge SHA (e.g., git fetch && git checkout <merge-sha> or running in a clean workspace at that ref). As written, the agent may read code from GitHub at the merge SHA but then run tests against a stale local main — reintroducing the exact false-FAIL class (mt#1485) this change aims to retire.

Please add an explicit, enforceable step before any Bash execution that synchronizes the execution context to the verified source ref (e.g., fetch + checkout the merge commit, or run commands against a temp worktree at <merge-sha>). Alternatively, carve out guidance that baseline checks requiring execution MUST be skipped unless the local workspace is confirmed to match the Source ref. Without this, auditors can still produce FAIL based on stale-local execution artifacts.

  • [BLOCKING] .claude/skills/review-pr/SKILL.md:367 — Body format change adds required Smoke: line, but Step 6.3 doesn’t specify integration with CI-status count; potential hook mismatch
    At .claude/skills/review-pr/SKILL.md:367-370 the review body template now includes a mandatory **Smoke:** <one of: ...> line. Step 6.3 (lines 153-172) defines allowed values and notes that pre-merge hook treats skipped as valid. However, the CI status line above still uses the legacy "N checks passed, M failed" summary and doesn’t define whether the smoke test contributes to N/M or is orthogonal. If the hook parses N/M to gate merges, ambiguity remains about whether a fail on Smoke must increment M or is independently blocking. The prior review flagged this; the wording here adds allowed tokens but still leaves the accounting undefined.

Please explicitly define: (a) whether Smoke is included in the N/M counts, (b) how skipped affects those counts, and (c) the exact parser contract for the pre-merge hook (e.g., regex anchors). Without this, reviewers may produce bodies that pass local lint but fail the gate, or worse, the gate may ignore a failing smoke.

  • [BLOCKING] .claude/skills/verify-task/SKILL.md:66 — Ambiguous instruction on bypass-merge without audit phrase contradicts "never set DONE without both conditions"
    At .claude/skills/verify-task/SKILL.md:56-85 you define two required closeout conditions (A: pr.merged === true; B: merge-commit body contains the canonical phrase). Under the B-missing branch (lines ~74-85), the guidance says to "halt and surface" but then adds: "post a follow-up comment or issue documenting the bypass), or accept that this merge bypassed the audit discipline. Surface to the user before proceeding." This implies proceeding to DONE despite the missing audit trail, which directly contradicts the later Key principle: "Never set DONE without both conditions."

Please make this consistent and unambiguous: either (1) strictly block DONE when the phrase is missing (recommended, matching the Key principle), or (2) explicitly define the exceptional path where DONE is allowed and update the Key principles and Step 4 accordingly. As written, an implementer could reasonably set DONE after "surface to the user," violating the stated invariant.

  • [NON-BLOCKING] .claude/agents/auditor.md:1 — Tooling addition mentions mcp__github__get_file_contents but lacks example usage and required args; risk of mis-invocation
    You added mcp__github__get_file_contents to the tools list (line 9) and reference it throughout the freshness preamble, but the skill never shows a concrete call shape (owner/repo/path/ref, error handling). Given prior staleness bugs, providing a short snippet or explicit parameter contract here would reduce misuse (e.g., forgetting to set ref to the merge SHA). Consider adding an example call under the preamble or in the Protocol section item 3.
  • [NON-BLOCKING] .claude/skills/verify-task/SKILL.md:27 — PR location procedure omits the common task/mt-<id> branch-to-PR resolution via head filter; may be brittle
    In Step 2 (lines 27-42), you mention searching by branch-name pattern using mcp__github__list_pull_requests with state: "closed". Pragmatically, the most reliable lookup is filtering PRs by head (owner:branch) and state in both open and closed, then falling back to session records. Consider documenting a deterministic resolution order and including open to handle the rare sequence where /verify-task is invoked just after merge but before GH marks the PR closed.
  • [NON-BLOCKING] .claude/skills/verify-task/SKILL.md:1 — Canonical audit-trail phrase duplicated across multiple files without a shared anchor; drift risk persists
    This PR retains hardcoded uses of the phrase "Bot self-approval bypass per feedback_self_authored_pr_merge_constraints" here (lines 66-71) and adds/updates references in .cursor/rules/task-lifecycle.mdc, .minsky/rules/task-lifecycle.mdc, AGENTS.md, and CLAUDE.md. There is still no single source of truth or cross-file comment anchor. While prior review marked this as non-blocking, the blast radius just grew. Consider adding an explicit Anchor: comment tag next to each occurrence, or referencing a memory entry key, to reduce future drift.

Spec verification

Criterion Status Evidence
Expand reviewer to own spec verification + adoption sweep + smoke test at review time (option B) Met .claude/agents/reviewer.md:79-98 expands step 5 into 5a/5b/5c; .claude/skills/review-pr/SKILL.md:118-176 adds 6.1/6.2/6.3 with details and body format updates at :367-370.
Retire /verify-task as a verification surface; reduce to thin closeout wrapper for bypass-merge path only Met .claude/skills/verify-task/SKILL.md:1-22 rewrites description to thin closeout; process removes auditor dispatch; step 3 enforces merged+audit-trail conditions before setting DONE (:56-85).
Auditor agent narrowed to ad-hoc use with source freshness preamble and anti-staleness pattern Met .claude/agents/auditor.md:1-12 narrows role; :18-35 adds Source freshness preamble; :93-100 reiterates anti-stale anti-pattern. Tools list now includes mcp__github__get_file_contents at line 9.

R2 surfaced 4 BLOCKING + 5 NON-BLOCKING on merge commit acbdca2.
All 4 BLOCKING + 2 NON-BLOCKING addressed here; remaining NON-BLOCKING
(canonical phrase drift) is the deferred mt#1599 concern.

Substantive fixes:

1. R2b BLOCKING #3 (verify-task SKILL.md contradiction):
   The audit-trail-missing branch had ambiguous language (halt and
   surface, then or accept that this merge bypassed audit discipline)
   that contradicted the Never set DONE without both conditions key
   principle. Tightened: missing audit phrase always halts unconditionally.
   Removed proceed-on-missing language. Recovery path is documented but
   does NOT proceed to DONE without manual user-reviewed exception.

2. R2b BLOCKING #2 (Smoke / CI-status integration ambiguity):
   The Smoke line was added to review body but its relationship to the
   CI status N/M counts was undefined. Clarified: Smoke is an independent
   gate. CI N/M counts only GitHub Actions check_runs; Smoke is its own
   review-body field that the pre-merge hook parses independently.
   Smoke=fail blocks regardless of CI; Smoke=skipped is a valid value;
   Smoke=pass is a positive signal but does not increment CI N. Added
   parser contract explanation in step 6.3 and step 9 body format.

3. R2a BLOCKING #1 (reviewer.md adoption sweep workspace):
   Added a Source freshness section to Mode 2 Protocol. The reviewer
   subagent dispatched by /review-pr typically runs in a session
   workspace (which is at PR HEAD by definition since sessions check
   out from origin), so local Grep is safe. When invoked outside a
   session workspace, the agent must constrain reads to parsedDiff
   only and explicitly note the limitation. Mirrors the auditor
   freshness preamble pattern.

4. R2b BLOCKING #1 (auditor.md baseline tests vs Source ref):
   Auditor preamble preferred GitHub reads but baseline checks still
   ran Bash tests against possibly-stale local workspace. Added explicit
   guidance: if running in session workspace (at PR HEAD), baseline
   checks are safe; if running in main workspace and head does not
   match Source ref, EITHER ask parent to dispatch via session at
   merge SHA OR skip baseline checks with rationale recorded in the
   report. Never report FAIL on stale-local test run.

Worth-it NON-BLOCKING addressed:

- R2b NON-BLOCKING #4 (auditor.md missing example): added concrete
  mcp__github__get_file_contents call shape under freshness preamble.
- R2b NON-BLOCKING #5 (verify-task PR lookup): step 2 now searches
  both open and closed states to handle the race window between merge
  and GitHub finishing the close marker.

Intentionally deferred (R2 reaffirmed):

- R2 NON-BLOCKING #6 (canonical audit phrase drift): tracked in mt#1599
  as a separate cross-file refactor. Real concern, but cross-file
  + drift-detection mechanism is its own scope.

Spec verification table from R2: 3 of 3 success criteria reported Met.
The architectural intent of the PR is unchanged; R2 BLOCKING items
were all gap-filling refinements, not redirects.
@minsky-ai minsky-ai Bot dismissed stale reviews from minsky-reviewer[bot] and minsky-reviewer[bot] May 6, 2026 00:38

All BLOCKING findings addressed in commit 4048e27. R2a BLOCKING #1 (reviewer adoption-sweep workspace freshness): added Source freshness section to Mode 2 Protocol clarifying that session-workspace dispatch is at PR HEAD; outside-session invocation must constrain to parsedDiff only. R2 NON-BLOCKING #2 (canonical audit phrase drift): deferred to mt#1599 per the deferral rationale documented in earlier review. This review is stale on commit acbdca2; the new HEAD 4048e27 supersedes it.

Copy link
Copy Markdown

@minsky-reviewer minsky-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown


Good architectural shift toward option B: reviewer now owns spec verification, adoption sweep, and smoke; verify-task reduced to a strict closeout wrapper; auditor narrowed with source-freshness guidance. However, several inconsistencies and a literal-string typo introduce real risk. Reviewer.md still tells authors to put Smoke under CI status and its body template omits the new Smoke line and Adoption sweep sub-section defined in the SKILL. Adoption sweep still lacks a safe non-session read path (no GitHub-contents tool or hard requirement for session dispatch). Verify-task contains a typo in the canonical bypass-merge audit phrase. Please reconcile these before merge; a minor non-blocking nit suggests de-hardcoding owner/repo in the auditor’s example call.

Findings

  • [BLOCKING] .claude/agents/reviewer.md:120 — Reviewer 5c instructs to put Smoke result under CI status, contradicting SKILL.md where Smoke is an independent gate/line
    In step 5c, the text says: "Record pass/fail and include the result in the review body's CI-status section." However, .claude/skills/review-pr/SKILL.md explicitly defines Smoke as an independent gate with its own **Smoke:** line and clarifies that CI N/M counts only GitHub Actions check_runs. This inconsistency will cause reviewers to format bodies incorrectly and risks parser/gate mismatches. Align reviewer.md with the SKILL: instruct to write a separate Smoke: line (and not fold it into CI status).
  • [BLOCKING] .claude/agents/reviewer.md:89 — Adoption sweep relies on local Grep/Glob without guaranteed fresh PR-branch source or a GitHub-contents tool; staleness risk remains
    The new "Source freshness" note helps when dispatched into a session workspace, but the reviewer still has no GitHub-contents read/list tool to reliably sweep the post-PR codebase when not in a session. Step 5b mandates grepping across src/, tests/, docs, CLAUDE.md, AGENTS.md. In the rare but supported "Invoked outside a session" case, the spec instructs to constrain to diff-only — this silently weakens the canonical verification surface by skipping the whole-repo adoption sweep entirely. Either (a) expose a GitHub contents/list tool to the reviewer and require reads at the PR ref, or (b) make the session-dispatch path a hard requirement for Mode 2 so that adoption sweeps never run against potentially stale local workspaces or get skipped.
  • [BLOCKING] .claude/skills/verify-task/SKILL.md:89 — Typo in canonical audit phrase in guidance text breaks literal-grep invariant
    Under Condition B’s missing-phrase branch, the quoted string is rendered as "Bot self-approval bypass per feedback*self_authored_pr_merge_constraints" (note the stray * between feedback and self_authored). Elsewhere in this file (line ~72) and across rules/docs, the canonical literal is "Bot self-approval bypass per feedback_self_authored_pr_merge_constraints". This typo will cause a reviewer to copy the wrong string, and any automated grep will fail, leading to false blocks. Fix the string to match the canonical literal exactly.
  • [BLOCKING] .claude/agents/reviewer.md:247 — Mode 2 review body template omits required Smoke: line and Adoption sweep sub-section defined in SKILL.md
    In .claude/skills/review-pr/SKILL.md step 9, the body format now requires a separate **Smoke:** ... line and an "Adoption sweep" table under the Spec verification block. The corresponding "Review body format — Mode 2" section here shows only **CI status:** ... and a Spec verification table without the Adoption sweep sub-section. This divergence will lead to reviewer outputs that fail the pre-merge hook or omit required content. Update the body template in this file to include the Smoke: line and the Adoption sweep sub-section to stay consistent with the canonical SKILL.
  • [NON-BLOCKING] .claude/agents/auditor.md:24 — Hardcodes owner/repo in example mcp__github__get_file_contents call; risks copy-paste misuse across forks
    The example shows owner: "edobry", repo: "minsky" inline. While illustrative, agents often copy example blocks literally. Consider adding a comment or placeholder values (e.g., <owner>, <repo>) and a note to derive owner/repo from the task/PR context to avoid attempts to read the wrong repository when auditing forks or different remotes.

R3 surfaced 4 BLOCKING + 1 NON-BLOCKING on commit 4048e27. All real
inconsistencies my R2 fixes introduced (or failed to propagate). All
addressed here.

Substantive fixes:

1. R3 BLOCKING #1 (reviewer.md step 5c contradicts SKILL.md):
   reviewer.md said put Smoke result in CI-status section. R2-fixed
   SKILL.md defines Smoke as an independent gate with its own line.
   Updated step 5c text to instruct writing a separate Smoke: line
   parsed independently from CI status by the pre-merge hook.

2. R3 BLOCKING #2 (reviewer.md tools field missing GH-contents tool):
   The R2 freshness section said use mcp__github__get_file_contents
   when invoked outside a session workspace, but the agents tools
   declaration did not include that tool. Added it. Also tightened
   the freshness section: session-workspace dispatch is the canonical
   Mode 2 path; outside-session uses GH-contents reads at PR head SHA,
   not silent skip.

3. R3 BLOCKING #3 (verify-task SKILL.md markdown italic typo):
   The canonical phrase Bot self-approval bypass per
   feedback_self_authored_pr_merge_constraints inside the surfacing
   message was being interpreted by markdown italic, mangling the
   underscores into asterisks. Wrapped the phrase in backticks (inline
   code) so the literal underscores are preserved verbatim. The grep
   gate in step 3 already used backticks; this fixes the parallel
   error-message instance.

4. R3 BLOCKING #4 (reviewer.md Mode 2 body template incomplete):
   The Review body format Mode 2 section omitted the new Smoke line
   and Adoption sweep table that R2-fixed SKILL.md step 9 added.
   Synced the reviewer.md body template with the SKILL canonical version.

5. R3 NON-BLOCKING (auditor.md hardcoded owner/repo):
   Replaced edobry/minsky in the example call with placeholders
   plus a comment explaining how to derive owner/repo from PR context
   (pr.head.repo.owner.login / pr.head.repo.name). Reduces copy-paste
   misuse risk on forks or different remotes.

R3 spec verification: not present (R3 focused on consistency findings,
spec criteria treated as already covered by R1/R2 verification).

Convergence note: R3 found real, substantive issues — not bikeshedding
or self-reversal. Each finding traces to a gap or typo in my R2 fixes.
The convergence-via-bypass pattern does not yet apply; this is normal
gap-filling iteration.
@minsky-ai minsky-ai Bot dismissed minsky-reviewer[bot]’s stale review May 6, 2026 02:39

All BLOCKING findings addressed in commit 4234055. R3 BLOCKING #1 (reviewer.md step 5c CI-status vs Smoke contradiction): updated step 5c to instruct a separate Smoke: line parsed independently from CI status. R3 BLOCKING #2 (reviewer.md tools field missing GH-contents): added mcp__github__get_file_contents to tools field; tightened freshness section. R3 BLOCKING #3 (verify-task markdown italic typo on canonical phrase): wrapped phrase in backticks. R3 BLOCKING #4 (reviewer.md Mode 2 body template): synced with SKILL.md step 9 (Smoke line + Adoption sweep table). R3 NON-BLOCKING (auditor.md hardcoded owner/repo): replaced with placeholders + derivation comment. Stale on commit 4048e27; new HEAD 4234055 supersedes.

Copy link
Copy Markdown

@minsky-reviewer minsky-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent adversarial review (Chinese-wall)
Reviewer: minsky-reviewer[bot] via openai:gpt-5
Tier: unknown


Overall solid restructuring: reviewer now owns spec verification + adoption sweep + smoke, and verify-task is a thin closeout wrapper. The earlier inconsistencies called out in Iteration 1 are addressed: the Smoke line is independent from CI status, and the Adoption sweep section appears in both reviewer guidance and SKILL.md. Two non-blocking nits remain: mixed numbering under step 6 in SKILL.md (6.1–6.3 then 6a/6b) and practical limits for adoption sweeps outside a session given only get_file_contents (consider a directory-listing tool or scope clarification). No blockers found; proceeding with comments only.

Findings

  • [NON-BLOCKING] .claude/skills/review-pr/SKILL.md:200 — Mixed subsection numbering under step 6 is confusing (6.1–6.3 followed by 6a/6b)
    Step 6 introduces decimal substeps 6.1, 6.2, 6.3, but subsequent subsections use lettered suffixes 6a and 6b (e.g., ### 6a. Assess documentation impact, ### 6b. Aggregate observations…). Mixing numbering styles in the same step can cause cross-references to drift or confuse implementers following the protocol. Suggest standardizing on one scheme (either decimal 6.4, 6.5 or consistently lettered 6a, 6b) and updating cross-references accordingly (e.g., .claude/agents/reviewer.md references step 6b).
  • [NON-BLOCKING] .claude/agents/reviewer.md:84 — Adoption sweep outside a session lacks a directory-listing tool, making whole-repo sweeps impractical
    The Source freshness section (step 2 context) instructs performing adoption-sweep reads via mcp__github__get_file_contents when invoked outside a session. However, without a complementary GitHub "list directory" capability, enumerating candidate files for cross-repo greps becomes manual and brittle. Suggest adding or referencing a mcp__github__list_directory/get_tree-style tool, or explicitly scoping the outside-session sweep to changed files + known consumers to avoid implying a whole-repo sweep is feasible over single-file GETs.

Inline comments

  • .claude/agents/auditor.md:33 — Nice improvement replacing hardcoded owner/repo with placeholders and guidance to derive from PR context. This reduces copy-paste footguns when auditing forks.

@edobry edobry merged commit 81d6564 into main May 6, 2026
2 checks passed
@edobry edobry deleted the task/mt-1551 branch May 6, 2026 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

authorship/co-authored Co-authored by human and AI agent

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant