Skip to content

Gate PR close proposals with coverage proof#219

Open
jesse-merhi wants to merge 1 commit into
mainfrom
jesse/pr-close-coverage-proof
Open

Gate PR close proposals with coverage proof#219
jesse-merhi wants to merge 1 commit into
mainfrom
jesse/pr-close-coverage-proof

Conversation

@jesse-merhi
Copy link
Copy Markdown
Member

Summary

  • add a PR close coverage proof prompt, schema, parser, and runner
  • gate duplicate/superseded PR close proposals in the main apply lane and repair apply lane
  • keep transient proof failures retryable while making proof keep-open decisions terminal for that apply attempt
  • install the proof runner before apply paths that may need it

Verification

  • fnm exec --using v24.14.1 pnpm run check
  • three consecutive clean built-in review passes on the final tree
  • three independent cold reviewers reported no actionable findings
  • live smoke examples checked: 85630 -> 71465 kept open, 85448 -> 80751 kept open, 85448 -> 85707 covered

Copilot AI review requested due to automatic review settings May 28, 2026 07:11
@jesse-merhi jesse-merhi requested a review from a team as a code owner May 28, 2026 07:11
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 28, 2026

Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 3:17 AM ET / 07:17 UTC.

Summary
The PR adds a Codex-backed coverage proof prompt, schema, parser, runner, workflow setup, and tests to gate duplicate/superseded PR close proposals in the main and repair apply lanes.

Reproducibility: not applicable. This PR adds a new automation safety gate rather than reporting a current-main bug. Source inspection confirms current main lacks the new proof runner and retry/skip actions.

Review metrics: 3 noteworthy metrics.

  • Diff size: 11 files, +3551/-231. The patch spans workflow, main apply, repair apply, schema, prompt, and tests, so merge readiness depends on more than unit-level parser behavior.
  • Workflow surface: 1 workflow changed. The sweep workflow now conditionally installs Codex before apply paths, which is operationally important for scheduled and manual runs.
  • Test surface: 2 test files added, 2 existing test files changed. The PR includes focused coverage for the new proof decision logic and repair apply behavior.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Resolve the current merge conflicts or rebase onto main.
  • Rerun pnpm run check and the cited covered/keep-open smoke cases after the rebase.

Risk before merge

  • [P1] GitHub currently reports this PR as mergeable=false, mergeable_state=dirty, and rebaseable=false, so it needs conflict resolution before it can land safely.
  • [P1] This change inserts Codex setup and model proof into apply/comment-sync paths; a bad gate or setup condition could block intended PR cleanup even when normal tests pass.

Maintainer options:

  1. Rebase and revalidate the apply gate (recommended)
    Resolve the dirty merge state, then rerun pnpm run check and at least one apply/proof smoke case showing covered and keep-open outcomes before merge.
  2. Pause if proof cost is not acceptable
    If maintainers do not want apply to depend on a Codex proof subprocess yet, hold this PR and keep the existing deterministic duplicate/superseded PR safeguards.

Next step before merge

  • [P2] Maintainer-authored PR is currently dirty against main and touches apply workflow behavior, so the next action is maintainer rebase and validation rather than ClawSweeper repair or close.

Security
Cleared: The diff touches workflow and model subprocess execution, but the proof runner uses the existing credential-scrubbing codexEnv path and I found no concrete secret or supply-chain regression.

Review details

Best possible solution:

Resolve the current merge conflicts, rerun the Node 24 check plus the cited apply/proof smoke cases, and land only after maintainers accept the added model gate for duplicate/superseded PR closes.

Do we have a high-confidence way to reproduce the issue?

Not applicable; this PR adds a new automation safety gate rather than reporting a current-main bug. Source inspection confirms current main lacks the new proof runner and retry/skip actions.

Is this the best way to solve the issue?

Mostly yes; a separate prompt/schema/runner integrated at both apply paths is a maintainable shape for this safety check, but the branch needs conflict resolution and final apply-lane validation before merge.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 666e707a8288.

Label changes

Label changes:

  • add P2: This is a normal-priority automation safety improvement with limited blast radius but important apply-lane consequences.
  • add merge-risk: 🚨 automation: Merging this PR changes when apply/comment-sync installs and invokes Codex, which can affect automated close processing.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The contributor proof gate does not apply to this maintainer-authored PR; the body still reports pnpm run check and live smoke examples.

Label justifications:

  • P2: This is a normal-priority automation safety improvement with limited blast radius but important apply-lane consequences.
  • merge-risk: 🚨 automation: Merging this PR changes when apply/comment-sync installs and invokes Codex, which can affect automated close processing.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The contributor proof gate does not apply to this maintainer-authored PR; the body still reports pnpm run check and live smoke examples.
Evidence reviewed

What I checked:

  • Repository policy read: AGENTS.md was read in full; it says the apply lane closes only unchanged high-confidence proposals and specifically treats maintainer-authored items as non-closeable. (AGENTS.md:25, 666e707a8288)
  • Current main does not have this proof gate: Current main has no src/pr-close-coverage-proof.ts and no prCloseCoverageProof/retry action symbols under src, .github/workflows, or test, so the central PR work is not already implemented on main. (src/pr-close-coverage-proof.ts, 666e707a8288)
  • Main apply gate: The PR adds candidate hydration and a model-backed gate before duplicate/superseded PR close proposals can proceed in the main apply path. (src/clawsweeper.ts:10761, 2fdb0abd28d1)
  • Repair apply gate: The PR adds validatePrCloseCoverageProof to repair apply-result so repair close actions for PRs are blocked unless the covering PR is safe and the model proof says the source work is covered. (src/repair/apply-result.ts:786, 2fdb0abd28d1)
  • Workflow install path: The workflow now preselects apply work that may need Codex and runs setup-codex before the apply step when needed. (.github/workflows/sweep.yml:1827, 2fdb0abd28d1)
  • Proof runner token hygiene: The proof runner invokes Codex through codexEnv; current main codexEnv strips GitHub/App/OpenAI credentials and only re-adds an explicit ghToken when requested. (src/codex-env.ts:5, 666e707a8288)

Likely related people:

  • Peter Steinberger: Blame and log history show Peter introduced the stale PR close promotion and much of the workflow/apply structure this PR extends. (role: feature-history owner; confidence: high; commits: 97885cf4e291, 8f40cc4ad420, 6eed37e5bfc8; files: src/clawsweeper.ts, .github/workflows/sweep.yml, src/repair/workflow-utils.ts)
  • brokemac79: Recent commit f2ec021 added the unsafe replacement guard for superseded PR closes, directly adjacent to this PR's duplicate/superseded close safety gate. (role: recent adjacent contributor; confidence: high; commits: f2ec021eb55c; files: src/clawsweeper.ts)
  • Dallin Romney: Commit 50ca51d recently changed repair lane gating and concurrency, which is adjacent to the repair apply-result path touched here. (role: repair-lane contributor; confidence: medium; commits: 50ca51d85c23; files: src/repair/apply-result.ts, .github/workflows/sweep.yml)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a “PR close coverage proof” mechanism that uses a dedicated prompt+schema+runner to decide whether a duplicate/superseded PR can be safely auto-closed, and wires that gate into both the main apply flow and the repair apply flow.

Changes:

  • Add PR close coverage proof prompt + JSON schema, plus a runner/parser + unit tests.
  • Gate duplicate/superseded PR close proposals on coverage proof results (retry on transient failures; keep-open decisions become terminal for that apply attempt).
  • Update workflows and selection utilities to install/run Codex where needed and to re-select retryable proof-failure records.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/repair/workflow-utils.test.ts Extends repair workflow selection tests to include retryable proof-failure close proposals.
test/repair/apply-result.test.ts Adds repair apply integration tests covering proof allow/block/retry behavior and prompt hygiene.
test/pr-close-coverage-proof.test.ts Adds unit tests for proof parsing/normalization and prompt content checks.
test/clawsweeper.test.ts Expands apply-decisions tests and mocks to cover proof gating, retries, and workflow ordering.
src/repair/workflow-utils.ts Allows retryable proof-failure close records to be selected for repair apply.
src/repair/apply-result.ts Adds coverage-proof gating for repair PR close actions, including bounded comment hydration.
src/pr-close-coverage-proof.ts New module implementing prompt building, Codex invocation, output parsing, and decision normalization.
src/clawsweeper.ts Adds coverage-proof gating to apply-decisions for duplicate/superseded PR close proposals.
schema/clawsweeper-pr-close-coverage-proof.schema.json Defines the model output contract for coverage proof.
prompts/pr-close-coverage-proof.md Adds the proof-checker system prompt with strict decision rules.
.github/workflows/sweep.yml Ensures Codex is installed before apply paths that may need proof, with a preselect step.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/clawsweeper.test.ts
Comment on lines +9409 to +9412
const report = JSON.parse(readFileSync(reportPath, "utf8")) as Array<{
action: string;
reason: string;
}>;
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants