Skip to content

Require proof before superseded PR closeout#196

Open
jesse-merhi wants to merge 1 commit into
mainfrom
jesse/pr-supersession-proof
Open

Require proof before superseded PR closeout#196
jesse-merhi wants to merge 1 commit into
mainfrom
jesse/pr-supersession-proof

Conversation

@jesse-merhi
Copy link
Copy Markdown
Member

Summary

  • Require hydrated PR A / PR B proof before normal apply can close PR A as superseded.
  • Add compact shared supersession proof helpers, prompt templates, and schema-backed model output checks.
  • Gate repair replacement closeout on source PR security/context checks and fall back to link-only when proof is missing or blocked.

Validation

  • fnm exec --using v24.14.1 pnpm run format
  • fnm exec --using v24.14.1 pnpm run build:all
  • fnm exec --using v24.14.1 node --test --test-name-pattern 'PRs superseded by linked pull requests|linked supersession proof is incomplete|same-file supersession is incomplete|security-labeled PRs open|comments contain security markers|supersession proof prompt' test/clawsweeper.test.ts
  • fnm exec --using v24.14.1 node --test test/repair/execute-fix-artifact-source.test.ts test/repair/execute-fix-github.test.ts
  • fnm exec --using v24.14.1 pnpm run check
  • git diff --check --cached

Copilot AI review requested due to automatic review settings May 25, 2026 06:10
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 25, 2026

Codex review: needs maintainer review before merge. Reviewed May 25, 2026, 2:14 AM ET / 06:14 UTC.

Summary
The PR adds model-backed supersession proof prompts, schema, shared helpers, and apply/repair gates before ClawSweeper closes source PRs as superseded.

Reproducibility: yes. by source inspection. Current main can promote linked supersession closeout from durable report links without hydrating and comparing PR A and PR B; I did not run a live apply repro.

Review metrics: 2 noteworthy metrics.

  • Changed surface: 10 files, +1612/-35. The PR touches both normal apply close promotion and repair replacement closeout, so maintainers should review the integrated automation behavior.
  • Proof contract added: 2 prompts, 1 schema, 1 shared helper module. These files define the model contract that decides whether a source PR can be closed as superseded.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Resolve the dirty merge state and rerun the declared Node 24 validation commands.

Risk before merge

  • GitHub reports this PR as dirty, so maintainers need to resolve conflicts and review the final merge result before landing.
  • The PR changes live apply/repair closeout automation and adds a model-backed proof subprocess; maintainers should confirm the fail-closed/link-only fallback is the intended operator behavior.

Maintainer options:

  1. Refresh Before Merge (recommended)
    Resolve the dirty merge state, rerun the declared Node 24 checks, and review the final apply/repair closeout behavior on the merged result.
  2. Accept Link-Only Fallback
    Maintainers can accept that missing or blocked proof will keep source PRs open with a replacement link instead of allowing superseded closeout.

Next step before merge
Maintainer review should resolve the dirty branch and decide whether the new model-backed proof gate is acceptable for live closeout automation.

Security
Cleared: No concrete security or supply-chain concern found; the diff adds no dependencies or workflow permission changes and routes security-sensitive source PRs away from auto-close.

Review details

Best possible solution:

Resolve the dirty merge state, rerun the declared Node 24 checks, and land the proof-gated closeout path if maintainers accept the added model proof step.

Do we have a high-confidence way to reproduce the issue?

Yes by source inspection. Current main can promote linked supersession closeout from durable report links without hydrating and comparing PR A and PR B; I did not run a live apply repro.

Is this the best way to solve the issue?

Yes, the proposed fail-closed proof gate is a conservative way to solve the closeout safety issue, but the dirty merge state must be resolved before the final implementation can be trusted.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 77e35f078a76.

Label changes

Label changes:

  • add P2: This is meaningful automation safety hardening with a bounded apply/repair-lane blast radius.
  • add merge-risk: 🚨 automation: Merging changes live closeout automation and adds a model proof subprocess that unit tests cannot fully validate against production GitHub state.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: Real behavior proof is not a contributor gate here because the PR author association is MEMBER.

Label justifications:

  • P2: This is meaningful automation safety hardening with a bounded apply/repair-lane blast radius.
  • merge-risk: 🚨 automation: Merging changes live closeout automation and adds a model proof subprocess that unit tests cannot fully validate against production GitHub state.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: Real behavior proof is not a contributor gate here because the PR author association is MEMBER.
Evidence reviewed

What I checked:

Likely related people:

  • Peter Steinberger: Git blame and git log show the current stale PR close-promotion lane and repair replacement closeout helpers were introduced in recent commits by Peter Steinberger. (role: introduced behavior and recent area contributor; confidence: high; commits: 97885cf4e291, 4ea930af2074; files: src/clawsweeper.ts, src/repair/execute-fix-artifact.ts, src/repair/execute-fix-github.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens the “superseded PR” auto-close flow by requiring explicit, schema-validated “supersession proof” (plus security/context gating) before ClawSweeper can close a source PR as superseded by a linked replacement PR.

Changes:

  • Adds shared supersession proof utilities + schema parsing/normalization to ensure “superseded” decisions have concrete coverage evidence.
  • Updates apply/repair flows to hydrate PR context (labels/comments/files/body) and gate closeout on security signals and model proof; falls back to link-only when blocked or incomplete.
  • Expands tests and mocks to cover proof-required behavior, security gating, and prompt/content constraints.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/supersession-proof.ts New shared helpers for compact PR views + schema-backed parsing/normalization of supersession proof output.
src/clawsweeper.ts Adds supersession-proof runtime/prompt/model execution and blocks “superseded close” promotions unless proof allows it.
src/repair/execute-fix-github.ts Hydrates PR view with labels/comments/files/body and introduces a security gate helper for source PRs.
src/repair/execute-fix-artifact.ts Adds replacement closeout proof runner and enforces security + proof checks before gh pr close; otherwise link-only.
schema/clawsweeper-supersession-proof.schema.json New JSON schema for model output validation.
prompts/supersession-proof.md New prompt template for general supersession proof in apply flow.
prompts/repair/replacement-closeout-proof.md New prompt template for repair replacement closeout proof.
test/clawsweeper.test.ts Adds mocks + tests covering proof-required close behavior and security/incomplete-proof keep-open behavior.
test/repair/execute-fix-github.test.ts Adds unit tests for the source PR security gating helper.
test/repair/execute-fix-artifact-source.test.ts Adds source-level regression tests ensuring security/proof checks precede closeout calls and prompt changes are in place.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 34 to +54
export function fetchSourcePullRequestView({
repo,
number,
targetDir,
}: {
repo: string;
number: JsonValue;
targetDir: string;
}): LooseRecord {
return JSON.parse(
run(
"gh",
["pr", "view", String(number), "--repo", repo, "--json", "author,state,mergedAt,title,url"],
[
"pr",
"view",
String(number),
"--repo",
repo,
"--json",
"author,state,mergedAt,title,url,body,labels,comments,files,headRefOid,updatedAt",
],
Comment on lines +1840 to +1841
} catch {
return { close: false, reason: "replacement closeout proof output was invalid JSON" };
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 25, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 25, 2026

ClawSweeper PR egg

✨ Hatched: 🥚 common Tiny Diff Drake

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: sparkles near resolved comments.
Image traits: location artifact grotto; accessory CI status badge; palette pearl, teal, and neon green; mood proud; pose leaning over a miniature review desk; shell soft speckled shell; lighting warm desk-lamp glow; background delicate sparkle particles.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Tiny Diff Drake in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@nxmxbbd
Copy link
Copy Markdown

nxmxbbd commented May 25, 2026

Adding evidence that the supersession-proof gate in this PR would catch a real
case beyond the closed-unmerged replacement scenario.

openclaw/openclaw#86590 was auto-closed on 2026-05-25 as
duplicate_or_superseded by the merged #85063, but current main still lacks
the follow-up surfaces #86590 was adding:

  • SessionMcpRuntime.getOmittedServers() and the OmittedMcpServerReason enum
  • connect-timeout and list-tools-failed omitted-server reasons on
    materialized tools
  • the active-lease rebind guard message
    bundle-mcp runtime busy; cannot rebind workspace/config while active
  • compact's managed-runtime reuse via getOrCreateSessionMcpRuntime(...) in
    src/agents/pi-embedded-runner/compact.ts
  • the compact runtime-context omit for empty sandboxSessionKey

The ClawSweeper review body on #86590 explicitly said "Keep open for maintainer
review: ... sufficient after-fix proof and no blocking code-review findings"
and emitted <!-- clawsweeper-verdict:needs-human ... confidence=high -->,
but the close-applied marker still fired about eight minutes later. Same
family as openclaw/openclaw#86006 (verdict-vs-apply contract gap) and
#197 (supersede-by-closed-unmerged), except here PR B
(#85063) is merged and PR A (#86590) has unique uncovered work.

If the supersession-proof gate in this PR is fed PR A's file paths and PR B's
merged squash (07f500aa562d — bounded tools/list discovery only),
uniqueSourceWork should be non-empty and the decision should land as
keep_open, so this PR is likely to prevent the #86590-style close in
practice.

One question: is the marker-contract layer (apply lane gated independently on
clawsweeper-verdict:needs-human) also in scope here, or planned separately?
The model proof gate looks robust; a static marker check would be a cheap
second layer against future review-vs-apply drift.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants