Add smart rebasing and conflict resolution to the finishing skill#1149
Open
Add smart rebasing and conflict resolution to the finishing skill#1149
Conversation
Brainstorming now records the base branch and revision in the spec header. Finishing reads it, pre-checks the rebase, rebases when clean, and dispatches a drift reviewer before presenting merge options. When rebase would conflict, conflict context is passed to the reviewer and options 1/4 are suppressed from the routing menu. Tested across 7 end-to-end scenarios over 4 RED-GREEN-REFACTOR iterations (32 total runs). See PR description for full eval results.
5 tasks
Author
|
Just one comment: several long hours of tedious human manual testing were spent to ensure that we rigorously comply with the methodology specified for contributors. Of course AI was involved at all stages, but on a very right leash, with constant monitoring and human corrections. Of course all smart AIs would also say this so the statement might not worth much, but this is not an AI slop PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem are you trying to solve?
The finishing skill has no mechanism to detect that the base branch has changed since the feature branch was created. When
main(or any base branch) moves during a long implementation session — from another user pushing, a teammate merging, a submodule being updated elsewhere, or simply another shell doing work on the same repo — the finishing skill merges blindly. The result can be:git mergetime with no context about what the two sides disagree about or how to resolve it.git merge-base HEAD main, so features branched fromdevelopor other long-term branches are diffed against the wrong branch and merged to the wrong target.All three failure modes were observed in end-to-end testing across 7 scenarios (details in Evaluation section).
What does this PR change?
Three files:
skills/brainstorming/SKILL.md: New step 1 detects the base branch (via default-branch list or interactive question) and records it plus the current HEAD commit in the spec as a**Base revision:**header. When invoked for drift recovery, inherits these values from the caller.skills/finishing-a-development-branch/SKILL.md: Step 2 reads base branch and revision from the spec header instead of guessing. New Step 2.5 pre-checks the rebase (read-only), actually rebases when clean, dispatches a drift reviewer subagent with conflict context when not clean, and presents a conflict-aware routing menu (options 1 and 4 suppressed when rebase would conflict).skills/finishing-a-development-branch/drift-reviewer.md(new file): Reviewer prompt template with a new section for handling rebase-conflict input — forced DRIFT_FOUND verdict,delta_plannot available, conflict analysis required.Is this change appropriate for the core library?
Yes. Base-branch drift affects any user working on a feature branch while the base branch moves — this is not project-specific or domain-specific. The change touches two existing core skills (brainstorming, finishing) and adds one supporting file (drift-reviewer prompt template).
What alternatives did you consider?
.superpowers-session.jsonmetadata file (PR feat: worktree-first isolation with delta analysis for parallel session safety #997's approach): a separate JSON file written by brainstorming, read by finishing. Rejected because it introduced a file lifecycle (create, read, update, cleanup) with its own failure modes (file not found, stale, cleanup missed), and required different handling for worktree vs non-worktree cases. Embedding the same information in the spec document that already exists eliminates the file lifecycle entirely.Inline 3-level numeric escalation (PR feat: worktree-first isolation with delta analysis for parallel session safety #997's approach): the finishing skill evaluated drift inline using Level 0/1/2/3 categories. Rejected because drift evaluation is a judgment task that benefits from the most capable model; inline evaluation on a fast session model produced false negatives (scenario 4 in prior testing was classified Level 0 when it should have been Level 2). Subagent dispatch with the most capable model produces more reliable verdicts.
Drift detection without rebase (the intermediate approach tested during development): detect drift via
git diff merge-base..base-branchand route to fix flows, but never rebase the feature branch. This catches semantic drift but leaves ancestry divergence unresolved. In testing, this caused: (a) wasted reviewer re-dispatches — after a fix flow updates the spec and code, the merge-base remains frozen at the pre-drift commit, so the next finishing invocation sees the same diff and dispatches the reviewer again to re-confirm what was already fixed; (b) infinite symptom-chasing — when drift involves file deletions/additions, each finishing invocation discovers one more ancestry issue, and content-level patching never converges; (c) in one scenario, the reviewer broke its own structured output format to recommend "just rebase the branch" because no available routing option could express the actual fix. Adding the rebase step resolved all three issues.Does this PR contain multiple unrelated changes?
No. All changes implement one capability: detect and handle base-branch drift during the finishing flow. The brainstorming change (recording base revision) exists solely to provide the data that finishing reads. The drift-reviewer change (conflict handling) exists solely to support the conflict path in finishing.
Existing PRs
.superpowers-session.jsonand inline 3-level escalation. Closed by us for clean-room redesign after v5.0.6 introduced contributor guidelines and PR template requirements that the original submission did not follow.finishing-a-development-branchfor environment detection and cleanup ordering (bugs Inconsistent worktree cleanup for Option 2 in finishing-a-development-branch #940, finishing-a-development-branch Option 1: cleanup sequence fails inside worktree #999, fix: Worktree cleanup fails when working directory is inside worktree #238, Subagent-driven development should not auto-create worktrees without user consent #991). Does NOT include drift detection. Our changes add Step 2 and Step 2.5 before the options menu; adjust worktree handling and defer to harness tools when avail (PRI-974) #1121 rewrites Steps 3-6 (options, execution, cleanup). The two PRs modify the same file but address orthogonal concerns. If adjust worktree handling and defer to harness tools when avail (PRI-974) #1121 lands first, this PR will need to be rebased and retested against the new baseline.Environment tested
Evaluation
Development followed RED-GREEN-REFACTOR per
superpowers:writing-skillsacross four iterations, using two test suites (6 scenarios in iterations 1-3, expanded to 7 in iteration 4).Iteration history
Original RED baseline (6 scenarios, current main v5.0.7, no drift detection):
Iteration 1 GREEN — added Step 2.5 with drift-reviewer subagent dispatch:
Iteration 2 GREEN (REFACTOR) — hardened reviewer prompt with "err on caution," "minimum thoroughness" checklist, "documentation drift counts as drift":
Iteration 3 GREEN (REFACTOR) — added structured
RECOMMENDED_ACTIONand 5-option routing menu:delta_plan,spec_update,restart_brainstorming) exercised.Iteration 4 — this PR. Added rebase pre-check, actual rebase, spec-embedded base revision header, conflict-aware routing. Expanded test suite to 7 scenarios (added scenario 7 for rebase conflicts) and added Case 3 coverage (feature branched from
develop, notmain).Iteration 4 RED baseline (7 scenarios, iteration 3 skill with drift detection but no rebase):
git mergetime in Step 4, no structured guidanceIteration 4 GREEN (this PR):
Cumulative improvement across all iterations
Known limitations
mainordevelopwill commit directly there. This is a pre-existing upstream issue (see Subagent-driven development should not auto-create worktrees without user consent #991, fix(skills): restore worktree step in brainstorming workflow #675, fix: restore worktree creation step in brainstorming skill #829, adjust worktree handling and defer to harness tools when avail (PRI-974) #1121) — not introduced or addressed by this PR. Our test harness works around it by creating the feature branch in the launch script.Rigor
superpowers:writing-skillsand completed adversarial pressure testing (paste results below)Methodology: RED-GREEN-REFACTOR per
superpowers:writing-skillsacross four iterations. Iterations 1-3 developed the drift-reviewer prompt and routing menu through three REFACTOR cycles (6 scenarios each, 18 total runs). Iteration 4 added the rebase step and spec header, validated against an expanded 7-scenario suite (7 RED + 7 GREEN runs). Total: 32 end-to-end scenario runs across all iterations.All new Red Flags and Common Mistakes entries are grounded in observed failures, not hypothetical concerns. New "human partner" language was used consistently in all additions to the finishing skill. The drift-reviewer prompt template carries forward all iteration 1-3 additions (minimum thoroughness, err on caution, documentation drift) unchanged, adding only the conflict-handling section.
Adversarial coverage: scenario 7 exercises the conflict path (rebase pre-check fails, reduced menu, spec-driven conflict resolution). Scenario 3 exercises the most severe drift (file deletion + restructuring on a non-default base branch). Scenario 6 validates no regression on the duplication-detection path. Scenario 5 validates no false positive on the control case.
Human review