Add smart rebasing and conflict resolution to the finishing skill by csillag · Pull Request #1149 · obra/superpowers

csillag · 2026-04-13T12:16:35Z

What problem are you trying to solve?

The finishing skill has no mechanism to detect that the base branch has changed since the feature branch was created. When main (or any base branch) moves during a long implementation session — from another user pushing, a teammate merging, a submodule being updated elsewhere, or simply another shell doing work on the same repo — the finishing skill merges blindly. The result can be:

- Silently stale merges: the spec's assumptions are invalidated (a function was renamed, a file was deleted, an API changed from sync to async), but the merge succeeds textually and tests pass. The merged state contains contradictions that aren't caught until much later.
- Merge conflicts with no structured guidance: when the feature branch and base branch touch overlapping files, the conflict surfaces at `git merge` time with no context about what the two sides disagree about or how to resolve it.
- Wrong base branch: the finishing skill hardcodes `git merge-base HEAD main`, so features branched from develop or other long-term branches are diffed against the wrong branch and merged to the wrong target.

All three failure modes were observed in end-to-end testing across 7 scenarios (details in Evaluation section).

What does this PR change?

Three files:

- `skills/brainstorming/SKILL.md`: New step 1 detects the base branch (via default-branch list or interactive question) and records it, together with the current HEAD commit, in the spec as a `**Base revision:**` header. When invoked for drift recovery, it inherits these values from the caller.
- `skills/finishing-a-development-branch/SKILL.md`: Step 2 reads the base branch and revision from the spec header instead of guessing. New Step 2.5 pre-checks the rebase (read-only), performs the rebase when it is clean, dispatches a drift reviewer subagent with conflict context when it is not, and presents a conflict-aware routing menu (options 1 and 4 are suppressed when the rebase would conflict).
- `skills/finishing-a-development-branch/drift-reviewer.md` (new file): Reviewer prompt template with a new section for handling rebase-conflict input: forced DRIFT_FOUND verdict, delta_plan not available, conflict analysis required.
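To make the spec header concrete, here is a minimal sketch of how the base branch and commit could be read back out of a spec document. The PR states only that a `**Base revision:**` header exists. The `<branch> @ <commit>` layout, the `BaseRevision` type, and the `read_base_revision` function are assumptions made for illustration. The skills themselves are Markdown instructions followed by an agent, so this is not the actual implementation.

```python
import re
from typing import NamedTuple, Optional


class BaseRevision(NamedTuple):
    branch: str
    commit: str


# Assumed (hypothetical) header layout: "**Base revision:** <branch> @ <commit>"
_HEADER = re.compile(
    r"^\*\*Base revision:\*\*\s+(?P<branch>\S+)\s+@\s+(?P<commit>[0-9a-f]{7,40})\s*$",
    re.MULTILINE,
)


def read_base_revision(spec_text: str) -> Optional[BaseRevision]:
    """Return the recorded base branch and commit, or None if the header is absent."""
    match = _HEADER.search(spec_text)
    if match is None:
        return None
    return BaseRevision(match.group("branch"), match.group("commit"))


# Example spec content (invented for this sketch).
spec = """# Widget export spec

**Base revision:** develop @ 3f2a9c1

## Goals
"""

print(read_base_revision(spec))
```

Returning `None` when the header is missing lets the caller fall back to asking the user rather than guessing a branch, which is the failure mode the PR sets out to remove.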

Is this change appropriate for the core library?

Yes. Base-branch drift affects any user working on a feature branch while the base branch moves — this is not project-specific or domain-specific. The change touches two existing core skills (brainstorming, finishing) and adds one supporting file (drift-reviewer prompt template).

What alternatives did you consider?

- `.superpowers-session.json` metadata file (the approach of PR #997, "feat: worktree-first isolation with delta analysis for parallel session safety"): a separate JSON file written by brainstorming, read by finishing. Rejected because it introduced a file lifecycle (create, read, update, cleanup) with its own failure modes (file not found, stale, cleanup missed), and required different handling for worktree vs non-worktree cases. Embedding the same information in the spec document that already exists eliminates the file lifecycle entirely.
- Inline 3-level numeric escalation (also PR #997's approach): the finishing skill evaluated drift inline using Level 0/1/2/3 categories. Rejected because drift evaluation is a judgment task that benefits from the most capable model; inline evaluation on a fast session model produced false negatives (scenario 4 in prior testing was classified Level 0 when it should have been Level 2). Subagent dispatch with the most capable model produces more reliable verdicts.
- Drift detection without rebase (the intermediate approach tested during development): detect drift via `git diff merge-base..base-branch` and route to fix flows, but never rebase the feature branch. This catches semantic drift but leaves ancestry divergence unresolved. In testing, this caused: (a) wasted reviewer re-dispatches: after a fix flow updates the spec and code, the merge-base remains frozen at the pre-drift commit, so the next finishing invocation sees the same diff and dispatches the reviewer again to re-confirm what was already fixed; (b) infinite symptom-chasing: when drift involves file deletions/additions, each finishing invocation discovers one more ancestry issue, and content-level patching never converges; (c) in one scenario, the reviewer broke its own structured output format to recommend "just rebase the branch" because no available routing option could express the actual fix. Adding the rebase step resolved all three issues.
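A read-only pre-check of the kind that replaced this intermediate approach could be approximated as below. The PR does not say which git command its pre-check uses, so this sketch substitutes `git merge-tree --write-tree` (Git 2.38 or newer), which performs the merge in memory and reports conflicts through its exit status without touching the working tree or any ref. It tests a merge of the two tips rather than a commit-by-commit rebase replay, so it is only an approximation. The function names and the injectable `run` parameter are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run a git command in the current repository and return its exit status."""
    return subprocess.run(["git", *args], capture_output=True).returncode


def rebase_would_be_clean(base: str, feature: str = "HEAD", run: Runner = run_git) -> bool:
    """Approximate a rebase pre-check without modifying the repository.

    `git merge-tree --write-tree` exits 0 for a clean merge and 1 when there
    are conflicts; any other status means the command itself failed.
    """
    status = run(["merge-tree", "--write-tree", base, feature])
    if status == 0:
        return True
    if status == 1:
        return False
    raise RuntimeError(f"git merge-tree failed with exit status {status}")


# Demonstration with a stub runner that simulates a conflicting merge,
# so the example does not depend on a real repository.
print(rebase_would_be_clean("main", run=lambda args: 1))  # False
```

Passing the runner in as a parameter keeps the decision logic testable without a repository; in real use the default `run_git` would be left in place.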

Does this PR contain multiple unrelated changes?

No. All changes implement one capability: detect and handle base-branch drift during the finishing flow. The brainstorming change (recording base revision) exists solely to provide the data that finishing reads. The drift-reviewer change (conflict handling) exists solely to support the conflict path in finishing.
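As an illustration of how the recorded revision feeds the finishing flow, the sketch below compares the commit recorded at brainstorming time with the base branch's current tip. It is a simplified assumption about the mechanism, not the skill's actual logic: `resolve_commit`, `base_has_moved`, and the injectable `resolve` parameter are hypothetical names, and the hashes in the demonstration are invented.

```python
import subprocess
from typing import Callable


def resolve_commit(ref: str) -> str:
    """Return the full commit hash that a branch name or abbreviated hash points to."""
    result = subprocess.run(
        ["git", "rev-parse", "--verify", f"{ref}^{{commit}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def base_has_moved(
    base_branch: str,
    recorded_commit: str,
    resolve: Callable[[str], str] = resolve_commit,
) -> bool:
    """True when the base branch tip differs from the commit recorded in the spec."""
    # Resolve both sides so an abbreviated recorded hash compares correctly.
    return resolve(base_branch) != resolve(recorded_commit)


# Demonstration with a lookup table instead of a real repository.
fake_refs = {"develop": "b" * 40, "3f2a9c1": "a" * 40}
print(base_has_moved("develop", "3f2a9c1", resolve=fake_refs.__getitem__))  # True
```

A moved base only signals that a rebase and drift review are worth running; whether the movement actually invalidates the spec is the judgment the PR delegates to the drift reviewer.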

Existing PRs

I have reviewed all open AND closed PRs for duplicates or prior art
Related PRs:
- feat: worktree-first isolation with delta analysis for parallel session safety #997 (closed, ours): earlier attempt at the same problem. Used .superpowers-session.json and inline 3-level escalation. Closed by us for clean-room redesign after v5.0.6 introduced contributor guidelines and PR template requirements that the original submission did not follow.
- fix: worktree-aware merge and correct cleanup sequence in finishing skill #1000 (closed, ours): companion PR to #997 for brainstorming changes. Also closed for clean-room redesign.
- Skill chain assumes frozen codebase — parallel sessions cause spec/plan staleness #989 (closed) — the original issue. Maintainer closed it reading the scope as "multiple agents in the same worktree." We posted a follow-up comment reframing as "base branch moves from any source during a long session" — no response yet.
- adjust worktree handling and defer to harness tools when avail (PRI-974) #1121 (open, arittr): worktree rototill rewriting finishing-a-development-branch for environment detection and cleanup ordering. It covers bugs #940 ("Inconsistent worktree cleanup for Option 2 in finishing-a-development-branch"), #999 ("finishing-a-development-branch Option 1: cleanup sequence fails inside worktree"), #238 ("fix: Worktree cleanup fails when working directory is inside worktree"), and #991 ("Subagent-driven development should not auto-create worktrees without user consent"). Does NOT include drift detection. Our changes modify Step 2 and add Step 2.5 before the options menu; #1121 rewrites Steps 3-6 (options, execution, cleanup). The two PRs modify the same file but address orthogonal concerns. If #1121 lands first, this PR will need to be rebased and retested against the new baseline.

Environment tested

| Harness | Harness version | Model | Model version/ID |
|---|---|---|---|
| Claude Code | 1.0.33 | Claude Sonnet 4.6 (finishing skill) | claude-sonnet-4-6 |
| Claude Code | 1.0.33 | Claude Opus 4.6 (drift reviewer subagent) | claude-opus-4-6 |

Evaluation

Development followed RED-GREEN-REFACTOR per superpowers:writing-skills across four iterations, using two test suites (6 scenarios in iterations 1-3, expanded to 7 in iteration 4).

Iteration history

Original RED baseline (6 scenarios, current main v5.0.7, no drift detection):

0/5 drift scenarios detected. Agent merged blindly in all cases.
1/1 control scenario: merged cleanly (correct).

Iteration 1 GREEN — added Step 2.5 with drift-reviewer subagent dispatch:

4/5 drift scenarios detected. Scenario 4 (rename) missed — reviewer shortcutted with 1 tool call and a one-sentence verdict.
Loophole identified: reviewer can return NO_DRIFT after minimal investigation.

Iteration 2 GREEN (REFACTOR) — hardened reviewer prompt with "err on caution," "minimum thoroughness" checklist, "documentation drift counts as drift":

5/5 drift scenarios detected. Scenario 4 now caught.
Loophole identified: after DRIFT_FOUND, agent presented findings with open-ended "How would you like to proceed?" — no actionable routing.

Iteration 3 GREEN (REFACTOR) — added structured RECOMMENDED_ACTION and 5-option routing menu:

5/5 drift scenarios detected with correct routing. All three recommendation types (delta_plan, spec_update, restart_brainstorming) exercised.
Loophole identified: the finishing skill never rebases the feature branch. After a routed fix flow completes, the merge-base is frozen at the pre-drift commit, causing wasted reviewer re-dispatches (scenarios 1, 2), infinite symptom-chasing (scenario 3 equivalent), and merge-time conflicts on overlapping files.

Iteration 4 — this PR. Added rebase pre-check, actual rebase, spec-embedded base revision header, conflict-aware routing. Expanded test suite to 7 scenarios (added scenario 7 for rebase conflicts) and added Case 3 coverage (feature branched from develop, not main).
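The constraints this iteration places on the reviewer output and the routing menu can be summarised in a short sketch. The verdict names (DRIFT_FOUND, NO_DRIFT), the three recommendation types, and the suppression of options 1 and 4 come from this description. The `Verdict` type, the function names, and the assumption that the menu still has five numbered options in iteration 4 are illustrative only, and the meaning of each option is deliberately left out because it is not stated here.

```python
from dataclasses import dataclass
from typing import List, Optional

VERDICTS = {"DRIFT_FOUND", "NO_DRIFT"}
ACTIONS = {"delta_plan", "spec_update", "restart_brainstorming"}


@dataclass(frozen=True)
class Verdict:
    verdict: str
    recommended_action: Optional[str] = None


def check_verdict(v: Verdict, rebase_conflicts: bool) -> List[str]:
    """Return a list of rule violations; an empty list means the verdict is acceptable."""
    problems = []
    if v.verdict not in VERDICTS:
        problems.append(f"unknown verdict: {v.verdict}")
    if v.verdict == "DRIFT_FOUND" and v.recommended_action not in ACTIONS:
        problems.append("DRIFT_FOUND needs a known recommended action")
    if rebase_conflicts:
        # Conflict input forces DRIFT_FOUND and rules out delta_plan.
        if v.verdict != "DRIFT_FOUND":
            problems.append("conflict input requires a DRIFT_FOUND verdict")
        if v.recommended_action == "delta_plan":
            problems.append("delta_plan is not available on the conflict path")
    return problems


def visible_options(rebase_conflicts: bool) -> List[int]:
    """Menu option numbers to present; 1 and 4 are hidden when the rebase would conflict."""
    all_options = [1, 2, 3, 4, 5]  # assumed five-option menu
    if rebase_conflicts:
        return [n for n in all_options if n not in (1, 4)]
    return all_options


print(visible_options(rebase_conflicts=True))  # [2, 3, 5]
print(check_verdict(Verdict("DRIFT_FOUND", "delta_plan"), rebase_conflicts=True))
```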

Iteration 4 RED baseline (7 scenarios, iteration 3 skill with drift detection but no rebase):

| Scenario | Drift type | Base branch | Outcome |
|---|---|---|---|
| 1 | sync→async migration | main | 2 finishing invocations, wasted opus reviewer on 2nd pass |
| 2 | async→sync revert + no-async comment | main | 3 finishing invocations, reviewer broke structured format to recommend rebase |
| 3 | file deleted + inlined + new file | develop | Suspended after 3rd DRIFT_FOUND (infinite symptom-chasing loop). Also: base branch defaulted to main, required manual correction |
| 4 | function rename | main | 2 finishing invocations, wasted opus reviewer on 2nd pass |
| 5 | README only (control) | develop | Merged to wrong branch (main instead of develop) |
| 6 | feature already implemented | main | Correctly detected and discarded (1 invocation) |
| 7 | async→sync (overlapping files) | main | Merge conflict at `git merge` time in Step 4, no structured guidance |

Iteration 4 GREEN (this PR):

| Scenario | Drift type | Base branch | Outcome |
|---|---|---|---|
| 1 | sync→async migration | main | Drift detected, spec updated, fast-forward merge |
| 2 | async→sync revert + no-async comment | main | Drift detected, brainstorming restarted, fast-forward merge |
| 3 | file deleted + inlined + new file | develop | Drift detected against correct base, brainstorming restarted, fast-forward merge to develop |
| 4 | function rename | main | Drift detected, spec updated, fast-forward merge |
| 5 | README only (control) | develop | NO_DRIFT against correct branch, merged to develop |
| 6 | feature already implemented | main | Duplication detected, branch discarded (no regression) |
| 7 | async→sync (overlapping files) | main | Conflict detected at pre-check, reduced menu, spec-driven resolution, fast-forward merge |

Cumulative improvement across all iterations

| Metric | Original RED | Iter 1 | Iter 2 | Iter 3 | Iter 4 (this PR) |
|---|---|---|---|---|---|
| Drift detected before merge | 0/5 | 4/5 | 5/5 | 5/5 | 6/6 + 1 conflict |
| False positive (control) | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 |
| Correct routing after detection | n/a | n/a | n/a | 5/5 | 7/7 |
| Wasted reviewer re-dispatches | n/a | n/a | n/a | 3/6 | 0/7 |
| Wrong base branch | not tested | not tested | not tested | 2/7 | 0/7 |
| Merge-time conflict (no guidance) | not tested | not tested | not tested | 1/7 | 0/7 |
| Fast-forward merge | 0/6 | 0/6 | 0/6 | 0/6 | 6/6 (scenario 6 discarded) |

Known limitations

- No feature branch creation: brainstorming records the base branch but does not create a feature branch. Work starting on main or develop will commit directly there. This is a pre-existing upstream issue, neither introduced nor addressed by this PR; see #991 ("Subagent-driven development should not auto-create worktrees without user consent"), #675 ("fix(skills): restore worktree step in brainstorming workflow"), #829 ("fix: restore worktree creation step in brainstorming skill"), and #1121 ("adjust worktree handling and defer to harness tools when avail (PRI-974)"). Our test harness works around it by creating the feature branch in the launch script.
- Single harness tested: only Claude Code. The skills are harness-agnostic (plain git commands), but we have not validated on Codex, Cursor, or Gemini CLI.

Rigor

If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing (paste results below)
This change was tested adversarially, not just on the happy path
I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement

Methodology: RED-GREEN-REFACTOR per superpowers:writing-skills across four iterations. Iterations 1-3 developed the drift-reviewer prompt and routing menu through three REFACTOR cycles (6 scenarios each, 18 total runs). Iteration 4 added the rebase step and spec header, validated against an expanded 7-scenario suite (7 RED + 7 GREEN runs). Total: 32 end-to-end scenario runs across all iterations.

All new Red Flags and Common Mistakes entries are grounded in observed failures, not hypothetical concerns. New "human partner" language was used consistently in all additions to the finishing skill. The drift-reviewer prompt template carries forward all iteration 1-3 additions (minimum thoroughness, err on caution, documentation drift) unchanged, adding only the conflict-handling section.

Adversarial coverage: scenario 7 exercises the conflict path (rebase pre-check fails, reduced menu, spec-driven conflict resolution). Scenario 3 exercises the most severe drift (file deletion + restructuring on a non-default base branch). Scenario 6 validates no regression on the duplication-detection path. Scenario 5 validates no false positive on the control case.

Human review

A human has reviewed the COMPLETE proposed diff before submission

Brainstorming now records the base branch and revision in the spec header. Finishing reads it, pre-checks the rebase, rebases when clean, and dispatches a drift reviewer before presenting merge options. When rebase would conflict, conflict context is passed to the reviewer and options 1/4 are suppressed from the routing menu. Tested across 7 end-to-end scenarios over 4 RED-GREEN-REFACTOR iterations (32 total runs). See PR description for full eval results.

csillag · 2026-04-13T12:20:59Z

Just one comment: several long hours of tedious manual human testing were spent to ensure that we rigorously comply with the methodology specified for contributors. Of course AI was involved at all stages, but on a very tight leash, with constant monitoring and human corrections. Of course any smart AI would also say this, so the statement might not be worth much, but this is not an AI slop PR.

csillag mentioned this pull request Apr 13, 2026

feat: worktree-first isolation with delta analysis for parallel session safety #997

Closed

5 tasks

csillag changed the title ~~feat: add rebase-before-merge and drift detection to finishing skill~~ Add smart rebasing and conflict resolution to the finishing skill Apr 13, 2026

csillag wants to merge 1 commit into obra:main from csillag:csillag/parallel-session-drift-v2