Add smart rebasing and conflict resolution to the finishing skill #1149

Open
csillag wants to merge 1 commit into obra:main from csillag:csillag/parallel-session-drift-v2

csillag commented Apr 13, 2026

What problem are you trying to solve?

The finishing skill has no mechanism to detect that the base branch has changed since the feature branch was created. When main (or any base branch) moves during a long implementation session — from another user pushing, a teammate merging, a submodule being updated elsewhere, or simply another shell doing work on the same repo — the finishing skill merges blindly. The result can be:

  1. Silently stale merges: the spec's assumptions are invalidated (a function was renamed, a file was deleted, an API changed from sync to async), but the merge succeeds textually and tests pass. The merged state contains contradictions that aren't caught until much later.
  2. Merge conflicts with no structured guidance: when the feature branch and base branch touch overlapping files, the conflict surfaces at git merge time with no context about what the two sides disagree about or how to resolve it.
  3. Wrong base branch: the finishing skill hardcodes git merge-base HEAD main, so features branched from develop or other long-term branches are diffed against the wrong branch and merged to the wrong target.

All three failure modes were observed in end-to-end testing across 7 scenarios (details in Evaluation section).

What does this PR change?

Three files:

  • skills/brainstorming/SKILL.md: New step 1 detects the base branch (via default-branch list or interactive question) and records it plus the current HEAD commit in the spec as a **Base revision:** header. When invoked for drift recovery, inherits these values from the caller.
  • skills/finishing-a-development-branch/SKILL.md: Step 2 reads base branch and revision from the spec header instead of guessing. New Step 2.5 pre-checks the rebase (read-only), actually rebases when clean, dispatches a drift reviewer subagent with conflict context when not clean, and presents a conflict-aware routing menu (options 1 and 4 suppressed when rebase would conflict).
  • skills/finishing-a-development-branch/drift-reviewer.md (new file): Reviewer prompt template with a new section for handling rebase-conflict input — forced DRIFT_FOUND verdict, delta_plan not available, conflict analysis required.
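As a rough illustration of the spec-header handoff described above, the sketch below parses a `**Base revision:**` line out of a spec document. The header layout (`<branch> @ <commit>`), the `BaseRevision` type, and the `read_base_revision` helper are assumptions made for this example; the PR description names the header but does not spell out its exact format.

```python
import re
from typing import NamedTuple, Optional


class BaseRevision(NamedTuple):
    branch: str
    commit: str


# Assumed header layout (hypothetical): **Base revision:** develop @ 3f2a9c1
_HEADER = re.compile(
    r"^\*\*Base revision:\*\*[ \t]+(?P<branch>\S+)[ \t]+@[ \t]+(?P<commit>[0-9a-f]{7,40})[ \t]*$",
    re.MULTILINE,
)


def read_base_revision(spec_text: str) -> Optional[BaseRevision]:
    """Return the recorded base branch and commit, or None if the header is absent."""
    match = _HEADER.search(spec_text)
    if match is None:
        return None
    return BaseRevision(match.group("branch"), match.group("commit"))


spec = "# Feature spec\n\n**Base revision:** develop @ 3f2a9c1\n\nBody text"
recorded = read_base_revision(spec)
missing = read_base_revision("# Spec written before this change")
```

Returning `None` for a missing header leaves the caller free to fall back to asking the human partner rather than guessing a branch.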

Is this change appropriate for the core library?

Yes. Base-branch drift affects any user working on a feature branch while the base branch moves — this is not project-specific or domain-specific. The change touches two existing core skills (brainstorming, finishing) and adds one supporting file (drift-reviewer prompt template).

What alternatives did you consider?

  1. .superpowers-session.json metadata file (the approach of PR #997, "feat: worktree-first isolation with delta analysis for parallel session safety"): a separate JSON file written by brainstorming, read by finishing. Rejected because it introduced a file lifecycle (create, read, update, cleanup) with its own failure modes (file not found, stale, cleanup missed), and required different handling for worktree vs non-worktree cases. Embedding the same information in the spec document that already exists eliminates the file lifecycle entirely.

  2. Inline 3-level numeric escalation (also from PR #997): the finishing skill evaluated drift inline using Level 0/1/2/3 categories. Rejected because drift evaluation is a judgment task that benefits from the most capable model; inline evaluation on a fast session model produced false negatives (scenario 4 in prior testing was classified Level 0 when it should have been Level 2). Subagent dispatch with the most capable model produces more reliable verdicts.

  3. Drift detection without rebase (the intermediate approach tested during development): detect drift via git diff merge-base..base-branch and route to fix flows, but never rebase the feature branch. This catches semantic drift but leaves ancestry divergence unresolved. In testing, this caused: (a) wasted reviewer re-dispatches — after a fix flow updates the spec and code, the merge-base remains frozen at the pre-drift commit, so the next finishing invocation sees the same diff and dispatches the reviewer again to re-confirm what was already fixed; (b) infinite symptom-chasing — when drift involves file deletions/additions, each finishing invocation discovers one more ancestry issue, and content-level patching never converges; (c) in one scenario, the reviewer broke its own structured output format to recommend "just rebase the branch" because no available routing option could express the actual fix. Adding the rebase step resolved all three issues.
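The frozen merge-base in alternative 3 can be reproduced on a toy commit graph. The sketch below is a simplified model, not git itself: commit ids (`A`, `B`, `C`, `F1`, `F2`, `F1r`, `F2r`) and the helper names are made up for illustration, and `merge_base` only handles histories with a single best common ancestor.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # commit id -> parent commit ids


def ancestors(graph: Graph, tip: str) -> Set[str]:
    """Return tip plus every commit reachable from it through parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def merge_base(graph: Graph, a: str, b: str) -> str:
    """Return the common ancestor that is not an ancestor of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    best = [
        c for c in common
        if not any(c != other and c in ancestors(graph, other) for other in common)
    ]
    if len(best) != 1:
        raise ValueError("this sketch assumes exactly one best common ancestor")
    return best[0]


# Base branch: A <- B <- C (C is the drift commit). Feature forked at B.
graph: Graph = {"A": [], "B": ["A"], "C": ["B"], "F1": ["B"]}
before_fix = merge_base(graph, "F1", "C")

# A fix flow adds a commit on the feature branch; the fork point does not move.
graph["F2"] = ["F1"]
after_fix = merge_base(graph, "F2", "C")

# Rebasing replays the feature commits on top of C, so the fork point becomes C.
graph["F1r"] = ["C"]
graph["F2r"] = ["F1r"]
after_rebase = merge_base(graph, "F2r", "C")
```

In this model the merge-base is `B` both before and after the fix commit, so a diff from merge-base to the base tip keeps showing the same drift; after the rebase the merge-base is the base tip itself and that diff is empty.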

Does this PR contain multiple unrelated changes?

No. All changes implement one capability: detect and handle base-branch drift during the finishing flow. The brainstorming change (recording base revision) exists solely to provide the data that finishing reads. The drift-reviewer change (conflict handling) exists solely to support the conflict path in finishing.

Existing PRs

Environment tested

| Harness | Harness version | Model | Model version/ID |
| --- | --- | --- | --- |
| Claude Code | 1.0.33 | Claude Sonnet 4.6 (finishing skill) | claude-sonnet-4-6 |
| Claude Code | 1.0.33 | Claude Opus 4.6 (drift reviewer subagent) | claude-opus-4-6 |

Evaluation

Development followed RED-GREEN-REFACTOR per superpowers:writing-skills across four iterations, using two test suites (6 scenarios in iterations 1-3, expanded to 7 in iteration 4).

Iteration history

Original RED baseline (6 scenarios, current main v5.0.7, no drift detection):

  • 0/5 drift scenarios detected. Agent merged blindly in all cases.
  • 1/1 control scenario: merged cleanly (correct).

Iteration 1 GREEN — added Step 2.5 with drift-reviewer subagent dispatch:

  • 4/5 drift scenarios detected. Scenario 4 (rename) missed — reviewer shortcutted with 1 tool call and a one-sentence verdict.
  • Loophole identified: reviewer can return NO_DRIFT after minimal investigation.

Iteration 2 GREEN (REFACTOR) — hardened reviewer prompt with "err on caution," "minimum thoroughness" checklist, "documentation drift counts as drift":

  • 5/5 drift scenarios detected. Scenario 4 now caught.
  • Loophole identified: after DRIFT_FOUND, agent presented findings with open-ended "How would you like to proceed?" — no actionable routing.

Iteration 3 GREEN (REFACTOR) — added structured RECOMMENDED_ACTION and 5-option routing menu:

  • 5/5 drift scenarios detected with correct routing. All three recommendation types (delta_plan, spec_update, restart_brainstorming) exercised.
  • Loophole identified: the finishing skill never rebases the feature branch. After a routed fix flow completes, the merge-base is frozen at the pre-drift commit, causing wasted reviewer re-dispatches (scenarios 1, 2), infinite symptom-chasing (scenario 3 equivalent), and merge-time conflicts on overlapping files.

Iteration 4 — this PR. Added rebase pre-check, actual rebase, spec-embedded base revision header, conflict-aware routing. Expanded test suite to 7 scenarios (added scenario 7 for rebase conflicts) and added Case 3 coverage (feature branched from develop, not main).
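A minimal sketch of the conflict-aware constraints, based only on what this description states: options 1 and 4 are suppressed when the rebase would conflict, the reviewer verdict is forced to DRIFT_FOUND, and delta_plan is unavailable. The option labels are placeholders (the description gives only the numbers), and the sketch assumes the menu still has the five options introduced in iteration 3.

```python
from typing import Dict, FrozenSet

# Placeholder labels (hypothetical); only the option numbers come from the PR text.
ROUTING_MENU: Dict[int, str] = {n: f"option {n}" for n in range(1, 6)}
SUPPRESSED_ON_CONFLICT: FrozenSet[int] = frozenset({1, 4})

ALL_VERDICTS: FrozenSet[str] = frozenset({"NO_DRIFT", "DRIFT_FOUND"})
ALL_ACTIONS: FrozenSet[str] = frozenset(
    {"delta_plan", "spec_update", "restart_brainstorming"}
)


def visible_options(rebase_would_conflict: bool) -> Dict[int, str]:
    """Return the menu entries to present, hiding options 1 and 4 on conflict."""
    if not rebase_would_conflict:
        return dict(ROUTING_MENU)
    return {
        number: label
        for number, label in ROUTING_MENU.items()
        if number not in SUPPRESSED_ON_CONFLICT
    }


def allowed_verdicts(rebase_would_conflict: bool) -> FrozenSet[str]:
    """On conflict the reviewer may only return DRIFT_FOUND."""
    return frozenset({"DRIFT_FOUND"}) if rebase_would_conflict else ALL_VERDICTS


def allowed_actions(rebase_would_conflict: bool) -> FrozenSet[str]:
    """On conflict, delta_plan is not an available recommendation."""
    if rebase_would_conflict:
        return ALL_ACTIONS - {"delta_plan"}
    return ALL_ACTIONS


conflict_menu = visible_options(rebase_would_conflict=True)
clean_menu = visible_options(rebase_would_conflict=False)
```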

Iteration 4 RED baseline (7 scenarios, iteration 3 skill with drift detection but no rebase):

| Scenario | Drift type | Base branch | Outcome |
| --- | --- | --- | --- |
| 1 | sync→async migration | main | 2 finishing invocations, wasted opus reviewer on 2nd pass |
| 2 | async→sync revert + no-async comment | main | 3 finishing invocations, reviewer broke structured format to recommend rebase |
| 3 | file deleted + inlined + new file | develop | Suspended after 3rd DRIFT_FOUND: infinite symptom-chasing loop. Also: base branch defaulted to main, required manual correction |
| 4 | function rename | main | 2 finishing invocations, wasted opus reviewer on 2nd pass |
| 5 | README only (control) | develop | Merged to wrong branch (main instead of develop) |
| 6 | feature already implemented | main | Correctly detected and discarded (1 invocation) |
| 7 | async→sync (overlapping files) | main | Merge conflict at git merge time in Step 4, no structured guidance |

Iteration 4 GREEN (this PR):

| Scenario | Drift type | Base branch | Outcome |
| --- | --- | --- | --- |
| 1 | sync→async migration | main | Drift detected, spec updated, fast-forward merge |
| 2 | async→sync revert + no-async comment | main | Drift detected, brainstorming restarted, fast-forward merge |
| 3 | file deleted + inlined + new file | develop | Drift detected against correct base, brainstorming restarted, fast-forward merge to develop |
| 4 | function rename | main | Drift detected, spec updated, fast-forward merge |
| 5 | README only (control) | develop | NO_DRIFT against correct branch, merged to develop |
| 6 | feature already implemented | main | Duplication detected, branch discarded; no regression |
| 7 | async→sync (overlapping files) | main | Conflict detected at pre-check, reduced menu, spec-driven resolution, fast-forward merge |

Cumulative improvement across all iterations

| Metric | Original RED | Iter 1 | Iter 2 | Iter 3 | Iter 4 (this PR) |
| --- | --- | --- | --- | --- | --- |
| Drift detected before merge | 0/5 | 4/5 | 5/5 | 5/5 | 6/6 + 1 conflict |
| False positive (control) | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 |
| Correct routing after detection | n/a | n/a | n/a | 5/5 | 7/7 |
| Wasted reviewer re-dispatches | n/a | n/a | n/a | 3/6 | 0/7 |
| Wrong base branch | not tested | not tested | not tested | 2/7 | 0/7 |
| Merge-time conflict (no guidance) | not tested | not tested | not tested | 1/7 | 0/7 |
| Fast-forward merge | 0/6 | 0/6 | 0/6 | 0/6 | 6/6 (scenario 6 discarded) |

Known limitations

Rigor

  • If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing (paste results below)
  • This change was tested adversarially, not just on the happy path
  • I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement

Methodology: RED-GREEN-REFACTOR per superpowers:writing-skills across four iterations. Iterations 1-3 developed the drift-reviewer prompt and routing menu through three REFACTOR cycles (6 scenarios each, 18 total runs). Iteration 4 added the rebase step and spec header, validated against an expanded 7-scenario suite (7 RED + 7 GREEN runs). Total: 32 end-to-end scenario runs across all iterations.

All new Red Flags and Common Mistakes entries are grounded in observed failures, not hypothetical concerns. New "human partner" language was used consistently in all additions to the finishing skill. The drift-reviewer prompt template carries forward all iteration 1-3 additions (minimum thoroughness, err on caution, documentation drift) unchanged, adding only the conflict-handling section.

Adversarial coverage: scenario 7 exercises the conflict path (rebase pre-check fails, reduced menu, spec-driven conflict resolution). Scenario 3 exercises the most severe drift (file deletion + restructuring on a non-default base branch). Scenario 6 validates no regression on the duplication-detection path. Scenario 5 validates no false positive on the control case.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

Brainstorming now records the base branch and revision in the spec
header. Finishing reads it, pre-checks the rebase, rebases when clean,
and dispatches a drift reviewer before presenting merge options. When
rebase would conflict, conflict context is passed to the reviewer and
options 1/4 are suppressed from the routing menu.

Tested across 7 end-to-end scenarios over 4 RED-GREEN-REFACTOR
iterations (32 total runs). See PR description for full eval results.
csillag changed the title from "feat: add rebase-before-merge and drift detection to finishing skill" to "Add smart rebasing and conflict resolution to the finishing skill" on Apr 13, 2026

csillag commented Apr 13, 2026

Just one comment: several long hours of tedious manual human testing were spent to ensure that we rigorously comply with the methodology specified for contributors. Of course AI was involved at all stages, but on a very tight leash, with constant monitoring and human corrections. Of course all smart AIs would also say this, so the statement might not be worth much, but this is not an AI slop PR.
