Skip to content

fix(remove): atomic CAS branch deletion unifies safe-delete semantics#2903

Open
max-sixty wants to merge 8 commits into
mainfrom
agent-a3bad31af2bb71c8e-cas
Open

fix(remove): atomic CAS branch deletion unifies safe-delete semantics#2903
max-sixty wants to merge 8 commits into
mainfrom
agent-a3bad31af2bb71c8e-cas

Conversation

@max-sixty
Copy link
Copy Markdown
Owner

@max-sixty max-sixty commented May 25, 2026

Three follow-ups to #2870, combined. The diff includes the three commits
also in #2900 — that PR is the small/conservative path; this one (#2903)
is the same fixes plus the larger CAS-based unification of safe-delete
semantics. Merge one or the other, not both. If you want the
incremental path, merge #2900 first and rebase this one — the CAS
commit will then stand alone.

Item 1 — drop the defensive check_lock (refactor)

The check_lock RwLock<()> in step_prune was carried over from
#2808's Windows .git/config race fix. After #2870 restructured the
flow, the phase ordering already serializes integration-check readers
against removals: the background par_iter is the only reader, the
for ... in rx loop exits only once that thread has finished, and
removals run strictly after. Belt-and-suspenders per CLAUDE.md — removed.

Item 2 — warn when background branch deletion silently retains (fix)

Both branch-deletion paths in execute_instant_removal_or_fallback
swallowed errors at log::debug!. Promoted the surprise
Ok(NotDeleted) (planner predicted deletion, branch survived — a
hook moved the tip) to a warning_message with a
wt remove --force-delete <branch> recovery hint. Suppressed when the
planner already predicted retention so the existing print_hints
unmerged-branch message doesn't duplicate. Err failures now log at
warn level (developer-facing) rather than user-actionable warnings —
the failure modes (git update-ref exec error, refs DB I/O) aren't
actionable beyond re-running.

Item 3 / CAS rewrite — atomic safe-delete (the big one)

delete_branch_if_safe now uses git update-ref -d refs/heads/<branch> <expected-sha> instead of git branch -D. The expected SHA comes from
the snapshot the integration check already consulted, so the
check→delete sequence becomes one atomic step against a known SHA. If
anything moved the ref in between, git rejects the CAS — the new
BranchDeletionOutcome::RetainedRaced outcome is returned and surfaced
to the user. Force-delete keeps git branch -D (explicit override).

This unifies the divergent safe-delete semantics in
execute_instant_removal_or_fallback:

  • Fast path + SynchronousForNonCurrent fallback routed through
    delete_branch_if_safe, so they pick up CAS for free.
  • Detached fallback (rename failed AND current worktree): the shell
    && git branch -d <branch> is replaced by a foreground integration
    check + atomic && git update-ref -d <ref> <expected-sha> tail
    (build_cas_branch_delete_tail). Squash-merged / patch-id / ancestor
    branches the planner accepts are now accepted at delete time too,
    matching the fast path.
  • Branch-only deletion (handle_branch_only_output) switches from
    bare git branch -D (effectively a force-delete after a stale
    integration check) to delete_branch_if_safe. CAS protects it too.

Race coverage

CAS closes the TOCTOU window inside delete_branch_if_safe (between
its own capture_refs and update-ref -d). Hook-driven races
continue to be caught by the existing fresh integration check that
runs after pre-remove hooks; CAS narrows the residual window from
"between check and delete" to "never".

Tests

Two new unit tests in git::remove::tests:

  • cas_rejects_delete_when_branch_advances — captures snapshot, moves
    branch externally, calls delete_branch_if_safe, asserts
    RetainedRaced and that the branch survives at the post-race SHA.
    Concretely pins the CAS rejection mechanism per the user's
    suggestion (rather than just the observable "branch still there"
    outcome).
  • cas_deletes_when_branch_unchanged — sanity check the common case.

test_prune_fallback_config_race_canary's wrapper-git script picks up
the new update-ref -d refs/heads/<branch> shape.

cargo run -- hook pre-merge --yes passes locally: 3836 nextest tests

  • 2 new CAS unit tests + doctests + pre-commit + clippy + docs.

Test plan

  • CAS error-discrimination in cas_delete_branch_outcome
    re-checking ref presence via rev-parse --verify --quiet exit code
    rather than parsing the error message (per CLAUDE.md "structured
    output over error-message parsing").
  • build_cas_branch_delete_tail shell construction — uses
    shell_escape::unix::escape for both the ref name and the SHA.
  • Consider whether RetainedRaced warrants its own message in
    print_hints — currently the BranchOnly path emits a dedicated
    warning; the worktree-removal paths fall through
    handle_branch_deletion_result's unmerged-hint path.

This was written by Claude Code on behalf of Maximilian Roos

max-sixty and others added 2 commits May 25, 2026 14:28
The `check_lock` in `step_prune` was carried over from #2808's
Windows `.git/config` race fix. After #2870 restructured the flow so
hook approval is exact, the phase ordering already serializes
integration-check readers against the rename-failure fallback's
`.git/config` rewrite: the background par_iter is the only reader, and
the channel-drain loop (`for (idx, result) in rx`) only exits once that
thread has finished. Removals run strictly after.

That made the lock belt-and-suspenders — one mechanism per guarantee,
per CLAUDE.md. Drop the `RwLock<()>`, the `Arc` wrapping it, the
read-guard in the par_iter worker, the write-guard at the top of
`try_remove`, and the `check_lock` field on `RemovalContext`. Comments
on the par_iter spawn and the `SynchronousForNonCurrent` variant now
name the real invariants (phase ordering, summary-accurate iteration)
rather than the dropped lock.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e branch

Both branch-deletion paths in `execute_instant_removal_or_fallback`
swallowed errors at `log::debug!` — invisible in default verbosity.
When the worktree is already gone but the branch survives (a hook
advanced it, a `git branch -D` exec failed, refs scan errored), the
user got no signal that the second half of the operation didn't happen.

Promote both to a `warning_message` with a `wt remove --force-delete <branch>`
recovery hint. To avoid duplicating the planner's existing "Branch
unmerged" hint, the warning suppresses the `Ok(NotDeleted)` case when
the planner already predicted retention (unmerged, or
`--no-delete-branch`); race-driven `NotDeleted` outcomes still warn.
`Err` always warns, since the prior `log::debug!` was invisible.

`BackgroundRemoval` carries a new `planner_expected_retention` bit so
the helper can make this distinction without re-deriving planner state.

This adds noise to a previously silent path — the tradeoff is that
silent half-completion of a destructive operation is worse than an
extra warning line on the rare race.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@worktrunk-bot worktrunk-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CAS approach unifies the four SafeDelete paths cleanly: one library helper (delete_branch_if_safe) for fast/synchronous/branch-only, plus a CAS-tail builder for the Detached fallback. The update-ref -d <ref> <oid> primitive picks up IntegrationReason semantics for free across all paths — the description of the prior && git branch -d <branch> shell tail silently losing squash-merged/patch-id matches is accurate. The race-rejection discrimination (try update-ref -d, then rev-parse --verify --quiet to distinguish "ref still present" from "real error") is the right shape per the structured-output rule.

The only gap is the one you raised in test plan item 3, and it shows up the most on handle_named_removed_worktree_foreground. In that path output.branch_result carries Ok(RetainedRaced) straight through RemovalDisplayInfo::from_branch_resulthandle_branch_deletion_result, which sets show_unmerged_hint = true. RemovalDisplayInfo::print_message then prints "Removed feature worktree" with no mention that the branch was kept (since branch_deleted() is false), and print_hints falls into the self.show_unmerged_hint || … arm and emits "Branch unmerged; to delete, run wt remove -D". The branch wasn't unmerged from the planner's view — it was integrated and then moved — so that hint mis-attributes the cause.

The other three paths handle this correctly: the rename-into-trash fast path and SynchronousForNonCurrent go through warn_if_branch_retained (handlers.rs:311), and handle_branch_only_output has its own dedicated branch (handlers.rs:1123). The detached fallback can't surface it (CAS happens in the detached child), so foreground is the one missing case.

Cleanest fix is probably either: (1) split show_unmerged_hint in BranchDeletionDisplay from a show_race_warning flag for RetainedRaced, then have print_hints emit the dedicated "moved during deletion" message; or (2) keep the field but add a RetainedRaced early branch in print_hints that prints the warning before falling through.

Nit: assert_ne!(original_sha, advanced_sha) works because TestRepo::git_output trims (src/testing/mod.rs:1120), but it's worth knowing — if that trim ever goes away, the assertion becomes vacuous since the trailing \n would always make the strings differ.

The `Err` arm of `warn_if_branch_retained` covered failure modes the user
cannot act on differently than re-running the command (`git branch -D` exec
errors, refs DB I/O failures). Promote it from `warning_message` to
`log::warn!` to match the writing-user-outputs guideline for non-user-fixable
issues, and let the `Ok(NotDeleted)` race path stay user-visible since the
recovery is `wt remove --force-delete <branch>`.

Net effect on the prior commit: removes the duplicate "to retry, run ..."
recovery hint for I/O errors (the prior wording suggested the same recovery
either way, conflating two different failure shapes) and cuts an
integration-test-unreachable code path that was the source of the
codecov/patch miss on #2900.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@max-sixty max-sixty force-pushed the agent-a3bad31af2bb71c8e-cas branch from c127ed5 to 96f7f14 Compare May 25, 2026 22:14
@max-sixty max-sixty changed the base branch from agent-a3bad31af2bb71c8e to main May 25, 2026 22:16
max-sixty and others added 3 commits May 25, 2026 15:21
Integration tests cover only one of the four arms — the
NotDeleted+planner-expected-deletion warning surfaced by
test_merge_pre_remove_new_commit_keeps_branch. Add a binary-crate unit
test that calls the helper with each outcome (Ok(NotDeleted) with both
planner predictions, Ok(ForceDeleted), Ok(Integrated), Err) so the
suppression logic and the log::warn Err arm are exercised in coverage.

Output goes to stderr and isn't captured — the assertion is the call
doesn't panic on any arm. User-visible message shapes stay pinned by
the integration tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three follow-ups to #2870 collapse into one rewrite of
`delete_branch_if_safe`: it now performs an atomic compare-and-swap
delete via `git update-ref -d refs/heads/<branch> <expected-sha>`,
where the expected SHA comes from the snapshot the integration check
already consulted. If the ref moves between the integration check and
the delete (a hook, a concurrent push), git rejects the CAS — the
branch is retained (fail-closed) and we surface the new
`BranchDeletionOutcome::RetainedRaced` outcome rather than dropping
the unmerged commits silently. Force-delete still uses `git branch -D`
(the user's explicit override).

This unifies the divergent safe-delete semantics in
`execute_instant_removal_or_fallback`:

  - Fast path and `SynchronousForNonCurrent` fallback already routed
    through `delete_branch_if_safe`, so they pick up CAS for free.
  - Detached fallback used to bake `&& git branch -d <branch>` into
    the shell, which only honored git's reachability-from-HEAD check
    (so squash-merged / patch-id / ancestor branches the planner
    accepted as integrated were rejected at delete time). Now
    `build_cas_branch_delete_tail` runs worktrunk's full integration
    check in the foreground and, if integrated, appends an atomic
    `git update-ref -d <ref> <expected-sha>` to the detached shell —
    same semantics as the other paths, plus CAS safety against tip
    movement between the foreground check and the detached delete.
  - `handle_branch_only_output` also switches its
    safe-delete path from `git branch -D` (an unsafe force-delete after
    a stale integration check) to `delete_branch_if_safe`, so the
    CAS protects branch-only deletions the same way.

The display layer absorbs the new variant: `flag_note` treats
`RetainedRaced` like `NotDeleted` for the success badge,
`handle_branch_deletion_result` shows the unmerged hint, and the
branch-only output path emits a dedicated "moved during deletion"
warning with a `wt remove --force-delete <branch>` recovery hint.

CAS protects the TOCTOU window inside `delete_branch_if_safe` (between
its own snapshot capture and `update-ref -d`). Hook-driven races are
still caught by the existing fresh integration check that runs after
pre-remove hooks; CAS narrows the residual unprotected window from
\"between check and delete\" to \"never\".

Tests: `git update-ref -d <ref> <sha>` is structured — exit 0 on
deletion, non-zero on stale-SHA rejection (\"cannot lock ref ... is at
X but expected Y\"). The CAS helper re-checks ref presence via
`rev-parse --verify --quiet` rather than parsing the error message,
keeping with the structured-output policy. Two new unit tests in
`git::remove::tests`:

  - `cas_rejects_delete_when_branch_advances` — captures a snapshot,
    moves the branch externally, then calls `delete_branch_if_safe`
    and asserts `RetainedRaced` plus surviving tip at the post-race
    SHA. Pins the CAS rejection mechanism.
  - `cas_deletes_when_branch_unchanged` — sanity check the common
    integrated case still deletes.

`test_prune_fallback_config_race_canary`'s wrapper-git script now
matches `update-ref -d refs/heads/<branch>` alongside the
`branch -d|-D <branch>` shapes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The doc rewrite that came with the CAS commit only mentioned the
NotDeleted-race case. Spell out RetainedRaced, NotDeleted (with the
planner-predicted-retention suppression), the silent success arms, and
the log::warn! Err arm so the file reads consistently after the rebase
onto the PR-2900 simplification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@max-sixty max-sixty force-pushed the agent-a3bad31af2bb71c8e-cas branch from 96f7f14 to 992790f Compare May 25, 2026 22:22
Copy link
Copy Markdown
Collaborator

@worktrunk-bot worktrunk-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CAS rewrite addresses the previous review's structural concerns. One small follow-up on the new test (inline).

Comment thread src/output/handlers.rs
worktrunk-bot and others added 2 commits May 25, 2026 22:35
The arm this PR introduces — the only one that warns regardless of
planner_expected_retention — was the one missing from the per-arm test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…8e-cas

# Conflicts:
#	src/commands/step/prune.rs
worktrunk-bot
worktrunk-bot previously approved these changes May 28, 2026
Copy link
Copy Markdown
Collaborator

@worktrunk-bot worktrunk-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codecov/patch is failing on the CAS changes. Triaging the report:

False positives (test exists, llvm-cov mis-attributes the middle lines of eprintln!(warning_message(cformat!(...)))): src/output/handlers.rs L324 and L335 in warn_if_branch_retained. The eprintln! openers at L322 and L333 are hit (L322 hits=2, L333 hits=1 in cobertura.xml), so the new warn_if_branch_retained_arms test added in 80e4b5e is doing its job — the gap is reporting noise inside the macro expansion.

Genuine gaps (hits=0 in cobertura):

  • src/output/handlers.rs L1127/L1129 — the RetainedRaced arm of handle_branch_only_output. Distinct from warn_if_branch_retained; the new unit test doesn't reach this function. Easiest fix: a sibling unit test that calls handle_branch_only_output with a BranchDeletionOutcome::RetainedRaced and asserts it doesn't panic, mirroring the pattern at handlers.rs:2016.
  • src/output/handlers.rs L393 — the None => remove_command branch of build_remove_command_with_tail. The Some(tail) arm is exercised (L391 hits=1); the no-tail arm is trivial to cover with a unit test on the helper.
  • src/git/remove.rs L437-438 — the snapshot_sha returns None fallback. By the inline comment this is intentionally a corner-case preservation of pre-CAS behavior, reachable only via a synthetic snapshot/branch mismatch. Either construct a RefSnapshot that hits it, or — if the path is genuinely unreachable — remove it and let the caller fall through to a single deletion code path per the CLAUDE.md coverage guidance ("remove unused code… where falling through to a general handler suffices").
  • src/git/remove.rs L495 — the real-error propagation in cas_delete_branch_outcome when both update-ref -d and the post-failure rev-parse --verify fail. Hardest to exercise without a fault-injection hook.

I'm dismissing my prior approval per CLAUDE.md's "never merge a PR with failing codecov/patch without explicit user approval" — the two helper-function gaps look mechanical to close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants