fix(prune): serialize integration checks against branch -D (Windows .git/config race)#2808
Merged
Conversation
…git/config race) On Windows, `wt step prune` intermittently failed with `unable to access '.git/config': Permission denied`: parallel `integration_reason` checks read `.git/config` while the inline removal's synchronous `git branch -D` rewrites it via lockfile+rename. Windows holds the file with delete access during the rename and a concurrent reader's `fopen` (no FILE_SHARE_DELETE, no retry) fails; POSIX is unaffected. Add a prune-local `RwLock<()>`: each `integration_reason` call takes a read guard, `try_remove` takes the write guard, so no reader git process is in flight while the synchronous fast-path `branch -D` runs. Parallel checks themselves are not serialized; streaming/interleaved output is unchanged. The cross-fs/.gitmodules/Windows-file-lock fallback defers `branch -D` into a detached command (the worktree still references the branch), which runs outside the guard — documented in-code as a known residual gap, not closed. `test_prune_fallback_config_race_canary` exercises that fallback under parallel fan-out: a deterministic POSIX pass, a Windows canary for whether the residual gap matters in practice. Ref #2801 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
worktrunk-bot
approved these changes
May 19, 2026
max-sixty
added a commit
that referenced
this pull request
May 19, 2026
…Context (#2811) Follow-up to #2808. That PR fixed the Windows `.git/config` race in `wt step prune` by adding a prune-local `RwLock` parameter to `try_remove`, pushing it to 8 parameters and a `#[allow(clippy::too_many_arguments)]` — deliberately deferred as an optional standalone cleanup to keep the race fix minimal. Of `try_remove`'s 8 arguments, only `candidate` varies; the other 7 are identical at all three call sites in `step_prune` (two in the `rx` loop, one for `deferred_current`). This bundles them into a `RemovalContext<'a>` struct of borrows, constructed once and passed by reference, following the existing `CommandContext` / `RemovedWorktreeOutputContext` / `TemplateContext` pattern. Each call site collapses from a 9-line argument block to one line, the loop-invariant relationship is now expressed in the type rather than re-asserted by hand three times, and the `#[allow]` is removed (clippy is clean without it). The B-prime concurrency semantics are intact: the struct carries `&RwLock<()>` like the other borrows, and `try_remove` still acquires the `check_lock` write guard as the first statement of its body, held over the whole body — load-bearing for the Windows fix. Pure refactor, no behavior change. Rebased onto main after #2806 (the approval-boundary TOCTOU fix), which renamed `try_remove`'s `run_hooks: bool` to `hook_plan: &ApprovedHookPlan`; `RemovalContext` carries that field unchanged. Follow-up commit: the struct refactor incidentally pushed `handle_remove_output(...)` past rustfmt's `fn_call_width` (the `ctx.`-prefixed args), splitting it multi-line and moving the trailing `)?;` onto its own line — which LLVM-cov flags as a `codecov/patch` miss despite the line executing (line 205 is covered and 205→210 is straight-line). Reading the `Copy` fields into locals keeps that call byte-identical to its covered form on main, so the diff stays a faithful no-behavior-change refactor. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
On Windows,
wt step pruneintermittently panickedprune should succeedwithunable to access '.git/config': Permission denied.pruneruns parallelintegration_reasonbranch-integration checks (each spawns git subprocesses that read.git/config) on a background thread while the main loop performs inline removals — and the fast-path removal synchronously runsgit branch -D, which rewrites.git/config(it removes the branch's[branch "<name>"]section) via git's lockfile + atomic-rename. On Windows the rename briefly holds.git/configwith delete access; a concurrent reader'sfopen(...,"r")does not passFILE_SHARE_DELETEand is not retried, so it fails. POSIXopen()/rename()semantics never exhibit this — Linux/macOS CI is green.Approach (the "B-prime" option)
A prune-local
Arc<RwLock<()>>: eachintegration_reasoncall is wrapped in a read guard (scoped to drop before the channel send);try_removetakes the write guard over its body. The write guard blocks until every in-flight reader has dropped and blocks new ones until the removal completes, so no integration-check git process is alive while the synchronousgit branch -Drewrites.git/config. The parallel checks themselves are not serialized — only a reader-vs-removal overlap is excluded — so prune's streaming/interleaved output (the property that fixed the historical start-silence) is unchanged. The lock is confined toprune.rs; the read path is a single guarded scope with no changes tointegration_reason,Cmd, orhandlers.rs.Alternatives considered and rejected: full serialization (reintroduces start-silence);
git update-ref -d+ deferred config-section purge (a Ctrl-C between ref delete and purge leaves a silent ghost-upstream that can corrupt prune's own integration decisions — a data-safety regression); a global git mutex (invasive, kills the parallelism). Poisoning is recovered viaunwrap_or_else(|e| e.into_inner())rather than.expect()— the lock guards(), so a poisoned lock is meaningless and panicking would only cascade an unrelated panic across every later removal.Known residual gap (deliberate, documented — not closed here)
B-prime closes the race on the instant-rename fast path, which is the path the observed CI failure took (synchronous
git branch -Dunder the guard). On the cross-filesystem /.gitmodules/ Windows-file-lock fallback,git branch -Dmust be deferred into a detachedgit worktree remove && git branch -D(the worktree still references the branch until it is removed, so an in-processbranch -Dwould fail) — that deferred write runs after the guard drops and is not covered. The fallback is rare for prune targets (integrated worktrees the user is done with; the lock-prone current worktree is deferred last, after the check thread has drained), so the residual exposure is small but non-zero. Thecheck_lockrationale comment documents this precisely rather than overclaiming an unconditional guarantee.test_prune_fallback_config_race_canaryis the empirical probe for whether that residual gap matters: it forces the fallback for one non-current integrated worktree (pre-blocking its staged path) while several other integrated worktrees keep the parallel integration fan-out running, so the deferred config-rewritingbranch -Doverlaps live.git/configreaders. On POSIX it is a deterministic fallback-path smoke test (a concurrent read and rename never collide). On Windows it is a canary: a flake here during normal CI runs is empirical proof the fallback path needs closing (which would require synchronous/foreground fallback removal — a real perf regression, hence deferred to evidence rather than decided on speculation). The canary asserts only the immediate race signal (prune exit + absence of the.git/configerror); removal-completion polling was intentionally dropped after it proved load-sensitive (a 60swait_fortimeout under heavy concurrent load) and carries no canary value.Testing
pre-commit run --all-filesclean; full suite green (3737 passed, 0 skipped); prune suite36 passed, 0 failed. The fix's correctness is established by macOS reasoning + the green macOS suite; the actual Windows-only behavior can only be confirmed bytest (windows)here (the originally-flakingtest_prune_locally_merged_when_upstream_divergedgoing/staying green) and by the canary's flake rate over subsequent Windows runs.Ref #2801