fix: reap orphaned fsmonitor daemons in wt remove internal sweep#2814
Merged
Conversation
Defense-in-depth for git fsmonitor--daemon processes orphaned by paths that bypass wt remove's synchronous daemon stop (plain git worktree remove, manual rm -rf, crashed wt). New src/git/fsmonitor.rs identifies provably-orphaned daemons via the IPC socket + live-worktree set and escalates SIGTERM->bounded wait->SIGKILL. Hooked into the existing repo-wide internal sweep op (run_internal_sweep) alongside the stale trash sweep, after primary output. Unix only; no-op on Windows. Finishing pass: - Fix rustdoc -Dwarnings failures that broke the pre-merge doc step (public docs linked private REAP_KILL_DEADLINE / IPC_SOCKET_NAME; Repository::git_dir was actually WorkingTree::git_dir). - Gate the escalate_terminate test scaffolding under #[cfg(unix)] so the Windows test build compiles under -D warnings: FakeSignaller, its two impls, its three tests, and the RefCell/HashMap imports were ungated while escalate_terminate itself is #[cfg(unix)]. - Strengthen the 'a live worktree's daemon is never reaped' invariant: live_git_dirs now returns Option; an unknowable live set (git worktree list failed) disables class-2 reaping entirely rather than being treated as an empty set that would reap every resolved-socket daemon under the repo. New unit test unknowable_live_set_spares_resolved_socket_daemons. - Drop the structurally-dead catch-all arm in canonicalize_socket; both remaining arms are exercised by the existing symlink/gone test. Co-Authored-By: Claude <noreply@anthropic.com>
worktrunk-bot
approved these changes
May 19, 2026
Collaborator
worktrunk-bot
left a comment
There was a problem hiding this comment.
One question on the pgrep→lsof handoff in enumerate_daemons — see inline. Otherwise the data-safety contract reads carefully and the unit tests cover the classification well.
`pgrep -f "git fsmonitor--daemon"` is a substring match on the full command line, so a non-daemon process whose argv merely contains that string — a debugger `gdb …/git fsmonitor--daemon`, a wrapper — is enumerated as a candidate. Such a process holds no IPC socket, so `parse_lsof_socket_path` returned None and `classify_orphans` treated `socket == None` as class 1 (reap) and SIGTERM/SIGKILL'd the unrelated process during the sweep. Factor the per-PID lsof-stdout → `Option<DaemonProcess>` mapping into `daemon_from_lsof_stdout` and guard it: if the lsof output does not contain `IPC_SOCKET_NAME` at all, return None rather than misclassify. A real orphaned daemon's lsof output always contains the socket name (resolved path or bare), so class-1 behavior is unchanged for real orphans. Three deterministic unit tests cover the pgrep-false-positive skip, the resolved-socket pass-through, and the bare-name class-1 preservation. Co-Authored-By: Claude <noreply@anthropic.com>
worktrunk-bot
approved these changes
May 19, 2026
max-sixty
added a commit
that referenced
this pull request
May 19, 2026
…#2813) ## Why `wt remove` already calls `git fsmonitor--daemon stop` before staging a worktree to trash, but that stop is an IPC request to the daemon itself. When the daemon is *wedged* — the common failure that makes `git status` (and therefore `wt list`) hang — it can't answer the IPC, so the stop is a silent no-op and the daemon leaks. In practice this accumulated ~80 orphaned `git fsmonitor--daemon` processes on one machine, one of which wedged and hung `wt list` outright. ## What Adds a canonical `stop_fsmonitor_daemon` helper: after the IPC stop, it resolves the daemon's PID from its IPC socket and escalates SIGTERM → bounded wait → SIGKILL, scoped strictly to the removed worktree's own socket (never matched by process name, never another worktree's daemon). All three removal paths (library, foreground, detached-background) route through the one helper; the redundant `git ... fsmonitor--daemon stop` shell fragment in the detached path is deleted so there is a single canonical stop. Force-kill is Unix-only with a clean `#[cfg(not(unix))]` no-op (Windows keeps the IPC stop, where the daemon uses a named pipe). This is the source-level fix for the leak on the removal path. Companion PR #2814 adds a defense-in-depth sweep for daemons orphaned by paths that bypass `wt remove` (plain `git worktree remove`, manual `rm`, a crashed `wt`). The two branches share low-level fsmonitor mechanics and both touch `faq.md` / `troubleshooting.md` / `CHANGELOG.md`, so whichever lands second will need a small dedup and doc reconciliation. ## Testing Unit tests cover the deterministic logic: git-dir/socket resolution for a linked worktree, the SIGTERM-honored fast path, and SIGTERM-ignored → SIGKILL escalation (with a readiness handshake to avoid a signal-vs-trap-install race). `wt hook pre-merge --yes` is green. The `lsof`-failure and git-dir-error arms are single best-effort `log::debug!`+return lines that fall through to the existing fail-open behavior. --------- Co-authored-by: Claude <noreply@anthropic.com>
Resolves doc + CHANGELOG conflicts after #2813 landed: - CHANGELOG.md: moves the orphan-sweep entry from `## 0.51.0 / Fixed` (where it was incorrectly placed when this branch first diffed against 6fd2b70) into `## Unreleased / Fixed`, alongside #2813's wedged-daemon-on-removal entry. - docs/content/faq.md: keeps both "Other cleanup" bullets — #2813's synchronous-on-removal force-kill and this PR's background orphan-sweep (distinct mechanisms, both worth documenting). - skills/worktrunk/reference/troubleshooting.md: collapses two near-duplicate `pkill` snippets into one and combines #2813's wt-remove-reaps-even-wedged paragraph with this PR's residual-case-on-a-live-worktree paragraph. - skills/worktrunk/reference/faq.md: regenerated by the doc-sync test from docs/content/faq.md. src/main.rs / src/git/mod.rs / src/commands/process.rs auto-merged cleanly; 18 fsmonitor unit tests pass on the merged tree, including this PR's `daemon_from_lsof_*` tests and #2813's `terminate_pid` tests. Co-Authored-By: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Companion to #2813. That PR stops the fsmonitor-daemon leak at its source (
wt removeforce-terminating a wedged daemon). This PR is defense-in-depth for daemons orphaned by paths that bypasswt removeentirely — plaingit worktree remove, a manualrm -rf, or a crashedwt— which #2813 cannot cover. Background: a wedgedgit fsmonitor--daemonmakesgit status(andwt list) hang; orphans had accumulated ~80-deep on one machine.What
Adds an orphan sweep (
src/git/fsmonitor.rs:classify_orphans,reap_orphan_fsmonitor_daemons) hooked into the existingwt removeinternal sweep op — no new always-on/timer mechanism, and deliberately not wired towt listor any read-only command (killing processes as a side effect of a read-only command is out of bounds). It reaps only daemons whose IPC socket resolves to a worktree that no longer exists, SIGTERM → bounded wait → SIGKILL, socket-scoped vialsof -a -p <pid>(never broad process-name matching). A daemon serving a live worktree is never reaped — proven by unit tests, and the invariant holds even when the live-worktree set is unknowable:live_git_dirsreturnsOption, and an unknowable set disables resolved-socket reaping entirely rather than falling back to "reap everything".Shares low-level fsmonitor mechanics with #2813 and both touch
faq.md/troubleshooting.md/CHANGELOG.md; whichever of the two lands second needs a small dedup and doc reconciliation.Testing
wt hook pre-merge --yesgreen (3757/3757). 11 deterministic unit tests cover the pure logic, includinglive_worktree_daemon_is_never_reapedandunknowable_live_set_spares_resolved_socket_daemons.src/git/fsmonitor.rsis a new file, so it is entirely in patch scope: 94.5% line / 95.4% region. The uncovered lines are real-OS-signal and live daemon-enumeration paths only reachable by spawning realgit fsmonitor--daemonprocesses; their pure logic is exhaustively unit-tested. A real-daemon end-to-end test was prototyped and dropped as flaky and net-negative (daemon-spawn races; risk of SIGKILLing unrelated dev-machine daemons). These uncovered OS-only lines are an accepted trade-off — ifcodecov/patchflags them it is a known false positive, not a gap to fill with a fragile test.