Skip to content

fix(src): close KB watcher start/start TOCTOU race#72

Merged
saagpatel merged 1 commit into
masterfrom
codex/fix/kb-watcher-mutex
Apr 22, 2026
Merged

fix(src): close KB watcher start/start TOCTOU race#72
saagpatel merged 1 commit into
masterfrom
codex/fix/kb-watcher-mutex

Conversation

@saagpatel
Copy link
Copy Markdown
Owner

Summary

start_kb_watcher_impl constructs + starts a KbWatcher before acquiring the KB_WATCHER StdMutex. Two concurrent starts can both construct and start their own watcher instances before either stores in the slot — the second assignment overwrites the first, and the first watcher's notify threads keep running on the filesystem as an unreferenced orphan that stop_kb_watcher can't reach.

Fix

Hold the KB_WATCHER guard across the full check/create/start/store sequence. A racing second starter now observes the populated slot and returns Ok(false) without ever constructing a second watcher.

  • KbWatcher::new and watcher.start() are synchronous — holding the StdMutex guard across them is safe under tokio
  • tokio::spawn event-forwarder stays outside the critical section so emitting events never contends with KB commands
  • Added an inline comment explaining the race and why the guard scope is correct

Audit context

The original 5-agent audit described this as "global state without Arc<Mutex<…>>." On reading the file, the mutex was already in place — but the sequencing inside start_kb_watcher_impl left a genuine TOCTOU gap that the audit's framing missed. This PR fixes the real race.

Test plan

  • cargo check --all-targets
  • cargo test --lib — 311/312 pass (one pre-existing #[ignore]'d model-download test)
  • (manual) exercise double-click "Start watcher" in Settings — should be idempotent, second call returns false

🤖 Generated with Claude Code

start_kb_watcher_impl previously performed the lifecycle steps in this
order:

  1. KbWatcher::new(&path)
  2. watcher.start()        (spawns notify threads, starts watching)
  3. KB_WATCHER.lock()
  4. *guard = Some(watcher)

Between 2 and 3 the watcher is already running on the filesystem but
unreferenced by the global slot. Two concurrent start_kb_watcher
invocations — e.g., a settings "Start" click that double-fires, or a
frontend that retries after a timeout while the first call is still in
flight — could each pass their own check-then-act without observing
each other, and both reach step 4 with their own watcher instance. The
second guard assignment would overwrite the first, but the first
watcher's notify threads keep running on the filesystem as an orphan
with no handle for stop_kb_watcher to reach.

Fix: hold the KB_WATCHER StdMutex guard across the full
check/create/start/store sequence. A racing second starter now observes
the populated slot and returns Ok(false) without ever constructing a
second KbWatcher. KbWatcher::new and watcher.start() are synchronous,
so holding the guard across them does not block other async tasks on
the runtime; the tokio::spawn event-forwarder is moved outside the
critical section so receiving events never contends with KB commands.

Added a SAFETY-style explanatory comment inline; no API or behavior
change for callers other than the correct race resolution.

cargo check --all-targets clean; cargo test --lib passes 311/312 (one
pre-existing #[ignore]'d model-download test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@saagpatel saagpatel merged commit e91748e into master Apr 22, 2026
16 of 17 checks passed
@saagpatel saagpatel deleted the codex/fix/kb-watcher-mutex branch May 31, 2026 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants