The Caching Concurrency Investigation

Two chapters of the same general story: concurrency in this codebase needed real verification, twice, on the same two caches. The second chapter was found while investigating a consequence of the first chapter's own design — ThreadPoolExecutor plus ContextVar — so they belong together even though they're separate bugs with separate fixes.

Chapter 1: synthetic traffic was leaking into both real caches

Adversarial Self-Testing runs synthetic, generated queries through the genuinely real route_with_source() pipeline — that's the entire point of the feature, proving real routing/fallback/fusion behavior actually works, not a simulation of it. But route_with_source() writes to both the result cache and the routing cache as an unconditional side effect of any successful query, synthetic or real, several calls deep inside the routing logic (_resolve_single_source()'s _set_cached() call, _llm_detect()/_llm_pick_fusion_sources()'s _set_routing() calls) — there's no way for a caller several frames up to opt out after the fact.

This was found via a deliberate cross-check while researching an unrelated design doc: Adversarial Self-Testing's own code claimed it "never touches cache.json, routing_cache.json... or any real user-facing state." That claim was wrong, confirmed directly with an unmocked call — a single synthetic adversarial query really did land in the real, in-memory cache dict, and would have persisted to disk on the next batched save. The existing test guarding this claim only ever mocked route_with_source() out entirely, so it could never have caught this — it was only ever proving "if route_with_source doesn't run, nothing else here touches the cache files either," not the actual claim it was named for.

Fixed with router.suppress_cache_writes(), a context manager Adversarial Self-Testing now wraps its real routing call in. Deliberately built on contextvars.ContextVar, not a plain module-level boolean — a plain flag would have been a genuine, real race condition: BackgroundScheduler runs Adversarial Self-Testing on its own thread pool, fully concurrent with FastAPI's request-handling threads, and a real live request's legitimate cache write landing in the same window a plain global flag was set would have been silently dropped too — a strictly worse bug than the one this fixes. ContextVar is thread-local (and task-local under asyncio) by design, confirmed directly: a real concurrent request on one thread is completely unaffected by suppression active on another.

The sharp edge this design left behind

ThreadPoolExecutor does not propagate ContextVar state into worker threads by default — this is documented, deliberate Python behavior, not a bug in concurrent.futures. This surfaced as a real, live regression once Query Expansion's concurrent-fetch fix shipped: a synthetic Adversarial Self-Testing query running under suppress_cache_writes(), calling into searxng.py's newly-parallelized search(), leaked a real routing-cache write from inside the concurrent alternate-phrasing thread — confirmed directly with a failing test before the fix.

Any code submitting work to an executor while suppression might be active needs to capture contextvars.copy_context() per task (not one shared copy — a single Context object can't be entered by two threads at once) and submit copy.run(fn, ...) rather than fn directly. searxng.py's search() and router.py's own _resolve_conditional() (the conditional-query latency fix — see The Latency Parallelization Investigation) are two real, independently-verified applications of the same pattern.

Chapter 2: the file-write race underneath both caches

While auditing every writer of the routing cache that get_alternate_phrasing() touches — part of researching whether Query Expansion's two sequential SearXNG fetches could safely run concurrently — a second, separate concurrency bug turned up: both caches' own disk-persistence functions (_save_routing_cache(), _save_cache()) used to do a bare open(path, "w") followed by json.dump().

This wasn't a risk the parallelization work would have introduced. FastAPI's /search endpoint is a synchronous route, so Starlette already runs genuinely concurrent real requests on its own thread pool today — two requests landing close enough together to both trigger a save is a real, already-live scenario, not a hypothetical one.

The actual risk was never really the in-memory dict (a single dict mutation is already safe under the GIL) — it was the file itself. If one thread's open(path, "w") truncates the file while another thread's open(path, "w") also truncates it before the first thread's json.dump() finishes, the file can end up malformed. Confirmed directly, not just reasoned about: a deliberate stress test with 8 concurrent writer threads and 8 concurrent reader threads against the old pattern produced 79,609 JSON corruption errors in two seconds. The real-world blast radius was bounded — load_routing_cache()'s own existing except json.JSONDecodeError fallback already catches a corrupted file and starts fresh rather than crashing — but silently losing the entire on-disk cache on next restart is still a real, avoidable cost.

Fixed with the standard pattern for exactly this problem: a shared _atomic_write_json() helper writes to a temporary file in the same directory, then os.replace()s it onto the real target. os.replace() is atomic on POSIX (what this project actually runs on), so the target file is always either the complete old version or the complete new one — never a partial write from either side, no matter how two concurrent calls interleave. The identical 8-writer/8-reader stress test against the fix: zero errors. Both caches' save functions now use this one shared helper.

The lesson: this codebase's caching logic looked safe twice, in two different ways, until concurrent real traffic was actually simulated against it rather than assumed safe by inspection. The in-memory side (a dict mutation) really was safe under the GIL — but that correct conclusion about one layer didn't transfer to the layer underneath it (the file on disk), and a ContextVar-based fix for one race had its own undocumented gap once a third actor (a ThreadPoolExecutor) entered the picture. Each fix held up under a real stress test; none of them would have been caught by reasoning about the code alone.

The Caching Concurrency Investigation

The Caching Concurrency Investigation

Chapter 1: synthetic traffic was leaking into both real caches

The sharp edge this design left behind

Chapter 2: the file-write race underneath both caches

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally