fix(cache): Ensure per-thread pymemcache clients in ReconnectingMemcache#111545
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    created_at=time.time(),
    client=client,
)
self._backend_var.set(state)
Per-task client creation instead of per-thread reuse
Medium Severity
With ContextPropagatingThreadPoolExecutor, each submit() call creates a fresh context copy via copy_context(). When the worker thread accesses _cache, it detects state.thread_id != current_tid, creates a new HashClient, and stores it in the ephemeral copied context. When the next task runs on the same worker thread, it gets a new context copy (again with the parent's state), detects the thread mismatch again, and creates yet another client. The previous client is never explicitly closed — it relies on GC to clean up socket connections. This yields per-task client creation rather than the intended per-thread reuse, causing significant connection churn under load.
markstory left a comment
Makes sense to me. While the cache adapter is stored in a threadlocal, copying the context makes what used to be thread-isolated values shared, which introduced a thread race. Another layer of threadlocals that checks thread identities will solve that.
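For comparison, a plain `threading.local` layer keeps one client per worker thread regardless of how many context copies pass through it. This is only an illustrative sketch with hypothetical names (`_local`, `get_client`), not what the PR implements, and it gives up the async-safety that a contextvar provides.

```python
import contextvars
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical thread-local holder; values here ignore context copies.
_local = threading.local()
created = []  # every "client" ever constructed


def get_client():
    """Return this thread's client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = object()  # stands in for a pymemcache HashClient
        created.append(client)
        _local.client = client
    return client


get_client()  # the parent thread's client
with ThreadPoolExecutor(max_workers=1) as pool:
    # Each task runs in a fresh context copy, but on the same worker thread.
    seen = [
        pool.submit(contextvars.copy_context().run, get_client).result()
        for _ in range(3)
    ]

print(len(created))  # 2: one for the parent, one reused by the worker
```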
Django's CacheHandler stores connections in a contextvar. When contextvars are copied across threads (e.g. via copy_context() in ContextPropagatingThreadPoolExecutor), multiple threads end up sharing a single pymemcache HashClient instance. HashClient is not thread-safe: _retry_dead() races on a shared dict, causing KeyError crashes when a twemproxy node flaps.

Fix by storing the backend in a contextvar with a thread ID check. When a copied context is accessed from a different thread, a fresh client is created instead of reusing the parent's.

Fixes SENTRY-5H8T

Agent transcript: https://claudescope.sentry.dev/share/8M_nAJwoRwoO5SzOBr1QIGwraQS8Z4M31i0MtYYHnrc
9be43b6 to 16befd9
…tor consumer (#111568)

Re-applies the monitor consumer portion of #111464 (reverted in 7e509b5). The revert was caused by a `pymemcache.HashClient._retry_dead()` race condition: Django's `CacheHandler` stores connections in a contextvar, and `copy_context()` caused all worker threads to share a single non-thread-safe `HashClient`. This is fixed by #111545, which gives each thread its own client.

Depends on #111545.

Agent transcript: https://claudescope.sentry.dev/share/bXndRR4KT3FDez_xnzf1nwcoicOkmF93CYhAl-bcoKY


Django's `CacheHandler` stores connections in a contextvar (`asgiref.local._CVar`). When `contextvars.copy_context()` copies these across threads, as `ContextPropagatingThreadPoolExecutor` does, multiple threads share a single `pymemcache.HashClient` instance. `HashClient` is not thread-safe: `_retry_dead()` races on a shared `_dead_clients` dict, causing `KeyError` crashes when a twemproxy node flaps.

This was surfaced by the `ThreadPoolExecutor` → `ContextPropagatingThreadPoolExecutor` migration in #111464 (reverted in 7e509b5), which caused consumer worker threads to inherit the parent's cache connection instead of creating their own. The same race also affects web workers (SENTRY-5MFX, SENTRY-5E8T) via granian's thread pool.

The fix stores the backend in a contextvar with a thread ID check. When a copied context is accessed from a different thread, a new client is created. This preserves contextvar semantics (async-safe) while restoring per-thread isolation for `HashClient`.

Fixes SENTRY-5H8T
Agent transcript: https://claudescope.sentry.dev/share/DUqlzydq70wt3Np4p2CKKgDFZAM251ooNsKQg4RIXiY
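
The thread ID check described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the merged implementation: `_BackendState`, `PerThreadBackend`, and the `factory` argument are hypothetical names, and `object` stands in for the real `HashClient` constructor. Only the `thread_id`, `created_at`, `client`, and `_backend_var` names are taken from the diff and review comment shown earlier.

```python
import contextvars
import threading
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class _BackendState:
    # Hypothetical container; field names follow the diff excerpt.
    thread_id: int
    created_at: float
    client: Any


class PerThreadBackend:
    """Holds a client in a contextvar, rebuilt when the thread changes."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._backend_var: contextvars.ContextVar[Optional[_BackendState]] = (
            contextvars.ContextVar("backend_state", default=None)
        )

    @property
    def client(self) -> Any:
        state = self._backend_var.get()
        current_tid = threading.get_ident()
        # A copied context carries the parent's state; detect and replace it.
        if state is None or state.thread_id != current_tid:
            state = _BackendState(
                thread_id=current_tid,
                created_at=time.time(),
                client=self._factory(),
            )
            self._backend_var.set(state)
        return state.client


holder = PerThreadBackend(object)  # object() stands in for HashClient(...)
parent_client = holder.client

result = {}


def worker() -> None:
    result["first"] = holder.client
    result["second"] = holder.client  # reused within the same context


ctx = contextvars.copy_context()  # carries the parent's state into the thread
t = threading.Thread(target=ctx.run, args=(worker,))
t.start()
t.join()

print(result["first"] is parent_client)  # False
```

Within a single context the worker reuses its client, and the parent's context still holds the parent's client afterwards, so the two threads never touch the same instance.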