fix(cache): Ensure per-thread pymemcache clients in ReconnectingMemcache (#111545)

Merged: gricha merged 2 commits into master from gricha/fix-memcache-thread-safety on Mar 25, 2026
Conversation

@gricha (Member) commented Mar 25, 2026

Django's CacheHandler stores connections in a contextvar (asgiref.local._CVar). When contextvars.copy_context() copies these across threads — as ContextPropagatingThreadPoolExecutor does — multiple threads share a single pymemcache.HashClient instance. HashClient is not thread-safe: _retry_dead() races on a shared _dead_clients dict, causing KeyError crashes when a twemproxy node flaps.
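The check-then-delete race is easy to reproduce in isolation. The sketch below is not pymemcache's actual `_retry_dead()` code, just a minimal stand-in for the same pattern, with a barrier forcing the unlucky interleaving deterministically:

```python
import threading

# Hypothetical stand-in for HashClient._dead_clients: a plain dict
# mutated by multiple threads without a lock.
dead_clients = {"node1": "client"}
errors = []
barrier = threading.Barrier(2)

def retry_dead():
    # Both threads pass the membership check before either deletes.
    if "node1" in dead_clients:
        barrier.wait()  # force the racy interleaving deterministically
        try:
            del dead_clients["node1"]
        except KeyError as exc:
            errors.append(exc)

threads = [threading.Thread(target=retry_dead) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one thread wins the delete; the other crashes with KeyError.
assert len(errors) == 1
```

In the real client the interleaving is only hit when a node flaps at the wrong moment, which is why the crash appeared intermittently rather than on every request.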

This was surfaced by the ThreadPoolExecutor → ContextPropagatingThreadPoolExecutor migration in #111464 (reverted in 7e509b5), which caused consumer worker threads to inherit the parent's cache connection instead of creating their own. The same race also affects web workers (SENTRY-5MFX, SENTRY-5E8T) via granian's thread pool.

The fix stores the backend in a contextvar with a thread ID check. When a copied context is accessed from a different thread, a new client is created. This preserves contextvar semantics (async-safe) while restoring per-thread isolation for HashClient.
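A sketch of that approach, with illustrative names rather than Sentry's actual code: the client lives in a `ContextVar` next to the thread ID that created it, and a mismatch on access triggers a rebuild.

```python
import contextvars
import threading
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class _BackendState:
    thread_id: int
    client: Any

class PerThreadBackend:
    """Illustrative wrapper: contextvar storage plus a thread ID check."""

    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self._client_factory = client_factory
        self._backend_var: contextvars.ContextVar = contextvars.ContextVar(
            "memcache_backend", default=None
        )

    @property
    def _cache(self) -> Any:
        tid = threading.get_ident()
        state = self._backend_var.get()
        if state is None or state.thread_id != tid:
            # First access in this context, or a copied context accessed
            # from a different thread: build a fresh, thread-private client.
            state = _BackendState(thread_id=tid, client=self._client_factory())
            self._backend_var.set(state)
        return state.client
```

Repeated access within one thread reuses the stored client; a context copied into another thread fails the thread ID check and gets its own.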

Fixes SENTRY-5H8T

Agent transcript: https://claudescope.sentry.dev/share/DUqlzydq70wt3Np4p2CKKgDFZAM251ooNsKQg4RIXiY

@gricha gricha requested a review from a team as a code owner March 25, 2026 18:41
@github-actions github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) on Mar 25, 2026
@gricha gricha requested a review from markstory March 25, 2026 18:43
@cursor cursor bot (Contributor) left a comment
Cursor Bugbot has reviewed your changes and found 1 potential issue.


created_at=time.time(),
client=client,
)
self._backend_var.set(state)

Per-task client creation instead of per-thread reuse

Medium Severity

With ContextPropagatingThreadPoolExecutor, each submit() call creates a fresh context copy via copy_context(). When the worker thread accesses _cache, it detects state.thread_id != current_tid, creates a new HashClient, and stores it in the ephemeral copied context. When the next task runs on the same worker thread, it gets a new context copy (again with the parent's state), detects the thread mismatch again, and creates yet another client. The previous client is never explicitly closed — it relies on GC to clean up socket connections. This yields per-task client creation rather than the intended per-thread reuse, causing significant connection churn under load.

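The churn Bugbot describes can be demonstrated without pymemcache. A minimal sketch, assuming only that the executor wraps each task in `contextvars.copy_context()` the way ContextPropagatingThreadPoolExecutor does: the worker's `var.set()` lands in a per-task context copy, so every task on the same thread rebuilds the state.

```python
import contextvars
import threading
from concurrent.futures import ThreadPoolExecutor

var: contextvars.ContextVar = contextvars.ContextVar("client", default=None)
creations: list = []

def task() -> None:
    if var.get() is None:  # the copied context never carries worker state
        creations.append(threading.get_ident())
        var.set(object())  # stored only in this task's ephemeral context copy

with ThreadPoolExecutor(max_workers=1) as executor:
    for _ in range(3):
        # One fresh copy of the parent context per submit(), as a
        # context-propagating executor would do.
        ctx = contextvars.copy_context()
        executor.submit(ctx.run, task).result()

# Three tasks on one worker thread, yet the "client" was rebuilt each time.
assert len(creations) == 3
assert len(set(creations)) == 1
```

A thread-keyed cache outside the contextvar (for example a `threading.local`, or a dict keyed by thread ID) would restore per-thread reuse while keeping the per-thread isolation this PR introduces.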

@markstory markstory (Member) left a comment
Makes sense to me. While the cache adapter is stored in a thread-local, copying the context made what used to be thread-isolated values shared, which introduced a thread race. Another layer of thread-locality that checks thread identity will solve that.

gricha added 2 commits March 25, 2026 12:39
Django's CacheHandler stores connections in a contextvar. When
contextvars are copied across threads (e.g. via copy_context() in
ContextPropagatingThreadPoolExecutor), multiple threads end up sharing
a single pymemcache HashClient instance. HashClient is not thread-safe
— _retry_dead() races on a shared dict, causing KeyError crashes when
a twemproxy node flaps.

Fix by storing the backend in a contextvar with a thread ID check.
When a copied context is accessed from a different thread, a fresh
client is created instead of reusing the parent's.

Fixes SENTRY-5H8T

Agent transcript: https://claudescope.sentry.dev/share/8M_nAJwoRwoO5SzOBr1QIGwraQS8Z4M31i0MtYYHnrc
@gricha gricha force-pushed the gricha/fix-memcache-thread-safety branch from 9be43b6 to 16befd9 on March 25, 2026 19:39
@gricha gricha merged commit 3df942b into master Mar 25, 2026
63 of 64 checks passed
@gricha gricha deleted the gricha/fix-memcache-thread-safety branch March 25, 2026 20:35
gricha added a commit that referenced this pull request Mar 25, 2026
…PropagatingThreadPoolExecutor

Re-applies the monitor consumer portion of #111464 (reverted in 7e509b5).
The original revert was due to a pymemcache HashClient thread-safety
issue triggered by contextvar-based cache connection sharing, which is
fixed by #111545.
gricha added a commit that referenced this pull request Mar 25, 2026
…tor consumer (#111568)

Re-applies the monitor consumer portion of #111464 (reverted in
7e509b5).

The revert was caused by a `pymemcache.HashClient._retry_dead()` race
condition — Django's `CacheHandler` stores connections in a contextvar,
and `copy_context()` caused all worker threads to share a single
non-thread-safe `HashClient`. This is fixed by #111545, which gives each
thread its own client.

Depends on #111545.


Agent transcript:
https://claudescope.sentry.dev/share/bXndRR4KT3FDez_xnzf1nwcoicOkmF93CYhAl-bcoKY
gricha added a commit that referenced this pull request Mar 26, 2026
…le S016 (#111569)

Re-applies the non-monitor-consumer portion of #111464 (reverted in
7e509b5): occurrence consumer, result consumer, and S016 lint rule
re-enablement.

Depends on #111545 and #111568.


Agent transcript:
https://claudescope.sentry.dev/share/Jgc7K9L3nJgpb6s4imUcdeHJ3iJNAmxOcsHq09hQW4M