fix: move postEvict callbacks outside queueLock in FIFO cache #24117
Merged
mergify[bot] merged 4 commits into matrixorigin:3.0-dev on Apr 15, 2026
Pull request overview
This PR addresses lock convoy and severe latency spikes in the FIFO-based in-memory cache eviction path by moving postEvict work (e.g., value.Release() and metrics updates) out of the global queueLock, reducing contention under memory/GC pressure.
Changes:
- Collect evicted entries under queueLock into a pending list during eviction.
- Release queueLock before executing postEvict callbacks.
- Refactor eviction helpers (evict1/evict2/enqueueGhost) to return pending post-evict data instead of invoking callbacks inline.
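The three changes above can be illustrated with a minimal sketch. This is not the actual MatrixOne code: the `cache` and `entry` types, the `evictLocked` helper, and the capacity accounting are hypothetical stand-ins, assumed only to show the collect-then-callback shape the overview describes.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical cache item; the real FIFO cache item type differs.
type entry struct {
	key  string
	size int64
}

// cache is a simplified stand-in for the FIFO cache (assumption, not the real type).
type cache struct {
	queueLock sync.Mutex
	queue     []*entry
	used      int64
	capacity  int64
	postEvict func(*entry) // e.g. release the value and update metrics
}

// evictLocked pops entries until the cache fits its capacity and returns them.
// The caller must hold queueLock. No callbacks run here.
func (c *cache) evictLocked() []*entry {
	var pending []*entry
	for c.used > c.capacity && len(c.queue) > 0 {
		e := c.queue[0]
		c.queue = c.queue[1:]
		c.used -= e.size
		pending = append(pending, e)
	}
	return pending
}

// Set enqueues an entry, evicts under the lock, then runs callbacks unlocked.
func (c *cache) Set(key string, size int64) {
	c.queueLock.Lock()
	c.queue = append(c.queue, &entry{key: key, size: size})
	c.used += size
	pending := c.evictLocked()
	c.queueLock.Unlock()

	// Potentially slow work happens only after queueLock is released.
	for _, e := range pending {
		c.postEvict(e)
	}
}

func main() {
	var evicted []string
	c := &cache{capacity: 2}
	c.postEvict = func(e *entry) { evicted = append(evicted, e.key) }
	for _, k := range []string{"a", "b", "c", "d"} {
		c.Set(k, 1)
	}
	fmt.Println(evicted, c.used)
}
```

With this shape, a slow callback delays only the goroutine that triggered the eviction; other callers can take queueLock as soon as the queue bookkeeping is done.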
When MemCache is full (100% utilization), every Set() triggers Evict(), which holds the global queueLock while executing postEvict callbacks (value.Release + metrics updates). Under GC pressure with STW pauses up to ~1s, this creates a lock convoy where all concurrent cache operations serialize through the single queueLock, inflating per-Set() latency from <0.1ms to 300-444ms.

This change collects evicted items under the queueLock, then executes postEvict callbacks after releasing the lock. The item's valueOK flag is still set to false under the shard lock (to avoid data races with Get()), but the expensive Release() and metrics callbacks run without holding queueLock.

In stability testing, this bottleneck caused query execution times to inflate from 1-5s to 60-107s, exceeding client-side timeouts and producing disconnections in TPCC (40 events) and fulltext (57 events) workloads.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
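The valueOK handling described in this commit message can be sketched roughly as follows. Everything here except the valueOK name is an assumption made for illustration: the `shard` and `item` types and the `invalidate` helper are hypothetical and do not come from the actual fileservice code.

```go
package main

import (
	"fmt"
	"sync"
)

// item is a hypothetical cache entry; valueOK mirrors the flag named above.
type item struct {
	value   []byte
	valueOK bool
}

// shard is a hypothetical map shard guarded by its own lock, separate from queueLock.
type shard struct {
	mu    sync.Mutex
	items map[string]*item
}

// Get returns the value only while valueOK is still true.
func (s *shard) Get(key string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	it, ok := s.items[key]
	if !ok || !it.valueOK {
		return nil, false
	}
	return it.value, true
}

// invalidate clears valueOK under the shard lock and hands back the value
// so the caller can release it later without holding any lock.
// The second result is false if the entry was missing or already invalidated.
func (s *shard) invalidate(key string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	it, ok := s.items[key]
	if !ok || !it.valueOK {
		return nil, false
	}
	it.valueOK = false
	v := it.value
	it.value = nil
	return v, true
}

func main() {
	s := &shard{items: map[string]*item{"k": {value: []byte("data"), valueOK: true}}}
	v, ok := s.invalidate("k") // cheap flag flip under the shard lock
	_, hit := s.Get("k")       // readers now miss
	fmt.Println(string(v), ok, hit)
	// The expensive release of v would happen here, outside all locks.
}
```

The flag flip is cheap, so it stays under the shard lock where Get() can observe it consistently; only the release step is deferred.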
- Wrap queueLock.Unlock + postEvict callbacks + done signal in a defer to ensure panic safety (lock is always released even on unexpected panic).
- Move done <- target after postEvict callbacks complete, preserving the IOVectorCache.Evict contract that done signals eviction is finished.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
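A rough sketch of the ordering this commit describes: unlock first, then callbacks, then the done signal, all inside one deferred function. The `evictor` type, its fields, and the meaning of `target` (here, the number of entries to keep) are assumptions for illustration and do not reflect the real IOVectorCache.Evict signature.

```go
package main

import (
	"fmt"
	"sync"
)

// evictor is a hypothetical stand-in for the FIFO cache.
type evictor struct {
	queueLock sync.Mutex
	queue     []string
	postEvict func(key string)
}

// Evict trims the queue down to target entries. If done is non-nil, it
// receives target only after every postEvict callback has returned.
func (e *evictor) Evict(done chan int64, target int64) {
	var pending []string
	e.queueLock.Lock()
	defer func() {
		// Runs on normal return and on panic, so the lock is always released.
		e.queueLock.Unlock()
		for _, k := range pending {
			e.postEvict(k)
		}
		if done != nil {
			done <- target // signals that eviction, including callbacks, is finished
		}
	}()
	for int64(len(e.queue)) > target {
		pending = append(pending, e.queue[0])
		e.queue = e.queue[1:]
	}
}

func main() {
	var released []string
	e := &evictor{queue: []string{"a", "b", "c"}}
	e.postEvict = func(k string) { released = append(released, k) }
	done := make(chan int64, 1) // buffered so the send cannot block this goroutine
	e.Evict(done, 1)
	fmt.Println(<-done, released)
}
```

Sending on done last means a receiver that wakes up can assume the evicted values have already been released, which is the contract the commit message says it preserves.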
Merge Queue Status
This pull request spent 11 seconds in the queue, including 2 seconds running CI.
What type of PR is this?
Which issue(s) this PR fixes:
issue #24106
What this PR does / why we need it:
Move postEvict callbacks (value.Release + metrics updates) outside the global queueLock in the FIFO cache to eliminate lock convoy under memory pressure.
Root Cause
When MemCache (12GB FIFO cache) is 100% full, every Set() triggers Evict(), which holds the global queueLock while executing postEvict callbacks. Under GC pressure (STW pauses up to ~1s), this creates a lock convoy where all concurrent cache operations serialize through the single lock:
- Set() takes 300-444ms (normal: <0.1ms, 3000-4400x slower)

Evidence from fileservice slow event trace:
Fix
- Collect evicted items under queueLock into a pending list
- Release queueLock
- Execute postEvict callbacks (value.Release + metrics) outside the lock

The item.valueOK flag is still set to false under the shard lock (not queueLock) to prevent data races with concurrent Get() calls.
Impact
In stability testing (commit 725b723):