Fix SearchContext CB memory accounting #138002

Conversation
    }
    if (context.checkRealMemoryCB(locallyAccumulatedBytes[0], "fetch source")) {
        // if we checked the real memory breaker, we restart our local accounting
        locallyAccumulatedBytes[0] = 0;
Most of the time these batches were too small, so this didn't trigger.
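For context, the pattern under discussion works roughly like this: a minimal, self-contained sketch of the batched accounting loop (the CircuitBreaker interface here is a stand-in mirroring addEstimateBytesAndMaybeBreak from org.elasticsearch.common.breaker.CircuitBreaker; the rest is simplified, not the actual FetchPhaseDocsIterator code):

    import java.util.List;

    // Sketch: accumulate source sizes locally, and only touch the breaker once
    // the local buffer crosses the accounting threshold.
    class FetchAccountingSketch {
        interface CircuitBreaker {
            double addEstimateBytesAndMaybeBreak(long bytes, String label); // breaks if over the limit
        }

        static void accountSources(List<byte[]> sources, CircuitBreaker breaker, long bufferSize) {
            long locallyAccumulatedBytes = 0;
            for (byte[] source : sources) {
                locallyAccumulatedBytes += source.length;
                // With small batches (e.g. TopHits) this threshold may never be
                // crossed, which is exactly the problem described above.
                if (locallyAccumulatedBytes >= bufferSize) {
                    breaker.addEstimateBytesAndMaybeBreak(locallyAccumulatedBytes, "fetch source");
                    locallyAccumulatedBytes = 0; // restart local accounting
                }
            }
        }
    }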
server/src/main/java/org/elasticsearch/search/SearchService.java
    docIdsToLoad[i] = topDocs.scoreDocs[i].doc;
    }
-   FetchSearchResult fetchResult = runFetchPhase(subSearchContext, docIdsToLoad);
+   FetchSearchResult fetchResult = runFetchPhase(subSearchContext, docIdsToLoad, this::addRequestCircuitBreakerBytes);
Should we batch here and avoid invoking the CB for every document?
Maybe addRequestCircuitBreakerBytes should take care of this?
I suspect that fetching source is way more expensive than invoking the CB, so I'm not sure we want more complication here.
Maybe worth checking an esrally benchmark here?
I'll run some aggs nightlies and see what happens
I ran the nyc-taxis track, and I see no slowdown (actually it seems faster).
Now I'm running NOAA, which contains more aggs.
Hi @luigidellaquila, I've created a changelog YAML for you.

Pinging @elastic/es-analytical-engine (Team:Analytics)
    public final boolean checkRealMemoryCB(int locallyAccumulatedBytes, String label) {
        if (locallyAccumulatedBytes >= memAccountingBufferSize()) {
-           circuitBreaker().addEstimateBytesAndMaybeBreak(0, label);
+           circuitBreaker().addEstimateBytesAndMaybeBreak(locallyAccumulatedBytes, label);
This was the crux
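For readers skimming the diff: passing 0 only re-evaluates the breaker (a pure check, which is how the real-memory breaker samples heap usage), while passing the accumulated bytes also reserves them against the breaker's limit. A small self-contained sketch of the difference (the interface mirrors the two CircuitBreaker methods used here):

    // Sketch contrasting a pure breaker check with an actual reservation.
    class CruxSketch {
        interface CircuitBreaker {
            double addEstimateBytesAndMaybeBreak(long bytes, String label);
            long addWithoutBreaking(long bytes);
        }

        static void illustrate(CircuitBreaker breaker, long locallyAccumulatedBytes) {
            // Before: check only. Nothing is reserved, so the breaker's
            // estimate never reflects the fetched sources.
            breaker.addEstimateBytesAndMaybeBreak(0, "fetch source");

            // After: reserve the accumulated bytes. The estimate now grows with
            // the fetched sources, and the caller must release the same amount
            // later, for example:
            breaker.addEstimateBytesAndMaybeBreak(locallyAccumulatedBytes, "fetch source");
            breaker.addWithoutBreaking(-locallyAccumulatedBytes);
        }
    }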
The checkRealMemoryCB() functions usually add 0 explicitly to force a memory check; are we sure we want to add memory here?
My major questions here would be:
- Are we freeing it later?
- Should we rename the method then?
- This conditionally adds the memory depending on the value, which may lead to wrongly freeing memory later (?)
I just realized that nobody is using this method anymore (apart from a unit test).

> The checkRealMemoryCB() functions usually add 0 explicitly to force a memory check; are we sure we want to add memory here?

Maybe this was the intention at the beginning, but apparently it was not very effective, since in my tests the CB was always empty. And if I remember correctly, it was a ChildMemoryCircuitBreaker.

> Are we freeing it later?

With this fix I'm delegating the memory accounting to AggregatorBase, which handles tracking and releasing CB memory, so we should be safe.

> Should we rename the method then?

I think we can just delete it.

> This conditionally adds the memory depending on the value, which may lead to wrongly freeing memory later

I guess, since the CB memory management is delegated to AggregatorBase, we should be safe.
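For reference, the AggregatorBase pattern being delegated to works roughly like this. This is a simplified reconstruction from memory, not the verbatim Elasticsearch implementation: bytes reserved through addRequestCircuitBreakerBytes are tracked per aggregator and released when it closes.

    // Sketch of AggregatorBase-style breaker accounting: reservations are
    // tracked alongside the aggregator and released exactly once on close().
    class AggregatorAccountingSketch implements AutoCloseable {
        interface CircuitBreaker {
            double addEstimateBytesAndMaybeBreak(long bytes, String label);
            long addWithoutBreaking(long bytes);
        }

        private final CircuitBreaker breaker;
        private long requestBytesUsed = 0;

        AggregatorAccountingSketch(CircuitBreaker breaker) {
            this.breaker = breaker;
        }

        long addRequestCircuitBreakerBytes(long bytes) {
            // May throw a breaking exception; only track the bytes on success.
            breaker.addEstimateBytesAndMaybeBreak(bytes, "aggregation");
            requestBytesUsed += bytes;
            return requestBytesUsed;
        }

        @Override
        public void close() {
            // Releasing with the aggregator's lifecycle is what makes the
            // delegation in this PR safe.
            breaker.addWithoutBreaking(-requestBytesUsed);
            requestBytesUsed = 0;
        }
    }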
Let's be conservative here: I'll make the memoryChecker nullable, and in that case I'll keep trying to trigger the real memory CB.
Pinging @elastic/es-search-foundations (Team:Search Foundations)
    // This works most of the time, but it's not consistent: it still triggers OOM sometimes.
    // The test env is too small and non-deterministic to hold all these data and results.
    @AwaitsFix(bugUrl = "see comment above")
    public void testBreakAndRecover() throws IOException {
I'd really like to have this test working, but I couldn't find a way to make it pass deterministically; depending on the system conditions, sometimes it OOMs before the CB trips, and if I reduce the memory further, it doesn't trip the CB anymore.
We have unit tests for the FetchPhase CB, but having an integration test would be really valuable.
I don't know if we should keep this here or delete it.
FWIW I had a similar problem with the aggs reduction phase memory test, which is now muted: #134667
The test works, but it fails sometimes. I tried tweaking it to have an exact number of nodes of each kind, as well as docs, query limits and CB settings, and it improved, but it's still flaky.
Maybe you could try forcing a set number of nodes, like in here (Line 44 in c6ddf5d):

    @ESIntegTestCase.ClusterScope(minNumDataNodes = 1, maxNumDataNodes = 2, numClientNodes = 1)

Less random, but less flaky (luckily 🤞)
Thanks for the contribution and for the offline conversation @drempapis
     * @return true if the circuit breaker is called and false otherwise
     */
-   public final boolean checkRealMemoryCB(int locallyAccumulatedBytes, String label) {
+   public final boolean checkCircuitBreaker(int locallyAccumulatedBytes, String label) {
This is no longer the real-memory CB, but rather the normal (request) CB.
Just in case: is there anything else using this that might not be freeing the accounted memory?
That's the only usage for now
We should add to the Javadoc that callers are responsible for decrementing the breaker by the same amount at some point.
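Something along these lines might do (hypothetical wording, not yet in the PR):

    /**
     * Checks the request circuit breaker once the locally accumulated bytes
     * cross the accounting buffer size, reserving them against the breaker.
     *
     * NOTE: when this method returns true, the bytes have been added to the
     * breaker, and the caller is responsible for decrementing it by the same
     * amount (e.g. circuitBreaker().addWithoutBreaking(-bytes)) once the
     * corresponding hits are released.
     *
     * @return true if the circuit breaker is called and false otherwise
     */
    public final boolean checkCircuitBreaker(int locallyAccumulatedBytes, String label) {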
ivancea
left a comment
LGTM! I would wait for another review if possible, apart from the benchmark, if it makes sense.
server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java
….java Co-authored-by: Iván Cea Fontenla <ivancea96@outlook.com>
Buildkite benchmark this with nyc_taxis-1n-8g please

Buildkite benchmark this with wikipedia please

Buildkite benchmark this with noaa-1n-1g please
💚 Build Succeeded
This build ran two noaa-1n-1g benchmarks to evaluate the performance impact of this PR.
    private long requestBreakerBytes;

    public void addRequestBreakerBytes(long delta) {
It’d be good to document that this field is strictly for fetch-phase memory accounting, will be reversed at the end of the FetchPhase, and shouldn’t be accessed or modified by subclasses (if any are added in the future).
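Something like the following could work (hypothetical wording):

    /**
     * Bytes currently reserved against the request circuit breaker for sources
     * fetched by this iterator. Strictly for fetch-phase memory accounting: it
     * is reversed at the end of the fetch phase and must not be read or
     * modified by subclasses.
     */
    private long requestBreakerBytes;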
    BytesReference sourceRef = hit.hit().getSourceRef();
    if (sourceRef != null) {
-       locallyAccumulatedBytes[0] += sourceRef.length();
+       // This is an empirical value that seems to work well.
To simplify the logic, we could create an IntConsumer once when constructing the FetchPhaseDocsIterator:

    IntConsumer checker = memoryChecker != null
        ? memoryChecker
        : bytes -> {
            locallyAccumulatedBytes[0] += bytes;
            if (context.checkCircuitBreaker(locallyAccumulatedBytes[0], "fetch source")) {
                addRequestBreakerBytes(locallyAccumulatedBytes[0]);
                locallyAccumulatedBytes[0] = 0;
            }
        };

and simply call:

    if (sourceRef != null) {
        checker.accept(sourceRef.length() * 2);
    }

We should double check if that works.
Thanks @drempapis, I'll give it a try
I made this change and yes, the code looks much better now 👍
Thanks for the reviews @ivancea, the benchmarks didn't show any significant changes in performance. @drempapis please let me know if you have any further comments, otherwise I'll move forward and merge. Thanks!
drempapis
left a comment
LGTM! Thank you @luigidellaquila
    } finally {
        long bytes = docsIterator.getRequestBreakerBytes();
        if (bytes > 0L) {
            context.circuitBreaker().addWithoutBreaking(-bytes);
@luigidellaquila @drempapis Sorry for the late comment here. Thanks for iterating on this.
I think we're releasing the bytes from the circuit breaker too soon here? The response has not been sent to the client yet so these hits are still in memory.
@andreidan I agree, we should do better here.
IMHO the right approach would be to manage the CB allocation/release together with the request lifecycle. That's what I did for Aggs, using a memoryChecker that was already part of the aggregation memory accounting, but for Search I couldn't find anything similar. I don't know that part of the codebase in depth though, so I could have missed something obvious.
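One possible direction, as a sketch only: it assumes SearchContext#addReleasable(Releasable) behaves as in the current codebase, and whether the context's lifecycle extends far enough past response serialization is exactly the open question here.

    // Hypothetical alternative to releasing in the finally block: tie the
    // release to the search context's lifecycle instead, so the reservation
    // survives until the context is closed.
    long bytes = docsIterator.getRequestBreakerBytes();
    if (bytes > 0L) {
        // Releasable is a functional interface, so a lambda works here.
        context.addReleasable(() -> context.circuitBreaker().addWithoutBreaking(-bytes));
    }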
@andreidan, we missed it!
I’ve started an alternative here: #139124. It’s WIP and expected to take more time.
I’ll work on it in parallel and make it a priority, given the complexity of the alternative.
Fixing FetchPhase memory management

Problems and TODO (spotted so far):
- SearchContext.checkRealMemoryCB doesn't account for CB memory (always zero) - this just triggers the Real Memory CB, but apparently it's not enough
- FetchPhase.buildSearchHits batches are too small (eg. in TopHits), so the memory buffer never accumulates enough to be tracked
- TopHitsAggregator memory management lifecycle plumbing
- InnerHitsPhase memory management lifecycle plumbing
- SearchService.execute*Phase() memory management lifecycle plumbing
- TopHitsAggregator.subSearchContext.closeFuture grows too much - this is due to this block, so it's irrelevant in prod

Fixes: #136836