Fix SearchContext CB memory accounting #138002

luigidellaquila · 2025-11-13T09:06:21Z

Fixing FetchPhase memory management

Problems and TODO (spotted so far):

- SearchContext.checkRealMemoryCB doesn't account for CB memory (it always adds zero); this only triggers the real memory CB, which apparently is not enough
- Sometimes FetchPhase.buildSearchHits batches are too small (e.g. in TopHits), so the memory buffer never accumulates enough to be tracked
- We don't release CB memory; we rely only on the real memory CB
- Plumb TopHitsAggregator memory management lifecycle
- Add tests triggering the CB
- ~~plumb InnerHitsPhase memory management lifecycle~~
- ~~plumb SearchService.execute*Phase() memory management lifecycle~~
- Fall back to try/finally logic for scenarios that don't have a clear CB lifecycle
- TopHitsAggregator.subSearchContext.closeFuture grows too much - this is due to this block, so it's irrelevant in prod.

luigidellaquila · 2025-11-14T08:47:20Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

                }
-                if (context.checkRealMemoryCB(locallyAccumulatedBytes[0], "fetch source")) {
-                    // if we checked the real memory breaker, we restart our local accounting
-                    locallyAccumulatedBytes[0] = 0;


Most of the time these batches were too small, so this didn't trigger.
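To illustrate why small batches slip under the threshold, here is a minimal sketch (a toy model, not the Elasticsearch code). `BUFFER_SIZE` is a hypothetical stand-in for `memAccountingBufferSize()`, and the `LongConsumer` stands in for the breaker call. Each batch starts its local counter from zero, so batches smaller than the buffer never report anything.

```java
import java.util.function.LongConsumer;

/** Toy model of per-batch buffered accounting; names and sizes are assumptions. */
final class BufferedAccountingSketch {
    /** Hypothetical stand-in for memAccountingBufferSize(). */
    static final int BUFFER_SIZE = 1024;

    /**
     * Simulates one fetch batch: source sizes are accumulated locally and
     * reported to the breaker only once the local total reaches BUFFER_SIZE.
     * Returns the bytes that were never reported for this batch.
     */
    static int fetchBatch(int[] sourceSizes, LongConsumer breaker) {
        int locallyAccumulated = 0;
        for (int size : sourceSizes) {
            locallyAccumulated += size;
            if (locallyAccumulated >= BUFFER_SIZE) {
                breaker.accept(locallyAccumulated);
                locallyAccumulated = 0; // restart local accounting after reporting
            }
        }
        return locallyAccumulated;
    }

    public static void main(String[] args) {
        long[] reported = {0};
        long unreported = 0;
        // 1000 tiny batches of three 100-byte documents each.
        for (int i = 0; i < 1000; i++) {
            unreported += fetchBatch(new int[] {100, 100, 100}, b -> reported[0] += b);
        }
        // prints "reported=0 unreported=300000"
        System.out.println("reported=" + reported[0] + " unreported=" + unreported);
    }
}
```

In this toy run, 300000 bytes pass through without the breaker ever being consulted, which matches the behaviour described above for TopHits-style batches.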

server/src/main/java/org/elasticsearch/search/SearchService.java

luigidellaquila · 2025-11-14T08:55:47Z

server/src/main/java/org/elasticsearch/search/aggregations/metrics/TopHitsAggregator.java

            docIdsToLoad[i] = topDocs.scoreDocs[i].doc;
        }
-        FetchSearchResult fetchResult = runFetchPhase(subSearchContext, docIdsToLoad);
+        FetchSearchResult fetchResult = runFetchPhase(subSearchContext, docIdsToLoad, this::addRequestCircuitBreakerBytes);


Should we batch here and avoid invoking the CB for every document?
Maybe addRequestCircuitBreakerBytes should take care of this?

I suspect that fetching source is way more expensive than invoking the CB, so I'm not sure we want more complication here.

Maybe worth checking an esrally benchmark here?

I'll run some aggs nightlies and see what happens

I ran nyc-taxis track, and I see no slowdown (actually it seems faster).
Now I'm running NOAA, that contains more aggs

elasticsearchmachine · 2025-11-14T08:57:29Z

Hi @luigidellaquila, I've created a changelog YAML for you.

elasticsearchmachine · 2025-11-14T08:57:29Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

luigidellaquila · 2025-11-14T09:00:50Z

server/src/main/java/org/elasticsearch/search/internal/SearchContext.java

    public final boolean checkRealMemoryCB(int locallyAccumulatedBytes, String label) {
        if (locallyAccumulatedBytes >= memAccountingBufferSize()) {
-            circuitBreaker().addEstimateBytesAndMaybeBreak(0, label);
+            circuitBreaker().addEstimateBytesAndMaybeBreak(locallyAccumulatedBytes, label);


This was the crux

The checkRealMemoryCB() functions usually add 0 explicitly to force a memory check; are we sure we want to add memory here?
My major questions here would be:

Are we freeing it later?

Should we rename the method then?

This conditionally adds the memory depending on the value, which may lead to wrongly freeing memory later (?)

I just realized that nobody is using this method anymore (apart from a unit test).

> The checkRealMemoryCB() functions usually add 0 explicitly to force a memory check; are we sure we want to add memory here?

Maybe this was the intention at the beginning, but apparently it was not so effective, since in my tests the CB was always empty. And if I remember correctly, it was a ChildMemoryCircuitBreaker

> Are we freeing it later?

With this fix I'm delegating the memory accounting to AggregatorBase, which handles tracking and releasing CB memory, so we should be safe.

> Should we rename the method then?

I think we can just delete it

> This conditionally adds the memory depending on the value, which may lead to wrongly freeing memory later

I guess, since the CB memory management is delegated to AggregatorBase, we should be safe

Let's be conservative here: I'll make the memoryChecker nullable, and when it is null I'll keep trying to trigger the real memory CB
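The difference between adding 0 and adding the accumulated bytes can be sketched with a toy breaker. `ToyBreaker` is a hypothetical stand-in, not Elasticsearch's `CircuitBreaker`; in particular it has no parent or real-memory check, so adding 0 is a pure no-op here, whereas in Elasticsearch it can still trigger the real memory check as discussed above.

```java
/** Toy request breaker; a hypothetical stand-in with no parent or real-memory check. */
final class ToyBreaker {
    private final long limit;
    private long used;

    ToyBreaker(long limit) {
        this.limit = limit;
    }

    /** Adds bytes to the estimate and throws if the new total would exceed the limit. */
    void addEstimateBytesAndMaybeBreak(long bytes, String label) {
        long newUsed = used + bytes;
        if (newUsed > limit) {
            throw new IllegalStateException("[" + label + "] would use " + newUsed + " bytes, limit is " + limit);
        }
        used = newUsed;
    }

    long getUsed() {
        return used;
    }
}

final class ZeroVersusActualBytes {
    /** Returns true if the breaker tripped while "fetching" the given number of 1 KiB batches. */
    static boolean fetch(ToyBreaker breaker, int batches, boolean reportActualBytes) {
        try {
            for (int i = 0; i < batches; i++) {
                breaker.addEstimateBytesAndMaybeBreak(reportActualBytes ? 1024 : 0, "fetch source");
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ToyBreaker zero = new ToyBreaker(10 * 1024);
        ToyBreaker actual = new ToyBreaker(10 * 1024);
        System.out.println("adding 0: tripped=" + fetch(zero, 100, false) + " used=" + zero.getUsed());
        System.out.println("adding bytes: tripped=" + fetch(actual, 100, true) + " used=" + actual.getUsed());
    }
}
```

With a zero delta the request breaker stays empty however much is fetched, which is consistent with the observation that the CB was always empty in tests.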

elasticsearchmachine · 2025-11-17T09:00:58Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

luigidellaquila · 2025-11-20T09:27:07Z

...c/internalClusterTest/java/org/elasticsearch/search/aggregations/metrics/LargeTopHitsIT.java

+    // This works most of the time, but it's not consistent: it still triggers OOM sometimes.
+    // The test env is too small and non-deterministic to hold all these data and results.
+    @AwaitsFix(bugUrl = "see comment above")
+    public void testBreakAndRecover() throws IOException {


I'd really like to have this test working, but I couldn't find a way to make it pass deterministically: depending on the system conditions, it sometimes OOMs before the CB trips, and if I reduce the memory further, the CB no longer trips at all.

We have unit tests for FetchPhase CB, but having an integration test would be really really good.

I don't know if we should keep this here or delete it

FWIW I had a similar problem with the aggs reduction phase memory test, which is now muted: #134667

The test works, but it fails sometimes. I tried tweaking it to have an exact number of nodes of each kind, as well as docs, query limits and CB settings, and it improved, but it's still flaky.

Maybe you could try forcing a set amount of nodes, like in here:

elasticsearch/modules/aggregations/src/internalClusterTest/java/org/elasticsearch/aggregations/bucket/AggregationReductionCircuitBreakingIT.java

Line 44 in c6ddf5d

@ESIntegTestCase.ClusterScope(minNumDataNodes = 1, maxNumDataNodes = 2, numClientNodes = 1)

Less random, but less flaky (luckily 🤞 )

luigidellaquila · 2025-11-21T11:01:21Z

Thanks for the contribution and for the off-line conversation @drempapis
If you have a chance, please have a look and see if it covers your expectations.
Unfortunately I can't rely on the try/finally logic in the Aggs scenarios: the batches are too small, and I need to keep the memory accounted for the whole duration of the aggregation, which can span hundreds (sometimes thousands) of FetchPhase executions.
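A minimal sketch of that lifetime-scoped accounting, under stated assumptions: `Owner` is a hypothetical class loosely modelled on the role the thread attributes to AggregatorBase, and a `long[]` stands in for the request breaker. Every small fetch reports through the same checker, and the bytes stay accounted until the owner is closed.

```java
import java.util.function.IntConsumer;

final class LifetimeAccountingSketch {
    /** Hypothetical owner that tracks everything it reserved and releases it on close. */
    static final class Owner implements AutoCloseable {
        private final long[] breakerUsed; // toy breaker counter shared with the caller
        private long reserved;

        Owner(long[] breakerUsed) {
            this.breakerUsed = breakerUsed;
        }

        void addBytes(int bytes) {
            reserved += bytes;
            breakerUsed[0] += bytes;
        }

        @Override
        public void close() {
            breakerUsed[0] -= reserved; // release exactly what this owner reserved
            reserved = 0;
        }
    }

    /** One small fetch execution that reports every source size through the supplied checker. */
    static void runFetch(int[] sourceSizes, IntConsumer memoryChecker) {
        for (int size : sourceSizes) {
            memoryChecker.accept(size);
        }
    }

    public static void main(String[] args) {
        long[] breakerUsed = {0};
        try (Owner owner = new Owner(breakerUsed)) {
            for (int i = 0; i < 1000; i++) {
                runFetch(new int[] {100, 100, 100}, owner::addBytes);
            }
            System.out.println("during aggregation: " + breakerUsed[0]);
        }
        System.out.println("after close: " + breakerUsed[0]);
    }
}
```

The point of the design is that no single fetch decides when to release; the component that outlives all of them does.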

luigidellaquila · 2025-11-21T11:02:04Z

server/src/main/java/org/elasticsearch/search/internal/SearchContext.java

+     * @return true if the circuit breaker is called and false otherwise
     */
-    public final boolean checkRealMemoryCB(int locallyAccumulatedBytes, String label) {
+    public final boolean checkCircuitBreaker(int locallyAccumulatedBytes, String label) {


This is no longer RealMemory, but rather normal (Request) CB

Just in case, is there something else using this, that could not be freeing the accounted memory?

That's the only usage for now

We should add to the Javadoc that callers are responsible for decrementing the breaker by the same amount at some point.
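The contract being asked for can be sketched as a paired add and release. This is a toy model: `Breaker` below is a hypothetical class exposing only an `addWithoutBreaking`-style method, and the caller remembers how much it added so that it can subtract the same amount in `finally`.

```java
final class PairedAccountingSketch {
    /** Hypothetical breaker that only keeps a running total. */
    static final class Breaker {
        long used;

        void addWithoutBreaking(long bytes) {
            used += bytes;
        }
    }

    /**
     * Accounts every source size on the breaker and releases the same total in
     * finally. Returns the breaker usage observed just before the release.
     */
    static long fetchWithRelease(Breaker breaker, int[] sourceSizes) {
        long accounted = 0;
        try {
            for (int size : sourceSizes) {
                breaker.addWithoutBreaking(size);
                accounted += size;
            }
            return breaker.used; // evaluated before the finally block runs
        } finally {
            if (accounted > 0) {
                breaker.addWithoutBreaking(-accounted);
            }
        }
    }

    public static void main(String[] args) {
        Breaker breaker = new Breaker();
        long peak = fetchWithRelease(breaker, new int[] {10, 20, 30});
        System.out.println("peak=" + peak + " afterwards=" + breaker.used);
    }
}
```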

ivancea

LGTM! I would wait for another review if possible, and for the benchmark, if it makes sense

ivancea · 2025-11-21T12:34:44Z

server/src/main/java/org/elasticsearch/search/internal/SearchContext.java

+     * @return true if the circuit breaker is called and false otherwise
     */
-    public final boolean checkRealMemoryCB(int locallyAccumulatedBytes, String label) {
+    public final boolean checkCircuitBreaker(int locallyAccumulatedBytes, String label) {


Just in case, is there something else using this, that could not be freeing the accounted memory?

ivancea · 2025-11-21T12:38:47Z

server/src/main/java/org/elasticsearch/search/aggregations/metrics/TopHitsAggregator.java

            docIdsToLoad[i] = topDocs.scoreDocs[i].doc;
        }
-        FetchSearchResult fetchResult = runFetchPhase(subSearchContext, docIdsToLoad);
+        FetchSearchResult fetchResult = runFetchPhase(subSearchContext, docIdsToLoad, this::addRequestCircuitBreakerBytes);


Maybe worth checking an esrally benchmark here?

ivancea · 2025-11-21T12:44:40Z

...c/internalClusterTest/java/org/elasticsearch/search/aggregations/metrics/LargeTopHitsIT.java

+    // This works most of the time, but it's not consistent: it still triggers OOM sometimes.
+    // The test env is too small and non-deterministic to hold all these data and results.
+    @AwaitsFix(bugUrl = "see comment above")
+    public void testBreakAndRecover() throws IOException {


FWIW I had a similar problem with the aggs reduction phase memory test, which is now muted: #134667

The test works, but it fails sometimes. I tried tweaking it to have an exact number of nodes of each kind, as well as docs, query limits and CB settings, and it improved, but it's still flaky.

Maybe you could try forcing a set amount of nodes, like in here:

elasticsearch/modules/aggregations/src/internalClusterTest/java/org/elasticsearch/aggregations/bucket/AggregationReductionCircuitBreakingIT.java

Line 44 in c6ddf5d

@ESIntegTestCase.ClusterScope(minNumDataNodes = 1, maxNumDataNodes = 2, numClientNodes = 1)

Less random, but less flaky (luckily 🤞 )

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

….java Co-authored-by: Iván Cea Fontenla <ivancea96@outlook.com>

luigidellaquila · 2025-11-21T13:00:07Z

Buildkite benchmark this with nyc_taxis-1n-8g please

luigidellaquila · 2025-11-24T08:50:58Z

Buildkite benchmark this with wikipedia please

luigidellaquila · 2025-11-24T08:53:48Z

Buildkite benchmark this with noaa-1n-1g please

elasticmachine · 2025-11-24T08:55:29Z

💚 Build Succeeded

Buildkite Build
Commit: 51ea8a6
Baseline: d9e286a (env ID e612a468-5989-41a5-b11b-4cc84c864bfa)
Contender: 51ea8a6 (env ID 44d428e1-8b28-47ab-bbb0-33f0ea642565)
Benchmark results

This build ran two noaa-1n-1g benchmarks to evaluate performance impact of this PR.

History

💚 Build #54 succeeded 51ea8a6
💚 Build #52 succeeded 51ea8a6

drempapis · 2025-11-24T09:37:49Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhaseDocsIterator.java


+    private long requestBreakerBytes;
+
+    public void addRequestBreakerBytes(long delta) {


It’d be good to document that this field is strictly for fetch-phase memory accounting, that it will be reversed at the end of the FetchPhase, and that it shouldn’t be accessed or modified by subclasses (if any are added in the future).

drempapis · 2025-11-24T10:03:29Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

                    BytesReference sourceRef = hit.hit().getSourceRef();
                    if (sourceRef != null) {
-                        locallyAccumulatedBytes[0] += sourceRef.length();
+                        // This is an empirical value that seems to work well.


To simplify the logic, we could create an IntConsumer once when constructing the FetchPhaseDocsIterator

    IntConsumer checker = memoryChecker != null ? memoryChecker : bytes -> {
        locallyAccumulatedBytes[0] += bytes;
        if (context.checkCircuitBreaker(locallyAccumulatedBytes[0], "fetch source")) {
            addRequestBreakerBytes(locallyAccumulatedBytes[0]);
            locallyAccumulatedBytes[0] = 0;
        }
    };

and simply call

if (sourceRef != null) { checker.accept(sourceRef.length() * 2); }

We should double check if that works

Thanks @drempapis, I'll give it a try

I made this change and yes, the code looks much better now 👍

luigidellaquila · 2025-11-24T12:27:52Z

Thanks for the reviews

@ivancea the benchmarks didn't show any significant changes in performance.

@drempapis please let me know if you have any further comments, otherwise I'll move forward and merge.

Thanks!

drempapis

LGTM! Thank you @luigidellaquila

andreidan · 2025-12-05T10:00:01Z

server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

+        } finally {
+            long bytes = docsIterator.getRequestBreakerBytes();
+            if (bytes > 0L) {
+                context.circuitBreaker().addWithoutBreaking(-bytes);


@luigidellaquila @drempapis Sorry for the late comment here. Thanks for iterating on this.

I think we're releasing the bytes from the circuit breaker too soon here? The response has not been sent to the client yet so these hits are still in memory.

@andreidan I agree, we should do better here.
IMHO the right approach would be to manage the CB allocation/release together with the request lifecycle. That's what I did for Aggs, using a memoryChecker that was already part of the aggregation memory accounting, but for Search I couldn't find anything similar. I don't know that part of the codebase in depth though, so I could have missed something obvious.

@andreidan, we missed it!
I’ve started an alternative here: #139124. It’s WIP and expected to take more time.

I’ll work on it in parallel and make it a priority, given the complexity of the alternative.
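The ordering concern raised here can be made concrete with a small sketch. It is purely illustrative: the event names are made up, and the "release after send" branch is a hypothetical alternative, not a description of what #139124 does.

```java
import java.util.ArrayList;
import java.util.List;

final class ReleaseTimingSketch {
    /** Records the order of events for the two release strategies. */
    static List<String> run(boolean releaseAfterSend) {
        List<String> events = new ArrayList<>();
        Runnable release = () -> events.add("release breaker bytes");
        try {
            events.add("build hits");
        } finally {
            // Merged behaviour in this PR: release as soon as the fetch phase finishes.
            if (releaseAfterSend == false) {
                release.run();
            }
        }
        events.add("send response");
        // Hypothetical alternative: release only once the hits are no longer held.
        if (releaseAfterSend) {
            release.run();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

In the first ordering the hits are still in memory between the release and the send, which is the window the review comment points at.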

Fix SearchContext CB memory accounting

19ae0a7

elasticsearchmachine added the v9.3.0 label Nov 13, 2025

luigidellaquila added 4 commits November 13, 2025 12:15

Track TopHitsAggregator memory

15e8e90

Fix test

4b32644

cleanup

0bb77a0

Tests and better accounting

7b40abe

luigidellaquila commented Nov 14, 2025

luigidellaquila commented Nov 14, 2025

luigidellaquila marked this pull request as ready for review November 14, 2025 08:56

luigidellaquila added >bug :Analytics/Aggregations Aggregations labels Nov 14, 2025

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 14, 2025

Update docs/changelog/138002.yaml

9ed0188

luigidellaquila requested a review from nik9000 November 14, 2025 08:58

luigidellaquila commented Nov 14, 2025

luigidellaquila added the :Search Foundations/Search Catch all for Search Foundations label Nov 17, 2025

elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Nov 17, 2025

luigidellaquila added 2 commits November 17, 2025 16:52

Merge branch 'main' into fetchphase_cb

38e68f0

Merge branch 'main' into fetchphase_cb

4103834

luigidellaquila commented Nov 20, 2025

luigidellaquila added 3 commits November 20, 2025 15:12

Preserve real memory CB for cases where we don't have memory lifecycle control

b36faf0

Merge branch 'main' into fetchphase_cb

47c45c3

Fall back to try/finally logic when memoryChecker is not available

ddfbaf0

Thanks @drempapis

luigidellaquila requested a review from drempapis November 21, 2025 10:59

luigidellaquila commented Nov 21, 2025

ivancea approved these changes Nov 21, 2025

Update server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java

51ea8a6

Co-authored-by: Iván Cea Fontenla <ivancea96@outlook.com>

drempapis reviewed Nov 24, 2025

Comment

c15fa9f

drempapis reviewed Nov 24, 2025

refactor

801d6d0

drempapis approved these changes Nov 24, 2025

Better javadoc

221f947

luigidellaquila enabled auto-merge (squash) November 24, 2025 13:05

luigidellaquila added 2 commits November 24, 2025 14:26

Merge branch 'main' into fetchphase_cb

93775d8

Merge branch 'main' into fetchphase_cb

bbf097a

luigidellaquila merged commit d2b4355 into elastic:main Nov 24, 2025
34 checks passed

afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Nov 26, 2025

Fix SearchContext CB memory accounting (elastic#138002)

23bf992

ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Nov 26, 2025

Fix SearchContext CB memory accounting (elastic#138002)

0bc67e2

andreidan reviewed Dec 5, 2025
