CollectionUtils#ensureNoSelfReferences creates too much allocation. #49277

wenhoujx · 2019-11-18T23:44:26Z

Describe the feature:

Elasticsearch version (bin/elasticsearch --version):

6.8.3

Plugins installed: []

JVM version (java -version):
java 11

OS version (uname -a if on a Unix-like system):
Linux il-pg-alpha-4421120.use1.palantir.global 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
while using scripted metrics aggregation, i noticed this method is creating lots of allocations. My script only has one parameter, but this method allocated a default sized IdentityHashMap(), which creates a new Object[64] regardless, this multiples quicks with the number of parent buckets.

I wonder if we can call new IdentityHashMap(4) or come up with a better way to check self-reference.

Steps to reproduce:
I don't think a repro is needed, it's right there in the code.

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.

Provide logs (if relevant):

The text was updated successfully, but these errors were encountered:

elasticmachine · 2019-11-19T11:45:58Z

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

cbuescher · 2019-11-19T11:52:09Z

My script only has one parameter, but this method allocated a default sized IdentityHashMap(), which creates a new Object[64] regardless, this multiples quicks with the number of parent buckets.

Can you describe if this has a measureable impact for you in a real-world scenario? Coming up with good defaults here depends both on the size and the depth of the iterable here, and a value thats too small will lead to higher cost for resizing later?

wenhoujx · 2019-11-19T16:01:37Z

here is a JFR of a running ES cluster with heavy scripted metrics aggregation requests. You can see this method creates about 10g heap here in 30seconds and causing lot of GC pressure.

I don't think the iteration depth is the problem here, the problem is the sheer number of invocation that creates too many shallow instances of IdentityHashMap. The inefficiency also comes from that each leaf script-metric-agg validates this self-reference separately for the same scripts params. I imagine most real world cases would benefit from a conservative default, similar to how millions of users benefited when java change new ArrayList() to use a static empty array internally.

rjernst · 2019-11-19T23:27:35Z

the inefficiency also comes from that each leaf script-metric-agg validates this self-reference separately for the same scripts params

The self reference check is on the objects produced by each script, not the params to the scripts. In the scripted metric agg case, in the past the returned state was conflated into the params map, but was changed to the state variable last year. We no longer check params for self reference as of 7.0.

wenhoujx · 2019-11-20T04:21:29Z

@rjernst i named the wrong variable, i meant state, as far as i can tell, on the develop branch, we're still calling this method unnecessarily for every leaf scripted metric bucket: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/metrics/ScriptedMetricAggregatorFactory.java#L99
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/metrics/ScriptedMetricAggregator.java#L92
and https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/metrics/ScriptedMetricAggregator.java#L107

i understand this method needs to be called to validate against self referencing collections/objects, but does it needs to be revalidated this many time at each leaf bucket?

thanks

wenhoujx · 2019-11-22T15:58:09Z

@rjernst any update on this?

rjernst · 2019-11-27T01:30:26Z

@wenhoujx state is essentially the return of the various metric agg scripts, and thus we need to validate it whenever we return from executing one of those scripts, as they may have modified it, and we don't want to try serializing or passing that state anywhere else that may try to iterate through the state and end in a stack overflow.

stu-elastic · 2019-12-11T21:07:48Z

We're calling ensureNoSelfReferences per segment since 6.4.3. The number of segments typically isn't very high so it's a reasonable scaling factor.

How many segments are you working with?

GET /_cat/segments/<index>

dliappis · 2020-02-03T16:43:57Z

@stu-elastic As this has been open >1 month pending feedback, do you think it's still worth keeping it open?

wenhoujx · 2020-02-03T20:33:51Z

@stu-elastic i am on es 6.8.x and reading from the code it seems the method is called for every leaf aggregator? As in you have to check for self-reference for the state in each bucket, am i misunderstanding it somehow?

stu-elastic · 2020-02-11T20:24:19Z

As in you have to check for self-reference for the state in each bucket, am i misunderstanding it somehow?

Per @rjernst's comment above #49277 (comment) we need to check whenever an object is returned or modified from a script to avoid a stack overflow.

it seems the method is called for every leaf aggregator

If you're referring to the check after executing the combineScript, this is necessary because we build the InternalScriptedMetric and need to send that result to the coordinating node. If there were self-references, that would infinite loop.

Next steps

We can keep a thread local IdentityHashMap in CollectionUtils.ensureNoSelfReferences and either clear() it or recreate the IdentityHashMap if the maps size() has grown too large.

wenhoujx · 2020-02-11T20:27:39Z

@stu-elastic kk, i think i understand the problem and why it's necessary to allocate this IdentityMap, i think a threadlocal map would be great to save a tons of allocation when there is lots of states.

rjernst · 2024-05-25T01:06:23Z

This has been open for quite a while, and we still believe the cost of ensureNoSelfReferences is worth it to ensure a stack overflow does not occur. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

cbuescher added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Nov 19, 2019

stu-elastic added the feedback_needed label Dec 11, 2019

dliappis removed the feedback_needed label Feb 7, 2020

rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020

rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020

stu-elastic removed the needs:triage Requires assignment of a team area label label Dec 9, 2020

stu-elastic mentioned this issue Apr 11, 2022

Reduce script processor overhead from safety check #85790

Closed

rjernst closed this as completed May 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CollectionUtils#ensureNoSelfReferences creates too much allocation. #49277

CollectionUtils#ensureNoSelfReferences creates too much allocation. #49277

wenhoujx commented Nov 18, 2019

elasticmachine commented Nov 19, 2019

cbuescher commented Nov 19, 2019

wenhoujx commented Nov 19, 2019 •

edited

Loading

rjernst commented Nov 19, 2019

wenhoujx commented Nov 20, 2019

wenhoujx commented Nov 22, 2019

rjernst commented Nov 27, 2019

stu-elastic commented Dec 11, 2019

dliappis commented Feb 3, 2020

wenhoujx commented Feb 3, 2020

stu-elastic commented Feb 11, 2020

wenhoujx commented Feb 11, 2020 •

edited

Loading

rjernst commented May 25, 2024

CollectionUtils#ensureNoSelfReferences creates too much allocation. #49277

CollectionUtils#ensureNoSelfReferences creates too much allocation. #49277

Comments

wenhoujx commented Nov 18, 2019

elasticmachine commented Nov 19, 2019

cbuescher commented Nov 19, 2019

wenhoujx commented Nov 19, 2019 • edited Loading

rjernst commented Nov 19, 2019

wenhoujx commented Nov 20, 2019

wenhoujx commented Nov 22, 2019

rjernst commented Nov 27, 2019

stu-elastic commented Dec 11, 2019

dliappis commented Feb 3, 2020

wenhoujx commented Feb 3, 2020

stu-elastic commented Feb 11, 2020

wenhoujx commented Feb 11, 2020 • edited Loading

rjernst commented May 25, 2024

wenhoujx commented Nov 19, 2019 •

edited

Loading

wenhoujx commented Feb 11, 2020 •

edited

Loading