Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CollectionUtils#ensureNoSelfReferences creates too much allocation. #49277

Closed
wenhoujx opened this issue Nov 18, 2019 · 13 comments
Closed

CollectionUtils#ensureNoSelfReferences creates too much allocation. #49277

wenhoujx opened this issue Nov 18, 2019 · 13 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache Team:Core/Infra Meta label for core/infra team

Comments

@wenhoujx
Copy link

Describe the feature:

Elasticsearch version (bin/elasticsearch --version):

6.8.3

Plugins installed: []

JVM version (java -version):
java 11

OS version (uname -a if on a Unix-like system):
Linux il-pg-alpha-4421120.use1.palantir.global 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
while using scripted metrics aggregation, i noticed this method is creating lots of allocations. My script only has one parameter, but this method allocated a default sized IdentityHashMap(), which creates a new Object[64] regardless, this multiples quicks with the number of parent buckets.

I wonder if we can call new IdentityHashMap(4) or come up with a better way to check self-reference.

Steps to reproduce:
I don't think a repro is needed, it's right there in the code.

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.

Provide logs (if relevant):

@cbuescher cbuescher added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Nov 19, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

@cbuescher
Copy link
Member

My script only has one parameter, but this method allocated a default sized IdentityHashMap(), which creates a new Object[64] regardless, this multiples quicks with the number of parent buckets.

Can you describe if this has a measureable impact for you in a real-world scenario? Coming up with good defaults here depends both on the size and the depth of the iterable here, and a value thats too small will lead to higher cost for resizing later?

@wenhoujx
Copy link
Author

wenhoujx commented Nov 19, 2019

image
here is a JFR of a running ES cluster with heavy scripted metrics aggregation requests. You can see this method creates about 10g heap here in 30seconds and causing lot of GC pressure.

I don't think the iteration depth is the problem here, the problem is the sheer number of invocation that creates too many shallow instances of IdentityHashMap. The inefficiency also comes from that each leaf script-metric-agg validates this self-reference separately for the same scripts params. I imagine most real world cases would benefit from a conservative default, similar to how millions of users benefited when java change new ArrayList() to use a static empty array internally.

@rjernst
Copy link
Member

rjernst commented Nov 19, 2019

the inefficiency also comes from that each leaf script-metric-agg validates this self-reference separately for the same scripts params

The self reference check is on the objects produced by each script, not the params to the scripts. In the scripted metric agg case, in the past the returned state was conflated into the params map, but was changed to the state variable last year. We no longer check params for self reference as of 7.0.

@wenhoujx
Copy link
Author

@rjernst i named the wrong variable, i meant state, as far as i can tell, on the develop branch, we're still calling this method unnecessarily for every leaf scripted metric bucket: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/metrics/ScriptedMetricAggregatorFactory.java#L99
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/metrics/ScriptedMetricAggregator.java#L92
and https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/metrics/ScriptedMetricAggregator.java#L107

i understand this method needs to be called to validate against self referencing collections/objects, but does it needs to be revalidated this many time at each leaf bucket?

thanks

@wenhoujx
Copy link
Author

@rjernst any update on this?

@rjernst
Copy link
Member

rjernst commented Nov 27, 2019

@wenhoujx state is essentially the return of the various metric agg scripts, and thus we need to validate it whenever we return from executing one of those scripts, as they may have modified it, and we don't want to try serializing or passing that state anywhere else that may try to iterate through the state and end in a stack overflow.

@stu-elastic
Copy link
Contributor

We're calling ensureNoSelfReferences per segment since 6.4.3. The number of segments typically isn't very high so it's a reasonable scaling factor.

How many segments are you working with?

GET /_cat/segments/<index>

@dliappis
Copy link
Contributor

dliappis commented Feb 3, 2020

@stu-elastic As this has been open >1 month pending feedback, do you think it's still worth keeping it open?

@wenhoujx
Copy link
Author

wenhoujx commented Feb 3, 2020

@stu-elastic i am on es 6.8.x and reading from the code it seems the method is called for every leaf aggregator? As in you have to check for self-reference for the state in each bucket, am i misunderstanding it somehow?

@stu-elastic
Copy link
Contributor

As in you have to check for self-reference for the state in each bucket, am i misunderstanding it somehow?

Per @rjernst's comment above #49277 (comment) we need to check whenever an object is returned or modified from a script to avoid a stack overflow.

it seems the method is called for every leaf aggregator

If you're referring to the check after executing the combineScript, this is necessary because we build the InternalScriptedMetric and need to send that result to the coordinating node. If there were self-references, that would infinite loop.

Next steps

We can keep a thread local IdentityHashMap in CollectionUtils.ensureNoSelfReferences and either clear() it or recreate the IdentityHashMap if the maps size() has grown too large.

@wenhoujx
Copy link
Author

wenhoujx commented Feb 11, 2020

@stu-elastic kk, i think i understand the problem and why it's necessary to allocate this IdentityMap, i think a threadlocal map would be great to save a tons of allocation when there is lots of states.

@rjernst rjernst added the Team:Core/Infra Meta label for core/infra team label May 4, 2020
@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@stu-elastic stu-elastic removed the needs:triage Requires assignment of a team area label label Dec 9, 2020
@rjernst
Copy link
Member

rjernst commented May 25, 2024

This has been open for quite a while, and we still believe the cost of ensureNoSelfReferences is worth it to ensure a stack overflow does not occur. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@rjernst rjernst closed this as completed May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

6 participants