
XContentBuilder.ensureNoSelfReferences has an impact on performance #26907

Closed

jgq2008303393 opened this issue Oct 6, 2017 · 9 comments

Labels: :Core/Infra/Core, >enhancement, :Performance, Team:Core/Infra, Team:Performance

Comments


jgq2008303393 commented Oct 6, 2017

Hi, all.
When using an ingest pipeline to parse logs, I found that XContentBuilder.ensureNoSelfReferences has an impact on performance: this method costs about 17.36% of the total CPU.

My ES version is 5.5.2. The jstack result is as follows:
"elasticsearch[1505480279000001809][bulk][T#19]" #255 daemon prio=5 os_prio=0 tid=0x00007fdd6c012800 nid=0xa86 runnable [0x00007fe823238000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.System.identityHashCode(Native Method)
        at java.util.IdentityHashMap.hash(IdentityHashMap.java:294)
        at java.util.IdentityHashMap.put(IdentityHashMap.java:425)
        at java.util.Collections$SetFromMap.add(Collections.java:5461)
        at org.elasticsearch.common.xcontent.XContentBuilder.ensureNoSelfReferences(XContentBuilder.java:1088)
        at org.elasticsearch.common.xcontent.XContentBuilder.ensureNoSelfReferences(XContentBuilder.java:1071)
        at org.elasticsearch.common.xcontent.XContentBuilder.map(XContentBuilder.java:872)
        at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:375)
        at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:364)
        at org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:179)
        at org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:41)
        at org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:88)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

jpountz (Contributor) commented Oct 6, 2017

For the record, this sanity check is important so that poison documents cannot take down an entire cluster.

However, maybe there are ways that we could speed it up a bit, e.g. by:

  • not doing the check again for sub maps/lists
  • doing a preliminary test of whether the depth of the structure is >= 5 (for instance; any reasonable number would work) and only actually accumulating ancestors in a hash table if the map/list is deeper than this threshold (a sketch of this second idea follows below)
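
A minimal sketch of the second idea, assuming the check is split into a cheap bounded pre-pass and the existing identity-set walk (class and method names here are illustrative, not the actual XContentBuilder internals):

import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

final class SelfReferenceCheck {

    // "any reasonable number would work": shallow documents skip the
    // expensive identity-set bookkeeping entirely.
    private static final int DEPTH_THRESHOLD = 5;

    static void ensureNoSelfReferences(Object value) {
        // Cheap pre-pass: a cycle makes the structure unboundedly deep, so
        // anything shallower than the threshold cannot be self-referencing.
        if (exceedsDepth(value, DEPTH_THRESHOLD)) {
            checkWithAncestors(value, Collections.newSetFromMap(new IdentityHashMap<>()));
        }
    }

    // Bounded walk: stops once the depth budget is spent, so it terminates
    // even when the input does contain a cycle.
    private static boolean exceedsDepth(Object value, int budget) {
        if (budget == 0) {
            return true;
        }
        Iterable<?> children = childrenOf(value);
        if (children != null) {
            for (Object child : children) {
                if (exceedsDepth(child, budget - 1)) {
                    return true;
                }
            }
        }
        return false;
    }

    // The full check, as today: track ancestors in an identity set and fail
    // if we ever meet the same object instance twice on one path.
    private static void checkWithAncestors(Object value, Set<Object> ancestors) {
        Iterable<?> children = childrenOf(value);
        if (children == null) {
            return;
        }
        if (ancestors.add(value) == false) {
            throw new IllegalArgumentException("Iterable object is self-referencing itself");
        }
        for (Object child : children) {
            checkWithAncestors(child, ancestors);
        }
        ancestors.remove(value);
    }

    private static Iterable<?> childrenOf(Object value) {
        if (value instanceof Map) {
            return ((Map<?, ?>) value).values();
        }
        if (value instanceof Iterable) {
            return (Iterable<?>) value;
        }
        return null; // scalars (and, for brevity, arrays) are treated as leaves
    }
}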

jasontedor (Member) commented

It's interesting that System#identityHashCode is at the top of the stack here. I want to make sure this is not an artifact of JVM behavior (for example, we've seen issues in the past where users have set the JVM code cache too small and System#identityHashCode ended up showing up as hot). Can you confirm the JVM flags that you're running with? These will show up as a log line when you start Elasticsearch:

[2017-10-06T08:43:08,372][INFO ][o.e.n.Node               ] JVM arguments [...]

jgq2008303393 (Author) commented Oct 7, 2017

I only changed -Xms and -Xmx; the full JVM arguments are as follows:

[2017-10-06T09:53:33,308][INFO ][o.e.n.Node               ] [1505480279000001809] JVM arguments [-Xms30720m, -Xmx30720m, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/data1/containers/1505480279000001809/es]

jgq2008303393 (Author) commented Oct 7, 2017

I think the reason is the number of executions: number of documents × number of fields.

Maybe the ensureNoSelfReferences method could be overloaded with a check switch, so that some internal calls can skip the check?
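
A hedged sketch of such an overload (the two-argument signature and the writeMapEntries helper are hypothetical here, not the actual 5.x API):

// The single-argument form keeps today's behavior as the default.
public XContentBuilder map(Map<String, ?> values) throws IOException {
    return map(values, true);
}

// Trusted internal callers that built the map themselves (and so know it
// cannot be self-referencing) could pass false to skip the check.
public XContentBuilder map(Map<String, ?> values, boolean checkSelfReferences) throws IOException {
    if (checkSelfReferences) {
        ensureNoSelfReferences(values); // the existing recursive check
    }
    return writeMapEntries(values); // hypothetical helper that writes the entries
}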

ywelsch (Contributor) commented Oct 9, 2017

Instead of explicitly checking for self references, it might be simpler/faster to track the current stack depth of the structure that's being used to populate the XContentBuilder and just check that it does not pass a certain threshold (e.g. 100). This could be done, for example, by overloading some of the methods (map(...), unknownValue(...), ...) with an internal variant that takes an additional "stack depth" parameter.
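
A minimal sketch of that approach, assuming a hypothetical recursive writer inside the builder (names are illustrative): a self-referencing structure recurses without bound, so it necessarily trips the depth limit.

import java.io.IOException;
import java.util.Map;

// Illustrative only: a depth counter in place of the identity-set check.
final class DepthLimitedWriter {

    private static final int MAX_DEPTH = 100; // the threshold suggested above

    void writeValue(Object value, int depth) throws IOException {
        if (depth > MAX_DEPTH) {
            // Either a pathologically deep document or a self-reference;
            // both are rejected without any per-object identity hashing.
            throw new IllegalArgumentException("nesting exceeds " + MAX_DEPTH + " levels");
        }
        if (value instanceof Map) {
            // startObject()/field-name plumbing of the real builder elided
            for (Object child : ((Map<?, ?>) value).values()) {
                writeValue(child, depth + 1);
            }
        } else if (value instanceof Iterable) {
            for (Object element : (Iterable<?>) value) {
                writeValue(element, depth + 1);
            }
        }
        // else: scalar leaf, written out by the real builder
    }
}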

jasontedor (Member) commented

Thanks for confirming, @jgq2008303393; then the code cache is unlikely to be the source of the problem here. We will need some benchmarks to validate any of the suggested approaches.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Jan 15, 2018
Currently we test all maps, arrays or iterables. However, in the case that maps
contain sub maps for instance, we will test the sub maps again even though the
work has already been done for the top-level map.

Relates elastic#26907
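
Roughly, that change amounts to running the full check once when a map enters the builder, then recursing through a writer that does not repeat it; a simplified, hypothetical illustration (not the actual commit's code):

public XContentBuilder map(Map<String, ?> values) throws IOException {
    ensureNoSelfReferences(values); // one full pass over the whole structure
    return writeMap(values);
}

@SuppressWarnings("unchecked")
private XContentBuilder writeMap(Map<String, ?> values) throws IOException {
    for (Map.Entry<String, ?> entry : values.entrySet()) {
        Object v = entry.getValue();
        if (v instanceof Map) {
            field(entry.getKey());
            writeMap((Map<String, ?>) v); // no longer re-runs the check here
        } else {
            field(entry.getKey(), v); // writing details simplified
        }
    }
    return this;
}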
jpountz added a commit that referenced this issue Jan 15, 2018 (same message as above)
jpountz added a commit that referenced this issue Jan 15, 2018 (same message as above)
colings86 added the :Core/Infra/Core label Apr 24, 2018
elasticmachine (Collaborator) commented

Pinging @elastic/es-core-infra

rjernst added the Team:Core/Infra label May 4, 2020
rjernst added the needs:triage label Dec 3, 2020
jaymode added the :Performance label and removed the needs:triage label Dec 14, 2020
elasticmachine added the Team:Performance label Dec 14, 2020
elasticmachine (Collaborator) commented

Pinging @elastic/es-perf (Team:Performance)

DJRickyB (Contributor) commented

Calling this completed via #87337.
