
XContentBuilder.ensureNoSelfReferences has an impact on performance #26907

Closed

jgq2008303393 opened this issue Oct 6, 2017 · 9 comments

Labels: :Core/Infra/Core, >enhancement, :Performance, Team:Core/Infra, Team:Performance

Comments


jgq2008303393 commented Oct 6, 2017

Hi, all.
When using an ingest pipeline to parse logs, I found that XContentBuilder.ensureNoSelfReferences has an impact on performance: this method costs about 17.36% of the total CPU.

My ES version is 5.5.2. The jstack result is as follows:
"elasticsearch[1505480279000001809][bulk][T#19]" #255 daemon prio=5 os_prio=0 tid=0x00007fdd6c012800 nid=0xa86 runnable [0x00007fe823238000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.System.identityHashCode(Native Method)
        at java.util.IdentityHashMap.hash(IdentityHashMap.java:294)
        at java.util.IdentityHashMap.put(IdentityHashMap.java:425)
        at java.util.Collections$SetFromMap.add(Collections.java:5461)
        at org.elasticsearch.common.xcontent.XContentBuilder.ensureNoSelfReferences(XContentBuilder.java:1088)
        at org.elasticsearch.common.xcontent.XContentBuilder.ensureNoSelfReferences(XContentBuilder.java:1071)
        at org.elasticsearch.common.xcontent.XContentBuilder.map(XContentBuilder.java:872)
        at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:375)
        at org.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:364)
        at org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:179)
        at org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:41)
        at org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:88)
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

jpountz (Contributor) commented Oct 6, 2017

For the record, this sanity check is important so that poison documents cannot take down an entire cluster.

However, maybe there are ways that we could speed it up a bit, e.g. by:

  • not doing the check again for sub maps/lists
  • doing a preliminary test of whether the depth of the structure is >= 5 (for instance; any reasonable number would work) and only actually accumulating ancestors in a hash table if the map/list is deeper than this threshold (a sketch of this second idea follows below)
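
A minimal sketch of the second idea, assuming the check is split into a cheap bounded pre-pass and the existing identity-set walk (class and method names here are illustrative, not the actual XContentBuilder internals):

import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

final class SelfReferenceCheck {

    // "any reasonable number would work": shallow documents skip the
    // expensive identity-set bookkeeping entirely.
    private static final int DEPTH_THRESHOLD = 5;

    static void ensureNoSelfReferences(Object value) {
        // Cheap pre-pass: a cycle makes the structure unboundedly deep, so
        // anything shallower than the threshold cannot be self-referencing.
        if (exceedsDepth(value, DEPTH_THRESHOLD)) {
            checkWithAncestors(value, Collections.newSetFromMap(new IdentityHashMap<>()));
        }
    }

    // Bounded walk: stops once the depth budget is spent, so it terminates
    // even when the input does contain a cycle.
    private static boolean exceedsDepth(Object value, int budget) {
        if (budget == 0) {
            return true;
        }
        Iterable<?> children = childrenOf(value);
        if (children != null) {
            for (Object child : children) {
                if (exceedsDepth(child, budget - 1)) {
                    return true;
                }
            }
        }
        return false;
    }

    // The full check, as today: track ancestors in an identity set and fail
    // if we ever meet the same object instance twice on one path.
    private static void checkWithAncestors(Object value, Set<Object> ancestors) {
        Iterable<?> children = childrenOf(value);
        if (children == null) {
            return;
        }
        if (ancestors.add(value) == false) {
            throw new IllegalArgumentException("Iterable object is self-referencing itself");
        }
        for (Object child : children) {
            checkWithAncestors(child, ancestors);
        }
        ancestors.remove(value);
    }

    private static Iterable<?> childrenOf(Object value) {
        if (value instanceof Map) {
            return ((Map<?, ?>) value).values();
        }
        if (value instanceof Iterable) {
            return (Iterable<?>) value;
        }
        return null; // scalars (and, for brevity, arrays) are treated as leaves
    }
}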

jasontedor (Member) commented

It's interesting that System#identityHashCode is at the top of the stack here. I want to make sure this is not an artifact of JVM behavior (for example, we've seen issues in the past where users have set the JVM code cache too small and System#identityHashCode ended up showing up as hot). Can you confirm the JVM flags that you're running with? These will show up as a log line when you start Elasticsearch:

[2017-10-06T08:43:08,372][INFO ][o.e.n.Node               ] JVM arguments [...]

jgq2008303393 (Author) commented Oct 7, 2017

I only changed -Xms and -Xmx; the full JVM arguments are as follows:

[2017-10-06T09:53:33,308][INFO ][o.e.n.Node               ] [1505480279000001809] JVM arguments [-Xms30720m, -Xmx30720m, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/data1/containers/1505480279000001809/es]

jgq2008303393 (Author) commented Oct 7, 2017

I think the reason is the number of executions: number of documents × number of fields.

Maybe the ensureNoSelfReferences method could be overloaded with a check switch, so that some internal calls can skip the check?
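
A hedged sketch of such an overload (the two-argument signature and the writeMapEntries helper are hypothetical here, not the actual 5.x API):

// The single-argument form keeps today's behavior as the default.
public XContentBuilder map(Map<String, ?> values) throws IOException {
    return map(values, true);
}

// Trusted internal callers that built the map themselves (and so know it
// cannot be self-referencing) could pass false to skip the check.
public XContentBuilder map(Map<String, ?> values, boolean checkSelfReferences) throws IOException {
    if (checkSelfReferences) {
        ensureNoSelfReferences(values); // the existing recursive check
    }
    return writeMapEntries(values); // hypothetical helper that writes the entries
}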

ywelsch (Contributor) commented Oct 9, 2017

Instead of explicitly checking for self references, it might be simpler/faster to track the current stack depth of the structure that's being used to populate the XContentBuilder and just check that it does not pass a certain threshold (e.g. 100). This could be done, for example, by overloading some of the methods (map(...), unknownValue(...), ...) with an internal variant that takes an additional "stack depth" parameter.
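
A minimal sketch of that approach, assuming a hypothetical recursive writer inside the builder (names are illustrative): a self-referencing structure recurses without bound, so it necessarily trips the depth limit.

import java.io.IOException;
import java.util.Map;

// Illustrative only: a depth counter in place of the identity-set check.
final class DepthLimitedWriter {

    private static final int MAX_DEPTH = 100; // the threshold suggested above

    void writeValue(Object value, int depth) throws IOException {
        if (depth > MAX_DEPTH) {
            // Either a pathologically deep document or a self-reference;
            // both are rejected without any per-object identity hashing.
            throw new IllegalArgumentException("nesting exceeds " + MAX_DEPTH + " levels");
        }
        if (value instanceof Map) {
            // startObject()/field-name plumbing of the real builder elided
            for (Object child : ((Map<?, ?>) value).values()) {
                writeValue(child, depth + 1);
            }
        } else if (value instanceof Iterable) {
            for (Object element : (Iterable<?>) value) {
                writeValue(element, depth + 1);
            }
        }
        // else: scalar leaf, written out by the real builder
    }
}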

jasontedor (Member) commented

Thanks for confirming, @jgq2008303393; then the code cache is unlikely to be the source of the problem here. We will need some benchmarks to validate any of the suggested approaches.

jpountz added a commit to jpountz/elasticsearch that referenced this issue Jan 15, 2018
Currently we test all maps, arrays or iterables. However, in the case that maps
contain sub maps for instance, we will test the sub maps again even though the
work has already been done for the top-level map.

Relates elastic#26907
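
Roughly, that change amounts to running the full check once when a map enters the builder, then recursing through a writer that does not repeat it; a simplified, hypothetical illustration (not the actual commit's code):

public XContentBuilder map(Map<String, ?> values) throws IOException {
    ensureNoSelfReferences(values); // one full pass over the whole structure
    return writeMap(values);
}

@SuppressWarnings("unchecked")
private XContentBuilder writeMap(Map<String, ?> values) throws IOException {
    for (Map.Entry<String, ?> entry : values.entrySet()) {
        Object v = entry.getValue();
        if (v instanceof Map) {
            field(entry.getKey());
            writeMap((Map<String, ?>) v); // no longer re-runs the check here
        } else {
            field(entry.getKey(), v); // writing details simplified
        }
    }
    return this;
}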
jpountz added a commit that referenced this issue Jan 15, 2018 (same message as above)
jpountz added a commit that referenced this issue Jan 15, 2018 (same message as above)
colings86 added the :Core/Infra/Core label Apr 24, 2018
elasticmachine (Collaborator) commented

Pinging @elastic/es-core-infra

rjernst added the Team:Core/Infra label May 4, 2020
rjernst added the needs:triage label Dec 3, 2020
jaymode added the :Performance label and removed the needs:triage label Dec 14, 2020
elasticmachine added the Team:Performance label Dec 14, 2020
elasticmachine (Collaborator) commented

Pinging @elastic/es-perf (Team:Performance)

DJRickyB (Contributor) commented

Calling this completed via #87337.
