Speed up ReplicationTracker Logic on Data Nodes a Little #79837
Conversation
This stuff shows up a little in profiling when the data node holds a lot of shards.
Pinging @elastic/es-distributed (Team:Distributed)
Can you elaborate on the potential benefit of these changes? I think most of this is on what I would normally consider the "slow-is-ok" path, i.e., a place where code clarity normally trumps micro-optimizations.
The bits in PendingReplicationActions are easily correct and just as readable, so those are good to go separately I think.
```java
this.routingTable = routingTable;
this.inSyncAllocationIds = inSyncAllocationIds;
this.trackedAllocationIds = trackedAllocationIds;
this.version = version;

this.unavailableInSyncShards = Sets.difference(inSyncAllocationIds, routingTable.getAllAllocationIds());
```
I wonder if the overhead here is really noticeable, since inSyncAllocationIds would typically be just 2-3 entries?
This is the profiling for a data node where nothing really changes in terms of shards at the point of taking the profile (though there were lots of CS updates at the time of creating the profile):
The code adjusted here is in aggregate as expensive as building RoutingNodes, which I found surprising given that building RoutingNodes is itself quite expensive in a cluster with close to 100k shards. I highlighted the relative importance of this particular line.
I think the issue here is mainly how badly all this high-level stream stuff inlines. It's also particularly unfortunate for 2-3 entry sets, because setting up the stream takes more time than just doing the iteration/hashing/equals checks, and it wastes additional CPU cycles on the GC end of things.
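To make that cost concrete, here is a minimal sketch of a stream-based set difference, assuming the utility under discussion is implemented along these lines (an assumption for illustration, not the actual org.elasticsearch.common.util.set.Sets code):

```java
import java.util.Set;
import java.util.stream.Collectors;

final class StreamDifferenceSketch {
    // Stream-based set difference. Each call allocates the Stream, the
    // filter pipeline, and a Collector before a single element is examined;
    // for a 2-3 entry set that setup work dominates the actual contains()
    // checks and feeds the GC on every invocation.
    static <T> Set<T> difference(Set<T> left, Set<T> right) {
        return left.stream().filter(e -> right.contains(e) == false).collect(Collectors.toSet());
    }
}
```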
About the specific line, perhaps we could optimize Sets.difference by not using streams?
Other than that, a big contribution here seems to be the synchronized block (which may admittedly improve once some of the inner workings are slimmed down). Can you share the full flame graph too?
I think we can make bigger improvements here and run this logic more selectively; it seems a non-trivial number of noops are executed here.
I would be interested in this; I think you mean to shortcut this logic at a higher level? If we do that, I think this PR becomes obsolete, and perhaps it is worth looking into this first?
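For reference, such a higher-level shortcut might look roughly like the following minimal sketch; the method name mirrors ReplicationTracker.updateFromMaster, but the fields, types, and body here are assumptions for illustration, not the actual code:

```java
import java.util.Set;

final class TrackerShortcutSketch {
    private Object routingTable; // stands in for IndexShardRoutingTable
    private Set<String> inSyncAllocationIds;
    private Object replicationGroup;

    // Hypothetical shape of the "shortcut at a higher level" idea: bail out
    // before recomputing the replication group when neither input changed,
    // turning a no-op cluster-state update into a cheap early return.
    synchronized void updateFromMaster(Object newRoutingTable, Set<String> newInSyncIds) {
        if (newRoutingTable.equals(routingTable) && newInSyncIds.equals(inSyncAllocationIds)) {
            return; // nothing changed, skip the whole recomputation
        }
        routingTable = newRoutingTable;
        inSyncAllocationIds = newInSyncIds;
        replicationGroup = calculateReplicationGroup();
    }

    private Object calculateReplicationGroup() {
        return new Object(); // placeholder for the real (expensive) computation
    }
}
```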
The full profile is this:
(sorry, I couldn't find a way to upload the HTML here for the life of me; GH doesn't even allow a zip of it. Happy to provide it on another channel.)
This is not a node under heavy load, but it's quite jittery in terms of how long each individual CS application takes, which I blame at least in part on this code. Since the sync blocks also have some contention with other functionality, I think we should be careful about performance here, because the side effects of slowness aren't entirely trivial to predict.
I would be interested in this; I think you mean to shortcut this logic at a higher level? If we do that, I think this PR becomes obsolete, and perhaps it is worth looking into this first?
This I could code up in 20 minutes, and it at least eliminates a little of the data node jitter in CS application :) (note that the code here is about 25% of the total CS applier time). Fixing this at a higher level will be trickier; at least it seemed that way when I tried. I agree that it makes fixes here less relevant, though I still think it's nice to not have any obviously inefficient code in sync blocks that the applier thread has to enter.
About the specific line, perhaps we could optimize Sets.difference by not using streams?
We could do that. This is the only hot user of that code I could find in what we are currently benchmarking, though, and this use case is very specific: the tiny sets involved are what make the current implementation so unexpectedly inefficient.
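A loop-based variant of the same operation, as a minimal sketch of the suggested direction (again an assumption, not the actual Sets implementation):

```java
import java.util.HashSet;
import java.util.Set;

final class LoopDifferenceSketch {
    // Plain-loop set difference: only the result set is allocated, and the
    // loop body inlines well even for tiny inputs, so the per-call setup
    // cost of the stream pipeline disappears entirely.
    static <T> Set<T> difference(Set<T> left, Set<T> right) {
        final Set<T> result = new HashSet<>(left.size());
        for (T e : left) {
            if (right.contains(e) == false) {
                result.add(e);
            }
        }
        return result;
    }
}
```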
Jenkins run elasticsearch-ci/part-2 (known issue)
@elasticmachine update branch
@elasticmachine update branch
@elasticmachine update branch
Jenkins run elasticsearch-ci/part-2 (known issue)
@elasticmachine update branch
@original-brownbear do you think this is worth pursuing? This pull request has been open since October.
@tlrx yea, I'd still love a review from @henningandersen here ;) this one would still save tons of cycles on large data nodes.
This stuff shows up a little in profiling when the data node holds a lot of shards.
Mainly just avoiding extra loops in calculateReplicationGroup (the biggest cost in profiling), allocating streams to iterate two steps, and some other indirection that starts to hurt when executed over thousands of shards. Also, we were submitting a lot of empty tasks to the generic pool at times, which temporarily showed up in profiling.
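As an illustration of the empty-task point, a guard along these lines avoids scheduling no-op work; the names here are hypothetical, not the actual PendingReplicationActions code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;

final class CancelSketch {
    // Only fork to the generic pool when there is actually work to run;
    // previously a no-op task could still be scheduled, wasting a pool
    // slot and showing up in profiling when repeated across many shards.
    static void cancelAll(ExecutorService genericPool, List<Runnable> onCancelCallbacks) {
        if (onCancelCallbacks.isEmpty()) {
            return; // skip scheduling an empty task
        }
        genericPool.execute(() -> onCancelCallbacks.forEach(Runnable::run));
    }
}
```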
I think we can make bigger improvements here and run this logic more selectively; it seems a non-trivial number of noops are executed here.
relates #77466