
Allow force-merges to run in parallel on a node #69416

Merged
merged 7 commits (Feb 25, 2021)

Conversation

@ywelsch ywelsch commented Feb 23, 2021

Increasing the number of threads used for force-merging does not automatically give you any parallelism, even if
you have many shards per node: force-merge requests are split into node-level subrequests (see
TransportBroadcastByNodeAction, the superclass of TransportForceMergeAction), one per node, and each
subrequest then executes sequentially over all the shards on that node.

Closes #69327
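The fix is essentially to fan each node-level subrequest out across the force-merge thread pool instead of looping over the node's shards on one thread. A minimal standalone sketch of that pattern (hypothetical names such as `mergeAllShards` and `shardOperation`; this is not the actual TransportBroadcastByNodeAction code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelShardForceMerge {

    // Hypothetical per-shard operation: in the real action this would call
    // IndexShard#forceMerge; here it just returns the shard id.
    static Integer shardOperation(int shardId) {
        return shardId;
    }

    // Submit one task per shard onto a bounded "force-merge" pool instead of
    // running the shards sequentially, so a node with many shards can merge
    // up to poolSize of them concurrently.
    static List<Integer> mergeAllShards(List<Integer> shardIds, int poolSize) throws Exception {
        ExecutorService forceMergePool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int shardId : shardIds) {
                futures.add(forceMergePool.submit(() -> shardOperation(shardId)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // collect per-shard results, propagating failures
            }
            return results;
        } finally {
            forceMergePool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(mergeAllShards(List.of(0, 1, 2, 3), 2));
    }
}
```

The pool size bounds concurrency the same way the FORCE_MERGE thread pool does: submitting more shards than threads queues the excess rather than oversubscribing the node.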

@ywelsch ywelsch added >enhancement :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.13.0 >bug and removed >enhancement labels Feb 23, 2021
@ywelsch ywelsch marked this pull request as ready for review February 23, 2021 14:59
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Feb 23, 2021
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner left a comment

Looks good, I left a few small comments.

                           ActionListener<TransportBroadcastByNodeAction.EmptyResult> listener) {
    threadPool.executor(ThreadPool.Names.FORCE_MERGE).execute(ActionRunnable.run(listener,
        () -> {
            if (task instanceof CancellableTask && ((CancellableTask) task).isCancelled()) {
Contributor:
task is never a CancellableTask here is it?

Contributor Author:
Not right now. But the base class already had a general integration of cancellation, and my change here could introduce a pitfall for cancelling force-merges if we ever activate cancellation at the BroadcastRequest level (which is my plan), so I decided to keep the check.

Contributor:
I think I'd prefer assert (task instanceof CancellableTask) == false for now, otherwise we'll forget to remove this if when a force-merge shard task becomes cancellable and then we'll wonder why it's here some years later.
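As a standalone illustration of the suggested pattern (hypothetical types, not the actual Elasticsearch classes): asserting the currently-impossible case makes it fail loudly in tests (run with `-ea`) the day force-merge shard tasks become cancellable, instead of being silently handled.

```java
public class CancellableGuard {

    interface Task {}

    interface CancellableTask extends Task {
        boolean isCancelled();
    }

    // The shard-level operation entry point. Today no CancellableTask can
    // reach it, so assert that assumption rather than handling a dead branch.
    static void performOperation(Task task, Runnable shardOperation) {
        assert (task instanceof CancellableTask) == false
            : "shard task became cancellable; replace this assertion with real cancellation handling";
        shardOperation.run();
    }

    public static void main(String[] args) {
        performOperation(new Task() {}, () -> System.out.println("merged shard"));
    }
}
```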

List<BroadcastShardOperationFailedException> accumulatedExceptions = new ArrayList<>();
List<ShardOperationResult> results = new ArrayList<>();
for (int i = 0; i < totalShards; i++) {
    if (shardResultOrExceptions[i] instanceof BroadcastShardOperationFailedException) {
        accumulatedExceptions.add((BroadcastShardOperationFailedException) shardResultOrExceptions[i]);
        if (shardResultOrExceptions[i] instanceof TaskCancelledException) {
Contributor:
Why not just check the task for cancellation?

Contributor Author:
++, changed in b0cc8f3

logger.trace("[{}] executing operation for shard [{}]", actionName, shardRouting.shortSummary());
}
final Consumer<Exception> failureHandler = e -> {
    if (e instanceof TaskCancelledException) {
Contributor:
If we just checked the task for cancellation in finishHim, I think we wouldn't need this special case handling: we could just treat it like any other shard-level exception.
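The simplification being suggested can be sketched like this (hypothetical standalone types, not the actual Elasticsearch code): check the task once when all shard results are in, instead of special-casing a cancellation exception on every per-shard failure path.

```java
import java.util.List;

public class FinishHimSketch {

    interface CancellableTask {
        boolean isCancelled();
    }

    static class TaskCancelledException extends RuntimeException {
        TaskCancelledException(String message) {
            super(message);
        }
    }

    // A single cancellation check at response-building time replaces the
    // per-shard TaskCancelledException handling; shard-level failures are
    // otherwise accumulated like any other exception.
    static List<Object> finishHim(CancellableTask task, List<Object> shardResultOrExceptions) {
        if (task != null && task.isCancelled()) {
            throw new TaskCancelledException("parent task was cancelled");
        }
        return shardResultOrExceptions; // normal per-shard accounting continues
    }

    public static void main(String[] args) {
        System.out.println(finishHim(() -> false, List.of("shard-0-result")));
    }
}
```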

Contributor Author:
++, fixed in b0cc8f3

@DaveCTurner DaveCTurner left a comment

LGTM, I left one optional request

                           ActionListener<TransportBroadcastByNodeAction.EmptyResult> listener) {
    threadPool.executor(ThreadPool.Names.FORCE_MERGE).execute(ActionRunnable.run(listener,
        () -> {
            if (task instanceof CancellableTask && ((CancellableTask) task).isCancelled()) {
Contributor:
I think I'd prefer assert (task instanceof CancellableTask) == false for now, otherwise we'll forget to remove this if when a force-merge shard task becomes cancellable and then we'll wonder why it's here some years later.

@ywelsch ywelsch merged commit e768605 into elastic:master Feb 25, 2021
ywelsch added a commit that referenced this pull request Feb 25, 2021
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 1, 2021
DaveCTurner added a commit that referenced this pull request Mar 1, 2021
DaveCTurner added a commit that referenced this pull request Mar 1, 2021
Labels
>bug :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed Meta label for distributed team v7.13.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Force-merges do not parallelize well at the node level
4 participants