Support task cancellation cross clusters #55779
Conversation
Pinging @elastic/es-distributed (:Distributed/Task Management)
...ava/org/elasticsearch/action/admin/cluster/node/tasks/cancel/TransportCancelTasksAction.java
Thanks Nhat, I've left some comments to discuss. Overall looking very good already.
} else {
    final ArrayList<DiscoveryNode> subNodes = new ArrayList<>(entry.getValue());
    final DiscoveryNode targetNode = subNodes.remove(0);
    if (targetNode.getVersion().onOrAfter(Version.V_8_0_0)) {
In case of "proxy mode" connections (see also ProxyConnectionStrategy), targetNode.getVersion() will always return Version.CURRENT.minimumCompatibilityVersion() AFAICS, which means that this request won't be sent to those nodes. The actual version of the node on the other side is available via channel.getVersion(). On the other hand, channel.getNode() will return the (possibly fake) DiscoveryNode object that was used to create the connection.
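The pitfall described above can be sketched in a self-contained way. This is a hypothetical miniature (the `Connection`, `Version`, and `supportsBanParent` names are stand-ins, not the actual Elasticsearch API): for a proxy-mode connection, the `DiscoveryNode` carries only the minimum compatibility version, so a version gate must use the version negotiated during the transport handshake instead.

```java
// Hypothetical sketch, not the real Elasticsearch API: why a version gate must
// use the channel's handshake version rather than the DiscoveryNode version,
// which is a placeholder for proxy-mode connections.
final class VersionGateSketch {
    // Stand-in for org.elasticsearch.Version.
    record Version(int id) {
        boolean onOrAfter(Version other) { return id >= other.id; }
    }

    static final Version V_8_0_0 = new Version(8_00_00_99);
    static final Version MIN_COMPAT = new Version(7_17_00_99); // placeholder value

    // Stand-in for a transport connection.
    interface Connection {
        Version nodeVersion();    // from DiscoveryNode: may be MIN_COMPAT in proxy mode
        Version channelVersion(); // negotiated during handshake: the real remote version
    }

    // Gate on the handshake version, which is accurate even for proxy connections.
    static boolean supportsBanParent(Connection conn) {
        return conn.channelVersion().onOrAfter(V_8_0_0);
    }

    public static void main(String[] args) {
        Connection proxy = new Connection() {
            public Version nodeVersion() { return MIN_COMPAT; } // fake node version
            public Version channelVersion() { return V_8_0_0; } // actual remote version
        };
        // Gating on nodeVersion() would wrongly skip this node:
        System.out.println("node gate:    " + proxy.nodeVersion().onOrAfter(V_8_0_0)); // false
        System.out.println("channel gate: " + supportsBanParent(proxy));               // true
    }
}
```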
/cc: @tbrooks8 This is quite trappy, anything we can change in the transport in the short term to avoid this?
I've modified the handshaking to update the version of the target node with its actual version. I also exposed the proxy node so that we can check the version of the proxy node. I think we are good here.
...ava/org/elasticsearch/action/admin/cluster/node/tasks/cancel/TransportCancelTasksAction.java
I've left two more comments for discussion
...ava/org/elasticsearch/action/admin/cluster/node/tasks/cancel/TransportCancelTasksAction.java
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
server/src/main/java/org/elasticsearch/action/search/TransportSearchAction.java
I've left one comment to discuss before I go deeper into this. Also, I'm wondering if there's a way to test the versioned BWC logic (which is quite complex).
...alClusterTest/java/org/elasticsearch/action/admin/cluster/node/tasks/CancellableTasksIT.java
}

private static class HeartbeatRequest extends TransportRequest {
    final String nodeId;
I wonder if a node id is sufficient here, or whether we need to mention concrete tasks. Assume that a remote cluster is disconnected during a request and then reconnects fairly quickly, with the parent tasks on the source cluster going away (due to failure of the child requests). In that case we keep the child task alive on the remote cluster (as long as new requests are being processed), even though there's no point in doing so. Another option here (instead of sending parent task ids to renew) might be to use some numbering scheme: periodically incrementing numbers which correspond to named leases that represent an abstract set of long-running requests.
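The "named lease" idea above can be sketched as follows. This is a hypothetical illustration (the `LeaseSketch` class and its method names are invented for this comment, not code from the PR): the source cluster periodically sends an incrementing sequence number per lease, and the remote cluster treats a lease whose number stops advancing as dead, making its child tasks cancellable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the lease-numbering scheme discussed in the review,
// not code from this PR.
final class LeaseSketch {
    // lease name -> last sequence number observed from the source cluster
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    // Called when a heartbeat arrives; returns true if the lease advanced.
    boolean renew(String lease, long seqNo) {
        Long prev = lastSeen.put(lease, seqNo);
        return prev == null || seqNo > prev;
    }

    // Called periodically on the remote cluster: a lease whose number has not
    // reached `expiredBelow` is considered dead and its child tasks cancellable.
    boolean isExpired(String lease, long expiredBelow) {
        Long seen = lastSeen.get(lease);
        return seen == null || seen < expiredBelow;
    }
}
```

Compared with renewing individual parent task ids, one lease number covers an arbitrary set of long-running requests, so the heartbeat payload stays constant-size.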
Good point. I will rework this.
+1. I will work on this infra first in a separate PR.
This PR is quite old. I am closing it and will work on a new one.
Today, task cancellation works only within a single cluster: canceling a cross-cluster search request leaves the sub-tasks on the remote clusters untouched. This change implements task cancellation across multiple clusters.
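At a high level, cross-cluster cancellation fans the cancellation of a parent task out to every cluster that holds child tasks. The following is a minimal sketch under assumed names (`ChildLocation`, `RemoteClusterClient`, and `sendBan` are invented for illustration, not the PR's actual API); it only shows the de-duplicated fan-out, not retries or the version gating discussed above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of fanning a parent-task cancellation out to the
// clusters holding its child tasks; names are invented for illustration.
final class CancelFanOutSketch {
    record ChildLocation(String clusterAlias, long taskId) {}

    interface RemoteClusterClient {
        void sendBan(String clusterAlias, long parentTaskId); // hypothetical RPC
    }

    // Ban the parent task once per distinct cluster that holds children.
    static Set<String> fanOutCancellation(long parentTaskId,
                                          List<ChildLocation> children,
                                          RemoteClusterClient client) {
        Set<String> clusters = new HashSet<>();
        for (ChildLocation child : children) {
            if (clusters.add(child.clusterAlias())) {
                client.sendBan(child.clusterAlias(), parentTaskId);
            }
        }
        return clusters;
    }
}
```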