Use general cluster state batching mechanism for shard failures #15016

jasontedor · 2015-11-25T13:59:03Z

This commit modifies the handling of shard failure cluster state updates
to use the general cluster state batching mechanism. An advantage of
this approach is we now get correct per-listener notification on
failures.

Relates #14899, relates #14725
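
To illustrate what per-listener notification means here, below is a minimal sketch, not the code in this pull request. Every name in it (`PerListenerBatchSketch`, `Task`, `execute`, `notifyListeners`) is hypothetical, and it assumes a batch executor that records an outcome for each task so each listener hears only about its own task.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch, not Elasticsearch code: all names are illustrative.
final class PerListenerBatchSketch {

    /** A queued update paired with the listener that wants to hear its outcome. */
    static final class Task {
        final String shardId;
        final Consumer<String> listener; // receives "ok" or an error message

        Task(String shardId, Consumer<String> listener) {
            this.shardId = shardId;
            this.listener = listener;
        }
    }

    /**
     * Runs every queued task in one pass and records a per-task outcome
     * (a null value means success), so one bad task does not hide the
     * result of the others.
     */
    static Map<Task, Exception> execute(List<Task> tasks, List<String> knownShards) {
        Map<Task, Exception> results = new LinkedHashMap<>();
        for (Task task : tasks) {
            try {
                if (!knownShards.contains(task.shardId)) {
                    throw new IllegalStateException("unknown shard " + task.shardId);
                }
                results.put(task, null);
            } catch (Exception e) {
                results.put(task, e);
            }
        }
        return results;
    }

    /** Tells each listener the outcome of its own task only. */
    static void notifyListeners(Map<Task, Exception> results) {
        results.forEach((task, failure) ->
                task.listener.accept(failure == null ? "ok" : failure.getMessage()));
    }
}
```

In this sketch a listener whose task failed receives that task's error, while listeners for the other tasks in the same batch still receive a success notification.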

jasontedor · 2015-11-25T14:01:21Z

@bleskes I'll rebase this pull request on master when #14899 is reintegrated there. The salient commit for this review is thus ccc89c3666780d0bf7b2425f9e8c76dbe6316187 pending #14899 (all the changes for that commit are in ShardStateAction.java and some minor modifications in AllocationService.java).

jasontedor · 2015-11-26T14:31:54Z

@bleskes I've rebased this pull request on the latest changes in #14899.

This commit modifies the handling of shard failure cluster state updates to use the general cluster state batching mechanism. An advantage of this approach is we now get correct per-listener notification on failures.

jasontedor · 2015-12-01T15:20:44Z

@bleskes This pull request has been rebased on master since #14899 has been integrated there.

bleskes · 2015-12-01T15:39:08Z

core/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java

+                try {
+                    RoutingAllocation.Result result = allocationService.applyFailedShard(
+                            currentState,
+                            new FailedRerouteAllocation.FailedShard(task.shardRouting, task.message, task.failure));


bleskes: why did we lose the batch application of shard failures?

jasontedor: My thinking was to get task failures only for the shards that we were unable to mark as failed.

bleskes: we share some logic when we do it in a batch, most notably the reroute. I wonder if we should improve the reporting from the applyFailedShards so we know what happened (we'll need it)

jasontedor: "I wonder if we should improve the reporting from the applyFailedShards so we know what happened (we'll need it)"

That's what I'm thinking now, but it should be in a separate issue on the #14252 work I think?

jasontedor: I pushed 413688b to apply the failures in a single batch. For now, we will not get task-specific failures but that will come in a follow up.
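
The trade-off discussed in this thread can be sketched roughly as follows. This is an illustrative stand-in, not the change pushed in 413688b: `SingleBatchSketch`, `applyFailedShards` (as written here), `executeBatch`, and `rerouteCount` are all hypothetical names, and shards are modelled as plain strings. The point is only that one batched call shares a single reroute, at the cost of an all-or-nothing outcome for the tasks in the batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Elasticsearch code: all names are illustrative.
final class SingleBatchSketch {

    /** Counts how often the stand-in reroute step runs. */
    static int rerouteCount = 0;

    /** Stand-in allocation step: drops all failed shards, then reroutes once. */
    static List<String> applyFailedShards(List<String> assigned, List<String> failed) {
        List<String> remaining = new ArrayList<>(assigned);
        remaining.removeAll(failed);
        rerouteCount++; // one reroute for the whole batch, not one per shard
        return remaining;
    }

    /**
     * Applies every queued failure in one call and writes the surviving
     * shards to {@code out}. The outcome is all-or-nothing: a null return
     * means every task succeeded, otherwise every task shares the exception.
     */
    static Exception executeBatch(List<String> assigned, List<String> failed, List<String> out) {
        try {
            out.addAll(applyFailedShards(assigned, failed));
            return null;
        } catch (RuntimeException e) {
            return e;
        }
    }
}
```

Because the whole batch goes through one call, the sketch cannot say which individual task caused a failure, which mirrors the note above that task-specific failures are left to a follow-up.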

bleskes · 2015-12-01T16:19:59Z

looking good. Left some comments.

bleskes · 2015-12-03T14:54:44Z

core/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java

+            BatchResult.Builder<ShardRoutingEntry> builder = BatchResult.builder();
+            List<FailedRerouteAllocation.FailedShard> shardRoutingsToBeApplied = new ArrayList<>(tasks.size());
+            for (ShardRoutingEntry task : tasks) {
+                task.processed = true;


bleskes: we don't need this processed flag with this change.

jasontedor: Removed in b58d82f.

bleskes · 2015-12-03T14:57:04Z

LGTM. Left trivial comments.

bleskes · 2015-12-03T19:10:57Z

One note: the comment from the shard started PR about error reporting holds for this one as well. Is the plan to do a followup? As with shard started, we can just let the exception bubble up.

jasontedor · 2015-12-03T19:14:17Z

"Is the plan to do a followup?"

@bleskes Yeah. :)

jasontedor · 2015-12-14T22:36:05Z

"Is the plan to do a followup?"

"Yeah. :)"

I opened #15428.

jasontedor added >enhancement review v5.0.0-alpha1 v2.2.0 labels Nov 25, 2015

jasontedor assigned bleskes Nov 25, 2015

jasontedor added the :Cluster label Nov 25, 2015

clintongormley removed the :Internal label Nov 28, 2015

Use general cluster state batching mechanism for shard failures

d7f4dd0

This commit modifies the handling of shard failure cluster state updates to use the general cluster state batching mechanism. An advantage of this approach is we now get correct per-listener notification on failures.

bleskes reviewed Dec 1, 2015

Apply shard failures in a single batch

413688b

bleskes reviewed Dec 3, 2015

jasontedor added 2 commits December 3, 2015 13:41

Rename variable for clarity in ShardFailedClusterStateHandler#execute

73a2d30

Remove unnecessary method in AllocationService

b70d97f

jasontedor added a commit that referenced this pull request Dec 3, 2015

Merge pull request #15016 from jasontedor/shard-failure-batch

182c22f

Use general cluster state batching mechanism for shard failures

jasontedor merged commit 182c22f into elastic:master Dec 3, 2015

jasontedor deleted the shard-failure-batch branch December 3, 2015 19:02

jasontedor mentioned this pull request Dec 14, 2015

Wait on shard failures #14252

Closed

9 tasks

jasontedor mentioned this pull request Dec 14, 2015

Explicitly log cluster state update failures #15428

Merged

clintongormley added :Distributed/Distributed and removed :Cluster labels Feb 13, 2018
