Use general cluster state batching mechanism for shard failures #15016

Merged
merged 4 commits into from
Dec 3, 2015

jasontedor
Member

This commit modifies the handling of shard failure cluster state updates
to use the general cluster state batching mechanism. An advantage of
this approach is we now get correct per-listener notification on
failures.

Relates #14899, relates #14725
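As a rough illustration of what per-listener notification in a batched update can look like, here is a minimal, self-contained sketch. It does not use Elasticsearch classes: `BatchingSketch`, `BatchingQueue`, `Listener`, and `recording` are all hypothetical names made up for this example, and the executor is assumed to report failures as a map from task to exception.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these types are Elasticsearch classes.
final class BatchingSketch {

    /** Callback owned by whoever submitted a task. */
    interface Listener {
        void onSuccess();
        void onFailure(Exception e);
    }

    /** Queues tasks and drains them as one batch, reporting an outcome per task. */
    static final class BatchingQueue<T> {
        private final List<Map.Entry<T, Listener>> pending = new ArrayList<>();

        void submit(T task, Listener listener) {
            pending.add(new AbstractMap.SimpleImmutableEntry<>(task, listener));
        }

        /**
         * Runs the executor once over all pending tasks. The executor returns
         * a map from failed task to its exception; tasks absent from the map
         * are treated as successful.
         */
        void drain(Function<List<T>, Map<T, Exception>> executor) {
            List<Map.Entry<T, Listener>> batch = new ArrayList<>(pending);
            pending.clear();

            List<T> tasks = new ArrayList<>();
            for (Map.Entry<T, Listener> entry : batch) {
                tasks.add(entry.getKey());
            }

            Map<T, Exception> failures = executor.apply(tasks);

            // Each listener hears only about its own task.
            for (Map.Entry<T, Listener> entry : batch) {
                Exception failure = failures.get(entry.getKey());
                if (failure == null) {
                    entry.getValue().onSuccess();
                } else {
                    entry.getValue().onFailure(failure);
                }
            }
        }
    }

    /** Test helper: a listener that appends its outcome to a shared log. */
    static Listener recording(String name, List<String> log) {
        return new Listener() {
            @Override public void onSuccess() { log.add(name + ":ok"); }
            @Override public void onFailure(Exception e) { log.add(name + ":failed:" + e.getMessage()); }
        };
    }
}
```

The point of the sketch is only the shape: one executor call per batch, one notification per submitted task.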

@jasontedor
Member Author

@bleskes I'll rebase this pull request on master when #14899 is reintegrated there. The salient commit for this review is thus ccc89c3666780d0bf7b2425f9e8c76dbe6316187 pending #14899 (all the changes for that commit are in ShardStateAction.java and some minor modifications in AllocationService.java).

@jasontedor
Member Author

@bleskes I've rebased this pull request on the latest changes in #14899.

@jasontedor
Member Author

@bleskes This pull request has been rebased on master since #14899 has been integrated there.

try {
    RoutingAllocation.Result result = allocationService.applyFailedShard(
        currentState,
        new FailedRerouteAllocation.FailedShard(task.shardRouting, task.message, task.failure));
Contributor

Why did we lose the batch application of shard failures?

Member Author

My thinking was to get task failures only for the shards for which we were unsuccessful in marking as failed.

Contributor

we share some logic when we do it in a batch, most notably the reroute. I wonder if we should improve the reporting from the applyFailedShards so we know what happened (we'll need it)

Member Author

> I wonder if we should improve the reporting from the applyFailedShards so we know what happened (we'll need it)

That's what I'm thinking now, but it should be in a separate issue on the #14252 work I think?
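To make the idea of richer reporting concrete, here is one possible shape, sketched under heavy assumptions. `ReportingSketch`, `Report`, and this `applyFailedShards` are hypothetical and are not the `AllocationService` API; shards are modelled as plain strings, and the shared step (a stand-in for the reroute) is modelled as a counter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the AllocationService API.
final class ReportingSketch {

    /** Outcome of applying a batch: surviving shards plus a per-shard note. */
    static final class Report {
        final Set<String> activeShards;
        final Map<String, Boolean> applied; // shard id -> was it actually marked as failed?
        final int reroutes;                 // how many times the shared step ran

        Report(Set<String> activeShards, Map<String, Boolean> applied, int reroutes) {
            this.activeShards = activeShards;
            this.applied = applied;
            this.reroutes = reroutes;
        }
    }

    static Report applyFailedShards(Set<String> activeShards, List<String> failedShards) {
        Set<String> remaining = new TreeSet<>(activeShards);
        Map<String, Boolean> applied = new LinkedHashMap<>();
        for (String shard : failedShards) {
            // remove() returns false when the shard was not active,
            // for example because it had already been failed.
            boolean removed = remaining.remove(shard);
            applied.merge(shard, removed, Boolean::logicalOr);
        }
        // The shared step runs once per batch, and only if something changed.
        int reroutes = applied.containsValue(true) ? 1 : 0;
        return new Report(remaining, applied, reroutes);
    }
}
```

With a result like this, a caller could keep the single shared step per batch and still tell each submitter what happened to its own shard.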

Member Author

I pushed 413688b to apply the failures in a single batch. For now, we will not get task-specific failures but that will come in a follow up.
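The trade-off described here can be sketched as follows. This is an illustration under stated assumptions, not the code in the commit: `SingleBatchSketch`, `BatchOutcome`, and `applyInOneBatch` are hypothetical names, the cluster state is modelled as a plain `String`, and the whole batch is applied in one call so every task shares the same outcome.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch; these types only mimic the shape of the discussion.
final class SingleBatchSketch {

    /** Resulting state plus the failure (if any) recorded for each task. */
    static final class BatchOutcome<T> {
        final String state;
        final Map<T, Exception> failures; // empty when the whole batch succeeded

        BatchOutcome(String state, Map<T, Exception> failures) {
            this.state = state;
            this.failures = failures;
        }
    }

    /**
     * Applies all tasks in a single call. If that call throws, every task in
     * the batch is reported as failed with the same exception and the state
     * is left unchanged; there is no task-specific failure information.
     */
    static <T> BatchOutcome<T> applyInOneBatch(
            String currentState,
            List<T> tasks,
            BiFunction<String, List<T>, String> applyAll) {
        Map<T, Exception> failures = new LinkedHashMap<>();
        try {
            String updated = applyAll.apply(currentState, tasks);
            return new BatchOutcome<>(updated, failures);
        } catch (RuntimeException e) {
            for (T task : tasks) {
                failures.put(task, e);
            }
            return new BatchOutcome<>(currentState, failures);
        }
    }
}
```

The all-or-nothing reporting is the limitation the comment mentions; finer-grained results would need the batch step itself to say which tasks succeeded.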

@bleskes
Contributor

bleskes commented Dec 1, 2015

looking good. Left some comments.

BatchResult.Builder<ShardRoutingEntry> builder = BatchResult.builder();
List<FailedRerouteAllocation.FailedShard> shardRoutingsToBeApplied = new ArrayList<>(tasks.size());
for (ShardRoutingEntry task : tasks) {
    task.processed = true;
Contributor

We don't need this processed flag with this change.

Member Author

Removed in b58d82f.

@bleskes
Contributor

bleskes commented Dec 3, 2015

LGTM. Left trivial comments.

jasontedor added a commit that referenced this pull request Dec 3, 2015
Use general cluster state batching mechanism for shard failures
@jasontedor jasontedor merged commit 182c22f into elastic:master Dec 3, 2015
@jasontedor jasontedor deleted the shard-failure-batch branch December 3, 2015 19:02
@bleskes
Contributor

bleskes commented Dec 3, 2015

One note: the comment from the shard-started PR about error reporting holds for this one as well. Is the plan to do a follow-up? As with shard started, we can just let the exception bubble up.

@jasontedor
Member Author

> Is the plan to do a followup?

@bleskes Yeah. :)

@jasontedor jasontedor mentioned this pull request Dec 14, 2015
9 tasks
@jasontedor
Member Author

> > Is the plan to do a followup?
>
> Yeah. :)

I opened #15428.

@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Cluster labels Feb 13, 2018
Labels
:Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. >enhancement v2.2.0 v5.0.0-alpha1