
Primary shard allocator observes limits in forcing allocation #19811

Merged
merged 7 commits into from
Aug 16, 2016


@abeyad abeyad commented Aug 4, 2016

Previously, when allocating primary shards that have
prior allocation IDs, if all nodes returned a
NO decision for allocation (e.g. the settings blocked
allocation on that node), we would choose one of those
nodes and force the primary shard to be allocated to it.

However, this meant that primary shard allocation
would not adhere to the decision of the MaxRetryAllocationDecider,
which would lead to attempting to allocate a shard
that has already failed N times (presumably
due to some configuration issue).

This commit solves this issue by introducing the
notion of force allocating a primary shard to a node;
each decider implementation must specify whether
this is allowed or not. In the case of MaxRetryAllocationDecider,
it just forwards the request to canAllocate.

Closes #19446
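
A minimal sketch of the idea in the description, under a heavily simplified model. The `Decision` enum, the `Decider` base class, and the `failedAttempts` and `maxRetries` parameters are hypothetical stand-ins, not the Elasticsearch types (the real method takes a ShardRouting, a RoutingNode, and a RoutingAllocation). It only illustrates how a retry-limit decider that forwards canForceAllocatePrimary to canAllocate keeps its veto, while a filter-style decider that uses the default does not.

```java
// Simplified, hypothetical model; these are not the Elasticsearch classes.
final class ForceAllocationSketch {
    enum Decision { YES, THROTTLE, NO }

    /** Stand-in for an allocation decider; shard and node arguments are omitted. */
    static abstract class Decider {
        abstract Decision canAllocate(int failedAttempts);

        // Default described in the PR: a decider allows forcing the primary.
        Decision canForceAllocatePrimary(int failedAttempts) {
            return Decision.YES;
        }
    }

    /** Settings-based filter that always rejects the node in this sketch. */
    static final class FilterDecider extends Decider {
        @Override
        Decision canAllocate(int failedAttempts) {
            return Decision.NO;
        }
    }

    /** Retry limit: forcing is subject to the same check as normal allocation. */
    static final class MaxRetryDecider extends Decider {
        private final int maxRetries; // assumed limit, passed in by the caller

        MaxRetryDecider(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        @Override
        Decision canAllocate(int failedAttempts) {
            return failedAttempts >= maxRetries ? Decision.NO : Decision.YES;
        }

        @Override
        Decision canForceAllocatePrimary(int failedAttempts) {
            return canAllocate(failedAttempts); // forwards, so the limit still applies
        }
    }

    /** Forcing is possible only if no decider vetoes it. */
    static boolean canForce(int failedAttempts, Decider... deciders) {
        for (Decider d : deciders) {
            if (d.canForceAllocatePrimary(failedAttempts) == Decision.NO) {
                return false;
            }
        }
        return true;
    }
}
```

With a `FilterDecider` and a `MaxRetryDecider(5)`, `canForce(0, ...)` is true because the filter's NO can be overridden, while `canForce(5, ...)` is false because the retry limit cannot.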

abeyad commented Aug 4, 2016

@ywelsch FYI, initial pass on this, would welcome your feedback.

@abeyad changed the title from "Primary shard allocation observes limits in forcing allocation" to "Primary shard allocator observes limits in forcing allocation" on Aug 4, 2016
public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    assert shardRouting.primary() : "must not call canForceAllocatePrimary on a non-primary shard routing [" +
        shardRouting.shardId() + "]";
    return Decision.YES;
Contributor

this means that throttling is just ignored.
Assume scenario where FilterAllocationDecider says NO and ThrottlingAllocationDecider says THROTTLE. The implementation here would just forcefully allocate the shard, ignoring the throttling.
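
The scenario in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code under review: `Answer` is a hypothetical record of what one decider returns from its normal check and from its force check, and `naiveForce` models a force path that consults only the force answers, each of which defaults to YES.

```java
// Hypothetical model of the reviewer's scenario; not Elasticsearch code.
final class ThrottleScenarioSketch {
    enum Decision { YES, THROTTLE, NO }

    /** What one decider answers for a node: normal check and force check. */
    static final class Answer {
        final Decision allocate;
        final Decision force;

        Answer(Decision allocate, Decision force) {
            this.allocate = allocate;
            this.force = force;
        }
    }

    /** Force path that looks only at the force answers. */
    static Decision naiveForce(Answer... answers) {
        for (Answer a : answers) {
            if (a.force == Decision.NO) {
                return Decision.NO;
            }
        }
        return Decision.YES;
    }

    /** True if any decider asked for throttling in its normal check. */
    static boolean throttleRequested(Answer... answers) {
        for (Answer a : answers) {
            if (a.allocate == Decision.THROTTLE) {
                return true;
            }
        }
        return false;
    }
}
```

For a filter answer of (NO, YES) and a throttling answer of (THROTTLE, YES), `naiveForce` returns YES even though `throttleRequested` is true, which is the lost throttling the comment describes.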

@abeyad force-pushed the improve-primary-allocation-retries branch from 7324b43 to 397e374 on August 8, 2016 16:48

abeyad commented Aug 8, 2016

@ywelsch I pushed 397e374 to address your feedback

/**
* Returns a {@link Decision} whether the given primary shard can be
* forcibly allocated on the given node. This method should only be called
* on nodes for which previous allocations exist for the primary shard.
Contributor

"This method should only be called on nodes for which previous allocations exist for the primary shard." should be more like "This method should only be called for unassigned primary shards where the node has a shard copy on disk."
Can you also assert shardRouting.unassigned() ?

Author

Done

ywelsch commented Aug 9, 2016

@abeyad I've left some comments. Can you also check if there are tests that FilterAllocationDecider and other deciders are indeed allowing force-allocating primaries?

I would also like to see some docs where we can explain the implemented behavior. This could be useful to understand why primary shards are / are not allocated.

Ali Beyad added 3 commits August 12, 2016 00:04
@abeyad force-pushed the improve-primary-allocation-retries branch from 397e374 to 9e47cd1 on August 12, 2016 04:48

abeyad commented Aug 12, 2016

@ywelsch I pushed 9e47cd1, which addresses your review comments, and adds PrimaryAllocationIT test that covers filter allocation decider force allocating primaries.

As we discussed, I augmented the PrimaryShardAllocator with some more javadocs to explain the behavior. Docs explaining shard allocation in general (including primary shard allocation) will come in a separate PR.

public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    assert shardRouting.primary() : "must not call canForceAllocatePrimary on a non-primary shard " + shardRouting;
    assert shardRouting.unassigned() : "must not call canForceAllocatePrimary on an assigned shard " + shardRouting;
    return Decision.YES; // by default, a decider will allow force allocation of the primary
Contributor

I think I prefer the way it was before with the default being

decision = canAllocate(...)
if (decision == Decision.NO) {
    decision = Decision.single(Type.YES, "force override of " + decision.label, ...)
}

This keeps the override logic out of AllocationDeciders.

Author

My concern with your suggestion (which was the initial approach) is that any decider that overrides canForceAllocatePrimary can easily forget to take the canAllocate decision into account. That's why I figured it would be better to invoke canAllocate in AllocationDeciders and, only if the decision is NO (as opposed to THROTTLE, for example), call the decider's canForceAllocatePrimary implementation. The only downside I can think of is that it's rigid: no canForceAllocatePrimary method would be able to override a non-NO canAllocate decision. But I don't see any reason why we would want that?
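
One possible reading of the approach described in this comment, sketched with hypothetical types. `Decider`, `decider(...)`, and `decideForce(...)` are illustrative names, and applying the fallback per decider is an assumption; the comment does not say whether the real AllocationDeciders class applies it per decider or to the combined decision.

```java
// Hypothetical sketch of a composite force check; not the merged implementation.
final class CompositeForceSketch {
    enum Decision { YES, THROTTLE, NO }

    /** Stand-in decider; the real methods also take shard, node, and allocation. */
    interface Decider {
        Decision canAllocate();
        Decision canForceAllocatePrimary();
    }

    /** Builds a decider with fixed answers, for illustration only. */
    static Decider decider(final Decision allocate, final Decision force) {
        return new Decider() {
            @Override public Decision canAllocate() { return allocate; }
            @Override public Decision canForceAllocatePrimary() { return force; }
        };
    }

    /** The force hook is consulted only after a NO from canAllocate. */
    static Decision decideForce(Decider... deciders) {
        Decision result = Decision.YES;
        for (Decider d : deciders) {
            Decision decision = d.canAllocate();
            if (decision == Decision.NO) {
                decision = d.canForceAllocatePrimary(); // may lift a NO, nothing else
            }
            if (decision == Decision.NO) {
                return Decision.NO; // a decider that refuses to force is final
            }
            if (decision == Decision.THROTTLE) {
                result = Decision.THROTTLE; // throttling survives forcing
            }
        }
        return result;
    }
}
```

In this arrangement no individual decider can forget the canAllocate result, because the composite always asks for it first. The rigidity mentioned in the comment is also visible: a decider answering THROTTLE from canAllocate is never asked whether it would allow forcing.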

Author

I pushed 5c25b15 to address this

extend TestAllocateDecision that provide the desired behavior.

abeyad commented Aug 12, 2016

@ywelsch I pushed 9446267 to remove ForcePrimaryDecider in favor of anonymous class creation that extends TestAllocateDecision

abeyad commented Aug 15, 2016

@elasticmachine retest this please


@Override
public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
    assert shardRouting.primary() : "must not call canForceAllocatePrimary on a non-primary shard [" + shardRouting.shardId() + "]";
Contributor

print shardRouting, not only shardRouting.shardId().

ywelsch commented Aug 16, 2016

@abeyad Left minor comments about docs/tests, change looks good o.w.

abeyad commented Aug 16, 2016

@ywelsch I pushed 5b536be

ywelsch commented Aug 16, 2016

LGTM. Thanks @abeyad!

abeyad commented Aug 16, 2016

Thanks for the review @ywelsch !

@abeyad merged commit 88aff40 into elastic:master on Aug 16, 2016
@abeyad deleted the improve-primary-allocation-retries branch on August 16, 2016 15:25
@lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018
Labels
:Distributed/Distributed, >enhancement, v5.0.0-beta1