-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Primary shard allocator observes limits in forcing allocation #19811
Primary shard allocator observes limits in forcing allocation #19811
Conversation
@ywelsch FYI, initial pass on this, would welcome your feedback. |
public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) { | ||
assert shardRouting.primary() : "must not call canForceAllocatePrimary on a non-primary shard routing [" + | ||
shardRouting.shardId() + "]"; | ||
return Decision.YES; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means that throttling is just ignored.
Assume scenario where FilterAllocationDecider says NO and ThrottlingAllocationDecider says THROTTLE. The implementation here would just forcefully allocate the shard, ignoring the throttling.
7324b43
to
397e374
Compare
/** | ||
* Returns a {@link Decision} whether the given primary shard can be | ||
* forcibly allocated on the given node. This method should only be called | ||
* on nodes for which previous allocations exist for the primary shard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This method should only be called on nodes for which previous allocations exist for the primary shard." should be more like "This method should only be called for unassigned primary shards where the node has a shard copy on disk."
Can you also assert shardRouting.unassigned()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@abeyad I've left some comments. Can you also check if there are tests that I would also like to see some docs where we can explain the implemented behavior. This could be useful to understand why primary shards are / are not allocated. |
Previously, during primary shards allocation of shards with prior allocation IDs, if all nodes returned a NO decision for allocation (e.g. the settings blocked allocation on that node), we would chose one of those nodes and force the primary shard to be allocated to it. However, this meant that primary shard allocation would not adhere to the decision of the MaxRetryAllocationDecider, which would lead to attempting to allocate a shard which has failed N number of times already (presumably due to some configuration issue). This commit solves this issue by introducing the notion of force allocating a primary shard to a node and each decider implementation must implement whether this is allowed or not. In the case of MaxRetryAllocationDecider, it just forwards the request to canAllocate. Closes elastic#19446
397e374
to
9e47cd1
Compare
@ywelsch I pushed 9e47cd1, which addresses your review comments, and adds As we discussed, I augmented the |
public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) { | ||
assert shardRouting.primary() : "must not call canForceAllocatePrimary on a non-primary shard " + shardRouting; | ||
assert shardRouting.unassigned() : "must not call canForceAllocatePrimary on an assigned shard " + shardRouting; | ||
return Decision.YES; // by default, a decider will allow force allocation of the primary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the way it was before with the default being
decision = canAllocate(...)
if (decision == Decision.NO) {
decision = Decision.single(Type.YES, "force override of " + decision.label, ...)
}
This keeps the override logic out of AllocationDeciders
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern with your suggestion (which was the initial approach) is that for any decider that overrides canForceAllocatePrimary
, it is easy for it to forget to take into account the decision of canAllocate
. That's why I figured it would be better to put the invocation of canAllocate
in AllocationDeciders
, then if the decision is NO
(as opposed to throttle, for example), then call the deciders canForceAllocatePrimary
implementation. The only downside to this that I can think of is that its rigid in that no canForceAllocatePrimary
method would be able to override a non-NO canAllocate
decision - but I don't see any reason why we would want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed 5c25b15 to address this
extend TestAllocateDecision that provide the desired behavior.
@elasticmachine retest this please |
|
||
@Override | ||
public Decision canForceAllocatePrimary(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) { | ||
assert shardRouting.primary() : "must not call canForceAllocatePrimary on a non-primary shard [" + shardRouting.shardId() + "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print shardRouting
, not only shardRouting.shardId()
.
@abeyad Left minor comments about docs/tests, change looks good o.w. |
LGTM. Thanks @abeyad! |
Thanks for the review @ywelsch ! |
Previously, during primary shards allocation of shards
with prior allocation IDs, if all nodes returned a
NO decision for allocation (e.g. the settings blocked
allocation on that node), we would chose one of those
nodes and force the primary shard to be allocated to it.
However, this meant that primary shard allocation
would not adhere to the decision of the MaxRetryAllocationDecider,
which would lead to attempting to allocate a shard
which has failed N number of times already (presumably
due to some configuration issue).
This commit solves this issue by introducing the
notion of force allocating a primary shard to a node
and each decider implementation must implement whether
this is allowed or not. In the case of MaxRetryAllocationDecider,
it just forwards the request to canAllocate.
Closes #19446