Balancer changes to use Decision#NOT_PREFERRED #134160
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
  RoutingNode routingNode = sourceNode.getRoutingNode();
  Decision canRemain = allocation.deciders().canRemain(shardRouting, routingNode, allocation);
- if (canRemain.type() != Decision.Type.NO) {
+ if (canRemain.type() != Decision.Type.NO && canRemain.type() != Decision.Type.NOT_PREFERRED) {
Originally this was an early return whenever canRemain was not NO. Excluding NOT_PREFERRED from the early return means we go on to try to move the shard.
I wonder if we want to keep this as-is, pending the introduction of the proposed moveNotPreferred phase. For example, if a node is hot-spotting and all its shards are returning NOT_PREFERRED, we probably want to delay dealing with those until moveNotPreferred, when we'll move them in preferential order.
I have a PR for moveNotPreferred which I'll put up for review shortly to get feedback.
All of our work is feature gated, so in that respect I'm not worried about waiting for other code first. I can't test without this change: I've got the canRemain work done, waiting on the PR so I can rebase before publishing the work. You will also be able to take advantage of the testing / functionality, once your feature is in place, however it turns out, so that might be appealing. This logic can be changed easily since it's a line of code.
If you're comfortable with that, I'd like to go ahead with getting the dumb case (of picking any shard) working, so we don't bottleneck work. I was actually expecting moveNotPreferred to run before moveShards. In that case, though, we would not actually exercise this check. We might even turn this into an assert that not-preferred never occurs.
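For illustration, here is a minimal, self-contained sketch of the early return being discussed. The names are simplified stand-ins, not the actual BalancedShardsAllocator code: canRemain answers of YES or THROTTLE leave the shard in place, while NO and (with this change) NOT_PREFERRED fall through to the move attempt.

```java
// Illustrative sketch only: simplified stand-ins for Decision.Type and the moveShards early return
// under discussion; names and structure are hypothetical, not the actual BalancedShardsAllocator code.
enum CanRemainType { YES, THROTTLE, NOT_PREFERRED, NO }

final class MoveShardsSketch {
    // True when moveShards should leave the shard where it is.
    static boolean leaveInPlace(CanRemainType canRemain) {
        // Before the change: early return for anything that is not NO.
        // After the change: NOT_PREFERRED also falls through, so we go on and try to move the shard.
        return canRemain != CanRemainType.NO && canRemain != CanRemainType.NOT_PREFERRED;
    }

    public static void main(String[] args) {
        for (CanRemainType canRemain : CanRemainType.values()) {
            System.out.println(canRemain + " -> " + (leaveInPlace(canRemain) ? "leave in place" : "try to move"));
        }
    }
}
```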
LGTM with nits
} else if (bestDecision == Type.NOT_PREFERRED) {
    assert remainDecision.type() != Type.NOT_PREFERRED;
    assert bestDecision != Type.YES;
    // If we don't ever find a YES decision, we'll settle for NOT_PREFERRED as preferable to NO.
    targetNode = target;
If you change Type bestDecision = Type.NO; to Type bestDecision = remainDecision.type(); then if (allocationDecision.type().higherThan(bestDecision)) { will work for all cases without new if conditions.
When both source and target are NOT_PREFERRED, allocationDecision.type().higherThan(bestDecision) returns false, so the first condition, if (allocationDecision.type() == Type.NOT_PREFERRED && remainDecision.type() == Type.NOT_PREFERRED) {, is redundant.
Basically only a one-line change is needed: Type bestDecision = remainDecision.type();
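To make the suggestion concrete, here is a self-contained sketch of seeding bestDecision with the canRemain decision. DecisionType, higherThan() and the candidate loop are simplified stand-ins rather than the real Decision.Type API, and THROTTLE is omitted for brevity.

```java
// Illustrative sketch only: DecisionType, higherThan() and the candidate loop are simplified
// stand-ins for the logic under discussion, not the real Decision.Type API (THROTTLE omitted).
import java.util.List;

final class BestTargetSketch {
    enum DecisionType {
        NO, NOT_PREFERRED, YES;

        boolean higherThan(DecisionType other) {
            return this.ordinal() > other.ordinal();
        }
    }

    record Candidate(String nodeId, DecisionType allocationDecision) {}

    // Seeding bestDecision with the canRemain decision (instead of NO) means a candidate node only
    // wins when it is strictly better than staying put. When both source and target are
    // NOT_PREFERRED, higherThan() returns false, so no extra NOT_PREFERRED check is needed.
    static Candidate pickBestTarget(DecisionType remainDecision, List<Candidate> candidates) {
        DecisionType bestDecision = remainDecision;
        Candidate targetNode = null;
        for (Candidate candidate : candidates) {
            if (candidate.allocationDecision().higherThan(bestDecision)) {
                bestDecision = candidate.allocationDecision();
                targetNode = candidate;
            }
        }
        return targetNode;
    }

    public static void main(String[] args) {
        // Source node says NOT_PREFERRED; only a strictly better (YES) target is selected.
        List<Candidate> candidates = List.of(
            new Candidate("node-2", DecisionType.NOT_PREFERRED),
            new Candidate("node-3", DecisionType.YES)
        );
        System.out.println(pickBestTarget(DecisionType.NOT_PREFERRED, candidates)); // node-3 wins
    }
}
```

With this seeding, a candidate replaces the current best only when it is strictly better than staying put, which covers the source-and-target-both-NOT_PREFERRED case without an extra condition.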
That is clever :) I'd worry about clarity, though: implicit logic like that can make the code fragile.
IIUC, the method could then return NOT_PREFERRED as the bestDecision, which is then treated as YES in the calling code but has a null target node. Or canRemain could be NOT_PREFERRED while all the options are NO, and we'd return NOT_PREFERRED, which could be overridden (except there's no target node). These may or may not have consequences; I can't say without experimentation. But my primary concern is having too much implicit logic that's easy to break in the future.
I think that's how it was supposed to be in the first place, before your change; then you wouldn't need to change anything. The whole method simply asks whether there is a better decision than the current one (remainDecision), so it should start with best = remainDecision, not NO.
It's a nit comment. I don't find the explicitness here helps with clarity; rather, it's a redundancy that takes a few extra moments to understand what's going on and why we need the extra ifs.
Strings.format(
    "Shard [%s] in index [%s] can be assigned to node [%s]. The node's utilization would become [%s]",
    shardRouting.shardId(),
    shardRouting.index(),
    node.nodeId(),
    newWriteThreadPoolUtilization
)
Other deciders tend to use String explanation = Strings.format(...) and pass it to both the logger and allocation.decision. DRY :D
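A small, self-contained illustration of that pattern; the names here are stand-ins, not the real decider or RoutingAllocation API: build the explanation once and reuse it for both the log line and the returned decision.

```java
// Self-contained illustration of the DRY pattern suggested above: build the explanation string once
// and reuse it for both the log line and the returned decision. Names here are stand-ins, not the
// real decider / RoutingAllocation API.
final class ExplanationReuseSketch {
    enum DecisionType { YES, NOT_PREFERRED, NO }

    record Decision(DecisionType type, String explanation) {}

    static Decision canAllocateSketch(String shardId, String index, String nodeId, double utilization) {
        String explanation = String.format(
            "Shard [%s] in index [%s] can be assigned to node [%s]. The node's utilization would become [%s]",
            shardId, index, nodeId, utilization
        );
        System.out.println(explanation);                     // stand-in for logger.debug(explanation)
        return new Decision(DecisionType.YES, explanation);  // stand-in for allocation.decision(Decision.YES, NAME, explanation)
    }

    public static void main(String[] args) {
        System.out.println(canAllocateSketch("[logs][0]", "logs", "node-2", 0.45).type());
    }
}
```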
Yep, done 👍
Thanks for the review. Updated.
Looking good, only real concern is the treatment of NOT_PREFERRED in canMove. I will tidy up what I have for moveNotPreferred and put it up today to add context to my comment.
if (canRemainDecision.type() != Decision.Type.NO && canRemainDecision.type() != Decision.Type.NOT_PREFERRED) {
    // If movement is throttled, a future reconciliation round will see a resolution. For now, leave it alone.
    // Reconciliation treats canRemain NOT_PREFERRED answers as YES because the DesiredBalance computation already decided
    // how to handle the situation.
This will mean we move NO and NOT_PREFERRED shards with the same priority in the reconciler. Not obviously wrong to me, but I wonder if we want to do NO first, then NOT_PREFERRED? Seems easier to treat them as the same for now.
I was actually thinking the other way: prioritizing moves that address hot-spots would be more ideal, since that addresses a performance problem. Though actually, that would deprioritize shutdown moves. But then again, a timeout during shutdown is often because something other than allocation is going on.
This bit of code, though, doesn't control ordering -- that would have to be a new feature that organizes shard selection based on NO vs NOT_PREFERRED, probably hard -- rather it's an early exit when canRemain says YES or THROTTLE.
But yeah, perhaps we'll see a motivation later for something fancier.
It does control ordering in that shards moved in this phase will consume the limited incoming/outgoing recovery slots, right? So shards eligible for movement in this phase will be prioritised before undesired allocations that are only eligible for movement in the balance() phase?
That said, it probably makes sense to prioritise NOT_PREFERRED before merely undesired allocations.
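Purely to illustrate the ordering question (this is not what the PR implements, and all names below are invented for the example): if the reconciler ever wanted to move canRemain NO shards before NOT_PREFERRED ones, or the reverse, one simple approach would be sorting the movable shards by their canRemain decision before the limited recovery slots are consumed.

```java
// Hypothetical illustration of the ordering question discussed above; not the PR's behaviour.
// Sorting movable shards by their canRemain decision lets NO ("must move") shards claim the
// limited recovery slots before NOT_PREFERRED ("hot-spot relief") shards, or vice versa if the
// comparator is reversed.
import java.util.Comparator;
import java.util.List;

final class MoveOrderingSketch {
    enum CanRemain { NO, NOT_PREFERRED }

    record MovableShard(String shardId, CanRemain canRemain) {}

    static List<MovableShard> orderForMove(List<MovableShard> movable) {
        // NO first, then NOT_PREFERRED; flip the comparator to prefer hot-spot relief instead.
        return movable.stream()
            .sorted(Comparator.comparing(MovableShard::canRemain))
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(orderForMove(List.of(
            new MovableShard("[idx][1]", CanRemain.NOT_PREFERRED),
            new MovableShard("[idx][0]", CanRemain.NO)
        )));
    }
}
```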
 * off of Node1 while Node2 and Node3 are hot-spotting, resulting in overriding not-preferred and relocating shards to Node2 and Node3.
 */
public void testShardsAreAssignedToNotPreferredWhenAlternativeIsNo() {
    TestHarness harness = setUpThreeTestNodesAndAllIndexShardsOnFirstNode();
This test feels more like a test of the balancer logic than of the write load constraint decider specifically. I wonder if it would be less verbose to do this with dummy deciders whose canRemain/canAllocate values you can control directly, rather than creating the conditions where WriteLoadConstraintDecider returns NO and NOT_PREFERRED? Perhaps there are already similar tests for the existing balancer logic?
I might have missed something though.
This test feels more like a test for the balancer logic than the write load constraint decider specifically.
Hmm, I agree with you. Though this is the easiest place for me to write the test, since I have the shared setup logic already: it'd be duplicative otherwise. I'm inclined to let it slide because we typically aren't that concerned where we test features, and this behavior is very relevant to this decider. Perhaps I've phrased the test as more generic than it actually is: there's a good deal of write load decider specifics.
I wonder if it would be less verbose to do this by putting in dummy deciders which you can directly control the canRemain/canAllocate values for, rather than creating the conditions where WriteLoadConstraintDecider returns NO and NOT_PREFERRED ? Perhaps there are already similar tests for testing the existing balancer logic?
This does test the write load decider decisions in particular with the thresholds and such. So my first thought is that we would lose WriteLoadDecider test coverage by using dummy deciders. Perhaps what you suggest with dummy deciders would be a fitting balancer unit test, rather than a decider integration test.
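For what the dummy-decider approach might look like, here is a rough, self-contained sketch; the types are simplified stand-ins, not the actual AllocationDecider API (whose override signatures should be taken from the branch under test): canRemain/canAllocate answers are fixed per node, so the balancer paths can be exercised without recreating the write-load thresholds.

```java
// Rough sketch of the "dummy decider" idea for a balancer-focused unit test. The types below are
// simplified stand-ins, not the real AllocationDecider API: decisions are fixed per node so the
// balancer logic can be exercised without recreating write-load decider conditions.
import java.util.Map;

final class FixedDecisionDeciderSketch {
    enum DecisionType { YES, THROTTLE, NOT_PREFERRED, NO }

    record FixedDecisions(Map<String, DecisionType> canRemainByNode, Map<String, DecisionType> canAllocateByNode) {
        DecisionType canRemain(String nodeId) {
            return canRemainByNode.getOrDefault(nodeId, DecisionType.YES);
        }

        DecisionType canAllocate(String nodeId) {
            return canAllocateByNode.getOrDefault(nodeId, DecisionType.YES);
        }
    }

    public static void main(String[] args) {
        // Example setup mirroring the test scenario: shards cannot remain on node-1, and the only
        // alternative targets answer NOT_PREFERRED.
        FixedDecisions decider = new FixedDecisions(
            Map.of("node-1", DecisionType.NO),
            Map.of("node-2", DecisionType.NOT_PREFERRED, "node-3", DecisionType.NOT_PREFERRED)
        );
        System.out.println(decider.canRemain("node-1") + " / " + decider.canAllocate("node-2"));
    }
}
```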
Thanks for the review, Nick. I've responded to the comment threads, let me know what you think.
LGTM
Closes ES-12716
I've left some TODOs pointing to tickets I've filed as a result of code exploration, particularly to test some balancer changes I'm making for canRemain response handling.
In case it makes reviewing the test file changes easier, I created a branch without the new test, so you can see the test refactor changes separately from the new test -- the new test makes the diff harder to read, unfortunately.