-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Return NOT_PREFERRED decisions in allocation explain #137228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
b03f11f to
be58da7
Compare
13322ec to
17de33c
Compare
Adds numerous NOT_PREFERRED options to allocation decision / status types. Adds NOT_PREFERRED option to AllocationDecision (resolving ES-12729). Closes ES-12833, ES-13288, ES-12729
17de33c to
4a0ee8a
Compare
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
|
Adding @DaveCTurner as a reviewer in case he wants to take a high level look for anything I'm missing -- I saw in git-blame that he wrote the Explanations.java file a while ago. But optional. |
|
I think I might be missing test coverage (if not functionality) of the allocate unassigned code path. However, I've been hacking on the rebalancing code path for a while now to get this all to work sensibly, so if agreeable I might just file another ticket to explore that. |
| .getShardAllocationDecision() | ||
| .getMoveDecision() | ||
| .getNodeDecisions(); | ||
| assertThat(canAllocateDecisions.size(), equalTo(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we give 2 a name somewhere? I assume it's {number of cluster nodes} - 1 perhaps we should make our expectedNumberOfClusterNodes explicit and assert that it's accurate after we run the other test logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, updated to indicate purpose and make dynamic 👍
| TimeValue.timeValueMillis(queueLatencyThresholdMillis) | ||
| ) | ||
| // Keep all the debug logging, no throttling of decider messages. | ||
| .put(WriteLoadConstraintSettings.WRITE_LOAD_DECIDER_MINIMUM_LOGGING_INTERVAL.getKey(), TimeValue.timeValueMinutes(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is this necessary, I don't think we're asserting on these messages anywhere are we? could also use TimeValue.ZERO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall why I added this anymore. I think perhaps for general debug purposes, in case of a test failure.
There don't appear to be any tests in this file with *Decider DEBUG logging turned on, so I'll remove it 👍
| int shard, | ||
| boolean primary, | ||
| @Nullable String currentNode | ||
| ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I don't think this is necessary, you can call
ClusterAllocationExplainRequest allocationExplainRequest =
new ClusterAllocationExplainRequest(TEST_REQUEST_TIMEOUT)
.setIndex(harness.indexName)
.setShard(0),
.setPrimary(true);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thank you! Reverted.
| } | ||
| if (allocationDecision.type().higherThan(bestDecision)) { | ||
| assert allocationDecision.type() != Type.THROTTLE | ||
| : "DesiredBalance computations run in a simulation mode and should not encounter throttling"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, how do we know we're simulating here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this as extra protection because I changed the Decision.Type enum ordering, such that the higherThan gate above would now prefer (return true for) THROTTLE over NOT_PREFERRED.
Because this is simulation, we should never see THROTTLE.
| assert decider.canRebalance(shardRouting, allocation).type() != Decision.Type.THROTTLE | ||
| : decider.getClass().getSimpleName() + " throttled unexpectedly in canRebalance"; | ||
| return decider.canRebalance(shardRouting, allocation); | ||
| }, (decider, decision) -> Strings.format("Can not rebalance [%s]. [%s]: %s", shardRouting, decider, decision)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConcurrentRebalanceAllocationDecider appears to return THROTTLE from canRebalance ?
Nit: I'd prefer if we stored the result of canRebalance then ran the assertion after it rather than running canRebalance twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, reverted 👍
| NOT_PREFERRED, | ||
| // Temporarily throttled is a better choice than choosing a not-preferred node, | ||
| // but NOT_PREFERRED and THROTTLED are generally not comparable. | ||
| THROTTLE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do this can we? It'll mean in the presence of a THROTTLE and a NOT_PREFERRED, canAllocate will return NOT_PREFERRED, which I think is bad for non-desired-balance allocation, because there we should wait for a THROTTLE to become a yes before allocating to a NOT_PREFERRED?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there's currently any direct comparison between THROTTLE and NOT_PREFERRED. But we still need to specify an ordering here, to not break how one or the other gets compared to YES and NO.
It'll mean in the presence of a THROTTLE and a NOT_PREFERRED, canAllocate will return NOT_PREFERRED
I think this would be the other way around, THROTTLE would be higher value, closer to YES, and chosen over NOT_PREFERRED. So, IIUC, we both want it the current way.
I've a little blurb for the commit message:
"A significant change is to re-order comparison of Decision.Type enum
values, such that THROTTLE is chosen over NOT_PREFERRED. Functionally
this change should not matter because simulation (DesiredBalance
computation) does not throttle and reconciliation (real shard movement)
treats not-preferred essentially as a YES: they are not compared."
| @Override | ||
| public String getExplanation() { | ||
| checkDecisionState(); | ||
| return switch (getAllocationDecision()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at AllocateUnassignedDecision#getAllocationDecision it appears as though it'll never return NOT_PREFERRED, see AllocationDecision#fromAllocationStatus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, fixed 👍
| public static final String NOT_PREFERRED = """ | ||
| Elasticsearch will not rebalance this shard to another node because all other eligible nodes have high resource usage. The \ | ||
| total cluster balance weights might improve, were the shard relocated, but it would push one resource usage dimension \ | ||
| too high and threaten performance. See the node-by-node explanation to understand what resource usage is high."""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the above description sounds over-fitted to the write-load decider. The index balance decider will also return NOT_PREFERRED right? and that has nothing to do with resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index balance decider is also interested in resources, in the sense that a spike in an index' ingest would be more safely distributed across as many nodes as possible. Those are the lines along which I was thinking.
I've tweaked the phrasing a little, but not sure if as much as desirable. If you have any suggestions, let me know.
| } | ||
| // TODO (ES-13482): clusterRebalanceDecision is set to the result of AllocationDecider#canRebalance, which does not return | ||
| // NOT_PREFERRED or THROTTLE. This switch statement, and how MoveDecision uses clusterRebalanceDecision, should be | ||
| // refactored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still does? see org/elasticsearch/cluster/routing/allocation/decider/ConcurrentRebalanceAllocationDecider.java:182
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Simon committed some changes concurrently with my PR.
But I think I was wrong, anyway: I didn't realize the complexities of canRebalance.
Removed 👍 Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, I'm finally returning to this. Fixed up the code per your feedback.
| .getShardAllocationDecision() | ||
| .getMoveDecision() | ||
| .getNodeDecisions(); | ||
| assertThat(canAllocateDecisions.size(), equalTo(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, updated to indicate purpose and make dynamic 👍
| TimeValue.timeValueMillis(queueLatencyThresholdMillis) | ||
| ) | ||
| // Keep all the debug logging, no throttling of decider messages. | ||
| .put(WriteLoadConstraintSettings.WRITE_LOAD_DECIDER_MINIMUM_LOGGING_INTERVAL.getKey(), TimeValue.timeValueMinutes(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall why I added this anymore. I think perhaps for general debug purposes, in case of a test failure.
There don't appear to be any tests in this file with *Decider DEBUG logging turned on, so I'll remove it 👍
| int shard, | ||
| boolean primary, | ||
| @Nullable String currentNode | ||
| ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thank you! Reverted.
| } | ||
| if (allocationDecision.type().higherThan(bestDecision)) { | ||
| assert allocationDecision.type() != Type.THROTTLE | ||
| : "DesiredBalance computations run in a simulation mode and should not encounter throttling"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this as extra protection because I changed the Decision.Type enum ordering, such that the higherThan gate above would now prefer (return true for) THROTTLE over NOT_PREFERRED.
Because this is simulation, we should never see THROTTLE.
| NOT_PREFERRED, | ||
| // Temporarily throttled is a better choice than choosing a not-preferred node, | ||
| // but NOT_PREFERRED and THROTTLED are generally not comparable. | ||
| THROTTLE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there's currently any direct comparison between THROTTLE and NOT_PREFERRED. But we still need to specify an ordering here, to not break how one or the other gets compared to YES and NO.
It'll mean in the presence of a THROTTLE and a NOT_PREFERRED, canAllocate will return NOT_PREFERRED
I think this would be the other way around, THROTTLE would be higher value, closer to YES, and chosen over NOT_PREFERRED. So, IIUC, we both want it the current way.
I've a little blurb for the commit message:
"A significant change is to re-order comparison of Decision.Type enum
values, such that THROTTLE is chosen over NOT_PREFERRED. Functionally
this change should not matter because simulation (DesiredBalance
computation) does not throttle and reconciliation (real shard movement)
treats not-preferred essentially as a YES: they are not compared."
| } | ||
| // TODO (ES-13482): clusterRebalanceDecision is set to the result of AllocationDecider#canRebalance, which does not return | ||
| // NOT_PREFERRED or THROTTLE. This switch statement, and how MoveDecision uses clusterRebalanceDecision, should be | ||
| // refactored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Simon committed some changes concurrently with my PR.
But I think I was wrong, anyway: I didn't realize the complexities of canRebalance.
Removed 👍 Thanks!
| assert decider.canRebalance(shardRouting, allocation).type() != Decision.Type.THROTTLE | ||
| : decider.getClass().getSimpleName() + " throttled unexpectedly in canRebalance"; | ||
| return decider.canRebalance(shardRouting, allocation); | ||
| }, (decider, decision) -> Strings.format("Can not rebalance [%s]. [%s]: %s", shardRouting, decider, decision)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, reverted 👍
| @Override | ||
| public String getExplanation() { | ||
| checkDecisionState(); | ||
| return switch (getAllocationDecision()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, fixed 👍
| public static final String NOT_PREFERRED = """ | ||
| Elasticsearch will not rebalance this shard to another node because all other eligible nodes have high resource usage. The \ | ||
| total cluster balance weights might improve, were the shard relocated, but it would push one resource usage dimension \ | ||
| too high and threaten performance. See the node-by-node explanation to understand what resource usage is high."""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index balance decider is also interested in resources, in the sense that a spike in an index' ingest would be more safely distributed across as many nodes as possible. Those are the lines along which I was thinking.
I've tweaked the phrasing a little, but not sure if as much as desirable. If you have any suggestions, let me know.
Adds numerous NOT_PREFERRED options to allocation decision / status types.
Adds NOT_PREFERRED option to AllocationDecision (resolving ES-12729).
A significant change is to re-order comparison of Decision.Type enum
values, such that THROTTLE is chosen over NOT_PREFERRED. Functionally
this change should not matter because simulation (DesiredBalance
computation) does not throttle and reconciliation (real shard movement)
treats not-preferred essentially as a YES: they are not compared.
Closes ES-12833, ES-13288, ES-12729