Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
88fac67
wip
DiannaHohensee Sep 2, 2025
c9c2645
comment cleanup
DiannaHohensee Sep 3, 2025
0f837c5
change for rebalancing, skip if not-preferred
DiannaHohensee Sep 4, 2025
4fef486
change write load decider to not-preferred answers; handle not-prefer…
DiannaHohensee Sep 4, 2025
e3da024
test code refactor, plus testing todos
DiannaHohensee Sep 5, 2025
a07e654
test improvements; obviate THROTTLE handling issue by overriding conc…
DiannaHohensee Sep 6, 2025
3c16fa0
bit of write load decider explanation / logging change
DiannaHohensee Sep 6, 2025
5c04938
second test working; add reconciliation handling of NOT_PREFERRED, in…
DiannaHohensee Sep 9, 2025
89e2dd0
fix test file after force rebase
DiannaHohensee Sep 9, 2025
2621ac1
tidy up test file
DiannaHohensee Sep 9, 2025
d04e073
improve test change readability
DiannaHohensee Sep 9, 2025
515175d
improve test change readability again
DiannaHohensee Sep 9, 2025
0aba59a
return log level after debugging
DiannaHohensee Sep 9, 2025
b609df8
[CI] Auto commit changes from spotless
Sep 9, 2025
8dd940c
add an assert for protection
DiannaHohensee Sep 9, 2025
fd217db
remove todo, consider canRemain not-preferred as YES during reconcili…
DiannaHohensee Sep 9, 2025
18fa1f2
improve string use, remove duplicate code
DiannaHohensee Sep 10, 2025
f14e4f1
Merge branch 'main' into 2025/09/02/ES-12716-balancer
DiannaHohensee Sep 10, 2025
f36746b
Merge branch 'main' into 2025/09/02/ES-12716-balancer
DiannaHohensee Sep 12, 2025
d048da0
remove redundant assert
DiannaHohensee Sep 12, 2025
c10f655
Merge branch 'main' into 2025/09/02/ES-12716-balancer
DiannaHohensee Sep 23, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,7 @@ public Decision getCanRemainDecision() {
* the result of this method is meaningless, as no rebalance decision was taken. If {@link #isDecisionTaken()}
* returns {@code false}, then invoking this method will throw an {@code IllegalStateException}.
*/
// @VisibleForTesting
public boolean canRebalanceCluster() {
checkDecisionState();
return clusterRebalanceDecision != null && clusterRebalanceDecision.type() == Type.YES;
Expand All @@ -192,6 +193,7 @@ public boolean canRebalanceCluster() {
* If {@link #isDecisionTaken()} returns {@code false}, then invoking this method will throw an
* {@code IllegalStateException}.
*/
// @VisibleForTesting
@Nullable
public Decision getClusterRebalanceDecision() {
checkDecisionState();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -861,7 +861,7 @@ public MoveDecision decideMove(final ProjectIndex index, final ShardRouting shar
assert sourceNode != null && sourceNode.containsShard(index, shardRouting);
RoutingNode routingNode = sourceNode.getRoutingNode();
Decision canRemain = allocation.deciders().canRemain(shardRouting, routingNode, allocation);
if (canRemain.type() != Decision.Type.NO) {
if (canRemain.type() != Decision.Type.NO && canRemain.type() != Decision.Type.NOT_PREFERRED) {
Copy link
Contributor Author

@DiannaHohensee DiannaHohensee Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally an early return if not NO for canRemain. Excluding NOT_PREFERRED from early return, so we'll go on to try to move it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we want to keep this as-is, pending the introduction of the proposed moveNotPreferred phase. For example if a node is hot spotting and all its shards are returning NOT_PREFERRED, we probably want to delay dealing with those until moveNotPreferred when we'll move them in preferential order.

I have a PR for moveNotPreferred which I'll put up for review shortly to get feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of our work is feature gated, so in that respect I'm not worried about waiting for other code first. I can't test without this change: I've got the canRemain work done, waiting on the PR so I can rebase before publishing the work. You will also be able to take advantage of the testing / functionality, once your feature is in place, however it turns out, so that might be appealing. This logic can be changed easily since it's a line of code.

If you're comfortable with that, I'd like to go ahead with getting the dumb case (of picking any shard) working, so we don't bottleneck work. I was actually expecting moveNotPreferred to run before moveShards. In that case, though, we would not actually exercise this check. We might even turn this into an assert that not-preferred never occurs.

return MoveDecision.remain(canRemain);
}

Expand Down Expand Up @@ -901,7 +901,11 @@ private MoveDecision decideMove(
if (explain) {
nodeResults.add(new NodeAllocationResult(currentNode.getRoutingNode().node(), allocationDecision, ++weightRanking));
}
// TODO maybe we can respect throttling here too?
// TODO (ES-12633): test that nothing moves when the source is not-preferred and the target is not-preferred.
if (allocationDecision.type() == Type.NOT_PREFERRED && remainDecision.type() == Type.NOT_PREFERRED) {
// Relocating a shard from one NOT_PREFERRED node to another would not improve the situation.
continue;
}
if (allocationDecision.type().higherThan(bestDecision)) {
bestDecision = allocationDecision.type();
if (bestDecision == Type.YES) {
Expand All @@ -911,6 +915,10 @@ private MoveDecision decideMove(
// no need to continue iterating
break;
}
} else if (bestDecision == Type.NOT_PREFERRED) {
assert remainDecision.type() != Type.NOT_PREFERRED;
// If we don't ever find a YES decision, we'll settle for NOT_PREFERRED as preferable to NO.
targetNode = target;
}
}
}
Expand Down Expand Up @@ -1221,7 +1229,7 @@ private boolean tryRelocateShard(ModelNode minNode, ModelNode maxNode, ProjectIn
continue;
}
final Decision allocationDecision = deciders.canAllocate(shard, minNode.getRoutingNode(), allocation);
if (allocationDecision.type() == Type.NO) {
if (allocationDecision.type() == Type.NO || allocationDecision.type() == Type.NOT_PREFERRED) {
continue;
}

Expand Down Expand Up @@ -1407,7 +1415,7 @@ public boolean containsShard(ShardRouting shard) {
public static final class NodeSorter extends IntroSorter {

final ModelNode[] modelNodes;
/* the nodes weights with respect to the current weight function / index */
/** The nodes weights with respect to the current weight function / index */
final float[] weights;
private final WeightFunction function;
private ProjectIndex index;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -503,9 +503,10 @@ private void moveShards() {

final var routingNode = routingNodes.node(shardRouting.currentNodeId());
final var canRemainDecision = allocation.deciders().canRemain(shardRouting, routingNode, allocation);
if (canRemainDecision.type() != Decision.Type.NO) {
// it's desired elsewhere but technically it can remain on its current node. Defer its movement until later on to give
// priority to shards that _must_ move.
if (canRemainDecision.type() != Decision.Type.NO && canRemainDecision.type() != Decision.Type.NOT_PREFERRED) {
// If movement is throttled, a future reconciliation round will see a resolution. For now, leave it alone.
// Reconciliation treats canRemain NOT_PREFERRED answers as YES because the DesiredBalance computation already decided
// how to handle the situation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will mean we move NO and NOT_PREFERRED shards with the same priority in the reconciler. Not obviously wrong to me, but I wonder if we want to do NO first then NOT_PREFERRED? Seems easier to treat them as the same for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking the other way, prioritizing moves to address hot-spots first, would be more ideal since it addresses a performance problem. Though actually, that would deprioritize shutdown moves.. But then again, timeout during shutdown is often because something else is going on than allocation.

This bit of code, though, doesn't control ordering -- that would have to be a new feature in the code to organize shard selection based on NO vs NOT_PREFERRED, probably hard -- rather it's an early exit if canRemain say YES or THROTTLE.

But yeah, perhaps we'll see a motivation later for something fancier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does control ordering in that shards moved in this phase will consume limited incoming/outgoing recovery slots, right? so shards eligible for movement in this phase will be prioritised before undesired allocations eligible only for movement in the balance() phase?

In saying that it probably makes sense to prioritise NOT_PREFERRED before merely undesired allocations.

continue;
}

Expand Down Expand Up @@ -650,6 +651,7 @@ private DiscoveryNode findRelocationTarget(
Set<String> desiredNodeIds,
BiFunction<ShardRouting, RoutingNode, Decision> canAllocateDecider
) {
DiscoveryNode chosenNode = null;
for (final var nodeId : desiredNodeIds) {
// TODO consider ignored nodes here too?
if (nodeId.equals(shardRouting.currentNodeId())) {
Expand All @@ -661,12 +663,24 @@ private DiscoveryNode findRelocationTarget(
}
final var decision = canAllocateDecider.apply(shardRouting, node);
logger.trace("relocate {} to {}: {}", shardRouting, nodeId, decision);

// Assign shards to the YES nodes first. This way we might delay moving shards to NOT_PREFERRED nodes until after shards are
// first moved away. The DesiredBalance could be moving shards away from a hot node as well as moving shards to it, and it's
// better to offload shards first.
if (decision.type() == Decision.Type.YES) {
return node.node();
chosenNode = node.node();
// As soon as we get any YES, we return it.
break;
} else if (decision.type() == Decision.Type.NOT_PREFERRED && chosenNode == null) {
// If the best answer is not-preferred, then the shard will still be assigned. It is okay to assign to a not-preferred
// node because the desired balance computation had a reason to override it: when there aren't any better nodes to
// choose and the shard cannot remain where it is, we accept not-preferred. NOT_PREFERRED is essentially a YES for
// reconciliation.
chosenNode = node.node();
}
}

return null;
return chosenNode;
}

private Decision decideCanAllocate(ShardRouting shardRouting, RoutingNode target) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,10 +73,11 @@ public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, Routing
shardRouting.shardId()
);
logger.debug(explain);
return Decision.single(Decision.Type.NO, NAME, explain);
return Decision.single(Decision.Type.NOT_PREFERRED, NAME, explain);
}

if (calculateShardMovementChange(nodeWriteThreadPoolStats, shardWriteLoad) >= nodeWriteThreadPoolLoadThreshold) {
var newWriteThreadPoolUtilization = calculateShardMovementChange(nodeWriteThreadPoolStats, shardWriteLoad);
if (newWriteThreadPoolUtilization >= nodeWriteThreadPoolLoadThreshold) {
// The node's write thread pool usage would be raised above the high utilization threshold with assignment of the new shard.
// This could lead to a hot spot on this node and is undesirable.
String explain = Strings.format(
Expand All @@ -92,10 +93,22 @@ public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, Routing
nodeWriteThreadPoolStats.totalThreadPoolThreads()
);
logger.debug(explain);
return Decision.single(Decision.Type.NO, NAME, explain);
return Decision.single(Decision.Type.NOT_PREFERRED, NAME, explain);
}

return Decision.YES;
String explanation = Strings.format(
"Shard [%s] in index [%s] can be assigned to node [%s]. The node's utilization would become [%s]",
shardRouting.shardId(),
shardRouting.index(),
node.nodeId(),
newWriteThreadPoolUtilization
);

if (logger.isTraceEnabled()) {
logger.trace(explanation);
}

return allocation.decision(Decision.YES, NAME, explanation);
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,7 @@ public void testWriteLoadDeciderCanAllocate() {
);
assertEquals(
"Assigning a new shard to a node that is above the threshold should fail",
Decision.Type.NO,
Decision.Type.NOT_PREFERRED,
writeLoadDecider.canAllocate(
testHarness.shardRouting2,
testHarness.exceedingThresholdRoutingNode,
Expand All @@ -128,7 +128,7 @@ public void testWriteLoadDeciderCanAllocate() {
);
assertEquals(
"Assigning a new shard that would cause the node to exceed capacity should fail",
Decision.Type.NO,
Decision.Type.NOT_PREFERRED,
writeLoadDecider.canAllocate(testHarness.shardRouting1, testHarness.nearThresholdRoutingNode, testHarness.routingAllocation)
.type()
);
Expand Down