
Conversation

@DiannaHohensee (Contributor) commented Oct 28, 2025

Adds a NOT_PREFERRED option to numerous allocation decision / status types.

Adds a NOT_PREFERRED option to AllocationDecision (resolving ES-12729).

A significant change is to re-order comparison of Decision.Type enum
values, such that THROTTLE is chosen over NOT_PREFERRED. Functionally
this change should not matter because simulation (DesiredBalance
computation) does not throttle and reconciliation (real shard movement)
treats not-preferred essentially as a YES: they are not compared.

Closes ES-12833, ES-13288, ES-12729
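
For illustration, a minimal sketch of the ordering described above (not the actual Elasticsearch source; the ordinal-based higherThan shown here is an assumption made only for this sketch):

// Sketch only: NO < NOT_PREFERRED < THROTTLE < YES, per the description above.
public enum Type {
    NO,
    NOT_PREFERRED,
    THROTTLE,
    YES;

    // Assumes "higher" follows declaration order; the real implementation may differ.
    public boolean higherThan(Type other) {
        return this.ordinal() > other.ordinal();
    }
}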

@DiannaHohensee DiannaHohensee self-assigned this Oct 28, 2025
@DiannaHohensee DiannaHohensee force-pushed the 2025/10/23/ES-12833 branch 4 times, most recently from b03f11f to be58da7 on October 30, 2025 20:21
@DiannaHohensee DiannaHohensee force-pushed the 2025/10/23/ES-12833 branch 6 times, most recently from 13322ec to 17de33c on November 8, 2025 02:07
Adds numerous NOT_PREFERRED options
to allocation decision / status types.

Adds NOT_PREFERRED option to
AllocationDecision (resolving ES-12729).

Closes ES-12833, ES-13288, ES-12729
@DiannaHohensee DiannaHohensee changed the title from "[DRAFT] Return NOT_PREFERRED decisions in allocation explain" to "Return NOT_PREFERRED decisions in allocation explain" Nov 8, 2025
@DiannaHohensee DiannaHohensee marked this pull request as ready for review November 8, 2025 02:28
@elasticsearchmachine elasticsearchmachine added the needs:triage (Requires assignment of a team area label) label Nov 8, 2025
@DiannaHohensee DiannaHohensee added the >non-issue, :Distributed Coordination/Allocation (All issues relating to the decision making around placing a shard, both master logic & on the nodes), :Distributed Coordination/Distributed (A catch all label for anything in the Distributed Coordination area. Please avoid if you can.), and Team:Distributed Coordination (Meta label for Distributed Coordination team) labels, and removed the needs:triage (Requires assignment of a team area label) and :Distributed Coordination/Distributed labels Nov 8, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@DiannaHohensee (Contributor Author) commented Nov 8, 2025

Adding @DaveCTurner as a reviewer in case he wants to take a high-level look for anything I'm missing; I saw in git-blame that he wrote the Explanations.java file a while ago. But it's optional.

@DiannaHohensee (Contributor Author) commented Nov 8, 2025

I think I might be missing test coverage (if not functionality) of the allocate-unassigned code path. However, I've been hacking on the rebalancing code path for a while now to get this all to work sensibly, so, if agreeable, I might just file another ticket to explore that.

.getShardAllocationDecision()
.getMoveDecision()
.getNodeDecisions();
assertThat(canAllocateDecisions.size(), equalTo(2));
Contributor:

Can we give 2 a name somewhere? I assume it's {number of cluster nodes} - 1. Perhaps we should make our expectedNumberOfClusterNodes explicit and assert that it's accurate after we run the other test logic?

Contributor Author:

Yep, updated to indicate purpose and make dynamic 👍
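
Something along these lines would make the expectation explicit (a sketch with hypothetical names; the actual test may derive the node count differently):

// Hypothetical sketch: derive the expected count from the cluster instead of hard-coding 2.
final int expectedNumberOfClusterNodes = internalCluster().numDataNodes();
// The shard's current node is not a move target, hence the -1.
assertThat(canAllocateDecisions.size(), equalTo(expectedNumberOfClusterNodes - 1));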

TimeValue.timeValueMillis(queueLatencyThresholdMillis)
)
// Keep all the debug logging, no throttling of decider messages.
.put(WriteLoadConstraintSettings.WRITE_LOAD_DECIDER_MINIMUM_LOGGING_INTERVAL.getKey(), TimeValue.timeValueMinutes(0))
Contributor:

Nit: is this necessary? I don't think we're asserting on these messages anywhere, are we? Could also use TimeValue.ZERO.

Contributor Author:

I no longer recall why I added this. I think it was perhaps for general debug purposes, in case of a test failure.

There don't appear to be any tests in this file with *Decider DEBUG logging turned on, so I'll remove it 👍

int shard,
boolean primary,
@Nullable String currentNode
) {
Contributor:

Nit: I don't think this is necessary; you can call

ClusterAllocationExplainRequest allocationExplainRequest =
    new ClusterAllocationExplainRequest(TEST_REQUEST_TIMEOUT)
        .setIndex(harness.indexName)
        .setShard(0)
        .setPrimary(true);

Contributor Author:

Ah, thank you! Reverted.

}
if (allocationDecision.type().higherThan(bestDecision)) {
assert allocationDecision.type() != Type.THROTTLE
: "DesiredBalance computations run in a simulation mode and should not encounter throttling";
Contributor:

Sorry, how do we know we're simulating here?

Contributor Author:

I added this as extra protection because I changed the Decision.Type enum ordering, such that the higherThan gate above would now prefer (return true for) THROTTLE over NOT_PREFERRED.

Because this is simulation, we should never see THROTTLE.

assert decider.canRebalance(shardRouting, allocation).type() != Decision.Type.THROTTLE
: decider.getClass().getSimpleName() + " throttled unexpectedly in canRebalance";
return decider.canRebalance(shardRouting, allocation);
}, (decider, decision) -> Strings.format("Can not rebalance [%s]. [%s]: %s", shardRouting, decider, decision));
Contributor:

ConcurrentRebalanceAllocationDecider appears to return THROTTLE from canRebalance?

Nit: I'd prefer if we stored the result of canRebalance and ran the assertion on it, rather than running canRebalance twice.
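
A sketch of the suggested restructuring, with behavior unchanged apart from avoiding the duplicate call:

// Store the decision once, assert on it, then return it.
Decision canRebalanceDecision = decider.canRebalance(shardRouting, allocation);
assert canRebalanceDecision.type() != Decision.Type.THROTTLE
    : decider.getClass().getSimpleName() + " throttled unexpectedly in canRebalance";
return canRebalanceDecision;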

Contributor Author:

You're right, reverted 👍

NOT_PREFERRED,
// Temporarily throttled is a better choice than choosing a not-preferred node,
// but NOT_PREFERRED and THROTTLE are generally not comparable.
THROTTLE,
Contributor:

I don't think we can do this, can we? It'll mean in the presence of a THROTTLE and a NOT_PREFERRED, canAllocate will return NOT_PREFERRED, which I think is bad for non-desired-balance allocation, because there we should wait for a THROTTLE to become a YES before allocating to a NOT_PREFERRED.

Contributor Author:

I don't believe there's currently any direct comparison between THROTTLE and NOT_PREFERRED. But we still need to specify an ordering here, to not break how one or the other gets compared to YES and NO.

"It'll mean in the presence of a THROTTLE and a NOT_PREFERRED, canAllocate will return NOT_PREFERRED"

I think it would be the other way around: THROTTLE would be the higher value, closer to YES, and chosen over NOT_PREFERRED. So, IIUC, we both want it the current way.

I've a little blurb for the commit message:

"A significant change is to re-order comparison of Decision.Type enum
values, such that THROTTLE is chosen over NOT_PREFERRED. Functionally
this change should not matter because simulation (DesiredBalance
computation) does not throttle and reconciliation (real shard movement)
treats not-preferred essentially as a YES: they are not compared."
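
Concretely, under the ordering quoted above, pairwise comparisons with the higherThan method seen in the diff would behave like this (a sketch, assuming the ordering is exactly as described):

// Illustrative only: what the re-ordering implies for pairwise comparisons.
assert Decision.Type.YES.higherThan(Decision.Type.THROTTLE);
assert Decision.Type.THROTTLE.higherThan(Decision.Type.NOT_PREFERRED);
assert Decision.Type.NOT_PREFERRED.higherThan(Decision.Type.NO);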

@Override
public String getExplanation() {
checkDecisionState();
return switch (getAllocationDecision()) {
Contributor:

Looking at AllocateUnassignedDecision#getAllocationDecision, it appears as though it'll never return NOT_PREFERRED; see AllocationDecision#fromAllocationStatus.

Contributor Author:

You're right, fixed 👍

public static final String NOT_PREFERRED = """
Elasticsearch will not rebalance this shard to another node because all other eligible nodes have high resource usage. The \
total cluster balance weights might improve, were the shard relocated, but it would push one resource usage dimension \
too high and threaten performance. See the node-by-node explanation to understand what resource usage is high.""";
Contributor:

This and the above description sound over-fitted to the write-load decider. The index balance decider will also return NOT_PREFERRED, right? And that has nothing to do with resources?

Contributor Author:

The index balance decider is also interested in resources, in the sense that a spike in an index's ingest would be more safely distributed across as many nodes as possible. Those are the lines along which I was thinking.

I've tweaked the phrasing a little, but I'm not sure it's as much as desirable. If you have any suggestions, let me know.

}
// TODO (ES-13482): clusterRebalanceDecision is set to the result of AllocationDecider#canRebalance, which does not return
// NOT_PREFERRED or THROTTLE. This switch statement, and how MoveDecision uses clusterRebalanceDecision, should be
// refactored.
Contributor:

I think it still does? See org/elasticsearch/cluster/routing/allocation/decider/ConcurrentRebalanceAllocationDecider.java:182.

Contributor Author:

Looks like Simon committed some changes concurrently with my PR.

But I think I was wrong, anyway: I didn't realize the complexities of canRebalance.

Removed 👍 Thanks!

@DiannaHohensee DiannaHohensee removed the request for review from DaveCTurner November 20, 2025 19:58
@DiannaHohensee (Contributor Author) left a comment:

Thanks for the review, I'm finally returning to this. Fixed up the code per your feedback.
