
Improve shard balancing #91603

Merged: 29 commits, Nov 16, 2022
Conversation

@fcofdez (Contributor) commented Nov 16, 2022:

Include shard write load and disk usage as balancing factors.

Relates #17213
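In other words, the allocator's per-node weight gains write-load and disk-usage terms alongside the existing shard-count and per-index terms. A minimal self-contained sketch of that shape (illustrative only, not the PR's exact code; the theta naming follows the excerpt later in this thread, and the write-load and disk-usage factor values are placeholders rather than the PR's defaults):

// Sketch: per-node weight as a weighted sum of normalized terms.
// Each "delta" is the node's value minus the cluster-wide average for that value,
// so a positive weight marks a node that is above average and a candidate to move shards off.
final class WeightFunctionSketch {
    final float theta0 = 0.45f;  // shard balance factor (mirrors cluster.routing.allocation.balance.shard)
    final float theta1 = 0.55f;  // index balance factor (mirrors cluster.routing.allocation.balance.index)
    final float theta2 = 10.0f;  // write-load balance factor (placeholder value)
    final float theta3 = 2e-11f; // disk-usage balance factor (placeholder value)

    float weight(float shardDelta, float indexShardDelta, float writeLoadDelta, float diskUsageDelta) {
        return theta0 * shardDelta + theta1 * indexShardDelta + theta2 * writeLoadDelta + theta3 * diskUsageDelta;
    }
}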

@fcofdez added the >enhancement, :Distributed/Allocation (All issues relating to the decision making around placing a shard, both master logic & on the nodes), Team:Distributed (Meta label for distributed team), and v8.6.0 labels on Nov 16, 2022
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine (Collaborator) commented:

Hi @fcofdez, I've created a changelog YAML for you.

if (forecastedShardSizeInBytes.isPresent()) {
    indexDiskUsageInBytes = forecastedShardSizeInBytes.getAsLong() * numberOfCopies(indexMetadata);
} else {
    long totalSizeInBytes = 0;
@fcofdez (Contributor, author) commented on this excerpt:

I'm not sure if we should fall back to the cluster info in those cases?

}

float weight(Balancer balancer, ModelNode node, String index) {
    final float weightShard = node.numShards() - balancer.avgShardsPerNode();
    final float weightIndex = node.numShards(index) - balancer.avgShardsPerNode(index);
    // TODO: can this overflow?
    final float ingestLoad = (float) (node.writeLoad() - balancer.avgWriteLoadPerNode());
    return theta0 * weightShard + theta1 * weightIndex + theta2 * ingestLoad;
@fcofdez (Contributor, author) commented on this excerpt:

We're casting from long to float here; I think it should be fine in most cases, but I just wanted to double-check.

@idegtiarenko (Contributor) commented on Nov 16, 2022:

If I am not mistaken, float covers roughly [1.175494351e-38, 3.402823466e+38]. A petabyte is about 1e15, so we should be in range for quite some time.
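A quick standalone check of that range argument (not part of the PR; the class and values are purely illustrative): a petabyte-scale long is nowhere near Float.MAX_VALUE, and what the long-to-float cast actually gives up is precision, roughly 64 MB of granularity at that magnitude, which is negligible for a balancing heuristic.

public class FloatRangeCheck {
    public static void main(String[] args) {
        long petabyte = 1_000_000_000_000_000L;   // ~1e15 bytes
        float asFloat = (float) petabyte;          // same kind of cast as in the weight function

        System.out.println(Float.MAX_VALUE);       // ~3.4028235e38, far above 1e15
        System.out.println(asFloat);               // 1.0e15

        // Range is not the problem; precision is. A float has a 24-bit mantissa, so the
        // spacing between representable values near 1e15 is about 6.7e7 bytes (~64 MB).
        System.out.println(Math.ulp(asFloat));
    }
}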

@henningandersen (Contributor) left a comment:

LGTM.

}

// TODO: Should we go through the cluster info service and compute the average in this case?
return shardCount == 0 ? 0 : (totalSizeInBytes / shardCount) * numberOfCopies(indexMetadata);
@fcofdez (Contributor, author) commented on this excerpt:

@henningandersen do you think that computing the average with the available shards and then multiplying by the number of copies is the right call here? I think it's fine in most cases, but I just want to confirm that my intuition is correct.

A reviewer (Contributor) replied:

Yeah, that sounds fine to me.

I wonder if we should also ask the cluster info though, in case there is no forecasted shard size in diskUsageInBytesPerShard? Different topic, also fine to do in a follow-up, of course.
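Putting the two code excerpts in this thread together, the estimation being discussed roughly takes the shape below. This is a sketch under assumptions, not the PR's implementation: the parameters stand in for the forecast carried on the index metadata and for whatever observed shard sizes the allocator has (for example from the cluster info service, which is exactly the fallback left as a follow-up above).

import java.util.Map;
import java.util.OptionalLong;

final class IndexDiskUsageEstimateSketch {

    // Sketch: estimate an index's total disk usage for balancing.
    // Prefer the write-time forecast; otherwise average the shard sizes we do know
    // and scale by the number of copies (primary + replicas), as agreed above.
    static long estimateIndexDiskUsageInBytes(
        OptionalLong forecastedShardSizeInBytes,
        Map<Integer, Long> knownSizePerShardId,   // hypothetical stand-in for observed sizes
        int numberOfShards,
        int numberOfCopies
    ) {
        if (forecastedShardSizeInBytes.isPresent()) {
            return forecastedShardSizeInBytes.getAsLong() * numberOfCopies;
        }
        long totalSizeInBytes = 0;
        int shardCount = 0;
        for (int shardId = 0; shardId < numberOfShards; shardId++) {
            Long sizeInBytes = knownSizePerShardId.get(shardId);
            if (sizeInBytes != null) {
                totalSizeInBytes += sizeInBytes;
                shardCount++;
            }
        }
        // Average over the shards whose size is known, then multiply by copies;
        // whether to also ask the cluster info here is the open follow-up question.
        return shardCount == 0 ? 0 : (totalSizeInBytes / shardCount) * numberOfCopies;
    }
}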

@fcofdez (Contributor, author) commented Nov 16, 2022:

@elasticmachine run elasticsearch-ci/bwc
