
Improve shard balancing #91603

Merged: 29 commits, Nov 16, 2022
Conversation

@fcofdez (Contributor) commented Nov 16, 2022:

Include shard write load and disk usage as balancing factors.

Relates #17213
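In other words, the allocator's per-node weight gains write-load and disk-usage terms alongside the existing shard-count and per-index terms. A minimal self-contained sketch of that shape (illustrative only, not the PR's exact code; the theta naming follows the excerpt later in this thread, and the write-load and disk-usage factor values are placeholders rather than the PR's defaults):

// Sketch: per-node weight as a weighted sum of normalized terms.
// Each "delta" is the node's value minus the cluster-wide average for that value,
// so a positive weight marks a node that is above average and a candidate to move shards off.
final class WeightFunctionSketch {
    final float theta0 = 0.45f;  // shard balance factor (mirrors cluster.routing.allocation.balance.shard)
    final float theta1 = 0.55f;  // index balance factor (mirrors cluster.routing.allocation.balance.index)
    final float theta2 = 10.0f;  // write-load balance factor (placeholder value)
    final float theta3 = 2e-11f; // disk-usage balance factor (placeholder value)

    float weight(float shardDelta, float indexShardDelta, float writeLoadDelta, float diskUsageDelta) {
        return theta0 * shardDelta + theta1 * indexShardDelta + theta2 * writeLoadDelta + theta3 * diskUsageDelta;
    }
}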

@fcofdez added the >enhancement, :Distributed/Allocation (All issues relating to the decision making around placing a shard, both master logic & on the nodes), Team:Distributed (Meta label for distributed team), and v8.6.0 labels on Nov 16, 2022
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine (Collaborator) commented:

Hi @fcofdez, I've created a changelog YAML for you.

if (forecastedShardSizeInBytes.isPresent()) {
    indexDiskUsageInBytes = forecastedShardSizeInBytes.getAsLong() * numberOfCopies(indexMetadata);
} else {
    long totalSizeInBytes = 0;
@fcofdez (Contributor, author) commented on this excerpt:

I'm not sure if we should fall back to the cluster info in those cases?

}

float weight(Balancer balancer, ModelNode node, String index) {
    final float weightShard = node.numShards() - balancer.avgShardsPerNode();
    final float weightIndex = node.numShards(index) - balancer.avgShardsPerNode(index);
    // TODO: can this overflow?
    final float ingestLoad = (float) (node.writeLoad() - balancer.avgWriteLoadPerNode());
    return theta0 * weightShard + theta1 * weightIndex + theta2 * ingestLoad;
@fcofdez (Contributor, author) commented on this excerpt:

We're casting from long to float here; I think it should be fine in most cases, but I just wanted to double-check.

@idegtiarenko (Contributor) commented on Nov 16, 2022:

If I am not mistaken, float covers roughly [1.175494351e-38, 3.402823466e+38]. A petabyte is about 1e15, so we should be in range for quite some time.
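A quick standalone check of that range argument (not part of the PR; the class and values are purely illustrative): a petabyte-scale long is nowhere near Float.MAX_VALUE, and what the long-to-float cast actually gives up is precision, roughly 64 MB of granularity at that magnitude, which is negligible for a balancing heuristic.

public class FloatRangeCheck {
    public static void main(String[] args) {
        long petabyte = 1_000_000_000_000_000L;   // ~1e15 bytes
        float asFloat = (float) petabyte;          // same kind of cast as in the weight function

        System.out.println(Float.MAX_VALUE);       // ~3.4028235e38, far above 1e15
        System.out.println(asFloat);               // 1.0e15

        // Range is not the problem; precision is. A float has a 24-bit mantissa, so the
        // spacing between representable values near 1e15 is about 6.7e7 bytes (~64 MB).
        System.out.println(Math.ulp(asFloat));
    }
}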

@henningandersen (Contributor) left a comment:

LGTM.

}

// TODO: Should we go through the cluster info service and compute the average in this case?
return shardCount == 0 ? 0 : (totalSizeInBytes / shardCount) * numberOfCopies(indexMetadata);
@fcofdez (Contributor, author) commented on this excerpt:

@henningandersen do you think that computing the average with the available shards and then multiplying by the number of copies is the right call here? I think it's fine in most cases, but I just want to confirm that my intuition is correct.

A reviewer (Contributor) replied:

Yeah, that sounds fine to me.

I wonder if we should also ask the cluster info though, in case there is no forecasted shard size in diskUsageInBytesPerShard? Different topic, also fine to do in a follow-up, of course.
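Putting the two code excerpts in this thread together, the estimation being discussed roughly takes the shape below. This is a sketch under assumptions, not the PR's implementation: the parameters stand in for the forecast carried on the index metadata and for whatever observed shard sizes the allocator has (for example from the cluster info service, which is exactly the fallback left as a follow-up above).

import java.util.Map;
import java.util.OptionalLong;

final class IndexDiskUsageEstimateSketch {

    // Sketch: estimate an index's total disk usage for balancing.
    // Prefer the write-time forecast; otherwise average the shard sizes we do know
    // and scale by the number of copies (primary + replicas), as agreed above.
    static long estimateIndexDiskUsageInBytes(
        OptionalLong forecastedShardSizeInBytes,
        Map<Integer, Long> knownSizePerShardId,   // hypothetical stand-in for observed sizes
        int numberOfShards,
        int numberOfCopies
    ) {
        if (forecastedShardSizeInBytes.isPresent()) {
            return forecastedShardSizeInBytes.getAsLong() * numberOfCopies;
        }
        long totalSizeInBytes = 0;
        int shardCount = 0;
        for (int shardId = 0; shardId < numberOfShards; shardId++) {
            Long sizeInBytes = knownSizePerShardId.get(shardId);
            if (sizeInBytes != null) {
                totalSizeInBytes += sizeInBytes;
                shardCount++;
            }
        }
        // Average over the shards whose size is known, then multiply by copies;
        // whether to also ask the cluster info here is the open follow-up question.
        return shardCount == 0 ? 0 : (totalSizeInBytes / shardCount) * numberOfCopies;
    }
}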

@fcofdez (Contributor, author) commented Nov 16, 2022:

@elasticmachine run elasticsearch-ci/bwc
