allocation: add balancer round summary as metrics #136043
Merged: schase-es merged 24 commits into elastic:main from schase-es:ES-10343_balancer-round-apm-export on Nov 13, 2025
Commits (24):
6704f4f allocation: add balancer round summary as metrics (schase-es)
99844ad [CI] Auto commit changes from spotless
dbce2bd Update docs/changelog/136043.yaml (schase-es)
840b003 Added enableSending flag in, and some renames (schase-es)
5a94657 Added metrics consolidation, correct diff calculation, and some tests. (schase-es)
c7195fc [CI] Auto commit changes from spotless
fec6131 Merge branch 'main' into ES-10343_balancer-round-apm-export (schase-es)
4d1b9db Changing metrics for NodeWeightsChanges and NodeWeightsDiff to use a … (schase-es)
1b149ef Remove extra meter registry (schase-es)
41a2c72 Send absolute value of diff instead of last + diff (schase-es)
26f9c14 Adding shard moves histogram (schase-es)
40bf700 [CI] Auto commit changes from spotless
dad8264 Fixes to metrics names and summaries (schase-es)
28dbf90 Move of DiscoveryNode as key in BalancingRoundSummary.nodeToWeightCha… (schase-es)
d89a03c Renaming nodeNameToWeightChanges to nodeToWeightChanges (schase-es)
ba786de [CI] Auto commit changes from spotless
85e4919 CombinedBalancingRoundSummary prints DiscoveryNode name instead of en… (schase-es)
52c1ce7 Math.abs on a long is forbidden. Use *= -1 instead (schase-es)
cb6a196 Style fixes (schase-es)
d0bb2d3 Move of Math.abs on long into suppressed method (schase-es)
a650365 Use DiscoveryNodeUtils to make a node (schase-es)
ac9ee89 Formatting fixes (schase-es)
adb1741 Adding assertion for Long.MIN_VALUE, better comments and names (schase-es)
6eefe31 Merge branch 'main' into ES-10343_balancer-round-apm-export (schase-es)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 136043 | ||
| summary: "Allocation: add balancer round summary as metrics" | ||
| area: Allocation | ||
| type: enhancement | ||
| issues: [] |
114 changes: 114 additions & 0 deletions
...g/elasticsearch/cluster/routing/allocation/allocator/AllocationBalancingRoundMetrics.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
| | ||
| package org.elasticsearch.cluster.routing.allocation.allocator; | ||
| | ||
| import org.elasticsearch.cluster.node.DiscoveryNode; | ||
| import org.elasticsearch.cluster.routing.allocation.allocator.BalancingRoundSummary.NodesWeightsChanges; | ||
| import org.elasticsearch.core.SuppressForbidden; | ||
| import org.elasticsearch.telemetry.metric.DoubleHistogram; | ||
| import org.elasticsearch.telemetry.metric.LongCounter; | ||
| import org.elasticsearch.telemetry.metric.LongHistogram; | ||
| import org.elasticsearch.telemetry.metric.MeterRegistry; | ||
| | ||
| import java.util.Map; | ||
| | ||
| /** | ||
| * A telemetry metrics sender for {@link BalancingRoundSummary} | ||
| */ | ||
| public class AllocationBalancingRoundMetrics { | ||
| | ||
| // counters that measure rounds and moves from the last balancing round | ||
| public static final String NUMBER_OF_BALANCING_ROUNDS_METRIC_NAME = "es.allocator.balancing_round.balancing_rounds.total"; | ||
| public static final String NUMBER_OF_SHARD_MOVES_METRIC_NAME = "es.allocator.balancing_round.shard_moves.total"; | ||
| public static final String NUMBER_OF_SHARD_MOVES_HISTOGRAM_METRIC_NAME = "es.allocator.balancing_round.shard_moves.histogram"; | ||
| | ||
| // histograms that measure current utilization | ||
| public static final String NUMBER_OF_SHARDS_METRIC_NAME = "es.allocator.balancing_round.shard_count.histogram"; | ||
| public static final String DISK_USAGE_BYTES_METRIC_NAME = "es.allocator.balancing_round.disk_usage_bytes.histogram"; | ||
| public static final String WRITE_LOAD_METRIC_NAME = "es.allocator.balancing_round.write_load.histogram"; | ||
| public static final String TOTAL_WEIGHT_METRIC_NAME = "es.allocator.balancing_round.total_weight.histogram"; | ||
| | ||
| private final LongCounter balancingRoundCounter; | ||
| private final LongCounter shardMovesCounter; | ||
| private final LongHistogram shardMovesHistogram; | ||
| | ||
| private final LongHistogram shardCountHistogram; | ||
| private final DoubleHistogram diskUsageHistogram; | ||
| private final DoubleHistogram writeLoadHistogram; | ||
| private final DoubleHistogram totalWeightHistogram; | ||
| | ||
| public static AllocationBalancingRoundMetrics NOOP = new AllocationBalancingRoundMetrics(MeterRegistry.NOOP); | ||
| | ||
| public AllocationBalancingRoundMetrics(MeterRegistry meterRegistry) { | ||
| this.balancingRoundCounter = meterRegistry.registerLongCounter( | ||
| NUMBER_OF_BALANCING_ROUNDS_METRIC_NAME, | ||
| "Total number of balancing rounds", | ||
| "unit" | ||
| ); | ||
| this.shardMovesCounter = meterRegistry.registerLongCounter( | ||
| NUMBER_OF_SHARD_MOVES_METRIC_NAME, | ||
| "Total number of shard moves", | ||
| "unit" | ||
| ); | ||
| | ||
| this.shardMovesHistogram = meterRegistry.registerLongHistogram( | ||
| NUMBER_OF_SHARD_MOVES_HISTOGRAM_METRIC_NAME, | ||
| "Number of shard movements executed in a balancing round", | ||
| "unit" | ||
| ); | ||
| this.shardCountHistogram = meterRegistry.registerLongHistogram( | ||
| NUMBER_OF_SHARDS_METRIC_NAME, | ||
| "change in node shard count per balancing round", | ||
| "unit" | ||
| ); | ||
| this.diskUsageHistogram = meterRegistry.registerDoubleHistogram( | ||
| DISK_USAGE_BYTES_METRIC_NAME, | ||
| "change in disk usage in bytes per balancing round", | ||
| "unit" | ||
| ); | ||
| this.writeLoadHistogram = meterRegistry.registerDoubleHistogram( | ||
| WRITE_LOAD_METRIC_NAME, | ||
| "change in write load per balancing round", | ||
| "1.0" | ||
| ); | ||
| this.totalWeightHistogram = meterRegistry.registerDoubleHistogram( | ||
| TOTAL_WEIGHT_METRIC_NAME, | ||
| "change in total weight per balancing round", | ||
| "1.0" | ||
| ); | ||
| } | ||
| | ||
| @SuppressForbidden(reason = "ForbiddenAPIs bans Math.abs(long) because of overflow on Long.MIN_VALUE, but this is impossible here") | ||
| private long longAbsNegativeSafe(long value) { | ||
| assert value != Long.MIN_VALUE : "value must not be Long.MIN_VALUE"; | ||
| return Math.abs(value); | ||
| } | ||
| | ||
| public void addBalancingRoundSummary(BalancingRoundSummary summary) { | ||
| balancingRoundCounter.increment(); | ||
| shardMovesCounter.incrementBy(summary.numberOfShardsToMove()); | ||
| shardMovesHistogram.record(summary.numberOfShardsToMove()); | ||
| | ||
| for (Map.Entry<DiscoveryNode, NodesWeightsChanges> changesEntry : summary.nodeToWeightChanges().entrySet()) { | ||
| DiscoveryNode node = changesEntry.getKey(); | ||
| NodesWeightsChanges weightChanges = changesEntry.getValue(); | ||
| BalancingRoundSummary.NodeWeightsDiff weightsDiff = weightChanges.weightsDiff(); | ||
| | ||
| shardCountHistogram.record(longAbsNegativeSafe(weightsDiff.shardCountDiff()), getNodeAttributes(node)); | ||
| diskUsageHistogram.record(Math.abs(weightsDiff.diskUsageInBytesDiff()), getNodeAttributes(node)); | ||
| writeLoadHistogram.record(Math.abs(weightsDiff.writeLoadDiff()), getNodeAttributes(node)); | ||
| totalWeightHistogram.record(Math.abs(weightsDiff.totalWeightDiff()), getNodeAttributes(node)); | ||
| } | ||
| } | ||
| | ||
| private Map<String, Object> getNodeAttributes(DiscoveryNode node) { | ||
| return Map.of("node_name", node.getName(), "node_id", node.getId()); | ||
| } | ||
| | ||
| } | ||
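The `longAbsNegativeSafe` helper in the diff guards against the one input for which `Math.abs(long)` cannot return a non-negative result. A minimal standalone sketch of that edge case follows. The class name `AbsOverflowDemo` and the method `absChecked` are hypothetical and not part of the PR; the sketch throws instead of asserting so that it behaves the same whether or not JVM assertions are enabled.

```java
public class AbsOverflowDemo {

    // Hypothetical helper, not from the PR: rejects the single value whose
    // absolute value does not fit in a long.
    static long absChecked(long value) {
        if (value == Long.MIN_VALUE) {
            throw new ArithmeticException("abs(Long.MIN_VALUE) overflows long");
        }
        return Math.abs(value);
    }

    public static void main(String[] args) {
        // Two's complement has no positive counterpart for Long.MIN_VALUE,
        // so Math.abs returns the argument unchanged (still negative).
        System.out.println(Math.abs(Long.MIN_VALUE) == Long.MIN_VALUE); // prints "true"
        System.out.println(absChecked(-5L)); // prints "5"
    }
}
```

The PR itself uses an `assert` plus `@SuppressForbidden`, which is reasonable when the caller can argue the value is never `Long.MIN_VALUE`, as a shard count difference cannot be.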
Did we discuss having a histogram for shard moves, to avoid the combining effect of merging multiple balancing rounds together? 🤔 I can't recall what we decided.
Nick and I talked about it last week again, and we decided to send the absolute difference with each summary. (The comment I made that seemed out of place was to create a link for Nick to see.)
I think we should have a histogram for shard moves as well. Those differences are per-node, so we'll see how many shards moved on/off each node, but I think having a histogram of total shard move count would be good too.
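One way to see why a round-level shard move histogram complements the per-node differences: a node that gains one shard and loses another in the same round ends up with a net shard count difference of zero. The sketch below is not the PR's code. `Move` and `netShardCountDiff` are hypothetical names, and it assumes each move removes one shard from a source node and adds one to a target node.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardMoveDiffDemo {

    // Hypothetical record of one shard relocation from one node to another.
    record Move(String fromNode, String toNode) {}

    // Net change in shard count per node after applying all moves of a round.
    static Map<String, Long> netShardCountDiff(List<Move> moves) {
        Map<String, Long> diff = new HashMap<>();
        for (Move move : moves) {
            diff.merge(move.fromNode(), -1L, Long::sum);
            diff.merge(move.toNode(), 1L, Long::sum);
        }
        return diff;
    }

    public static void main(String[] args) {
        // node-b receives a shard from node-a and hands a different one to node-c.
        List<Move> round = List.of(new Move("node-a", "node-b"), new Move("node-b", "node-c"));
        Map<String, Long> diff = netShardCountDiff(round);

        System.out.println("total moves in round: " + round.size());
        // node-b took part in both moves, yet its net difference is zero.
        System.out.println("node-b net diff: " + diff.get("node-b"));
    }
}
```

In this model the per-node view hides node-b's activity entirely, while the total move count for the round still reports two moves.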
We don't have the shard moves per node as a stat right now, so we can't have that :)
The high-level conclusion Simon and I reached is that histograms are generally going to be more useful for the balancer round summary metrics, because our goal is to see per-balancing-round granularity as best we can. We don't have a use case for seeing how many total shard moves take place over time, but we do want to see roughly how many shard moves happen per balancing round, and the histogram will give us that.
So let's go with a histogram and remove the counter.
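The difference between the two instrument types can be sketched in plain Java. This is a simplified model, not the Elasticsearch telemetry API: `RoundCounter` and `RoundHistogram` are hypothetical stand-ins, and the histogram here simply keeps raw observations where a real metrics backend would bucket them.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterVsHistogramDemo {

    // Hypothetical stand-in for a counter: only the running total survives.
    static final class RoundCounter {
        private long total;

        void incrementBy(long n) {
            total += n;
        }

        long total() {
            return total;
        }
    }

    // Hypothetical stand-in for a histogram: every per-round observation is kept,
    // so the distribution across rounds stays visible.
    static final class RoundHistogram {
        private final List<Long> observations = new ArrayList<>();

        void record(long value) {
            observations.add(value);
        }

        int count() {
            return observations.size();
        }

        long max() {
            return observations.stream().mapToLong(Long::longValue).max().orElse(0L);
        }
    }

    public static void main(String[] args) {
        long[] shardMovesPerRound = { 1, 1, 40, 2 };
        RoundCounter counter = new RoundCounter();
        RoundHistogram histogram = new RoundHistogram();
        for (long moves : shardMovesPerRound) {
            counter.incrementBy(moves);
            histogram.record(moves);
        }
        // The counter cannot tell four quiet rounds from one very busy round.
        System.out.println("counter total: " + counter.total());
        // The histogram still shows that a single round moved 40 shards.
        System.out.println("rounds: " + histogram.count() + ", max moves in a round: " + histogram.max());
    }
}
```

With these inputs the counter reports 44, which would look the same if every round had moved 11 shards; the recorded observations keep the 40-shard round distinguishable.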
Right now, these metrics aren't available in the balancing round summary. We can certainly try to get them out of the Desired Balancer and get them in through here.