Skip to content

Conversation

@schase-es
Copy link
Contributor

This commit adds the BalancerRoundSummary as a collection of APM/open telemetry metrics. These are already logged. The summary collected every ten seconds or so is set as the current state into the telemetry metrics class (AllocationBalancingRoundMetrics). Whenever the telemetry runs, each metric picks up its current view.

Fixes: ES-10343

This commit adds the BalancerRoundSummary as a collection of APM/open telemetry
metrics. These are already logged. The summary collected every ten seconds or so
is set as the current state into the telemetry metrics class
(AllocationBalancingRoundMetrics). Whenever the telemetry runs, each metric
picks up its current view.
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.3.0 labels Oct 6, 2025
@schase-es schase-es added :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement and removed needs:triage Requires assignment of a team area label labels Oct 8, 2025
@schase-es schase-es marked this pull request as draft October 8, 2025 20:15
@elasticsearchmachine
Copy link
Collaborator

Hi @schase-es, I've created a changelog YAML for you.

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments, I think we should discuss with Dianna what we're aiming for with the node deltas

bind(AllocationStatsService.class).toInstance(allocationStatsService);
bind(TelemetryProvider.class).toInstance(telemetryProvider);
bind(DesiredBalanceMetrics.class).toInstance(desiredBalanceMetrics);
bind(AllocationBalancingRoundMetrics.class).toInstance(balancingRoundMetrics);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binding like this is only necessary if we use the instance in a cluster annotated @Inject, I'm not sure if we do with the AllocationBalancingRoundMetrics? It probably doesn't matter, but I think in general we don't do it unless we need to.

assert summary != null : "balancing round metrics cannot be null";

nodeNameToWeightChangesRef.set(summary.nodeNameToWeightChanges());
if (enableSending) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set the nodeNameToWeightChangesRef if enableSending = false ?

long shardCount = nodeWeightChanges.baseWeights().shardCount() + nodeWeightChanges.weightsDiff().shardCountDiff();
metrics.add(new LongWithAttributes(shardCount, getNodeAttributes(nodeWeights.getKey())));
}
return metrics;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want the current values of shard count, disk usage and write load, because we already have those in the cluster balance dashboard

Perhaps the deltas are more interesting? I think number of balancing rounds, and shard movements are definitely valuable as you have them published now, but I'm less clear on the value of the specific node shard/weight/disk usage deltas/values that we don't get already from existing metrics.

Maybe the absolute amount of change (e.g. if one node loses X weight and another gains X the sum would be 2X) in those values might be interesting?

If we plotted any of these values for the cluster by simply adding them together, they'd sum to zero I think? because for every node gaining X shards there are other node(s) losing X shards. @DiannaHohensee might have more clarity on the direction here.

I think for serverless as well, disk usage will always be zero after this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed I think we should reduce the scope on this PR to just the two metrics we know how to publish and do a second PR when we've discussed how we'd like to publish the write load/shard count/disk usage

@schase-es schase-es marked this pull request as ready for review October 27, 2025 23:38
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 27, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

shardCountHistogram.record(baseWeights.shardCount() + weightsDiff.shardCountDiff(), getNodeAttributes(nodeName));
diskUsageHistogram.record(baseWeights.diskUsageInBytes() + weightsDiff.diskUsageInBytesDiff(), getNodeAttributes(nodeName));
writeLoadHistogram.record(baseWeights.writeLoad() + weightsDiff.writeLoadDiff(), getNodeAttributes(nodeName));
totalWeightHistogram.record(baseWeights.nodeWeight() + weightsDiff.totalWeightDiff(), getNodeAttributes(nodeName));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the metrics I'm wondering about

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the question you're thinking about? What value to put in the histogram?

Copy link
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do a thorough review, I'll have to take another look/think later. But publishing some comments as is.

NUMBER_OF_SHARD_MOVES_METRIC_NAME,
"Current number of shard moves",
"{shard}"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we discuss having a histogram for shard moves? To avoid the combine effect, merging multiple balancing rounds together 🤔 I can't recall what we decided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nick and I talked about it last week again, and we decided to send the absolute difference with each summary. (The comment I made that seemed out of place was to create a link for Nick to see.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a histogram for shard moves as well, those differences are per-node, so we'll see how many shards moved on/off each node, but I think having a histogram of total shard move count would be good too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have the shard moves per node as a stat right now, so we can't have that :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The high level conclusion Simon and I were discussing is that generally histograms are going to be more useful for the balancer round summary metrics because our goal is to see per balancing round granularity, as best we can. We don't have a use case for wanting to see how many total shard moves take place over time, but we do want to see roughly how many shard moves happen per balancing round, and the histogram will give us that.

So let's go with a histogram and remove the counter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a histogram for shard moves as well, those differences are per-node, so we'll see how many shards moved on/off each node.

Right now, these metrics aren't available in the balancing round summary. We can certainly try to get them out of the Desired Balancer and get them in through here.

shardCountHistogram.record(baseWeights.shardCount() + weightsDiff.shardCountDiff(), getNodeAttributes(nodeName));
diskUsageHistogram.record(baseWeights.diskUsageInBytes() + weightsDiff.diskUsageInBytesDiff(), getNodeAttributes(nodeName));
writeLoadHistogram.record(baseWeights.writeLoad() + weightsDiff.writeLoadDiff(), getNodeAttributes(nodeName));
totalWeightHistogram.record(baseWeights.nodeWeight() + weightsDiff.totalWeightDiff(), getNodeAttributes(nodeName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the question you're thinking about? What value to put in the histogram?

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just added some comments

NUMBER_OF_SHARD_MOVES_METRIC_NAME,
"Current number of shard moves",
"{shard}"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a histogram for shard moves as well, those differences are per-node, so we'll see how many shards moved on/off each node, but I think having a histogram of total shard move count would be good too.

Copy link
Contributor

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync'ed with Simon. I think this looks great, just need a histogram for the shardMoves to replace the LongCounter shardMovesCounter.

@nicktindall can I leave the final sign off to you? You're more familiar with this area of the code (APM metrics) than I am right now.

Update: to add an offline discussion, we decided that we'd like the shardMoveCount to also be reported as a histogram. So we'll have both a LongCounter shardMoveCount and LongHistogram shardMoveHistogram.

public static final String NUMBER_OF_SHARDS_METRIC_NAME = "es.allocator.balancing_round.shard_count";
public static final String DISK_USAGE_BYTES_METRIC_NAME = "es.allocator.balancing_round.disk_usage_bytes";
public static final String WRITE_LOAD_METRIC_NAME = "es.allocator.balancing_round.write_load";
public static final String TOTAL_WEIGHT_METRIC_NAME = "es.allocator.balancing_round.total_weight";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the metrics registry made you end histogram metric names in .histogram or something?

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, with a few nits and a very annoying change request to the labels we're using.

this.shardMovesCounter = meterRegistry.registerLongCounter(
NUMBER_OF_SHARD_MOVES_METRIC_NAME,
"Current number of shard moves",
"{shard}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think the convention is to use unit when it's just a count ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out -- this is one of the conventions that's been hard to understand.


this.shardMovesHistogram = meterRegistry.registerLongHistogram(
NUMBER_OF_SHARD_MOVES_HISTOGRAM_METRIC_NAME,
"Histogram of shard moves",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we say "number of shard movements executed in a balancing round" to make it clear it's a per-round count

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great docs suggestions -- thanks!

);
this.shardMovesCounter = meterRegistry.registerLongCounter(
NUMBER_OF_SHARD_MOVES_METRIC_NAME,
"Current number of shard moves",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we say "total number of shard moves" also above, current is a term often used for gauges

"Histogram of shard moves",
"unit"
);
this.shardCountHistogram = meterRegistry.registerLongHistogram(NUMBER_OF_SHARDS_METRIC_NAME, "Current number of shards", "unit");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: perhaps "change in node shard count per balancing round" to make it clear it's node-scoped and a delta

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also similar for the descriptions below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with some minor nits. Nice one.

long shardCountDiff = weightsDiff.shardCountDiff();
if (shardCountDiff < 0) {
shardCountDiff *= -1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the reasoning here? can't we just use Math.abs if we know the diff won't be MIN_VALUE?

Copy link
Contributor Author

@schase-es schase-es Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the forbidden APIs unfortunately -- it fails the math.abs call, but curious does not fail this *= -1 call. It's an annoyance because the whole point of longs is that the max or min value are so, so big that you'll never see it.

I should mention the forbidden APIs in the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that makes sense, yes either mention it in the comment, or there's also org.elasticsearch.core.SuppressForbidden, but on a large method that might be dangerous because I think it's scoped to the whole method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it into a private method longAbsNegativeSafe -- it matches the other metrics, and it's clearer as the suppress annotation has space for a reason.

nodeToWeightChanges.forEach((node, nodesWeightChanges) -> nodeNameToWeightChanges.put(node.getName(), nodesWeightChanges));

return Strings.format(
"CombinedBalancingRoundSummary[numberOfBalancingRounds=%d, nodeToWeightChange=%s, " + "numberOfShardMoves=%d]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: looks like an auto-format mishap (unnecessary concatenation)

Map.of(),
Set.of(),
null
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there is also org.elasticsearch.cluster.node.DiscoveryNodeUtils for creating DiscoveryNode instances which might make this less verbose?

final String NODE_1 = "node1";
final String NODE_2 = "node2";
final DiscoveryNode NODE_1 = new DiscoveryNode("node1", "node1Id", "eph-node1", "abc", "abc", null, Map.of(), Set.of(), null);
final DiscoveryNode NODE_2 = new DiscoveryNode("node2", "node2Id", "eph-node2", "abc", "abc", null, Map.of(), Set.of(), null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here (org.elasticsearch.cluster.node.DiscoveryNodeUtils)


@SuppressForbidden(reason = "ForbiddenAPIs bans Math.abs on longs, but long.MIN_VALUE is impossible here")
private long longAbsNegativeSafe(long input) {
return Math.abs(input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just put an assertion above the call to abs that ensures input isn't Long.MIN_VALUE

@schase-es schase-es added auto-backport Automatically create backport pull requests when merged and removed auto-backport Automatically create backport pull requests when merged v9.2.0 labels Nov 13, 2025
@schase-es schase-es merged commit 2e9b8f1 into elastic:main Nov 13, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants