Contributor

@nicktindall commented on Oct 31, 2025

We log a message when some nodes are hot-spotting, and that message includes the total node count, e.g.

Nodes [[v7V4iKxHSs-fN4fu8Ofq6Q]] are hot-spotting, of 7 total cluster nodes. Reroute for hot-spotting was last called [4.4m] ago. Previously hot-spotting nodes are [[zA1TQ3UqQa-j1e1MT_em5Q]]. The write thread pool queue latency threshold is [3s]. Triggering reroute.

This PR changes that message to report the number of "ingest" nodes instead, i.e. the nodes that shards could potentially be moved to.

The PR also excludes ML nodes from that count and from consideration when deciding if there are non-hot-spotted nodes available. Previously only search nodes were excluded.
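
As a rough illustration of the counting rule described above, here is a minimal sketch. It assumes a hypothetical `Role` enum and a plain map of node IDs to roles; these are stand-ins for illustration, not the Elasticsearch `DiscoveryNodeRole` or `ClusterInfo` APIs, and the real change may be structured differently.

```java
import java.util.Map;
import java.util.Set;

class IngestNodeCount {
    // Hypothetical stand-in for node roles; not the Elasticsearch API.
    enum Role { MASTER, INDEX, SEARCH, ML }

    /** Counts nodes that could receive relocated shards: every node except search and ML nodes. */
    static int countIngestNodes(Map<String, Set<Role>> rolesByNodeId) {
        int totalIngestNodes = 0;
        for (Set<Role> roles : rolesByNodeId.values()) {
            if (roles.contains(Role.SEARCH) || roles.contains(Role.ML)) {
                continue; // excluded from the count and from relocation consideration
            }
            totalIngestNodes++;
        }
        return totalIngestNodes;
    }

    public static void main(String[] args) {
        Map<String, Set<Role>> nodes = Map.of(
            "node-a", Set.of(Role.INDEX),
            "node-b", Set.of(Role.INDEX),
            "node-c", Set.of(Role.SEARCH),
            "node-d", Set.of(Role.ML)
        );
        // Only node-a and node-b are counted.
        System.out.println(countIngestNodes(nodes));
    }
}
```

With four nodes of which one is a search node and one is an ML node, the count is two rather than four, which is the distinction the revised log message is meant to surface.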

@elasticsearchmachine added the v9.3.0 and needs:triage labels on Oct 31, 2025
@nicktindall added the >non-issue and :Distributed Coordination/Allocation labels on Oct 31, 2025
@elasticsearchmachine added the Team:Distributed Coordination label and removed the needs:triage label on Oct 31, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

var totalIngestionNodes = 0;
for (var entry : clusterInfo.getNodeUsageStatsForThreadPools().entrySet()) {
    final var nodeId = entry.getKey();
    final var usageStats = entry.getValue();
Contributor Author

Just changed this to a for-loop so we don't need to create all these AtomicXXX holders to store the counter/boolean.
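
To make the trade-off concrete, here is a small sketch of the two styles. The method names, the latency map, and the threshold parameter are hypothetical and only illustrate the Java rule involved: a lambda can capture only effectively final locals, so a counter updated inside `forEach` needs a mutable holder such as `AtomicInteger`, whereas a plain for-loop can increment an ordinary local.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class CounterStyles {
    // Lambda version: the counter must live in a mutable holder because
    // locals captured by a lambda have to be effectively final.
    static int countWithLambda(Map<String, Integer> latencyMillisByNode, int thresholdMillis) {
        AtomicInteger hot = new AtomicInteger();
        latencyMillisByNode.forEach((nodeId, latency) -> {
            if (latency >= thresholdMillis) {
                hot.incrementAndGet();
            }
        });
        return hot.get();
    }

    // For-loop version: an ordinary local int is enough.
    static int countWithLoop(Map<String, Integer> latencyMillisByNode, int thresholdMillis) {
        int hot = 0;
        for (var entry : latencyMillisByNode.entrySet()) {
            if (entry.getValue() >= thresholdMillis) {
                hot++;
            }
        }
        return hot;
    }

    public static void main(String[] args) {
        Map<String, Integer> latencies = Map.of("node-a", 3500, "node-b", 100, "node-c", 3000);
        System.out.println(countWithLambda(latencies, 3000)); // same result as the loop version
        System.out.println(countWithLoop(latencies, 3000));
    }
}
```

Both methods return the same count; the loop version simply avoids the extra holder objects.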

Contributor

@DiannaHohensee left a comment

LGTM, some minor nits.

        // TODO (ES-13314): consider stateful data tiers
        return;
    }
    totalIngestionNodes++;
Contributor

@DiannaHohensee commented on Nov 4, 2025

nit: seeing ingestion might be giving me indigestion (I couldn't resist...).

I haven't seen ingestion elsewhere. Should this be totalIngestNodes?

Contributor Author

Done in cfbb6fe

@nicktindall requested a review from a team as a code owner on November 4, 2025 23:50
@nicktindall force-pushed the log_total_ingest_nodes branch from b5c8ae9 to 34d92df on November 5, 2025 00:06
@nicktindall merged commit c9c50ea into elastic:main on Nov 5, 2025
34 checks passed
@nicktindall deleted the log_total_ingest_nodes branch on November 5, 2025 06:35

Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0

3 participants