Skip to content

Conversation

@pxsalehi
Copy link
Member

@pxsalehi pxsalehi commented Nov 11, 2024

At the end of each reconciliation round, also export the current allocation stats for each node. This is intended to show the gradual progress (or divergence!) towards the desired values exported in #115854, and relies on the existing AllocationStatsService.

Relates ES-9873

@pxsalehi pxsalehi added >non-issue WIP :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Nov 11, 2024
@pxsalehi pxsalehi force-pushed the ps241106-nodeBalanceMetrics branch from 226d443 to 57c526a Compare November 11, 2024 12:42
Comment on lines 47 to 50
public static final String CURRENT_NODE_UNDESIRED_SHARD_COUNT_METRIC_NAME =
"es.allocator.allocations.node_undesired_shard_count.current";
public static final String CURRENT_NODE_FORECASTED_DISK_USAGE_METRIC_NAME =
"es.allocator.allocations.node_forecasted_disk_usage_bytes.current";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeAllocationStats gives these two more compared to the DESIRED_BALANCE_... metrics above. Not sure if they're needed, but I have exported all of them.

@pxsalehi pxsalehi force-pushed the ps241106-nodeBalanceMetrics branch from 57c526a to 6c84cfe Compare November 11, 2024 12:51
@pxsalehi pxsalehi removed the WIP label Nov 11, 2024
@pxsalehi pxsalehi marked this pull request as ready for review November 11, 2024 12:52
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Nov 11, 2024
assert node != null;
nodeAllocationStats.put(node, entry.getValue());
}
desiredBalanceMetrics.updateMetrics(allocationStats, desiredBalance.weightsPerNode(), nodeAllocationStats);
Copy link
Contributor

@idegtiarenko idegtiarenko Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think I am following.

New stats claim they represent the current state of the nodes and are populated from allocationStatsPerNodeRef that is set from nodeAllocationStats or the last argument here.
In its turn the last argument is effectively copied from stats entries that represent the desired state.

How are current stats different from desired stats?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you say stats represents desired state? To me it seems to be based on the current cluster state in org.elasticsearch.cluster.routing.allocation.AllocationStatsService.NodeStatsProvider#stats? Either I'm missing something or the naming is confusing! :))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, maybe I am confused. We pass the desiredBalance to the call above but not the current cluster state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just to manage the dependencies to be able to reuse AllocationStatsService. There, we used to pass the entire Allocator only to get the desired balance out of it, which seems to be used for calculating undesired shard count. (git tells me it's your code actually!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... so if you find a different approach to reuse that code more useful/readable, or see some comments missing or renames, please let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That component was originally created to expose stats via endpoint and was not originally used in reconciler.

The first one (minor) is that it accesses the cluster state via clusterService.state() and not from allocation. They should be the same as this is happening during cluster state update computation, but that is not immediately obvious.

The second one (more substantial) is the set of metric it provides. It does not produce the same metrics as we have for computed balance. This might introduce confusing while comparing them and opens a possibility of publishing different metrics under similar names. Also since this service is used in public API, we will be limited in how we can change it. This might become relevant as we discuss the imbalance measurements and experiment with different balancing functions.

Based on this I suggest not to rely on it and instead implement an apm specific service that is used in both computer and reconciler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first one (minor) is that it accesses the cluster state via clusterService.state()

I can change that and make it accept also a state.

It does not produce the same metrics as we have for computed balance.

Not sure I follow. The three node-level metrics from the desired balance that we currently have are shard_count, write_load and disk_usage. We have in this PR those three calculated based on current cluster state. There are two more here, that as I mentioned I chose to export, but we can drop them.

Also since this service is used in public API, we will be limited in how we can change it.

That code is now refactored into its own class. For now we're reusing it because it provides what we need. If we need to later, we can duplicate it, extend it or whatever is appropriate based on what we need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this thread offline,

Re

It does not produce the same metrics as we have for computed balance.

It misses weights, but we discussed that they are not necessary for now.
We also discussed the possibility of weight computations diverging. This aspect could be addressed in future when this problem arises.

public static final String DESIRED_BALANCE_NODE_DISK_USAGE_METRIC_NAME =
"es.allocator.desired_balance.allocations.node_disk_usage_bytes.current";

public static final String CURRENT_NODE_SHARD_COUNT_METRIC_NAME = "es.allocator.allocations.node.shard_count.current";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally, I had these as es.allocator.allocations.node_ but APM complained that it is too long so I changed to this.

currentDiskUsage
)
);
public static class NodeStatsProvider implements StatsProvider {
Copy link
Contributor

@nicktindall nicktindall Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit confusing with the inner class for reuse, could we just add an overload to AllocationStatsService that took a desired balance to use in the computation, and everyone just gets given a reference to a single instance of it?

e.g. add

public Map<String, NodeAllocationStats> stats(DesiredBalance desiredBalance) {
   // ...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh there would be a circular dependency between allocator and stats service then.

I reckon it'd be clearer if the thing that calculated the stats just became a separate shared piece of infrastructure that took a DesiredBalance from somewhere and returned the stats. The inner class and the interface makes it a bit tricky to read.

Copy link
Member Author

@pxsalehi pxsalehi Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the dependencies here are not straight-forward and largely the reason I came up with this. But how is your suggestion different than what's in the PR? I can move out the inner class and get rid of the interface. Other than that, it is the same, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a5381e1.

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me but I think we should just refactor to tidy up the reuse of AllocationStatsService

@pxsalehi pxsalehi requested a review from nicktindall November 12, 2024 09:39
@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Nov 12, 2024
Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's easier to read. You could probably share an instance of NodeAllocationStatsProvider, which might make the shared dependency a bit clearer, but if that's tricky with the wiring you could also not.

LGTM assuming @idegtiarenko is OK with the current vs desired logic

@pxsalehi pxsalehi added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Nov 12, 2024
@elasticsearchmachine elasticsearchmachine merged commit d2e5c43 into elastic:main Nov 12, 2024
16 checks passed
@pxsalehi pxsalehi deleted the ps241106-nodeBalanceMetrics branch November 12, 2024 16:13
javanna pushed a commit to javanna/elasticsearch that referenced this pull request Nov 12, 2024
At the end of each reconciliation round, also export the current
allocation stats for each node. This is intended to show the gradual
progress (or divergence!) towards the desired values exported in
elastic#115854, and relies on the
existing `AllocationStatsService`.

Relates ES-9873
jozala pushed a commit that referenced this pull request Nov 13, 2024
At the end of each reconciliation round, also export the current
allocation stats for each node. This is intended to show the gradual
progress (or divergence!) towards the desired values exported in
#115854, and relies on the
existing `AllocationStatsService`.

Relates ES-9873
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Nov 14, 2024
At the end of each reconciliation round, also export the current
allocation stats for each node. This is intended to show the gradual
progress (or divergence!) towards the desired values exported in
elastic#115854, and relies on the
existing `AllocationStatsService`.

Relates ES-9873
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
At the end of each reconciliation round, also export the current
allocation stats for each node. This is intended to show the gradual
progress (or divergence!) towards the desired values exported in
elastic#115854, and relies on the
existing `AllocationStatsService`.

Relates ES-9873
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Coordination Meta label for Distributed Coordination team v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants