Collect node thread pool usage for shard balancing #131480
Changes from all commits: 31f150c, 571ec0a, 3c6e8fd, c28d091, 9290acf, 5f3d76e, 5505fab, f378e34, 00b9e55
New file, `NodeUsageStatsForThreadPoolsAction.java` (145 lines):

```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the "Elastic License
 * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
 * Public License v 1"; you may not use this file except in compliance with, at
 * your election, the "Elastic License 2.0", the "GNU Affero General Public
 * License v3.0 only", or the "Server Side Public License, v 1".
 */

package org.elasticsearch.action.admin.cluster.node.usage;

import org.elasticsearch.action.FailedNodeException;
import org.elasticsearch.action.support.nodes.BaseNodeResponse;
import org.elasticsearch.action.support.nodes.BaseNodesRequest;
import org.elasticsearch.action.support.nodes.BaseNodesResponse;
import org.elasticsearch.cluster.ClusterName;
import org.elasticsearch.cluster.NodeUsageStatsForThreadPools;
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.transport.AbstractTransportRequest;

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Defines the request/response types for {@link TransportNodeUsageStatsForThreadPoolsAction}.
 */
public class NodeUsageStatsForThreadPoolsAction {
```
Contributor: There is a `TransportNodesStatsAction` which can produce thread-pool usage. Do we need a separate action for this?
Author: The calls to collect the thread pool stats are destructive. For example, collecting the max queue latency seen since the last call resets the max seen to zero, and collecting pool utilization resets an execution-time tracker. So we can't hook the new stats up to the `TransportNodesStatsAction` API and have random callers clearing the state we'll need for allocation.
Contributor: If we're building a whole separate set of polling messages just because the utilisation read is destructive, can we instead limit the recalculation to no more often than 30s or so (returning the most recent calculated value until that interval has passed) and have all readers poll from a single monitor? Or switch to a dedicated utilisation polling task on the data node and make the reads non-destructive. That sounds like a tidier alternative to me. We can discuss tomorrow at the catch-up, but it seems to me like having separate monitoring state is now cascading into more work/maintenance.
Author: If we make it time-based, recalculated on the data node every 30 seconds while the master node also polls every 30 seconds, the operations can race such that the master node sees the same value twice and misses a value that is just about to be calculated. We'd also have to build a component on the data node to asynchronously run a thread every 30 seconds to calculate a new value, which is not obviously easier than building a `TransportAction` (which is also already implemented). Transport actions are apparently lightweight, and there isn't concern about adding more, FWIW. There was a discussion on this subject back in the 07-01 meeting (David, Pooya and I) (minutes 10-32). We were thinking EWMA at the time, and we did like the idea of using the node stats API. The concern became public documentation of the new values, because we were uncertain whether we'd want to change which metrics are sent in future, and we can't change publicly visible stats. Arguably the stats we have now are easier to explain, though I'm not sure they're final yet.
Contributor: I think the new action is more about having a separate action and request/response that we can experiment with, and include things in that may never go to node stats. There is also not much overhead to a new action, so I prefer to have it as such. One could argue that node stats should maybe not be used from cluster-info either, and that we should have a dedicated action there too, but let's not go there now.
```java
    /**
     * The sender request type that will be resolved to send individual {@link NodeRequest} requests to every node in the cluster.
     */
    public static class Request extends BaseNodesRequest {
        /**
         * @param nodeIds The list of nodes to which to send individual requests and collect responses from. If the list is null, all nodes
         *                in the cluster will be sent a request.
         */
        public Request(String[] nodeIds) {
            super(nodeIds);
        }
    }

    /**
     * Request sent to and received by a cluster node. There are no parameters needed in the node-specific request.
     */
    public static class NodeRequest extends AbstractTransportRequest {
        public NodeRequest(StreamInput in) throws IOException {
            super(in);
        }

        public NodeRequest() {}
    }

    /**
     * A collection of {@link NodeUsageStatsForThreadPools} responses from all the cluster nodes.
     */
    public static class Response extends BaseNodesResponse<NodeUsageStatsForThreadPoolsAction.NodeResponse> {

        protected Response(StreamInput in) throws IOException {
            super(in);
        }

        public Response(
            ClusterName clusterName,
            List<NodeUsageStatsForThreadPoolsAction.NodeResponse> nodeResponses,
            List<FailedNodeException> nodeFailures
        ) {
            super(clusterName, nodeResponses, nodeFailures);
        }

        /**
         * Combines the responses from each node that was called into a single map (by node ID) for the final {@link Response}.
         */
        public Map<String, NodeUsageStatsForThreadPools> getAllNodeUsageStatsForThreadPools() {
            Map<String, NodeUsageStatsForThreadPools> allNodeUsageStatsForThreadPools = new HashMap<>();
            for (NodeUsageStatsForThreadPoolsAction.NodeResponse nodeResponse : getNodes()) {
                allNodeUsageStatsForThreadPools.put(
                    nodeResponse.getNodeUsageStatsForThreadPools().nodeId(),
                    nodeResponse.getNodeUsageStatsForThreadPools()
                );
            }
            return allNodeUsageStatsForThreadPools;
        }

        @Override
        protected void writeNodesTo(StreamOutput out, List<NodeResponse> nodeResponses) throws IOException {
            out.writeCollection(nodeResponses);
        }

        @Override
        protected List<NodeResponse> readNodesFrom(StreamInput in) throws IOException {
            return in.readCollectionAsList(NodeUsageStatsForThreadPoolsAction.NodeResponse::new);
        }

        @Override
        public String toString() {
            return "NodeUsageStatsForThreadPoolsAction.Response{" + getNodes() + "}";
        }
    }

    /**
     * A {@link NodeUsageStatsForThreadPools} response from a single cluster node.
     */
    public static class NodeResponse extends BaseNodeResponse {
        private final NodeUsageStatsForThreadPools nodeUsageStatsForThreadPools;

        protected NodeResponse(StreamInput in, DiscoveryNode node) throws IOException {
            super(in, node);
            this.nodeUsageStatsForThreadPools = new NodeUsageStatsForThreadPools(in);
        }

        public NodeResponse(DiscoveryNode node, NodeUsageStatsForThreadPools nodeUsageStatsForThreadPools) {
            super(node);
            this.nodeUsageStatsForThreadPools = nodeUsageStatsForThreadPools;
        }

        public NodeResponse(StreamInput in) throws IOException {
            super(in);
            this.nodeUsageStatsForThreadPools = new NodeUsageStatsForThreadPools(in);
        }

        public NodeUsageStatsForThreadPools getNodeUsageStatsForThreadPools() {
            return nodeUsageStatsForThreadPools;
        }

        @Override
        public void writeTo(StreamOutput out) throws IOException {
            super.writeTo(out);
            nodeUsageStatsForThreadPools.writeTo(out);
        }

        @Override
        public String toString() {
            return "NodeUsageStatsForThreadPoolsAction.NodeResponse{"
                + "nodeId="
                + getNode().getId()
                + ", nodeUsageStatsForThreadPools="
                + nodeUsageStatsForThreadPools
                + "}";
        }
    }
}
```
Contributor: It could be good to add a test where we block write threads to ensure we have a queue latency here. We can do that in a follow-up; this is not essential before merge.

Author: Sure, I flagged it on my JIRA ticket so I don't forget 👍