Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Desired nodes API #82975

Merged
merged 72 commits into from
Feb 1, 2022
Merged

Desired nodes API #82975

merged 72 commits into from
Feb 1, 2022

Conversation

fcofdez
Copy link
Contributor

@fcofdez fcofdez commented Jan 24, 2022

This commit adds the Desired Nodes API, allowing orchestrators
that manage Elasticsearch clusters to let the system know about the
current/planned topology that the cluster will run on.
This allows the system to take better decisions based on the entire
cluster topology, including nodes that will be added/removed in the
near future.

This commit adds the basic endpoints to manage the desired nodes
state:

  • GET /_internal/desired_nodes
  • PUT /_internal/desired_nodes/<history_id>/
  • DELETE /_internal/desired_nodes

@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label Jan 24, 2022
@fcofdez fcofdez added Team:Distributed Meta label for distributed team :Distributed/Autoscaling labels Jan 25, 2022
"processors" : 8,
"memory" : "58gb",
"storage" : "1700gb",
"node_version" : "8.1.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there isn't a way to fetch this value programatically and populate it, I think we should upgrade it as new versions come out. Maybe it's better to skip the doc tests?

Copy link
Contributor

@henningandersen henningandersen Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok to leave that for a follow-up at least, i.e.,m skip the doc tests in this PR.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for dumping yet another incomplete review, will get to the rest of it shortly.

listener.onResponse(new GetDesiredNodesAction.Response(DesiredNodesMetadata.latestFromClusterState(state)));
final DesiredNodes latestDesiredNodes = DesiredNodesMetadata.latestFromClusterState(state);
if (latestDesiredNodes == null) {
throw new ResourceNotFoundException("Desired nodes not found");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to invoke listener.onFailure directly rather than throwing exceptions even though it will need an else block below.

final DesiredNodes latestDesiredNodes = DesiredNodesMetadata.latestFromClusterState(newState);
boolean replacedExistingHistoryId = previousDesiredNodes != null
&& previousDesiredNodes.hasSameHistoryId(latestDesiredNodes) == false;
listener.onResponse(new UpdateDesiredNodesResponse(true, replacedExistingHistoryId));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should simply remove the acknowledged flag from the response, just like adding voting config exclusions do not have it. The client should not care about acknowledged, only whether data is committed or not (which should throw if not or in doubt).


@Override
public void clusterStateProcessed(ClusterState oldState, ClusterState newState) {
final DesiredNodes previousDesiredNodes = DesiredNodesMetadata.latestFromClusterState(oldState);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add comment that we rely on the unbatched executor here?

) {
super(
UpdateDesiredNodesAction.NAME,
transportService,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow this even if above circuit breaker limit, this is orchestration trying to help us out:

Suggested change
transportService,
false,
transportService,


clusterService.submitStateUpdateTask(
"update-desired-nodes",
new ClusterStateUpdateTask(Priority.HIGH, request.masterNodeTimeout()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think we should use URGENT here instead, but maybe @DaveCTurner has an opinion on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be ok with URGENT although I'd be happier about it if these things were batched, just in case the orchestrator goes haywire.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the PR batching them

}

if (nodes.isEmpty()) {
validationException = ValidateActions.addValidationError("nodes must contain at least one node", validationException);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow-up we should probably also verify that there is at least one master eligible node.

@@ -166,7 +172,7 @@ public void testUnknownSettingsAreAllowedInFutureVersions() {
assertThat(latestDesiredNodes, is(equalTo(desiredNodes)));
}

public void testSomeSettingsCanBeOverridden() {
public void testNodeProcessorsGetValidatedWithDesiredNodeProcessors() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this verifies what the method name says by not throwing when setting the desired nodes? Perhaps add a comment if so.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Francisco

ClusterState state,
ActionListener<AcknowledgedResponse> listener
) throws Exception {
clusterService.submitStateUpdateTask("delete-desired-nodes", new AckedClusterStateUpdateTask(Priority.HIGH, request, listener) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these don't need to be acked tasks, we don't care whether the update is applied on all nodes or not since it will only be used on the master. It's enough that the state update is committed.

I'd rather we didn't merge without addressing this comment.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fcofdez LGTM

@fcofdez fcofdez merged commit 520b843 into elastic:master Feb 1, 2022
@fcofdez
Copy link
Contributor Author

fcofdez commented Feb 1, 2022

Thanks all for the reviews!

arteam added a commit that referenced this pull request Jul 26, 2022
Add the dry_run query parameter to support simulating of updating of desired nodes. The update request will be validated, but no cluster state updates will be performed. In order to indicate that the response was a result of a dry run, we add the dry_run run field to the JSON representation of a response.

See #82975
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. >feature Team:Clients Meta label for clients team Team:Distributed Meta label for distributed team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants