Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set elastic password and generate enrollment token #75816

Merged
merged 29 commits into from
Aug 13, 2021

Conversation

BigPandaToo
Copy link
Contributor

This change implements a class which can be invoked
as a CLI tool and generate an elastic user password
(if bootstrap.password doesn't exist) and Kibana
enrollment token for initial node.
If bootstrap.password presents, no attempts to
auto-config the elastic password or to "promote"
this password value automatically to the security index
is made.
The process output password, Kibana enrollment token
and fingerprint of the CA certificate that ES is configured
to use for the HTTP layer to standard output.
This functionality is intended for the archive installation.

Resolves #75310

Set elastic password and generate enrollment token for initial node
Set elastic password and generate enrollment token for initial node
Set elastic password and generate enrollment token for initial node
@BigPandaToo BigPandaToo added :Security/Security Security issues without another label >enhancement Team:Security Meta label for security team labels Jul 28, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass.

In addition to the things mentioned, I think you need to consider that the node's keystore is not open, prompt for password etc.

Let's discuss sync.

@BigPandaToo
Copy link
Contributor Author

@elasticmachine update branch

@albertzaharovits
Copy link
Contributor

We've considered whether this tool is redundant wrt the other tools we've got for resetting the elastic user's password and generating enrollment tokens.
It is not, because:

  • the others expect the node to be running, whereas this one has to wait for the node to come up
  • this one has to operate only once during node bootstrap
  • because it both sets the password and generates enrollment tokens it takes away from the bash code that would otherwise have to call two other cmd line tools; internally this already reuses code from CreateEnrollmentToken, and it is always preferable to have java glue code to bash glue code.

@albertzaharovits
Copy link
Contributor

WRT to having a cmd line tool that does one thing (generates enrollment tokens) but which also "sometimes" does another thing (resets the elastic password unless bootstrap.password is set), we're acknowledging this is unfortunate, but this code is going to be invoked by scripts in exactly this configuration every time. No one is going to call this to only generate enrollment tokens or only to reset the passwords.
To minimize surprising anyone, we're going to rename this to something more vague but more illustrative of the context this is invoked in, ie something containing "Bootstrap" , eg "BootstrapCredentialsInitialNodeTool".

@jkakavas jkakavas added this to Implementation Milestone 4 (August 13th) in Security On by Default Implementation Tracking Aug 2, 2021
@BigPandaToo
Copy link
Contributor Author

@elasticmachine update branch

elasticmachine and others added 2 commits August 5, 2021 00:57
- Refactoring to reuse some code
- Refactoring checkClusterHealth code
- Change to decrypt keystore
- Update tests
Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a through review.
It is close.
Please ping me when you've addressed it all and I'll approve.

builder.endObject();
final String jsonString = Strings.toString(builder);
return Base64.getUrlEncoder().encodeToString(jsonString.getBytes(StandardCharsets.UTF_8));
return new EnrollmentToken(apiKey, fingerprint, httpInfo.v2(), httpInfo.v1());
} catch (Exception e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this error handling is now redundant.

public String getApiKey() { return apiKey; }
public String getFingerprint() { return fingerprint; }
public String getVersion() { return version; }
public List<String> getBound_address() { return bound_address; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bound_address -> boundAddresses

import static org.elasticsearch.xpack.security.tool.CommandLineHttpClient.createURL;

public class BootstrapPasswordAndEnrollmentTokenForInitialNode extends KeyStoreAwareCommand {
private static final String ELASTIC_USER = ElasticUser.NAME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: redundant

this.clientFunction = clientFunction;
this.keyStoreFunction = keyStoreFunction;
this.createEnrollmentTokenFunction = createEnrollmentTokenFunction;
docker = parser.accepts("docker", "determine that we are running in docker");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the parameter name is suitable.

We know that we're only going to set this parameter when the auto configuration runs in docker.
But this might change in the future, because what it does has actually nothing to do with docker. It's better to name it for what it does.

I suggest the name of "node-enrollment-token" or "include-node-enrollment-token".

throw new UserException(ExitCodes.UNAVAILABLE, null);
}
}
if (ReservedRealm.BOOTSTRAP_ELASTIC_PASSWORD.exists(env.settings()) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really work.

The problem is that the env.settings() object doesn't contain the settings from the keystore.
If you look at the BaseRunAsSuperuserCommand#execute preamble code (

), the decoded keystore is used to build an environment which then gets passed around.

Instead, you've chosen to build the environment every time you want to read the Secure Setting and then throw it out (the readBootstrapPassword method).

}
try {
final EnrollmentToken kibanaToken = enrollmentTokenGenerator.createKibanaEnrollmentToken(ELASTIC_USER, bootstrapPassword);
terminal.println("Kibana enrollment token: " + kibanaToken.encode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time I'm bringing this up.
It would be better if we collate all the individual lines into a big String that is then passed to terminal#print.

This way the tool shows output when everything is completely done, and no output if one of the calls fails part way. It's still technically possible to show parts of it but much less likely, especially to the human eye.

It also helps with reordering of the information, without reordering of the endpoint calls (ie keep the elastic password reset last).

Thread.sleep(1000);
retries -= 1;
checkClusterHealthWithRetries(env, terminal, username, password, retries, force);
if (e instanceof ElasticsearchStatusException &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work now, because checkClusterHealthWithRetriesWaitingForCluster wraps exception in IllegalStateExceptions.

I'm starting to regret my suggestion to have a common approach for the cluster health polling.
My suggestion now is to do whatever works that follows the following requirements:

For BaseRunAsSuperUserCommand:

  • fail on connection exception (node is down)
  • retry on HTTP_UNAUTHORIZED and HTTP_FORBIDDEN

For BootstrapPasswordAndEnrollmentTokenForInitialNode:

  • retry on connection exception
  • fail on any Exception

I think it's better that these methods be at the CommandLineHttpClient level, to aid with discoverability, and that they reuse code, but I'm not going to gripe about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkClusterHealthWithRetriesWaitingForCluster is in CommandLineHttpClient
checkClusterHealthWithRetries is in BaseRunAsSuperuserCommand because it is superuser aware command

checkClusterHealthWithRetriesWaitingForCluster throws ElasticsearchStatusException in case of HTTP_UNAUTHORIZED and HTTP_FORBIDDEN and pass the specific http error code to checkClusterHealthWithRetries so it can retry in this case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to regret my suggestion to have a common approach for the cluster health polling.

++ :/ I think we gave it a try and it doesn't work ( I don't like exception type based flow control ) . I'd suggest we leave the BaseRunAsSuperUserCommand check health method as it was and implement something for BootstrapPasswordAndEnrollmentTokenForInitialNode that makes sense for it since these two have slightly different requirements. I'd leave the first in the BaseRunAsSuperUserCommand as it is very specific to it, but I don't have strong opinions on where to place the second checkclusterhealth method.

WDYT Lyudmila ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with both of you. I think the current implementation would work, but it doesn't look intuitive and overall fragil.
I would leave the one waiting for cluster in CommandLineHttpClient as I think it is generic enough to be used by other commands

@BigPandaToo
Copy link
Contributor Author

@elasticmachine update branch

* @param apiKey API Key credential in the form apiKeyId:ApiKeySecret to be used for enroll calls
* @param fingerprint hex encoded SHA256 fingerprint of the HTTP CA cert
* @param version node version number
* @param boundAddress IP Addresses and port numbers for the interface where the Elasticsearch node is listening on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param boundAddress IP Addresses and port numbers for the interface where the Elasticsearch node is listening on
* @param boundAddress IP Addresses and port numbers for the interfaces where the Elasticsearch node is listening on

return Strings.toString(builder);
}

public String encode() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit suggestion : getEnoded()

this.clientFunction = clientFunction;
this.keyStoreFunction = keyStoreFunction;
this.createEnrollmentTokenFunction = createEnrollmentTokenFunction;
docker = parser.accepts("include-node-enrollment-token", "determine that we are running in docker");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove docker references from here. For the same reasons you changed it to include-node-enrollment-token

Thread.sleep(1000);
retries -= 1;
checkClusterHealthWithRetries(env, terminal, username, password, retries, force);
if (e instanceof ElasticsearchStatusException &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to regret my suggestion to have a common approach for the cluster health polling.

++ :/ I think we gave it a try and it doesn't work ( I don't like exception type based flow control ) . I'd suggest we leave the BaseRunAsSuperUserCommand check health method as it was and implement something for BootstrapPasswordAndEnrollmentTokenForInitialNode that makes sense for it since these two have slightly different requirements. I'd leave the first in the BaseRunAsSuperUserCommand as it is very specific to it, but I don't have strong opinions on where to place the second checkclusterhealth method.

WDYT Lyudmila ?

Copy link
Member

@jkakavas jkakavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good Lyudmila, I added a couple of comments and a few nit suggestions. Happy to approve once comments are resolved

return password;
}

Environment readBootstrapPassword(Environment env, SecureString keystorePassword) throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's call this something rebuildEnvironment or similar now it returns an Environment ? Or even better return settings here because this is all we need? In BaseRunAsSuperuserCommand we build a new Environment but there we are actually using it. So maybe loadSecureSettings and return Settings ? Just throwing ideas around

/**
* If cluster is not up yet (connection refused or cluster still Red), we will retry @retries number of times
*/
public void checkClusterHealthWithRetriesWaitingForCluster(String username, SecureString password, int retries, boolean force)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think force flag makes sense in this method. If a user of the CommandLineHttpClient doesn't care about cluster health and would force execution eitherway, they can simply not call checkClusterHealthWithRetriesWaitingForCluster at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

throw new UserException(ExitCodes.UNAVAILABLE, null);
}
if (ReservedRealm.BOOTSTRAP_ELASTIC_PASSWORD.exists(newEnvironment.settings()) == false) {
output = output.concat("elastic user password: " + setElasticUserPassword(client, bootstrapPassword));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this to the try catch block above ? setElasticUserPassword() throws too

client.checkClusterHealthWithRetriesWaitingForCluster(ElasticUser.NAME, bootstrapPassword, 5, false);
final EnrollmentToken kibanaToken = enrollmentTokenGenerator.createKibanaEnrollmentToken(ElasticUser.NAME, bootstrapPassword);
output = "Kibana enrollment token: " + kibanaToken.encode() + "\n";
output = output.concat("CA fingerprint: " + kibanaToken.getFingerprint() + "\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n -> System.lineSeparator()

Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more issues on the cluster health retry logic. Please address.
Besides that LGTM.

Thanks for the effort.

if (retries > 0) {
Thread.sleep(1000);
retries -= 1;
checkClusterHealthWithRetriesWaitingForCluster(username, password, retries);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want a return here, otherwise it will throw npe further down.

throw new IllegalStateException(
"Failed to determine the health of the cluster. Cluster health API did not return a status value."
);
} else if ("red".equalsIgnoreCase(clusterStatus)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI the cluster health API has a bunch of "wait for conditions" in order to avoid this sort of client side polling.

We can improve on this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! I have changed to wait for yellow for default 30s instead of reiterating here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reiterating only waiting for connection

}
}
final int responseStatus = response.getHttpStatus();
if (responseStatus != HttpURLConnection.HTTP_OK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically I would say that 503 is also a condition (in addition to connection problems) that should be retryable.
Cluster health can return it when there's no master (it does some retrying, it's unlikely, but possible).

final SecureString bootstrapPassword = ReservedRealm.BOOTSTRAP_ELASTIC_PASSWORD.get(secureSettings);
try {
String output;
client.checkClusterHealthWithRetriesWaitingForCluster(ElasticUser.NAME, bootstrapPassword, 5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 seconds for a node to come up is a bit on the low side. I would put it closer to 15.

final SecureString keystorePassword = new SecureString(keyStorePassword);
final CommandLineHttpClient client = clientFunction.apply(env);
final EnrollmentTokenGenerator enrollmentTokenGenerator = createEnrollmentTokenFunction.apply(env);
final Settings secureSettings = readSecureSettings(env, keystorePassword);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to return an Environment (which will contain the secure settings from the keystore ) and you need to pass this Environment to

final CommandLineHttpClient client = clientFunction.apply(env);
final EnrollmentTokenGenerator enrollmentTokenGenerator = createEnrollmentTokenFunction.apply(env);

so that these classes can read key material from the TLS keystores ( that are password protected and their passwords are stored in the secure settings ).

Apologies I missed it yesterday when I made the comment that this can return Settings instead, we weren't using the returned environment below and it didn't ring any bells

response = client.execute("POST", passwordSetUrl, ElasticUser.NAME, bootstrapPassword,
() -> {
XContentBuilder xContentBuilder = JsonXContent.contentBuilder();
xContentBuilder.startObject().field("password", password).endObject();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't pass a SecureString as the value in field(), you need to use the char array here and wrap it in a SecureString later or call toString() on the SecureString you already have


@Override
protected void execute(Terminal terminal, OptionSet options, Environment env) throws Exception {
final char[] keyStorePassword;
Copy link
Member

@jkakavas jkakavas Aug 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not have two variables of different type with almost the same name if we don't need to. You can simply have

final SecureString keystorePassword; 
try {
    keystorePassword = new SecureString(terminal.readSecret(""));
} catch (Exception e) {
    throw new UserException(ExitCodes.USAGE, null);
}

Copy link
Member

@jkakavas jkakavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM provided you address Albert's and my final comments before merging. Thanks for the iterations Lyudmila!

p.s. Lack of QA testing is biting us, both the issues I commented on in my last review could not be found with unit testing. We need to discuss relevant priorities and find a useful balance between feature mvp and useful/necessary testing.

- Reiterate if http status 503 terturned
- Use 'wait for yellow' option instead of reiterating
- Increase to 15 retrys
@BigPandaToo
Copy link
Contributor Author

@elasticmachine update branch

@BigPandaToo BigPandaToo merged commit c5b73fc into elastic:master Aug 13, 2021
Security On by Default Implementation Tracking automation moved this from Implementation Milestone 4 (August 13th) to Completed Aug 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Security/Security Security issues without another label Team:Security Meta label for security team v8.0.0-alpha2
Development

Successfully merging this pull request may close these issues.

Set elastic password and generate enrollment token for initial node
6 participants