Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hot-reloadable remote cluster credentials #102798

Merged
merged 58 commits into from Dec 8, 2023

Conversation

n1v0lg
Copy link
Contributor

@n1v0lg n1v0lg commented Nov 30, 2023

This PR enables RCS 2.0 remote clusters to be configured without the need to restart nodes. It works as the follows (assuming both clusters are already running):

  1. Get a cross-cluster API key for accessing the remote cluster
  2. Add cross-cluster API key to keystores of the local cluster, e.g.
    echo -n xxx | ./bin/elasticsearch-keystore add cluster.remote.my.credentials -x
    
  3. Call ReloadSecureSettings API on the local cluster
  4. Configure RCS 2.0 remote cluster should now just work for the local cluster, e.g.
    PUT /_cluster/settings
    {"persistent":{"cluster":{"remote":{"my":{"seeds":["127.0.0.1:9443"]}}}}}
    

This PR does not include functionality to automatically re-build connections on secure settings reload. I will add this in a follow up PR.

The high level technical approach is to maintain a credentials manager class and use this to attach credentials for connections to remote clusters. This comment also provides more context on some lower level details.

Relates: #98120
Relates: ES-6764

@n1v0lg n1v0lg added >enhancement :Security/Security Security issues without another label labels Nov 30, 2023
@n1v0lg n1v0lg self-assigned this Nov 30, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @n1v0lg, I've created a changelog YAML for you.

@n1v0lg
Copy link
Contributor Author

n1v0lg commented Dec 7, 2023

@elasticmachine update branch

@n1v0lg
Copy link
Contributor Author

n1v0lg commented Dec 7, 2023

CI failure is unrelated:

Execution failed for task ':server:test'.
> Process 'Gradle Test Executor 237' finished with non-zero exit value 134
  This problem might be caused by incorrect test process configuration.
  For more on test execution, please refer to https://docs.gradle.org/8.5/userguide/java_testing.html#sec:test_execution in the Gradle documentation.

I thought this was fixed by: #103112 though... I'll dig into this but it should not prevent a review.

@ChrisHegarty
Copy link
Contributor

CI failure is unrelated:

Execution failed for task ':server:test'.
> Process 'Gradle Test Executor 237' finished with non-zero exit value 134
  This problem might be caused by incorrect test process configuration.
  For more on test execution, please refer to https://docs.gradle.org/8.5/userguide/java_testing.html#sec:test_execution in the Gradle documentation.

I thought this was fixed by: #103112 though... I'll dig into this but it should not prevent a review.

The fix for 103112 was not in your merge with main 5f91ae7. It should be there now with your subsequent merge with main, c865955. If you see an issue now, then we have a serious problem.

@n1v0lg n1v0lg marked this pull request as ready for review December 7, 2023 17:19
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Dec 7, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nice work !

A couple nitpicks (nits:) (feel free to ignore entirely) and a couple non-blocking comments.

logger.trace("No cluster credentials are configured for remote cluster [{}]", remoteClusterAlias);
return Optional.empty();
}

return remoteClusterCredentials;
return Optional.of(
new RemoteClusterCredentials(remoteClusterAlias, ApiKeyService.withApiKeyPrefix(remoteClusterCredentials.toString()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we reuse RemoteClusterAliasWithCredentials to avoid different records that serve a similar purpose with only minor distinction and keeps credential in a SecureString. Could probably promote RemoteClusterAliasWithCredentials to a top level object and add a method to add the prefix (and method to check if it prefixed) and return a new record.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to do this originally, but the ApiKey prefix logic is a security concept, and the method lives in xpack, which makes it inaccessible in the core remote cluster code. I think it makes sense to keep the two different records, as they conceptually belong to two different domains and we're translating between these here.

public record RemoteClusterAliasWithCredentials(String clusterAlias, @Nullable SecureString credentials) {
@Override
public String toString() {
return "RemoteClusterAliasWithCredentials{clusterAlias='" + clusterAlias + "', credentials='::es_redacted::'}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ and really surprised that we don't do this automatically with SecureString#toString() (we should fix that outside of this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's trappy for sure, but that's currently how to translate between SecureString and a regular string (e.g., for sending cross cluster access headers, or when dealing with third party deps that don't have a concept of SecureString). Not saying, we shouldn't change it -- just that it's going to be a slightly trickier refactor.

// Update credentials and ensure they are used
final String updatedCredentials = randomAlphaOfLength(41);
writeCredentialsToKeyStore(updatedCredentials);
reloadSecureSettings();
Copy link
Contributor

@jakelandis jakelandis Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand the relationship between connection profile and the need to rebuild the connection. Once the connection is established (TCP + handshake) it should remain open and if need to re-establish will use the new credential. Per request will use the new credential. I assumed that the rebuilding the connection was for 1) ensure we disconnect any old connections that we established with the old credential (more clean up than anything) 2) so that we can error or at least warn if the new credential is not valid on the reload call. I can't remember if it is possible to explictly change the underlying transport profile (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html#transport-profiles) but we intentionally didn't document the one used for RCS 2.0 to discourage direct configuration there and any related options that influence the underlying transport profile are exposed as dedicated settings and are not related to the credential reload.

EDIT: this in reference to #102798 (comment) ... not sure why GH decided add the comment outside of the reply to that comment.

@n1v0lg
Copy link
Contributor Author

n1v0lg commented Dec 8, 2023

@ChrisHegarty sorry I missed your comment, yup, it looks like that build did not include your fix; after merging main again the build passed just fine. Thanks for investigating this!

@n1v0lg
Copy link
Contributor Author

n1v0lg commented Dec 8, 2023

@elasticmachine update branch bitte

@n1v0lg n1v0lg added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 8, 2023
@elasticsearchmachine elasticsearchmachine merged commit 7ded906 into elastic:main Dec 8, 2023
19 checks passed
@n1v0lg n1v0lg deleted the rcs2-reload branch December 8, 2023 15:09
n1v0lg added a commit that referenced this pull request Dec 8, 2023
n1v0lg added a commit that referenced this pull request Dec 8, 2023
elasticsearchmachine pushed a commit that referenced this pull request Dec 18, 2023
…103215)

Brings back #102798, with a
tweak to avoid tripping on a blocking operation.

Only change compared to the original PR is
4072fac
javanna pushed a commit to javanna/elasticsearch that referenced this pull request Dec 21, 2023
…lastic#103215)

Brings back elastic#102798, with a
tweak to avoid tripping on a blocking operation.

Only change compared to the original PR is
elastic@4072fac
jbaiera pushed a commit to jbaiera/elasticsearch that referenced this pull request Jan 10, 2024
…lastic#103215)

Brings back elastic#102798, with a
tweak to avoid tripping on a blocking operation.

Only change compared to the original PR is
elastic@4072fac
elasticsearchmachine pushed a commit that referenced this pull request Jan 11, 2024
This PR builds on #102798
by adding automatic remote connection rebuilding on cluster credentials
changes. In particular, we rebuild a remote cluster connection if a
credential for the associated cluster is newly added (i.e., we are
moving from RCS 1.0 -> RCS 2.0) or removed (moving from RCS 2.0 -> RCS
1.0). A connection rebuild allows us to associate the correct profile
(`_remote_server` in case of RCS 2.0, or "regular" transport profile for
RCS 1.0) without requiring end-users to manually update remote cluster
settings via a settings update call.

More context on connection rebuilding also in this
[comment](https://github.com/elastic/elasticsearch/pull/102798/files#r1420454541).

Relates: ES-6764
jedrazb pushed a commit to jedrazb/elasticsearch that referenced this pull request Jan 17, 2024
…3460)

This PR builds on elastic#102798
by adding automatic remote connection rebuilding on cluster credentials
changes. In particular, we rebuild a remote cluster connection if a
credential for the associated cluster is newly added (i.e., we are
moving from RCS 1.0 -> RCS 2.0) or removed (moving from RCS 2.0 -> RCS
1.0). A connection rebuild allows us to associate the correct profile
(`_remote_server` in case of RCS 2.0, or "regular" transport profile for
RCS 1.0) without requiring end-users to manually update remote cluster
settings via a settings update call.

More context on connection rebuilding also in this
[comment](https://github.com/elastic/elasticsearch/pull/102798/files#r1420454541).

Relates: ES-6764
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement :Security/Security Security issues without another label Team:Security Meta label for security team v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants