Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexify CCR internal actions for RCS 2.0 #95264

Merged
merged 26 commits into from
Apr 28, 2023

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Apr 17, 2023

This PR adds CCR support for RCS 2.0 by pairing its internal actions with indices actions which are safe to use across different clusters.

This PR adds CCR support for RCS 2.0 by pairing its internal actions
with indices actions which are safe to use across different clusters.
Comment on lines +72 to +79
@Override
public String[] indices() {
if (shardId == null) {
return null;
} else {
return new String[] { shardId.getIndexName() };
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using just the index name from the shard works because CCR does not support aliases. Otherwise it could be a lot of trouble to keep the original indices.

Comment on lines 365 to 369
final String effectiveAction = RCS_ACTION_NAME_LOOKUP.getOrDefault(action, action);
if (false == effectiveAction.equals(action)) {
logger.info("switching action from [{}] to [{}]", action, effectiveAction);
}
sendWithCrossClusterAccessHeaders(crossClusterAccessHeaders, connection, effectiveAction, request, options, handler);
Copy link
Member Author

@ywangd ywangd Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite the request here to use the indices actions instead of internal actions to honor indices privileges. This also means, for existing remote cluster model, it will keep using the internal actions.

Comment on lines 54 to 56
if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_8_0)) {
shardId.writeTo(out);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since RCS 2.0 is after 8.8, dropping this field for older versions is not an issue at all.

@ywangd ywangd added :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC >non-issue labels Apr 24, 2023
@ywangd ywangd marked this pull request as ready for review April 24, 2023 07:06
@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label Apr 24, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

import static org.hamcrest.Matchers.hasSize;
import static org.hamcrest.Matchers.is;

public class RemoteClusterSecurityCcrFcActionAuthorizationIT extends ESRestTestCase {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test class does not extends the abstract RCS base class because it only uses a single cluster. The single cluster is used as the leader and the test uses a TransportService to directly hit the remote server interface. This allows us to test requests and beheviours that should not happen with normal software execution environment, i.e. it simulates potentially malicious behaviours.

Comment on lines 230 to 232
if (false == getMetadata().contains(fileName)) {
throw new IllegalArgumentException("invalid file name [" + fileName + "]");
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic here as well as the above ensureSessionShardIdConsistency method are behaviour change for existing CCR as well. They feel safe enough and are likely useful enhancements. We could have a distributed team member to take a look to be sure. Alternatively we could try conditionally disable these checks for RCS 1 leader clusters. But it may not be easy since this service layer does not have knowledge about the cluster connections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards not disabling them conditionally since as you say it would be complicated to do this; besides, these seem like reasonable validation steps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggest here it actually makes sense (and isn't too challenging) to avoid this for old-style actions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more unrelated sanity check: the filenames are for Lucene-maintained segment info files, with a fixed format of segment_N; having poked around the code a bit (e.g. this is the relevant ES-side part) I don't think there is a way for someone to maliciously (and manually) add a file with a name that would enable directory traversal, unless perhaps they had access to the file system; in that case though they would not benefit from a directory traversal attack to begin with. Was wondering if this is roughly what you have found as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes basically that. Also, I don't think it is possible to physically add a file with a name that enables directory traversal. The file sytsem won't allow such file name to exist in the first place.

@ywangd ywangd requested a review from n1v0lg April 24, 2023 07:19
Comment on lines +37 to +45
RetentionLeaseActions.Add.ACTION_NAME,
RetentionLeaseActions.Remove.ACTION_NAME,
RetentionLeaseActions.Renew.ACTION_NAME,
"indices:monitor/stats",
"indices:internal/admin/ccr/restore/session/put",
"indices:internal/admin/ccr/restore/session/clear",
"internal:transport/proxy/indices:internal/admin/ccr/restore/session/clear",
"indices:internal/admin/ccr/restore/file_chunk/get",
"internal:transport/proxy/indices:internal/admin/ccr/restore/file_chunk/get"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a separate PR for adding new consolidated CCR/CCS privileges.

Copy link
Contributor

@n1v0lg n1v0lg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great. I made a first pass today -- only nits, and one sanity check so far. Will review the rest, including test tomorrow.

// Cross cluster access user can perform has privilege check
if (authentication.isCrossClusterAccess() && HasPrivilegesAction.NAME.equals(action)) {
assert request instanceof HasPrivilegesRequest;
return ((Authentication) authentication.getAuthenticatingSubject()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional nit: getRemoteAuthenticationFromMetadata or getAuthenticationFromCrossClusterAccessMetadata or something similar could be a nice util to add to Authentication.

((Authentication) authentication.getAuthenticatingSubject().getMetadata().get(CROSS_CLUSTER_ACCESS_AUTHENTICATION_KEY)) feels pretty long and distracting, esp. because of the cast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that. But I was a bit hesitate to add such method to Authentication because we don't even have a getMetadata method for it since metadata right now is a subject level information. So for now I added getAuthenticationFromCrossClusterAccessMetadata as a static method to the Authentication class. We can iterate in future to further rationalize how Authentication information should be distributed.

this.ccrRestoreService = ccrRestoreService;
}

@Override
protected void doExecute(Task task, ClearCcrRestoreSessionRequest request, ActionListener<ActionResponse.Empty> listener) {
final ShardId shardId = request.getShardId();
// It can be null if the request is sent from an old node with the internal action
if (shardId != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check: if shardId is null, for example for ClearCcrRestoreSessionRequest, then authorization is guaranteed to fail for the new-style action (indices:internal/admin/ccr/restore/session/clear) here:

if (indices == null || indices.length == 0) {
throw new IllegalArgumentException("the action " + action + " requires explicit index names, but none were provided");
}

For the old-style action (internal:admin/ccr/restore/session/clear) on the other hand, we won't hit the above IAE because authz won't go through the regular authz flow.

Is my understanding correct here? If so, I wonder if it makes sense to fail early for new style actions if shardId is null, e.g., by making doExecute or parts of it abstract, and overriding for InternalTransportAction and TransportAction accordingly. After all the action is guaranteed to fail.

Apologies if I'm missing something here. I ran out of time getting to the bottom of this today.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you said makes perfect sense! Thanks for the prompt. I was focusing on minimize the changes of different transport actions and forgot about that we could actual leverage the difference. In this case, you are absolutely right that we should fail for null in the new TransportAction and only have this leniency in the internal one.

This also prompts me to think twice about this comment. Since we have separate transport actions for different RCS models, we could lift the check up into the transport action so that it does not affect existing model. This is my current preference because conditional check for the ShardId here would also make the change not affect existing model. It would be more consistent if both ShardId and fileName checks either affect the existing model or not at all. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, glad I was onto something. Yup, I'm ++ on lifting the other check to the transport action as well. It's good to minimize the impact on the old model and we can always retrofit the check if we deem it worthwhile in the future.

Copy link
Contributor

@n1v0lg n1v0lg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone through the test code as well. All looks good. Once the adjustments around separating old-style from new-style actions are made, I will take another quick look but from my perspective this is basically ready to ship!

Comment on lines 230 to 232
if (false == getMetadata().contains(fileName)) {
throw new IllegalArgumentException("invalid file name [" + fileName + "]");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggest here it actually makes sense (and isn't too challenging) to avoid this for old-style actions.

Comment on lines 230 to 232
if (false == getMetadata().contains(fileName)) {
throw new IllegalArgumentException("invalid file name [" + fileName + "]");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more unrelated sanity check: the filenames are for Lucene-maintained segment info files, with a fixed format of segment_N; having poked around the code a bit (e.g. this is the relevant ES-side part) I don't think there is a way for someone to maliciously (and manually) add a file with a name that would enable directory traversal, unless perhaps they had access to the file system; in that case though they would not benefit from a directory traversal attack to begin with. Was wondering if this is roughly what you have found as well.

…ecurity/transport/SecurityServerTransportInterceptor.java

Co-authored-by: Nikolaj Volgushev <n1v0lg@users.noreply.github.com>
Comment on lines 124 to 148
public void ensureSessionShardIdConsistency(String sessionUUID, ShardId shardId) {
final RestoreSession restore = onGoingRestores.get(sessionUUID);
if (restore == null) {
logger.debug("could not get session [{}] because session not found", sessionUUID);
throw new IllegalArgumentException("session [" + sessionUUID + "] not found");
}
final ShardId sessionShardId = restore.indexShard.shardId();
if (false == sessionShardId.equals(shardId)) {
throw new IllegalArgumentException(
"session [" + sessionUUID + "] shardId [" + sessionShardId + "] does not match requested shardId [" + shardId + "]"
);
}
}

public void ensureFileNameIsKnownToSession(String sessionUUID, String fileName) throws IOException {
final RestoreSession restore = onGoingRestores.get(sessionUUID);
if (restore == null) {
logger.debug("could not get session [{}] because session not found", sessionUUID);
throw new IllegalArgumentException("session [" + sessionUUID + "] not found");
}
// Ensure no file system traversal is possible by only allow file names known to the restore session
if (false == restore.getMetadata().contains(fileName)) {
throw new IllegalArgumentException("invalid file name [" + fileName + "]");
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether these methods need to be sychronized. I will seek advices from the distributed team.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been advised that synchronization are not necessary for these methods since they do not mutate state.

Comment on lines +199 to +209
// query cluster error cases - aliases not supported
{
final Request putCcrRequest = new Request("PUT", "/follower-index-3/_ccr/follow?wait_for_active_shards=1");
putCcrRequest.setJsonEntity("""
{
"remote_cluster": "my_remote_cluster",
"leader_index": "shared-alias"
}""");
final ResponseException e = expectThrows(ResponseException.class, () -> performRequestWithCcrUser(putCcrRequest));
assertThat(e.getResponse().getStatusLine().getStatusCode(), equalTo(400));
assertThat(e.getMessage(), containsString("cannot follow [shared-alias], because it is a ALIAS"));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a negative test case specifically for aliases since the indices action has an implicit assumption that the permission must be granted on the concrete indices.

@ywangd ywangd requested a review from n1v0lg April 26, 2023 07:57
@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
Comment on lines +107 to +109
final Engine.IndexCommitRef commitRef = indexShard.acquireSafeIndexCommit();
final Set<String> fileNames = Set.copyOf(commitRef.getIndexCommit().getFileNames());
restore = new RestoreSession(sessionUUID, indexShard, commitRef, fileNames, scheduleTimeout(sessionUUID));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing the known list of file names on RestoreSession creation time so that later checks for known file name is lightweight.
Thanks to @Tim-Brooks for the suggestion!

Copy link
Contributor

@n1v0lg n1v0lg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@ywangd ywangd added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Apr 28, 2023
@ywangd
Copy link
Member Author

ywangd commented Apr 28, 2023

Failures are not related. I raised
#95644
#95645

@ywangd
Copy link
Member Author

ywangd commented Apr 28, 2023

@elasticmachine run elasticsearch-ci/part-2

@ywangd
Copy link
Member Author

ywangd commented Apr 28, 2023

@elasticmachine run elasticsearch-ci/part-2-fips

@ywangd
Copy link
Member Author

ywangd commented Apr 28, 2023

@elasticmachine run elasticsearch-ci/part-2

@elasticsearchmachine elasticsearchmachine merged commit a2c9c0b into elastic:main Apr 28, 2023
@ywangd ywangd deleted the rcs2-ccr-support branch April 28, 2023 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >non-issue :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants