Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Security to use auto-managed system indices #67114

Merged
merged 15 commits into from
Feb 2, 2021

Conversation

pugnascotia
Copy link
Contributor

Part of #61656.

Change the Security plugin so that its system indices are managed automatically by the system indices infrastructure.

@pugnascotia pugnascotia added >refactoring :Security/Security Security issues without another label v8.0.0 v7.12.0 labels Jan 6, 2021
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Jan 6, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I left a comment about an edge case that I think needs addressed

@@ -350,53 +308,6 @@ public void prepareIndexIfNeededThenExecute(final Consumer<Exception> consumer,
} else if (indexState.indexExists() && indexState.isIndexUpToDate == false) {
throw new IllegalStateException("Index [" + indexState.concreteIndexName + "] is not on the current version."
+ "Security features relying on the index will not be available until the upgrade API is run on the index");
} else if (indexState.indexExists() == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need to handle the case where the master node doesn't know about the security system index, so for BWC we should maintain this code somehow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point.
I think we presently don't handle the cases of index auto-creation and mapping updates too well.
if .security does not yet exist, rolling update is in progress, and an API that creates the index for the first time hits a new node, the index will be created, but the other old nodes will subsequently complain that the index is not up to date (because the format number is greater) or that the mapping is not up to date (because the mapping version is greater).

I think this might be a common issue, I vaguely remember something similar for async search. Would it be possible to defer the system index upgrade (of the mapping and of the settings/metadata) until the rolling upgrade is complete, and let the security business logic deal with possibly storing entities in the old format, or if that's not possible, return a response failure informing that the requested API (with its parameters) is not available in a mixed cluster (similar to how no APIs are available until the cluster state has been recovered)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reinstated the code for creating the security index, while changing the code so that it relies on the auto-create logic. I've also re-added code for the mappings not being up-to-date, where the runnable is ignore and the exception consumer is called. Is that what you had in mind?

I've also changed a number of locations to check for a different in minimum and maximum ES version across the cluster nodes. If there is a different, the auto-create logic will refuse to run, and the SystemIndexManager will not try to update mappings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reinstated the code for creating the security index

I'm not sure this is need, the auto-create should handle it (now that auto-create has a predictable behaviour in a mixed cluster).

I've also re-added code for the mappings not being up-to-date, where the runnable is ignore and the exception consumer is called. Is that what you had in mind?

This fixes the problem, yes.

I've also changed a number of locations to check for a different in minimum and maximum ES version across the cluster nodes.

This is very cool. I like it that there is a clear, testable behavior in a mixed cluster scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if the create call is kept around, the rename from prepareIndexIfNeededThenExecute to checkIndexStateThenExecute is less fortunate 👿

@albertzaharovits
Copy link
Contributor

@pugnascotia I'm also going to take a look at this tomorrow. I hope that's alright 🙂

@pugnascotia
Copy link
Contributor Author

@albertzaharovits I would be very happy if you could take a look 🙏

@@ -350,53 +308,6 @@ public void prepareIndexIfNeededThenExecute(final Consumer<Exception> consumer,
} else if (indexState.indexExists() && indexState.isIndexUpToDate == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes around here are neat given that this is a refactoring! 👍
But I think we should let the runnable through if the index does not exist, and only then.
If the index exists, and it has an old mapping version we should hold off the runnable, because it can race with the service that updates the mapping. Related side note: can the mapping update service maybe hook into the "auto put mapping action" in a similar fashion that the system index creation hooks into the "index auto create action"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed TransportPutMappingAction to enforce that any attempt to change mappings on a system index must supply the same mappings as the system index descriptor contains. I haven't updated TransportAutoPutMappingAction because I don't understand when that action is used or why.

Copy link
Contributor

@albertzaharovits albertzaharovits Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to dwell too much on it, but IMO I think it is worth investing in adapting the auto mapping update action. It is added recently, so I don't expect there to be too obscure behaviors encoded. The reason I mention it is that moving the mapping update from a ClusterStateListener to another (from SecurityIndexManager to SystemIndexManager) is not a great step forward.

Just a suggestion, I haven't investigated it carefully, and it's outside security's concern.

Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left two comments about the core of it.
I think we have to think through mapping and metadata upgrades in a mixed cluster scenario, and that index requests don't race with the mapping updates.

Despite that, I think this is looking very promising, we've been eager for a long time to get rid of the update logic in the SecurityIndexManager.

Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM though the important changes are outside Security Area's purview.
Thank you Rory!


private XContentBuilder getIndexMappings() {
try {
final XContentBuilder builder = jsonBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer the mapping as a resource file, I wonder what's the reason for this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the example of other plugins that auto-created their indices. It does also make it harder to change the mappings, versus opening up a jar file and editing the json.

CreateIndexClusterStateUpdateRequest updateRequest = descriptor != null && descriptor.isAutomaticallyManaged()
final boolean isSystemIndex = descriptor != null && descriptor.isAutomaticallyManaged();

if (isSystemIndex && state.nodes().getMaxNodeVersion().after(state.nodes().getMinNodeVersion())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concerns me if we cannot auto create a system index in a mixed version cluster even if the descriptor itself does not differ between versions. I think we should allow for creation still. I wonder if we should consider an approach to pull the most up to date version of the descriptor and apply those when creating the index; however there may be cases where that would fail if a feature was used that couldn't be validated on the master. In those cases I would consider falling back to the master version and allowing the master to update the mappings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can pull the most up-to-date version of the descriptor into the master, at least not without building infrastructure to make it pull-able.

How about: in the auto-create case, we always use the master's descriptor. If we release some new feature that needs to create a system index and relies on particular features being present, perhaps it can explicitly create the index using the right settings, either waiting until the cluster is in the right state, or retrying until the creation is successful. Admittedly this is just shifting the problem, but the general system index infra can't know at the moment whether it's OK to attempt creation in a mixed cluster.

I suppose we could extend the descriptor to apply a version range? e.g.

SystemIndexDescriptor.builder().setMinimimVersion(Version.V_8_0_0)

Then the auto-create, explicit create, mappings and settings methods can check the minimum descriptor version against state.nodes().getMinNodeVersion(). Code that relied on new features would still need to concern themselves with whether it was possible yet to create their system index. If that case becomes common/annoying enough, we can build more support infra.

I think I still want the SystemIndexManager to hold off upgrading mappings in a mixed cluster - partly in case of a failed upgrade, and partly to avoid disagreements around mappings versions. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the general system index infra can't know at the moment whether it's OK to attempt creation in a mixed cluster

I think this is the crux, and we have to delegate this responsibility to the caller (eg Security(IndexManager)). My suggestion to fail the creation in the mixed cluster scenario was the simplest, in a sense, because the caller doesn't have to guard the system document index call with anything. But I haven't really considered the practical aspect that much more upgrades are done than changes to the system index's mapping. Moreover, most mapping changes are focused on a small portion of the total mapping. Preventing the creation of the system index in all these cases is wasteful, as the creation with the old mapping would be OK.

So, if the decision is to not reject system index creation (or mapping updates) in a mixed cluster scenario (whatever the strategy ultimately is), when indexing a document relying on new mappings (eg a new type of API key) we have to look at the cluster state, and probably reject the indexing (there could be other options, depending on the particular case). I think that's OK; I hoped we could avoid it, but not creating any index in the other cases, is probably not worth it.

Allow system indices to be created or auto-created so long as their
minimum node version requirement is met.

// `TransportCreateIndexAction` will automatically apply the right mappings, settings and aliases, so none
// of that needs to be specified here.
CreateIndexRequest request = new CreateIndexRequest(indexState.concreteIndexName).waitForActiveShards(ActiveShardCount.ALL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to validate the mappings are good to go in cases where this node is not the master and there are mixed versions in the cluster?

Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pugnascotia pugnascotia merged commit 6ffddf8 into elastic:master Feb 2, 2021
@pugnascotia pugnascotia deleted the 61656-auto-create-security branch February 2, 2021 13:43
pugnascotia added a commit to pugnascotia/elasticsearch that referenced this pull request Feb 2, 2021
Backport of elastic#67114. Part of elastic#61656.

Change the Security plugin so that its system indices are managed automatically
by the system indices infrastructure.

Also add an `origin` field to `CreateIndexRequest` and `UpdateSettingsRequest`.
pugnascotia added a commit that referenced this pull request Feb 8, 2021
Re-apply changes from 0c9b9c1, which migrates the `.tasks` system index
to be managed automatically by the system indices infrastructure.

Changes went into #67114 that, I hope, will avoid the problems we saw before
in the BWC tests in CI.
pugnascotia added a commit to pugnascotia/elasticsearch that referenced this pull request Feb 8, 2021
Re-apply changes from 0c9b9c1, which migrates the `.tasks` system index
to be managed automatically by the system indices infrastructure.

Changes went into elastic#67114 that, I hope, will avoid the problems we saw before
in the BWC tests in CI.
pugnascotia added a commit that referenced this pull request Feb 9, 2021
Backport of #67114. Part of #61656.

Change the Security plugin so that its system indices are managed automatically
by the system indices infrastructure.

Also add an `origin` field to `CreateIndexRequest` and `UpdateSettingsRequest`.
pugnascotia added a commit to pugnascotia/elasticsearch that referenced this pull request Feb 9, 2021
While backporting elastic#67114 via elastic#68375, I realised that there are existing
upgrade scenarios that expect the `SecurityIndexManager` to update index
mappings, so in the backport PR, this capability was reinstated. This
commit does the same in `master`.
pugnascotia added a commit that referenced this pull request Feb 9, 2021
Re-apply changes from 0c9b9c1, which migrates the `.tasks` system index
to be managed automatically by the system indices infrastructure.

Changes went into #67114 that, I hope, will avoid the problems we saw before
in the BWC tests in CI.
pugnascotia added a commit that referenced this pull request Feb 9, 2021
While backporting #67114 via #68375, I realised that there are existing
upgrade scenarios that expect the `SecurityIndexManager` to update index
mappings, so in the backport PR, this capability was reinstated. This
commit does the same in `master`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Security/Security Security issues without another label Team:Security Meta label for security team v7.12.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants