Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create system indices on-demand #64533

Closed

Conversation

pugnascotia
Copy link
Contributor

Part of #61656.

This PR introduces a service that makes it straightforward to automatically create an index on-demand. This pattern already existed in the codebase, but varied in its implementation.

Notes to reviewers:

  • I do not like the names I have used, and would love some alternatives!
  • There are more plugins in scope for this change, but I want some feedback on direction before tackling the larger (and significantly more complex) plugins.
  • The new service's implementation is largely pinched from BlobStoreCacheService

@pugnascotia pugnascotia added >enhancement :Core/Infra/Core Core issues without another label v8.0.0 labels Nov 3, 2020
Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that it might be best to take a different approach and hook into the index creation mechanisms. My thought was mappings and settings get defined on a system index descriptor and we know by the name of the index what mappings and settings to apply to it, so when we get a create index request for a system index we know exactly what to do (at least for the ones we intend to have the system manage creation for).

@pugnascotia
Copy link
Contributor Author

@jaymode that's an interesting option. I assume we'd hook into AutoCreateAction and detect when a system index was being auto-created?

That made me wonder - do we have any guards against a user creating e.g. .snapshot-blob-cache index, which then stops the BlobStoreCacheService from creating the index with the correct mappings? I suppose it's the opposite case from above - when an auto-create index request comes in, and it matches a registered system index name, then we need to check who is trying to create the index.

Actually, I wonder if we can expand that do any auto-create index request that matches the system index patterns e.g. .kibana_*? We'd need some way of know who was allowed to create such indices and who wasn't.

@jaymode
Copy link
Member

jaymode commented Nov 6, 2020

do we have any guards against a user creating e.g. .snapshot-blob-cache index, which then stops the BlobStoreCacheService from creating the index with the correct mappings

No we do not. This a goal for the project and I view this as a step to getting us there.

I suppose it's the opposite case from above - when an auto-create index request comes in, and it matches a registered system index name, then we need to check who is trying to create the index.

I'm not sure if who matters as much as the index being created correctly at the moment. We can implement security restrictions so that only the kibana_system user can create the Kibana system index later if we want to, but I think our first focus is getting everything in order so that we know the indices are created properly.

In my mind, I see us doing something like:

  • Add method to SystemIndexDescriptor to see if an index is internally managed
  • Add mappings and settings to SystemIndexDescriptor (only need to be present if internally managed)
  • Within the MetadataCreateIndexService get the mappings and settings if it is a system index
    • Add validation to the create requests here too, so that if something tries to create a system index that is internally managed the provided mappings/settings should never be supplied. We may need to version guard this so that requests from an earlier version don't fail?

@gwbrown @williamrandolph thoughts?

@gwbrown
Copy link
Contributor

gwbrown commented Nov 6, 2020

In my mind, I see us doing something like [...]

This is pretty much what I had in mind as well.

Something like this to ensure the index has been created prior to every indexing operation might be necessary as things stand today, because it's very possible for the index to unexpectedly get deleted. But in the brave new world of protected system indices, we won't have to really worry about that case - we can either make sure that the index is created at startup (if that's necessary) or ensure that the right settings/mapping is applied when the index is auto-created.

@pugnascotia pugnascotia changed the title Add service to create an index on-demand Create system indices on-demand Nov 11, 2020
@pugnascotia
Copy link
Contributor Author

I've pushed a swathe of changes to create system indices on-demand. I also changed the security index to use this, since it's a complicated example. A whole bunch of tests are going to fail, I expect, but I want to see what happens 😄

Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look and left a couple of things I think we need to work on.


/**
* This class holds the {@link SystemIndexDescriptor} objects that represent system indices the
* node knows about. Methods for determining if an index should be a system index are also provided
* to reduce the locations within the code that need to deal with {@link SystemIndexDescriptor}s.
*/
public class SystemIndices {
public class SystemIndices implements ClusterStateListener {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we separate the "SystemIndexManager" out of the SystemIndices class? I like that SystemIndices is fairly lightweight and compact. The manager (and other services) will depend on SystemIndices, like SystemIndexMetadataUpgradeService.

if (descriptor.getAliasName() == null) {
aliases = Set.of();
} else {
concreteIndexName = descriptor.getIndexPattern();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need a method to get the current concrete index name? Maybe someone defines a pattern with a * to cover a variety of system indices but we should know what the concrete value to create is (ie security_6 vs security_7).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaymode @gwbrown Right now I feel that there's a conflict between defining a system index pattern with a wildcard, and being able to automatically create it, essentially because we have to define the concrete / underlying index name somewhere.

At the moment, this branch assumes that if descriptor.isAutomaticallyManaged() returns true, then descriptor.getIndexPattern() can be used as the concrete index name. Partly for that reason, the Security plugin doesn't define a descriptor for .security-*, but instead defines a .security-6 descriptor and a .security-7 descriptor, but only the latter has the .security alias defined.

That being said, I'm not sure how this scheme would work if we needed to introduce a .security-8 for some reason. Clearly in a new cluster, .security-8 should be auto-created with the .security alias pointing to it, but what about in an upgraded deployment?

I wonder whether a descriptor should be able to define:

  • An index pattern
  • One or more concrete index names, with optional mappings and settings
  • An alias-to-index mapping, which also represents what the latest concrete index name is
  • Perhaps a method to migrate between indices, e.g. convert .foo-7 to .foo-8, and re-point the alias?

So the for Security, we'd have one descriptor containing:

  • An index pattern of .security*
  • An index definition for .security-7 (it wouldn't be necessary to define one for .security-6 any more)
  • The alias .security, pointing at .security-7.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to separately have a reserved pattern and also have the ability to define concrete indices with aliases, so I agree with your proposal.

There will be follow-on work involved since we should prevent creation of aliases that match the index pattern, unless the alias definition matches that which the system knows about.

@pugnascotia
Copy link
Contributor Author

I have updated the PR after going over the mappings upgrade code more carefully, and adding test coverage.

@droberts195
Copy link
Contributor

Currently the ML system indices are having the correct mappings applied via index templates. Will the cease to work as soon as this PR is merged?

I know we need to switch over the ML system index mappings from index templates to the new system index descriptor mappings mechanism at some point before 8.0 release, but the timing depends on the effects of this PR. If index templates will stop being considered for system indices when this PR is merged then the switch needs to happen in this PR. Someone from the ML team can help with this by pushing to the PR branch when the basic framework is complete. Or if there will still be a period when index templates are considered for system indices then we can do the ML switchover as a separate followup PR.

@pugnascotia
Copy link
Contributor Author

@droberts195 there's no hard requirement for ML to switch in this PR. I migrated various plugins so I could see how they would look when being automatically managed, but we can easily do that work later. So if it would be more convenient for ML to switch over at a later point, I'm happy to make sure that happens. I had thought I'd break up the PR before merging anyway, since it affects various Security classes too.

@pugnascotia
Copy link
Contributor Author

@droberts195 actually, sorry, I looked again at the ML changes and ML won't be affected by this - its system index descriptors don't define any of the fields that would mark them as automatically managed.

@droberts195
Copy link
Contributor

OK, great, thanks @pugnascotia. At some point before 8.0.0-alpha1 we'll create a followup PR that says the ML system indices are automatically managed and moves their mapping definitions over to the system index descriptors. It's great that it doesn't all have to be done in one big PR.

@pugnascotia
Copy link
Contributor Author

I've split out the infrastructure parts of this PR into #65604.

@jaymode
Copy link
Member

jaymode commented Jan 13, 2021

Shall we close this one @pugnascotia?

@pugnascotia
Copy link
Contributor Author

Sure can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >enhancement v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants