[Fleet] Refactor es_index_patterns for searchability #180553
Conversation
Pinging @elastic/fleet (Team:Fleet)
```ts
es_index_patterns: {
  dynamic: false,
  properties: {
    pattern: { type: 'keyword' },
```
Doubt: isn't this mismatched with the backfilled `title` property later used in the `backfillFn` function?
Also, should we keep them aligned and update the mapping with `title` instead of `pattern`?
my bad, I forgot to backport this change from my other PR integrating it with the integration+datastream search. Should be fixed now.
I see some other places in the PR where `pattern` is still used in place of `title`, like in some tests and the current_mappings file.
urgh, went through these.
```ts
backfillFn: (doc) => {
  return {
    attributes: {
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns).map(
```
Should we check for null here? `Object.entries` throws on null/undefined.
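For illustration, a minimal sketch of the suggested guard; the mapped entry shape (`name`/`title`) is a hypothetical inferred from the mapping snippet above, not the actual PR code:

```ts
backfillFn: (doc) => {
  return {
    attributes: {
      // Guard against a missing value: Object.entries(null) throws a TypeError.
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns ?? {}).map(
        // hypothetical entry shape, inferred from the `title` keyword mapping above
        ([name, title]) => ({ name, title })
      ),
    },
  };
},
```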
💔 Build Failed
See comment
```ts
type: 'data_backfill',
backfillFn: (doc) => {
  return {
    attributes: {
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns).map(
```
Hum, if I read this correctly, this is not a data backfill you're doing here; it's an incompatible mutation of existing data that would break BWC on serverless?
Are we really fine with that?
Good point, yes, this would be incompatible. What's the best approach for this? Renaming the property to something else so there is no overlap?
Using a different property is a start, but I'm afraid this won't be sufficient.
To be absolutely transparent, what you're trying to do here (a BWC data mutation) isn't really supported today because of zero-downtime migrations (and the lack of documentation on how you could do it).
In theory, your model must remain compatible in case of rollback, meaning that changes can basically only be additional properties (potentially hidden from older versions using the `forwardCompatibility` schema):

`kibana/packages/core/saved-objects/core-saved-objects-server/src/model_version/schemas.ts`, lines 18 to 32 in fd068da:
```ts
/**
 * The schema applied when retrieving documents of a higher version from the cluster.
 * Used for multi-version compatibility in managed environments.
 *
 * When retrieving a savedObject document from an index, if the version of the document
 * is higher than the latest version known of the Kibana instance, the document will go
 * through the `forwardCompatibility` schema of the associated model version.
 *
 * E.g a Kibana instance with model version `2` for type `foo` types fetches a `foo` document
 * at model version `3`. The document will then go through the `forwardCompatibility`
 * of the model version 2 (if present).
 *
 * See {@link SavedObjectModelVersionForwardCompatibilitySchema} for more info.
 */
forwardCompatibility?: SavedObjectModelVersionForwardCompatibilitySchema;
```
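For context, a hedged sketch of how a `forwardCompatibility` schema is typically attached to a model version, using `@kbn/config-schema`; the constant name and version number are illustrative, not taken from this PR:

```ts
import { schema } from '@kbn/config-schema';
import type { SavedObjectsModelVersion } from '@kbn/core-saved-objects-server';

// With `unknowns: 'ignore'`, attributes introduced by newer model versions are
// stripped when an older Kibana reads a newer document, keeping rollbacks safe.
const modelVersion2: SavedObjectsModelVersion = {
  changes: [],
  schemas: {
    forwardCompatibility: schema.object(
      { es_index_patterns: schema.maybe(schema.any()) },
      { unknowns: 'ignore' }
    ),
  },
};
```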
So if you want to refactor the shape of this existing `es_index_patterns` field, the proper way would be a staged migration spread across multiple releases:
- release 1 introduces the new field but still utilizes the old one, and keeps the two in sync (meaning that writes to the old field must be reflected on the new one and vice versa, at the application layer, because there's no tooling to do it at the SO level)
- release 2 utilizes the new field but keeps the fields in sync
- release 3 cuts the sync

And before you ask: no, there's no tooling or automated way to generate model versions automatically bound/applied to the next releases today, so all of this has to be done manually.
TL;DR: with serverless, one does not simply change the shape of its data.
AFAIK this is the first time since we released ZDT migrations that we have a PR with a need for data mutation in a migration. Let me check with the team and get back to you regarding what our options are.
Thanks @pgayvallet - the multi-release solution sounds a bit involved for the case at hand but I guess that's how these things have to work now.
Yeah, it's a lot of work, and the tooling and documentation aren't there.
Given we're not GA yet, I'm checking with the team whether we can just 🙈 this time and improve the workflow on our side for next time.
So, we discussed this with the team and @lukeelmers yesterday, and we came to the conclusion that we unfortunately cannot just let teams do that kind of data transformation the unsafe way: even though we're not GA, the risk of corrupted data following a rollback is something we can't afford, and rollbacks aren't the only risk (in practice, the data migration is performed after the deployments are upgraded, meaning that the fields aren't available - searchable, mostly - as soon as the new pods are up).
We also know that we're currently lacking proper tooling and documentation for multi-stage migrations (because we're not officially supporting them today, basically), and we would be fine working on that; however, polishing the design that was suggested when we designed ZDT and implementing a proper solution can take time, so this would be more a matter of weeks than days if we were to start today.
However, if waiting until the functionality is ready on our side is not acceptable for you, we can commit to helping and guiding you in implementing a solution.
To summarize, the options (we think) you have are:
1. Not doing data mutation

I'm lacking some context (even if I saw the comment this is coming from), so I don't know how viable this is, but would keeping the current shape of the data (and just adding the associated mappings to make it searchable) be a potential option? Maybe with a `flattened` field (if you hopefully only need to search on values and not keys)?
If so, this would clearly be the simplest, as all the alternatives imply multi-stage migrations.
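If it helps, a minimal sketch of what this could look like in the saved object type's mapping definition, assuming a `flattened` mapping is accepted there and passed through to Elasticsearch as-is (a later comment in this thread shows why this merge is rejected for a field that already has an object mapping):

```ts
// Option 1 sketch: map the whole object as `flattened` so its leaf values
// become searchable keyword-like terms without reshaping the data.
const mappings = {
  properties: {
    es_index_patterns: { type: 'flattened' },
  },
} as const;
```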
2. Use the same field with different data formats

If the mappings of the old and new data formats are compatible (e.g. with the data being an object in the old version and an array of objects of the same shape in the new version), it could be possible to do a 2-stage migration:
- release 1: the application recognizes the 2 shapes of the data (array or object) but always writes in the old format (no migration is added)
- release 2: the application writes in the new format, and the migration to convert the existing data is introduced

However, looking at how you're transforming your data in the migration you added, I don't think that's an option, as the two shapes can't share the same mappings (correct?), so I feel like option 1 and the `flattened` mapping approach would be more realistic. (A sketch of the dual-shape reader follows below.)
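A sketch of the release-1 "recognize both shapes" reader described above, with hypothetical type names:

```ts
type OldShape = Record<string, string>; // { [datasetName]: indexPattern }
type NewShape = Array<{ name: string; title: string }>;

// Release 1: read either shape, but keep writing the old one.
function readIndexPatterns(value: OldShape | NewShape): NewShape {
  if (Array.isArray(value)) {
    return value; // already in the new format
  }
  return Object.entries(value).map(([name, title]) => ({ name, title }));
}
```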
3. Use a new field for the new format and synchronize the 2 fields

This is the workflow I described previously in this thread; it uses 3 deployment stages in total:
- release 1: add the new field and keep the two in sync (manually), but the app/business layer still utilizes the old one
- release 2: switch to utilizing the new field, but keep the fields in sync (for rollbacks)
- release 3: cut the field sync

This is, in theory, quite straightforward; however, the fact that we lack tooling at the migration and release level makes it harder than it should be today:
- the field synchronization must be done in the application layer, given we don't provide write hooks on the SO APIs right now (see the sketch after this list). If I trust the first sentence from your comment here, "Unfortunately there is no central place where all updates/create calls go through", that will not make things easy; however, at the moment this is the only way to do it.
- synchronization with releases has to be done manually, meaning that it would require three PRs in total, merged at the right time to be shipped in three different serverless releases.
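As referenced in the first bullet, a hedged sketch of the application-layer dual-write: every create/update path mirrors the old object shape into the new field before persisting. Field and helper names are illustrative, not the actual Fleet code:

```ts
interface PackageAttributes {
  es_index_patterns?: Record<string, string>; // old shape, still the source of truth
  es_index_patterns_v2?: Array<{ name: string; title: string }>; // hypothetical new field
}

// Call from every application-layer write path so the two fields stay in sync.
function withSyncedIndexPatterns(attrs: PackageAttributes): PackageAttributes {
  const oldShape = attrs.es_index_patterns ?? {};
  return {
    ...attrs,
    es_index_patterns_v2: Object.entries(oldShape).map(([name, title]) => ({ name, title })),
  };
}
```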
In summary

- If you can avoid changing the shape of your data right now, we would encourage doing so, as we lack the tooling and documentation to let teams easily implement that today.
- If that would be acceptable to you but you'd still like to do it once the tooling is ready, please tell us, as it would help prioritize this enhancement.
- If you need to do this right now, then I think you will have to follow option 3 and do a multi-stage migration of the data. In that case, we can provide support and guidance to help you implement it today.
Not doing data mutation only helps us so much, because in a separate PR I'm adding another property to this array that should be searchable as well.
Also, I tried the `flattened` thing as one of the first options, and it seems like Elasticsearch doesn't let you do this:
```
PUT mytestindex
{
  "mappings": {
    "properties": {
      "obj": {
        "dynamic": false,
        "properties": {}
      }
    }
  }
}

PUT mytestindex/_mapping
{
  "properties": {
    "obj": {
      "type": "flattened"
    }
  }
}
```
results in:
```json
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "can't merge a non object mapping [obj] with an object mapping"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "can't merge a non object mapping [obj] with an object mapping"
  },
  "status": 400
}
```
The multi-stage approach seems like the right one; however, I would like to get @kpollich's or @juliaElastic's input here, as this is their area of the code.
Considering all of this (especially #180618), how do you think we should proceed here?
I think if the first approach doesn't work, going with the third one makes sense, as that's the only one that works with ZDT.
I started a new series of stacked PRs for this: #180684
Related to #177731
This changes the shape of `es_index_patterns` in the epm package saved object to make it easier to search on it. See #177731 (comment) for an explanation of the idea.
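For reference, a hedged illustration of the reshape; the new entry's `name` property is inferred from the backfill snippet, so treat the exact shape as an assumption:

```ts
// Before: a dynamic object keyed by dataset name, which is hard to search on.
type EsIndexPatternsOld = Record<string, string>;

// After: an array of objects with keyword-mapped properties, searchable via `title`.
type EsIndexPatternsNew = Array<{ name: string; title: string }>;
```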