[Fleet] Refactor es_index_patterns for searchability #180553
Conversation
Pinging @elastic/fleet (Team:Fleet)
```ts
es_index_patterns: {
  dynamic: false,
  properties: {
    pattern: { type: 'keyword' },
```
Doubt: isn't this mismatched with the backfilled `title` property later used in the `backfillFn` function?
Also, should we keep them aligned and update the mapping with `title` instead of `pattern`?
my bad, I forgot to backport this change from my other PR integrating it with the integration+datastream search. Should be fixed now.
I see some other places in the PR where `pattern` is still used in place of `title`, like in some tests and the current_mappings file.
urgh, went through these.
```ts
backfillFn: (doc) => {
  return {
    attributes: {
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns).map(
```
Should we check for null here? `Object.entries` throws on null/undefined.
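For illustration, a minimal sketch of the suggested guard; the mapped entry shape (`name`/`title`) is a hypothetical inferred from the mapping snippet above, not the actual PR code:

```ts
backfillFn: (doc) => {
  return {
    attributes: {
      // Guard against a missing value: Object.entries(null) throws a TypeError.
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns ?? {}).map(
        // hypothetical entry shape, inferred from the `title` keyword mapping above
        ([name, title]) => ({ name, title })
      ),
    },
  };
},
```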
💔 Build Failed
See comment
```ts
type: 'data_backfill',
backfillFn: (doc) => {
  return {
    attributes: {
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns).map(
```
Hum, if I read this correctly, this is not a data backfill you're doing here; it's an incompatible mutation of existing data that would break BWC on serverless?
Are we really fine with that?
Good point, yes, this would be incompatible. What's the best approach for this? Renaming the property to something else so there is no overlap?
Using a different property is a start, but I'm afraid this won't be sufficient.
To be absolutely transparent, what you're trying to do here (a BWC data mutation) isn't really supported today because of zero-downtime migrations (and the lack of documentation on how you could do it).
In theory, your model must remain compatible in case of rollback, meaning that changes can basically only be additional properties (potentially hidden from older versions using the `forwardCompatibility` schema):

`kibana/packages/core/saved-objects/core-saved-objects-server/src/model_version/schemas.ts`, lines 18 to 32 in fd068da:
```ts
/**
 * The schema applied when retrieving documents of a higher version from the cluster.
 * Used for multi-version compatibility in managed environments.
 *
 * When retrieving a savedObject document from an index, if the version of the document
 * is higher than the latest version known of the Kibana instance, the document will go
 * through the `forwardCompatibility` schema of the associated model version.
 *
 * E.g a Kibana instance with model version `2` for type `foo` types fetches a `foo` document
 * at model version `3`. The document will then go through the `forwardCompatibility`
 * of the model version 2 (if present).
 *
 * See {@link SavedObjectModelVersionForwardCompatibilitySchema} for more info.
 */
forwardCompatibility?: SavedObjectModelVersionForwardCompatibilitySchema;
```
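For context, a hedged sketch of how a `forwardCompatibility` schema is typically attached to a model version, using `@kbn/config-schema`; the constant name and version number are illustrative, not taken from this PR:

```ts
import { schema } from '@kbn/config-schema';
import type { SavedObjectsModelVersion } from '@kbn/core-saved-objects-server';

// With `unknowns: 'ignore'`, attributes introduced by newer model versions are
// stripped when an older Kibana reads a newer document, keeping rollbacks safe.
const modelVersion2: SavedObjectsModelVersion = {
  changes: [],
  schemas: {
    forwardCompatibility: schema.object(
      { es_index_patterns: schema.maybe(schema.any()) },
      { unknowns: 'ignore' }
    ),
  },
};
```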
So if you want to refactor the shape of this existing `es_index_patterns` field, the proper way would be a staged migration spread across multiple releases:
- release 1 introduces the new field but still utilizes the old one, and keeps the two in sync (meaning that writes to the old field must be reflected on the new one and vice versa, at the application layer, because there's no tooling to do it at the SO level)
- release 2 utilizes the new field but keeps the fields in sync
- release 3 cuts the sync

And before you ask: no, there's no tooling or automated way to generate model versions automatically bound/applied to the next releases today, so all of this has to be done manually.
TL;DR: with serverless, one does not simply change the shape of its data.
AFAIK this is the first time since we released ZDT migrations that we have a PR with a need for data mutation in a migration. Let me check with the team and get back to you regarding what our options are.
Thanks @pgayvallet - the multi-release solution sounds a bit involved for the case at hand but I guess that's how these things have to work now.
Yeah, it's a lot of work, and the tooling and documentation aren't there.
Given we're not GA yet, I'm checking with the team whether we can just 🙈 this time and improve the workflow on our side for next time.
So, we discussed this with the team and @lukeelmers yesterday, and we came to the conclusion that we unfortunately cannot just let teams do that kind of data transformation the unsafe way: even though we're not GA, the risk of corrupted data following a rollback is something we can't afford, and rollbacks aren't the only risk (in practice, the data migration is performed after the deployments are upgraded, meaning that the fields aren't available - searchable, mostly - as soon as the new pods are up).
We also know that we're currently lacking proper tooling and documentation for multi-stage migrations (because we're not officially supporting them today, basically), and we would be fine working on that; however, polishing the design that was suggested when we designed ZDT and implementing a proper solution can take time, so this would be more a matter of weeks than days if we were to start today.
However, if waiting until the functionality is ready on our side is not acceptable for you, we can commit to helping and guiding you in implementing a solution.
To summarize, the options (we think) you have are:
1. Not doing data mutation

I'm lacking some context (even if I saw the comment this is coming from), so I don't know how viable this is, but would keeping the current shape of the data (and just adding the associated mappings to make it searchable) be a potential option? Maybe with a `flattened` field (if you hopefully only need to search on values and not keys)?
If so, this would clearly be the simplest, as all the alternatives imply multi-stage migrations.
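If it helps, a minimal sketch of what this could look like in the saved object type's mapping definition, assuming a `flattened` mapping is accepted there and passed through to Elasticsearch as-is (a later comment in this thread shows why this merge is rejected for a field that already has an object mapping):

```ts
// Option 1 sketch: map the whole object as `flattened` so its leaf values
// become searchable keyword-like terms without reshaping the data.
const mappings = {
  properties: {
    es_index_patterns: { type: 'flattened' },
  },
} as const;
```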
2. Use the same field with different data formats

If the mappings of the old and new data formats are compatible (e.g. with the data being an object in the old version and an array of objects of the same shape in the new version), it could be possible to do a 2-stage migration:
- release 1: the application recognizes the 2 shapes of the data (array or object) but always writes in the old format (no migration is added)
- release 2: the application writes in the new format, and the migration to convert the existing data is introduced

However, looking at how you're transforming your data in the migration you added, I don't think that's an option, as the two shapes can't share the same mappings (correct?), so I feel like option 1 and the `flattened` mapping approach would be more realistic. (A sketch of the dual-shape reader follows below.)
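A sketch of the release-1 "recognize both shapes" reader described above, with hypothetical type names:

```ts
type OldShape = Record<string, string>; // { [datasetName]: indexPattern }
type NewShape = Array<{ name: string; title: string }>;

// Release 1: read either shape, but keep writing the old one.
function readIndexPatterns(value: OldShape | NewShape): NewShape {
  if (Array.isArray(value)) {
    return value; // already in the new format
  }
  return Object.entries(value).map(([name, title]) => ({ name, title }));
}
```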
3. Use a new field for the new format and synchronize the 2 fields

This is the workflow I described previously in this thread; it uses 3 deployment stages in total:
- release 1: add the new field and keep the two in sync (manually), but the app/business layer still utilizes the old one
- release 2: switch to utilizing the new field, but keep the fields in sync (for rollbacks)
- release 3: cut the field sync

This is, in theory, quite straightforward; however, the fact that we lack tooling at the migration and release level makes it harder than it should be today:
- the field synchronization must be done in the application layer, given we don't provide write hooks on the SO APIs right now (see the sketch after this list). If I trust the first sentence from your comment here, "Unfortunately there is no central place where all updates/create calls go through", that will not make things easy; however, at the moment this is the only way to do it.
- synchronization with releases has to be done manually, meaning that it would require three PRs in total, merged at the right time to be shipped in three different serverless releases.
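As referenced in the first bullet, a hedged sketch of the application-layer dual-write: every create/update path mirrors the old object shape into the new field before persisting. Field and helper names are illustrative, not the actual Fleet code:

```ts
interface PackageAttributes {
  es_index_patterns?: Record<string, string>; // old shape, still the source of truth
  es_index_patterns_v2?: Array<{ name: string; title: string }>; // hypothetical new field
}

// Call from every application-layer write path so the two fields stay in sync.
function withSyncedIndexPatterns(attrs: PackageAttributes): PackageAttributes {
  const oldShape = attrs.es_index_patterns ?? {};
  return {
    ...attrs,
    es_index_patterns_v2: Object.entries(oldShape).map(([name, title]) => ({ name, title })),
  };
}
```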
In summary

- If you can avoid changing the shape of your data right now, we would encourage doing so, as we lack the tooling and documentation to let teams easily implement that today.
- If that would be acceptable to you but you'd still like to do it once the tooling is ready, please tell us, as it would help prioritize this enhancement.
- If you need to do this right now, then I think you will have to follow option 3 and do a multi-stage migration of the data. In that case, we can provide support and guidance to help you implement it today.
Not doing data mutation only helps us so much, because in a separate PR I'm adding another property to this array that should be searchable as well.
Also, I tried the `flattened` thing as one of the first options, and it seems like Elasticsearch doesn't let you do this:
```
PUT mytestindex
{
  "mappings": {
    "properties": {
      "obj": {
        "dynamic": false,
        "properties": {}
      }
    }
  }
}

PUT mytestindex/_mapping
{
  "properties": {
    "obj": {
      "type": "flattened"
    }
  }
}
```
results in:
```json
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "can't merge a non object mapping [obj] with an object mapping"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "can't merge a non object mapping [obj] with an object mapping"
  },
  "status": 400
}
```
The multi-stage approach seems like the right one; however, I would like to get @kpollich's or @juliaElastic's input here, as this is their area of the code.
Considering all of this (especially #180618), how do you think we should proceed here?
I think if the first approach doesn't work, going with the third one makes sense, as that's the only one that works with ZDT.
I started a new series of stacked PRs for this: #180684
Related to #177731
This changes the shape of `es_index_patterns` in the epm package saved object to make it easier to search on it. See #177731 (comment) for an explanation of the idea.
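For reference, a hedged illustration of the reshape; the new entry's `name` property is inferred from the backfill snippet, so treat the exact shape as an assumption:

```ts
// Before: a dynamic object keyed by dataset name, which is hard to search on.
type EsIndexPatternsOld = Record<string, string>;

// After: an array of objects with keyword-mapped properties, searchable via `title`.
type EsIndexPatternsNew = Array<{ name: string; title: string }>;
```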