
[Fleet] Refactor es_index_patterns for searchability #180553

Closed

Conversation

flash1293
Contributor

Related to #177731

This changes the shape of es_index_patterns in the epm package saved object to make it easier to search on.

See #177731 (comment) for an explanation of the idea.
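For orientation, a minimal sketch of the intended shape change, assuming the old value is a record of data stream name to index pattern and the new value is an array of keyword-mapped entries (the name field is illustrative; the searchable property is title, as discussed in the review below):

// Sketch only – not the exact Fleet types.
type EsIndexPatternsOld = Record<string, string>; // e.g. { 'nginx.access': 'logs-nginx.access-*' }

interface EsIndexPatternEntry {
  name: string;  // illustrative key field
  title: string; // the index pattern itself, mapped as a keyword so it can be searched
}
type EsIndexPatternsNew = EsIndexPatternEntry[];

// Conversion mirroring the Object.entries(...).map(...) backfill shown later in this thread:
const toNewShape = (old: EsIndexPatternsOld): EsIndexPatternsNew =>
  Object.entries(old).map(([name, title]) => ({ name, title }));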

@apmmachine
Contributor

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@flash1293 flash1293 marked this pull request as ready for review April 11, 2024 09:05
@flash1293 flash1293 requested review from a team as code owners April 11, 2024 09:06
@flash1293 flash1293 added the release_note:skip (Skip the PR/issue when compiling release notes), Team:Fleet (Team label for Observability Data Collection Fleet team), and v8.14.0 labels Apr 11, 2024
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

es_index_patterns: {
  dynamic: false,
  properties: {
    pattern: { type: 'keyword' },
Contributor

doubt: isn't it mismatching with the backfilled title property later used in the backfillFn function?
Also, should we keep it aligned and update the mapping with title instead of pattern?

Contributor Author

my bad, I forgot to backport this change from my other PR integrating it with the integration+datastream search. Should be fixed now.
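For reference, a sketch of what the aligned mapping presumably looks like after the fix, assuming the searchable property ends up being title as requested (not a verbatim excerpt of the updated diff):

es_index_patterns: {
  dynamic: false,
  properties: {
    title: { type: 'keyword' },
  },
},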

Contributor

I see some other places in the PR where pattern is still used in place of title, like in some tests and the current_mappings file.

Contributor Author

urgh, went through these.

backfillFn: (doc) => {
  return {
    attributes: {
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns).map(
Contributor

should we check for null here? Object.entries throws on null/undefined.
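A minimal sketch of such a guard, assuming es_index_patterns may be missing on older documents (the { name, title } entry shape is illustrative):

backfillFn: (doc) => {
  return {
    attributes: {
      // guard against missing data: Object.entries throws a TypeError on null/undefined
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns ?? {}).map(
        ([name, title]) => ({ name, title })
      ),
    },
  };
},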

@kibana-ci
Collaborator

kibana-ci commented Apr 11, 2024

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #24 / EPM Endpoints installs and uninstalls all assets installs all assets when installing a package for the first time should have created the correct saved object
  • [job] [logs] Jest Tests #2 / installAssetsForInputPackagePolicy should install es index patterns for input package if package is installed
  • [job] [logs] FTR Configs #11 / security APIs - Kerberos Kerberos authentication finishing SPNEGO should properly set cookie and authenticate user
  • [job] [logs] Jest Integration Tests #6 / transition from md5 hashes to model versions ensures the hashToVersionMap does not miss any mappings changes

Metrics [docs]

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id     before  after  diff
fleet  61      62     +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Contributor

@pgayvallet pgayvallet left a comment

See comment

Comment on lines +616 to +620
type: 'data_backfill',
backfillFn: (doc) => {
  return {
    attributes: {
      es_index_patterns: Object.entries(doc.attributes.es_index_patterns).map(
Contributor

Hum, if I read the thing correctly, this is not a data backfill you're doing here, it's an incompatible mutation of existing data that would break BWC on serverless?

Are we really fine with that?

Contributor Author

Good point, yes, this would be incompatible. What's the best approach for this? Renaming the property to something else so there is no overlap?

Contributor

@pgayvallet pgayvallet Apr 11, 2024

Using a different property is a start, but I'm afraid this won't be sufficient.

To be absolutely transparent, what you're trying to do here (BWC data mutation) isn't really supported today due to zero downtime migrations (and lack of documentation on how you could do it)

In theory, your model must remain compatible in case of rollback, meaning that changes can basically only be additional properties (potentially hidden to older versions using the forwardCompatibility schema).

/**
* The schema applied when retrieving documents of a higher version from the cluster.
* Used for multi-version compatibility in managed environments.
*
* When retrieving a savedObject document from an index, if the version of the document
* is higher than the latest version known of the Kibana instance, the document will go
* through the `forwardCompatibility` schema of the associated model version.
*
* E.g a Kibana instance with model version `2` for type `foo` types fetches a `foo` document
* at model version `3`. The document will then go through the `forwardCompatibility`
* of the model version 2 (if present).
*
* See {@link SavedObjectModelVersionForwardCompatibilitySchema} for more info.
*/
forwardCompatibility?: SavedObjectModelVersionForwardCompatibilitySchema;
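For context, a hedged sketch of how a forwardCompatibility schema is attached to a model version; the type and package names reflect my understanding of the current core API, so treat the details as assumptions:

import { schema } from '@kbn/config-schema';
import type { SavedObjectsModelVersion } from '@kbn/core-saved-objects-server';

const modelVersion1: SavedObjectsModelVersion = {
  changes: [],
  schemas: {
    // Applied when this (older) version fetches a document written by a newer model version:
    // unknown attributes are dropped instead of breaking the older code path.
    // A real schema would declare every known attribute; only one is shown here for brevity.
    forwardCompatibility: schema.object(
      { es_index_patterns: schema.maybe(schema.any()) },
      { unknowns: 'ignore' }
    ),
  },
};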

So if you want to refactor the shape of this existing es_index_patterns field, the proper way would be to use a multi-stage migration and spread it across multiple releases:

  • release 1 introduces the new field but still utilizes the old one, and keeps the two in sync (meaning that writes to the old field must be reflected on the new one and vice versa, at the application layer, because there's no tooling to do it at the SO level)
  • release 2 utilizes the new field but keeps the fields in sync
  • release 3 cuts the sync

And before you ask, no, there's no tooling or automated way to generate model versions automatically bound/applied to the next releases today, so all this has to be done manually.

TLDR: with serverless, one does not simply change the shape of its data.

Contributor

AFAIK this is the first time since we released ZDT migrations that we have a PR with a need for data mutation in a migration. Let me check with the team and get back to you regarding what our options are.

Contributor Author

Thanks @pgayvallet - the multi-release solution sounds a bit involved for the case at hand but I guess that's how these things have to work now.

Contributor

Yeah, it's a lot of work, and the tooling and documentation isn't there.

Given we're not GA yet, I'm checking with the team if we can just 🙈 for this time and improve the workflow on our side for the next time

Contributor

So, we've been discussing it with the team and @lukeelmers yesterday, and we came to the conclusion that we unfortunately cannot just let teams do that kind of data transformation the unsafe way: even though we're not GA, the risk of corrupted data following a rollback is something we can't afford, and rollbacks aren't the only risk (in practice, the data migration is performed after the deployments are upgraded, meaning that the fields aren't available - searchable mostly - as soon as the new pods are up).

We also know that we're currently lacking proper tooling and documentation for multi-stage migrations (because we're not officially supporting them today, basically), and we would be fine working on it; however, it will take time to polish the design that was suggested when we designed ZDT and to implement a proper solution, so this would be more a matter of weeks than days if we were to start today.

However, if waiting until the functionality is ready on our side is not acceptable for you, we can commit to helping and guiding you in implementing a solution.

To summarize, the options (we think) you have are:

1. Not doing data mutation

I'm lacking some context (even if I saw the comment this is coming from), so I don't know how viable this is, but would keeping the current shape of the data (and just adding the associated mappings to it to be searchable) be a potential option? Maybe with a flattened field (if you hopefully only need to search on values and not keys)?

If so, this would clearly be the simplest, as all alternatives will imply multi-stage migrations.
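A sketch of what that could look like in the saved object type's mappings, keeping the existing record-shaped data and only changing how it is mapped (this is the approach the author reports trying below, where it conflicts with the existing object mapping):

// Sketch only – illustrative mappings fragment, not the actual Fleet registration.
const mappings = {
  properties: {
    es_index_patterns: {
      // 'flattened' indexes all leaf values of the object as keywords,
      // making them searchable without declaring per-key sub-fields
      type: 'flattened' as const,
    },
  },
};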

2. Use the same field with different data formats

If the mappings of the old and new data format are compatible (e.g. with the data being an object in the old version and an array of the same shape of object in the new version), it could be possible to do a 2-stage migration with:

  • release 1: the application recognizes the 2 shapes of the data (array or object), but always writes in the old format (no migration is added)
  • release 2: the application writes in the new format, and the migration to convert the existing data is introduced

However, looking at how you're transforming your data in the migration you added, I don't think that's an option as they can't share the same mappings (correct?), so I feel like option 1 and the flattened mapping approach would be more realistic.
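As a rough illustration of what "recognize the 2 shapes of the data" could mean at the application layer during release 1 of this option (the types and helper name are made up for the example):

// Sketch only.
type OldShape = Record<string, string>;
type NewShape = Array<{ name: string; title: string }>;

// Read path accepts both shapes; the write path would keep producing OldShape until release 2.
const readEsIndexPatterns = (value: OldShape | NewShape): NewShape =>
  Array.isArray(value)
    ? value
    : Object.entries(value).map(([name, title]) => ({ name, title }));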

3. Use a new field for the new format and synchronize the 2 fields

This is the workflow I described previously in this thread, that uses 3 deployment stages in total:

  • release 1: add the new field and keep the two in sync (manually), but the app/business layer still utilize the old one.
  • release 2: switch to utilizing the new field but keep the field sync (for rollbacks)
  • release 3: cut the field sync

This is, in theory, quite straightforward, however the fact that we lack the tooling at the migration and release level makes it harder than it should today:

  • the field synchronization must be done in the application layer, given we don't provide write hooks on the SO APIs right now, which, if I trust the first sentence from your comment here ("Unfortunately there is no central place where all updates/create calls go through"), will not make things easy. However, at the moment this would be the only way to do it (see the sketch after this list).

  • synchronization with releases has to be done manually, meaning that it would require three PRs in total, merged at the right time so they ship in three different serverless releases.
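A rough sketch of the kind of application-layer synchronization release 1 of this option would need; the wrapper and the new field name are hypothetical, not existing Fleet code:

// Sketch only – hypothetical names.
interface EpmPackageAttributes {
  es_index_patterns: Record<string, string>; // old field, still the source of truth in release 1
  es_index_patterns_v2?: Array<{ name: string; title: string }>; // hypothetical new field
}

// Called before every saved object create/update, since there is no central SO write hook today.
const withSyncedIndexPatterns = (attrs: EpmPackageAttributes): EpmPackageAttributes => ({
  ...attrs,
  es_index_patterns_v2: Object.entries(attrs.es_index_patterns).map(([name, title]) => ({
    name,
    title,
  })),
});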

In summary

  • If you can avoid changing the shape of your data right now, we would encourage doing so, as we lack the tooling and documentation to let teams easily implement that today.

    • If that would be acceptable to you but you'd still like to do it when the tooling is ready, please tell us, as it would help prioritize this enhancement
  • If you need to do this right now, then I think you will have to follow option 3 and do a multi-stage migration for the data. In that case, we can provide support and guidance to help you implement it today.

Contributor Author

Not doing data mutation only helps us so much, because in a separate PR I'm adding another property to this array that should be searchable as well.

Also, I tried the flattened thing as one of the first options, and it seems like Elasticsearch doesn't let you do this:

PUT mytestindex
{
  "mappings": {
    "properties": {
      "obj": {
        "dynamic": false,
        "properties": {}
      }
    }
  }
}

PUT mytestindex/_mapping
{
  "properties": {
    "obj": {
      "type": "flattened"
    }
  }
}

results in

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "can't merge a non object mapping [obj] with an object mapping"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "can't merge a non object mapping [obj] with an object mapping"
  },
  "status": 400
}

The multi-stage thing seems like the right approach, however I would like to get @kpollich's or @juliaElastic's input here as this is their area of the code.

Considering all of this (especially #180618), how do you think we should proceed here?

Contributor

I think if the first approach doesn't work, going with the third one makes sense, as that's the only one that works with ZDT.

@flash1293
Contributor Author

I started a new series of stacked PRs for this: #180684

@flash1293 flash1293 closed this Apr 12, 2024