Expose ability to add/modify a mapped field at the data streams level #72142

Open
javanna opened this issue Apr 23, 2021 · 9 comments
Labels
:Data Management/Data streams Data streams and their lifecycles >enhancement Team:Data Management Meta label for data/management team

Comments

@javanna
Member

javanna commented Apr 23, 2021

Now that all the primitives are in place to create a runtime field, modify or remove it, and make it indexed, we can expose a high-level API at the data streams level that allows users to add a new field or modify an existing one.

The idea would be to add the new (possibly indexed) field at the next rollover to newer indices, while adding it as a runtime field to existing indices.

An existing field can be modified by overlaying it, in existing indices, with a runtime field of the same name, while the definition of the indexed field is adapted at the next rollover.
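A minimal sketch of the two requests this implies, assuming the field is defined directly in the data stream's index template (not in one of its component templates) and that `template_body` is the full body of that template, since `PUT _index_template` replaces the whole template. The helper `plan_field_change` and all names in the example are hypothetical; it only builds request descriptions and sends nothing.

```python
import copy


# Hypothetical helper, not an Elasticsearch API: it only builds request
# descriptions as (method, path, body) tuples and sends nothing.
def plan_field_change(data_stream, template_name, template_body, field, es_type, indexed=True):
    # Copy the template so the caller's dict is left unchanged.
    new_template = copy.deepcopy(template_body)
    mappings = new_template.setdefault("template", {}).setdefault("mappings", {})
    # Backing indices created at the next rollover pick up this definition,
    # either as a mapped (indexed) field or as a runtime field.
    section = "properties" if indexed else "runtime"
    mappings.setdefault(section, {})[field] = {"type": es_type}

    # Existing backing indices get a runtime field of the same name. Without a
    # script it reads its value from _source, and it shadows any mapped field
    # with that name at search time.
    runtime_overlay = {"runtime": {field: {"type": es_type}}}

    return [
        ("PUT", f"/_index_template/{template_name}", new_template),
        ("PUT", f"/{data_stream}/_mapping", runtime_overlay),
    ]
```

Sending the template update first means the next rollover picks up the new definition, while the runtime overlay makes the field searchable in the existing backing indices right away.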

The API should also expose the ability to make a runtime field indexed, or to un-index a field by making it a runtime field: both will take effect at the next rollover.
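Making a runtime field indexed, or un-indexing a mapped field, at the next rollover amounts to moving its definition between the `runtime` and `properties` sections of the template mappings. A minimal sketch of that transformation on a plain mappings dict; `move_field` is hypothetical, and it carries over only the field `type`, so a `script` or other type-specific parameters would need separate handling.

```python
import copy


def move_field(mappings, field, to_indexed):
    """Hypothetical helper: return a copy of `mappings` with `field` moved
    from "runtime" to "properties" (to_indexed=True) or back (to_indexed=False)."""
    result = copy.deepcopy(mappings)
    src, dst = ("runtime", "properties") if to_indexed else ("properties", "runtime")
    source_section = result.get(src, {})
    if field not in source_section:
        raise KeyError(f"{field!r} not found under {src!r}")
    definition = source_section.pop(field)
    # Only the type is carried over in this sketch; scripts and other
    # parameters are intentionally dropped.
    result.setdefault(dst, {})[field] = {"type": definition["type"]}
    # Drop the source section entirely if it is now empty.
    if not source_section:
        result.pop(src, None)
    return result
```

The result would then be written back into the index template, so the change only affects backing indices created after the next rollover.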

@javanna javanna added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Apr 23, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Apr 23, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@jimczi jimczi added the :Data Management/Data streams Data streams and their lifecycles label Apr 23, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Apr 23, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@jethr0null

@javanna @giladgal is there a formal design doc for this phase/work that outlines the specific end-to-end use cases and user workflows this functionality would support? There are so many moving parts here that I think it's critical we have a clear idea of how the overarching UX fits together (e.g. where the user initially defines the field in each scenario, and what it is attached to: the index pattern, the index, etc.). We chatted about this in our core features weekly sync and agreed that this is important work that we're eager to support, but that more information is needed to make sure we're creating the best possible user experience.

If you could either share any existing documentation that captures this information, or create and share a new doc containing it with the core features and UI teams, we'd love a chance to discuss and comment so that we can explore the best option for achieving the desired UX.

cc @dakrone @cjcenizal @jakelandis

@javanna
Member Author

javanna commented May 28, 2021

This task has not started yet; there are indeed many considerations to address as part of the design phase. From a user perspective, I believe that @elastic-jb has been working on identifying end-to-end use cases and user workflows.

@jpountz
Contributor

jpountz commented Jul 1, 2021

The RAC team needs to add new fields to a data stream / write alias without waiting until the next rollover, so that it can start sending documents with the new fields right away. Updating the template and then forcing a rollover unfortunately isn't great in this case, because we could be talking about tens or hundreds of data streams, each of which could end up with a tiny index due to the forced rollover. Currently they are working around this problem by first updating the template and then the mappings of the current write index, but it would be nicer if Elasticsearch didn't require messing with backing indices. cc @jasonrhodes @marshallmain
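The workaround described above can be sketched as a plan that first updates each index template and then applies the same mapping to the current write index of each data stream, using the put mapping API's `write_index_only` query parameter, so no rollover is needed. The helper name and the `streams_by_template` dict (template name to data stream names) are hypothetical; the sketch builds request descriptions only and sends nothing.

```python
import copy


def plan_add_fields_without_rollover(templates, streams_by_template, new_properties):
    """Hypothetical helper: return (method, path, body) tuples, templates first."""
    requests = []
    # Step 1: update each index template so future backing indices get the fields.
    for name, body in templates.items():
        updated = copy.deepcopy(body)
        mappings = updated.setdefault("template", {}).setdefault("mappings", {})
        mappings.setdefault("properties", {}).update(copy.deepcopy(new_properties))
        requests.append(("PUT", f"/_index_template/{name}", updated))
    # Step 2: add the same fields to the current write index of every data
    # stream that uses one of those templates; older backing indices are left alone.
    for name in templates:
        for data_stream in streams_by_template.get(name, []):
            path = f"/{data_stream}/_mapping?write_index_only=true"
            requests.append(("PUT", path, {"properties": copy.deepcopy(new_properties)}))
    return requests
```

Note that this still touches every write index individually, which is exactly the per-backing-index bookkeeping the comment would like Elasticsearch to take care of.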

@graphaelli
Member

Found this via #74178, and it relates to @dakrone's comment:

"change the template to make this new field indexed, and for the pre-existing data stream indices add the field as a dynamic runtime field"

I'd like to request consideration for data that is not present at the time the field cutover is made but could reasonably be expected to work when it does appear, like searchable snapshots and indices later restored from backups.

@ruflin
Member

ruflin commented Jun 2, 2022

For Elastic packages (integrations) we are currently discussing how many fields we should index by default and where we should use runtime fields instead. This will lead to cases where we later find fields that should be queryable. When upgrading the package, we can update the template and roll over to make this happen, but there is no easy way yet to also apply this to the older indices in the data stream so that queries just work.

@javanna
Member Author

javanna commented Aug 2, 2022

Summarizing my view on this: the put mapping API can target multiple indices, which makes it a good fit for adding a runtime field to existing indices. For indices yet to be created, we'd need to modify the index template: a data stream does require an index template, but that template may be composed of multiple component templates, and it is not trivial to figure out which component template needs to be modified. For this reason I am going to remove the Search label here, as this is currently a Data Management problem.
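A minimal sketch of the "which component template?" lookup, assuming the template bodies are dicts shaped like the `index_template` and `component_template` objects returned by the get index template and get component template APIs. It relies on the documented merge order: component templates are applied in `composed_of` order, later ones override earlier ones, and the index template's own mappings override all of them. The helper names are hypothetical, and it only handles literal field keys, runtime fields, and dotted paths through nested `properties`.

```python
def _defines(mappings, field):
    # Runtime fields are keyed by their full (possibly dotted) name.
    if field in mappings.get("runtime", {}):
        return True
    # A mapped field may be stored under a literal dotted key...
    if field in mappings.get("properties", {}):
        return True
    # ...or as nested objects, so walk "properties" one path segment at a time.
    node = mappings
    for part in field.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return False
    return True


def sources_defining(field, index_template_name, index_template, component_templates):
    """Hypothetical helper: names of templates defining `field`, in merge order.

    The last name returned is the definition that wins for newly created
    backing indices, so it is the most likely one to edit.
    """
    sources = []
    for name in index_template.get("composed_of", []):
        component = component_templates.get(name, {})
        if _defines(component.get("template", {}).get("mappings", {}), field):
            sources.append(name)
    own = index_template.get("template", {}).get("mappings", {})
    if _defines(own, field):
        sources.append(index_template_name)
    return sources
```

Even with such a lookup, editing a shared component template affects every index template that uses it, which is part of why the choice is not trivial.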

Possibly this issue's description is too broad, as it targets adding a new field, modifying an existing field, and making a runtime field indexed. These are tied to each other, yet each scenario has its own subtleties: compared to just adding a new field, modifying a field requires adding a runtime field that shadows the existing indexed field, and indexing the corrected field in newly created indices. Also, the choice of whether the new field should be indexed or runtime ties into making a runtime field fast (aka indexed), which is a common ask: Kibana would like to expose this functionality, but this probably deserves a separate discussion, as it's unclear whether exposing it at the data streams level is enough. Even in Kibana, modifying the index template is not trivial, as there is no knowledge of which index template should be modified.

@javanna javanna removed :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team labels Aug 2, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)


9 participants