Merge mappings API #49026

Open
dimitris-athanasiou opened this issue Nov 13, 2019 · 10 comments
Labels
>feature, :Search/Mapping, Team:Search, team-discuss

Comments

@dimitris-athanasiou
Contributor

Describe the feature:

Using the reindex API to copy data from multiple indices into a new index is a great tool and is probably used for many different use cases. We use it in ML too in order to create a static copy of the source indices of data frame analytics, where we can then perform the analysis assuming no data is changing and without affecting production indices.

However, every single use case like this poses the following problem: what should the mappings of the new index be?

At the moment, the reindex API pushes this responsibility to the user. This has an important benefit: it gives users the flexibility to explicitly specify mappings that differ from the ones in the source indices. This is one of reindex's main use cases.

However, for those use cases where a copy is intended, merging mappings across multiple indices is a hard task to push to users who do not know the inner workings of mappings. I would expect that many users have written their own way to merge mappings, each probably with edge cases waiting to cause problems. ML certainly has its own mappings-merging attempt.

I propose there is benefit in adding an API that attempts to merge mappings over some target indices. An optimistic API would be good enough: merge a field's mapping as long as it is exactly the same in every target index that contains it. Fields that exist in only some of the indices would also be included (as long as there are no conflicts).

The response should return the mappings in a format that can be easily used in a create index request.
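As a rough illustration of the optimistic behavior proposed above, here is a minimal Python sketch. It is not an existing Elasticsearch API: the function name `merge_mappings`, the `MappingConflict` exception, and the input shape (index name mapped to that index's `mappings` object) are assumptions made for this example. It compares whole top-level field definitions only, so it is stricter than a real implementation would need to be for object fields.

```python
from typing import Any, Dict


class MappingConflict(Exception):
    """Raised when a field is mapped differently in two indices (hypothetical)."""


def merge_mappings(index_mappings: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Optimistically merge the top-level `properties` of several indices.

    `index_mappings` maps an index name to that index's `mappings` object.
    A field is kept if its definition is identical in every index that has it.
    """
    merged: Dict[str, Any] = {}
    first_seen_in: Dict[str, str] = {}
    for index, mappings in index_mappings.items():
        for field, definition in mappings.get("properties", {}).items():
            if field not in merged:
                # First time we see this field: take its definition as-is.
                merged[field] = definition
                first_seen_in[field] = index
            elif merged[field] != definition:
                # Same field, different definition: give up optimistically.
                raise MappingConflict(
                    f"field [{field}] is mapped differently in "
                    f"[{first_seen_in[field]}] and [{index}]"
                )
    # Shaped so it can be used as the body of a create index request.
    return {"mappings": {"properties": merged}}
```

Called with two indices where one has an extra field, the result contains the union of the fields; called with two indices that disagree on a field, it raises `MappingConflict` naming the field and both indices.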

@dimitris-athanasiou dimitris-athanasiou added >enhancement discuss :Search/Mapping Index mappings, including merging and defining field types :Distributed/Reindex Issues relating to reindex that are not caused by issues further down labels Nov 13, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Reindex)

@ywelsch
Contributor

ywelsch commented Nov 29, 2019

I wonder if this deserves a separate API. If you just want the merged mappings of all source indices, can't you get the mappings for each source index and do a put mapping call on the target index for each of those source mappings? (We already merge incoming mappings with existing mappings when doing the put mapping call.) If there is a conflict, you will get an exception message saying so. I don't think this is a hard task and you don't need to know the inner workings of mappings for this.
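The flow described above could be sketched roughly as follows. This Python snippet does not talk to a cluster; it only turns a `GET <sources>/_mapping` response into the list of requests a client would send. The helper name `plan_requests` and the tuple representation of a request are assumptions for illustration; the create index and put mapping endpoints themselves are the standard ones.

```python
from typing import Any, Dict, List, Tuple

# (HTTP method, path, JSON body); a hypothetical representation for this sketch.
Request = Tuple[str, str, Dict[str, Any]]


def plan_requests(get_mapping_response: Dict[str, Any], target: str) -> List[Request]:
    """Build the create index call plus one put mapping call per source index.

    `get_mapping_response` is the body returned by `GET <sources>/_mapping`,
    i.e. {"<index>": {"mappings": {...}}, ...}.
    """
    # The target index has to exist before mappings can be put on it.
    requests: List[Request] = [("PUT", f"/{target}", {})]
    for source in sorted(get_mapping_response):
        mappings = get_mapping_response[source].get("mappings", {})
        if mappings:
            # Elasticsearch merges this into the target's existing mappings
            # and rejects the request if a field conflicts.
            requests.append(("PUT", f"/{target}/_mapping", mappings))
    return requests
```

Sending the planned requests in order yields the merged mappings on the target index, or an error from the first put mapping call that conflicts with what is already there.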

@dimitris-athanasiou
Contributor Author

I hadn't thought of doing it this way. One disadvantage, though, is that you have to create the index first, which means that in case of a conflict you're left with an index to clean up. It is also potentially a lot of calls for an index pattern that matches many indices.

Do we know why the reindex API does not use this approach to auto-create the destination index?

@rjernst rjernst added Team:Distributed Meta label for distributed team Team:Search Meta label for search team labels May 4, 2020
@ywelsch
Contributor

ywelsch commented Aug 12, 2020

We discussed this in today's distributed sync. We think that such an API (to determine the best mapping of a target index that is the copy of one or more source indices) is a generally useful thing, not only for the reindex API that ES offers, but for any kind of reindex flow (or possibly a future multi-index shrink API). We can see reindex making use of this functionality / API, but don't think this is something that should be limited to that API. I'm therefore removing the reindex label, and letting the search team decide how to move forward on this one.

@ywelsch ywelsch removed the :Distributed/Reindex Issues relating to reindex that are not caused by issues further down label Aug 12, 2020
@elasticmachine elasticmachine removed the Team:Distributed Meta label for distributed team label Aug 12, 2020
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna
Member

javanna commented Aug 17, 2022

It has been a while since this issue was opened, and no progress has been made on it so far. I do still see how a merge mappings API may be useful, but I find it hard to weigh it against the many other tasks that the team has on its plate. I do see it as a potential tech-debt item too, especially if consumers have code that deals with mappings merges. Would it be possible to have more concrete pointers on what problems this API solves, and what type of code is currently left to consumers?

@dimitris-athanasiou
Contributor Author

The problem this API would solve is this:

  • User has a number of indices following an index pattern (using ILM, etc.).
  • As time went by, fields were added. As a result, those indices have different mappings.
  • User wants to reindex those indices into a new index.
  • User wants to preserve existing mappings.
  • The user has to manually inspect the mappings of each index and compose the merged mappings.

Additionally, there may be conflicts on some fields. The user could provide overrides to resolve the conflicts.
For example, calling the merge_mappings API fails, informing the user that field my_field is mapped as integer in some indices and float in others. Perhaps the user can then call the API again, providing something like {..., "overrides": { "my_field": "float" }}.
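To make the override idea concrete, here is a small Python sketch that works on flat field-to-type maps. Everything in it is hypothetical: the function `merge_field_types`, its `overrides` parameter, and the input shape are invented for illustration and simply mirror the `"overrides"` body suggested above; no such API exists today.

```python
from typing import Dict, List, Optional, Set


def merge_field_types(
    types_by_index: Dict[str, Dict[str, str]],
    overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge field types across indices, letting overrides settle conflicts.

    `types_by_index` maps index name -> {field name: field type}.
    """
    overrides = overrides or {}
    # Collect every type each field has been mapped to.
    seen: Dict[str, Set[str]] = {}
    for fields in types_by_index.values():
        for field, field_type in fields.items():
            seen.setdefault(field, set()).add(field_type)

    merged: Dict[str, str] = {}
    conflicts: Dict[str, List[str]] = {}
    for field, types in seen.items():
        if field in overrides:
            merged[field] = overrides[field]      # the user decided explicitly
        elif len(types) == 1:
            merged[field] = next(iter(types))     # all indices agree
        else:
            conflicts[field] = sorted(types)      # report every disagreement
    if conflicts:
        raise ValueError(f"conflicting field types: {conflicts}")
    return merged
```

A first call without overrides raises an error listing `my_field` with both `float` and `integer`; repeating the call with `overrides={"my_field": "float"}` returns the merged types.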

This is an example of an optimistic attempt to merge mappings for a reindex: https://github.com/elastic/elasticsearch/blob/be7c7415627377a1b795400fb8dfcc6cbdf0e322/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/MappingsMerger.java

Alternatively, if not a stand-alone API, perhaps this could become functionality of the _reindex API. It is just that a stand-alone API could possibly be reused elsewhere.

@javanna
Member

javanna commented Aug 18, 2022

Thanks for the additional info, this makes sense to me. I was wondering: could field_caps help identify conflicts across different indices? Wouldn't it give a quicker view of the different fields in the different indices, without going through all mappings, trying to merge them and fix conflicts?
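As a sketch of what that could look like, the following Python helper scans a field capabilities response and lists the fields reported under more than one type. It assumes the documented response shape, where `fields` maps each field name to one entry per type and each entry carries an `indices` list when types differ across indices. The helper name `conflicting_fields` is made up for this example.

```python
from typing import Any, Dict, List


def conflicting_fields(field_caps_response: Dict[str, Any]) -> Dict[str, Dict[str, List[str]]]:
    """Return {field: {type: [indices]}} for fields with more than one type."""
    conflicts: Dict[str, Dict[str, List[str]]] = {}
    for field, caps_by_type in field_caps_response.get("fields", {}).items():
        if len(caps_by_type) > 1:
            # `indices` may be absent or null when all indices agree on the type.
            conflicts[field] = {
                field_type: list(caps.get("indices") or [])
                for field_type, caps in caps_by_type.items()
            }
    return conflicts
```

This only reports where indices disagree; it does not produce mappings that could be used to create an index.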

@dimitris-athanasiou
Contributor Author

The point is not just to detect conflicts but to obtain the merged mappings so an index can be created.

8 participants