Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas around mappings improvements #64663

Open
9 of 28 tasks
javanna opened this issue Nov 5, 2020 · 4 comments
Open
9 of 28 tasks

Ideas around mappings improvements #64663

javanna opened this issue Nov 5, 2020 · 4 comments
Labels
Meta :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team >tech debt

Comments

@javanna
Copy link
Member

javanna commented Nov 5, 2020

This meta issue tracks some of the improvements and cleanups that we identified while working on the runtime fields project (#59332):

  • Remove the last usage of SearchContext#mapperService: it is used for nested objects
  • Remove QueryShardContext#getObjectMapper : it is mostly used for nested objects and it indirectly exposes retrieving MappedFieldType going through FieldMappers, which are not aware of runtime fields
  • Clean up SearchLookup handling in SearchExecutionContext: a separate lookup object needs to be created for the fetch phase which is passed through to MappedFieldFype#valueFetcher, yet the (wrong) lookup can also be retrieved directly from the SearchExecutionContext
  • Revise NestedScope on SearchExecutionContext
  • Clean up special handling for allowUnmappedFields and mapUnmappedAsText in SearchExecutionContext, they can be changed from consumers which seems dangerous and are only needed for the percolator.
  • Review null checks on what SearchExecutionContext#getFieldType returns. They most likely need to be converted to use QueryShardCOntext#isFieldMapped instead which does not take the allowUnmappedField and mapUnmappedAsString flags into account
  • Review calls to FieldsVisitor#postProcess and make sure that they are all necessary, in some cases we may be able to skip the call as it does nothing depending on which fields we are processing
  • Currently, when an index is created without providing mappings it will have a null document mapper until mappings are provided for it, or a document has indexed in it.
    MapperService exposes the DocumentMapper through which the mappings can be accessed. The fact that DocumentMapper can be null leads to null checks in many places and potential NPEs in callers. We'd like to eagerly create a document mapper for a new index, as soon as it gets created. This has the implication that we would lose the semantics around the fact that when a document mapper is null for an index, it means the index is empty.
    That would also allow us to apply index sorting when the index is created instead of delaying it to the document mapper creation.
  • We have a special case to handle the case of an empty doc being indexed, which triggers a dynamic mapping update although the mappings are empty, to ensure that whenever a doc is registered in an index, it has mappings and its corresponding document mapper will not be null then. This special case can be removed if we make sure that every index always has mappings
  • Should we move MapperRegistry to index.mapper package? (Move MapperRegistry to index.mapper package #69805)
  • Mapping is visible from outside of its package but all of its methods are called from within the same package. Review usages and possibly adjust methods visibility accordingly. Also, its instance fields are all package private and accessed directly without going through getters, yet the root object mapper has also a public getter that is only used within the same package. (Remove redundant methods from DocumentMapper #69803)
  • Make ContentPath less confusing
  • Rework ObjectMapper and RootObjectMapper to not rely on clone
  • Make MappedFieldType responsible for building their own field caps objects, and add in parent/nested information
  • Reduce the number of merge methods on MapperService, and clarify when they should be called
  • Merging does not require the entire MapperService: ideally all of the merge methods exposed by MapperService can be factored out to a lightweight MappingMerger component (merge itself was already removed from DocumentMapper and replaced by calling Mapping#merge directly). This means that it will no longer be required to build a full MapperService in the master node to execute merges when the indices are not allocated to it.
  • Rework NestedDocuments to depend on MappingLookup Use a mapping snapshot for fetching nested docs #66877 (@nik9000)
  • can we remove get field mappings API? or shall we refactor it to make some sense? see also Support include_defaults in the GET-mapping API #5368
  • Can we remove getMapper and fieldMappers from MappingLookup? Why would one have to go through mappers by name? Can we consolidate how to access fields in the mappings? What needs access to mappers and objectmappers? Can we have a unified lookup structure?
  • Do we even need a DocumentMapper given that MappingLookup became almost a copy of it? We should probably use MappingLookup whenever possible and re-evaluate what we need DocumentMapper for.\
  • Rewrite tests (e.g. MapperServiceTestCase) to rely less, or not at all, on DocumentMapper. Many tests build a MapperService only to use a small portion of it, or merge mappings, or retrieve document mapper from it.

Caching cleanups

  • Remove MapperService from SearchExecutionContext entirely. We'd prefer there not be an easy way for the context to get a new snapshot of the mapping. OTOH we don't really want to merge all of the immutable mapping stuff into MappingLookup. We have to decide how to remove it.
  • Remove SearchContext's reference to MapperService. This includes exposing NestedDocuments in SearchExecutionContext so that DefaultSearchContext takes it from there. And cleaning up access to indexService.mapperService() in DefaultSearchContext
  • Replace or remove the parsing function passed to MappingLookup, which hides a reference to DocumentMapper.
  • Pipe all volatile read methods on MapperService through #mappingLookup() to make sure we perform a single volatile read. Or we could remove them entirely in favor the caller going through mappingLookup(). If the caller keeps the mappingLookup for the duration of the operation then the caller will get a consistent snapshot of the mapping during the entire process. This can potentially become a big change but it can be done in steps, for instance we can start from duplicated methods that are exposed in MapperService and DocumentMapper (e.g. simpleMatchToFullName), that could be rather accessed through retrieving MappingLookup.
  • MapperService#getObjectMapper is only used in tests, it can probably be replaced by MappingLookup#getObjectMapper
  • Make all FieldMappers "obviously" immutable.
  • Fix caching issues around reloading analyzers (Reloading analyzers doesn't bust the request cache #66722)
@javanna javanna added :Search/Mapping Index mappings, including merging and defining field types Meta labels Nov 5, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@nik9000
Copy link
Member

nik9000 commented Dec 22, 2020

I've started to group things into themes for easier digestion. I've tried to sort the things for themes in rough priority order.

@javanna
Copy link
Member Author

javanna commented Dec 23, 2020

Thanks @nik9000 I plan to go through the list and re-sort things, possibly opening separate issues where we see fit.

javanna added a commit to javanna/elasticsearch that referenced this issue Jan 27, 2021
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@javanna javanna changed the title Search layer / mappings improvements Ideas around mappings improvements Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Meta :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team >tech debt
Projects
None yet
Development

No branches or pull requests

4 participants