Duplicate mappers consume too much heap memory #86440
Pinging @elastic/es-search (Team:Search)
Related to #77466
I've looked a bit deeper at the memory usage here.
What can we do to reduce the amount of memory taken by the lookup structures as well as the
Would another option for frozen indices be simply not storing the FieldMappers? From a quick look, we would have to rework a couple of things around vector codecs and get, and the GetFieldMappings action wouldn't work, but these might be acceptable trade-offs.
Even with recent fixes to the heap consumption of field mappers, large frozen nodes require a lot of heap memory for duplicate mappers. An example from benchmarking today's master branch shows 67 distinct mappings taking up only ~100kb compressed in the cluster state on a data node holding ~16k indices/shards.
On the other hand, the actual `MappingLookup` instances (and thus the mapper service instances) consume 6.8G of heap. (Note that these are all Beats mappings, so lots of fields and many keywords, but I think it's a representative use case.)
In fact, in the many-shards benchmark this is extracted from (which had this data node running indexing and searching at the time the dump was taken), the `MappingLookup` instances are the single largest consumer of heap. It seems to me there's no technical reason why we shouldn't be able to massively deduplicate objects here when all the mappings are the same. We already do a lot of deduplication via tricks like #86301, which helped bring consumption down a lot recently, but given the numbers it seems we should go a little further and tackle this at a higher level.
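The higher-level deduplication suggested above could, in principle, look like a flyweight cache keyed by a hash of the mapping source, so shards with byte-identical mappings share one lookup instance instead of each building its own. This is only a sketch with invented names (`MappingLookupCache`, `getOrBuild`), not Elasticsearch's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical illustration, not real Elasticsearch code: intern mapping
// lookup structures by a hash of their mapping source. Shards carrying the
// same mapping then share a single instance on the heap.
final class MappingLookupCache {
    // Maps a mapping-source hash to the one shared lookup instance for it.
    private final Map<String, Object> byMappingHash = new ConcurrentHashMap<>();

    // Returns the shared instance for this mapping hash, building it at most
    // once even under concurrent access (computeIfAbsent is atomic per key).
    Object getOrBuild(String mappingSourceHash, Supplier<Object> builder) {
        return byMappingHash.computeIfAbsent(mappingSourceHash, h -> builder.get());
    }
}
```

A real implementation would also need an eviction story (e.g. weak or ref-counted values) so that deduplicated lookups don't outlive the last shard using them.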
Marking/suggesting this as ">bug" since it greatly diminishes the scalability of frozen/warm/cold nodes and should hopefully be somewhat straightforward to fix; that classification is obviously debatable, though.