Heap usage reduction in Elasticsearch #31479
Comments
Pinging @elastic/es-search-aggs
@jpountz could you take a look at this one?
Pinging back... any thoughts on the proposal?
One of the biggest limitations of Elasticsearch in terms of data per node is the 1/48 to 1/96 ratio of assigned heap to indexed data size. This optimization may drive major improvements in this respect, and we'd really like to hear what the Elastic team has to say.
Sorry for the lack of response, I had missed this issue. I agree we should look into reuse of Field instances, which is recommended by Lucene. I see you were careful with multi-valued fields, which are the main source of complexity here since it is fine to reuse field instances across documents, but not within the same document. I also liked that you didn't try to optimize fields for which this is less likely to help, like binary fields or percolator fields, which are less common than keywords and numerics. This patch improves memory reuse but also adds a bit of CPU overhead because of the hashmap lookup for the cached field. Hashmap lookups are fast, but this parsing code is called in very tight loops. I'd be curious to know whether we can get rid of this lookup somehow, to make this change more likely to be a net win for every user.
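The constraint above — reuse across documents, never within one — can be sketched as follows. This is a hypothetical illustration with stand-in classes, not code from the patch:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a Lucene Field: a named holder whose value can be reset.
class ReusableField {
    final String name;
    long value;
    ReusableField(String name) { this.name = name; }
}

class PerFieldCache {
    private final Map<String, ReusableField> cache = new HashMap<>();

    // First value of a field in the current document: the previous document
    // has already been handed to the indexer, so mutating the cached
    // instance is safe.
    ReusableField firstValue(String name, long v) {
        ReusableField f = cache.computeIfAbsent(name, ReusableField::new);
        f.value = v;
        return f;
    }

    // Second and later values of a multi-valued field: the cached instance
    // is still live in this document's field list, so allocate a new one.
    ReusableField extraValue(String name, long v) {
        ReusableField f = new ReusableField(name);
        f.value = v;
        return f;
    }
}
```

The cached instance amortizes allocation over the document stream; only the (rarer) extra values of multi-valued fields still allocate.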
Thanks for the support! Adrien, you are right that there is an expected CPU tradeoff here, since managed allocations have relatively low overhead. Based on our benchmarks, even with the beefy i3 instance types on AWS, we have not seen any increase in CPU utilization. On the other hand, cutting down young-gen allocations (and the allocation rate) significantly did cut down promotions (and hence pauses) and the associated GC cycles. We initially started with a global map shared across all ingestion threads. This turned out to have two limitations: heavy data-structure operation throughput, and NUMA contention on multi-socket machines (both to operate on the map and in GC-related stall cycles). We moved to thread-local maps, which mitigated both problems (with O(threads) heap instead of O(docs) heap). The code has a few more optimizations around caching. The number of map operations remains proportional to the number of fields parsed, but the CPU cycles used are insignificant relative to the cycles spent in bulk ingestion. Given that the field objects depend on the field names, I'm not sure we could eliminate a lookup operation. Open to ideas!
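The thread-local map layout described above might look roughly like this (a sketch with hypothetical names; a simple one-slot array stands in for the cached Field object):

```java
import java.util.HashMap;
import java.util.Map;

// Each indexing thread owns its own map, so lookups need no synchronization
// and there is no cross-socket contention on a shared structure. Heap stays
// O(threads * fields) rather than O(documents * fields).
class ThreadLocalFieldCache {
    private static final ThreadLocal<Map<String, long[]>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    // Returns this thread's reusable buffer for the field, standing in
    // for a cached Field object.
    static long[] reusable(String fieldName) {
        return CACHE.get().computeIfAbsent(fieldName, k -> new long[1]);
    }
}
```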
Hi Adrien, do you see a path forward? Are there any PRs relevant to this patch?
I have some ideas, but I don't like them much due to the complexity they would introduce... Looking at your patch again, I'm wondering whether a simple improvement would be to stop caching on the field name. Said otherwise, have a cache of an arbitrary number of Field objects.
Hi Adrien, sorry for the delay. Any reduction in O(documents * fields) heap allocations is better than the status quo! I'm not sure I understand your point yet. Given that field objects have to be created with the name of the field (which depends on the current data), how would we create the array of field objects as the cache?
Hi Adrien, if we reuse a field based on its type but independent of its name, won't that contradict the point from your previous comment that "it is fine to reuse field instances across documents, but not within the same document"?
@jpountz @muralikpbhat One thing that can be done is to create a whitelist (maybe just a bit vector with one bit per cache) and have a document use all caches except its own. Or maybe tag cached objects with their corresponding documents in the cache itself, so that when searching the cache for an instance to reuse, you consider all Field objects except the ones from your own document. Thoughts?
Oops good catch, this doesn't work.
I was thinking of only doing it for the first value of a field.
Sorry, I don't get the idea.
One thing that I want to avoid is having to search a cache. Indexing a field doesn't use much CPU. For instance, when Lucene processes a LongPoint, it mostly appends the value to the end of a buffer. Saving an object allocation by introducing a hash lookup adds complexity and doesn't sound like a net win performance-wise to me. One approach I had in mind was to replace the shared parsing machinery with per-thread parsers that hold their own reusable Field instances, so that no lookup is needed.
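One lookup-free shape, read together with the later comments about per-thread parsers, could be a parser that resolves its reusable field holders once at construction. This is a hypothetical sketch (all names invented), and it assumes the mapped fields are known up front — an assumption the rest of the thread questions:

```java
// Stand-in for a long-lived Lucene Field; setLongValue mirrors Lucene's
// recommended pattern of resetting a value instead of allocating.
class LongFieldHolder {
    final String name;
    long value;
    LongFieldHolder(String name) { this.name = name; }
    void setLongValue(long v) { this.value = v; }
}

class PerThreadParser {
    // Resolved once per thread: the per-document hot path mutates this
    // holder directly, with no allocation and no hash lookup.
    private final LongFieldHolder fare = new LongFieldHolder("fare_amount");

    LongFieldHolder parseFare(long v) {
        fare.setLongValue(v);  // reuse: nothing allocated in the hot loop
        return fare;
    }
}
```

The tradeoff is that the set of fields must be fixed at parser construction, whereas a map-based cache adapts to dynamic mappings.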
A couple more points:
I believe @jpountz's idea of creating a field-name-independent cache should be a reasonable way of achieving a sizable improvement without too intrusive a change.
@jpountz we have tried the ThreadLocal idea, propagating it as far up the call stack as possible. It's been a while, so I don't remember the implementation complexity offhand, but we incurred a performance hit (a ~10-20% drop in bulk throughput, IIRC) due to ThreadLocal, especially under multi-socket NUMA contention.
Pinging @elastic/es-core-infra
@jpountz Isn't what you suggested already done by the current PR (it can be refactored to look like your suggestion)? The object you are referring to is FieldObjectCache, which is a per-thread object that maintains field-level caches. I did not get the part about no hash lookups being needed if we maintain per-thread parsers. How would the cache state be accessed for any field in the same thread? We need a lookup anyway, unless we know upfront that the mappings are fixed. Field objects are created by name, which is an immutable field in the Field object; all you can change are the values. Am I missing anything here?
This is a proposal to reduce heap allocations in Elasticsearch through object reuse. Elasticsearch allocates a number of per-document heap objects, such as Field (and derived Lucene) objects for metadata and data fields during indexing. We implement object reuse for Field and ParseContext objects across documents during bulk indexing. The changes improve the ES heap allocation rate by 30%, heap garbage by 30%, and the promotion rate by 25% during indexing, while keeping the indexing rate unchanged. Max GC pause time drops 98%, from 13s to 0.3s, and API tail latencies drop significantly: by about 60% at the 100th and by 50% at the 99.9th percentile. All benchmarks were done using Rally's nyc_taxis dataset against an i3.16xl single-node cluster with a 128GB heap and the parallel GC (we see similar improvements with CMS; the code does not degrade throughput on small instance types).
The patch URL: https://gist.github.com/aesgithub/cc5b54fc3cf5a3a13f1f5ad3139dfd00
The patch applies on top of commit hash 7376c35 (dated Jun 1, 2018). It uses multiple maps to cache Field objects. We did not implement cache eviction policies for the prototype (indexing, however, is fully functional). The caching is also optimized for NUMA instances (i.e., caches are local to a thread), since we found that on multi-socket instance types NUMA contention limits indexing rates.
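The "multiple maps" layout described above — the thread calls the per-thread object FieldObjectCache — might be sketched roughly as follows. Member names and value types are placeholders, not from the patch:

```java
import java.util.HashMap;
import java.util.Map;

// One map per field type, keyed by field name, owned by a single indexing
// thread. There is no eviction, so each map is bounded by the number of
// distinct field names that thread indexes.
class FieldObjectCache {
    final Map<String, long[]> numericFields = new HashMap<>();
    final Map<String, byte[][]> keywordFields = new HashMap<>();

    long[] numeric(String name) {
        return numericFields.computeIfAbsent(name, k -> new long[1]);
    }

    byte[][] keyword(String name) {
        return keywordFields.computeIfAbsent(name, k -> new byte[1][]);
    }
}
```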
We want to start this as a discussion on improving some of the heap allocations in Elasticsearch. Making this patch production-ready will take some investment and we want to ensure that it is done after we get feedback.