Field data should support more than 2B ordinals per segment #3189
I opened a pull request (#3306) which tries to fix this issue. In addition to that, it has some nice memory improvements. Here is the memory usage reported by
| | Before | After |
|---|---|---|
| SINGLE_VALUED_DENSE_ENUM | 488.3 KB | 488.3 KB |
| SINGLE_VALUED_DENSE_DATE | 4.3 MB | 4.3 MB |
| MULTI_VALUED_DATE | 10.5 MB | 5.9 MB |
| MULTI_VALUED_ENUM | 7.8 MB | 1.2 MB |
| SINGLE_VALUED_SPARSE_RANDOM | 3.5 MB | 1.5 MB |
| MULTI_VALUED_SPARSE_RANDOM | 7.7 MB | 3.4 MB |
| MULTI_VALUED_DENSE_RANDOM | 23.7 MB | 17.8 MB |
Nothing changes for the single-valued case (as expected) but there are some nice savings for the multi-valued case, especially when the values don't require much space.
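To illustrate why small values can be cheap to store, here is a rough sketch of fixed-width bit packing arithmetic. It is not the code from the pull request: the class `PackedBitsSketch` and its methods are hypothetical names, and it only estimates sizes rather than packing anything.

```java
public class PackedBitsSketch {

    /** Number of bits needed to represent any value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        // Always use at least one bit, even when every value is 0.
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** Approximate bytes needed to store valueCount values at a fixed bit width. */
    static long packedBytes(long valueCount, long maxValue) {
        long bits = valueCount * bitsRequired(maxValue);
        return (bits + 7) / 8; // round up to whole bytes
    }

    public static void main(String[] args) {
        long count = 1_000_000L;
        // A plain int[] always spends 4 bytes per ordinal.
        System.out.println("int[]:                  " + count * Integer.BYTES + " bytes");
        // If no ordinal exceeds 15, 4 bits per value are enough.
        System.out.println("packed, max ordinal 15: " + packedBytes(count, 15) + " bytes");
    }
}
```

Under these assumptions, a field with only a handful of distinct values needs a few bits per ordinal instead of a full 32-bit int, which would fit the large drop for the enum-like multi-valued case.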
I also ran
Before:

| name | took | millis |
|---|---|---|
| terms_s | 6.1s | 30 |
| terms_map_s | 20.7s | 103 |
| terms_l | 13.8s | 69 |
| terms_map_l | 14s | 70 |
| terms_sm | 22s | 110 |
| terms_map_sm | 3.3m | 1009 |
| terms_lm | 1.3m | 391 |
| terms_map_lm | 1.3m | 390 |
| terms_stats_s_l | 31.9s | 159 |
| terms_stats_s_lm | 1m | 322 |
| terms_stats_sm_l | 4.3m | 1319 |

After:

| name | took | millis |
|---|---|---|
| terms_s | 5.4s | 27 |
| terms_map_s | 20.7s | 103 |
| terms_l | 12.7s | 63 |
| terms_map_l | 12.7s | 63 |
| terms_sm | 40.1s | 200 |
| terms_map_sm | 3.3m | 1015 |
| terms_lm | 1.6m | 486 |
| terms_map_lm | 1.6m | 486 |
| terms_stats_s_l | 28.8s | 144 |
| terms_stats_s_lm | 1.3m | 415 |
| terms_stats_sm_l | 4.3m | 1300 |
In some cases, faceting is slower (for example, terms_sm goes from 22s to 40.1s and terms_lm from 1.3m to 1.6m). I ran the benchmark under a profiler, and MonotonicAppendingLongBuffer and AppendingLongBuffer, which are used to store the ordinals, were among the hottest spots. Since they are also the reason for these memory savings, maybe it is not that bad?
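For readers unfamiliar with the trade-off, here is a simplified sketch of the general idea behind monotonic encoding: store each value as a small deviation from a straight line through the sequence. This is an illustration only, not Lucene's actual implementation. The class `MonotonicSketch` is hypothetical, it keeps the deviations in a plain `long[]` rather than bit-packing them, and it processes the whole array at once instead of appending in pages.

```java
public class MonotonicSketch {
    private final long first;     // first value of the sequence
    private final float avg;      // average increment between consecutive values
    private final long minDelta;  // smallest deviation, subtracted so stored deltas are >= 0
    private final long[] deltas;  // non-negative deviations (would be bit-packed in practice)

    MonotonicSketch(long[] values) {
        int n = values.length;
        first = n == 0 ? 0 : values[0];
        avg = n <= 1 ? 0f : (float) (values[n - 1] - values[0]) / (n - 1);
        long[] raw = new long[n];
        long min = 0;
        for (int i = 0; i < n; i++) {
            raw[i] = values[i] - expected(i);
            min = i == 0 ? raw[i] : Math.min(min, raw[i]);
        }
        minDelta = min;
        for (int i = 0; i < n; i++) {
            raw[i] -= minDelta;
        }
        deltas = raw;
    }

    /** Value predicted by the straight line at index i. */
    private long expected(int i) {
        return first + (long) (avg * i);
    }

    /** Decoding costs a multiply and two additions on top of the lookup. */
    long get(int i) {
        return expected(i) + minDelta + deltas[i];
    }

    /** Bits a packed representation would need per stored deviation. */
    int bitsPerValue() {
        long max = 0;
        for (long d : deltas) {
            max = Math.max(max, d);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(max));
    }
}
```

In this sketch every read does extra arithmetic compared with indexing into a plain array, which is one plausible reason such buffers show up in a profiler, while the deviations usually fit in far fewer bits than the raw values.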