Disk-based field data #3806
Lucene 4.0 introduced doc values, which are very similar to field data except that they are computed and persisted to disk (per segment) at index time. At search time, these data structures are either loaded into memory or read directly from disk, depending on the doc values format. Starting with Lucene 4.5, the default format loads the small data structures that matter most for performance (ordinals) into memory and keeps the large ones (the values themselves) on disk. It would be interesting to have a new field data implementation backed by doc values.
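The hybrid layout described above (ordinals in memory, values on disk) can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Lucene's doc values format or Elasticsearch's field data API: the class `HybridSegmentDocValues` and its methods are invented for illustration, the field is assumed to be single-valued with a value for every document, and keeping the per-ordinal byte offsets on the heap is a simplification of this sketch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.TreeSet;

/**
 * Hypothetical model of one segment's doc values for a single-valued string field.
 * Per-document ordinals live on the heap; the (potentially large) term bytes stay on disk.
 * This is NOT Lucene's API or on-disk format.
 */
public final class HybridSegmentDocValues implements AutoCloseable {
    private final int[] docToOrd;      // in memory: document -> ordinal
    private final long[] termOffsets;  // in memory (simplification): ordinal -> byte offset, length numTerms + 1
    private final FileChannel values;  // on disk: concatenated UTF-8 term bytes in sorted order

    private HybridSegmentDocValues(int[] docToOrd, long[] termOffsets, FileChannel values) {
        this.docToOrd = docToOrd;
        this.termOffsets = termOffsets;
        this.values = values;
    }

    /** "Index time": sort unique terms (String.compareTo order), assign ordinals, persist term bytes. */
    public static HybridSegmentDocValues write(String[] docTerms, Path file) throws IOException {
        String[] terms = new TreeSet<>(Arrays.asList(docTerms)).toArray(new String[0]);
        long[] offsets = new long[terms.length + 1];
        try (FileChannel out = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int ord = 0; ord < terms.length; ord++) {
                ByteBuffer buf = ByteBuffer.wrap(terms[ord].getBytes(StandardCharsets.UTF_8));
                offsets[ord + 1] = offsets[ord] + buf.remaining();
                while (buf.hasRemaining()) {
                    out.write(buf);
                }
            }
        }
        // Ordinals follow sorted term order, so comparing ordinals compares terms.
        int[] docToOrd = new int[docTerms.length];
        for (int doc = 0; doc < docTerms.length; doc++) {
            docToOrd[doc] = Arrays.binarySearch(terms, docTerms[doc]);
        }
        return new HybridSegmentDocValues(docToOrd, offsets,
                FileChannel.open(file, StandardOpenOption.READ));
    }

    /** Heap-only lookup: no disk access. */
    public int ord(int doc) {
        return docToOrd[doc];
    }

    /** Disk read: resolve an ordinal back to its term only when the value is needed. */
    public String lookupOrd(int ord) throws IOException {
        int length = (int) (termOffsets[ord + 1] - termOffsets[ord]);
        ByteBuffer buf = ByteBuffer.allocate(length);
        long start = termOffsets[ord];
        while (buf.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            if (values.read(buf, start + buf.position()) < 0) {
                throw new IOException("unexpected end of file");
            }
        }
        buf.flip();
        return StandardCharsets.UTF_8.decode(buf).toString();
    }

    @Override
    public void close() throws IOException {
        values.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment-", ".dv");
        String[] docTerms = {"red", "blue", "red", "green"};
        try (HybridSegmentDocValues dv = write(docTerms, file)) {
            for (int doc = 0; doc < docTerms.length; doc++) {
                int ord = dv.ord(doc);
                System.out.println(doc + " -> ord " + ord + " -> " + dv.lookupOrd(ord));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running `main` prints each document's ordinal and the term read back from disk (for example, document 1 maps to ordinal 0, `blue`). The point of the split is that operations such as sorting within a segment can compare the in-memory ordinals directly and pay for a disk read only when the actual value is required.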
Integrating such an implementation into Elasticsearch would make disk-based field data possible and allow smaller heaps to be configured, which are less prone to garbage collection issues. On the other hand, it would require additional disk space, and because doc values are disk-based by default, they will probably be slower for field-data-intensive workloads.