Indexing: Versions.loadDocIdAndVersion should reuse TermsEnums #6212

mikemccand · 2014-05-16T16:36:36Z

Today, every version lookup creates a new TermsEnum for each segment in the index. This is quite costly: on NIOFSDir, for example, it must clone the IO buffer, and BlockTreeTermsReader has a lot of internal state.

We'd need a ThreadLocal somewhere/somehow... I have a start at a utility class here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5675/lucene/test-framework/src/java/org/apache/lucene/index/PerThreadPKLookup.java. Maybe we can adapt or use this.
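To make the ThreadLocal idea concrete, here is a minimal sketch of per-thread reuse. It is not the linked PerThreadPKLookup class and it does not use Lucene: `SegmentCursor` is a hypothetical stand-in for an expensive-to-create TermsEnum, segments are identified by a plain string name, and `PerThreadCursorCache` is an invented name for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive-to-create per-segment enum
// (the role a TermsEnum plays in the issue). Not a Lucene class.
final class SegmentCursor {
    // Counts constructions so the effect of reuse can be observed.
    static final AtomicInteger CREATED = new AtomicInteger();

    final String segmentName;

    SegmentCursor(String segmentName) {
        this.segmentName = segmentName;
        CREATED.incrementAndGet();
    }
}

// Each thread lazily creates one cursor per segment and reuses it on
// later lookups, instead of creating a new cursor for every lookup.
final class PerThreadCursorCache {
    private final ThreadLocal<Map<String, SegmentCursor>> cursors =
            ThreadLocal.withInitial(HashMap::new);

    // Returns this thread's cursor for the segment, creating it on first use.
    SegmentCursor get(String segmentName) {
        return cursors.get().computeIfAbsent(segmentName, SegmentCursor::new);
    }

    // Drops the calling thread's cached cursors, e.g. after the set of
    // segments has changed.
    void clear() {
        cursors.remove();
    }
}
```

Because the map lives in a ThreadLocal, cursors are never shared between threads and need no locking. A real version would also have to drop cached enums when the segment list changes (for example after a refresh or merge); `clear()` only hints at that.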

…indexing

The TermsEnums used for lookup have a fairly high initialization cost, so if we reuse them we may be able to stop using bloom filters. I ran some bulk update performance tests, which showed that turning off blooms and reusing the enums gets close to the same performance as master (using blooms and not reusing the enums). Closes elastic#6212

Reusing Lucene's TermsEnum for _uid/version lookups gives a small indexing (updates) speedup and brings us closer to not having to spend RAM on bloom filters. Closes #6212

mikemccand added enhancement labels May 16, 2014

mikemccand self-assigned this May 20, 2014

mikemccand mentioned this issue May 23, 2014

Reuse Lucene's TermsEnum for faster _uid/version lookup during indexing #6298

Closed

mikemccand closed this as completed in 7552b69 May 31, 2014

jpountz added the release-notes label Jun 19, 2014

jpountz changed the title ~~Versions.loadDocIdAndVersion should reuse TermsEnums~~ Versioning: Versions.loadDocIdAndVersion should reuse TermsEnums Jun 19, 2014

clintongormley changed the title ~~Versioning: Versions.loadDocIdAndVersion should reuse TermsEnums~~ Indexing: Versions.loadDocIdAndVersion should reuse TermsEnums Jul 16, 2014
