Indexing: Versions.loadDocIdAndVersion should reuse TermsEnums #6212

Closed
mikemccand opened this issue May 16, 2014 · 0 comments

@mikemccand (Contributor) commented May 16, 2014

Today, every version lookup creates a new TermsEnum for each segment in the index. This is quite costly: on NIOFSDir, for example, it must clone the IO buffer, and BlockTreeTermsReader carries a lot of internal state.

We'd need a ThreadLocal somewhere/somehow... I have a start at a utility class here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5675/lucene/test-framework/src/java/org/apache/lucene/index/PerThreadPKLookup.java; maybe we can adapt/use this.
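
To make the ThreadLocal idea concrete, here is a rough sketch of the caching pattern only. It does not use the Lucene API: `PerThreadSegmentCache`, its `factory` argument, and `createdCount()` are hypothetical names made up for illustration, and the cached object stands in for whatever is expensive to create per segment (a TermsEnum in our case). Eviction when a segment is closed is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch: keeps one expensive-to-create lookup object per
 * (thread, segment) pair so repeated lookups on the same thread reuse it.
 * S is a segment key, E is the cached per-segment object.
 */
final class PerThreadSegmentCache<S, E> {
    // Creates the expensive object for a segment; supplied by the caller.
    private final Function<S, E> factory;

    // Counts factory invocations across all threads (for illustration only).
    private final AtomicInteger created = new AtomicInteger();

    // Each thread gets its own private map, so no locking is needed on lookup.
    private final ThreadLocal<Map<S, E>> perThread =
            ThreadLocal.withInitial(HashMap::new);

    PerThreadSegmentCache(Function<S, E> factory) {
        this.factory = factory;
    }

    /** Returns this thread's cached object for the segment, creating it once. */
    E get(S segmentKey) {
        return perThread.get().computeIfAbsent(segmentKey, key -> {
            created.incrementAndGet();
            return factory.apply(key);
        });
    }

    /** Number of objects created so far, summed over all threads. */
    int createdCount() {
        return created.get();
    }
}
```

With something like this, a second `get` for the same segment on the same thread returns the cached instance instead of paying the init cost again, while a different thread still gets its own instance. A real version would also have to drop entries once a segment's reader is closed, which this sketch does not attempt.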

@mikemccand mikemccand self-assigned this May 20, 2014

mikemccand added a commit to mikemccand/elasticsearch that referenced this issue May 23, 2014

Core: reuse Lucene's TermsEnum for faster _uid/version lookup during indexing

The TermsEnums used for lookup have highish cost to init, so if we
reuse them we may be able to stop using bloom filters.  I ran some bulk
update performance tests, showing that turning off blooms and reusing
the enums gets close to the same performance as master (using blooms
and not reusing the enums).

Closes elastic#6212

mikemccand added a commit that referenced this issue May 31, 2014

Core: reuse Lucene's TermsEnum for faster _uid/version lookup during
Reusing Lucene's TermsEnum for _uid/version lookups gives a small
indexing (updates) speedup and brings us closer to not having
to spend RAM on bloom filters.

Closes #6212

@jpountz jpountz changed the title Versions.loadDocIdAndVersion should reuse TermsEnums Versioning: Versions.loadDocIdAndVersion should reuse TermsEnums Jun 19, 2014

@clintongormley clintongormley changed the title Versioning: Versions.loadDocIdAndVersion should reuse TermsEnums Indexing: Versions.loadDocIdAndVersion should reuse TermsEnums Jul 16, 2014
