Reuse Lucene's TermsEnum for faster _uid/version lookup during indexing #6298
The TermsEnums used for lookup have highish cost to init, so if we
I ran a bulk indexing perf test comparing master against this patch: updating 10M small log-entry-type docs ("index" command, passing _id) with random UUIDs (%10d, worst case for the terms dict), using MMapDir. The results look promising: reusing the TermsEnum gets back much of the performance that bloom filters buy us today.
However, this is a small test (10M docs), the index was fully hot...
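For readers unfamiliar with the reuse idea in the title, here is a minimal sketch of the general pattern: pay the init cost of a lookup object once and keep it around for later lookups. It is not the code in this patch and does not use Lucene. `SegmentLookup` and `LookupCache` are hypothetical stand-ins, and caching per thread and per segment name is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-segment terms lookup that is costly to construct. */
final class SegmentLookup {
    /** Counts constructions so the saving from reuse is observable. */
    static final AtomicInteger constructions = new AtomicInteger();

    private final TreeMap<String, Long> versionsById;

    SegmentLookup(Map<String, Long> segmentData) {
        constructions.incrementAndGet();
        // Copying into a sorted map models the one-time init cost.
        this.versionsById = new TreeMap<>(segmentData);
    }

    /** Returns the stored version for the id, or null if this segment does not contain it. */
    Long seekExact(String id) {
        return versionsById.get(id);
    }
}

/** Keeps one SegmentLookup per segment name for each thread (an assumed caching policy). */
final class LookupCache {
    private final ThreadLocal<Map<String, SegmentLookup>> perThread =
            ThreadLocal.withInitial(HashMap::new);

    SegmentLookup get(String segmentName, Map<String, Long> segmentData) {
        // Build the lookup only the first time this thread sees the segment.
        return perThread.get()
                .computeIfAbsent(segmentName, name -> new SegmentLookup(segmentData));
    }
}
```

With this shape, repeated id lookups from one thread against the same segment construct the lookup object once rather than once per document. A real cache would also need to drop entries when a segment is closed or merged away, which this sketch leaves out.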