Core: Do not cache term filters by default. #7583
These filters are directly based on postings lists and can already iterate/advance quickly. It is still possible to opt-in for caching.
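For readers unfamiliar with the opt-in mentioned above, here is a minimal sketch of what a request body could look like. It assumes the Elasticsearch 1.x filter DSL, where a `_cache` flag sits beside the field inside the term filter. The helper `term_filter_query`, the field name `status`, and the value `active` are hypothetical and used only for illustration.

```python
import json


def term_filter_query(field, value, cache=None):
    """Build a constant_score query that wraps a single term filter.

    Assumption: Elasticsearch 1.x filter DSL, where `_cache` is placed
    next to the field inside the term filter. Passing cache=None omits
    the flag so the server-side default applies.
    """
    term = {field: value}
    if cache is not None:
        term["_cache"] = cache  # True = opt in to caching, False = opt out
    return {"query": {"constant_score": {"filter": {"term": term}}}}


# Hypothetical field and value; explicitly opt in to caching.
body = term_filter_query("status", "active", cache=True)
print(json.dumps(body, indent=2))
```

Leaving `cache` unset keeps the request free of the flag, so whichever default the cluster ships with is used.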
LGTM.
LGTM too :) good one
This change concerns me a bit, based on the numbers in #7577 (comment). A lot of the time, we use term filters on dense values like:
These common use cases are all dense and often combined in the same filter. According to the benchmarks, each of them would be 50% slower than it is today.

The other concern I have is that caching term filters allows us to completely skip segments that don't contain that term. Without that caching, we're going to have to do a term lookup on potentially thousands of segments which we would previously have skipped. What impact does that have on performance?

The numbers shown in #7577 (comment) for Leapfrog look promising, but the bool filter won't do that today, so users will experience a drop in performance out of the box.

This change (and #7577) could be very useful in the longer term, but both require more (as yet unwritten) supporting changes to provide a net out-of-the-box benefit. I'd vote for not putting them into 1.4 and instead delaying them until we can implement a more complete solution. (Apologies if I have misunderstood the impact of these changes.)
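As background for the Leapfrog discussion, the following is a toy sketch of a leapfrog-style intersection over sorted doc-id lists. It is not the Lucene or Elasticsearch implementation; the `Postings` class and `leapfrog_and` function are hypothetical names, and plain Python lists stand in for real postings lists.

```python
from bisect import bisect_left


class Postings:
    """Toy stand-in for a postings list: sorted doc ids plus a cursor."""

    def __init__(self, doc_ids):
        self.docs = sorted(doc_ids)
        self.pos = 0

    def advance(self, target):
        """Move to the first doc id >= target; return it, or None if exhausted."""
        self.pos = bisect_left(self.docs, target, self.pos)
        return self.docs[self.pos] if self.pos < len(self.docs) else None


def leapfrog_and(lists):
    """Intersect postings lists, letting the shortest (sparsest) list lead."""
    if not lists:
        return []
    lists = sorted(lists, key=lambda p: len(p.docs))
    lead, others = lists[0], lists[1:]
    hits = []
    candidate = lead.advance(0)
    while candidate is not None:
        for other in others:
            doc = other.advance(candidate)
            if doc is None:
                return hits  # one list is exhausted, no further matches
            if doc != candidate:
                # Mismatch: jump the lead list forward to the larger doc id.
                candidate = lead.advance(doc)
                break
        else:
            # Every list is positioned on the same doc id.
            hits.append(candidate)
            candidate = lead.advance(candidate + 1)
    return hits


sparse = Postings([1, 3, 5, 7, 9])
dense = Postings(range(10))
other = Postings([3, 4, 5, 9, 20])
result = leapfrog_and([sparse, dense, other])
print(result)  # [3, 5, 9]
```

In this sketch a list with no entries for the term ends the loop immediately, which is loosely analogous to the segment-skipping concern raised above, although the real cost of a per-segment term lookup is not modelled here.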
@clintongormley Your analysis is correct but I think it underestimates the cost of caching:
I agree that #7577 needs more work in order to be better in all cases and not only the sparse case, but I think this change is good? Maybe something that needs to be improved is the default value of the
I think it might also be useful to cache term filters used in aliases, since creating an alias with a term filter has a high chance of being re-used (for future work, just an idea).
We just had a discussion about this issue and could not reach an agreement about what the best default A better solution could be to make the decision based on usage as outlined in #8449.
http://www.elasticsearch.org/blog/2015-01-07-this-week-in-elasticsearch/ I thought this had been merged.
Oops, that's probably because I forgot to remove the labels when closing. Thanks for the report, I'll try to get it removed from the blog post as well.
OK I just removed it.