Clean up shortcutTotalHitCount using the new Weight#count API #81034
Comments
Pinging @elastic/es-search (Team:Search)
I take
The PR in Lucene that adds Weight#count: apache/lucene#242
@OlivierCavadenti you may be interested in #81322 - some of the plumbing I had to do there is the same as you'd have to do for this.
Thanks! I think I am close, but two tests still fail: testCountWithoutDeletions ("should not collect more than 0 doc per segment, got 1"; it seems we have a special case with hasDeletions == false?) and testNumericSortOptimization (java.lang.AssertionError). I will investigate this weekend.
Our TopDocsCollectorContext has an optimization that tries to avoid counting the total hit count for queries like match all docs, term query and field exists query, relying on the statistics from each segment instead. This optimization has recently been streamlined in Lucene through the introduction of Weight#count, which is now leveraged directly by TotalHitCountCollector in Lucene with https://issues.apache.org/jira/browse/LUCENE-10620 , later complemented by #88396 within Elasticsearch. With this, we can remove our internal optimization and instead rely on the default Lucene behaviour, which covers more queries and may be expanded further in the future. Closes #81034
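To illustrate the idea of answering a hit count from per-segment statistics, here is a minimal, self-contained sketch. It does not use Lucene or Elasticsearch classes: `Segment`, `cheapTermCount` and `totalHits` are hypothetical names made up for this example. It assumes the Weight#count contract of returning -1 when a count cannot be computed cheaply, and assumes that a term count can only be taken from the index statistic when the segment has no deletions, since that statistic also counts deleted documents.

```java
import java.util.List;

final class CountShortcutSketch {

    /** Hypothetical stand-in for one index segment; not a Lucene class. */
    static final class Segment {
        final boolean[] live;     // false = document was deleted
        final boolean[] hasTerm;  // true = document contains the term
        final int docFreq;        // index statistic: also counts deleted documents

        Segment(boolean[] live, boolean[] hasTerm) {
            this.live = live;
            this.hasTerm = hasTerm;
            int df = 0;
            for (boolean b : hasTerm) {
                if (b) df++;
            }
            this.docFreq = df;
        }

        boolean hasDeletions() {
            for (boolean b : live) {
                if (!b) return true;
            }
            return false;
        }
    }

    /** Number of documents visited one by one, to show when the shortcut applies. */
    static int docsVisited = 0;

    /** Returns the term count from statistics, or -1 if it cannot be trusted. */
    static int cheapTermCount(Segment s) {
        return s.hasDeletions() ? -1 : s.docFreq;
    }

    /** Sums per-segment counts, visiting documents only where no shortcut exists. */
    static int totalHits(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) {
            int cheap = cheapTermCount(s);
            if (cheap != -1) {
                total += cheap;   // shortcut: no document is visited
                continue;
            }
            for (int doc = 0; doc < s.live.length; doc++) {
                docsVisited++;
                if (s.live[doc] && s.hasTerm[doc]) total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Segment clean = new Segment(
            new boolean[] {true, true, true},
            new boolean[] {true, false, true});
        Segment withDeletes = new Segment(
            new boolean[] {true, false, true, true},
            new boolean[] {true, true, false, true});
        System.out.println(totalHits(List.of(clean, withDeletes))); // prints 4
        System.out.println(docsVisited);                            // prints 4
    }
}
```

In this sketch only the segment with a deleted document is scanned; the other one contributes its count without visiting any document, which is the behaviour a test like testCountWithoutDeletions appears to check.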
I have reverted #89047, which means that we still have our own shortcutTotalHitCount which is independent from
Elasticsearch has a special shortcutTotalHitCount hack in TopDocsCollectorContext.java that it uses to avoid collecting all matches only to get the hit count when the hit count can otherwise be inferred from index statistics.
Lucene introduced a new API that does exactly that and that should support more queries in the near future. We should cut over usage of shortcutTotalHitCount to Weight#count instead.