Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up shortcutTotalHitCount using the new Weight#count API #81034

Open
jpountz opened this issue Nov 25, 2021 · 6 comments · Fixed by #89047
Open

Clean up shortcutTotalHitCount using the new Weight#count API #81034

jpountz opened this issue Nov 25, 2021 · 6 comments · Fixed by #89047
Assignees
Labels
>refactoring :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team >tech debt

Comments

@jpountz
Copy link
Contributor

jpountz commented Nov 25, 2021

Elasticsearch has a special shortcutTotalHitCount hack in TopDocsCollectorContext.java that it uses to avoid collecting all matches only to get the hit count when the hit count can otherwise be inferred from index statistics.

Lucene introduced a new API that does exactly that and that should support more queries in the near future. We should cut over usage of shortcutTotalHitCount to Weight#count instead.

@jpountz jpountz added :Search/Search Search-related issues that do not fall into other categories >refactoring labels Nov 25, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Nov 25, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@OlivierCavadenti
Copy link
Contributor

I take

@OlivierCavadenti
Copy link
Contributor

The PR in Lucene that add Weight#count : apache/lucene#242

@nik9000
Copy link
Member

nik9000 commented Dec 10, 2021

@OlivierCavadenti you may be interested in #81322 - some of the plumbing I had to to there is the same as you'd have to do for this.

@OlivierCavadenti
Copy link
Contributor

@OlivierCavadenti you may be interested in #81322 - some of the plumbing I had to to there is the same as you'd have to do for this.

Thanks ! I think I am near something but I have two tests that fail (testCountWithoutDeletions => "should not collect more than 0 doc per segment, got 1", it seems we have a special case with hasDeletions == false ? ) and testNumericSortOptimization (java.lang.AssertionError:
Expected :GREATER_THAN_OR_EQUAL_TO
Actual : EQUAL_TO).

I will investigate this week-end.

@javanna javanna self-assigned this Mar 21, 2022
javanna added a commit to javanna/elasticsearch that referenced this issue Aug 2, 2022
Our TopDocsCollectorContext has an optimization to try and avoid counting total hit count for queries like match all docs, term query and field exists query, relying on the statistics from each segment instead. This optimization has been recently streamlined in lucene through the introduction of Weight#count and now leveraged directly by TotalHitCountCollector in lucene with https://issues.apache.org/jira/browse/LUCENE-10620 , later complemented by elastic#88396 within Elasticsearch.

With this, we can remove this internal optimization and instead leverage the default lucene behaviour which covers more queries and will be possibly expanded in the future as well.

Closes elastic#81034
javanna added a commit that referenced this issue Feb 6, 2023
Our TopDocsCollectorContext has an optimization to try and avoid counting total hit count for queries like match all docs, term query and field exists query, relying on the statistics from each segment instead. This optimization has been recently streamlined in lucene through the introduction of Weight#count and now leveraged directly by TotalHitCountCollector in lucene with https://issues.apache.org/jira/browse/LUCENE-10620 , later complemented by #88396 within Elasticsearch.

With this, we can remove this internal optimization and instead leverage the default lucene behaviour which covers more queries and will be possibly expanded in the future as well.

Closes #81034
@javanna
Copy link
Member

javanna commented Mar 29, 2023

I have reverted #89047, which means that we still have our own shortcutTotalHitCount which is independent from Weight#count in Lucene. This is something we will want to address. One way I considered was creating a throw-away Weight, but that requires duplicating the Weight which may have a cost. Ideally, the mechanism is brought to Lucene entirely, which is something that I will look into.

@javanna javanna reopened this Mar 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team >tech debt
Projects
None yet
5 participants