Bug with distinct attributes and nbHits #2532
Comments
Hello @bsurai, thanks a lot for your repo explaining the issue!
Hey @bsurai, thank you for the time you took to set up a reproducible environment; I was able to reproduce the behavior on my side. I have written an explanation of what is happening and why it is tricky to fix without impacting performance. The PR I linked improves the situation a little by prioritizing a part of the code that was responsible for the final [...] I hope it helps,
563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops
This PR is related to meilisearch/meilisearch#2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop. I think the previous order was a mistake and it should always have been this way.
The reason it doesn't fix the issue is that Meilisearch is lazy: it avoids computing too many things so that it doesn't take too long to answer. When we deduplicate the documents by their distinct value we must do it along the way: every time we see a new document, we check that its distinct value doesn't collide with that of an already returned document.
The reason we can see the correct result when enough documents are fetched is that we were lucky enough to see all of the different distinct values possible in the dataset, so all of the deduplication was done and no further document can be returned.
If we wanted a correct `estimatedNbHits` every time, we would have to do a pass over the whole set of possible distinct values for the distinct attribute and do a big intersection, which could cost a lot of CPU cycles.
Co-authored-by: Kerollmops <clement@meilisearch.com>
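To make the explanation above concrete, here is a minimal Python sketch of lazy deduplication. It is not Meilisearch's actual code: the function names (`lazy_distinct_search`, `exact_distinct_count`), the sample data, and the way the estimate is derived (candidates minus the duplicates seen so far) are all assumptions made for illustration. It only shows why an estimate computed while stopping early can overshoot, and why it becomes correct once every distinct value has been seen.

```python
# Hypothetical sketch, not Meilisearch's implementation.
# Each candidate is (doc_id, distinct_value), already in ranked order.
CANDIDATES = [(0, "a"), (1, "a"), (2, "b"), (3, "b"), (4, "c"), (5, "c")]


def lazy_distinct_search(candidates, limit):
    """Collect up to `limit` hits, deduplicating by distinct value on the fly."""
    seen_values = set()
    hits = []
    excluded = 0  # documents whose distinct value was already returned
    for doc_id, value in candidates:
        # Count the duplicate *before* deciding whether to stop the loop.
        if value in seen_values:
            excluded += 1
            continue
        if len(hits) == limit:
            break  # lazy: remaining candidates are never inspected
        seen_values.add(value)
        hits.append(doc_id)
    # Assumed estimate: everything not yet proven to be a duplicate.
    estimated_nb_hits = len(candidates) - excluded
    return hits, estimated_nb_hits


def exact_distinct_count(candidates):
    """Full pass over all candidates: correct, but touches every document."""
    return len({value for _, value in candidates})


print(lazy_distinct_search(CANDIDATES, 1))   # ([0], 5): overestimates
print(lazy_distinct_search(CANDIDATES, 10))  # ([0, 2, 4], 3): all values seen
print(exact_distinct_count(CANDIDATES))      # 3
```

With a small `limit` the loop stops before most duplicates are discovered, so the estimate stays close to the raw candidate count. The exact count needs the full pass, which is the cost the PR description refers to.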
@Kerollmops Thank you for your efforts. One option is to create two types of indexes: one for detailed items and one for unique items.
Cons:
Closed by #2546, will be present in rc1! 🚀
We are having exactly the same issue. We checked it in v0.30.1 and the issue still persists (for both [...]). Has the fix already been added in the v0.30.1 release?
Hello @freescout-helpdesk, I see the issue you are talking about is this one. We will let you know in the open issue; this one is old and no longer relevant. See you there.
This is my GitHub repository with:
I hope it's pretty well documented.