Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug with distinct attributes and nbHits #2532

Closed
bsurai opened this issue Jun 17, 2022 · 6 comments · Fixed by #2546
Closed

Bug with distinct attributes and nbHits #2532

bsurai opened this issue Jun 17, 2022 · 6 comments · Fixed by #2546
Assignees
Labels
bug Something isn't working as expected milli Related to the milli workspace v0.28.0 PRs/issues solved in v0.28.0
Milestone

Comments

@bsurai
Copy link

bsurai commented Jun 17, 2022

This is my GitHub repository with:

  • dataset
  • meilisearch settings
  • test case
  • screenshots

I hope it's pretty well documented.

@curquiza
Copy link
Member

Hello @bsurai thanks a lot for your repo explaining the issue!
We'll try to take care of this as soon as possible!

@curquiza curquiza added the bug Something isn't working as expected label Jun 20, 2022
@curquiza curquiza changed the title Test case for distinct attributes and nbHits Bug with distinct attributes and nbHits Jun 20, 2022
@Kerollmops
Copy link
Member

Hey @bsurai,

Thank you for the time you took to set up a reproducible environment, I was able to reproduce the behavior on my side. I have done an explanation of what is happening and the reason why it is tricky to fix without impacting the performance. The PR I linked improved a little bit the situation by prioritizing a part of the code that was responsible for the final nbHits/estimatedNbHits count, you will see 131 instead of 171 when you fetch the first 20 results.

I hope it helps,
Have a good day!

@curquiza curquiza added this to the v0.28.0 milestone Jun 22, 2022
bors bot added a commit to meilisearch/milli that referenced this issue Jun 22, 2022
563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops

This PR is related to meilisearch/meilisearch#2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop, I think it was a mistake and should always have been this way.

The reason it doesn't fix the issue is that Meilisearch is lazy, just to be sure not to compute too many things and answer by taking too much time. When we deduplicate the documents by their distinct value we must do it along the water, everytime we see a new document we check that its distinct value of it doesn't collide with an already returned document. 

The reason we can see the correct result when enough documents are fetched is that we were lucky to see all of the different distinct values possible in the dataset and all of the deduplication was done, no document can be returned.

If we wanted to implement that to have a correct `extimatedNbHits` every time we should have done a pass on the whole set of possible distinct values for the distinct attribute and do a big intersection, this could cost a lot of CPU cycles.

Co-authored-by: Kerollmops <clement@meilisearch.com>
@curquiza curquiza added the milli Related to the milli workspace label Jun 22, 2022
@curquiza curquiza linked a pull request Jun 23, 2022 that will close this issue
@bsurai
Copy link
Author

bsurai commented Jun 23, 2022

@Kerollmops Thank you for efforts.
I think, we will try some type of workarounds:

Creation 2 types of indexes. One for detailed items and one for unique items.
Pros:

  • it returns nbHits more accurate.
  • it should be fast

Cons:

  • we need 2 indexes instead of one.

@curquiza
Copy link
Member

Closed by #2546, will be present in rc1! 🚀

@curquiza curquiza added the v0.28.0 PRs/issues solved in v0.28.0 label Aug 24, 2022
@freescout-helpdesk
Copy link

We are having exactly the same issue. Checking it in v0.30.1 and the issue still persists (for both totalHits and estimatedTotalHits values).

Have the fix already been added in v0.30.1 release?

@curquiza
Copy link
Member

curquiza commented Dec 7, 2022

Hello @freescout-helpdesk, I see the issue you are talking about is this one. We will let you know in the open issue, this one is an old and irrelevant one anymore.

See you there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected milli Related to the milli workspace v0.28.0 PRs/issues solved in v0.28.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants