Bug with distinct attributes and nbHits #2532

bsurai · 2022-06-17T11:43:27Z

This is my GitHub repository with:

dataset
meilisearch settings
test case
screenshots

I hope it's pretty well documented.

curquiza · 2022-06-20T08:08:24Z

Hello @bsurai thanks a lot for your repo explaining the issue!
We'll try to take care of this as soon as possible!

Kerollmops · 2022-06-22T09:52:33Z

Thank you for the time you took to set up a reproducible environment, I was able to reproduce the behavior on my side. I have done an explanation of what is happening and the reason why it is tricky to fix without impacting the performance. The PR I linked improved a little bit the situation by prioritizing a part of the code that was responsible for the final nbHits/estimatedNbHits count, you will see 131 instead of 171 when you fetch the first 20 results.

I hope it helps,
Have a good day!

563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops This PR is related to meilisearch/meilisearch#2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop, I think it was a mistake and should always have been this way. The reason it doesn't fix the issue is that Meilisearch is lazy, just to be sure not to compute too many things and answer by taking too much time. When we deduplicate the documents by their distinct value we must do it along the water, everytime we see a new document we check that its distinct value of it doesn't collide with an already returned document. The reason we can see the correct result when enough documents are fetched is that we were lucky to see all of the different distinct values possible in the dataset and all of the deduplication was done, no document can be returned. If we wanted to implement that to have a correct `extimatedNbHits` every time we should have done a pass on the whole set of possible distinct values for the distinct attribute and do a big intersection, this could cost a lot of CPU cycles. Co-authored-by: Kerollmops <clement@meilisearch.com>

bsurai · 2022-06-23T09:05:26Z

@Kerollmops Thank you for efforts.
I think, we will try some type of workarounds:

Creation 2 types of indexes. One for detailed items and one for unique items.
Pros:

it returns nbHits more accurate.
it should be fast

Cons:

we need 2 indexes instead of one.

curquiza · 2022-06-23T14:50:30Z

Closed by #2546, will be present in rc1! 🚀

freescout-helpdesk · 2022-12-07T08:29:39Z

We are having exactly the same issue. Checking it in v0.30.1 and the issue still persists (for both totalHits and estimatedTotalHits values).

Have the fix already been added in v0.30.1 release?

curquiza · 2022-12-07T09:27:45Z

Hello @freescout-helpdesk, I see the issue you are talking about is this one. We will let you know in the open issue, this one is an old and irrelevant one anymore.

See you there

bsurai mentioned this issue Jun 17, 2022

Wrong nbHits when using distinct attribute #2152

Closed

curquiza assigned Kerollmops Jun 20, 2022

curquiza added the bug Something isn't working as expected label Jun 20, 2022

curquiza changed the title ~~Test case for distinct attributes and nbHits~~ Bug with distinct attributes and nbHits Jun 20, 2022

Kerollmops mentioned this issue Jun 22, 2022

Improve the estimatedNbHits when a distinctAttribute is specified meilisearch/milli#563

Merged

curquiza added this to the v0.28.0 milestone Jun 22, 2022

curquiza added the milli Related to the milli workspace label Jun 22, 2022

curquiza linked a pull request Jun 23, 2022 that will close this issue

Bump milli to 0.31.1 #2546

Merged

curquiza closed this as completed Jun 23, 2022

curquiza added the v0.28.0 PRs/issues solved in v0.28.0 label Aug 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug with distinct attributes and nbHits #2532

Bug with distinct attributes and nbHits #2532

bsurai commented Jun 17, 2022 •

edited

curquiza commented Jun 20, 2022

Kerollmops commented Jun 22, 2022

bsurai commented Jun 23, 2022

curquiza commented Jun 23, 2022

freescout-helpdesk commented Dec 7, 2022

curquiza commented Dec 7, 2022

Bug with distinct attributes and nbHits #2532

Bug with distinct attributes and nbHits #2532

Comments

bsurai commented Jun 17, 2022 • edited

curquiza commented Jun 20, 2022

Kerollmops commented Jun 22, 2022

bsurai commented Jun 23, 2022

curquiza commented Jun 23, 2022

freescout-helpdesk commented Dec 7, 2022

curquiza commented Dec 7, 2022

bsurai commented Jun 17, 2022 •

edited