
Update nbHits count with filtered documents #849

Merged: 2 commits from balajisivaraman:wip_764 into meilisearch:master, Nov 26, 2020

Conversation

@balajisivaraman (Contributor) commented Jul 11, 2020:

Closes #764
Closes #1039

After discussing with @MarinPostma on Slack, this is my first attempt at implementing this for the basic flow that will go through bucket_sort_with_distinct.

A few thoughts here:

  • For getting the count of filtered documents alone, I originally thought of using filter_map.values().filter(|&&v| !v).count(). In a few cases this gave the same result as what I have now implemented, but I realised I couldn't do something similar for distinct, so for consistency I have implemented both in a similar fashion.
  • I also needed the contains_key check to ensure we're not counting the same document ID twice.

@MarinPostma also mentioned that this will be an approximation since the sort is lazy. In the test example that I've updated, the actual filtered count will be just 19 (for male records), but due to the limit in play, it returns 32 (filtering out 11 records overall).

Please let me know if this is the kind of fix we are looking for, and I can implement it in the placeholder search also.
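
To illustrate the two counting approaches from the first bullet above, here is a minimal standalone sketch (the ids, the map contents, and the stand-in filter are invented for illustration; this is not the actual bucket_sort code). Counting rejections incrementally while filling the filter cache gives the same number as counting over the cache afterwards, but only the incremental form extends naturally to the distinct case:

use std::collections::HashMap;

fn main() {
    // filter_map caches, per document id, whether the filter accepted it
    let mut filter_map: HashMap<u64, bool> = HashMap::new();
    let filter = |id: u64| id % 2 == 0; // stand-in filter: keep even ids

    // incremental style: bump the counter while filling the cache
    let mut filtered_count = 0usize;
    for id in 1u64..=5 {
        let accepted = *filter_map.entry(id).or_insert_with(|| filter(id));
        if !accepted {
            filtered_count += 1;
        }
    }

    // post-hoc style: derive the same number from the cache afterwards
    let post_hoc = filter_map.values().filter(|&&v| !v).count();
    assert_eq!(filtered_count, post_hoc); // both report ids 1, 3 and 5 as filtered out
    println!("filtered out: {}", filtered_count);
}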

@balajisivaraman (Contributor, Author):

@MarinPostma, Updated for placeholder search also. Let me know if I have missed anything.

@MarinPostma (Contributor):

Hey @balajisivaraman, sorry, I need some time to look into this; I'll review this PR ASAP.

@balajisivaraman (Contributor, Author):

@MarinPostma, No worries! Thanks.

Some(key) => buf_distinct.register(key),
None => buf_distinct.register_without_key(),
};

if !distinct_accepted && !contains_key {
@MarinPostma (Contributor):

Why do you need to check contains_key?

@balajisivaraman (Contributor, Author):

Ah, now I see this is a mistake. When I added the filter_accepted check, I needed contains_key because the document IDs are repeated across the groups; if I didn't do the contains_key check for the filter, the overall nbHits came out as 0 because filtered_count was too high. I think I don't need it for distinct_accepted, like you pointed out, so I will remove it.
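
To make the double-counting issue concrete, a small self-contained sketch (the groups, ids, and filter below are invented; this is not the engine's code). The same document id can appear in several groups, so counting a rejection per occurrence over-counts, while a first-time check counts each rejected id once:

use std::collections::HashMap;

fn main() {
    // the same document id can show up in several criterion groups
    let groups: Vec<Vec<u64>> = vec![vec![1, 2], vec![2, 3], vec![3, 3]];
    let filter = |id: u64| id != 3; // stand-in filter rejecting id 3

    let mut filter_map: HashMap<u64, bool> = HashMap::new();
    let mut per_occurrence = 0usize; // counts every rejected occurrence
    let mut per_id = 0usize;         // counts each rejected id only once

    for group in &groups {
        for &id in group {
            // the contains_key check plays the role described above:
            // it tells us whether this id was already counted
            let first_time = !filter_map.contains_key(&id);
            let accepted = *filter_map.entry(id).or_insert_with(|| filter(id));
            if !accepted {
                per_occurrence += 1;
                if first_time {
                    per_id += 1;
                }
            }
        }
    }

    assert_eq!(per_occurrence, 3); // id 3 appears three times
    assert_eq!(per_id, 1);         // but it is only one filtered document
}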

Comment on lines 333 to 339
let contains_key = key_cache.contains_key(&document.id);
let entry = key_cache.entry(document.id);
@MarinPostma (Contributor):
Suggested change
-let contains_key = key_cache.contains_key(&document.id);
-let entry = key_cache.entry(document.id);
+let entry = key_cache.entry(document.id);
+let contains_key = entry.is_some();

meilisearch-core/src/query_builder.rs (outdated review comment, resolved)
@neevany commented Oct 5, 2020:

Hi guys, is there anything I can contribute to this issue to get it merged? Thanks

@ManyTheFish added this to the 11/20 milestone Oct 5, 2020
@MarinPostma (Contributor) left a review comment:

Hello! Very sorry for taking so long to review this PR. It still needs a few small changes, but it is good work nonetheless. Thanks a lot for your contribution!

Comment on lines 344 to 345
} else if !contains_key {
filtered_count += 1;
@MarinPostma (Contributor):

Suggested change
} else if !contains_key {
filtered_count += 1;
}

@@ -331,10 +333,16 @@ where
 let entry = key_cache.entry(document.id);
 let key = entry.or_insert_with(|| (distinct)(document.id).map(Rc::new));

-match key.clone() {
+let distinct_accepted = match key.clone() {
@MarinPostma (Contributor):

Suggested change
-let distinct_accepted = match key.clone() {
+for document in group.iter() {
+    let filter_accepted = match &filter {
+        Some(filter) => {
+            let entry = filter_map.entry(document.id);
+            let accepted = *entry.or_insert_with(|| (filter)(document.id));
+            if !accepted {
+                filtered_count += 1;
+            }
+            accepted
+        }
+        None => true,
+    };

I would suggest this instead, this is more straightforward I think

@balajisivaraman (Contributor, Author):

@MarinPostma, I finally got around to this. Just for clarification, this diff should be applied in place of lines 322 to 333, right? And I should just get rid of the distinct_accepted logic in the if filter_accepted block. However, when I tried this originally (and again now after your suggestion), for the test search_with_filter I get 0 as the final count, presumably because it filters out everything in the sample set. When I tried it now, I also got an 'attempt to subtract with overflow' panic. Am I missing something?

@MarinPostma (Contributor):

You're right, I was a little too fast on that one. I have tried this out and it seems to be working:

            for group in group.binary_group_by_mut(|a, b| criterion.eq(&ctx, a, b)) {
                // we must compute the real distinguished len of this sub-group
                for document in group.iter() {
                    let filter_accepted = match &filter {
                        Some(filter) => {
                            let entry = filter_map.entry(document.id);
                            *entry.or_insert_with(|| { 
                                let accepted = (filter)(document.id);
                                // we only want to count it out the first time we see it
                                if !accepted {
                                    filtered_count += 1;
                                }
                                accepted
                            })
                        }
                        None => true,
                    };

                    if filter_accepted {
                        let entry = key_cache.entry(document.id);
                        let mut seen = true;
                        let key = entry.or_insert_with(|| {
                            seen = false;
                            (distinct)(document.id).map(Rc::new)
                        });

                        let distinct = match key.clone() {
                            Some(key) => buf_distinct.register(key),
                            None => buf_distinct.register_without_key(),
                        };

                        // we only want to count the document if it is the first time we see it and
                        // if it wasn't accepted by distinct
                        if !seen && !distinct {
                            filtered_count += 1;
                        }
                    } 
                    // the requested range end is reached: stop computing distinct
                    if buf_distinct.len() >= range.end {
                        break;
                    }
                }

                documents_seen += group.len();
                groups.push(group);

                // if this sub-group does not overlap with the requested range
                // we must update the distinct map and its start index
                if buf_distinct.len() < range.start {
                    buf_distinct.transfert_to_internal();
                    distinct_raw_offset = documents_seen;
                }

                // we have sorted enough documents if the last document sorted is after
                // the end of the requested range, we can continue to the next criterion
                if buf_distinct.len() >= range.end {
                    continue 'criteria;
                }
            }

these are the tests I have been trying it on:

#[actix_rt::test]
async fn test_filter_nb_hits_search_normal() {
    let mut server = common::Server::with_uid("test");

    let body = json!({
        "uid": "test",
        "primaryKey": "id",
    });

    server.create_index(body).await;
    let documents = json!([
        {
            "id": 1,
            "content": "a",
            "color": "green",
            "size": 1,
        },
        {
            "id": 2,
            "content": "a",
            "color": "green",
            "size": 2,
        },
        {
            "id": 3,
            "content": "a",
            "color": "blue",
            "size": 3,
        },
    ]);

    server.add_or_update_multiple_documents(documents).await;
    let (response, _) = server.search_post(json!({"q": "a"})).await;
    assert_eq!(response["nbHits"], 3);

    let (response, _) = server.search_post(json!({"q": "a", "filters": "size = 1"})).await;
    assert_eq!(response["nbHits"], 1);

    server.update_distinct_attribute(json!("color")).await;

    let (response, _) = server.search_post(json!({"q": "a"})).await;
    assert_eq!(response["nbHits"], 2);

    let (response, _) = server.search_post(json!({"q": "a", "filters": "size < 3"})).await;
    println!("result: {}", response);
    assert_eq!(response["nbHits"], 1);
}

@MarinPostma (Contributor):

The reason for the 'attempt to subtract with overflow' is that we are counting some items more than once (as you've probably figured); we really want to count them only the first time we see them :)
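
As a tiny illustration of that failure mode, with invented numbers and a plain subtraction standing in for however nbHits is actually derived in the codebase:

fn main() {
    // invented numbers: 5 candidate documents, but a duplicated id was
    // rejected more than once, so filtered_count grew past the candidate count
    let candidates: usize = 5;
    let filtered_count: usize = 7;

    // a plain `candidates - filtered_count` would panic in debug builds with
    // "attempt to subtract with overflow"; checked_sub surfaces the problem instead
    match candidates.checked_sub(filtered_count) {
        Some(nb_hits) => println!("nbHits = {}", nb_hits),
        None => println!("filtered_count exceeded the candidates: something was counted twice"),
    }
}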

@MarinPostma (Contributor):

we basically have to try the same tests on placeholder search now
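
For reference, a hypothetical adaptation of the earlier test to the placeholder path. It reuses only the helpers that appear in the test snippet above and assumes that an empty q goes through placeholder search; whether the real placeholder_search.rs tests look like this is an assumption:

#[actix_rt::test]
async fn test_filter_nb_hits_placeholder_search() {
    let mut server = common::Server::with_uid("test");

    server.create_index(json!({ "uid": "test", "primaryKey": "id" })).await;
    server.add_or_update_multiple_documents(json!([
        { "id": 1, "color": "green", "size": 1 },
        { "id": 2, "color": "green", "size": 2 },
        { "id": 3, "color": "blue",  "size": 3 },
    ])).await;

    // empty query: assumed to exercise the placeholder search path
    let (response, _) = server.search_post(json!({ "q": "" })).await;
    assert_eq!(response["nbHits"], 3);

    let (response, _) = server.search_post(json!({ "q": "", "filters": "size = 1" })).await;
    assert_eq!(response["nbHits"], 1);

    server.update_distinct_attribute(json!("color")).await;

    // documents 1 and 2 pass the filter but share the same color, so distinct keeps one
    let (response, _) = server.search_post(json!({ "q": "", "filters": "size < 3" })).await;
    assert_eq!(response["nbHits"], 1);
}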

Comment on lines 235 to 239
let is_filtered = (filter)(**item);
if is_filtered {
filtered_count += 1;
}
is_filtered
@MarinPostma (Contributor):

Suggested change
-let is_filtered = (filter)(**item);
-if is_filtered {
-    filtered_count += 1;
-}
-is_filtered
+let accepted = (filter)(**item);
+if !accepted {
+    filtered_count += 1;
+}
+accepted

@balajisivaraman (Contributor, Author):

@MarinPostma, Thanks for the comments. Is it okay if I get to this in the next couple of days and close it out by the weekend?

@MarinPostma (Contributor):

Yes, take your time! Thanks for your help 🙂

@ManyTheFish (Member):

We should test the behavior of #1039 to know whether this PR fixes it.

@MarinPostma removed this from the 11/20 milestone Oct 27, 2020
@MarinPostma removed the RFR label Oct 30, 2020
@balajisivaraman (Contributor, Author):

@MarinPostma, Done. Thanks for the help on this one, I had trouble figuring some things out. I tested the placeholder search test with the fix removed, and it failed correctly. So I think we look okay here.

@balajisivaraman marked this pull request as ready for review November 19, 2020 15:37
@codecov bot commented Nov 19, 2020:

Codecov Report

Merging #849 (75e22fc) into master (ef6b56d) will increase coverage by 0.42%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #849      +/-   ##
==========================================
+ Coverage   76.53%   76.96%   +0.42%     
==========================================
  Files         104      104              
  Lines       12127    12179      +52     
==========================================
+ Hits         9282     9374      +92     
+ Misses       2845     2805      -40     
Impacted Files Coverage Δ
meilisearch-core/src/bucket_sort.rs 82.48% <100.00%> (+1.09%) ⬆️
meilisearch-core/src/query_builder.rs 98.23% <100.00%> (+1.89%) ⬆️
meilisearch-http/tests/placeholder_search.rs 98.90% <100.00%> (+0.10%) ⬆️
meilisearch-http/tests/search.rs 99.40% <100.00%> (+0.03%) ⬆️
meilisearch-core/src/number.rs 31.78% <0.00%> (ø)
meilisearch-http/tests/common.rs 90.86% <0.00%> (+0.02%) ⬆️
meilisearch-core/src/filters/mod.rs 85.71% <0.00%> (+1.19%) ⬆️
meilisearch-core/src/store/mod.rs 66.87% <0.00%> (+1.91%) ⬆️
... and 2 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a67862...75e22fc.

@MarinPostma (Contributor):

bors try

bors bot added a commit that referenced this pull request Nov 24, 2020
@bors bot commented Nov 24, 2020:

try

Build succeeded:

@MarinPostma (Contributor) left a review comment:

looks good to me, thank you :)

@qdequele added this to the 12/2020 milestone Nov 26, 2020
@MarinPostma (Contributor):

bors r+

@MarinPostma removed the RFR label Nov 26, 2020
@bors bot commented Nov 26, 2020:

Build succeeded:

@bors bot merged commit f564a9c into meilisearch:master Nov 26, 2020
@balajisivaraman deleted the wip_764 branch November 26, 2020 10:16
@balajisivaraman (Contributor, Author):

Thanks so much for merging this, and for the support on this one, @MarinPostma.

@MarinPostma (Contributor):

Thank you! And sorry for taking so long :)

@theo-lubert:

Testing the new v0.17.0 release as we speak (great work), but it seems the patch does not work if there is a limit.

For example, with 3 documents in my index "events" (2 of them with a date in the past):

limit: 25 and date >= ${Date.now} gives you nbHits: 1 (correct)
but
limit: 1 and date >= ${Date.now} gives you nbHits: 3 (incorrect)

@MarinPostma (Contributor) commented Dec 9, 2020:

Hello @theo-lubert, this is expected: we filter elements lazily, so since you only ask for one document and it is not filtered out, the others are never evaluated and are still counted.
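
A small standalone sketch of that lazy behaviour with plain Rust iterators (not the engine's code): once the requested number of hits is reached, the remaining documents are never run through the filter, so they are never recorded as filtered out.

fn main() {
    // three "documents": true means the document matches the date filter
    let docs = [true, false, false];
    let limit = 1;

    let mut filter_calls = 0;
    let hits = docs
        .iter()
        .filter(|&&matches_filter| {
            filter_calls += 1;
            matches_filter
        })
        .take(limit)
        .count();

    // with limit = 1 the first document already satisfies the request, so the
    // filter never runs on the remaining two; they are never recorded as
    // filtered out, and a count derived from that bookkeeping overshoots
    println!("hits = {}, filter evaluated on {} of {} documents", hits, filter_calls, docs.len());
}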

@theo-lubert:

Thanks @MarinPostma, so as I understand it, this is not a good fit for (filtered) pagination then. Any plans on that? I don't see any related feature in the roadmap; should I create a new issue?

@curquiza mentioned this pull request Dec 9, 2020
@Vincent56:

Hi! Is there any news on this front? Having filters show the correct nbHits would be amazing!

@curquiza (Member) commented May 11, 2021:

Hello @Vincent56, the product team is currently working on the expected behavior we want for nbHits. We'll keep you informed 🙂
It might help us if you shared your use case for nbHits with us.

@celilsahin commented May 28, 2021:

> Hello @Vincent56, the product team is currently working on the expected behavior we want for nbHits. We'll keep you informed 🙂
> It might help us if you shared your use case for nbHits with us.

My filters:
{ "q":"", "limit":10, "offset":0, "filters": "city=test AND confirm!=DELETE AND authorId=T8jFa4" }
{ "q":"", "limit":10, "offset":0, "filters": "city=test AND confirm!=DELETE" }

nbHits comes back as 76+, but the actual hits are 15.


Successfully merging this pull request may close these issues.

Question: Combine facets with filter
Total result count (nbHits) not updated for filters
10 participants