Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow slop in both directions #2020

Merged
merged 3 commits into from May 7, 2023
Merged

allow slop in both directions #2020

merged 3 commits into from May 7, 2023

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented May 1, 2023

allow slop in both directions
so "big wolf"~3 can also match "wolf big"

This also fixes #1934, when the docsets were reordered by size and didn't
match the terms.

allow slop in both directions
so "big wolf"~3 can also match "wolf big"

This also fixes #1934, when the docsets were reordered by size and didn't
match the terms.
@@ -104,7 +104,8 @@ pub(crate) fn intersection_count(left: &[u32], right: &[u32]) -> usize {
/// resulting array in left.
///
/// Returns the length of the intersection
fn intersection(left: &mut [u32], right: &[u32]) -> usize {
#[inline]
fn intersection(left: &mut Vec<u32>, right: &[u32]) -> usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to return count if left is truncated anyway so that the information is "returned" via left.len()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but it's clear for the function caller

Copy link
Collaborator

@fulmicoton fulmicoton May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for this change? If it is to simplify the code, as @adamreichold noted, the new API is really bad.

I'd rather keep the previous logic. If you want to simplify this part, do this in a separate PR, as we will need to make sure that it does not hurt performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's simpler to use, you get the results in left, instead of left[..returned size]

Copy link
Collaborator

@fulmicoton fulmicoton May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. But currently the change is halfway, can we remove the returned value?

Also can we check it improves or does not hurt perf on phrase queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. But currently the change is halfway, can we remove the returned value?

We need to return the count, in case left is not updated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the results looks good!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to return the count, in case left is not updated

I don't understand.
left.len() is not always the same as the result of the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I thought was about intersection_count_with_slop
returned count is never used for intersection now, I removed it

@codecov-commenter
Copy link

codecov-commenter commented May 1, 2023

Codecov Report

Merging #2020 (daed07a) into main (ba309e1) will decrease coverage by 0.04%.
The diff coverage is 84.55%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main    #2020      +/-   ##
==========================================
- Coverage   94.42%   94.38%   -0.04%     
==========================================
  Files         319      319              
  Lines       59216    59003     -213     
==========================================
- Hits        55914    55691     -223     
- Misses       3302     3312      +10     
Impacted Files Coverage Δ
src/query/phrase_query/phrase_scorer.rs 89.04% <79.31%> (+0.18%) ⬆️
src/query/phrase_query/mod.rs 97.41% <97.22%> (+4.42%) ⬆️

... and 13 files with indirect coverage changes

@PSeitz PSeitz requested a review from fulmicoton May 1, 2023 15:09
if left_val < right_slop {
left_index += 1;
} else if right_slop <= left_val && left_val <= right_val {
let distance = left_val.abs_diff(right_val);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat

@fulmicoton fulmicoton merged commit 4c58b00 into main May 7, 2023
5 checks passed
@fulmicoton fulmicoton deleted the slop_test branch May 7, 2023 03:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Phrase wit Slop Bug
4 participants