add PhrasePrefixQuery #1842

Merged (8 commits) on Feb 22, 2023
@trinity-1686a trinity-1686a commented Feb 8, 2023

@trinity-1686a trinity-1686a marked this pull request as ready for review February 8, 2023 15:43
fn query_terms<'a>(&'a self, visitor: &mut dyn FnMut(&'a Term, bool)) {
    for (_, term) in &self.phrase_terms {
        visitor(term, true);
    }
Collaborator

You forgot the prefix (the expanded prefix, which acts as the phrase's suffix).

Contributor Author

I left it out on purpose: we don't use that Term directly, only as a bound in a range (and at this point we can't expand it yet). RangeQuery doesn't emit its bounds when that function is called, so it seemed more consistent not to do so here either. If you think I should still emit it, please let me know.
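
To make the trade-off concrete, here is a minimal sketch of a `query_terms` visitor that reports only the exact phrase terms and skips the prefix. `Term`, `PhrasePrefixQuery`, the `prefix` field, and `collect_terms` are simplified, hypothetical stand-ins written for this illustration; they are not tantivy's actual types, and the meaning of the boolean flag is assumed.

```rust
// Hypothetical stand-ins for illustration only; NOT tantivy's real types.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Term(String);

struct PhrasePrefixQuery {
    // (position offset, term) pairs for the fully specified part of the phrase.
    phrase_terms: Vec<(usize, Term)>,
    // Trailing prefix (assumed field): only used as a range bound, never looked up as-is.
    prefix: (usize, Term),
}

impl PhrasePrefixQuery {
    // Visits the exact phrase terms only. The prefix is skipped on purpose,
    // mirroring the behaviour described in the comment above.
    fn query_terms<'a>(&'a self, visitor: &mut dyn FnMut(&'a Term, bool)) {
        for (_, term) in &self.phrase_terms {
            visitor(term, true);
        }
    }
}

// Hypothetical helper: gathers every term the query reports to its visitor.
fn collect_terms(query: &PhrasePrefixQuery) -> Vec<&Term> {
    let mut seen = Vec::new();
    query.query_terms(&mut |term, _flag| seen.push(term));
    seen
}
```

With this shape, a caller that relies on `query_terms` never sees the prefix bound. Emitting it would be a one-line addition after the loop, which is the alternative the reviewer is asking about.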

for &(offset, ref term) in &self.phrase_terms {
    if let Some(postings) = reader
        .inverted_index(term.field())?
        .read_postings_no_deletes(term, IndexRecordOption::WithFreqsAndPositions)?
Collaborator

I know you copied this from phrase_weight, but the code is meaningless. As far as I can tell, read_postings_no_deletes does the same thing as read_postings.

Can you open an issue to clean up the read_postings_no_deletes mess?

Correctness is (probably) not at stake here... We do the check at the collector level anyway.
There is a non-trivial optimization decision to make here.

Do we do

check_phrase(intersection(remove_deletes(postings)))
or
remove_deletes(check_phrase(intersection(postings)))

The former seems trivially faster at first glance, but I suspect it depends on the filtering power of deletes.
Anyway for the moment we can probably stick to

remove_deletes(check_phrase(intersection(postings)))

for simplicity, and clean up the existing code.
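
The two orderings can be sketched over plain sorted doc-id lists to show that they return the same documents and differ only in how much work each stage sees. This is a toy model under stated assumptions: `intersection`, `check_phrase`, and `remove_deletes` are hypothetical helpers named after the pseudocode above, and the phrase check is reduced to a set lookup rather than a real position comparison.

```rust
use std::collections::BTreeSet;

type DocId = u32;

// Hypothetical helper: keeps the docs present in every (sorted) posting list.
fn intersection(postings: &[Vec<DocId>]) -> Vec<DocId> {
    match postings.split_first() {
        None => Vec::new(),
        Some((first, rest)) => first
            .iter()
            .copied()
            .filter(|doc| rest.iter().all(|list| list.binary_search(doc).is_ok()))
            .collect(),
    }
}

// Hypothetical helper: drops deleted docs.
fn remove_deletes(docs: &[DocId], deleted: &BTreeSet<DocId>) -> Vec<DocId> {
    docs.iter().copied().filter(|doc| !deleted.contains(doc)).collect()
}

// Assumption: the position check is modelled as membership in a precomputed set.
fn check_phrase(docs: &[DocId], phrase_matches: &BTreeSet<DocId>) -> Vec<DocId> {
    docs.iter().copied().filter(|doc| phrase_matches.contains(doc)).collect()
}

// check_phrase(intersection(remove_deletes(postings)))
fn deletes_first(
    postings: &[Vec<DocId>],
    deleted: &BTreeSet<DocId>,
    phrase_matches: &BTreeSet<DocId>,
) -> Vec<DocId> {
    let live: Vec<Vec<DocId>> = postings
        .iter()
        .map(|list| remove_deletes(list, deleted))
        .collect();
    check_phrase(&intersection(&live), phrase_matches)
}

// remove_deletes(check_phrase(intersection(postings)))
fn deletes_last(
    postings: &[Vec<DocId>],
    deleted: &BTreeSet<DocId>,
    phrase_matches: &BTreeSet<DocId>,
) -> Vec<DocId> {
    remove_deletes(&check_phrase(&intersection(postings), phrase_matches), deleted)
}
```

Because each stage here only filters documents independently, the order does not change the result. Filtering deletes first touches every posting list, while filtering last touches only the docs that survive the phrase check, which is why the better choice depends on how many documents the deletes actually remove.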

while stream.advance() && (suffixes.len() as u32) < self.max_expansions {
    new_term.clear_with_type(new_term.typ());
    new_term.append_bytes(stream.key());
    if reader.has_deletes() {
Collaborator

Same as above: we don't need the distinction.
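
For readers unfamiliar with the loop under review, the capped prefix expansion can be sketched with a sorted set standing in for the term dictionary. This is an illustrative model only: `expand_prefix` and the `BTreeSet<String>` dictionary are assumptions made for the example, not the PR's code, which streams keys from the index's term dictionary.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

// Hypothetical helper: collects at most `max_expansions` dictionary terms
// that start with `prefix`, in sorted order.
fn expand_prefix(dictionary: &BTreeSet<String>, prefix: &str, max_expansions: u32) -> Vec<String> {
    let mut suffixes = Vec::new();
    // Terms sharing a prefix are contiguous in sorted order, so start the
    // stream at the first key that is >= the prefix.
    let mut stream = dictionary.range::<str, _>((Bound::Included(prefix), Bound::Unbounded));
    while (suffixes.len() as u32) < max_expansions {
        match stream.next() {
            // Still inside the block of keys that share the prefix.
            Some(term) if term.starts_with(prefix) => suffixes.push(term.clone()),
            // Either the stream is exhausted or we walked past the prefix block.
            _ => break,
        }
    }
    suffixes
}
```

The cap bounds how many expanded terms the phrase check has to consider, at the cost of possibly missing matches for very common prefixes.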

Collaborator

@fulmicoton fulmicoton left a comment

LGTM, but can you go through the minor suggestions?

They are mostly about naming, missing comments, one bug, and a WTF inherited from the original phrase scorer.

@codecov-commenter

Codecov Report

Merging #1842 (cee7272) into main (e2aa5af) will decrease coverage by 0.15%.
The diff coverage is 78.51%.

@@            Coverage Diff             @@
##             main    #1842      +/-   ##
==========================================
- Coverage   94.64%   94.50%   -0.15%     
==========================================
  Files         301      305       +4     
  Lines       54924    55392     +468     
==========================================
+ Hits        51985    52346     +361     
- Misses       2939     3046     +107     
Impacted Files                                         Coverage Δ
src/query/mod.rs                                       100.00% <ø> (ø)
src/query/phrase_query/mod.rs                          92.98% <ø> (ø)
.../query/phrase_prefix_query/phrase_prefix_scorer.rs  71.53% <71.53%> (ø)
...c/query/phrase_prefix_query/phrase_prefix_query.rs  76.34% <76.34%> (ø)
.../query/phrase_prefix_query/phrase_prefix_weight.rs  80.00% <80.00%> (ø)
src/query/phrase_prefix_query/mod.rs                   100.00% <100.00%> (ø)
src/query/phrase_query/phrase_scorer.rs                88.85% <100.00%> (+0.62%) ⬆️
src/schema/document.rs                                 87.64% <0.00%> (-4.50%) ⬇️
src/schema/schema.rs                                   98.92% <0.00%> (+0.13%) ⬆️
sstable/src/dictionary.rs                              89.44% <0.00%> (+0.31%) ⬆️

4 participants