add PhrasePrefixQuery #1842

trinity-1686a · 2023-02-08T10:02:03Z

see quickwit-oss/quickwit#2266

src/query/phrase_prefix_query/phrase_prefix_scorer.rs

src/query/phrase_prefix_query/phrase_prefix_query.rs

fulmicoton · 2023-02-20T01:12:42Z

src/query/phrase_prefix_query/phrase_prefix_query.rs

+    fn query_terms<'a>(&'a self, visitor: &mut dyn FnMut(&'a Term, bool)) {
+        for (_, term) in &self.phrase_terms {
+            visitor(term, true);
+        }


You forgot the prefix (the expanded prefix, which is a suffix).

I left it out on purpose: we don't use that Term directly, only as a bound in a range (and at this point we can't expand it yet). RangeQuery doesn't emit its bounds when that function is called, so it seemed more consistent not to emit it here either. If you still think I should emit it, please reply.
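To illustrate what "only as a bound in a range" means, here is a minimal sketch that uses a `BTreeSet<String>` as a stand-in for a sorted term dictionary. The function name `expand_prefix` and its `max_expansions` parameter are hypothetical and chosen for illustration; this is not the code from the PR.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Returns up to `max_expansions` terms of `dict` that start with `prefix`.
///
/// The prefix is never looked up as a term of its own: it only serves as the
/// lower bound of a range scan over the sorted dictionary.
/// (Hypothetical helper; `BTreeSet` stands in for a real term dictionary.)
fn expand_prefix(dict: &BTreeSet<String>, prefix: &str, max_expansions: usize) -> Vec<String> {
    dict.range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        // Sorted order guarantees all matching terms are contiguous,
        // so we can stop at the first term that does not share the prefix.
        .take_while(|term| term.starts_with(prefix))
        // Cap the number of expansions.
        .take(max_expansions)
        .cloned()
        .collect()
}
```

Because the prefix acts purely as a range bound, the terms that actually match are only known once the dictionary is scanned, which is the point made in the reply above.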

fulmicoton · 2023-02-20T01:26:59Z

src/query/phrase_prefix_query/phrase_prefix_weight.rs

+            for &(offset, ref term) in &self.phrase_terms {
+                if let Some(postings) = reader
+                    .inverted_index(term.field())?
+                    .read_postings_no_deletes(term, IndexRecordOption::WithFreqsAndPositions)?


I know you copied this from phrase_weight, but the code is meaningless. As far as I can tell, read_postings_no_deletes does the same as read_postings.

Can you open an issue to clean up this read_postings_no_deletes mess?

Correctness is (probably) not at stake here... We do the check at the collector level anyway.
There is a non-trivial optimization decision to take here.

Do we do

check_phrase(intersection(remove_deletes(postings)))
or
remove_deletes(check_phrase(intersection(postings)))

The former seems trivially faster, but I suspect it depends on the filtering power of deletes.
Anyway for the moment we can probably stick to

remove_deletes(check_phrase(intersection(postings)))

for simplicity, and clean up the existing code.
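The two orderings can be compared on toy data. The sketch below is an assumption-laden illustration, not tantivy code: postings are plain vectors of `(doc_id, positions)`, and the helpers `remove_deletes`, `phrase_docs`, `deletes_first`, and `deletes_last` are hypothetical names. It only shows that both orderings return the same documents; which one is faster depends on how many documents the deletes filter out.

```rust
use std::collections::HashSet;

/// Toy posting entry: a doc id plus the positions of the term in that doc.
type Posting = (u32, Vec<u32>);

/// Drops the entries whose doc id is marked as deleted.
fn remove_deletes(postings: &[Posting], deleted: &HashSet<u32>) -> Vec<Posting> {
    postings
        .iter()
        .filter(|(doc, _)| !deleted.contains(doc))
        .cloned()
        .collect()
}

/// Intersects two posting lists and keeps the docs in which `first`
/// is immediately followed by `second` (a two-term phrase check).
fn phrase_docs(first: &[Posting], second: &[Posting]) -> Vec<u32> {
    let mut matches = Vec::new();
    for (doc, positions_a) in first {
        if let Some((_, positions_b)) = second.iter().find(|(d, _)| d == doc) {
            if positions_a.iter().any(|p| positions_b.contains(&(p + 1))) {
                matches.push(*doc);
            }
        }
    }
    matches
}

/// check_phrase(intersection(remove_deletes(postings)))
fn deletes_first(a: &[Posting], b: &[Posting], deleted: &HashSet<u32>) -> Vec<u32> {
    phrase_docs(&remove_deletes(a, deleted), &remove_deletes(b, deleted))
}

/// remove_deletes(check_phrase(intersection(postings)))
fn deletes_last(a: &[Posting], b: &[Posting], deleted: &HashSet<u32>) -> Vec<u32> {
    phrase_docs(a, b)
        .into_iter()
        .filter(|doc| !deleted.contains(doc))
        .collect()
}
```

In `deletes_first` the delete filter runs over every posting entry of every term, while in `deletes_last` it runs only over the docs that already passed the phrase check.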

fulmicoton · 2023-02-20T01:27:55Z

src/query/phrase_prefix_query/phrase_prefix_weight.rs

+        while stream.advance() && (suffixes.len() as u32) < self.max_expansions {
+            new_term.clear_with_type(new_term.typ());
+            new_term.append_bytes(stream.key());
+            if reader.has_deletes() {


Same as above: we don't need the distinction.

fulmicoton

LGTM, but can you go through the minor suggestions?

They are mostly about naming / lack of comments / one bug / and a WTF coming from the original phrase scorer.

src/query/phrase_prefix_query/phrase_prefix_query.rs

src/query/phrase_prefix_query/phrase_prefix_weight.rs

codecov-commenter · 2023-02-20T15:20:21Z

Codecov Report

Merging #1842 (cee7272) into main (e2aa5af) will decrease coverage by 0.15%.
The diff coverage is 78.51%.

@@            Coverage Diff             @@
##             main    #1842      +/-   ##
==========================================
- Coverage   94.64%   94.50%   -0.15%     
==========================================
  Files         301      305       +4     
  Lines       54924    55392     +468     
==========================================
+ Hits        51985    52346     +361     
- Misses       2939     3046     +107

Impacted Files	Coverage Δ
src/query/mod.rs	`100.00% <ø> (ø)`
src/query/phrase_query/mod.rs	`92.98% <ø> (ø)`
.../query/phrase_prefix_query/phrase_prefix_scorer.rs	`71.53% <71.53%> (ø)`
...c/query/phrase_prefix_query/phrase_prefix_query.rs	`76.34% <76.34%> (ø)`
.../query/phrase_prefix_query/phrase_prefix_weight.rs	`80.00% <80.00%> (ø)`
src/query/phrase_prefix_query/mod.rs	`100.00% <100.00%> (ø)`
src/query/phrase_query/phrase_scorer.rs	`88.85% <100.00%> (+0.62%)`	⬆️
src/schema/document.rs	`87.64% <0.00%> (-4.50%)`	⬇️
src/schema/schema.rs	`98.92% <0.00%> (+0.13%)`	⬆️
sstable/src/dictionary.rs	`89.44% <0.00%> (+0.31%)`	⬆️

fulmicoton reviewed Feb 8, 2023

src/query/phrase_prefix_query/phrase_prefix_scorer.rs (outdated)

trinity-1686a marked this pull request as ready for review February 8, 2023 15:43

trinity-1686a requested a review from fulmicoton February 13, 2023 14:09