Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose phrase-prefix queries via the built-in query parser #2044

Merged
merged 3 commits into from
Jun 1, 2023
Merged

Expose phrase-prefix queries via the built-in query parser #2044

merged 3 commits into from
Jun 1, 2023

Conversation

adamreichold
Copy link
Collaborator

This proposes the less-than-imaginative syntax field:"phrase ter"* to perform a phrase prefix query against field using phrase and ter as the terms. The aim of this is to make this type of query more discoverable and simplify manual testing.

I did consider exposing the max_expansions parameter similar to how slop is handled, but I think that this is rather something that should be configured via the querser parser (similar to set_field_boost and set_field_fuzzy) as choosing it requires rather intimiate knowledge of the backing index.

@fulmicoton
Copy link
Collaborator

fulmicoton commented May 18, 2023

@adamreichold is this something that you need?
(this does seem useful...)

@adamreichold
Copy link
Collaborator Author

@adamreichold is this something that you need? (this does seem useful...)

I would say that it is something that I want: I have an internal UI for testing purposes which just uses the built-in query parser. Hence, if we want to try out some functionality before we commit to integrating it, it is easiest to expose it in the built-in query parser so the data people can play with it.

I also thought that it is a rather straight-forward extension of the existing support for phrase queries and slop that does not really introduce anything new conceptually. So the maintenance effort should be limited making this a reasonable trade-off?

@codecov-commenter
Copy link

Codecov Report

Merging #2044 (557af61) into main (62709b8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@           Coverage Diff           @@
##             main    #2044   +/-   ##
=======================================
  Coverage   94.43%   94.43%           
=======================================
  Files         319      319           
  Lines       59689    59750   +61     
=======================================
+ Hits        56365    56425   +60     
- Misses       3324     3325    +1     
Impacted Files Coverage Δ
query-grammar/src/query_grammar.rs 99.73% <100.00%> (+<0.01%) ⬆️
query-grammar/src/user_input_ast.rs 97.29% <100.00%> (+0.04%) ⬆️
src/query/query_parser/logical_ast.rs 82.81% <100.00%> (+1.45%) ⬆️
src/query/query_parser/query_parser.rs 94.77% <100.00%> (+0.16%) ⬆️

... and 3 files with indirect coverage changes

@fulmicoton
Copy link
Collaborator

@adamreichold that makes sense... Let's see if we can slip that into the next tantivy release.

@fulmicoton
Copy link
Collaborator

@PSeitz can you take this review?

@PSeitz
Copy link
Contributor

PSeitz commented May 24, 2023

@adamreichold Can you add a test with data?

What happens for "wo"*

@adamreichold
Copy link
Collaborator Author

adamreichold commented May 24, 2023

Can you add a test with data?

I believe those would not be added to the query parser tests but rather to the implementation of phrase-prefix weight (as with the other query types). But that weight already has tests using an index with data. Is there anything specific missing from those tests that should be added?

What happens for "wo"*

Nice catch! I added an explicit error for that case as phrase-prefix queries only really make sense with at least two terms (in contrast to phrase queries which sensibly "downgrade" to term queries).

@adamreichold
Copy link
Collaborator Author

I believe those would not be added to the query parser tests but rather to the implementation of phrase-prefix weight (as with the other query types). But that weight already has tests using an index with data. Is there anything specific missing from those tests that should be added?

@PSeitz Friendly ping on the above question.

@PSeitz
Copy link
Contributor

PSeitz commented May 31, 2023

Can you add a test with data?

I believe those would not be added to the query parser tests but rather to the implementation of phrase-prefix weight (as with the other query types). But that weight already has tests using an index with data. Is there anything specific missing from those tests that should be added?

I was thinking of an end to end test, that makes sure everything is correctly connected. It's also a form of documentation and helps discovering features

This proposes the less-than-imaginative syntax `field:"phrase ter"*` to
perform a phrase prefix query against `field` using `phrase` and `ter` as the
terms. The aim of this is to make this type of query more discoverable and
simplify manual testing.

I did consider exposing the `max_expansions` parameter similar to how slop is
handled, but I think that this is rather something that should be configured via
the querser parser (similar to `set_field_boost` and `set_field_fuzzy`) as
choosing it requires rather intimiate knowledge of the backing index.
@adamreichold
Copy link
Collaborator Author

I was thinking of an end to end test, that makes sure everything is correctly connected. It's also a form of documentation and helps discovering features

With this motivation in mind, I added a dedicated example as the discoverability of unit tests is limited and adding a doc test to QueryParser, but only for phrase-prefix queries would most likely confuse people reading that comment for the first time.

@adamreichold
Copy link
Collaborator Author

adamreichold commented May 31, 2023

I replaced the variable name option by indexing_options as suggested in both cases.

Copy link
Contributor

@PSeitz PSeitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks!

@PSeitz PSeitz merged commit b325d56 into quickwit-oss:main Jun 1, 2023
4 checks passed
@adamreichold adamreichold deleted the query-parser-phrase-prefix branch June 1, 2023 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants