Rewrite match and match_phrase queries to term queries on keyword fields #82612
Pinging @elastic/es-search (Team:Search)
@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample
@elasticmachine run elasticsearch-ci/part-1
@@ -63,26 +63,26 @@
static {
addCandidate("""
"match": { "keyword_field": "value"}
For reviewers: this test writes percolator queries into an old index, then upgrades and checks that reading the index returns the same queries. This commit changes how queries on keyword fields are rewritten, which breaks the test's assumption that the queries will look the same after the upgrade. Changing the query so that it targets a text field instead of a keyword field preserves the constraint under test.
QueryBuilder queryBuilder = new MatchPhraseQueryBuilder(KEYWORD_FIELD_NAME, "value");
SearchExecutionContext context = createSearchExecutionContext();
QueryBuilder rewritten = queryBuilder.rewrite(context);
assertThat(rewritten, instanceOf(TermQueryBuilder.class));
do we need to check the field and the value, just to make sure?
SearchExecutionContext context = createSearchExecutionContext();
QueryBuilder rewritten = queryBuilder.rewrite(context);
assertThat(rewritten, instanceOf(TermQueryBuilder.class));
assertThat(rewritten.boost(), equalTo(2f));
check field and value?
I also took a quick look and left one minor remark. I also had a question about one more edge case: if the original query contains a "zero_terms_query" of ALL (or NULL, for that matter), I think we get a slight difference in behaviour when the query value is empty. All other cases should be fine, since a keyword analyzer never returns no tokens, but I think that case needs to be handled specifically.
// and possibly shortcut
if (analyzer != null) {
if (sec.getIndexAnalyzers().get(analyzer) == Lucene.KEYWORD_ANALYZER) {
TermQueryBuilder termQueryBuilder = new TermQueryBuilder(fieldName, value);
nit: maybe also add a test for this code path, I think only keyword fields are covered atm
// If we're using the default keyword analyzer then we can rewrite this to a TermQueryBuilder
// and possibly shortcut
if (analyzer != null) {
if (sec.getIndexAnalyzers().get(analyzer) == Lucene.KEYWORD_ANALYZER) {
An excellent suggestion, as it turns out the code was incorrect. I've updated, with new tests for query-level
if (zeroTermsQuery == ZeroTermsQueryOption.ALL) {
return new MatchAllQueryBuilder();
}
return new MatchNoneQueryBuilder();
Can we use zeroTermsQuery#asQuery() instead of the if statement here? That would include the NULL option, which returns null; I don't know if this can cause problems in further rewriting.
}
return this;
}

private NamedAnalyzer configuredAnalyzer(SearchExecutionContext context) {
maybe you can make the query analyzer an input argument, then the two copies of this could be merged into a static utility function somewhere?
It uses the fieldname as well; I'm not sure how generally useful this is, really. I think two copies is fine?
It turns out that the MatchQueryParser already detects if we have a keyword analyzer and skips the zero terms query logic in that case; so I think we need to explicitly not handle it here?
That's the cause of the failing test in VersionStringFieldTests - it indexes an empty version, and checks that searching for an empty string finds it.
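The divergence described here can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `ZeroTermsSketch`, its `parse` method, and the string results are hypothetical stand-ins written for illustration, not the MatchQueryParser code, and the whitespace split merely stands in for a real analyzer.

```java
// Hypothetical model of the behaviour discussed above; not Elasticsearch code.
final class ZeroTermsSketch {
    enum ZeroTerms { NONE, ALL }

    // Stand-in for parsing a match query. The keyword branch mirrors the shortcut
    // described in the comment above: the raw value becomes a single term and the
    // zero-terms handling is skipped entirely, even for an empty value.
    static String parse(String field, String value, boolean keywordAnalyzer, ZeroTerms zeroTerms) {
        if (keywordAnalyzer) {
            return "term(" + field + ":\"" + value + "\")";
        }
        // Crude whitespace "analysis" standing in for a real analyzer (assumption).
        String trimmed = value.trim();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        if (tokens.length == 0) {
            // Only analyzers that can produce zero tokens reach the zero-terms option.
            return zeroTerms == ZeroTerms.ALL ? "match_all" : "match_none";
        }
        return "match(" + field + ":" + String.join(" ", tokens) + ")";
    }

    public static void main(String[] args) {
        // An empty value on a keyword-analyzed field still yields a term query.
        System.out.println(parse("version", "", true, ZeroTerms.NONE));
        // The same empty value on an analyzed field falls back to the zero-terms option.
        System.out.println(parse("title", "", false, ZeroTerms.NONE));
    }
}
```

In this model, a rewrite that turned the empty keyword value into match_none (or match_all) would stop an empty-string search from finding a document indexed with an empty value, which is the kind of mismatch the failing test exposed.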
@elasticmachine run elasticsearch-ci/part-1
I see why you backed out of the zero terms query logic, LGTM
Term queries can in certain circumstances (e.g. when run against constant keyword fields) rewrite themselves to match_no_docs queries, which is very useful for filtering out shards from searches and field_caps requests. But match and match_phrase queries can reduce down to simple term queries when there is no fuzziness defined on them, and when they are run using a keyword analyzer.

This commit makes simple match and match_phrase queries rewrite themselves to term queries when run against keyword fields.

Fixes #82515
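
The two-step reduction described above can be sketched in plain Java. This is an illustrative model only: `RewriteSketch`, `Query`, `Kind`, and the two lookup maps are hypothetical names invented for this example and are not Elasticsearch classes; the real change operates on query builders and a search execution context.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the rewrite chain; none of these types are Elasticsearch classes.
final class RewriteSketch {
    enum Kind { MATCH, TERM, MATCH_NONE }

    static final class Query {
        final Kind kind;
        final String field;
        final String value;
        final String fuzziness; // null means no fuzziness was set

        Query(Kind kind, String field, String value, String fuzziness) {
            this.kind = kind;
            this.field = field;
            this.value = value;
            this.fuzziness = fuzziness;
        }
    }

    // analyzers: field name -> analyzer name used at search time (assumed lookup).
    // constants: field name -> the single value of a constant keyword field (assumed lookup).
    static Query rewrite(Query q, Map<String, String> analyzers, Map<String, String> constants) {
        // Step 1: a match query with no fuzziness on a keyword-analyzed field reduces to a term query.
        if (q.kind == Kind.MATCH && q.fuzziness == null && "keyword".equals(analyzers.get(q.field))) {
            q = new Query(Kind.TERM, q.field, q.value, null);
        }
        // Step 2: a term query on a constant keyword field whose value differs can never match.
        if (q.kind == Kind.TERM) {
            String constant = constants.get(q.field);
            if (constant != null && !constant.equals(q.value)) {
                return new Query(Kind.MATCH_NONE, null, null, null);
            }
        }
        return q;
    }

    public static void main(String[] args) {
        Map<String, String> analyzers = new HashMap<>();
        analyzers.put("tier", "keyword");
        Map<String, String> constants = new HashMap<>();
        constants.put("tier", "hot");

        // A match query for "cold" on a field that only ever holds "hot".
        Query q = new Query(Kind.MATCH, "tier", "cold", null);
        System.out.println(rewrite(q, analyzers, constants).kind);
    }
}
```

In this model the match query first becomes a term query and then collapses to a match-none query, which is what lets a shard be skipped. A query with fuzziness set is left untouched by step 1 and so never reaches the term shortcut.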