Rewrite match and match_phrase queries to term queries on keyword fields #82612

Merged
merged 10 commits into from
Jan 17, 2022

Conversation

romseygeek
Contributor

@romseygeek romseygeek commented Jan 14, 2022

Term queries can, in certain circumstances (e.g. when run against constant keyword
fields), rewrite themselves to match_no_docs queries, which is very useful for filtering
out shards from searches and field_caps requests. Match and match_phrase queries,
in turn, reduce to simple term queries when no fuzziness is defined on them and
they are run using a keyword analyzer.

This commit makes simple match and match_phrase rewrite themselves to term
queries when run against keyword fields.

Fixes #82515
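
The rewrite chain described above can be sketched in a small, self-contained model. Everything here is hypothetical and only illustrates the idea: `Match`, `Term`, `MatchNone`, and `Field` are stand-ins invented for this example, not Elasticsearch classes, and the mapping is a plain `Map` rather than a real search execution context.

```java
import java.util.Map;

// Hypothetical, simplified model; none of these types are Elasticsearch classes.
final class KeywordRewriteSketch {
    interface Query {}
    record Match(String field, String value, String fuzziness) implements Query {}
    record Term(String field, String value) implements Query {}
    record MatchNone() implements Query {}

    // A field's search analyzer name, plus the single value held by a
    // constant keyword field (null for ordinary fields).
    record Field(String analyzer, String constantValue) {}

    static Query rewrite(Query query, Map<String, Field> mapping) {
        if (query instanceof Match m) {
            Field f = mapping.get(m.field());
            // Only a non-fuzzy match analyzed with the keyword analyzer
            // is equivalent to a term query.
            if (m.fuzziness() == null && f != null && "keyword".equals(f.analyzer())) {
                return rewrite(new Term(m.field(), m.value()), mapping);
            }
            return m;
        }
        if (query instanceof Term t) {
            Field f = mapping.get(t.field());
            // A constant field holding a different value can never match,
            // so the whole query collapses to "match nothing".
            if (f != null && f.constantValue() != null && !f.constantValue().equals(t.value())) {
                return new MatchNone();
            }
            return t;
        }
        return query;
    }
}
```

In this model, a match query on a constant field whose value differs from the query value ends up as `MatchNone`, which is the property that lets a shard be skipped.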

@romseygeek romseygeek added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.1.0 labels Jan 14, 2022
@romseygeek romseygeek self-assigned this Jan 14, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jan 14, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-tests-windows-sample

@romseygeek
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@@ -63,26 +63,26 @@

static {
addCandidate("""
"match": { "keyword_field": "value"}
Contributor Author

For reviewers: this test writes percolator queries into an old index, then upgrades and tests that reading the index gets the same queries out. This commit changes how queries on keyword fields are rewritten, so it breaks the test's assumption that the queries will look the same. Changing the query so that it targets a text field instead of a keyword field preserves the constraint under test.

QueryBuilder queryBuilder = new MatchPhraseQueryBuilder(KEYWORD_FIELD_NAME, "value");
SearchExecutionContext context = createSearchExecutionContext();
QueryBuilder rewritten = queryBuilder.rewrite(context);
assertThat(rewritten, instanceOf(TermQueryBuilder.class));
Member

do we need to check the field and the value, just to make sure?

SearchExecutionContext context = createSearchExecutionContext();
QueryBuilder rewritten = queryBuilder.rewrite(context);
assertThat(rewritten, instanceOf(TermQueryBuilder.class));
assertThat(rewritten.boost(), equalTo(2f));
Member

check field and value?

Member

@cbuescher cbuescher left a comment

I also took a quick look and left one minor remark. I also have a question about one more edge case: if the original query sets "zero_terms_query" to ALL (or NULL, for that matter), I think we get a slight difference when the query value is empty. All other cases should be fine, since a keyword analyzer never returns zero tokens, but I think that case needs to be handled specifically.

// If we're using the default keyword analyzer then we can rewrite this to a TermQueryBuilder
// and possibly shortcut
if (analyzer != null) {
if (sec.getIndexAnalyzers().get(analyzer) == Lucene.KEYWORD_ANALYZER) {
TermQueryBuilder termQueryBuilder = new TermQueryBuilder(fieldName, value);
Member

nit: maybe also add a test for this code path, I think only keyword fields are covered atm

@romseygeek
Contributor Author

nit: maybe also add a test for this code path, I think only keyword fields are covered atm

An excellent suggestion, as it turns out the code was incorrect. I've updated, with new tests for query-level analyzer overrides and to deal with zero terms queries as well.

if (zeroTermsQuery == ZeroTermsQueryOption.ALL) {
return new MatchAllQueryBuilder();
}
return new MatchNoneQueryBuilder();
Member

Can we use zeroTermsQuery#asQuery() instead of the if statement here? That would include the NULL option, which returns null; I don't know if this can cause problems in further rewriting.
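
The concern about a null result can be illustrated with a small sketch. This is a hypothetical model, not the Elasticsearch option type: the `ZeroTerms` enum and both helper methods are invented here, and query replacements are represented by plain strings. It shows one way a caller could avoid letting a "no replacement" result escape into later rewrite steps.

```java
import java.util.Optional;

// Hypothetical stand-in for a zero-terms option with three values.
final class ZeroTermsSketch {
    enum ZeroTerms { NONE, ALL, NULL }

    // Name of the replacement query, or empty when the option yields nothing.
    static Optional<String> replacementFor(ZeroTerms option) {
        switch (option) {
            case ALL:
                return Optional.of("match_all");
            case NONE:
                return Optional.of("match_none");
            default:
                return Optional.empty();
        }
    }

    // Keeps the original query when there is no replacement,
    // so a null never reaches later rewrite rounds.
    static String rewriteEmpty(String originalQuery, ZeroTerms option) {
        return replacementFor(option).orElse(originalQuery);
    }
}
```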

}
return this;
}

private NamedAnalyzer configuredAnalyzer(SearchExecutionContext context) {
Member

maybe you can make the query analyzer an input argument, then the two copies of this could be merged into a static utility function somewhere?

Contributor Author

It uses the fieldname as well; I'm not sure how generally useful this is, really. I think two copies is fine?
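
For readers following the discussion, a shared helper of the kind suggested might look roughly like the sketch below. It is an assumption-laden illustration: the body of `configuredAnalyzer` is not shown in this conversation, so the class, the method name `resolve`, and the use of a `Map` from field name to search analyzer name are all hypothetical. It only shows the two inputs mentioned here, a query-level analyzer override and the field name.

```java
import java.util.Map;

// Hypothetical helper; analyzers are represented by their names only.
final class AnalyzerResolutionSketch {
    // A query-level analyzer override wins; otherwise fall back to the
    // search analyzer configured for the field (null if the field is unmapped).
    static String resolve(String queryAnalyzer, String fieldName, Map<String, String> fieldSearchAnalyzers) {
        if (queryAnalyzer != null) {
            return queryAnalyzer;
        }
        return fieldSearchAnalyzers.get(fieldName);
    }
}
```

Because the helper needs both the override and the field name, each caller still has to pass two pieces of its own state, which is the trade-off raised in the reply above.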

@romseygeek
Contributor Author

It turns out that the MatchQueryParser already detects if we have a keyword analyzer and skips the zero terms query logic in that case; so I think we need to explicitly not handle it here?

@romseygeek
Contributor Author

That's the cause of the failing test in VersionStringFieldTests - it indexes an empty version, and checks that searching for an empty string finds it.
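
A minimal sketch of why the zero-terms shortcut changes results for empty values, assuming a toy model in which a keyword-style field stores each value as exactly one term. The class and method names are hypothetical and the "index" is just a list of strings; this is not the test from VersionStringFieldTests.

```java
import java.util.List;

// Hypothetical model: each indexed value is stored as exactly one term.
final class EmptyKeywordSketch {
    // Term lookup: a document matches when its stored term equals the query value exactly.
    static long termMatches(List<String> indexedValues, String queryValue) {
        return indexedValues.stream().filter(queryValue::equals).count();
    }

    // The shortcut that was considered: treat an empty query value as "zero terms".
    static long withZeroTermsShortcut(List<String> indexedValues, String queryValue) {
        if (queryValue.isEmpty()) {
            return 0; // behaves like match_none
        }
        return termMatches(indexedValues, queryValue);
    }
}
```

With an indexed empty value, the plain term lookup finds the document while the shortcut does not, which mirrors the behaviour difference described above.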

@romseygeek
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

Member

@cbuescher cbuescher left a comment

I see why you backed out of the zero terms query logic, LGTM

@romseygeek romseygeek added the auto-backport-and-merge Automatically create backport pull requests and merge when ready label Jan 17, 2022
@romseygeek romseygeek removed the auto-backport-and-merge Automatically create backport pull requests and merge when ready label Jan 17, 2022
@romseygeek romseygeek merged commit 2d77ef5 into elastic:master Jan 17, 2022
@romseygeek romseygeek deleted the query/match-phrase-rewrite branch January 17, 2022 17:02
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Mar 21, 2022
…lds (elastic#82612)
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize match_phrase query on constant keyword field to match_none if possible
4 participants