Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize how queries on _index are handled at rewrite time #52486

Merged
merged 37 commits into from
Feb 26, 2020

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Feb 18, 2020

#49713 aims at introducing a new constant_keyword field which, like _index, always rewrites queries to a MatchAllQueryBuilder or a MatchNoneQueryBuilder in order to skip shards in the can_match phase. This change introduces a new ConstantFieldType marker class that helps get this functionality with any field and not just _index.

Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.

This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.

Closes #49254

This field is a specialization of the `keyword` field for the case when all
documents have the same value. It typically performs more efficiently than
keywords at query time by figuring out whether all or none of the documents
match at rewrite time, like `term` queries on `_index`.

The name is up for discussion. I liked including `keyword` in it, so that we
still have room for a `singleton_numeric` in the future. However I'm unsure
whether to call it `singleton`, `constant` or something else, any opinions?

For this field there is a choice between
 1. accepting values in `_source` when they are equal to the value configured
    in mappings, but rejecting mapping updates
 2. rejecting values in `_source` but then allowing updates to the value that
    is configured in the mapping
This commit implements option 1, so that it is possible to reindex from/to an
index that has the field mapped as a keyword with no changes to the source.
@jpountz jpountz added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.0.0 labels Feb 18, 2020
@jpountz jpountz added the v7.7.0 label Feb 18, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@jpountz jpountz changed the title Generalizy how queries on _index are handled at rewrite time Generalize how queries on _index are handled at rewrite time Feb 18, 2020
@jpountz
Copy link
Contributor Author

jpountz commented Feb 19, 2020

@elasticmachine run elasticsearch-ci/2

@jtibshirani jtibshirani self-requested a review February 21, 2020 16:02
Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jpountz, makes sense

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks good to me, I had one suggestion I wanted to raise but don't feel strongly about. Currently the contract for classes implementing ConstantFieldType is that they should produce either MatchNoDocsQuery or MatchAllDocsQuery for certain query factory methods. They should also make sure to break out of termsQuery early if they find a match. To me this strategy felt more a bit more robust:

  • Add abstract methods like bool matchesExact(Object value, QueryShardContext context) to ConstantFieldType.
  • Define final versions of termQuery, termsQuery, etc. in ConstantFieldType that call into these methods.

return new MatchNoneQueryBuilder();
}
QueryShardContext context = queryRewriteContext.convertToShardContext();
if (context != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, could just be if (context != null && context.fieldMapper(...)) { ... }

return queryBuilder.toQuery(context);
}

static Query parseQuery(QueryShardContext context, boolean mapUnmappedFieldsAsString, XContentParser parser) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unused, I think it can just be deleted?

@@ -103,7 +104,7 @@ public final Query toQuery(QueryShardContext context) throws IOException {
if (boost != DEFAULT_BOOST) {
if (query instanceof SpanQuery) {
query = new SpanBoostQuery((SpanQuery) query, boost);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my knowledge, I'm wondering why this change (and the ones to AbstractQueryTestCase) were necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change as a way to keep tests simple. For instance here is what ConstantScoreQueryBuilderTests#doAssertLuceneQuery looks like in master today.

    @Override
    protected void doAssertLuceneQuery(ConstantScoreQueryBuilder queryBuilder, Query query, QueryShardContext context) throws IOException {
        Query innerQuery = queryBuilder.innerQuery().toQuery(context);
        if (innerQuery == null) {
            assertThat(query, nullValue());
        } else {
            assertThat(query, instanceOf(ConstantScoreQuery.class));
            ConstantScoreQuery constantScoreQuery = (ConstantScoreQuery) query;
            assertThat(constantScoreQuery.getQuery(), instanceOf(innerQuery.getClass()));
        }
    }

This test only worked because most queries on unmapped fields would create the same query as on a keyword field. But with this change, queries on unmapped fields get rewritten as a MatchNoneQueryBuilder. And when its inner query rewrites to a MatchNoneQueryBuilder, ConstantScoreQueryBuilder itself rewrites to a MatchNoneQueryBuilder. So I updated the logic this way:

    @Override
    protected void doAssertLuceneQuery(ConstantScoreQueryBuilder queryBuilder, Query query, QueryShardContext context) throws IOException {
        Query innerQuery = queryBuilder.innerQuery().rewrite(context).toQuery(context);
        if (innerQuery == null) {
            assertThat(query, nullValue());
        } else if (innerQuery instanceof MatchNoDocsQuery) {
            assertThat(query, instanceOf(MatchNoDocsQuery.class));
        } else {
            assertThat(query, instanceOf(ConstantScoreQuery.class));
            ConstantScoreQuery constantScoreQuery = (ConstantScoreQuery) query;
            assertThat(constantScoreQuery.getQuery(), instanceOf(innerQuery.getClass()));
        }
    }

But this fails if the inner query is wrapped in a BoostQuery. So I had to either change AbstractQueryBuilder to no longer wrap MatchNoDocsQuery with a boost, or change this test (and a couple other ones IIRC) to check whether the inner query might be wrapped inside a BoostQuery. I chose the former, which sounded simpler to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear explanation!

@jpountz
Copy link
Contributor Author

jpountz commented Feb 24, 2020

@jtibshirani Thanks for your comments, I wondered whether I should do something like that and leaned towards not doing it to avoid introducing new abstractions, but I also agree that it reduces chances to not implement the contract. I'll implement your suggestion.

@jpountz jpountz merged commit 8830eb6 into elastic:master Feb 26, 2020
@jpountz jpountz deleted the enhancement/simplify_in_rewrite branch February 26, 2020 09:33
jpountz added a commit to jpountz/elasticsearch that referenced this pull request Feb 26, 2020
…ic#52486)

Since this change refactors rewrites, I also took it as an opportunity to adrress elastic#49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.

This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.

Closes elastic#49254
jpountz added a commit that referenced this pull request Feb 26, 2020
Generalize how queries on `_index` are handled at rewrite time (#52486)

Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.

This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.

Closes #49254
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite queries on unmapped fields as MatchNoDocsQueryBuilder
5 participants