Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delegate wildcard query creation to MappedFieldType. #34062

Merged
merged 3 commits into from
Sep 26, 2018

Conversation

jtibshirani
Copy link
Contributor

@jtibshirani jtibshirani commented Sep 25, 2018

This is consistent with how we handle fuzzy, prefix, and regexp queries.

As part of this change, disallow wildcard queries on non-string fields, in addition to collation fields.

@jtibshirani jtibshirani added >bug :Search/Search Search-related issues that do not fall into other categories v7.0.0 >refactoring labels Sep 25, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @jtibshirani. I left some comments.

@@ -345,6 +348,15 @@ public Query prefixQuery(String value, @Nullable MultiTermQuery.RewriteMethod me
throw new QueryShardException(context, "Can only use prefix queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]");
}

public Query wildcardQuery(String value, QueryShardContext context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the same reasoning can apply for wildcard, prefix and regex queries so the default impl should throw a QueryShardException ? Only StringFieldType fields should be able to build a wildcard query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keyword field applies the normalizer on termQuery. Depending on the normalizer the wildcard and escaped characters could be removed/replaced so I wonder if we should apply the same logic than QueryParserBase#analyzeWildcard for keyword fields. This is out of scope for this pr but it made me realize that we might have a bug here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't go further and disallow wildcard queries for all non-keyword or text fields, as some other field types like _index explicitly support wildcard queries.

I missed this part sorry. I think we should explicitly add the support in the _index field type rather than supporting this query on all fields. Currently the support for prefix queries is also broken so we don't really use this ability.

Copy link
Contributor Author

@jtibshirani jtibshirani Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll make sure only string fields support wildcards by default. Maybe I'll add an upgrade note too in case this breaks some types we don't have test coverage for (will make it easier for users to debug + file issues)?

This is out of scope for this pr but it made me realize that we might have a bug here.

Makes sense, I'll make a note to follow-up on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the non-string field types, what do you think should be done with metadata types like IdFieldType, IgnoredFieldType, and RoutingFieldType? My intuition is we should switch them to being string fields to avoid breaking any queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change would only break wildcard query on these fields, right ? +1 to make them string fields, prefix and regex query do not work currently because of this so it would be a bug fix. I am also ok to do that in a follow up, the changes in this pr have a different scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if (fieldType == null) {
term = new Term(fieldName, BytesRefs.toBytesRef(value));
Term term = new Term(fieldName, BytesRefs.toBytesRef(value));
query = new WildcardQuery(term);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if the field does not exist we could return a MatchNoDocsQuery ?

Copy link
Contributor Author

@jtibshirani jtibshirani Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).

In any case, maybe I could make this change in a follow-up PR, as I was just hoping for a straight refactor here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).

Yes we don't have a clear policy for this. The reason I prefer the MatchNoDocsQuery is that we can fold the reason in the message and if users check the Lucene query through the _validate API they can see that this field is not present in the mapping. If we build the same form there is no easy way for the user to understand why a specific query matches no document. Anyway we can discuss this in a follow up, no need to make that change in this pr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

For backwards compatibility, we maintain support on the `_index` field.
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jtibshirani , let's fix prefix and regex queries for metadata fields in a follow up.

@@ -345,6 +348,15 @@ public Query prefixQuery(String value, @Nullable MultiTermQuery.RewriteMethod me
throw new QueryShardException(context, "Can only use prefix queries on keyword and text fields - not on [" + name + "] which is of type [" + typeName() + "]");
}

public Query wildcardQuery(String value, QueryShardContext context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change would only break wildcard query on these fields, right ? +1 to make them string fields, prefix and regex query do not work currently because of this so it would be a bug fix. I am also ok to do that in a follow up, the changes in this pr have a different scope.

if (fieldType == null) {
term = new Term(fieldName, BytesRefs.toBytesRef(value));
Term term = new Term(fieldName, BytesRefs.toBytesRef(value));
query = new WildcardQuery(term);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one thing that has confused me in the past: if a field type doesn't exist, we typically still create a query of the same form (see TermsQueryBuilder#doToQuery amongst other examples).

Yes we don't have a clear policy for this. The reason I prefer the MatchNoDocsQuery is that we can fold the reason in the message and if users check the Lucene query through the _validate API they can see that this field is not present in the mapping. If we build the same form there is no easy way for the user to understand why a specific query matches no document. Anyway we can discuss this in a follow up, no need to make that change in this pr.

@jtibshirani jtibshirani changed the title Disallow wildcard queries on collation fields. Delegate wildcard query creation to MappedFieldType. Sep 26, 2018
@jtibshirani jtibshirani merged commit de8bfb9 into elastic:master Sep 26, 2018
@jtibshirani jtibshirani deleted the wildcard-query-builder branch September 26, 2018 16:36
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 27, 2018
…fallback

* elastic/master:
  TEST: Add engine is closed as expected failure msg
  Adjust bwc version for max_seq_no_of_updates
  Build DocStats from SegmentInfos in ReadOnlyEngine (elastic#34079)
  When creating wildcard queries, use MatchNoDocsQuery when the field type doesn't exist. (elastic#34093)
  [DOCS] Moves graph to docs folder (elastic#33472)
  Mute MovAvgIT#testHoltWintersNotEnoughData
  Security: use default scroll keepalive (elastic#33639)
  Calculate changed roles on roles.yml reload (elastic#33525)
  Scripting: Reflect factory signatures in painless classloader (elastic#34088)
  XContentBuilder to handle BigInteger and BigDecimal (elastic#32888)
  Delegate wildcard query creation to MappedFieldType. (elastic#34062)
  Painless: Cleanup Cache (elastic#33963)
jtibshirani added a commit that referenced this pull request Sep 28, 2018
* Delegate wildcard query creation to MappedFieldType.
* Disallow wildcard queries on collation fields.
* Disallow wildcard queries on non-string fields.
kcm pushed a commit that referenced this pull request Oct 30, 2018
* Delegate wildcard query creation to MappedFieldType.
* Disallow wildcard queries on collation fields.
* Disallow wildcard queries on non-string fields.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug >refactoring :Search/Search Search-related issues that do not fall into other categories v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants