Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query parsers to throw exception when multiple field names are provided #19791

Merged
merged 17 commits into from
Aug 5, 2016

Conversation

javanna
Copy link
Member

@javanna javanna commented Aug 3, 2016

Most of our queries support a single field name, the field that gets queried. That is the key of the object which contains the query options. For such queries, in case multiple fields are presented, in most of the cases the query goes through and either the last or the first field only will be read and queried.

This PR changes the behaviour of all those parsers to throw exception in case multiple field names are provided.

Closes #19547

}
}

if (text == null) {
Copy link
Member Author

@javanna javanna Aug 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed these checks at the end of parsing as they are not needed. The query builder throws error anyways when required fields are missing as we validate its input and that is also what we test already (testIllegalArguments test method).

@jpountz
Copy link
Contributor

jpountz commented Aug 5, 2016

I left some comments, this looks great!

throw new ParsingException(parser.getTokenLocation(), "[match] query doesn't support multiple fields, found ["
+ fieldName + "] and [" + currentFieldName + "]");
}
fieldName = currentFieldName;
Copy link
Member

@tlrx tlrx Aug 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match query is tricky because it can be written as:

{
  "match" :  {
    "message1" : {
       "query" : "this is a test"
    }
  }
}

but also:

{
  "match" : {
    "message1" : "this is a test"
  }
}

and in case of multiple fields we just catch the first form, not the second which is very used too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many queries actually have a short syntax like the match one. I'm afraid what you bring up is a problem that's common to them all. I will dig.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know, sorry for that :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this situation is already at least partially caught by making QueryParseContext#parseInnerQueryBuilder stricter as in checking what is the current token after the query gets parsed. I have ideas on how to test it too but I'd prefer to do this in a followup PR if you don' mind. We already test these short syntaxes but we never inject bogus objects in them like we do with the "standard" json output, we can totally introduce that for more coverage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

We already test these short syntaxes but we never inject bogus objects in them like we do with the "standard" json output, we can totally introduce that for more coverage.

Since query builders now implements ToXContent I think we could introduce bogus object using a XContentBuilder that randomly duplicates fields. Just an idea.

@tlrx
Copy link
Member

tlrx commented Aug 5, 2016

It's a nice change, thanks for doing that. I left comments concerning the queries that can be written differently like match* queries. I don't know if we can handle every format here but that would be great.

Range Query, like many other queries, used to parse when the query refers to multiple fields and the last one would win. We rather throw an exception now instead.

Closes elastic#19547
Prefix Query, like many other queries, used to parse when the query refers to multiple fields and the last one would win. We rather throw an exception now instead.
Also added tests for short prefix quer variant.
Regexp Query, like many other queries, used to parse even when the query referred to multiple fields and the last one would win. We rather throw an exception now instead.
Also added test for short prefix query variant.
Wildcard Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Match phrase Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Geo distance Query, like many other queries, used to parse even when the query referred to multiple fields and the last one would win. We rather throw an exception now instead.
…elds

Match phrase prefix Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Match Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Common Terms Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Span term Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also modified the parsing code to consume the whole query object.
Fuzzy Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
…BJECT token

Instead of being lenient in QueryParseContext#parseInnerQueryBuilder we check that the token where the parser stopped reading was END_OBJECT, and throw error otherwise. This is a best effort to verify that the parsers read a whole object rather than stepping out in the middle of it due to malformed queries.
@javanna javanna force-pushed the fix/multiple_fields_queries branch from a4c5df3 to 7f0bd56 Compare August 5, 2016 11:55
@javanna javanna merged commit 4c1a3b9 into elastic:master Aug 5, 2016
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v5.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants