simple_query_string - operator not working with default_operator OR #4707
Could you provide a complete reproduction that would help us to reproduce the issue? See http://www.elasticsearch.org/help/ for an example. |
- operator not working with default_operator OR
Recreation:
The negated term is ignored, because with |
@clintongormley not sure how this is a bug though? This is like saying "contains the term this OR does not contain removed", which the document does (it contains the term "this") |
I think that a user would expect |
@clintongormley I'm still not sure that this is actually a bug, the negated clause with the default |
@dakrone @clintongormley Here is another use case from the field. Is this a bug or not currently supported (a feature)? Don't think default_operator:AND will help here since the use case is to return both documents in the example. thx |
@ppf2 Actually, your example works as expected if you use the
to:
The latter, when run through validate-query, shows the following:
|
Ah got it, thx @clintongormley for the tip! |
Taking a look. |
This is working as expected as @dakrone has inferred. The SimpleQueryParser should be thought of as only using AND and OR as operators. There is no concept of SHOULD and MUST other than internally to create the AND and OR queries. So when doing the query "+this -removed" the AND (+) is actually ignored as it is not thought of as a MUST. Using SimpleQueryParser this will always be the case where the query ends up being documents that either have 'this' OR not 'removed' ... Also note that while this will return all the documents, the not 'removed' still affects scoring, so it's not meaningless. Going to leave this open for now for further discussion if necessary. |
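To make the OR semantics described above concrete, here is a minimal Python sketch. It is not the SimpleQueryParser itself: the `matches` helper, the whitespace tokenization, and the sample documents are all assumptions made up for illustration. Each term becomes a boolean clause, and the clauses are combined with the default operator.

```python
def matches(doc, positive, negated, default_operator="OR"):
    """Hypothetical helper: evaluate 'term -negated' style clauses on one document.

    Each positive term is the clause "doc contains term"; each negated term is
    the clause "doc does not contain term". The clauses are joined with the
    default operator, which is how the thread describes the current behavior.
    """
    words = set(doc.lower().split())
    clauses = [term in words for term in positive]
    clauses += [term not in words for term in negated]
    return any(clauses) if default_operator == "OR" else all(clauses)


# Made-up sample documents, not taken from the issue.
docs = ["this is a document", "this was removed", "nothing here", "removed"]

# Query "this -removed" under each default operator.
with_or = [d for d in docs if matches(d, ["this"], ["removed"], "OR")]
with_and = [d for d in docs if matches(d, ["this"], ["removed"], "AND")]

print(with_or)
print(with_and)
```

Under OR, a document containing both "this" and "removed" still matches because the first clause alone is enough; only a document that lacks "this" and contains "removed" is excluded. Under AND, only documents that contain "this" and lack "removed" survive.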
While it may be working as designed, I'd argue that the syntax is surprising to most users. For example:
I would expect the following:
To get what I want (ie "Give me docs with one or three, but exclude anything with two") I need to write it as At the very least it should be well documented but, given that this query is intended to be exposed to users who will not read documentation, I would say that the syntax could be improved. |
@clintongormley that's because google (as well as most other search engines in 21st century) is using |
Not really. If you query https://www.google.com/?gws_rd=ssl#q=elasticsearch+reference+query+dsl+oracle it gladly returns high-ranking hits and just tells you: Missing: oracle. Switching to AND breaks many analysis chains such as n-grams. With a good ranking algo it's also not necessary; it's just that DefaultSimilarity is really weak here. |
I agree that this syntax is ugly -- "one three +-two"; however, I am reluctant to special-case the not operator. Right now you have one OR three OR NOT two, which may be unexpected but is predictable. If I change this, it becomes one OR three AND NOT two, which is no longer predictable because it ignores the default operator and loses its consistency. It is also very difficult to predict proper sub queries outside of this simple case. Take for example "one -three two" -- is this one AND not three OR two? Do I need to reorder this? I think this would end up being more confusing because of the way operator precedence works, in that it's always first come, first served. |
What google does, is some weird "fuzzy" AND (or something like n-grams is an advanced feature, I think if a user can figure out how to enable n-gram (or configure any other custom analysis chain) they should be able to figure out how to switch from AND to OR in the query. Anyway, I shouldn't hijack this discussion. I apologize for that. Back to the original topic. I think that my expectation would be that
|
I should explain further what happens right now, each time an AND is switched to an OR or vice versa a new boolean query branch is created. So if you have a b c +d +e f the tree ends up looking like bq( should bq( should bq( should a should b should c ) must d must e ) should f) so changing the not operator to always use must will have an inconsistent change in boolean query branches since operator precedence is always left to right. We could change it to be something like @imotov suggests (maybe this should be a different parser altogether in Lucene?), but then you have should, must, and must not... if you're truly a basic user I think and/or is easier to understand than should/must/must not. |
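The branching rule described above (a new boolean branch each time the operator switches, with strictly left-to-right precedence) can be sketched roughly as follows. This is an illustrative Python model, not Lucene's code: the `Branch` class and `build_tree` function are hypothetical names, only a leading `+` is recognized, and negation, phrases, and parentheses are ignored. Nodes are labeled AND/OR rather than must/should to keep the sketch simple.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical stand-in for one boolean query node."""
    op: str                       # "AND" or "OR"
    children: list = field(default_factory=list)

    def __str__(self):
        return f"{self.op}({', '.join(str(c) for c in self.children)})"


def build_tree(query, default_op="OR"):
    """Build a nested tree left to right, wrapping on every operator change."""
    top = None        # current root of the tree
    prev_op = None    # operator used for the previous term
    for token in query.split():
        # Assumption: a leading '+' means AND; anything else uses the default.
        if token.startswith("+"):
            op, term = "AND", token[1:]
        else:
            op, term = default_op, token
        if top is None:
            top = term            # the first term starts the tree
            continue
        if prev_op != op:
            # Operator changed: everything so far becomes one child of a new node.
            top = Branch(op, [top])
        top.children.append(term)
        prev_op = op
    return top


print(build_tree("a b c +d +e f"))
```

Because wrapping depends on the order in which operators appear, reordering the terms changes the shape of the tree, which is the order sensitivity discussed in this thread.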
Yes, and this is where it breaks my expectation. To me order of elements in the query shouldn't make any difference because "+" and "-" feel like unary operators but they behave in strange ways. |
@imotov What you're saying makes sense to me from the point of view of someone that regularly deals with search, but for someone less technical I think and/or make more sense. Honestly, the default to OR is a bit odd to me too because if someone, say my mother, types "dog food" into the google she expects it to be anded together there at least through decent scoring (as you and @rmuir mentioned earlier). I think making a new parser with the behavior of must/should/must not makes sense depending on what our target audience wants. SimpleQueryParser2 or something. |
All right, after a bit more thought and discussion, I've come to agree with everyone in this issue that this behavior is unexpected for everyone. I'll work on making a Lucene patch for the SimpleQueryParser using the behavior described by @imotov and @rmuir where the structure will be a single bq per subquery. |
@jdconrad did anything ever come of this? Did you open any issue in Lucene that we can track? |
@clintongormley Sorry, I must've gotten distracted by other issues before I had any time to address this. I'll have to take a bit of time to remember what we had discussed. |
Let's document and close |
Okay, opened a PR to document this, and then it can be closed. |
This can be confusing when unexpected. Resolves elastic#4707
Just stumbled on this limitation myself. I'd like to echo @imotov 's suggestion from Oct 15, 2017. (I'll paste below).
The usecase is based on the previously-mentioned 'google expectation' (or really any major search engine at this point) that can be approximated with a The appeal of I'm a little bummed that I have to roll my own parser to offer a commonly-accepted negation operator. It's not the end of the world, but adds friction to anyone looking for an otherwise extremely nice (almost turnkey) drop-in query which largely meets common syntax expectations. @clintongormley, where is the right forum to re-open this? It looks like this was closed with a doc-comment b/c it belongs in Lucene's JIRA. |
I tried to negate a keyword in the query, but it still shows up in the results.
The simple query is:
{
  "query": {
    "simple_query_string": {
      "query": "\"This repository\" -removed",
      "fields": [
        "content",
        "headline"
      ]
    }
  }
}
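Going by the discussion earlier in this thread, the negated term is OR-ed with the rest of the query under the default operator, so one option that may give the expected exclusion is to set `default_operator` to `AND`. The Python sketch below only builds the request body as a dict and serializes it; the field names are copied from the query above, and whether AND semantics suit the rest of your queries is an assumption you would need to check against your own data.

```python
import json

# Request body sketch: same query as above, with default_operator set to AND
# so that the negated term is combined with AND instead of OR.
body = {
    "query": {
        "simple_query_string": {
            "query": '"This repository" -removed',
            "fields": ["content", "headline"],
            "default_operator": "AND",
        }
    }
}

# json.dumps takes care of escaping the inner double quotes.
payload = json.dumps(body, indent=2)
print(payload)
```

Serializing through `json.dumps` also avoids hand-escaping the phrase quotes inside the query string.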