simple_query_string `-` operator not working with default_operator `OR` #4707

ksaritek · 2014-01-13T15:32:52Z

I tried to negate a keyword in a query, but documents containing it still come back in the results.

The query is:
"query": {
  "simple_query_string": {
    "query": "\"This repository\" -removed",
    "fields": [
      "content",
      "headline"
    ]
  }
},

imotov · 2014-01-13T21:41:42Z

Could you provide a complete reproduction that would help us to reproduce the issue? See http://www.elasticsearch.org/help/ for an example.

clintongormley · 2014-12-24T18:14:50Z

Recreation:

PUT /t/t/1
{
  "content": "This repository has been removed"
}


GET /t/_search
{
  "query": {
    "simple_query_string": {
      "query": "+this -removed",
      "fields": [
        "content",
        "headline"
      ]
    }
  }
}

The negated term is ignored, because with default_operator: "OR", it is optional.

dakrone · 2014-12-24T18:59:57Z

@clintongormley not sure how this is a bug though? This is like saying "contains the term this OR does not contain removed", which the document does (it contains the term "this")

clintongormley · 2014-12-29T10:19:50Z

I think that a user would expect foo bar -baz (with default operator OR) to match as if they had written: +(foo bar) -baz. Otherwise, a negated clause with default operator OR is meaningless.

dakrone · 2015-02-23T23:20:12Z

@clintongormley I'm still not sure that this is actually a bug. The negated clause with the default OR operator is not meaningless (it has meaning; it just means "anything that does not contain this"). I think what is desired is setting default_operator to AND, which is a user-changeable setting?
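
For reference, the setting mentioned above goes in the request body. This is a minimal sketch that reuses the index and field names from the recreation earlier in this thread; the variable name `body` is purely illustrative, and the dict is only the JSON one would send to `GET /t/_search`:

```python
import json

# Request body for GET /t/_search (index and fields taken from the recreation).
# With default_operator set to "and", every clause is required, so the query
# reads "this AND NOT removed" rather than "this OR NOT removed".
body = {
    "query": {
        "simple_query_string": {
            "query": "this -removed",
            "fields": ["content", "headline"],
            "default_operator": "and",
        }
    }
}

print(json.dumps(body, indent=2))
```

Under that reading, the document "This repository has been removed" should no longer match, at the cost of making every other term in the query mandatory as well.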

ppf2 · 2015-08-19T04:55:28Z

@dakrone @clintongormley Here is another use case from the field. Is this a bug or not currently supported (a feature)? Don't think default_operator:AND will help here since the use case is to return both documents in the example. thx

clintongormley · 2015-08-25T08:25:49Z

@ppf2 Actually, your example works as expected if you use the simple_query_string query (you used the query_string query instead). As a query string query, you would need to change:

"(name:Android) OR NOT (status:approved)"

to:

"(name:Android) OR (NOT status:approved)"

The latter, when run through validate-query, shows the following:

"name:android (-status:approved +*:*)"

ppf2 · 2015-08-25T17:03:24Z

Ah got it, thx @clintongormley for the tip!

jdconrad · 2015-10-14T15:32:36Z

Taking a look.

jdconrad · 2015-10-14T16:54:37Z

This is working as expected as @dakrone has inferred. The SimpleQueryParser should be thought of as only using AND and OR as operators. There is no concept of SHOULD and MUST other than internally to create the AND and OR queries. So when doing the query "+this -removed" the AND (+) is actually ignored as it is not thought of as a MUST. Using SimpleQueryParser this will always be the case where the query ends up being documents that either have 'this' OR not 'removed' ... Also note, that while this will return all the documents, the not 'removed' still affects scoring so it's not meaningless. Going to leave this open for now for further discussion if necessary.

clintongormley · 2015-10-15T11:51:52Z

While it may be working as designed, I'd argue that the syntax is surprising to most users. For example:

POST t/t/_bulk
{"index":{"_id":1}}
{"foo":"one"}
{"index":{"_id":2}}
{"foo":"two"}
{"index":{"_id":3}}
{"foo":"one two"}
{"index":{"_id":4}}
{"foo":"three"}

I would expect the following:

"one": Return docs 1 & 3 (works)
"-two": Return docs 1 & 4 (works)
"one -two": Return doc 1 (returns 1, 3, & 4)
"one three -two": Return docs 1 & 4 (returns 1, 3 & 4)

To get what I want (ie "Give me docs with one or three, but exclude anything with two") I need to write it as "one three +-two". That is not at all intuitive. If I typed "windows -microsoft" into google, I wouldn't expect Google to return all of the documents on the internet which don't contain the word microsoft.

At the very least it should be well documented but, given that this query is intended to be exposed to users who will not read documentation, I would say that the syntax could be improved.
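
The results listed above can be reproduced with a toy model of the left-to-right AND/OR evaluation described in this thread. This is only a sketch under stated assumptions: it is not Lucene's SimpleQueryParser, it tokenizes on whitespace only, and the names `DOCS`, `matches`, and `search` are hypothetical.

```python
# The four documents from the bulk request above, keyed by _id.
DOCS = {1: "one", 2: "two", 3: "one two", 4: "three"}


def matches(query, text, default_and=False):
    """Evaluate a query left to right against one document's text.

    A leading "+" joins the term to the running result with AND, a leading
    "-" negates the term, and anything else uses the default operator.
    """
    words = set(text.split())
    result = None
    for token in query.split():
        use_and = default_and
        if token.startswith("+"):
            use_and = True
            token = token[1:]
        negate = token.startswith("-")
        if negate:
            token = token[1:]
        hit = (token in words) != negate  # flip the result for negated terms
        if result is None:
            result = hit
        elif use_and:
            result = result and hit
        else:
            result = result or hit
    return bool(result)


def search(query, default_and=False):
    """Return the ids of all documents matching the query."""
    return [doc_id for doc_id, text in DOCS.items()
            if matches(query, text, default_and)]


print(search("one -two"))                    # [1, 3, 4]
print(search("one three -two"))              # [1, 3, 4]
print(search("one three +-two"))             # [1, 4]
print(search("one -two", default_and=True))  # [1]
```

In this model the negated term only becomes a hard exclusion when it is joined with AND, either through the explicit `+-` prefix or through an AND default operator.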

imotov · 2015-10-15T14:24:32Z

> If I typed "windows -microsoft" into google, I wouldn't expect Google to return all of the documents on the internet which don't contain the word microsoft.

@clintongormley that's because Google (as well as most other search engines in the 21st century) uses AND instead of OR as the default operator, which should be the default behavior for Elasticsearch as well IMO. Having OR as the default operator causes all sorts of confusion for many new users.

rmuir · 2015-10-15T14:32:44Z

Not really. If you query https://www.google.com/?gws_rd=ssl#q=elasticsearch+reference+query+dsl+oracle it gladly returns high-ranking hits and just tells you: Missing: oracle

Switching to AND breaks many analysis chains such as n-grams. With a good ranking algorithm it's also not necessary; it's just that DefaultSimilarity is really weak here.

jdconrad · 2015-10-15T15:46:28Z

I agree that this syntax is ugly -- "one three +-two"; however, I am reluctant to special-case the not operator. Right now you have one OR three OR NOT two, which, while it may be unexpected, is predictable. If I change this, it becomes one OR three AND NOT two, which is no longer predictable because it ignores the default operator and loses its consistency. It is also very difficult to predict proper subqueries outside of this simple case. Take for example "one -three two" -- is this one AND NOT three OR two? Do I need to reorder this? I think this would end up being more confusing because of the way operator precedence works, in that it's always first come, first served.

imotov · 2015-10-15T15:51:13Z

What Google does is some weird "fuzzy" AND (or something like should with a large minimum should match) search that Google turns on for long-tail queries with a large number of terms. But the basic behavior with 2-3 term queries resembles AND more than OR, would you agree?

n-grams is an advanced feature, I think if a user can figure out how to enable n-gram (or configure any other custom analysis chain) they should be able to figure out how to switch from AND to OR in the query.

Anyway, I shouldn't hijack this discussion. I apologize for that. Back to the original topic. I think that my expectation would be that foo bar +baz -qux should be translated into something like this:

{
   "bool": {
        "should" : [
            {
                "term" : { "_all" : "foo" }
            }, {
                "term" : { "_all" : "bar" }
            }
        ],
        "must" : {
            "term" : { "_all" : "baz" }
        },
        "must_not" : {
            "term" : { "_all" : "qux" }
        }
   }
}
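
The translation proposed above can be sketched as a small helper. This is an illustration under assumptions, not an existing Elasticsearch or Lucene API: the function name `to_bool_query` is hypothetical, tokens are split on whitespace only (no phrases, no analysis), and each occurrence type is emitted as a list rather than a single object.

```python
def to_bool_query(query, field="_all"):
    """Map "+term" to must, "-term" to must_not, and bare terms to should."""
    clauses = {"should": [], "must": [], "must_not": []}
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            occur, term = "must", token[1:]
        elif token.startswith("-") and len(token) > 1:
            occur, term = "must_not", token[1:]
        else:
            occur, term = "should", token
        clauses[occur].append({"term": {field: term}})
    # Drop empty sections so the result only contains clauses that were used.
    return {"bool": {occur: terms for occur, terms in clauses.items() if terms}}


print(to_bool_query("foo bar +baz -qux"))
```

Because every token is classified independently, the order of terms in the query string does not change the resulting structure.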

jdconrad · 2015-10-15T16:00:30Z

I should explain further what happens right now, each time an AND is switched to an OR or vice versa a new boolean query branch is created. So if you have a b c +d +e f the tree ends up looking like

bq( should bq( should bq( should a should b should c ) must d must e ) should f)

so changing the not operator to always use must will have an inconsistent change in boolean query branches since operator precedence is always left to right.

We could change it to be something like @imotov suggests (maybe this should be a different parser altogether in Lucene?), but then you have should, must, and must not... if you're truly a basic user I think and/or is easier to understand than should/must/must not.

imotov · 2015-10-15T16:13:24Z

> I should explain further what happens right now, each time an AND is switched to an OR or vice versa a new boolean query branch is created.

Yes, and this is where it breaks my expectation. To me order of elements in the query shouldn't make any difference because "+" and "-" feel like unary operators but they behave in strange ways.

jdconrad · 2015-10-15T20:08:26Z

@imotov What you're saying makes sense to me from the point of view of someone who regularly deals with search, but for someone less technical I think and/or makes more sense. Honestly, the default to OR is a bit odd to me too, because if someone, say my mother, types "dog food" into Google, she expects the terms to be ANDed together, or at least to get that effect through decent scoring (as you and @rmuir mentioned earlier). I think making a new parser with the behavior of must/should/must not makes sense depending on what our target audience wants. SimpleQueryParser2 or something.

jdconrad · 2015-10-15T21:53:34Z

All right, after a bit more thought and discussion, I've come to agree with everyone in this issue that this behavior is unexpected. I'll work on making a Lucene patch for the SimpleQueryParser using the behavior described by @imotov and @rmuir, where the structure will be a single bq per subquery.

clintongormley · 2016-11-06T11:18:51Z

@jdconrad did anything ever come of this? Did you open any issue in Lucene that we can track?

jdconrad · 2016-11-07T17:29:42Z

@clintongormley Sorry, I must've gotten distracted by other issues before I had any time to address this. I'll have to take a bit of time to remember what we had discussed.

clintongormley · 2016-12-23T10:23:37Z

Let's document and close

dakrone · 2017-01-06T23:48:48Z

Okay, opened a PR to document this, and then it can be closed.


peterdm · 2018-05-11T18:16:03Z

Just stumbled on this limitation myself. I'd like to echo @imotov's suggestion from Oct 15, 2015. (I'll paste below).

(quote) I think that my expectation would be that foo bar +baz -qux should be translated into something like this:

{
   "bool": {
        "should" : [
            {
                "term" : { "_all" : "foo" }
            }, {
                "term" : { "_all" : "bar" }
            }
        ],
        "must" : {
            "term" : { "_all" : "baz" }
        },
        "must_not" : {
            "term" : { "_all" : "qux" }
        }
   }
}

The use case is based on the previously mentioned 'Google expectation' (or really that of any major search engine at this point), which can be approximated with a default_operator: "OR" accompanied by a high minimum_should_match.
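
A minimal sketch of that combination as a request body follows. The helper name `build_query`, the "75%" threshold, and the example search text are assumptions chosen for illustration; the field names come from the original report.

```python
import json


def build_query(text, min_match="75%"):
    """Build a simple_query_string body using OR plus minimum_should_match."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": ["content", "headline"],
                "default_operator": "or",
                # Require most, but not all, of the optional clauses to match.
                "minimum_should_match": min_match,
            }
        }
    }


print(json.dumps(build_query("elasticsearch reference query dsl"), indent=2))
```

This only covers the "most terms should match" half of the expectation. Going by the behavior described in this thread, it does not by itself turn `-` into a hard exclusion.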

The appeal of simple_query_string is in the ability to meet the syntactical needs of users who are familiar with Google operators (without throwing parse exceptions).

I'm a little bummed that I have to roll my own parser to offer a commonly-accepted negation operator. It's not the end of the world, but adds friction to anyone looking for an otherwise extremely nice (almost turnkey) drop-in query which largely meets common syntax expectations.

@clintongormley, where is the right forum to re-open this? It looks like this was closed with a doc-comment b/c it belongs in Lucene's JIRA.

clintongormley changed the title ~~negates a single token (-) at simple_query_string seems not working properly~~ simple_query_string - operator not working with default_operator OR Dec 24, 2014

clintongormley added >bug :Query DSL v1.5.0 labels Dec 24, 2014

clintongormley assigned dakrone Dec 24, 2014

clintongormley removed the v1.5.0 label Dec 24, 2014

clintongormley assigned jdconrad and unassigned dakrone Oct 14, 2015

jdconrad mentioned this issue Oct 19, 2015

Bug with simple_query_string, minimum_should_match, and multiple fields. #13884

Closed

clintongormley mentioned this issue Jun 1, 2016

Simple Query String: NOT operator causes all documents in index to be returned #18603

Closed

clintongormley unassigned jdconrad Dec 23, 2016

dakrone mentioned this issue Jan 6, 2017

Document simple_query_string negation with default_operator of OR #22480

Merged

dakrone added a commit to dakrone/elasticsearch that referenced this issue Jan 10, 2017

Document simple_query_string negation with default_operator of OR

66cf3d3

This can be confusing when unexpected. Resolves elastic#4707

dakrone closed this as completed in #22480 Jan 10, 2017

dakrone added a commit that referenced this issue Jan 10, 2017

Document simple_query_string negation with default_operator of OR

0ef6166

This can be confusing when unexpected. Resolves #4707

dakrone added a commit that referenced this issue Jan 10, 2017

Document simple_query_string negation with default_operator of OR

958c3d7

This can be confusing when unexpected. Resolves #4707

dakrone added a commit that referenced this issue Jan 10, 2017

Document simple_query_string negation with default_operator of OR

019e3dd

This can be confusing when unexpected. Resolves #4707

dakrone added a commit that referenced this issue Jan 10, 2017

Document simple_query_string negation with default_operator of OR

1df0643

This can be confusing when unexpected. Resolves #4707

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
