Stop-words not removed during simple-query query-time analysis #28855

Closed
mohmad-null opened this issue Feb 28, 2018 · 3 comments

@mohmad-null commented Feb 28, 2018

ES 6.2.1
I've noticed that the simple-query-string query type (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html) doesn't seem to handle stop words at all during query-time analysis. This is most visible with "default_operator": "and".

Consider the following. We create an index and change the default analyzer to remove English stop words:

PUT /simp_idx
{
  "mappings": {
    "my_type": {
      "properties": {
        "field_1": {
          "type": "text"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [
            "english_stop"
          ]
        }
      }
    }
  }
}

And then populate it:

PUT /simp_idx/my_type/1
{
  "field_1": "place of beauty"
}
PUT /simp_idx/my_type/2
{
  "field_1": "place and beauty"
}

Now, if we query this with the regular query_string, we get the expected two results:

GET /simp_idx/my_type/_search
{
  "query": {
    "query_string" : {
        "query": "place of",
        "default_operator": "and"
    }
  }
}

But the same query using simple-query-string and the AND operator finds no results:

GET /simp_idx/my_type/_search
{
  "query": {
    "simple_query_string" : {
        "query": "place of",
        "fields": ["field_1"],
        "default_operator": "and"
    }
  }
}

Remove the "of" from the query and it will work as expected.

Maybe this is intentional because simple_query_string is meant to be "simple", but it's not documented on the simple_query_string page; the only explicitly stated difference from query_string is that it never raises an exception. It seems like a bug though, hence the report.

@javanna (Member) commented Mar 1, 2018

I can reproduce this, and I don't think this is on purpose. cc @elastic/es-search-aggs

@jimczi jimczi self-assigned this Mar 1, 2018

jimczi added a commit to jimczi/elasticsearch that referenced this issue Mar 1, 2018

Fix query_string and simple_query_string to ignore removed terms
This change ensures that terms removed by the analysis are ignored, rather than returning a match_no_docs query for the part
that contains the stop word. For instance, a query like "the AND fox" should ignore "the" if it is a stop word instead of
adding a match_no_docs query.
This change also fixes the analysis of prefix terms that start with a stop word (e.g. `the*`). Previously, if `analyze_wildcard` was true and `the`
was a stop word, this part of the query was rewritten into a match_no_docs query. Since it is a prefix query, this change forces the prefix query
on `the` even if it is removed by the analysis.

Fixes elastic#28855
Fixes elastic#28856
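
The behavior change described in this commit message can be illustrated with a toy Python model. This is not Elasticsearch code: it assumes a tiny hypothetical stop-word set and plain whitespace tokenization, and the names `analyze` and `matches_and` are made up for the sketch. With `ignore_removed=False`, a term removed by analysis acts like a match_no_docs clause, so an AND query containing it matches nothing. With `ignore_removed=True`, the removed term is simply skipped.

```python
# Toy model only; Elasticsearch's real query builder is far more involved.
# STOPWORDS is a small hypothetical subset, not the full _english_ list.
STOPWORDS = {"a", "an", "and", "of", "the"}


def analyze(term):
    """Return the tokens left after lowercasing and stop-word removal."""
    token = term.lower()
    return [] if token in STOPWORDS else [token]


def matches_and(query, doc, ignore_removed):
    """Check whether every query term matches the document (AND semantics).

    ignore_removed=False models the reported bug: a term that analysis
    removes turns into a clause that matches no documents.
    ignore_removed=True models the fix: such a term is skipped.
    """
    doc_tokens = {tok for word in doc.split() for tok in analyze(word)}
    for term in query.split():
        tokens = analyze(term)
        if not tokens:
            if ignore_removed:
                continue  # removed term is ignored
            return False  # removed term behaves like match_no_docs
        if not all(tok in doc_tokens for tok in tokens):
            return False
    return True


docs = ["place of beauty", "place and beauty"]
print([matches_and("place of", d, ignore_removed=False) for d in docs])  # [False, False]
print([matches_and("place of", d, ignore_removed=True) for d in docs])   # [True, True]
```

In this model the query "place of" finds neither document before the fix and both documents after it, which mirrors the reproduction steps in the original report.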

@jimczi jimczi closed this in #28871 Mar 4, 2018

jimczi added a commit that referenced this issue Mar 4, 2018

Fix (simple)_query_string to ignore removed terms (#28871)

Fixes #28855
Fixes #28856

sebasjm pushed a commit to sebasjm/elasticsearch that referenced this issue Mar 10, 2018

Fix (simple)_query_string to ignore removed terms (elastic#28871)

Fixes elastic#28855
Fixes elastic#28856

@jesseamancio commented May 7, 2018

I am running into this problem and intend to downgrade ES to an unaffected release.
Does anyone know which previous version of ES is free of this bug?

@pluk commented Aug 21, 2018

ES 6.3.1
This bug still occurs when the query targets more than one field.
Stop words are not removed in simple_query_string, so the query finds no results.

The fields all have the same type ("text").
