Stop-words not removed during simple-query query-time analysis #28855
Labels
>bug
:Search Relevance/Analysis
How text is split into tokens
Team:Search Relevance
Meta label for the Search Relevance team in Elasticsearch
Comments
I can reproduce this, and I don't think this is on purpose. cc @elastic/es-search-aggs
jimczi
added a commit
to jimczi/elasticsearch
that referenced
this issue
Mar 1, 2018
This change ensures that we ignore terms removed by the analysis rather than returning a match_no_docs query for the part that contains the stop word. For instance, a query like "the AND fox" should ignore "the" if it is considered a stop word instead of adding a match_no_docs query.

This change also fixes the analysis of prefix terms that start with a stop word (e.g. `the*`). Previously, if `analyze_wildcard` was true and `the` was considered a stop word, this part of the query was rewritten into a match_no_docs query. Since it is a prefix query, this change forces the prefix query on `the` even if the term is removed by the analysis.

Fixes elastic#28855
Fixes elastic#28856
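The difference between the two behaviours described in the commit message can be illustrated with a toy model. This is only a sketch in plain Python, not Elasticsearch or Lucene code: the stop-word list, `analyze`, and `matches_all` are hypothetical names invented for the illustration, and prefix handling is not modelled.

```python
from typing import List, Optional, Set

# Toy stop-word list for illustration only (not Lucene's English list).
STOP_WORDS: Set[str] = {"the", "of", "a", "an"}


def analyze(term: str) -> Optional[str]:
    """Lowercase a term; return None if the toy stop filter removes it."""
    token = term.lower()
    return None if token in STOP_WORDS else token


def matches_all(terms: List[str], doc: str, ignore_removed: bool) -> bool:
    """AND together one clause per query term against a single document.

    ignore_removed=False models the reported behaviour: a term removed by
    analysis becomes a clause that matches nothing, so the AND fails.
    ignore_removed=True models the fix: the removed term is simply skipped.
    Simplification: if every term is removed, this toy returns True.
    """
    doc_tokens = {t for t in (analyze(w) for w in doc.split()) if t is not None}
    for term in terms:
        token = analyze(term)
        if token is None:
            if ignore_removed:
                continue  # drop the clause entirely
            return False  # stand-in for a match_no_docs clause
        if token not in doc_tokens:
            return False
    return True


doc = "The quick brown fox"
before_fix = matches_all(["the", "fox"], doc, ignore_removed=False)
after_fix = matches_all(["the", "fox"], doc, ignore_removed=True)
```

With the stop word treated as a clause that matches nothing, "the AND fox" cannot match the document; once the removed term is skipped, only "fox" has to match.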
I am dealing with this problem and I intend to downgrade ES to a sound release.
ES 6.3.1. Fields have the same type ("text").
ES 6.2.1
I've noticed that the simple-query-string query type (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html) doesn't seem to handle stopwords at all at analysis time, best exemplified with "default_operator: AND".
Consider the below - We create an index and change the default analyzer to use English stopwords:
And then populate it:
Now, if we query this with the regular query_string, we get the expected two results:
But the same query using simple-query-string and the AND operator finds no results:
Remove the "of" from the query and it will work as expected.
Maybe this is intentional because the SQS is "simple", but it's not documented on the SQS page - the only explicitly stated difference is no exception raising. Seems like a bug though, hence the report.
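The original request snippets did not survive in this copy of the report. As a rough reconstruction of the steps described above, the following Python sketch builds request bodies of the kind the report describes. The field name `title`, the sample documents, and the query text are assumptions made up for illustration; they are not the reporter's data. Only the body shapes (a `default` analyzer with `_english_` stopwords, and `default_operator` on both query types) follow the Elasticsearch query DSL.

```python
import json

# Index settings: override the default analyzer to remove English stop words.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {"type": "standard", "stopwords": "_english_"}
            }
        }
    }
}

# Hypothetical sample documents (not the ones from the original report).
DOCS = [
    {"title": "Lord of the Rings"},
    {"title": "Lord of the Flies"},
]


def search_body(query_type: str, text: str) -> dict:
    """Build a search body for query_string or simple_query_string with AND."""
    if query_type not in ("query_string", "simple_query_string"):
        raise ValueError("unsupported query type: " + query_type)
    return {"query": {query_type: {"query": text, "default_operator": "AND"}}}


# The two searches being compared; the query text is a hypothetical example
# containing the stop word "of".
regular = search_body("query_string", "lord of")
simple = search_body("simple_query_string", "lord of")

print(json.dumps(INDEX_SETTINGS, indent=2))
print(json.dumps(regular))
print(json.dumps(simple))
```

Sending the two search bodies to the same index is one way to compare how each query type treats the stop word; the sketch only builds the JSON and does not contact a cluster.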