Analysis: protecting tokens based on their length #4877
Is there an expected date when this will be implemented?
This should be done in Lucene - nice easy one to adopt
+1
abeyad pushed a commit to abeyad/elasticsearch that referenced this issue Mar 15, 2017
This commit adds support for the pattern keyword marker filter in Lucene. Previously, the keyword marker filter in Elasticsearch supported specifying a keywords set or a path to a set of keywords. This commit exposes the regular expression pattern based keyword marker filter also available in Lucene, so that any token matching the pattern specified by the `keywords_pattern` setting is excluded from being stemmed by any stemming filters. Closes elastic#4877
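As a rough illustration of how the `keywords_pattern` setting described in this commit could cover the original request (protecting tokens of 1-4 characters), here is a sketch of index settings written as a Python dict. The filter and analyzer names (`protect_short_tokens`, `minimal_english_stemmer`, `short_safe_english`) are made up for the example. The `is_protected` helper only approximates the matching locally with Python's `re`, on the assumption that the pattern has to match the whole token; Java and Python regex dialects differ in some details, so treat it as a sanity check rather than a guarantee.

```python
import json
import re

# Assumption: the pattern must match the entire token, so ".{1,4}"
# covers tokens of one to four characters.
KEYWORDS_PATTERN = r".{1,4}"

# Hypothetical filter/analyzer names; only the "type", "keywords_pattern"
# and "language" keys are actual Elasticsearch settings.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "protect_short_tokens": {
                    "type": "keyword_marker",
                    "keywords_pattern": KEYWORDS_PATTERN,
                },
                "minimal_english_stemmer": {
                    "type": "stemmer",
                    "language": "minimal_english",
                },
            },
            "analyzer": {
                "short_safe_english": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # The keyword marker has to come before the stemmer.
                    "filter": [
                        "lowercase",
                        "protect_short_tokens",
                        "minimal_english_stemmer",
                    ],
                }
            },
        }
    }
}


def is_protected(token: str) -> bool:
    """Local approximation: would this token be marked as a keyword?"""
    return re.fullmatch(KEYWORDS_PATTERN, token) is not None


# JSON body that could be sent when creating an index.
request_body = json.dumps(index_settings, indent=2)
```

With this pattern, `is_protected("news")` is true and `is_protected("houses")` is false, so only the longer token would reach the stemmer unprotected.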
abeyad pushed a commit that referenced this issue Mar 28, 2017
This commit adds support for the pattern keyword marker filter in Lucene. Previously, the keyword marker filter in Elasticsearch supported specifying a keywords set or a path to a set of keywords. This commit exposes the regular expression pattern based keyword marker filter also available in Lucene, so that any token matching the pattern specified by the `keywords_pattern` setting is excluded from being stemmed by any stemming filters. Closes #4877
Excellent! And a much more widely useful solution than I had expected. Thanks!
I would like to be able to tell the keyword marker to protect tokens 1-4 characters in length, or tell the minimal english stemmer to ignore tokens shorter than 5 characters.
Perhaps the more generic thing to have would be a Minimum Length Keyword Marker that could go in front of the other filters.
Based on discussion at https://groups.google.com/forum/#!msg/elasticsearch/uFlKWq2HvQk/mM8KjaItPH0J
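To make the request concrete, here is a toy sketch of a minimum-length keyword marker sitting in front of a stemmer. Nothing here is Elasticsearch or Lucene code: `Token`, `mark_short_tokens`, `toy_plural_stemmer`, and `analyze` are hypothetical names, and the stemmer just strips a trailing "s" as a crude stand-in for the minimal English stemmer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    text: str
    keyword: bool = False  # True means "do not stem this token"


def mark_short_tokens(tokens: Iterable[Token], min_stem_length: int = 5) -> Iterator[Token]:
    """Flag every token shorter than min_stem_length as a keyword."""
    for token in tokens:
        if len(token.text) < min_stem_length:
            token.keyword = True
        yield token


def toy_plural_stemmer(tokens: Iterable[Token]) -> Iterator[Token]:
    """Stand-in stemmer: drops a trailing 's' unless the token is protected."""
    for token in tokens:
        if not token.keyword and token.text.endswith("s"):
            token.text = token.text[:-1]
        yield token


def analyze(words: Iterable[str]) -> List[str]:
    """Run the marker in front of the stemmer, as the issue proposes."""
    stream = (Token(word) for word in words)
    return [t.text for t in toy_plural_stemmer(mark_short_tokens(stream))]


print(analyze(["bus", "news", "houses", "cat"]))
```

In this sketch "bus" and "news" pass through untouched because they are under five characters, while "houses" is still stemmed.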