[BUG] "# => ALPHANUM" in word_delimiter filter's type_table can cause exception. #6867
Comments
In light of recent developments, particularly this commit, OpenSearch seems to be leaning towards supporting comments in user-supplied analyzer files (ref). However, the hashtag (#) remains an uncovered case from the previous commit. I'd like to reopen this discussion and present a couple of approaches and their implications:
Discontinue support for comments in WordDelimiterFilter and WordDelimiterGraphFilter:
Pros:
Cons:
Special Case Handling for "# =>" in WordDelimiterFilter and WordDelimiterGraphFilter:
Pros:
Cons:
Note: The following is a preliminary solution I tested on my local setup, and it works seamlessly:
@nathanmyles, if possible, I'd appreciate your insights on this issue, especially in light of your earlier comments here.
I personally like the suggestion to add special-case handling for
Yeah, I can get behind having a special case for the hashtag mappings. I’m not familiar with the use case where we’d want to support comments in these mappings. My understanding is that it’s possible to load these configurations via a file, so we likely do want to maintain that support if true. The special case is probably best to maintain backwards compatibility. The complexity tradeoff doesn’t seem too bad to me.
+1 to that, I think it should be possible to support comments and hashtags nicely
+1 this helps move us forward with hashtags as a thing in modern society while still providing backwards compatibility
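The special-case idea discussed in this thread can be illustrated with a small sketch. This is a hypothetical Python illustration only, not the OpenSearch implementation (which is Java); the function name `parse_type_table` and the regular expression are assumptions made for the example. A line starting with `#` is treated as a comment unless it has the exact shape of a rule mapping `#` itself.

```python
import re

# Hypothetical pattern for a rule of the form "<chars> => <TYPE>".
RULE = re.compile(r"^(.*?)\s*=>\s*(\S+)\s*$")


def parse_type_table(lines):
    """Return {chars: TYPE}, skipping blank lines and '#' comments.

    A line starting with '#' is a comment unless it is a rule whose
    left-hand side is exactly '#', e.g. "# => ALPHANUM".
    Unescaping of sequences such as \\u0023 is deliberately omitted.
    """
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = RULE.match(line)
        if line.startswith("#") and not (match and match.group(1) == "#"):
            continue  # ordinary comment line
        if match is None:
            raise ValueError(f"invalid type_table rule: {raw!r}")
        rules[match.group(1)] = match.group(2)
    return rules
```

With this approach a comment such as `# map the dollar sign` is still ignored, while `# => ALPHANUM` is kept as a rule, which is the backwards-compatible behavior the comments above favor.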
Describe the bug
If `# => ALPHANUM` is the only entry in a `word_delimiter` filter's `type_table`, analysis using that filter causes an exception. If the `#` is replaced with `\u0023` then it works as expected. The same behavior is seen with the `word_delimiter_graph` filter.
To Reproduce
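A minimal sketch of what a reproduction request might look like, assuming the `_analyze` API with an inline filter definition. The `whitespace` tokenizer, the sample text, and the helper name `analyze_body` are hypothetical choices for illustration and are not taken from this report. The sketch also shows why the escaped form needs a doubled backslash in JSON: a single `\u0023` in a JSON string is decoded to `#` before the filter ever sees it.

```python
import json


def analyze_body(rule, text="text1 #text2"):
    """Build a hypothetical _analyze request body with an inline filter."""
    return {
        "tokenizer": "whitespace",
        "filter": [{"type": "word_delimiter", "type_table": [rule]}],
        "text": text,
    }


# The rule reported to fail, and the escaped form reported to work.
failing = analyze_body("# => ALPHANUM")
working = analyze_body("\\u0023 => ALPHANUM")  # backslash + "u0023", six characters

# json.dumps escapes the backslash, so the payload carries \\u0023 and the
# server receives the literal characters \u0023 rather than '#'.
print(json.dumps(working["filter"][0]["type_table"]))

# A single-backslash \u0023 in raw JSON is decoded to '#' by the JSON parser,
# which makes it equivalent to the failing rule.
assert json.loads(r'"\u0023 => ALPHANUM"') == "# => ALPHANUM"
```

Either body could be sent as the JSON payload of a `POST /_analyze` request to compare the two rules.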
OS 2.6.0 (and 2.5.0 where I originally noticed) produces:
Server log:
Notes:
- The exception occurs only when `# => ALPHANUM` is the only entry in the `type_table`. Adding a second entry causes the analysis to succeed, but it behaves as if the `# => ALPHANUM` is not present. For example:
  Produces (incorrect):
  Note the `text2` token should be `#text2`. The ordering of the two `type_table` entries does not matter.
- Changing `# => ALPHANUM` to `\\u0023 => ALPHANUM` (0x23 is of course the character code for `#`) causes the analysis to work as expected. This:
  Produces (correct):
Expected behavior
`# => ALPHANUM` in a `word_delimiter` or `word_delimiter_graph` filter's `type_table` should treat `#` as an alphanumeric character.
Plugins
None.
Screenshots
N/A
Host/Environment (please complete the following information):
Official OS 2.6.0 Docker image run under Docker Desktop 4.17.0 (99724) on Mac OS 13.2.1 (Intel).
Additional context
None