Highlighters sometimes highlight additional characters #11726
Comments
The Lucene docs for the PatternReplaceCharFilter (https://lucene.apache.org/core/5_2_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilter.html) say:
(I know this isn't the one you're using, but the same advice probably applies to the mapping char filter) That said, the output from the
returns:
while this:
returns:
I would expect the |
This definitely looks like a bug, and I've created a simple pure Lucene test case showing it, but I'm not yet sure how to fix it; it could be the API for correcting offsets from CharFilter is too simplistic ... I'll open a Lucene issue for discussion. |
Thank you!
|
OK I opened https://issues.apache.org/jira/browse/LUCENE-6595 but I'm not sure how to fix it! |
I'm encountering a similar issue: I'm using a prefix-query on the field I'm highlighting.
So the highlight is shifted to the left. There seems to be some problem with finding the correct starting position.
|
@svola are you also using a char_filter? If not, then yours is likely a different issue, perhaps with the German decomposition not setting the right offsets for the tokens it creates. |
Is there any progress on the problem? |
It didn't change in ES 5. Same problem for me. |
The linked Lucene issue is still open, which is where this needs to be fixed. cc @elastic/es-search-aggs |
Was there ever a fix found for this? It still seems to be broken 7 years later. |
Verified on 1.5.2 and 1.6.0
When using a char filter to remove some characters (in our case we want to ignore `(` and `)`), the highlighters will highlight trailing removed characters.
Here is a sample Sense session to reproduce the issue.
the highlights for the last query look like
and
I would expect the closing paren not to be highlighted ...
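
For readers trying to see why the closing paren ends up inside the highlight, here is a minimal Python sketch of the offset-correction idea discussed in this thread. It is a simplified model, not Lucene's code: `strip_chars` and `correct_offset` are hypothetical helpers, and the sketch assumes corrections are stored as (offset in filtered text, cumulative characters removed) pairs, with a lookup that picks the last pair at or before the offset being corrected.

```python
from bisect import bisect_right

def strip_chars(text, removed):
    """Drop every character in `removed` and record offset corrections.

    Each correction is (offset in filtered text, cumulative chars removed).
    Hypothetical helper modelling a char filter; not a Lucene API.
    """
    out, corrections, diff = [], [], 0
    for ch in text:
        if ch in removed:
            diff += 1
            off = len(out)
            if corrections and corrections[-1][0] == off:
                corrections[-1] = (off, diff)  # several removals at one spot
            else:
                corrections.append((off, diff))
        else:
            out.append(ch)
    return "".join(out), corrections

def correct_offset(corrections, off):
    """Map an offset in the filtered text back to the original text."""
    keys = [o for o, _ in corrections]
    i = bisect_right(keys, off) - 1  # last correction at or before `off`
    return off if i < 0 else off + corrections[i][1]

original = "(foo) bar"
filtered, corrections = strip_chars(original, "()")

# The token "foo" spans [0, 3) in the filtered text.
start = correct_offset(corrections, 0)
end = correct_offset(corrections, 3)
highlighted = original[start:end]
print(repr(filtered), corrections, repr(highlighted))
```

In this model, filtered offset 3 is both the end of `foo` and the position where the `)` was removed, so a single correction per offset cannot return 4 for a token end and 5 for the start of whatever follows. The start offset is corrected past the opening paren, while the end offset is pushed past the closing one, which matches the symptom reported here.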