Highlighting sometimes returns too much text (orders of magnitude larger than fragment_size) #9442
Comments
I second this issue: I am also randomly getting up to 1100 characters per fragment (fragment_size set to 300). Does anyone have any ideas?
I have the same issue. I sometimes get 1000+ chars even though I set fragment_size to 200.
I second this too. This is a real problem because it occurs unpredictably, except that it seems to occur only with search terms consisting of two or more words.
Good point, @monsieur-d. @nik9000, can we get this marked as a bug like issue #12648 while you are working on highlighting issues? I would be happy to provide a curl sample!
I think I got it:
And this shows that it's actually just a bug in the plain highlighter: our old, cranky friend the fvh does the right thing here.
Very good. I'm looking forward to a fix :-)
Out of curiosity, are there any further updates on this issue?
We're also interested in resolving this issue.
I had the same issue and was not able to work around it.
So basically
You have to escape the The issue you linked looks like it was fixed for 2.4.0 and 5.0.0. Not that this isn't still a problem, but it isn't one I've looked at for a long, long time.
This appears to be fixed in 5.0.
This doesn't seem to be fixed (using 5.5.0). I had to specify
Same issue in 5.4.1. Fixed it by specifying the type explicitly:
Specifying
How did you fix your problem?
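The exact setting used in the comments above did not survive in this thread, so here is only a rough sketch of what "specifying the type explicitly" could look like when building a search body in Python. The helper name `build_highlight_body` and the field name in the usage note are hypothetical; `type`, `fragment_size` and `number_of_fragments` are the standard per-field highlight options. Which type value the commenters actually chose is not known.

```python
def build_highlight_body(field, query_text, fragment_size, number_of_fragments,
                         highlighter_type=None):
    """Build a search body with per-field highlight options (hypothetical helper)."""
    field_options = {
        "fragment_size": fragment_size,
        "number_of_fragments": number_of_fragments,
    }
    # Only set "type" when asked to; otherwise Elasticsearch picks its default.
    if highlighter_type is not None:
        field_options["type"] = highlighter_type
    return {
        "query": {"match": {field: query_text}},
        "highlight": {"fields": {field: field_options}},
    }
```

For example, `build_highlight_body("content", "two words", 200, 2, "fvh")` pins the fast vector highlighter for a field named `content` (an assumed field name). Whether that avoids the oversized fragments on your version is something to verify against your own index.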
"I had to set number_of_fragments: 0 to get all the text, then implement my own highlight-aware truncating in code." You need to write application code to do the truncation. It's a terrible workaround because you'll be getting far more text over the network than you need. The code will depend on the language you're using but the algorithm is effectively:
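The steps of that algorithm were lost from this thread, so the following is only one possible sketch of highlight-aware truncation, not the commenter's original code. It assumes the field came back whole (number_of_fragments: 0) with the default `<em>`/`</em>` tags, keeps a little context before the first highlight, and never cuts through a tag. The function names and the `lead` parameter are made up for illustration.

```python
import re


def split_segments(text, pre="<em>", post="</em>"):
    """Split highlighted text into (segment, is_highlighted) pairs."""
    pattern = re.compile(re.escape(pre) + r"(.*?)" + re.escape(post), re.DOTALL)
    segments, pos = [], 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], False))
        segments.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], False))
    return segments


def truncate_highlighted(text, max_chars=300, lead=40, pre="<em>", post="</em>"):
    """Return about max_chars visible characters starting near the first highlight."""
    segments = split_segments(text, pre, post)
    first = next((i for i, (_, hit) in enumerate(segments) if hit), None)
    if first is None:
        return text[:max_chars]  # nothing highlighted: plain prefix

    lead = min(lead, max_chars // 2)
    out, budget, emitted_hit = [], max_chars, False
    if first > 0 and lead > 0:
        context = segments[first - 1][0][-lead:]  # tail of the preceding plain text
        out.append(context)
        budget -= len(context)

    for segment, hit in segments[first:]:
        if budget <= 0:
            break
        if hit:
            # Highlighted terms are kept whole; the first one is always kept.
            if len(segment) > budget and emitted_hit:
                break
            out.append(pre + segment + post)
            emitted_hit = True
        else:
            segment = segment[:budget]  # plain text may be cut anywhere
            out.append(segment)
        budget -= len(segment)
    return "".join(out)
```

This cuts plain text at arbitrary character positions; snapping to word boundaries would be a natural refinement. It also does nothing about the network cost mentioned above, since the full field still has to be fetched first.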
Index mapping:
Document inserted:
https://gist.github.com/scharron/684a4fbab85135c203ee
Query:
I would expect to get about 200 characters in 2 fragments, but instead I get 12KB of data in only one fragment.
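To spot this kind of result programmatically, a small check over the search response can help. This is a sketch that assumes the usual response layout (`hits.hits[].highlight` mapping each field to a list of fragment strings) and the default `<em>` tags. The function name and the `tolerance` factor are hypothetical; the factor is there because highlighters do not necessarily cut at exactly fragment_size.

```python
def oversized_fragments(response, fragment_size, tolerance=2.0,
                        tags=("<em>", "</em>")):
    """Return (doc_id, field, visible_length) for fragments far above fragment_size."""
    found = []
    for hit in response.get("hits", {}).get("hits", []):
        for field, fragments in hit.get("highlight", {}).items():
            for fragment in fragments:
                visible = fragment
                for tag in tags:
                    visible = visible.replace(tag, "")  # count only visible text
                if len(visible) > fragment_size * tolerance:
                    found.append((hit.get("_id"), field, len(visible)))
    return found
```

Running this over the parsed JSON response of a query like the one above would list each fragment that is more than twice the requested size, which makes it easier to collect reproducible cases for a bug report.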
I also tried the postings and fast vector highlighters, with the same result. The behaviour is the same on both ES 1.4.2 and ES 1.3.4.
It may be related to other curious behaviours: http://stackoverflow.com/questions/28167990/curious-behaviour-of-fragment-size-in-elasticsearch-highlighting?noredirect=1