StringIndexOutOfBoundsException[String index out of range: -8] while Highlighting #2931

Closed
lmenezes opened this Issue Apr 24, 2013 · 10 comments

lmenezes commented Apr 24, 2013

This issue happens on 0.20.4 and 0.90RC2, and probably on every other version(?), since I guess it's related to:

https://issues.apache.org/jira/browse/LUCENE-4899

Here is a test that reproduces the error. The first two queries should execute fine, but the third should fail.

curl -XPOST 'http://127.0.0.1:9200/test?' -d '{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "name_index_analyzer",
          "search_analyzer": "name_search_analyzer",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "my_ngram": { "max_gram": 20, "min_gram": 1, "type": "ngram" }
      },
      "analyzer": {
        "name_index_analyzer": { "tokenizer": "whitespace", "filter": [ "my_ngram" ] },
        "name_search_analyzer": { "tokenizer": "whitespace" }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/test/test/1' -d '{"name": "logicacmg ehemals avinci - the know how company"}'

curl -XGET 'http://localhost:9200/test/test/_search' -d '{ "query": { "match": { "name": { "query": "logica" } } }, "highlight": { "fields": { "name": {} } }}'

curl -XGET 'http://localhost:9200/test/test/_search' -d '{ "query": { "match": { "name": { "query": "logica ma" } } }, "highlight": { "fields": { "name": {} } }}'

curl -XGET 'http://localhost:9200/test/test/_search' -d '{ "query": { "match": { "name": { "query": "logica m" } } }, "highlight": { "fields": { "name": {} } }}'

Maybe a workaround is possible, or a Lucene upgrade to 4.3 (since it seems to be fixed there)?

s1monw commented Apr 24, 2013

@lmenezes yes, this is very likely caused by LUCENE-4899. The 4.3 release is already rolling and should be done by the end of the week. I will take your example, put it into the 4.3 upgrade branch, and see whether it still fails, to make sure this is actually fixed.

@ghost ghost assigned s1monw Apr 24, 2013

lmenezes commented Apr 24, 2013

@s1monw Cool. Regarding ES updating to 4.3, is it realistic to expect that for 0.90? Or even just a patch for this particular issue?

s1monw commented Apr 24, 2013

Given that 0.90 is pretty close, I can't promise anything, but we are considering it. I don't think I can really patch this issue without copying a lot of code. The only workaround until then is to not use the term vector highlighter.
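
A rough sketch of that workaround, for illustration only: the index definition from the original report can be built with or without the `term_vector` setting on the `name` field. The helper name `build_index_body` is hypothetical, and the sketch assumes that the term vector highlighter is only picked when the field stores term vectors with positions and offsets.

```python
import json


def build_index_body(with_term_vectors):
    """Build the index-creation body from the report.

    Hypothetical helper: with_term_vectors=False drops the term_vector
    setting, which is assumed to keep the term vector highlighter out of play.
    """
    name_field = {
        "type": "string",
        "index_analyzer": "name_index_analyzer",
        "search_analyzer": "name_search_analyzer",
    }
    if with_term_vectors:
        name_field["term_vector"] = "with_positions_offsets"

    return {
        "mappings": {"test": {"properties": {"name": name_field}}},
        "settings": {
            "analysis": {
                "filter": {
                    "my_ngram": {"type": "ngram", "min_gram": 1, "max_gram": 20}
                },
                "analyzer": {
                    "name_index_analyzer": {
                        "tokenizer": "whitespace",
                        "filter": ["my_ngram"],
                    },
                    "name_search_analyzer": {"tokenizer": "whitespace"},
                },
            }
        },
    }
```

Passing `json.dumps(build_index_body(False))` as the `-d` payload of the index-creation request would create the index without term vectors; the analyzers stay the same in both variants.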

lmenezes commented Apr 24, 2013

Hum... I tried using the regular highlighting, but that yields some pretty weird stuff:

"logicacmlogicag ehemals avinci - the know how company"

Test:

curl -XPOST 'http://127.0.0.1:9200/test?' -d '{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "name_index_analyzer",
          "search_analyzer": "name_search_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "my_ngram": { "max_gram": 20, "min_gram": 1, "type": "ngram" }
      },
      "analyzer": {
        "name_index_analyzer": { "tokenizer": "whitespace", "filter": [ "my_ngram" ] },
        "name_search_analyzer": { "tokenizer": "whitespace" }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/test/test/1' -d '{"name": "logicacmg ehemals avinci - the know how company"}'

curl -XGET 'http://localhost:9200/test/test/_search' -d '{ "query": { "match": { "name": { "query": "logica m" } } }, "highlight": { "fields": { "name": {} } }}'

That doesn't really work for our case.

lmenezes commented Apr 24, 2013

@s1monw the highlighted stuff wasn't formatted as it should be, but I guess you get the idea.

s1monw commented Apr 24, 2013

@lmenezes I added the test and it passes. It would be good if you could check whether the result is what you expect.

lmenezes commented Apr 24, 2013

@s1monw I think you only added tests for the two cases that already work on Lucene before 4.3. You are missing the third query, the one that fails. I tried executing the third query on your branch and got the same results (the weird highlighting and also the OutOfBounds exception).

  • As a side note, this branch didn't work on my OS X machine, only on Ubuntu. The shards were constantly "initializing", so I wasn't able to run it on OS X. Any idea?

s1monw commented Apr 24, 2013

Hey @lmenezes, the reason this fails is that the ngram filter you are using is basically broken and produces wrong positions. I am working on a fix and will keep you updated.

Regarding Mac OS X, I am running on OS X just fine... did you try mvn clean first?

lmenezes commented Apr 24, 2013

Hey @s1monw, cool!

About Mac OS X, I'll give it a go later today and let you know. It was a fresh clone from GitHub, so a clean shouldn't be necessary, I believe. Anyway, I'll keep you posted just in case.

s1monw commented Apr 24, 2013

I opened LUCENE-4955 for this, since it is really caused by a bug in the NGramFilter. The fix won't make it into Lucene 4.3, but we can temporarily port it once it's committed in Lucene.
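
To make the "wrong positions" remark more concrete, here is a small Python sketch. It is not Lucene's code; the function names are hypothetical, and the idea that the filter bumps the position once per gram is an assumption used for illustration. It n-grams each whitespace token of the test document (min_gram 1, max_gram 20, as in the report) and compares two ways of assigning positions.

```python
def ngrams(token, start, min_gram=1, max_gram=20):
    """Yield (gram, start_offset, end_offset) for every n-gram of token."""
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for i in range(len(token) - size + 1):
            yield token[i:i + size], start + i, start + i + size


def analyze(text, increment_per_gram):
    """Whitespace-tokenize text, n-gram each token, and assign positions.

    increment_per_gram=True: every gram gets its own position (assumed
    buggy behaviour). False: all grams of a token share that token's position.
    """
    out = []
    position = -1
    offset = 0
    for token in text.split(" "):
        first = True
        for gram, start, end in ngrams(token, offset):
            if increment_per_gram or first:
                position += 1
            first = False
            out.append((gram, start, end, position))
        offset += len(token) + 1  # skip the single separating space
    return out


text = "logicacmg ehemals avinci - the know how company"
per_gram = analyze(text, increment_per_gram=True)
per_token = analyze(text, increment_per_gram=False)
```

Under these assumptions both variants produce the same grams and offsets, but the per-gram variant ends at a position far beyond the number of words in the document, which is the kind of mismatch a position-based highlighter could trip over.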

@s1monw s1monw closed this in bd7ff69 Apr 27, 2013

martijnvg pushed a commit that referenced this issue Apr 25, 2018

Maintain order of operations semantics on follower
A following engine even for a primary shard needs to maintain order of
operations semantics as if it were behaving like a replica. That is,
rather than assuming that the order of operations presented to the
engine is the de facto order of operations as is the case for a leader
engine for a primary shard, a following engine must behave like all
replicas behave which is that they resolve order of operations based on
sequence numbers. This commit causes this to be the case for following
engines.

Relates #2931

jasontedor added a commit to jasontedor/elasticsearch that referenced this issue May 11, 2018

Maintain order of operations semantics on follower
A following engine even for a primary shard needs to maintain order of
operations semantics as if it were behaving like a replica. That is,
rather than assuming that the order of operations presented to the
engine is the de facto order of operations as is the case for a leader
engine for a primary shard, a following engine must behave like all
replicas behave which is that they resolve order of operations based on
sequence numbers. This commit causes this to be the case for following
engines.

Relates elastic#2931