TransportAnalyzeAction causes StringIndexOutOfBoundsException on first attempt to analyze a numeric field #2953

Closed
recoil opened this Issue Apr 30, 2013 · 0 comments

@recoil commented Apr 30, 2013

I'm seeing the following in Elasticsearch 0.90.0 during the first attempt to initiate an AnalyzeRequest against a numeric field:

Caused by: java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:285)
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:272)
        at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:113)
        at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:45)
        ... 38 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.<init>(String.java:207)
        at org.elasticsearch.index.analysis.NumericTokenizer.reset(NumericTokenizer.java:59)
        at org.elasticsearch.index.analysis.NumericTokenizer.reset(NumericTokenizer.java:54)
        at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:202)
        at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:57)
        at org.elasticsearch.action.support.single.custom.TransportSingleCustomOperationAction$AsyncSingleAction$2.run(TransportSingleCustomOperationAction.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        ... 1 more

The cause of the problem appears to be that the reset() method of the NumericTokenizer is called twice the first time analyzer.tokenStream() is called for the field in a thread:

  • The first call happens as a result of Lucene calling createComponents() on the NumericAnalyzer. This eventually results in the construction of a NumericTokenizer with a char[] buffer, which calls reset() during construction. This first reset() call leaves the FastStringReader associated with the Tokenizer with a next value that is equal to the length of that buffer because it reads all the chars out of the reader.
  • The second call happens as a result of the explicit call to reset() in TransportAnalyzeAction immediately after the TokenStream has been retrieved from the analyzer. Unfortunately, on this second call the reader is already exhausted, so the if (next >= length) check in the read() method of the associated FastStringReader succeeds and read() returns -1. NumericTokenizer then tries to use -1 as the number of chars to use when constructing a String, which throws the StringIndexOutOfBoundsException above.
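
The two calls described above can be reproduced without Elasticsearch or Lucene on the classpath. The following is only a minimal sketch under stated assumptions: `OneShotTokenizer` is a hypothetical stand-in for NumericTokenizer, and `java.io.StringReader` is used in place of FastStringReader because it also returns -1 from read() once the input is exhausted. It is not the actual Elasticsearch code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class DoubleResetSketch {

    /** Hypothetical stand-in for NumericTokenizer (not the real class). */
    static final class OneShotTokenizer {
        private final Reader input;
        private final char[] buffer = new char[32];
        String value;

        OneShotTokenizer(Reader input) {
            this.input = input;
            reset(); // first reset(): triggered during construction
        }

        void reset() {
            try {
                // Returns the number of chars read, or -1 if the reader is exhausted.
                int len = input.read(buffer);
                // With len == -1 this constructor throws StringIndexOutOfBoundsException.
                value = new String(buffer, 0, len);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Returns true if an explicit second reset() fails as described in the report. */
    static boolean secondResetFails(String text) {
        OneShotTokenizer tokenizer = new OneShotTokenizer(new StringReader(text));
        try {
            tokenizer.reset(); // second reset(): the explicit call made by the caller
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondResetFails("42"));
    }
}
```

Running `main` with a non-empty input such as "42" prints `true`: the first reset() consumes the reader, and the second one passes -1 as the char count to the String constructor.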

@ghost ghost assigned s1monw Apr 30, 2013

@s1monw s1monw closed this in 773ea03 Apr 30, 2013

s1monw added a commit that referenced this issue May 6, 2013

Fail with IAE if a numeric field is used for the analysis endpoint.
Analysing a numeric field will return UTF-16 representations
of Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the Lucene index. Passing a numeric field
to the analysis action is most likely a bug.

Closes #2953 #2952

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Fail with IAE if a numeric field is used for the analysis endpoint.
Analysing a numeric field will return UTF-16 representations
of Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the Lucene index. Passing a numeric field
to the analysis action is most likely a bug.

Closes elastic#2953 elastic#2952