TransportAnalyzeAction causes IllegalArgumentException: NumericTokenStream does not support CharTermAttribute #2952

Closed
recoil opened this Issue Apr 30, 2013 · 8 comments

recoil commented Apr 30, 2013

I'm getting the following exception in Elasticsearch 0.90.0 when I issue an AnalyzeRequest against a numeric field:

Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: NumericTokenStream does not support CharTermAttribute.
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:285)
        at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:272)
        at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:113)
        at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:45)
        ... 38 more
Caused by: java.lang.IllegalArgumentException: NumericTokenStream does not support CharTermAttribute.
        at org.apache.lucene.analysis.NumericTokenStream$NumericAttributeFactory.createAttributeInstance(NumericTokenStream.java:136)
        at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
        at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:203)
        at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:57)
        at org.elasticsearch.action.support.single.custom.TransportSingleCustomOperationAction$AsyncSingleAction$2.run(TransportSingleCustomOperationAction.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        ... 1 more

I believe that this is probably caused by the following change introduced in Lucene 4:

NumericTokenStream now works directly on byte[] terms. If you plug a TokenFilter on top of this stream, you will likely get an IllegalArgumentException, because the NTS does not support TermAttribute/CharTermAttribute

(From http://lucene.apache.org/core/4_2_1/changes/Changes.html#4.0.0-alpha.changes_in_backwards_compatibility_policy)

Line 203 of TransportAnalyzeAction is attempting to add a CharTermAttribute to a TokenStream instance.

@ghost ghost assigned s1monw Apr 30, 2013

@s1monw

Contributor

s1monw commented Apr 30, 2013

Hey, I'm on the fence about calling this a bug. I agree this is not nice, and we should either throw a clear exception saying that running this on a numeric field is not supported, or return the same output as 0.20. Are you doing this intentionally, and if so, what is the use case?

s1monw added a commit that referenced this issue Apr 30, 2013

Fail with IAE if a numeric field is used for the analysis endpoint.
Analysing a numeric field will return UTF-16 representations
of Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the Lucene index. Passing a numeric field
to the analysis action is most likely a bug.

Closes #2953 #2952
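
A caller can apply the same idea on the client side, before sending an analyze request at all. The following is only a minimal sketch: the function `check_analyzable`, the `NUMERIC_BACKED_TYPES` set, and the flat mapping dictionary are hypothetical names invented for illustration, the list of types is an assumption, and the error text is not the message the server produces.

```python
# Hypothetical client-side guard; none of these names come from Elasticsearch.
# Assumption: these mapping types are indexed as numeric prefix terms.
NUMERIC_BACKED_TYPES = frozenset(
    {"byte", "short", "integer", "long", "float", "double", "date", "ip"}
)


def check_analyzable(mapping, field):
    """Raise ValueError if `field` is mapped to a numeric-backed type.

    `mapping` is a simplified, flat dict of field name -> properties,
    for example {"src_ip": {"type": "ip"}}.
    """
    field_type = mapping.get(field, {}).get("type")
    if field_type in NUMERIC_BACKED_TYPES:
        raise ValueError(
            "cannot analyze field [%s]: type [%s] is numeric" % (field, field_type)
        )


# Example with a made-up mapping
example_mapping = {"src_ip": {"type": "ip"}, "message": {"type": "string"}}
check_analyzable(example_mapping, "message")  # passes silently
try:
    check_analyzable(example_mapping, "src_ip")
except ValueError as err:
    print(err)
```

Failing early like this gives a readable error instead of the IllegalArgumentException raised from inside NumericTokenStream.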
@s1monw

Contributor

s1monw commented Apr 30, 2013

Closed via 773ea03

@s1monw s1monw closed this Apr 30, 2013

@s1monw

Contributor

s1monw commented Apr 30, 2013

Even though I closed this one, I am still interested in your use case. Please feel free to add it here, and thanks again for reporting this.

s1monw added a commit that referenced this issue May 6, 2013

Fail with IAE if a numeric field is used for the analysis endpoint.
Analysing a numeric field will return UTF-16 representations
of Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the Lucene index. Passing a numeric field
to the analysis action is most likely a bug.

Closes #2953 #2952
@Schuk

Schuk commented May 15, 2013

I ran into the same problem while testing the analyzer. I have a field called src_ip, which is of type IP:

curl -XGET 'http://localhost:9200/test/_analyze?field=src_ip&pretty' -d "192.168.0.1"
{
  "error" : "IllegalArgumentException[NumericTokenStream does not support CharTermAttribute.]",
  "status" : 500
}
@s1monw

Contributor

s1monw commented May 15, 2013

How would you want this field to be analyzed? I mean, what do you expect as a return value here? I think what you get is correct; you can't analyze an IP here.

@clintongormley

Member

clintongormley commented May 16, 2013

In these cases, couldn't we just return the actual term that is stored in
the index?


@Schuk

Schuk commented May 16, 2013

Yes, I was expecting the term that is stored in the index.
I confused "token" with "term" in the analyze results. Since an IP is stored as an integer, I had expected to see that integer.
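
For reference, the integer in question can be computed outside Elasticsearch. This is a small sketch using Python's standard `ipaddress` module; the helper name `ipv4_to_int` is made up for illustration, and it assumes an IPv4 address mapped to its plain 32-bit numeric value, which is not the same thing as the binary terms actually written to the index.

```python
import ipaddress


def ipv4_to_int(address):
    """Return the 32-bit unsigned integer value of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(address))


# 192*2**24 + 168*2**16 + 0*2**8 + 1
print(ipv4_to_int("192.168.0.1"))  # 3232235521
```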

@s1monw

Contributor

s1monw commented May 16, 2013

The terms that are indexed here are opaque binary terms; they don't make much sense on their own. We could return just the number, but that might be misleading. I'm not sure we should do that.
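
To illustrate why a single number turns into several terms, here is a simplified sketch of the prefix-term idea: one term per precision level, each dropping more low-order bits. The function `prefix_terms` and its default `precision_step` of 16 are assumptions chosen for the example. It handles only non-negative values and returns plain `(shift, value)` pairs; it does not reproduce the byte encoding Lucene writes to the index.

```python
def prefix_terms(value, bits=64, precision_step=16):
    """Return one (shift, truncated_value) pair per precision level.

    Simplified model: the real terms are encoded as bytes and also
    handle negative numbers, which this sketch does not attempt.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value must fit in %d unsigned bits" % bits)
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


# The numeric value of 192.168.0.1 yields four terms at this step size
for shift, truncated in prefix_terms(3232235521):
    print(shift, truncated)
```

Only the shift-0 entry carries the full value; the coarser entries exist to speed up range queries, which is one reason showing them as analyze output could mislead.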

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Fail with IAE if a numeric field is used for the analysis endpoint.
Analysing a numeric field will return UTF-16 representations
of Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the Lucene index. Passing a numeric field
to the analysis action is most likely a bug.

Closes elastic#2953 elastic#2952