Problem Description
While running inference with the XLM-RoBERTa model 02shanky/finetuned-twitter-xlm-roberta-base-emotion, inference fails when the input contains certain mixed emoji and text, such as: 😺😺😺😺😺😺this is weird
The error message is:
[array_index_out_of_bounds_exception Root causes: array_index_out_of_bounds_exception: Index 33 out of bounds for length 32]: Index 33 out of bounds for length 32
Steps to Reproduce
1. Import the 02shanky/finetuned-twitter-xlm-roberta-base-emotion model from Hugging Face: eland_import_hub_model --url {es_url} -u {es_user} -p {es_password} --insecure --hub-model-id 02shanky/finetuned-twitter-xlm-roberta-base-emotion --task-type text_classification --start
2. Go to Machine Learning -> Model Management -> Trained Models and test the model with the text: "😺😺😺😺😺😺this is weird"
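The Kibana test UI calls the trained-model _infer endpoint under the hood, so the failure can also be reproduced from a script. A minimal sketch, assuming Elasticsearch is reachable at the given URL and using the double-underscore model id from the error log (the helper name is my own; authentication headers are omitted):

```python
import json
from urllib import request

MODEL_ID = "02shanky__finetuned-twitter-xlm-roberta-base-emotion"

def build_infer_request(es_url: str, text: str) -> request.Request:
    """Build a POST request for /_ml/trained_models/<model_id>/_infer."""
    url = f"{es_url}/_ml/trained_models/{MODEL_ID}/_infer"
    body = json.dumps({"docs": [{"text_field": text}]}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_infer_request("http://localhost:9200", "😺😺😺😺😺😺this is weird")
    # On an affected cluster, sending this request (request.urlopen(req))
    # returns the 500 response with the array_index_out_of_bounds_exception.
    print(req.full_url)
```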
Observed
Error log
[2024-01-31T10:38:25,435][WARN ][r.suppressed ] [node-0] path: /_ml/trained_models/02shanky__finetuned-twitter-xlm-roberta-base-emotion/_infer, params: {model_id=02shanky__finetuned-twitter-xlm-roberta-base-emotion}, status: 500
java.lang.ArrayIndexOutOfBoundsException: Index 33 out of bounds for length 32
at org.elasticsearch.xpack.ml.inference.nlp.tokenizers.UnigramTokenizer.tokenize(UnigramTokenizer.java:283) ~[?:?]
at org.elasticsearch.xpack.ml.inference.nlp.tokenizers.UnigramTokenizer.incrementToken(UnigramTokenizer.java:223) ~[?:?]
at org.elasticsearch.xpack.ml.inference.nlp.tokenizers.XLMRobertaTokenizer.innerTokenize(XLMRobertaTokenizer.java:173) ~[?:?]
at org.elasticsearch.xpack.ml.inference.nlp.tokenizers.NlpTokenizer.tokenize(NlpTokenizer.java:60) ~[?:?]
at org.elasticsearch.xpack.ml.inference.nlp.tokenizers.XLMRobertaTokenizer.lambda$requestBuilder$0(XLMRobertaTokenizer.java:132) ~[?:?]
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273) ~[?:?]
at java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180) ~[?:?]
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104) ~[?:?]
at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712) ~[?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
at org.elasticsearch.xpack.ml.inference.nlp.tokenizers.XLMRobertaTokenizer.lambda$requestBuilder$1(XLMRobertaTokenizer.java:133) ~[?:?]
at org.elasticsearch.xpack.ml.inference.deployment.InferencePyTorchAction.doRun(InferencePyTorchAction.java:122) ~[?:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at org.elasticsearch.xpack.ml.inference.pytorch.PriorityProcessWorkerExecutorService$OrderedRunnable.run(PriorityProcessWorkerExecutorService.java:54) ~[?:?]
at org.elasticsearch.xpack.ml.job.process.AbstractProcessWorkerExecutorService.start(AbstractProcessWorkerExecutorService.java:111) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917) ~[elasticsearch-8.13.0-SNAPSHOT.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
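The failing index arithmetic is in UnigramTokenizer.tokenize. This report does not pin down the root cause, but one plausible source of this class of off-by-N error is that an emoji like 😺 (U+1F63A) occupies one code point, two UTF-16 chars (the unit of Java's String.length()), and four UTF-8 bytes, so a length computed in one unit and an index computed in another diverge as soon as the input mixes emoji with ASCII. A sketch of the mismatch for the reported input:

```python
# The reported input: six emoji followed by ASCII text.
text = "😺" * 6 + "this is weird"

code_points = len(text)                           # Python counts code points: 6 + 13 = 19
utf16_units = len(text.encode("utf-16-le")) // 2  # Java String.length():      12 + 13 = 25
utf8_bytes = len(text.encode("utf-8"))            # byte-level offsets:     6 * 4 + 13 = 37

print(code_points, utf16_units, utf8_bytes)  # → 19 25 37
```

Which two of these units the tokenizer mixes up (if any) is not established here; the sketch only shows that the same string has three different "lengths", which is consistent with an index landing one past the end of a buffer sized in a different unit.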
Elasticsearch Version
8.12.0
OS Version
Linux x86