String#scan raises java.lang.ArrayIndexOutOfBoundsException with multi-byte characters #5513

Closed
pocke opened this Issue Dec 13, 2018 · 10 comments

pocke commented Dec 13, 2018

Problem

ruby -e '"aaaaaaaaaa".scan("あああ")' raises an error in JRuby.

Environment

JRuby version

I can reproduce this error with JRuby 9.0.1.0, 9.2.5.0 and the HEAD of the master branch (d03c357).

$ ruby -v
jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 OpenJDK 64-Bit Server VM 25.192-b26 on 1.8.0_192-b26 +jit [linux-amd64]

$ ruby -v
jruby 9.2.5.0 (2.5.0) 2018-12-06 6d5a228 OpenJDK 64-Bit Server VM 25.192-b26 on 1.8.0_192-b26 +jit [linux-x86_64]

$ bin/jruby -v
jruby 9.2.6.0-SNAPSHOT (2.5.3) 2018-12-13 d03c357 OpenJDK 64-Bit Server VM 25.192-b26 on 1.8.0_192-b26 +jit [linux-x86_64]

I cannot reproduce it with JRuby 1.7.27.
I also tried JRuby 9.0.0.0, but its installation failed, so I'm not sure whether 9.0.0.0 has this error.

Operating system

Arch Linux

$ uname -a
Linux jigglypuff 4.19.8-arch1-1-ARCH #1 SMP PREEMPT Sat Dec 8 13:49:11 UTC 2018 x86_64 GNU/Linux

Expected Behavior

It should not raise any errors.

Actual Behavior

It raises java.lang.ArrayIndexOutOfBoundsException.

$ ruby -e 'p "aaaaaaaaaa".scan("あああ")'
Unhandled Java exception: java.lang.ArrayIndexOutOfBoundsException: -1342547898
java.lang.ArrayIndexOutOfBoundsException: -1342547898
  rb_memsearch_qs_utf8 at org/jruby/util/StringSupport.java:2503
             memsearch at org/jruby/util/StringSupport.java:2048
           strseqIndex at org/jruby/RubyString.java:3275
         patternSearch at org/jruby/RubyString.java:4402
              scanOnce at org/jruby/RubyString.java:4362
                  scan at org/jruby/RubyString.java:4330
                  call at org/jruby/RubyString$INVOKER$i$1$0$scan.gen:-1
                  call at org/jruby/internal/runtime/methods/JavaMethod.java:399
          cacheAndCall at org/jruby/runtime/callsite/CachingCallSite.java:346
                  call at org/jruby/runtime/callsite/CachingCallSite.java:172
     invokeOther2:scan at -e:1
                <main> at -e:1
   invokeWithArguments at java/lang/invoke/MethodHandle.java:627
                  load at org/jruby/ir/Compiler.java:94
             runScript at org/jruby/Ruby.java:849
           runNormally at org/jruby/Ruby.java:772
           runNormally at org/jruby/Ruby.java:790
           runFromMain at org/jruby/Ruby.java:602
         doRunFromMain at org/jruby/Main.java:415
           internalRun at org/jruby/Main.java:307
                   run at org/jruby/Main.java:234
                  main at org/jruby/Main.java:206

Note

  • If the receiver is shorter than the example, it works.
    • e.g. ruby -e '"aaaa".scan("あああ")' does not raise an error.
  • If the argument string is not 3 characters long, no error is raised.
    • e.g. ruby -e '"aaaaaaaaaa".scan("ああ")' and ruby -e '"aaaaaaaaaa".scan("ああああ")' do not raise an error.
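
The notes above are consistent with how MRI's rb_memsearch picks a search routine, assuming JRuby's memsearch (visible in the stack trace) follows the same dispatch. The sketch below models only that choice, under stated assumptions: the enum names are hypothetical, the 8-byte threshold stands in for MRI's SIZEOF_VALUE on a 64-bit build, both strings are taken to be UTF-8, and anything strseqIndex does before calling memsearch is ignored. Under these assumptions only a pattern longer than 8 bytes but shorter than the receiver reaches rb_memsearch_qs_utf8: "あああ" is 9 bytes, "ああ" is 6, and "ああああ" (12 bytes) is longer than the 10-byte receiver.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of the routine selection in MRI's rb_memsearch, assuming
// JRuby's StringSupport.memsearch mirrors it and both strings are UTF-8.
public class MemsearchDispatchSketch {
    enum Path { NOT_FOUND_EARLY, WHOLE_COMPARE, EMPTY_PATTERN, SINGLE_BYTE, SHORT_PATTERN, QS_UTF8 }

    // Assumed stand-in for SIZEOF_VALUE on a 64-bit build.
    static final int SHORT_PATTERN_MAX = 8;

    static Path choose(String receiver, String pattern) {
        int n = receiver.getBytes(StandardCharsets.UTF_8).length; // receiver size in bytes
        int m = pattern.getBytes(StandardCharsets.UTF_8).length;  // pattern size in bytes
        if (m > n) return Path.NOT_FOUND_EARLY;   // cannot match, nothing is hashed
        if (m == n) return Path.WHOLE_COMPARE;    // one byte-for-byte comparison
        if (m < 1) return Path.EMPTY_PATTERN;     // empty pattern matches at 0
        if (m == 1) return Path.SINGLE_BYTE;      // memchr-style scan
        if (m <= SHORT_PATTERN_MAX) return Path.SHORT_PATTERN;
        return Path.QS_UTF8;                      // rb_memsearch_qs_utf8, as in the trace
    }

    public static void main(String[] args) {
        String a3 = "\u3042\u3042\u3042";                                     // "あああ", 9 bytes
        System.out.println(choose("aaaaaaaaaa", a3));                         // QS_UTF8
        System.out.println(choose("aaaa", a3));                               // NOT_FOUND_EARLY
        System.out.println(choose("aaaaaaaaaa", "\u3042\u3042"));             // SHORT_PATTERN
        System.out.println(choose("aaaaaaaaaa", "\u3042\u3042\u3042\u3042")); // NOT_FOUND_EARLY
    }
}
```

Under the same assumptions the natto call also takes the quick-search path, since "ヒーロー見参" is 18 bytes in UTF-8.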

An example in the real world

I encountered this error in the natto gem's tests. https://github.com/buruzaemon/natto
Some of its test cases fail on JRuby 9.2.5.0.

# In natto gem directory
$ bundle exec rake
/home/pocke/.rbenv/versions/jruby-9.2.5.0/bin/jruby  test/test_natto.rb 
Run options: --seed 45048

# Running:

[INFO] setup: could not delete test.dic, you might want to remove manually.
reading /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/test_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
.[INFO] setup: could not delete test.dic, you might want to remove manually.
reading /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/test_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
.[INFO] setup: could not delete test.dic, you might want to remove manually.
reading /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/test_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
.[INFO] setup: could not delete test.dic, you might want to remove manually.
reading /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/test_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
.......:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
........E...E.......:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
..............

Finished in 1.511805s, 29.1043 runs/s, 398.8610 assertions/s.

  1) Error:
TestMeCab#test_parse_tostr_feature_constraints:
Natto::MeCabError: 
    /home/pocke/ghq/github.com/buruzaemon/natto/lib/natto/natto.rb:339:in `block in initialize'
    /home/pocke/ghq/github.com/buruzaemon/natto/lib/natto/natto.rb:479:in `parse'
    /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/tc_mecab.rb:476:in `test_parse_tostr_feature_constraints'

  2) Error:
TestMeCab#test_parse_tonodes_feature_constraints:
Natto::MeCabError: 
    /home/pocke/ghq/github.com/buruzaemon/natto/lib/natto/natto.rb:420:in `block in initialize'
    org/jruby/RubyGenerator.java:102:in `each'
    org/jruby/RubyEnumerator.java:326:in `each'
    org/jruby/RubyEnumerator.java:332:in `each'
    /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/tc_mecab.rb:645:in `test_parse_tonodes_feature_constraints'

44 runs, 603 assertions, 0 failures, 2 errors, 0 skips
rake aborted!
Command failed with status (1): [/home/pocke/.rbenv/versions/jruby-9.2.5.0/...]
/home/pocke/ghq/github.com/buruzaemon/natto/Rakefile:11:in `block in <main>'
/home/pocke/.rbenv/versions/jruby-9.2.5.0/bin/bundle:23:in `<main>'
Tasks: TOP => default => test
(See full trace by running task with --trace)

The test effectively runs ruby -e '"心の中で3回唱え、 ヒーロー見参!ヒーロー見参!ヒーロー見参!".scan("ヒーロー見参")', which fails.

headius commented Dec 13, 2018

Thank you for the report!

headius added this to the JRuby 9.2.6.0 milestone Dec 13, 2018

headius commented Dec 13, 2018

Confirmed on master.

[] ~/projects/jruby $ ruby -e '"aaaaaaaaaa".scan("あああ")'
Unhandled Java exception: java.lang.ArrayIndexOutOfBoundsException: -1342547898
java.lang.ArrayIndexOutOfBoundsException: -1342547898
  rb_memsearch_qs_utf8 at org/jruby/util/StringSupport.java:2505
             memsearch at org/jruby/util/StringSupport.java:2050
           strseqIndex at org/jruby/RubyString.java:3275
         patternSearch at org/jruby/RubyString.java:4402
              scanOnce at org/jruby/RubyString.java:4362
                  scan at org/jruby/RubyString.java:4330
                  call at org/jruby/RubyString$INVOKER$i$1$0$scan.gen:-1
                  call at org/jruby/internal/runtime/methods/JavaMethod.java:399
          cacheAndCall at org/jruby/runtime/callsite/CachingCallSite.java:346
                  call at org/jruby/runtime/callsite/CachingCallSite.java:172
     invokeOther2:scan at -e:1
                <main> at -e:1
   invokeWithArguments at java/lang/invoke/MethodHandle.java:627
                  load at org/jruby/ir/Compiler.java:94
             runScript at org/jruby/Ruby.java:850
           runNormally at org/jruby/Ruby.java:773
           runNormally at org/jruby/Ruby.java:791
           runFromMain at org/jruby/Ruby.java:603
         doRunFromMain at org/jruby/Main.java:415
           internalRun at org/jruby/Main.java:307
                   run at org/jruby/Main.java:234
                  main at org/jruby/Main.java:206

lopex added a commit that referenced this issue Dec 13, 2018

lopex commented Dec 13, 2018

The problem was that MRI casts the last return value to unsigned char.
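
To illustrate the cast mentioned above, here is a Java sketch of the UTF-8 character hash, written from our reading of MRI's rb_memsearch_qs_utf8_hash in string.c. It is not JRuby's code; the class name is hypothetical and the mixing constant 8353 is an assumption taken from that reading. MRI's final `return (unsigned char)h;` corresponds to `h & 0xff` here, which keeps multi-byte results in 0..255, while single bytes map to 256..511, so every result is a valid index into a 512-entry shift table.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch modeled on MRI's rb_memsearch_qs_utf8_hash; not JRuby's actual code.
public class QsUtf8HashSketch {
    static final int MIX = 8353; // mixing constant as read from MRI (assumption)

    // Hash of the UTF-8 sequence starting at s[i]; assumes the sequence is complete.
    static int hash(byte[] s, int i) {
        int h = s[i] & 0xff;                          // lead byte, read as unsigned
        if (h < 0xC0 || h >= 0xF5) {
            return h + 256;                           // ASCII, continuation or invalid byte: 256..511
        }
        int len = h < 0xE0 ? 2 : (h < 0xF0 ? 3 : 4);  // length of the multi-byte sequence
        for (int k = 1; k < len; k++) {
            h = h * MIX + (s[i + k] & 0xff);          // int math wraps mod 2^32, like C unsigned int
        }
        return h & 0xff;                              // the (unsigned char) cast: 0..255
    }

    public static void main(String[] args) {
        byte[] a = "\u3042".getBytes(StandardCharsets.UTF_8);              // "あ" = E3 81 82
        System.out.println(hash(a, 0));                                     // 70
        System.out.println(hash(a, 1));                                     // 385 (0x81 + 256)
        System.out.println(hash("a".getBytes(StandardCharsets.UTF_8), 0)); // 353
    }
}
```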

lopex commented Dec 13, 2018

Also, MRI seems to overflow h multiple times using unsigned int, so we don't match the intermediate h values.

lopex commented Dec 13, 2018

@headius If we want to match those h results, we should keep h as an int but operate using long opcodes, so ultimately we should always widen it back via & 0xffffffff.
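
A minimal sketch of the long-based approach, with hypothetical names: each step is computed in long and masked with 0xffffffffL, so every intermediate h equals what a C unsigned int would hold. For comparison, the same steps in plain int arithmetic are shown through Integer.toUnsignedLong. The input bytes are assumed to be already widened to 0..255, and the constant 8353 is assumed from MRI's source.

```java
import java.util.Arrays;

// Hypothetical helper comparing long-with-mask arithmetic against plain int arithmetic.
public class UnsignedHashSteps {
    static final int MIX = 8353; // mixing constant as read from MRI (assumption)

    // Intermediate h values, computed in long and truncated to 32 bits after each step.
    static long[] stepsAsLong(int[] bytes) {
        long[] out = new long[bytes.length];
        long h = bytes[0];
        out[0] = h;
        for (int k = 1; k < bytes.length; k++) {
            h = (h * MIX + bytes[k]) & 0xffffffffL; // product < 2^46, so no long overflow
            out[k] = h;
        }
        return out;
    }

    // The same steps in int arithmetic, shown as unsigned values.
    static long[] stepsAsInt(int[] bytes) {
        long[] out = new long[bytes.length];
        int h = bytes[0];
        out[0] = Integer.toUnsignedLong(h);
        for (int k = 1; k < bytes.length; k++) {
            h = h * MIX + bytes[k];                 // wraps mod 2^32
            out[k] = Integer.toUnsignedLong(h);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {0xE3, 0x81, 0x82}; // UTF-8 bytes of "あ", already widened
        System.out.println(Arrays.toString(stepsAsLong(a))); // [227, 1896260, 2954558022]
        System.out.println(Arrays.toString(stepsAsInt(a)));  // [227, 1896260, 2954558022]
    }
}
```

Because Java's int multiplication and addition also wrap modulo 2^32, both versions produce the same 32-bit values once the input bytes are in 0..255; the long form mainly makes the unsigned value explicit.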

lopex commented Dec 13, 2018

OK, after widening the bytes we match MRI.
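
Why widening matters can be shown with a hypothetical reconstruction; the pre-fix JRuby code is not quoted in this thread, so treat the `unwidened3` variant as an assumption. If the lead byte is masked but the continuation bytes are added as signed Java bytes, and the final unsigned-char cast is missing, the hash of "あ" (bytes E3 81 82) comes out as exactly the index -1342547898 reported in both stack traces. Masking each byte and the result with 0xff gives 70 instead. The method names are hypothetical, only 3-byte sequences are handled, and the constant 8353 is assumed from MRI's source.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction for illustration; not JRuby's pre-fix source.
public class SignedByteRepro {
    static final int MIX = 8353; // mixing constant as read from MRI (assumption)

    // Continuation bytes used as signed Java bytes, and no final (unsigned char) cast.
    static int unwidened3(byte[] s, int i) {
        int h = s[i] & 0xff;          // lead byte is masked
        h = h * MIX + s[i + 1];       // 0x80..0xBF read as -128..-65
        h = h * MIX + s[i + 2];
        return h;                     // huge or negative: unusable as a table index
    }

    // Every byte widened with & 0xff, then the result cast like unsigned char.
    static int widened3(byte[] s, int i) {
        int h = s[i] & 0xff;
        h = h * MIX + (s[i + 1] & 0xff);
        h = h * MIX + (s[i + 2] & 0xff);
        return h & 0xff;              // 0..255
    }

    public static void main(String[] args) {
        byte[] a = "\u3042".getBytes(StandardCharsets.UTF_8); // "あ" = E3 81 82
        System.out.println(a[1]);                               // -127: Java bytes are signed
        System.out.println(unwidened3(a, 0));                   // -1342547898
        System.out.println(widened3(a, 0));                     // 70
    }
}
```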

lopex commented Dec 13, 2018

Closing. If Travis wakes up and spots any issues, we can reopen it.

lopex closed this Dec 13, 2018

lopex commented Dec 13, 2018

Relevant commit: e883c84

headius commented Dec 13, 2018

@pocke Could you submit specs for this case either to our spec/ruby dir or to https://github.com/ruby/spec please?

pocke commented Dec 15, 2018

Thank you for the quick commit!

@pocke Could you submit specs for this case either to our spec/ruby dir or to https://github.com/ruby/spec please?

Sure, I'll create a pull request for specs.

pocke added a commit to pocke/spec that referenced this issue Dec 15, 2018

eregon added a commit to ruby/spec that referenced this issue Dec 28, 2018
