String#byteslice can raise an inappropriate ArrayIndexOutOfBoundsException #886

Closed
jrochkind opened this Issue Jul 15, 2013 · 8 comments


@jrochkind

...or it can return the wrong byte. I can reproduce this by calling split on a UTF-8 encoded string that includes control characters (which are legal UTF-8). This appears to confuse jruby's internal string representation about the byte lengths and offsets of the string's internal buffer.

Again, the result can be either a wrong return value from String#byteslice or a nonsensical ArrayIndexOutOfBoundsException. The reproducible test case below triggers the ArrayIndexOutOfBoundsException.

Encoding and byte-count issues are really hard to talk about, so rather than try to explain in words, I'll explain with an annotated, failing Test::Unit case.

This works in MRI (ruby 1.9.3p448 (2013-06-27 revision 41675) [x86_64-darwin12.4.0]), but in jruby (jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_51-b11-457-11M4509 [darwin-x86_64]) it raises the really weird exception shown below.

require 'test/unit'


# jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_51-b11-457-11M4509 [darwin-x86_64]
class TestField < Test::Unit::TestCase

  def test_confused_bytecount
    string_with_ctrl = "hello\x1fhello".force_encoding("UTF-8")
    # control chars like \x1F ARE legal UTF-8, this is correct:
    assert string_with_ctrl.valid_encoding?

    # It's even considered ascii_only? -- this is correct, both MRI and jruby
    assert string_with_ctrl.ascii_only?


    # For reasons I can't explain, I can only reproduce the 
    # problem right now by doing a split, on the control char
    # (this does represent my actual use case)
    # Whether the split operand is tagged ASCII or UTF-8 does not matter;
    # the result is identical either way.
    elements = string_with_ctrl.split("\x1F".force_encoding("UTF-8"))  

    # For some reason weirdness only happens on the second one in the split
    # in this case. 
    second = elements[1]


    # For a string composed of all one-byte wide ascii, as this one is...
    assert_equal "hello", second
    assert second.ascii_only?

    # For an ASCII-only string, string[0] and string.byteslice(0) should be
    # identical. They differ only when the string contains multi-byte chars.
    # Using #[], we're okay:
    assert_equal "h", second[0]

    # But on jruby, this following actually raises an exception!
    assert_equal "h", second.byteslice(0)
    # That one up there actually just raised!!!
    # Java::JavaLang::ArrayIndexOutOfBoundsException: 12
    #  org.jruby.util.ByteList.equal(ByteList.java:960)

    # In other cases I saw in my real app, it didn't raise, but
    # did return the WRONG bytes, i.e. not the 'h' expected above,
    # so this equality would fail:


    assert_equal second[0], second.byteslice(0)
    # but in jruby we never even get here, we raise. 

    # In MRI, we pass ALL these tests with no exceptions. 
    # (ruby 1.9.3p448 (2013-06-27 revision 41675) [x86_64-darwin12.4.0])
  end

end
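To spell out the comment in the test about String#[] versus String#byteslice, here is a minimal sketch that is not part of the original report. It shows where the two are expected to agree (ASCII-only strings) and where they legitimately differ (multi-byte characters). The sample strings and variable names are illustrative assumptions.

```ruby
ascii = "hello"
multi = "h\u00e9llo" # "h", e-acute, "llo": the e-acute occupies two bytes in UTF-8

# ASCII-only: character index and byte index coincide, so the two
# methods should return the same one-character string at every index.
ascii_agree = (0...ascii.bytesize).all? { |i| ascii[i] == ascii.byteslice(i) }

# Multi-byte: #[] returns the whole character at index 1, while
# #byteslice returns only the single byte at offset 1.
char_at_1 = multi[1]
byte_at_1 = multi.byteslice(1)

puts "ASCII-only agreement: #{ascii_agree}"
puts "multi[1] bytesize: #{char_at_1.bytesize}"
puts "multi.byteslice(1) bytesize: #{byte_at_1.bytesize}"
puts "byteslice result is valid UTF-8: #{byte_at_1.valid_encoding?}"
```

The second element produced by the split in the test is ASCII-only, so it falls in the first category, which is why a disagreement or an exception there points to a bug.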

(interested parties include @BillDueber and @adamj)

@adjam
adjam commented Jul 15, 2013

Can confirm same exception occurs on jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15 [darwin-x86_64]

@jrochkind

Is there anything I can do to get some attention here? Maybe my title and description are too hard to understand; it is very challenging to describe such a weird bug succinctly. But I do have a reproducible failing test case which is clearly a bug. Okay, I'm going to try changing the name of the issue!

@atambo
Member
atambo commented Jul 22, 2013

Are you able to test this out on jruby-head? I just tried reproducing this against master and it did not raise an exception. So I'm thinking this is fixed.

@atambo
Member
atambo commented Jul 22, 2013

I think c7ed4ab fixed this.

@jrochkind

What's the easiest/best way to install jruby-head, if I don't use rvm?


@atambo
Member
atambo commented Jul 22, 2013

If you have maven and git, you can check out this git repository and run mvn in the top-level directory. After that you can run ./bin/jruby -S irb to try things out.

@jrochkind

Running jruby-head, the test case posted above in this issue passes, whereas it failed in jruby 1.7.4.

I can't say whether it resolves all problems, but it does resolve the one demonstrated in the test case, which took me a few hours to isolate and post here. So if you think you understand the issue and it's solved, that's probably a good sign.
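For anyone wanting a slightly broader check than the single test case, here is a rough sketch. It is an assumption about what might be worth testing, not something from the report or from the fix. The helper name byteslice_mismatches and the sample input are hypothetical. It splits on each ASCII control character in turn and compares String#[] with String#byteslice on every ASCII-only element.

```ruby
# Hypothetical helper, not from the report: returns [element_index, byte_index]
# pairs where #[] and #byteslice disagree on an ASCII-only split element.
def byteslice_mismatches(str, separator)
  mismatches = []
  str.split(separator).each_with_index do |element, i|
    next unless element.ascii_only?
    element.bytesize.times do |j|
      mismatches << [i, j] unless element[j] == element.byteslice(j)
    end
  end
  mismatches
end

# Try every ASCII control character (0x00..0x1F) as the separator.
control_chars = (0x00..0x1f).map { |code| code.chr.force_encoding("UTF-8") }
failures = control_chars.reject do |sep|
  byteslice_mismatches("hello#{sep}hello#{sep}world", sep).empty?
end

puts "separators with mismatches: #{failures.map(&:inspect).join(', ')}"
```

On an affected build the byteslice call may raise before any mismatch is recorded, so an uncaught exception from this script should be read as a failure too.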

@atambo
Member
atambo commented Jul 23, 2013

I'm going to close this issue then. If you hit another problem with byteslice when using jruby 1.7.5, just open a new issue.

@atambo atambo closed this Jul 23, 2013