JRuby 1.7.3 ignoring :undef option for String#encode? (UndefinedConversionError) #616

Closed
korny opened this Issue Mar 30, 2013 · 4 comments

korny commented Mar 30, 2013

When I want to convert some input to UTF-8 that includes undefined characters, MRI and JRuby throw an UndefinedConversionError, as expected:

ruby-1.9.3-p392 -e 'p "\xC3".encode("utf-8", "binary")'
-e:1:in `encode': "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from -e:1:in `<main>'

I can prevent this in MRI by setting :undef => :replace:

ruby-1.9.3-p392 -e 'p "\xC3".encode("utf-8", "binary", :undef => :replace)'
"�"

But JRuby still throws the same error:

jruby-1.7.3 -e 'p "\xC3".encode("utf-8", "binary", :undef => :replace)'
Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8
  encode at org/jruby/RubyString.java:7589
  (root) at -e:1

The :invalid => :replace option doesn't help either.

My environment: JRuby 1.7.3, Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10, OS X 10.8.3, LANG=en_US.UTF-8.
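
For anyone who wants to check their own runtime, here is a minimal sketch of the behavior MRI shows above, wrapped in a small helper. The name to_utf8_lossy is hypothetical (it is not part of Ruby or JRuby), and the expected result is what MRI produces; on JRuby 1.7.3 the call raises instead, as reported.

```ruby
# Hypothetical helper (not a Ruby or JRuby API): treat the input as raw
# bytes and transcode to UTF-8, substituting the replacement character
# for any byte that has no mapping.
def to_utf8_lossy(bytes)
  bytes.encode("utf-8", "binary", :undef => :replace)
end

# ASCII bytes pass through unchanged; the lone 0xC3 byte is replaced.
result = to_utf8_lossy("abc\xC3")
p result   # on MRI: "abc\uFFFD" (U+FFFD REPLACEMENT CHARACTER)
```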

Owner

BanzaiMan commented Apr 1, 2013

This is basically a duplicate of #375.

@BanzaiMan BanzaiMan closed this Apr 1, 2013

Owner

headius commented Jul 12, 2013

I don't think this is a dupe of #375 anymore. We have some hardcoded logic to prevent ASCII from transcoding that needs to take into account the :undef option.

@headius headius reopened this Jul 12, 2013

Owner

headius commented Jul 12, 2013

I have a patch that makes this work correctly, but only for the invalid option...not the undef option. This is because, in order to get the Java transcoding logic to throw out 8-bit bytes, I need to tell it to decode from US-ASCII to UTF-16 (Java's internal encoding), and the error it raises in that setup is "malformed" rather than "undefined mapping" because...well...it is malformed.

I'm trying to sort out the logic behind treating it as undefined.
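
To illustrate the malformed-versus-undefined distinction in Ruby terms, here is a rough sketch of how MRI classifies the same byte depending on the declared source encoding. This is only an analogy to the Java behavior described above, not the JRuby patch itself, and the variable names are made up for the example.

```ruby
byte = "\xC3"

# Declared as binary: the byte is well-formed but has no UTF-8 mapping,
# so MRI reports it as undefined and :invalid => :replace does not apply.
undef_error = begin
  byte.encode("utf-8", "binary", :invalid => :replace)
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Declared as UTF-8: a lone lead byte is malformed input, so MRI reports
# it as invalid and :undef => :replace does not apply.
invalid_error = begin
  byte.encode("utf-16le", "utf-8", :undef => :replace)
  nil
rescue Encoding::InvalidByteSequenceError => e
  e
end

# With the matching option, the malformed byte is replaced instead.
replaced = byte.encode("utf-16le", "utf-8", :invalid => :replace)

p undef_error.class, invalid_error.class, replaced.encode("utf-8")
```

So each option only covers its own error class, which is why a source encoding that reports the byte as malformed ends up honoring :invalid but not :undef.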

@headius headius closed this in 74a57ef Jul 12, 2013

korny commented Jul 13, 2013

Great, thanks!
