
Error handling converting UTF-32 to UTF-8 is broken [9k] [lotus] #2581

Closed
PragTob opened this Issue Feb 8, 2015 · 6 comments

@PragTob

commented Feb 8, 2015

The UTF-32 encoding seems to be broken when converting to UTF-8. The input doesn't seem to matter as long as it is not an empty string.

I have jruby-head from today:

tobi@tobi-desktop ~/github/lotus_components/utils $ ruby -v
jruby 9.0.0.0-SNAPSHOT (2.2.0p0) 2015-02-08 cc00fd4 OpenJDK 64-Bit Server VM 24.75-b04 on 1.7.0_75-b13 +jit [linux-amd6

The problem seems to be that as soon as a string containing at least one character is converted to UTF-32, converting it to UTF-8 afterwards throws an error.

jruby-head:

jruby-head :013 > "a".encode("UTF-32")
 => "\uFEFFa" 
jruby-head :014 > "a".encode("UTF-32").encode(Encoding::UTF_8)
Encoding::InvalidByteSequenceError: "\x00\x00\xFE\xFF" on UTF-32
    from org/jruby/RubyString.java:5671:in `encode'
    from (irb):14:in `evaluate'
    from org/jruby/RubyKernel.java:1000:in `eval'
    from org/jruby/RubyKernel.java:1310:in `loop'
    from org/jruby/RubyKernel.java:1120:in `catch'
    from org/jruby/RubyKernel.java:1120:in `catch'
    from /home/tobi/.rvm/rubies/jruby-head/bin/irb:13:in `__script__'

2.2:

2.2.0 :016 > "a".encode("UTF-32")
 => "\uFEFFa" 
2.2.0 :017 > "a".encode("UTF-32").encode(Encoding::UTF_8)
 => "a"

Discovered while working on lotus utils.

Tobi
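The report above can be condensed into a self-checking Ruby script. This is a sketch that treats MRI 2.2 as the reference behaviour; on the affected jruby-head build the final `encode` call is expected to raise `Encoding::InvalidByteSequenceError` instead. The byte values match the big-endian BOM (`\x00\x00\xFE\xFF`) quoted in the JRuby error message.

```ruby
# Condensed reproduction of this issue (a sketch; MRI 2.2 is assumed to be
# the reference behaviour).
utf32 = "a".encode("UTF-32")

# "UTF-32" without an endianness suffix is a dummy encoding: the byte order
# is carried by a BOM at the start of the string instead.
p utf32.encoding # => #<Encoding:UTF-32 (dummy)>

# Big-endian BOM (00 00 FE FF) followed by U+0061 as four big-endian bytes.
p utf32.bytes    # => [0, 0, 254, 255, 0, 0, 0, 97]

# Transcoding back must read the BOM, drop it, and decode the rest.
# MRI returns "a"; the affected JRuby build raised InvalidByteSequenceError.
back = utf32.encode(Encoding::UTF_8)
p back           # => "a"
```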

@headius

Member

commented Mar 12, 2015

Wow, that's unexpected. This logic should be using the MRI transcoding subsystem pretty much as-is.

@headius

Member

commented Mar 12, 2015

Seems to be something wrong with the "dummy" encodings:

irb(main):001:0> "a".encode("UTF-32").encode(Encoding::UTF_8)
Encoding::InvalidByteSequenceError: "\x00\x00\xFE\xFF" on UTF-32
    from org/jruby/RubyString.java:5669:in `encode'
    from (irb):1:in `<eval>'
    from org/jruby/RubyKernel.java:1005:in `eval'
    from org/jruby/RubyKernel.java:1315:in `loop'
    from org/jruby/RubyKernel.java:1125:in `catch'
    from org/jruby/RubyKernel.java:1125:in `catch'
    from /Users/headius/projects/jruby/bin/jirb:13:in `<top>'
irb(main):002:0> "a".encode("UTF-32BE").encode(Encoding::UTF_8)
=> "a"
irb(main):003:0> "a".encode("UTF-32").encoding
=> #<Encoding:UTF-32 (dummy)>
irb(main):004:0> "a".encode("UTF-32BE").encoding
=> #<Encoding:UTF-32BE>

@lopex What are these dummy encodings for?
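For context on what the dummy encodings are for: `UTF-32` (like `UTF-16`) names a family whose concrete byte order is only known from the BOM, whereas `UTF-32BE` and `UTF-32LE` are real encodings. The helper below is a hypothetical workaround (`utf32_to_utf8` is not part of Ruby or JRuby) that resolves the endianness by hand and then transcodes from the concrete encoding, mirroring the `UTF-32BE` case that already worked in the session above.

```ruby
# Hypothetical workaround, not a Ruby/JRuby API: decode a BOM-prefixed
# "UTF-32" string by resolving its byte order manually.
UTF32_BE_BOM = "\x00\x00\xFE\xFF".b
UTF32_LE_BOM = "\xFF\xFE\x00\x00".b

def utf32_to_utf8(str)
  raw = str.b # binary copy, so the raw bytes can be inspected
  concrete =
    if raw.start_with?(UTF32_BE_BOM)
      Encoding::UTF_32BE
    elsif raw.start_with?(UTF32_LE_BOM)
      Encoding::UTF_32LE
    else
      raise ArgumentError, "missing UTF-32 byte order mark"
    end
  # Strip the 4-byte BOM, relabel the remaining bytes with the concrete
  # encoding, then transcode through the non-dummy path.
  raw.byteslice(4..-1).force_encoding(concrete).encode(Encoding::UTF_8)
end

p Encoding::UTF_32.dummy?   # => true
p Encoding::UTF_32BE.dummy? # => false
p utf32_to_utf8("a".encode("UTF-32")) # => "a"
```

Relabelling with `force_encoding` only changes the encoding tag, not the bytes, so the actual conversion still goes through the ordinary `UTF-32BE`/`UTF-32LE` transcoder.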

@headius

Member

commented Mar 12, 2015

It appears that at some point "dummy" encodings became "replicate" encodings, so I'm trying to make that change to our encoding list too.

@headius headius closed this in 3f5a605 Mar 12, 2015

@headius

Member

commented Mar 13, 2015

Bleh, opened a can of worms. Additional fixes coming in.

@headius

Member

commented Mar 13, 2015

Multiple fixes to jcodings and I think we're back in business. Your case works and all previous passing cases work. Will explore tags/excludes now.

@headius headius closed this Mar 13, 2015

@headius headius added this to the 9.0.0.0.pre2 milestone Mar 13, 2015

@PragTob

Author

commented Mar 14, 2015

👍 Thanks a lot Charlie!

headius added a commit to jruby/jcodings that referenced this issue Mar 16, 2015

Dummy UTF-32 and UTF-16 need to be replicas with dummy flag.
Dummy flag is used in various places, so these replicas can't be
perfect replicas. See jruby/jruby#2581.
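The commit message above says the dummy `UTF-16` and `UTF-32` entries must keep the dummy flag while still transcoding like their concrete counterparts. A regression-style check of both properties might look like the following; it is a sketch, not the spec or test JRuby actually added, and the sample strings are arbitrary.

```ruby
# Regression-style sketch (not JRuby's actual test): the BOM-carrying
# dummy encodings must stay dummy AND round-trip through UTF-8.
SAMPLES = ["", "a", "héllo", "日本語", "\u{1F600}"].freeze

%w[UTF-16 UTF-32].each do |name|
  enc = Encoding.find(name)
  raise "#{name} lost its dummy flag" unless enc.dummy?

  SAMPLES.each do |s|
    round_trip = s.encode(enc).encode(Encoding::UTF_8)
    raise "#{name} round-trip failed for #{s.inspect}" unless round_trip == s
  end
end

puts "dummy UTF-16/UTF-32 round-trips OK"
```

The empty string is included because the original report notes that it was the one input that did not trigger the error.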

headius added a commit to jruby/jcodings that referenced this issue Mar 16, 2015

headius added a commit that referenced this issue Mar 16, 2015

headius added a commit that referenced this issue Mar 16, 2015

Revert "Fix jcodings mapping for UTF-32 and UTF-16 (to BE). Fixes #2581…."

This reverts commit 3f5a605.

Conflicts:
	core/pom.xml