UnmarshalStream: Look for encoding information in all instance variables #1345

Merged
merged 1 commit into from Dec 19, 2013

Projects

None yet

4 participants

@DavidEGrayson

Unfortunately, I think the solution to issue #939 did not go far enough. When unmarshaling a string, MRI actually checks every instance variable for encoding information and uses the last encoding it found.

We can see this from r_ivar in MRI 2.0.0p353 and 1.9.3p484:
https://github.com/ruby/ruby/blob/v2_0_0_353/marshal.c#L1369-1380
https://github.com/ruby/ruby/blob/v1_9_3_484/marshal.c#L1255-1266

I also checked the latest version of MRI's master branch.

Here is some example code where we unmarshal a string and the encoding is specified as UTF-16BE in the second instance variable:

puts Marshal.load("\x04\x08I\"\x06a\x07:\x07@aT:\x0Dencoding\"\x0DUTF-16BE").encoding

Here is a shell session showing how MRI respects the encoding but JRuby does not, giving the string an encoding of ASCII-8BIT:

$ jruby -v && jruby pr2_repro.rb
jruby 1.7.9 (1.9.3p392) 2013-12-06 87b108a on Java HotSpot(TM) 64-Bit Server VM
1.7.0_45-b18 [Windows 8-amd64]
ASCII-8BIT
$ source use_mri2.sh
$ ruby -v && ruby pr2_repro.rb
ruby 2.0.0p0 (2013-02-24) [x64-mingw32]
UTF-16BE

The code in this pull request fixes this issue. All I did was remove the "i==0" condition in defaultVariablesUnmarshal and adjusted the indentation accordingly.

The reason I am touching this code is because I plan to refactor the encoding-determination logic out of it into another method so I can use it for symbols for issue #1329, or maybe I will just call the existing method.

@DavidEGrayson DavidEGrayson UnmarshalStream: Changed it to be like MRI and look for encoding data…
… in all the instance variables, and only use the last one. All I did was remove the "i==0" condition in defaultVariablesUnmarshal and adjust the indentation accordingly.
3020ca2
@headius
Member
headius commented Dec 19, 2013

Looks good, I'll merge.

Whitespace-free diff for anyone following along at home:

$ git diff -w jruby-1_7
diff --git a/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java b/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
index 6adc71d..d26e169 100644
--- a/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
+++ b/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
@@ -375,8 +375,6 @@ public class UnmarshalStream extends InputStream {

             IRubyObject key = unmarshalObject(false);

-            if (i == 0) { // first ivar provides encoding
-                
             if (runtime.is1_9() && object instanceof EncodingCapable) {

                 EncodingCapable strObj = (EncodingCapable)object;
@@ -407,7 +405,6 @@ public class UnmarshalStream extends InputStream {

                 } // else fall through as normal ivar
             }
-            }

             String name = key.asJavaString();
             IRubyObject value = unmarshalObject();
@jrubyci jrubyci merged commit 3020ca2 into jruby:jruby-1_7 Dec 19, 2013

1 check failed

Details default The Travis CI build failed
@DavidEGrayson DavidEGrayson deleted the DavidEGrayson:jruby_pull_request_2 branch Jan 19, 2014
@enebo enebo modified the milestone: JRuby 1.7.10, JRuby 1.7.11 Feb 24, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment