Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnmarshalStream: Look for encoding information in all instance variables #1345

Merged
merged 1 commit into from Dec 19, 2013

Conversation

Projects
None yet
4 participants
@DavidEGrayson
Copy link
Contributor

commented Dec 17, 2013

Unfortunately, I think the solution to issue #939 did not go far enough. When unmarshaling a string, MRI actually checks every instance variable for encoding information and uses the last encoding it found.

We can see this from r_ivar in MRI 2.0.0p353 and 1.9.3p484:
https://github.com/ruby/ruby/blob/v2_0_0_353/marshal.c#L1369-1380
https://github.com/ruby/ruby/blob/v1_9_3_484/marshal.c#L1255-1266

I also checked the latest version of MRI's master branch.

Here is some example code where we unmarshal a string and the encoding is specified as UTF-16BE in the second instance variable:

puts Marshal.load("\x04\x08I\"\x06a\x07:\x07@aT:\x0Dencoding\"\x0DUTF-16BE").encoding

Here is a shell session showing how MRI respects the encoding but JRuby does not, giving the string an encoding of ASCII-8BIT:

$ jruby -v && jruby pr2_repro.rb
jruby 1.7.9 (1.9.3p392) 2013-12-06 87b108a on Java HotSpot(TM) 64-Bit Server VM
1.7.0_45-b18 [Windows 8-amd64]
ASCII-8BIT
$ source use_mri2.sh
$ ruby -v && ruby pr2_repro.rb
ruby 2.0.0p0 (2013-02-24) [x64-mingw32]
UTF-16BE

The code in this pull request fixes this issue. All I did was remove the "i==0" condition in defaultVariablesUnmarshal and adjusted the indentation accordingly.

The reason I am touching this code is because I plan to refactor the encoding-determination logic out of it into another method so I can use it for symbols for issue #1329, or maybe I will just call the existing method.

UnmarshalStream: Changed it to be like MRI and look for encoding data…
… in all the instance variables, and only use the last one. All I did was remove the "i==0" condition in defaultVariablesUnmarshal and adjust the indentation accordingly.
@headius

This comment has been minimized.

Copy link
Member

commented Dec 19, 2013

Looks good, I'll merge.

Whitespace-free diff for anyone following along at home:

$ git diff -w jruby-1_7
diff --git a/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java b/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
index 6adc71d..d26e169 100644
--- a/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
+++ b/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
@@ -375,8 +375,6 @@ public class UnmarshalStream extends InputStream {

             IRubyObject key = unmarshalObject(false);

-            if (i == 0) { // first ivar provides encoding
-                
             if (runtime.is1_9() && object instanceof EncodingCapable) {

                 EncodingCapable strObj = (EncodingCapable)object;
@@ -407,7 +405,6 @@ public class UnmarshalStream extends InputStream {

                 } // else fall through as normal ivar
             }
-            }

             String name = key.asJavaString();
             IRubyObject value = unmarshalObject();

@jrubyci jrubyci merged commit 3020ca2 into jruby:jruby-1_7 Dec 19, 2013

1 check failed

default The Travis CI build failed
Details

@DavidEGrayson DavidEGrayson deleted the DavidEGrayson:jruby_pull_request_2 branch Jan 19, 2014

@enebo enebo modified the milestones: JRuby 1.7.10, JRuby 1.7.11 Feb 24, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.