Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnmarshalStream: Look for encoding information in all instance variables #1345

Merged
merged 1 commit into from Dec 19, 2013

Conversation

@DavidEGrayson
Copy link
Contributor

@DavidEGrayson DavidEGrayson commented Dec 17, 2013

Unfortunately, I think the solution to issue #939 did not go far enough. When unmarshaling a string, MRI actually checks every instance variable for encoding information and uses the last encoding it found.

We can see this from r_ivar in MRI 2.0.0p353 and 1.9.3p484:
https://github.com/ruby/ruby/blob/v2_0_0_353/marshal.c#L1369-1380
https://github.com/ruby/ruby/blob/v1_9_3_484/marshal.c#L1255-1266

I also checked the latest version of MRI's master branch.

Here is some example code where we unmarshal a string and the encoding is specified as UTF-16BE in the second instance variable:

puts Marshal.load("\x04\x08I\"\x06a\x07:\x07@aT:\x0Dencoding\"\x0DUTF-16BE").encoding

Here is a shell session showing how MRI respects the encoding but JRuby does not, giving the string an encoding of ASCII-8BIT:

$ jruby -v && jruby pr2_repro.rb
jruby 1.7.9 (1.9.3p392) 2013-12-06 87b108a on Java HotSpot(TM) 64-Bit Server VM
1.7.0_45-b18 [Windows 8-amd64]
ASCII-8BIT
$ source use_mri2.sh
$ ruby -v && ruby pr2_repro.rb
ruby 2.0.0p0 (2013-02-24) [x64-mingw32]
UTF-16BE

The code in this pull request fixes this issue. All I did was remove the "i==0" condition in defaultVariablesUnmarshal and adjusted the indentation accordingly.

The reason I am touching this code is because I plan to refactor the encoding-determination logic out of it into another method so I can use it for symbols for issue #1329, or maybe I will just call the existing method.

… in all the instance variables, and only use the last one. All I did was remove the "i==0" condition in defaultVariablesUnmarshal and adjust the indentation accordingly.
@headius
Copy link
Member

@headius headius commented Dec 19, 2013

Looks good, I'll merge.

Whitespace-free diff for anyone following along at home:

$ git diff -w jruby-1_7
diff --git a/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java b/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
index 6adc71d..d26e169 100644
--- a/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
+++ b/core/src/main/java/org/jruby/runtime/marshal/UnmarshalStream.java
@@ -375,8 +375,6 @@ public class UnmarshalStream extends InputStream {

             IRubyObject key = unmarshalObject(false);

-            if (i == 0) { // first ivar provides encoding
-                
             if (runtime.is1_9() && object instanceof EncodingCapable) {

                 EncodingCapable strObj = (EncodingCapable)object;
@@ -407,7 +405,6 @@ public class UnmarshalStream extends InputStream {

                 } // else fall through as normal ivar
             }
-            }

             String name = key.asJavaString();
             IRubyObject value = unmarshalObject();
@jrubyci jrubyci merged commit 3020ca2 into jruby:jruby-1_7 Dec 19, 2013
1 check failed
1 check failed
@enebo
default The Travis CI build failed
Details
@DavidEGrayson DavidEGrayson deleted the DavidEGrayson:jruby_pull_request_2 branch Jan 19, 2014
@enebo enebo modified the milestones: JRuby 1.7.10, JRuby 1.7.11 Feb 24, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants