Marshal load loses correct encoding for string subclass #939

Closed
jkl1337 opened this Issue · 2 comments

@jkl1337

Marshal.load returns the wrong encoding for an instance of a String subclass when that instance has an instance variable set: the loaded string comes back as ASCII-8BIT instead of UTF-8. In the example below the instance variable is an integer, but the same thing seems to happen with any value other than nil.

Note that Marshal.dump seems to yield identical output in JRuby and MRI (the string returned by Marshal.dump is ASCII-8BIT in both cases, as expected), so the difference is in Marshal.load.

I ran into this in a Rails app when caching an object that contains an instance of a String subclass.

The workaround I am using is to customize marshal_dump on the class to return an array with the string data as one element and the ivars as the other.
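
A minimal sketch of that workaround, reusing the StringSubclass and oops names from the transcript below. The method bodies are an assumption about how it could be written, not the exact code from the app. marshal_dump hands back a plain String copy plus the ivar, and marshal_load restores both:

```ruby
class StringSubclass < String
  attr_accessor :oops

  # Assumed shape of the workaround: dump a plain String copy of the
  # contents together with the ivar. The plain String carries its own
  # encoding, so the subclass ivars are never mixed in with it.
  def marshal_dump
    [String.new(self), oops]
  end

  # Marshal.load allocates an empty StringSubclass and calls this with
  # the array returned by marshal_dump.
  def marshal_load(data)
    text, @oops = data
    replace(text) # String#replace copies the contents and the encoding
  end
end

s = StringSubclass.new('what')
s.oops = 1
copy = Marshal.load(Marshal.dump(s))
puts copy.encoding
```

The copy via String.new(self) matters: returning self from marshal_dump would make the dump refer back to the object that is being dumped.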

jruby 1.7.5.dev (1.9.3p392) 2013-08-02 c672591 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15 [linux-amd64]

[1] pry(main)> class StringSubclass < String
[1] pry(main)*   attr_accessor :oops
[1] pry(main)* end  
=> nil
[2] pry(main)> s_ok = StringSubclass.new('what')
=> "what"
[3] pry(main)> s_ok.encoding
=> #<Encoding:UTF-8>
[4] pry(main)> Marshal.dump(s_ok)
=> "\x04\bIC:\x13StringSubclass\"\twhat\x06:\x06ET"
[5] pry(main)> Marshal.load(Marshal.dump(s_ok)).encoding
=> #<Encoding:UTF-8>
[6] pry(main)> 
[7] pry(main)> s_oops = StringSubclass.new('what').tap { |s| s.oops = 1; s }
=> "what"
[8] pry(main)> s_oops.encoding
=> #<Encoding:UTF-8>
[9] pry(main)> Marshal.dump(s_oops)
=> "\x04\bIC:\x13StringSubclass\"\twhat\a:\x06ET:\n@oopsi\x06"
[10] pry(main)> Marshal.load(Marshal.dump(s_oops)).encoding
=> #<Encoding:ASCII-8BIT>

ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-linux]

[1] pry(main)> class StringSubclass < String
[1] pry(main)*   attr_accessor :oops
[1] pry(main)* end  
=> nil
[2] pry(main)> s_ok = StringSubclass.new('what')
=> "what"
[3] pry(main)> s_ok.encoding
=> #<Encoding:UTF-8>
[4] pry(main)> Marshal.dump(s_ok)
=> "\x04\bIC:\x13StringSubclass\"\twhat\x06:\x06ET"
[5] pry(main)> Marshal.load(Marshal.dump(s_ok)).encoding
=> #<Encoding:UTF-8>
[6] pry(main)> 
[7] pry(main)> s_oops = StringSubclass.new('what').tap { |s| s.oops = 1; s }
=> "what"
[8] pry(main)> s_oops.encoding
=> #<Encoding:UTF-8>
[9] pry(main)> Marshal.dump(s_oops)
=> "\x04\bIC:\x13StringSubclass\"\twhat\a:\x06ET:\n@oopsi\x06"
[10] pry(main)> Marshal.load(Marshal.dump(s_oops)).encoding
=> #<Encoding:UTF-8>

@headius
JRuby Team member

Reproduced. Investigating.

@headius
JRuby Team member

Turned out to be a fairly simple problem; the unmarshaling logic was buggy, looking for encoding in the last instance variable coming off the stream, rather than the first. I rewrote the logic to be less confusing and set it up to use the first variable, as in MRI.
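
For anyone who wants to check this against their own build, here is a small sketch based on the transcript in the report (the variable names are mine). It shows that the encoding marker `:\x06ET` is written before `@oops` in the dumped bytes, which is why the loader has to read the encoding from the first variable, and then round-trips the object:

```ruby
class StringSubclass < String
  attr_accessor :oops
end

s = StringSubclass.new('what'.encode(Encoding::UTF_8))
s.oops = 1
bytes = Marshal.dump(s)

# Position of the encoding marker (:E followed by T, meaning UTF-8)
# and of the user-defined ivar in the dumped stream.
encoding_at = bytes.index(":\x06ET".b)
ivar_at     = bytes.index("@oops".b)
puts "encoding marker at byte #{encoding_at}, @oops at byte #{ivar_at}"

# On MRI, and on a JRuby build that includes the fix, the round trip
# should keep UTF-8.
loaded = Marshal.load(bytes)
puts loaded.encoding
```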

@headius headius closed this in e6121a4