Marshal load loses correct encoding for string subclass #939

Closed
jkl1337 opened this Issue Aug 2, 2013 · 2 comments

@jkl1337
commented Aug 2, 2013

I am getting unexpected behavior when calling Marshal.load on an instance of a String subclass that has an instance variable set. In the example below the instance variable is an integer, but the same thing seems to happen with any value other than nil: the loaded string comes back as ASCII-8BIT instead of UTF-8.

Note that Marshal.dump seems to yield identical output in JRuby and MRI (the encoding of the dumped string is ASCII-8BIT in both cases, as expected).

This problem was encountered in a Rails app when caching an object that holds an instance of a String subclass.

The workaround I am using is to define marshal_dump on the class so that it returns an array with the string data as one element and the ivars as the other.
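For anyone hitting the same thing, here is a minimal sketch of what such a workaround could look like. It is not the exact code from my app: the matching marshal_load and the two-element array layout are assumptions made for illustration, and the string literal is explicitly encoded as UTF-8 so the result does not depend on the source encoding.

```ruby
class StringSubclass < String
  attr_accessor :oops

  # Dump a plain String copy (which keeps the encoding) plus the ivar.
  # String.new(self) avoids dumping self again, which would recurse.
  def marshal_dump
    [String.new(self), @oops]
  end

  # Marshal allocates an empty instance and hands back the dumped array.
  # String#replace copies both the content and the encoding.
  def marshal_load(data)
    content, oops = data
    replace(content)
    @oops = oops
  end
end

s = StringSubclass.new('what'.encode(Encoding::UTF_8))
s.oops = 1

copy = Marshal.load(Marshal.dump(s))
puts copy.encoding
puts copy.oops
```

Because the encoding now travels with the plain String inside the array, it no longer depends on how the loader handles instance variables attached to the subclass instance itself.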

jruby 1.7.5.dev (1.9.3p392) 2013-08-02 c672591 on Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15 [linux-amd64]

[1] pry(main)> class StringSubclass < String
[1] pry(main)*   attr_accessor :oops
[1] pry(main)* end  
=> nil
[2] pry(main)> s_ok = StringSubclass.new('what')
=> "what"
[3] pry(main)> s_ok.encoding
=> #<Encoding:UTF-8>
[4] pry(main)> Marshal.dump(s_ok)
=> "\x04\bIC:\x13StringSubclass\"\twhat\x06:\x06ET"
[5] pry(main)> Marshal.load(Marshal.dump(s_ok)).encoding
=> #<Encoding:UTF-8>
[6] pry(main)> 
[7] pry(main)> s_oops = StringSubclass.new('what').tap { |s| s.oops = 1; s }
=> "what"
[8] pry(main)> s_oops.encoding
=> #<Encoding:UTF-8>
[9] pry(main)> Marshal.dump(s_oops)
=> "\x04\bIC:\x13StringSubclass\"\twhat\a:\x06ET:\n@oopsi\x06"
[10] pry(main)> Marshal.load(Marshal.dump(s_oops)).encoding
=> #<Encoding:ASCII-8BIT>

ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-linux]

[1] pry(main)> class StringSubclass < String
[1] pry(main)*   attr_accessor :oops
[1] pry(main)* end  
=> nil
[2] pry(main)> s_ok = StringSubclass.new('what')
=> "what"
[3] pry(main)> s_ok.encoding
=> #<Encoding:UTF-8>
[4] pry(main)> Marshal.dump(s_ok)
=> "\x04\bIC:\x13StringSubclass\"\twhat\x06:\x06ET"
[5] pry(main)> Marshal.load(Marshal.dump(s_ok)).encoding
=> #<Encoding:UTF-8>
[6] pry(main)> 
[7] pry(main)> s_oops = StringSubclass.new('what').tap { |s| s.oops = 1; s }
=> "what"
[8] pry(main)> s_oops.encoding
=> #<Encoding:UTF-8>
[9] pry(main)> Marshal.dump(s_oops)
=> "\x04\bIC:\x13StringSubclass\"\twhat\a:\x06ET:\n@oopsi\x06"
[10] pry(main)> Marshal.load(Marshal.dump(s_oops)).encoding
=> #<Encoding:UTF-8>
@headius

Member

commented Aug 27, 2013

Reproduced. Investigating.

@headius

Member

commented Aug 27, 2013

Turned out to be a fairly simple problem: the unmarshaling logic was buggy. It looked for the encoding in the last instance variable coming off the stream rather than the first. I rewrote the logic to be less confusing and to read the encoding from the first variable, as MRI does.
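To illustrate the ordering, here is a small sketch based on the dump shown in the report. It checks that the encoding marker (the `:E` symbol followed by `T`) is written before the `@oops` instance variable, then round-trips the value. The byte pattern is taken from the transcript above; the variable names are only for illustration.

```ruby
class StringSubclass < String
  attr_accessor :oops
end

s = StringSubclass.new('what'.encode(Encoding::UTF_8))
s.oops = 1
dump = Marshal.dump(s)

# ":\x06ET" is the symbol :E followed by true, i.e. the UTF-8 marker.
encoding_pos = dump.index(":\x06ET")
ivar_pos     = dump.index('@oops')

puts "encoding marker at byte #{encoding_pos}, @oops at byte #{ivar_pos}"

round_trip = Marshal.load(dump)
puts round_trip.encoding # UTF-8 on MRI; an affected JRuby build reports ASCII-8BIT
puts round_trip.oops
```

A loader that takes the encoding from the last variable would see `@oops` here rather than `:E`, which matches the ASCII-8BIT result in the report.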
