Marshal.load encoding error #456
Comments
There must be a discrepancy between the 1.8 and 1.9 mode Marshal:
If you can run your code as 1.8 mode, that might be a quick workaround.
Opening the marshaled data in MRI 1.9.3p327 and marshaling it again produces a file that can be read by JRuby 1.7.0 in 1.9 mode. This suggests the problem is with the 1.8 MRI Marshal format in JRuby 1.9 mode. I thought that Marshal data wasn't supported across different versions?
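For anyone who wants to try the re-marshaling workaround described above, here is a minimal sketch. The helper name `remarshal` and the file names are made up for illustration; the idea is to run it under an interpreter that can read the original file (MRI 1.9.3 in the comment above) so the data is written back out by that interpreter's Marshal.

```ruby
require 'tmpdir'

# Hypothetical helper (not from the thread): load a Marshal file and dump it
# again with whatever Marshal implementation the running interpreter has.
def remarshal(src_path, dst_path)
  data = File.open(src_path, 'rb') { |f| Marshal.load(f) }
  File.open(dst_path, 'wb') { |f| Marshal.dump(data, f) }
  data
end

# Self-contained demo using a throwaway directory and made-up file names.
Dir.mktmpdir do |dir|
  src = File.join(dir, 'old.dmp')
  dst = File.join(dir, 'new.dmp')
  File.open(src, 'wb') { |f| Marshal.dump({ 'a' => 1.0, 'b' => 2.0 }, f) }

  remarshal(src, dst)
  puts File.open(dst, 'rb') { |f| Marshal.load(f) }.inspect
end
```

Binary mode (`'rb'` / `'wb'`) is used here so the bytes are not touched by newline or encoding conversion on any platform.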
This isn't quite right:
From the docs:
And in this case, the Marshal major/minor version numbers are the same (from examining the first two bytes of
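The version check mentioned above can be sketched like this. `stream_version` is a hypothetical helper name; the only thing it relies on is that a Marshal stream starts with a major byte followed by a minor byte, which can be compared against `Marshal::MAJOR_VERSION` and `Marshal::MINOR_VERSION`.

```ruby
# Hypothetical helper: return [major, minor] from the first two bytes
# of a marshaled string.
def stream_version(marshaled)
  marshaled.bytes.first(2)
end

dumped = Marshal.dump({ 'a' => 1.0 })
major, minor = stream_version(dumped)

puts "stream version:  #{major}.#{minor}"
puts "runtime version: #{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"
```

To check a file instead, read the first two bytes in binary mode, for example `File.open(path, 'rb') { |f| f.read(2) }`, and pass the result to the helper.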
The test for encoding when unmarshaling strings seems suspicious to me. It tests for I've changed that locally to be
Here's a smaller reproduction:
In Ruby 1.9.1-p431, via
x = {"a" => 1.0, "b" => 2.0}
File.open('191-hash.dmp', 'w') { |f| Marshal.dump(x, f) }
In JRuby 1.7.1 (1.9 mode)
File.open('191-hash.dmp') { |f| Marshal.load(f) }
# => ArgumentError: invalid encoding in marshaling stream: f
I had to go back to Ruby 1.9.1 because 1.9.2 introduced a shortcut for UTF-8 / US-ASCII encoding (
You also need to have two items in the hash. I'm not super familiar with the Marshal format, but my guess from looking at the hex is that the second key doesn't specify the encoding to save space, and it's expected to just pick it up from the previous one. Pure speculation on my part though.
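If you want to look at the hex yourself, a small sketch like the following prints a marshaled value as space-separated hex bytes. `hex_bytes` is a made-up helper name, and the output on a current Ruby will not match what 1.9.1 wrote, since the encoding shortcut mentioned above changes the bytes.

```ruby
# Hypothetical helper: render a binary string as space-separated hex bytes.
def hex_bytes(str)
  str.bytes.map { |b| format('%02x', b) }.join(' ')
end

x = { 'a' => 1.0, 'b' => 2.0 }
dumped = Marshal.dump(x)

puts hex_bytes(dumped)   # raw bytes, easy to diff between interpreters
puts dumped.inspect      # same data with printable characters shown as text
```

Dumping the same value under two interpreters and diffing the two hex lines is a quick way to see where the streams diverge.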
The existing tests pass with the change to make it
OK, my failed tests led me to dig deeper. Here's what I think is happening now:
When MRI writes an encoding that is not US-ASCII / UTF-8, it writes it as a normal String. This means that it can use the
JRuby does not treat the encoding equivalent to a marshaled String, instead it ignores the string type marker
This means you can reproduce with a modern MRI, like
x = ['a', 'b'].map {|s| s.force_encoding('Shift_JIS')}
File.open('193-array', 'w') {|f| Marshal.dump(x, f)}
It's important that the two strings not be the same, otherwise they will be marked as total duplicates and will be handled correctly.
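The reproduction above can also be inspected in memory without writing a file. This is a rough sketch for MRI, and it assumes (my reading of MRI's behavior, not something confirmed in this thread) that the encoding name is written once and later strings refer back to it, so counting occurrences of the name in the dump is a cheap way to see that.

```ruby
# Two different strings with a non-UTF-8, non-US-ASCII encoding.
# dup keeps this working even when string literals are frozen.
x = %w[a b].map { |s| s.dup.force_encoding('Shift_JIS') }

dumped = Marshal.dump(x)
puts dumped.inspect

# How many times does the encoding name appear in the stream?
name_count = dumped.scan('Shift_JIS').size
puts "encoding name written #{name_count} time(s)"

# A reader that handles the stream correctly gets both encodings back.
restored = Marshal.load(dumped)
puts restored.map(&:encoding).inspect
```

Loading the same bytes in the interpreter under test and comparing `restored.map(&:encoding)` is one way to check whether the second string's encoding survives.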
Both marshaling and unmarshaling are affected in the same way. This means JRuby cannot read certain MRI data, but MRI should be able to read JRuby data, although it will be bigger than necessary. I did not test this however.
I tested this on master and it seems to work ok:
So I'm going to optimistically mark this as fixed (unsure of release, so I'll go with 1.7.5).
IIRC, this was fixed by commit 82cda78, which doesn't seem to have any merge notes. Maybe it was cherry-picked? If so, should be fixed back to 1.7.3.
When trying to use the tactful_tokenizer gem, you get this error. I've confirmed that it works fine in MRI 1.9.3-p327.
tactful_tokenizer: https://github.com/SlyShy/Tactful_Tokenizer (though it's unclear exactly which version was shipped as 0.0.2, so you're probably better off with the gem sources themselves)