-
-
Notifications
You must be signed in to change notification settings - Fork 922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encoding issue after v1.7.6 #2419
Comments
I've reproduced this on 1.7.18, but it works as intended on HEAD as of fb9199f. |
This is an ugly one to fix in 1.7. The logic for StringIO#write in MRI calls rb_str_conv_env, which calls rb_str_conv_enc_opts, which calls rb_econv_convert. Given the result from rb_econv_convert, it either increases the buffer size for the target buffer (if the result is "destination buffer full"), or it returns the newly-transcoded buffer (if the result is "finished"), or it returns the original string and ignores the result altogether (all other results, including errors). This results in behavior that seems incorrect on MRI, but which raises an error on JRuby: Code: sio = StringIO.new('123'.force_encoding('US-ASCII'))
p sio.string.encoding
sio.write('øøø')
p sio.string
p sio.string.encoding Output on MRI:
Clearly this string is no longer US-ASCII because of the multibyte UTF-8 characters, but this is how MRI handles the situation. The basic problem is that our transcoder is set up to always raise an error if there's a result from transcoding. Our StringIO#write calls roughly equivalent logic in Transcoder.strConvEnc/Opts, but the logic differs in two important ways:
The challenge of fixing this is that most of the transcoding consumers in our codebase depend on the transcoder raising errors on its own. I tried to just remove the The safest way to fix this would be to make a separate path for our strConvEncOpts that handles the error appropriately outside CharsetTranscoder, and make sure all paths using strConvEncOpts are using it properly. |
When there's an error from encoding, MRI's version of this function just returns the incoming bytes as its result. Ours used a path that always raises an error when transcoding fails. The new version uses a path more in line with MRI that requires an out buffer and returns a result, so we can handle it appropriately. Fixes #2419.
I have pushed a fix to branch test-fix-2419 for you to try out. It appears to pass MRI tests and RubySpec locally. |
I am testing it now, I'll know tomorrow if there are still any issues, will let you know. |
So far so good. |
@headius that worked, no more errors. You can close this. |
So I've been having problems with encodings after upgrading from JRuby 1.7.6 (I think the next version I tried after that was 1.7.8, but surely 1.7.11).
Reproduce in 1.7.18:
It works fine with 1.7.6 (also works on MRI 2.1.2). I'm not sure if 1.7.11 had the exact same issue, I just know that I had issues with encodings.
Note:
.to_html
is crucial to trigger the error.The text was updated successfully, but these errors were encountered: