The logic for StringIO#write in MRI calls rb_str_conv_enc, which calls rb_str_conv_enc_opts, which calls rb_econv_convert. Depending on the result from rb_econv_convert, it does one of three things: it grows the target buffer and retries (if the result is "destination buffer full"), it returns the newly transcoded buffer (if the result is "finished"), or it returns the original string and ignores the result altogether (all other results, including errors). This produces behavior that seems incorrect on MRI, and that raises an error on JRuby:
Clearly this string is no longer US-ASCII because of the multibyte UTF-8 characters, but this is how MRI handles the situation.
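The three outcomes described above can be modeled in plain Ruby with Encoding::Converter#primitive_convert, which returns the same kind of status symbols. This is only a rough sketch of the behavior, not MRI's actual C code; the method name conv_enc_like_mri and the tiny chunk size are made up for illustration.

```ruby
# Hypothetical helper modeling the fallback described above; not MRI's C code.
def conv_enc_like_mri(str, from, to, chunk = 4)
  ec  = Encoding::Converter.new(from, to)
  src = str.dup        # primitive_convert consumes its source buffer
  dst = String.new     # converted bytes accumulate here
  loop do
    # A nil offset appends to dst; chunk caps the bytes written per call.
    case ec.primitive_convert(src, dst, nil, chunk)
    when :destination_buffer_full
      next             # stand-in for "grow the buffer and retry"
    when :finished
      return dst.force_encoding(to)
    else
      return str       # any other result, errors included: hand back the input
    end
  end
end
```

With this model, asking for a UTF-8 to US-ASCII conversion of a string containing multibyte characters quietly returns the original UTF-8 string instead of raising.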
The basic problem is that our transcoder is set up to always raise an error whenever transcoding returns an error result. Our StringIO#write calls roughly equivalent logic in Transcoder.strConvEnc/Opts, but the logic differs in two important ways:
It never sees the result of the encoding and therefore does not fall back on the input string as in MRI.
The downstream code it calls in CharsetTranscoder always raises an error for failed conversion.
The challenge of fixing this is that most of the transcoding consumers in our codebase depend on the transcoder raising errors on its own. I tried to just remove the throw lastError in CharsetTranscoder.transcode, but many tests failed because they no longer errored.
The safest way to fix this would be to make a separate path for our strConvEncOpts that handles the error appropriately outside CharsetTranscoder, and make sure all paths using strConvEncOpts are using it properly.
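As a rough illustration of that split (in Ruby rather than the Java the real fix would touch), the sketch below has one core that returns a status instead of raising, with a strict wrapper for consumers that rely on errors and a lenient wrapper for the strConvEncOpts-style path. All three method names are hypothetical and do not correspond to JRuby's actual API.

```ruby
# Hypothetical core: reports the conversion status and never raises on bad input.
def transcode_with_result(str, from, to)
  ec  = Encoding::Converter.new(from, to)
  dst = String.new
  status = ec.primitive_convert(str.dup, dst)  # :finished or an error symbol
  [status, dst]
end

# Existing consumers keep their error-raising behavior.
def transcode_strict(str, from, to)
  status, out = transcode_with_result(str, from, to)
  raise EncodingError, "transcoding failed: #{status}" unless status == :finished
  out
end

# The MRI-like path inspects the result and falls back on the input string.
def str_conv_enc(str, from, to)
  status, out = transcode_with_result(str, from, to)
  status == :finished ? out : str
end
```

Keeping the raise in a wrapper means callers that depend on errors are untouched, while only the paths that need MRI's fallback see the result.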
When there's an error from encoding, MRI's version of this
function just returns the incoming bytes as its result. Ours used
a path that always raises an error when transcoding fails. The new
version uses a path more in line with MRI that requires an out
buffer and returns a result, so we can handle it appropriately.
I've been having problems with encodings since upgrading from JRuby 1.7.6 (I think the next version I tried after that was 1.7.8, but I definitely tried 1.7.11).
Reproduce in 1.7.18:
It works fine with 1.7.6 (and also on MRI 2.1.2). I'm not sure whether 1.7.11 had exactly the same issue; I just know that I had issues with encodings.
.to_html is crucial to trigger the error.