> Encoding::Converter.new(Encoding::UTF_8, Encoding::UTF_8_MAC)
Encoding::ConverterNotFoundError: code converter not found (UTF-8 to UTF8-MAC)
from org/jruby/RubyConverter.java:162:in `initialize'
from org/jruby/RubyConverter.java:135:in `initialize'
The previous implementation was quite slow. This leverages some of the
transcoding abilities built into Ruby 1.9 instead. It is roughly 96%
The round trip through UTF_8_MAC is needed because Ruby won't let you
transcode from UTF_8 to UTF_8. I chose the closest encoding I could
find as an intermediate.
To support UTF_8_MAC we'll need to port the whole transcoding subsystem. JRuby currently transcodes through Java's Charset logic, which does not support UTF_8_MAC.
My understanding of UTF_8_MAC is that it prefers decomposed sequences (a base character followed by combining characters) over single precomposed codepoints, so UTF_8 to UTF_8_MAC and back is not likely to round-trip in all cases.
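A quick way to check the round-trip concern on MRI is sketched below. The helper name `utf8_mac_roundtrip` is hypothetical, and the sketch only prints what the transcoder returns rather than assuming a result. On a JRuby build without UTF_8_MAC support I would expect the first `encode` call to fail with the same ConverterNotFoundError as above.

```ruby
# Hypothetical helper: transcode UTF-8 -> UTF8-MAC -> UTF-8 and return the result.
def utf8_mac_roundtrip(str)
  str.encode(Encoding::UTF_8_MAC).encode(Encoding::UTF_8)
end

precomposed = "\u00E9"   # single codepoint: e with acute accent
decomposed  = "e\u0301"  # base letter followed by a combining acute accent

[precomposed, decomposed].each do |s|
  result = utf8_mac_roundtrip(s)
  # Compare codepoints before and after to see whether the round trip changed them.
  puts "#{s.codepoints.inspect} -> #{result.codepoints.inspect} (unchanged: #{result == s})"
end
```

If either line reports `unchanged: false`, the intermediate encoding is altering valid input, not just cleaning up bad bytes.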
I would suggest that, instead of this hack, Rails use some version of the pure-Ruby String#scrub I implemented (and which I think @YorickPeterse improved) for this issue: rubinius/rubinius#2912
Note that this version does not successfully handle all bad characters on JRuby due to incompatibilities in the Charset-based transcoding pipeline (#1459), but for strings with malformed input or no errors, it will work fine and not have the error above.
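For illustration only, here is a minimal sketch of what a pure-Ruby scrub can look like. It is not the implementation from rubinius/rubinius#2912; the name `scrub_sketch` is hypothetical, and it assumes that `each_char` yields each invalid byte as its own "character", so it replaces bad bytes one at a time. That may produce more replacement characters than MRI's built-in String#scrub does for a truncated multi-byte sequence.

```ruby
# Hypothetical sketch of a pure-Ruby scrub; not the rubinius/rubinius#2912 code.
# Replaces every invalid byte in a UTF-8 string with +replacement+.
def scrub_sketch(str, replacement = "\uFFFD")
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Keep well-formed characters, swap out anything that fails validation.
  utf8.each_char.map { |ch| ch.valid_encoding? ? ch : replacement }.join
end

puts scrub_sketch("abc\xFFdef", "?")
```

Because it never builds an Encoding::Converter, this approach avoids the UTF_8_MAC lookup entirely.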
I will mark this as a bug for JRuby 9k, since by then we should have a proper port of MRI's transcoding logic.