Support for ISO-8859-16 #1214

Closed
ftomassetti opened this Issue Nov 10, 2013 · 9 comments

@ftomassetti

Running:

code = IO.read('text_iso_8859_16',{ :encoding => 'ISO-8859-16', :mode => 'rb'})
code = code.encode('UTF-8')

On JRuby 1.7.6 this gives me:

Encoding::ConverterNotFoundError: code converter not found for ISO-8859-16
   encode at org/jruby/RubyString.java:7597
   (root) at bugtest.rb:2

While on Ruby 2.0 it runs flawlessly...

@ftomassetti

It also gives me an error with Ruby 1.9.3-p448:

bugtest.rb:2:in `encode': code converter not found (ISO-8859-16 to UTF-8) (Encoding::ConverterNotFoundError)
from bugtest.rb:2:in `<main>'

I would say the Ruby error is clearer.

I am still confused: the file is loaded, but it cannot be converted to UTF-8? Does that make any sense?

@headius
Member
headius commented Nov 11, 2013

First off, it does not appear that OpenJDK has support for ISO-8859-16 in its charset subsystem, which we use for character transcoding. The only way to get support for that encoding would be for us to incorporate a third-party implementation of 8859-16.

We can certainly improve the error, but I'd like to actually add support somehow. That may mean finally implementing (porting) the transcoding logic from MRI, so we have identical encoding support (not going to happen until JRuby 9k), or pulling in some third-party 8859-16 charset impl.
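
For readers wondering why the file loads but the conversion fails: passing an :encoding option when reading only labels the bytes with that encoding, while encode is the step that needs an actual converter. Below is a minimal Ruby sketch of that distinction. It is not JRuby's internal code, and the helper name try_transcode is hypothetical:

```ruby
# Hypothetical helper: attempt a transcode and report the outcome instead of raising.
def try_transcode(str, target)
  str.encode(target)
rescue Encoding::ConverterNotFoundError => e
  "no converter: #{e.message}"
end

# One byte (0xFC), tagged as ISO-8859-16. Tagging never touches the bytes,
# so it works even on a runtime that has no converter for this encoding.
tagged = "\xFC".b.force_encoding('ISO-8859-16')
puts tagged.encoding        # ISO-8859-16
puts tagged.bytes.inspect   # [252]

# Transcoding is where a converter is required. On a runtime without one,
# this prints the "no converter" message instead of UTF-8 text.
puts try_transcode(tagged, 'UTF-8')
```

Rescuing Encoding::ConverterNotFoundError this way lets a script report which encoding pair is unsupported rather than aborting.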

@headius headius closed this in 8bd4963 Nov 11, 2013
@headius
Member
headius commented Nov 11, 2013

Please test out jruby/jruby@jruby-1_7 or jruby/jruby@master where I have implemented an ISO-8859-16 charset.

@ftomassetti

I cloned the repo but I got an error while running maven. I will try again checking out your commit.

@ftomassetti

OK, after checking out the exact commit (8bd4963) I can build JRuby. Then I ran bin/irb but got the same error. Should I somehow specify where to load the standard libraries from?

Putting the code in a script and running bin/jruby test.rb I got instead:
ISO_8859_16.java:73:in `decodeLoop': java.lang.ArrayIndexOutOfBoundsException: -4
from CharsetDecoder.java:561:in `decode'
from CharsetTranscoder.java:484:in `transcode'
from CharsetTranscoder.java:319:in `primitiveConvert'
from CharsetTranscoder.java:280:in `transcode'
from CharsetTranscoder.java:236:in `transcode'
from EncodingUtils.java:873:in `transcodeLoop'
from EncodingUtils.java:801:in `strTranscode0'
from EncodingUtils.java:736:in `strTranscode'
from EncodingUtils.java:707:in `strEncode'
from RubyString.java:7599:in `encode'
from RubyString$INVOKER$i$encode.gen:-1:in `call'
from CachingCallSite.java:326:in `cacheAndCall'
from CachingCallSite.java:170:in `call'
from test.rb:3:in `__file__'
from test.rb:-1:in `load'
from Ruby.java:811:in `runScript'
from Ruby.java:804:in `runScript'
from Ruby.java:673:in `runNormally'
from Ruby.java:522:in `runFromMain'
from Main.java:395:in `doRunFromMain'
from Main.java:290:in `internalRun'
from Main.java:217:in `run'
from Main.java:197:in `main'

Let me know if I can help with testing this.

@mkristian
Member

On master, commit ecd3f56 corrupted a method :(
The jruby-1_7 branch seems OK.

@headius
Member
headius commented Nov 12, 2013

I'm looking into the breakage.

@headius
Member
headius commented Nov 12, 2013

Signed bytes strike again... I have pushed an additional fix to both branches for byte values > 127 which should fix the additional issue you found.
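
As an illustration of the signed-byte pitfall (a sketch in Ruby, not the actual fix, which is not shown in this thread): Java's byte type is signed, so a byte value above 127 becomes negative when used directly as a table index. Assuming the offending byte was 0xFC, that would produce the -4 reported in the earlier ArrayIndexOutOfBoundsException. Masking with 0xFF is the usual way to recover the unsigned value:

```ruby
# A single byte with value 0xFC (above 127).
byte = "\xFC".b

# 'c' reads the byte as an 8-bit signed integer, the way a Java `byte` holds it.
signed = byte.unpack('c').first     # => -4

# 'C' reads the same byte as unsigned.
unsigned = byte.unpack('C').first   # => 252

# Masking the signed value with 0xFF yields a safe, non-negative table index.
masked = signed & 0xFF              # => 252

puts signed, unsigned, masked
```

Note that a Ruby Array would silently accept the index -4 and count from the end, whereas a Java array access throws, which is why the problem surfaced as an exception in the decoder.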

I'll see if I can improve the tests for this encoding.

@ftomassetti

Running fine for me. Thanks!
