String encoding #1198

Closed
carlosmrce opened this Issue Oct 31, 2013 · 20 comments

4 participants

@carlosmrce

Can someone tell me what's wrong with the following code? I'm developing an app on TorqueBox, but I'm running into weird encoding errors!

My system encoding is Windows-1252.

C:\>jruby -v
jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on Java HotSpot(TM) Client VM 1.6.0_16-b01 [Windows 7-x86]

--Test.rb

# encoding: utf-8

s = "OoaAçÇãÚú$%()"

puts s
puts s.encode("UTF-8")
puts s.encode("ISO-8859-1")

--Output
C:\Users\t0665011\testes>jruby test.rb
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaAþÃÒ┌·$%()

Thanks!
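The third output line is consistent with ISO-8859-1 bytes being drawn by a console that uses a different code page. A minimal sketch of that effect follows. It assumes the console code page is CP850, which the thread never states (only the system encoding, Windows-1252, is given), and `as_seen_in_console` is a hypothetical helper used only for illustration.

```ruby
# encoding: utf-8

# Hypothetical helper: reinterpret the raw bytes of a string as if a console
# using the given code page (assumed: CP850) were displaying them, then
# convert the result to UTF-8 so it can be inspected.
def as_seen_in_console(str, console_codepage = "CP850")
  str.dup.force_encoding(console_codepage).encode("UTF-8")
end

s = "OoaAçÇãÚú$%()"
latin1 = s.encode("ISO-8859-1")

# Prints OoaAþÃÒ┌·$%(), the same characters as the third output line above.
puts as_seen_in_console(latin1)
```

If this assumption holds, the string itself is encoded correctly and only its display in the console is off.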

@headius
Member
headius commented Oct 31, 2013

I'd recommend testing against 1.7.5+, where we did a bunch of encoding work.

@carlosmrce

@headius I tested against 1.7.5 and 1.7.6 and got the same results :(

Any ideas?

--1.7.5
C:\Users\t0665011\testes>jruby -v
jruby 1.7.5 (1.9.3p392) 2013-10-07 74e9291 on Java HotSpot(TM) Client VM 1.6.0_16-b01 [Windows 7-x86]

C:\Users\t0665011\testes>jruby test.rb
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaAþÃÒ┌·$%()
OoaAþÃÒ┌·$%()

--1.7.6
C:\Users\t0665011\testes>jruby -v
jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on Java HotSpot(TM) Client VM 1.6.0_16-b01 [Windows 7-x86]

C:\Users\t0665011\testes>jruby test.rb
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaAþÃÒ┌·$%()
OoaAþÃÒ┌·$%()

Thanks!

@headius
Member
headius commented Oct 31, 2013

Well the output is coming through as garbage on this bug report. Perhaps you can set up a repository that reproduces this and we can try on our end?

@headius
Member
headius commented Nov 4, 2013

OK, I'm seeing the exact same output from MRI 2.0.0 as from JRuby on your example script (JRuby master, but 1.7.5+ should be the same). Not sure if this will paste right, but...

system ~/projects/jruby/tmp/jruby_encoding $ ruby2.0.0 test.rb 
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaA?????$%()

system ~/projects/jruby/tmp/jruby_encoding $ jruby test.rb 
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaA?????$%()

If you are actually seeing a difference between JRuby and MRI, perhaps you can add a screenshot to that repository? I can't reproduce here.

My system: OS X 10.8.x, JRuby 9000, Java 7u40, system encoding = UTF-8.

@carlosmrce

@headius You actually got the output I expected! I installed Ruby 2.0 on my Windows machine and got the correct results.

Here's the print screen for the MRI
https://github.com/carlosmrce/jruby_encoding/blob/master/ruby.png

and here's the print screen for JRuby
https://github.com/carlosmrce/jruby_encoding/blob/master/jruby.png

I'm getting all sorts of errors in my app when I use String methods (unpack, gsub, ...) and I think it's all related to this issue.

Thanks.

@carlosmrce

@headius I tried Java 1.7.0_45-b18 and got the same result :-(

@enebo
Member
enebo commented Nov 4, 2013

@carlosmrce Can you add -J-Dfile.encoding=UTF-8 to your command line? It might be that on the Windows console the default encoding is CP{something} and it is trying to transcode the UTF-8 strings into the Windows code page encoding.
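One way to check which encodings are actually in effect is to print them from the running interpreter. The following is only a diagnostic sketch: `encoding_report` is a hypothetical helper, and the JVM property is read only when the script runs under JRuby.

```ruby
# Hypothetical helper: collect the encodings the interpreter reports.
# Values that are not set (nil) are shown as "(not set)".
def encoding_report
  report = {
    "source (__ENCODING__)"     => __ENCODING__,
    "Encoding.default_external" => Encoding.default_external,
    "Encoding.default_internal" => Encoding.default_internal,
    "STDOUT.external_encoding"  => STDOUT.external_encoding
  }
  if defined?(JRUBY_VERSION)
    # Only meaningful on JRuby: the JVM's default charset property.
    require "java"
    report["JVM file.encoding"] = java.lang.System.getProperty("file.encoding")
  end
  report
end

encoding_report.each do |name, value|
  puts "#{name}: #{value.nil? ? '(not set)' : value}"
end
```

Comparing this report with and without -J-Dfile.encoding=UTF-8 shows whether the flag reaches the JVM at all.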

@carlosmrce

@enebo Same result! I had already set JAVA_OPTS to "-Dfile.encoding=UTF-8" and the output is the same.

Thanks.

@carlosmrce

@headius @enebo I can put together a VM with the exact configuration I have on my system. Would that help?

@messivanio

@carlosmrce Try to set JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8 .

@carlosmrce

@messivanio Same result! I have tested my app on Linux and it works fine. I guess JRuby isn't meant to run on Windows :-( Thanks!

@headius
Member
headius commented Nov 26, 2013

@carlosmrce JRuby is definitely meant to run on Windows. Unfortunately we have not been able to reproduce, which makes it hard for us to fix it. Please help us find a way to reproduce, so we can fix the issue for you!

@carlosmrce

@headius Would it help if i give you access to a WinXP VM? I can install TeamViewer or Hamachi.

@enebo
Member
enebo commented Nov 27, 2013

I can reproduce this on Windows 7. I will see what I can figure out. We also have another long-standing issue open on Jira (cannot find it) that seems eerily similar to this one.

@enebo
Member
enebo commented Dec 4, 2013

Ok so I am just throwing this out there since I went about this all wrong...

If I capture the output to a file and compare JRuby against MRI on both Windows and MacOS, those chars are identical. If I run it without capturing, then on Windows all three lines look exactly the same, whereas when I view the saved output in an editor capable of displaying UTF-8 I see only the bottom one rendering properly.
If I run this on non-Windows (Linux/MacOS) I see output identical to what is saved in the file. So up to this point the only difference is how JRuby and MRI each render to a terminal, and only on Windows (whether mingw bash or cmd).

So I am convinced this is purely a terminal affordance thing. MRI is clearly doing something else when it writes to the console, because if I redirect MRI output on Windows to a file and cat it, and then cat what JRuby generates, they are identical as well. Sleuthing in MRI code now.
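The same comparison can be made without relying on any terminal or editor by dumping the bytes as hex. A small sketch, where `hex_bytes` is a hypothetical helper and the sample string is the accented part of the test script:

```ruby
# encoding: utf-8

# Hypothetical helper: render the raw bytes of a string as space-separated hex,
# so the result does not depend on how a console draws the characters.
def hex_bytes(str)
  str.unpack("C*").map { |b| "%02X" % b }.join(" ")
end

s = "çÇãÚú"
puts hex_bytes(s.encode("UTF-8"))       # C3 A7 C3 87 C3 A3 C3 9A C3 BA
puts hex_bytes(s.encode("ISO-8859-1"))  # E7 C7 E3 DA FA
```

If two implementations print the same hex here, any remaining difference is in how the console displays those bytes.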

@enebo
Member
enebo commented Dec 4, 2013

Aha rb_w32_write_console

@enebo enebo closed this in 5bd0798 Dec 4, 2013
@enebo
Member
enebo commented Dec 4, 2013

Amazing if this is totally fixed, but it seems to work and logically it seems like it should work. I discovered System.console(), which seems capable of taking a Java String and converting it to the underlying code page of the Windows console. I suspect the part which will fail is the facility for what to do on a transcoding error (Java likes to print '?'). That can be a followup bug if someone can make that happen.

Note in case this fails utterly...we can use WriteConsoleW and a couple of Windows methods using jnr-posix but that seems like an ugly set of code. Let's hope we don't need to go there.
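The general idea, converting a string to the console's code page before writing and substituting '?' where that code page has no matching character, can be imitated in plain Ruby. This is a sketch of the concept only, not the code from 5bd0798: `console_bytes` is a hypothetical helper, and CP850 as the console code page is an assumption.

```ruby
# encoding: utf-8

# Hypothetical helper: transcode text to an assumed console code page,
# replacing anything that code page cannot represent with "?".
def console_bytes(text, codepage = "CP850")
  text.encode(codepage, :invalid => :replace, :undef => :replace, :replace => "?")
end

# The accented characters all exist in CP850, so each becomes a single byte.
accented = console_bytes("çÇãÚú")
puts accented.unpack("C*").map { |b| "%02X" % b }.join(" ")  # 87 80 C6 E9 A3

# The euro sign has no CP850 code point, so it is replaced with "?" (0x3F).
puts console_bytes("€").unpack("C*").inspect                 # [63]
```

Writing these bytes to a console that really uses CP850 would show the original characters, while the '?' substitution mirrors the transcoding-error behaviour mentioned above.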

@carlosmrce

@enebo Thanks a lot! I'll test as soon as 1.7.9 is out. Thanks again.

@carlosmrce

@enebo @headius Just tested on my system and it's fixed! Thanks a lot! 👍
