String encoding #1198

Closed
carlosmrce opened this Issue Oct 31, 2013 · 20 comments

4 participants
@carlosmrce

commented Oct 31, 2013

Can someone tell me what's wrong with the following code? I'm developing an app on TorqueBox, but I'm running into weird encoding errors!

My system encoding is Windows-1252.

C:>jruby -v
jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on Java HotSpot(TM) Client VM 1.6.0_16-b01 [Windows 7-x86]

--Test.rb

# encoding: utf-8

s = "OoaAçÇãÚú$%()"

puts s
puts s.encode("UTF-8")
puts s.encode("ISO-8859-1")

--Output
C:\Users\t0665011\testes>jruby test.rb
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaAþÃÒ┌·$%()
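The third output line is consistent with the ISO-8859-1 bytes being displayed by a console that uses code page 850, the OEM code page common on Western European Windows installs. That code page is an assumption; the report only states the system encoding (Windows-1252). A minimal Ruby sketch, with the hypothetical helper `as_seen_by_console`, reproduces that line without involving a console at all:

```ruby
# encoding: utf-8
# Sketch: show the ISO-8859-1 bytes of the test string, then how a console
# that assumes code page 850 (an assumption) would display those bytes.

def as_seen_by_console(bytes, console_encoding)
  # Copy the raw bytes, relabel them with the console's encoding (nothing is
  # converted yet), then transcode to UTF-8 so the result can be compared here.
  bytes.b.force_encoding(console_encoding).encode("UTF-8")
end

s = "OoaAçÇãÚú$%()"
latin1 = s.encode("ISO-8859-1")

puts latin1.bytes.map { |b| format("%02X", b) }.join(" ")
puts as_seen_by_console(latin1, "IBM850")
```

Under that assumption the helper returns `OoaAþÃÒ┌·$%()`: the bytes Ruby produced for ISO-8859-1 are correct, and only their display differs.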

Thanks!

@headius

Member

commented Oct 31, 2013

I'd recommend testing against 1.7.5+, where we did a bunch of encoding work.

@carlosmrce

Author

commented Oct 31, 2013

@headius I tested against 1.7.5 and 1.7.6 and got the same results :(

Any ideas?

--1.7.5
C:\Users\t0665011\testes>jruby -v
jruby 1.7.5 (1.9.3p392) 2013-10-07 74e9291 on Java HotSpot(TM) Client VM 1.6.0_16-b01 [Windows 7-x86]

C:\Users\t0665011\testes>jruby test.rb
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaAþÃÒ┌·$%()
OoaAþÃÒ┌·$%()

--1.7.6
C:\Users\t0665011\testes>jruby -v
jruby 1.7.6 (1.9.3p392) 2013-10-22 6004147 on Java HotSpot(TM) Client VM 1.6.0_16-b01 [Windows 7-x86]

C:\Users\t0665011\testes>jruby test.rb
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaAþÃÒ┌·$%()
OoaAþÃÒ┌·$%()

Thanks!

@headius

Member

commented Oct 31, 2013

Well, the output is coming through as garbage on this bug report. Perhaps you can set up a repository that reproduces this, and we can try it on our end?

@carlosmrce

Author

commented Oct 31, 2013

@headius

Member

commented Nov 4, 2013

Ok, I'm seeing the exact same output from MRI 2.0.0 as from JRuby on your example script (JRuby master, but 1.7.5+ should be the same). Not sure if this will paste right, but...

system ~/projects/jruby/tmp/jruby_encoding $ ruby2.0.0 test.rb 
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaA?????$%()

system ~/projects/jruby/tmp/jruby_encoding $ jruby test.rb 
OoaAçÇãÚú$%()
OoaAçÇãÚú$%()
OoaA?????$%()
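On a UTF-8 terminal the same five ISO-8859-1 bytes are not valid UTF-8, which is consistent with the `?????` in the third line. A rough analogy in Ruby, using `String#scrub` (Ruby 2.1+) as a stand-in for whatever replacement a real terminal performs:

```ruby
# encoding: utf-8
# Sketch: relabel the ISO-8859-1 bytes as UTF-8 (no conversion), which is
# roughly what a UTF-8 terminal receives. String#scrub needs Ruby 2.1+.
s = "OoaAçÇãÚú$%()"
latin1_as_utf8 = s.encode("ISO-8859-1").force_encoding("UTF-8")

puts latin1_as_utf8.valid_encoding?  # false: E7 C7 E3 DA FA do not form valid UTF-8
puts latin1_as_utf8.scrub("?")       # each invalid byte is replaced by "?"
```

Here `scrub("?")` yields `OoaA?????$%()`, one `?` per Latin-1 byte.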

If you are actually seeing a difference between JRuby and MRI, perhaps you can add a screenshot to that repository? I can't reproduce it here.

My system: OS X 10.8.x, JRuby 9000, Java 7u40, system encoding = UTF-8.

@carlosmrce

Author

commented Nov 4, 2013

@headius You actually got the output I expected! I installed Ruby 2.0 on my Windows machine and got the correct results.

Here's the screenshot for MRI:
https://github.com/carlosmrce/jruby_encoding/blob/master/ruby.png

and here's the screenshot for JRuby:
https://github.com/carlosmrce/jruby_encoding/blob/master/jruby.png

I'm getting all sorts of errors in my app when I use String methods (unpack, gsub, ...) and I think they are all related to this issue.

Thanks.

@carlosmrce

Author

commented Nov 4, 2013

@headius I tried Java 1.7.0_45-b18 and got the same result :-(

@enebo

Member

commented Nov 4, 2013

@carlosmrce Can you add -J-Dfile.encoding=UTF-8 to your command line? It might be that the default encoding of the Windows console is CP{something} and JRuby is trying to transcode the UTF-8 strings into that Windows code page?
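To illustrate the mismatch described here: the same character is encoded as different bytes in UTF-8, in Windows-1252 (the reporter's system encoding), and in code page 850 (assumed here to be the console's code page), so a console that guesses the wrong encoding shows different characters. A small Ruby sketch:

```ruby
# encoding: utf-8
# Sketch: the same character is a different byte sequence in each encoding,
# so a console must know (or guess) which one it is receiving.
ch = "ç"

%w[UTF-8 Windows-1252 IBM850].each do |enc|
  hex = ch.encode(enc).bytes.map { |b| format("%02X", b) }.join(" ")
  puts "#{enc.ljust(12)} #{hex}"
end

# The UTF-8 bytes of "ç" (C3 A7) as displayed by a console assuming IBM850:
misread = ch.b.force_encoding("IBM850").encode("UTF-8")
puts misread
```

With these assumptions, `ç` is `C3 A7` in UTF-8, `E7` in Windows-1252 and `87` in IBM850, and the UTF-8 bytes shown as IBM850 come out as `├º`.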

@carlosmrce

Author

commented Nov 4, 2013

@enebo Same result! I had already set JAVA_OPTS to "-Dfile.encoding=UTF-8" and the output is the same.
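One way to check whether a setting like `-Dfile.encoding` changes anything at the Ruby level is to print the encodings Ruby itself reports. A sketch using only core Ruby methods; the helper name `encoding_report` is hypothetical, and how JRuby derives these values from Java system properties is not covered here:

```ruby
# Sketch: report the encodings Ruby itself is using, to check whether a
# setting such as -Dfile.encoding reached the Ruby level at all.
def encoding_report
  {
    source:           __ENCODING__,               # encoding of this source file
    default_external: Encoding.default_external,  # default for data read from IO
    default_internal: Encoding.default_internal,  # nil unless set (e.g. with -E or -U)
    stdout_external:  $stdout.external_encoding,  # nil: strings are written without transcoding
    locale_charmap:   Encoding.locale_charmap     # charmap reported by the OS locale
  }
end

encoding_report.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Running it with and without the option and comparing the two reports shows whether the option had any visible effect on Ruby's own encoding settings.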

Thanks.

@carlosmrce

Author

commented Nov 7, 2013

@headius @enebo I can put together a VM with the exact configuration I have on my system. Would that help?

@messivanio


commented Nov 19, 2013

@carlosmrce Try setting JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8.

@carlosmrce

Author

commented Nov 20, 2013

@messivanio Same result! I have tested my app on Linux and it works fine. I guess JRuby isn't meant to run on Windows :-( Thanks!

@headius

Member

commented Nov 26, 2013

@carlosmrce JRuby is definitely meant to run on Windows. Unfortunately we have not been able to reproduce, which makes it hard for us to fix it. Please help us find a way to reproduce, so we can fix the issue for you!

@carlosmrce

Author

commented Nov 27, 2013

@headius Would it help if I give you access to a WinXP VM? I can install TeamViewer or Hamachi.

@enebo

Member

commented Nov 27, 2013

I can reproduce this on Windows 7. I will see what I can figure out. We also have another issue open on Jira (I cannot find it) which has been open for a long time and seems eerily similar to this one.

@enebo

Member

commented Dec 4, 2013

Ok so I am just throwing this out there since I went about this all wrong...

If I capture the output to a file and compare JRuby and MRI on both Windows and MacOS, those chars are identical. If I run it without capturing, then on Windows all three lines look exactly the same, whereas when I view the saved output in an editor capable of displaying UTF-8, only the bottom one renders properly.
If I run this on non-Windows (Linux/MacOS), I see output identical to what is saved to the file. So up to this point, the only difference is how JRuby and MRI render to a terminal, and only on Windows (whether MinGW bash or cmd).

So I am convinced this is purely a terminal affordance thing. MRI is clearly doing something else, because if I redirect MRI output on Windows to a file and cat it, and then cat what JRuby generates, they are identical as well. Sleuthing in MRI code now.
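The conclusion is that the bytes are the same everywhere and only the console rendering differs. One policy that follows from it, sketched as a hypothetical helper (`console_write` and `RecordingIO` are illustrative names, not JRuby's actual fix, and IBM850 is an assumed console code page): transcode only when writing to a terminal, and pass redirected output through byte for byte.

```ruby
# encoding: utf-8
# Hypothetical helper (not JRuby's actual fix): transcode only when the
# destination is a terminal; files and pipes get the original bytes.
# IBM850 is an assumed console code page.
def console_write(io, str, console_encoding = "IBM850")
  out =
    if io.tty?
      # Characters the code page cannot represent become "?" instead of raising.
      str.encode(console_encoding, undef: :replace, replace: "?")
    else
      str
    end
  io.write(out)
  out
end

# Minimal IO stand-in: reports whether it is a tty and records raw bytes.
class RecordingIO
  attr_reader :written

  def initialize(tty)
    @tty = tty
    @written = "".b
  end

  def tty?
    @tty
  end

  def write(s)
    @written << s.b
    s.bytesize
  end
end

redirected = RecordingIO.new(false)  # behaves like a file or pipe
console    = RecordingIO.new(true)   # behaves like an interactive console

console_write(redirected, "ç ✓\n")
console_write(console, "ç ✓\n")

p redirected.written.bytes  # UTF-8 bytes, unchanged
p console.written.bytes     # IBM850 bytes, with "✓" replaced by "?"
```

Redirected output keeps the UTF-8 bytes, so captured files stay identical across implementations, and only the terminal path is converted to the console's code page.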

@enebo

Member

commented Dec 4, 2013

Aha: rb_w32_write_console

@enebo enebo closed this in 5bd0798 Dec 4, 2013

@enebo

Member

commented Dec 4, 2013

Amazing if this is totally fixed, but it seems to work, and logically it should. I discovered System.console(), which seems capable of taking a Java String and converting it to the underlying code page of the Windows console. I suspect the part that will fail is what to do on a transcoding error (Java likes to print '?'). That can be a follow-up bug if someone can make that happen.
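Ruby exposes the same choice for characters the console code page cannot represent: raise an error, or substitute a placeholder. A sketch with `String#encode`, assuming IBM850 as the console code page and using `✓` as an example of an unmappable character:

```ruby
# encoding: utf-8
# Sketch: two policies for text the console code page cannot represent.
# IBM850 is an assumed console code page; "✓" (U+2713) has no mapping in it.
text = "OoaAçÇãÚú ✓"

# Strict: by default String#encode raises on an unmappable character.
begin
  text.encode("IBM850")
rescue Encoding::UndefinedConversionError => e
  puts "strict: #{e.class}"
end

# Lenient: substitute "?", comparable to the '?' Java prints.
lenient = text.encode("IBM850", undef: :replace, replace: "?")
puts "lenient: #{lenient.encode('UTF-8')}"
```

The accented letters all exist in code page 850, so only the check mark is affected; the lenient policy turns it into `?` while the strict one stops with an exception.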

Note, in case this fails utterly: we can use WriteConsoleW and a couple of other Windows functions through jnr-posix, but that seems like an ugly set of code. Let's hope we don't need to go there.
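As I understand it, WriteConsoleW is the wide-character form of the Win32 `WriteConsole` function: it takes UTF-16 text plus a count of 16-bit characters, so no console code page conversion is involved on the caller's side. Calling it requires Windows (for example through jnr-posix, as suggested above), so this sketch covers only the encoding step; `to_wide` is a hypothetical helper:

```ruby
# encoding: utf-8
# Sketch of the conversion step only: WriteConsoleW takes UTF-16 text
# (little-endian on Windows) and a count of 16-bit code units.
def to_wide(str)
  wide = str.encode("UTF-16LE")
  [wide, wide.bytesize / 2]  # code units, not bytes
end

s = "OoaAçÇãÚú$%()"
wide, units = to_wide(s)
puts "UTF-8 bytes:       #{s.bytesize}"
puts "UTF-16 code units: #{units}"
```

For the test string this gives 18 UTF-8 bytes but 13 UTF-16 code units; the code-unit count is what the character-count argument expects.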

@carlosmrce

Author

commented Dec 4, 2013

@enebo Thanks a lot! I'll test as soon as 1.7.9 is out. Thanks again.

@carlosmrce

Author

commented Dec 7, 2013

@enebo @headius Just tested on my system and it's fixed! Thanks a lot! 👍
