JRuby does not Handle UTF-8 Source Files #480
#!/bin/bash
S_NONASCII_SCRIPT="puts 'Tsitaatsõne'"
FILE_A="./a.rb"
FILE_B="./b.rb"
echo "$S_NONASCII_SCRIPT" > $FILE_A
echo "" > $FILE_B
echo "puts 'Start of test.'" >> $FILE_B
echo "require './a.rb'" >> $FILE_B
echo "puts 'End of test.\n'" >> $FILE_B
echo ""
echo "Running ./a.rb with plain ruby:"
echo ""
`which ruby` -Ku ./a.rb
echo ""
echo "Running ./b.rb with plain ruby:"
echo ""
`which ruby` -Ku ./b.rb
echo ""
echo ""
echo "Running jruby with non-ASCII script directly from console:"
echo ""
`which jruby` -Ku --1.9 -e "$S_NONASCII_SCRIPT"
echo ""
echo ""
echo "Running ./a.rb with jruby:"
echo ""
`which jruby` -Ku --1.9 ./a.rb
echo ""
echo ""
echo "Running ./b.rb with jruby:"
echo ""
`which jruby` -Ku --1.9 ./b.rb
echo ""
echo ""
# The console output of this script, if the bash comments and
# a single space were removed from the start of the lines:
#
# Running ./a.rb with plain ruby:
#
# Tsitaatsõne
#
# Running ./b.rb with plain ruby:
#
# Start of test.
# Tsitaatsõne
# End of test.\n
#
#
# Running jruby with non-ASCII script directly from console:
#
# Tsitaatsõne
#
#
# Running ./a.rb with jruby:
#
# SyntaxError: ./a.rb:2: invalid multibyte char (US-ASCII)
#
#
# Running ./b.rb with jruby:
#
# Start of test.
# SyntaxError: /home/ts2/tmp/xx5/test_case/./a.rb:2: invalid multibyte char (US-ASCII)
# require at org/jruby/RubyKernel.java:991
# (root) at /home/ts2/m_local/bin_p/Ruby/JRuby/paigaldatult/v_1_7_0/lib/ruby/shared/rubygems/custom_require.rb:1
# (root) at ./b.rb:3
As for adding an encoding header: thank you for the workaround suggestion. I'll keep it in mind as a measure of last resort.
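For readers unfamiliar with the workaround, here is a minimal sketch of what an encoding header (magic comment) looks like in a file such as a.rb. The method name `sample_text` is made up for illustration; only the first comment line is the actual workaround.

```ruby
# encoding: utf-8
# The magic comment above must be the first line of the file
# (or the second, directly after a shebang line).

# Hypothetical helper, used only to give the literal a name.
def sample_text
  'Tsitaatsõne'
end

puts sample_text            # the non-ASCII literal from a.rb
puts sample_text.encoding   # encoding the parser assigned to the literal
puts __ENCODING__           # source encoding of this file
```

With the header in place, the parser should no longer depend on -Ku to decide how to read the file, which is why it sidesteps the problem reported here.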
I have already moved back to "vanilla Ruby". I find it easier to recompile the classic, tried and tested Ruby than to risk wasting a considerable amount of work because the basics are missing. Besides, Java has 2-byte chars, and ironing out the quirks of Unicode support will probably take some time even for Java itself, the main language of the Java VM.
referenced this issue
Dec 20, 2013
I fixed this for the cases you present. There are a few other paths that do not obey the -K encoding when parsing, but I'm not sure whether they should get the same treatment. These are mostly evals: some transcode to a Java String before parsing, while others do not and may be getting the wrong encoding as a result. If someone wants to explore those possibilities, it would be really excellent.
In any case, I did not come up with a test case for this because -K is deprecated (and warns you of that in verbose mode) and because command-line stuff is a pain to test. If someone here would like to contribute a test, perhaps in test/test_command_line_switches.rb, we'd be happy to incorporate it.
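As a starting point for such a test, here is a rough sketch of the check it would need to make: write a non-ASCII script to a temporary file, run it with -Ku, and compare the output. The helper name `run_source_file` is hypothetical and is not taken from JRuby's test suite; how it would be wired into test/test_command_line_switches.rb is left open.

```ruby
# encoding: utf-8
require 'open3'
require 'rbconfig'
require 'tmpdir'

# Hypothetical helper (not part of JRuby's test suite): writes `source` to a
# temporary file and runs it with the current interpreter plus `switches`.
# Returns [stdout, stderr, success?].
def run_source_file(source, *switches)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'a.rb')
    File.binwrite(path, source) # write the UTF-8 bytes unchanged
    stdout, stderr, status = Open3.capture3(RbConfig.ruby, *switches, path)
    [stdout.force_encoding('UTF-8'), stderr, status.success?]
  end
end

# Same script as a.rb in the report, deliberately without an encoding header.
out, err, ok = run_source_file("puts 'Tsitaatsõne'\n", '-Ku')
puts(ok ? "ok: #{out}" : "failed: #{err}")
```

Using `RbConfig.ruby` means the child process is whichever interpreter runs the test, so the same check would exercise jruby when run under JRuby. The `--1.9` switch from the report is left out here because it is specific to JRuby.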
Not merged to master yet, but I'll do that now.