Description
While running the tests for the Ruby library 'mustache' (link: https://github.com/defunkt/mustache), one test in particular fails:
https://github.com/defunkt/mustache/blob/master/test/mustache_test.rb#L510-522
JRuby hangs when calling StringScanner#scan_until here:
https://github.com/defunkt/mustache/blob/master/lib/mustache/parser.rb#L231
You can reproduce the issue with the following:
require 'strscan'
regex = /(^[ \t]*)?\{\{/
text = "<h1>中文 test</h1>\n\n{{> utf8_partial}}\n"
text.force_encoding 'BINARY'
scanner = StringScanner.new(text)
scanner.scan_until(regex) # Fans spin up, and this method never returns.
This seems to happen whether JRuby is in 1.8 or 1.9 mode. I am running the test like so:
JRUBY_OPTS=--1.9 ruby -I"lib:test" test/mustache_test.rb -n test_utf8 -v
I've also run it with: JRUBY_OPTS="--1.9 LC_ALL=en_US.UTF-8"
It appears that the hang is triggered by the multibyte UTF-8 characters: if I replace the Chinese characters with "foo bar", there is no problem.
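For comparison, here is a minimal sketch of the ASCII-only control case. It assumes the same regex and the same BINARY re-tagging as the reproduction above; `scan_to_tag` is a hypothetical helper name used only for illustration and is not part of mustache.

```ruby
require 'strscan'

# Same pattern as in the reproduction above.
REGEX = /(^[ \t]*)?\{\{/

# Hypothetical helper: re-tags a copy of the text as BINARY, then scans
# up to and including the first "{{". Returns nil if there is no match.
def scan_to_tag(text)
  scanner = StringScanner.new(text.dup.force_encoding('BINARY'))
  scanner.scan_until(REGEX)
end

# Control input: the Chinese characters are replaced with "foo bar".
ascii = "<h1>foo bar test</h1>\n\n{{> utf8_partial}}\n"
p scan_to_tag(ascii)
```

With this input the call returns promptly for me. Passing the original string containing "中文" to the same helper should exercise the hanging path on an affected JRuby.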
I moved this issue here, as JIRA was butchering the UTF-8:
http://jira.codehaus.org/browse/JRUBY-6668?jwupdated=35361&focusedCommentId=299014#comment-299014