[JRUBY-6668] StringScanner#scan_until spins forever on UTF-8 data #174

sgonyea opened this Issue May 17, 2012 · 8 comments



While running the tests for the Ruby library 'mustache' (https://github.com/defunkt/mustache), one test in particular is failing:


JRuby dies calling StringScanner#scan_until here:


You can reproduce the issue with the following:

require 'strscan'
regex = /(^[ \t]*)?\{\{/
text = "<h1>中文 test</h1>\n\n{{> utf8_partial}}\n"
text.force_encoding 'BINARY'
scanner = StringScanner.new(text)
scanner.scan_until(regex) # Fans spin up, and this method never returns.

This seems to happen regardless of whether or not JRuby is in 1.8 or 1.9 mode. I am running this test like so:

JRUBY_OPTS=--1.9 ruby -I"lib:test" test/mustache_test.rb -n test_utf8 -v

I've also run it with: JRUBY_OPTS="--1.9 LC_ALL=en_US.UTF-8"

This appears to affect only text containing multibyte UTF-8 characters. If I replace the Chinese characters with "foo bar", there is no problem.
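To make the difference between the two inputs concrete, here is a small sketch (not part of the original report) that counts the bytes at or above 0x80 in each string once it has been relabelled as BINARY. The helper name `high_bytes` is made up for illustration, the characters are written as `\u` escapes so the snippet does not depend on the source file's encoding, and it deliberately never calls scan_until.

```ruby
# Hypothetical helper: returns the bytes that fall outside the 7-bit ASCII range.
def high_bytes(str)
  str.bytes.select { |b| b >= 0x80 }
end

# "\u4E2D\u6587" is the same pair of Chinese characters used in the reproduction.
utf8_text  = "<h1>\u4E2D\u6587 test</h1>\n\n{{> utf8_partial}}\n".dup
ascii_text = "<h1>foo bar test</h1>\n\n{{> utf8_partial}}\n".dup

# Relabel the bytes as BINARY, as the reproduction does. The bytes themselves do not change.
utf8_text.force_encoding('BINARY')
ascii_text.force_encoding('BINARY')

puts high_bytes(utf8_text).length   # 6 (two characters, three bytes each)
puts high_bytes(ascii_text).length  # 0
```

Only the first string contains bytes that are not plain ASCII, which matches the observation that the "foo bar" variant returns normally.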

I moved this issue here, as JIRA was butchering the UTF-8:


JRuby Team member

Confirmed on master.

JRuby Team member

I suspect this is due to missing encoding logic in StringScanner.


Yeah, I traced its execution to inside Joni:


But I wasn't sure if it was a JRuby or a Joni bug. (http://jira.codehaus.org/browse/JRUBY-6668#comment-298976)

Thanks for checking into it.


Though I'm inclined to call this a Joni bug as well. The code in Matcher.java probably shouldn't always assume that enc.length will be positive, given that it seems to return -1 in some cases.
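As a rough illustration of why a negative length matters, here is a sketch in Ruby rather than Joni's actual Java code. Both `char_length` and `character_offsets` are hypothetical stand-ins: the first mimics an encoding's "length of the character starting at this byte" query and returns -1 for a byte that cannot start a character, the second walks a byte array one character at a time.

```ruby
# Hypothetical stand-in for an encoding's character-length query.
# Returns -1 when the byte cannot be the first byte of a UTF-8 character.
def char_length(byte)
  if byte < 0x80 then 1
  elsif byte >= 0xC2 && byte <= 0xDF then 2
  elsif byte >= 0xE0 && byte <= 0xEF then 3
  elsif byte >= 0xF0 && byte <= 0xF4 then 4
  else -1
  end
end

# Walks the bytes character by character and returns each start offset visited.
def character_offsets(bytes)
  offsets = []
  pos = 0
  while pos < bytes.length
    offsets << pos
    step = char_length(bytes[pos])
    # Defensive check: without it, a step of -1 would move pos backwards
    # instead of forwards and the loop might never reach the end.
    step = 1 if step < 1
    pos += step
  end
  offsets
end

p character_offsets([0x41, 0x80, 0x42]) # [0, 1, 2]
```

The guard costs one comparison per character, which is the kind of trade-off between safety and speed that a matcher's inner loop has to weigh.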

headius added a commit that closed this issue May 17, 2012
headius: Fix #174
[JRUBY-6668] StringScanner#scan_until spins forever on UTF-8 data

We were not preparing the regex properly. Added that, and the
given example completes normally.
headius closed this in c550df6 May 17, 2012
JRuby Team member

FWIW, Joni is a little flaky when you feed it data that's not encoded like the pattern expects, and we've run into many cases where it will get stuck looping forever. It probably could use more defensive checks, but they might take away from raw speed...
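For callers who want to avoid handing the scanner mislabelled data in the first place, one possible caller-side precaution is sketched below. This is not the fix applied in c550df6, and `retag_if_utf8` is a hypothetical helper name: it relabels a string as UTF-8 only when its bytes really are valid UTF-8, and otherwise hands the string back untouched.

```ruby
require 'strscan'

# Hypothetical helper: returns a UTF-8-tagged copy when the bytes are valid UTF-8,
# otherwise returns the original string unchanged.
def retag_if_utf8(text)
  candidate = text.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : text
end

# Same bytes as the reproduction, labelled BINARY.
binary = "<h1>\u4E2D\u6587 test</h1>\n\n{{> utf8_partial}}\n".dup.force_encoding(Encoding::BINARY)

scanner = StringScanner.new(retag_if_utf8(binary))
matched = scanner.scan_until(/(^[ \t]*)?\{\{/)
puts matched.encoding # UTF-8
```

This only helps when the BINARY label was not intentional; code that really needs to scan raw bytes would still depend on the matcher handling them correctly.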


Wow, you are awesome. That was fast.

JRuby Team member

I got lucky :)

JRuby Team member

I didn't think about 1.8 mode with this patch, and ended up causing it to be more strict than it's supposed to be. Fix coming to revert the logic for 1.8 mode only.
