[JRUBY-6668] StringScanner#scan_until spins forever on UTF-8 data

While running the test suite of the Ruby library 'mustache' (https://github.com/defunkt/mustache), I found that one test in particular fails:

https://github.com/defunkt/mustache/blob/master/test/mustache_test.rb#L510-522

JRuby hangs when it calls StringScanner#scan_until here:

https://github.com/defunkt/mustache/blob/master/lib/mustache/parser.rb#L231

You can reproduce the issue with the following:

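``` ruby
# encoding: utf-8
# (the magic comment lets Ruby 1.9 parse the Chinese literal when this is run from a file)
require 'strscan'
regex = /(^[ \t]*)?\{\{/
text = "<h1>中文 test</h1>\n\n{{> utf8_partial}}\n"
text.force_encoding 'BINARY' # treat the UTF-8 source text as raw bytes
scanner = StringScanner.new(text)
scanner.scan_until(regex) # Fans spin up, and this method never returns.
```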
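For comparison, this is what the call should return on an implementation without the bug. The expected values below were worked out by hand from the byte layout of the string (they are not taken from a JRuby run): the first "{{" ends at byte 24, because "中文" takes 6 bytes in UTF-8.

``` ruby
# encoding: utf-8
require 'strscan'

regex = /(^[ \t]*)?\{\{/
text = "<h1>中文 test</h1>\n\n{{> utf8_partial}}\n".dup.force_encoding('BINARY')
scanner = StringScanner.new(text)

# scan_until returns everything from the scan position up to the end of the match:
# "<h1>" (4) + "中文" (6 bytes) + " test" (5) + "</h1>" (5) + "\n\n" (2) + "{{" (2) = 24 bytes
result = scanner.scan_until(regex)
p result.bytesize  # expected 24
p scanner.pos      # expected 24
p scanner.matched  # expected "{{"
p scanner.rest     # expected "> utf8_partial}}\n"
```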

This happens whether JRuby runs in 1.8 or 1.9 mode. I run the test like this:

`JRUBY_OPTS=--1.9 ruby -I"lib:test" test/mustache_test.rb -n test_utf8 -v`

I've also run it with `JRUBY_OPTS="--1.9 LC_ALL=en_US.UTF-8"`.

The hang appears to be triggered by multibyte UTF-8 characters: if I replace the Chinese characters with "foo bar", scan_until returns normally.
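A side-by-side check of the two inputs can make this easier to confirm. This is a sketch: `scan_with_timeout` is a hypothetical helper, and the 5-second limit is an arbitrary assumption. On an affected JRuby, `Timeout` may be unable to interrupt a match stuck inside the regex engine, in which case the script simply hangs on the UTF-8 case, which is still a useful signal.

``` ruby
# encoding: utf-8
require 'strscan'
require 'timeout'

# Same pattern as in the report: optional indentation at the start of a line, then "{{".
REGEX = /(^[ \t]*)?\{\{/

# Hypothetical helper: scans one string as BINARY and gives up after `seconds`.
# Returns the scanned prefix, nil if there is no match, or :timeout.
def scan_with_timeout(text, seconds = 5)
  scanner = StringScanner.new(text.dup.force_encoding('BINARY'))
  Timeout.timeout(seconds) { scanner.scan_until(REGEX) }
rescue Timeout::Error
  :timeout
end

ascii = "<h1>foo bar test</h1>\n\n{{> utf8_partial}}\n"
utf8  = "<h1>中文 test</h1>\n\n{{> utf8_partial}}\n"

{ 'ascii' => ascii, 'utf8' => utf8 }.each do |label, text|
  result = scan_with_timeout(text)
  if result == :timeout
    puts "#{label}: scan_until did not return within the time limit"
  else
    puts "#{label}: matched after #{result.bytesize} bytes"
  end
end
```

Only the UTF-8 input differs in containing bytes above 0x7F, so a timeout on that line alone points at the multibyte handling rather than at the pattern itself.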

I moved this issue here because JIRA was mangling the UTF-8 text:

http://jira.codehaus.org/browse/JRUBY-6668?jwupdated=35361&focusedCommentId=299014#comment-299014

[JRUBY-6668] StringScanner#scan_until spins forever on UTF-8 data #174