[JRUBY-6668] StringScanner#scan_until spins forever on UTF-8 data #174

Closed
@sgonyea

While running the tests for the Ruby library 'mustache' (https://github.com/defunkt/mustache), one test in particular fails:

https://github.com/defunkt/mustache/blob/master/test/mustache_test.rb#L510-522

JRuby hangs while calling StringScanner#scan_until here:

https://github.com/defunkt/mustache/blob/master/lib/mustache/parser.rb#L231

You can reproduce the issue with the following:

require 'strscan'
regex = /(^[ \t]*)?\{\{/
text = "<h1>中文 test</h1>\n\n{{> utf8_partial}}\n"
text.force_encoding 'BINARY'
scanner = StringScanner.new(text)
scanner.scan_until(regex) # Fans spin up, and this method never returns.
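
For anyone who wants to run the reproduction without locking up a terminal, here is a rough sketch that wraps the same call in a time limit. The helper name `scan_with_timeout` and the 2-second limit are my own assumptions for illustration, not part of mustache or JRuby. I have not verified that `Timeout` can actually interrupt the spinning call on an affected JRuby build, so treat it as a convenience rather than a guarantee.

```ruby
# encoding: utf-8
require 'strscan'
require 'timeout'

# Hypothetical helper (an assumption, not part of any library): runs
# scan_until under a time limit and returns :timeout if the limit is hit.
def scan_with_timeout(text, regex, seconds = 2)
  scanner = StringScanner.new(text)
  Timeout.timeout(seconds) { scanner.scan_until(regex) }
rescue Timeout::Error
  :timeout
end

regex = /(^[ \t]*)?\{\{/

# Same input as the reproduction above, plus an ASCII-only variant.
# dup avoids mutating the string literal itself.
utf8_text  = "<h1>中文 test</h1>\n\n{{> utf8_partial}}\n".dup.force_encoding('BINARY')
ascii_text = "<h1>foo bar test</h1>\n\n{{> utf8_partial}}\n".dup.force_encoding('BINARY')

utf8_result  = scan_with_timeout(utf8_text, regex)
ascii_result = scan_with_timeout(ascii_text, regex)

p utf8_result
p ascii_result
```

On an interpreter without the bug, both calls should return the scanned text up to and including the first `{{`. A `:timeout` for the first call only would match the behavior described here.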

This seems to happen regardless of whether JRuby is in 1.8 or 1.9 mode. I am running this test like so:

JRUBY_OPTS=--1.9 ruby -I"lib:test" test/mustache_test.rb -n test_utf8 -v

I've also run it with: JRUBY_OPTS="--1.9 LC_ALL=en_US.UTF-8"

It appears that this affects multibyte UTF-8 characters. If I replace the Chinese characters with "foo bar", then there is no problem.
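
To illustrate the difference between the two inputs, here is a small sketch of how the strings look once they are forced to BINARY. It only shows that the Chinese characters become bytes outside the ASCII range while "foo bar" does not. It is not a diagnosis of where the scanner loops, and the variable names are just for illustration.

```ruby
# encoding: utf-8

chinese = "中文"
plain   = "foo bar"

# Each of these two characters takes three bytes in UTF-8.
char_count = chinese.length    # characters, as UTF-8
byte_count = chinese.bytesize  # bytes

# After forcing BINARY, every byte counts as its own character.
binary_chinese = chinese.dup.force_encoding('BINARY')
binary_plain   = plain.dup.force_encoding('BINARY')

puts "UTF-8 characters:  #{char_count}"
puts "bytes:             #{byte_count}"
puts "BINARY characters: #{binary_chinese.length}"
puts "ASCII only (中文):    #{binary_chinese.ascii_only?}"
puts "ASCII only (foo bar): #{binary_plain.ascii_only?}"
```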

I moved this issue here, as JIRA was butchering the UTF-8:

http://jira.codehaus.org/browse/JRUBY-6668?jwupdated=35361&focusedCommentId=299014#comment-299014
