Invalid UTF-8 not rejected #1989

Closed
headius opened this Issue · 1 comment

@headius

I just filed this bug with MRI for allowing invalid UTF-8 to propagate through the parser and several String operations while failing on others. Rubinius appears to behave the same for most of these cases...

https://bugs.ruby-lang.org/issues/7282

Given the following script:

# encoding: UTF-8

p("Hello, \x96 world!")
p("Hello, \x96 world!".encoding)
p(("Hello, \x96 world!".encode("UTF-8") rescue 'FAIL'))
"Hello, \x96 world!".each_char{|x| print x}
puts
p(("Hello, \x96 world!".encode("UTF-16") rescue 'FAIL'))
p(("Hello, \x96 world!".match /.*/ rescue 'FAIL'))
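Before deciding which operations should reject a string like this, it helps to confirm that the literal really is invalid UTF-8 and to find the offending byte. The following is a minimal sketch that uses only core `String` methods (`valid_encoding?`, `each_char`, `bytes`). The variable names are illustrative:

```ruby
# encoding: UTF-8

s = "Hello, \x96 world!"

# The literal is tagged UTF-8, but 0x96 is a lone continuation byte,
# so the string as a whole is not valid UTF-8.
puts s.encoding         # prints UTF-8
puts s.valid_encoding?  # prints false

# each_char steps over the bad byte as a one-byte "character" (which is
# why the transcripts print it as "?"). Collect the invalid characters
# and report their raw byte values.
bad = s.each_char.reject(&:valid_encoding?)
puts bad.flat_map(&:bytes).map { |b| format("0x%02X", b) }.inspect  # prints ["0x96"]
```

Calling `ord` on the bad character would itself raise, so the sketch reads the raw bytes instead.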

Here's output from Rubinius and other Ruby impls:

system ~/projects/jruby $ ../rubinius/bin/rbx -X19 -v blah.rb
rubinius 2.0.0rc1 (1.9.3 c2bceca6 2012-11-02 JI) [x86_64-apple-darwin11.4.0]
"Hello, \x96 world!"
#<Encoding:UTF-8>
"Hello, \x96 world!"
Hello, ? world!
"FAIL"
#<MatchData "Hello, ? world!">

system ~/projects/jruby $ ruby-1.9.3 -v blah.rb
ruby 1.9.3p253 (2012-07-04 revision 36307) [x86_64-darwin11.4.0]
blah.rb:9: warning: ambiguous first argument; put parentheses or even spaces
"Hello, \x96 world!"
#<Encoding:UTF-8>
"Hello, \x96 world!"
Hello, ? world!
"FAIL"
"FAIL"

system ~/projects/jruby $ ruby-2.0.0 -v blah.rb
ruby 2.0.0dev (2012-11-01 trunk 37415) [x86_64-darwin11.4.0]
blah.rb:9: warning: ambiguous first argument; put parentheses or even spaces
"Hello, \x96 world!"
#<Encoding:UTF-8>
"Hello, \x96 world!"
Hello, ? world!
"FAIL"
"FAIL"

system ~/projects/jruby $ jruby blah.rb
"Hello, \x96 world!"
#<Encoding:UTF-8>
"FAIL"
Hello, ? world!
"FAIL"
"FAIL"
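The third line differs between implementations. MRI documents `String#encode` to the string's own encoding as a no-op that copies the bytes without validating them, and this is consistent with MRI and Rubinius printing the string unchanged while JRuby fails. The following is a minimal sketch of that behaviour on a current MRI, contrasted with a real transcoding to UTF-16. The final `String#scrub` step assumes Ruby 2.1 or later, which postdates this report:

```ruby
# encoding: UTF-8

s = "Hello, \x96 world!"

# Same source and target encoding: MRI copies the bytes unchanged and
# does not validate them, so the invalid byte survives.
same = s.encode("UTF-8")
puts same.valid_encoding?   # prints false

# A real transcoding must decode every character, so the bad byte raises.
begin
  s.encode("UTF-16")
rescue Encoding::InvalidByteSequenceError => e
  puts "UTF-16: #{e.class}"
end

# Ruby 2.1+ only: replace invalid bytes explicitly instead of failing.
clean = s.scrub("?")
puts clean.valid_encoding?  # prints true
puts clean                  # prints Hello, ? world!
```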
@Peeja

Unless it's very clear what it should do instead (and to me it's not, but I haven't worked with encodings a whole lot), it seems to me we should wait until MRI decides how to handle this case, and then make sure it gets into RubySpec. Thoughts?

@dbussink closed this issue from a commit
@dbussink: Check string encoding validity before trying to match
Regular expression matching only should happen on validly encoded
strings, so check this state and raise exception if appropriate.

Fixes #1989
c4b9842
@dbussink closed this in c4b9842
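At the Ruby level, the idea in the commit message is to check encoding validity before matching and to raise if the string is invalid. The following is a rough sketch of that idea. It is not Rubinius' actual change, which lives in commit c4b9842. `checked_match` is a hypothetical helper, and the `ArgumentError` message imitates the wording current MRI uses for this case, which is an assumption:

```ruby
# encoding: UTF-8

# Hypothetical helper illustrating the check described in the commit:
# refuse to run a regexp match on a string whose bytes are not valid
# in its own encoding.
def checked_match(str, regexp)
  unless str.valid_encoding?
    raise ArgumentError, "invalid byte sequence in #{str.encoding}"
  end
  str.match(regexp)
end

p checked_match("Hello, world!", /w\w+/)  # prints #<MatchData "world">

begin
  checked_match("Hello, \x96 world!", /.*/)
rescue ArgumentError => e
  puts e.message  # prints invalid byte sequence in UTF-8
end
```

Because the check runs before matching, the call fails even when the pattern would never reach the invalid byte. This matches the "FAIL" that MRI prints for `match` in the transcripts.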