Closed
Description
When compiling mruby with UTF-8 strings enabled (e.g. by setting CFLAGS="-DMRB_UTF8_STRING"), mruby incorrectly computes the length of strings that contain invalid UTF-8 byte sequences.
mruby revision
$ git rev-parse HEAD
69482dbc8e590ed66f0944e9b48c4f9c2f83c873
$ git show
commit 69482dbc8e590ed66f0944e9b48c4f9c2f83c873 (HEAD -> master, origin/master, origin/HEAD)
Merge: 6587269a f7ff4810
Author: Yukihiro "Matz" Matsumoto <matz@ruby.or.jp>
Date: Fri Jan 8 23:10:49 2021 +0900
Merge pull request #5265 from shuujii/reapply-116e128b-because-it-is-back-at-456878ba
Reapply 116e128b because it is back at 456878ba
Reproduction steps
rake clean
CFLAGS="-DMRB_UTF8_STRING" rake
Executing in mirb:
$ ./bin/mirb
mirb - Embeddable Interactive Ruby Shell
> xs = [192, 128].pack("C*")
=> "��"
> xs.bytes
=> [192, 128]
> xs.length
=> 1
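The `1` above matches what you get if each character's byte length is taken from its lead byte alone. 0xC0 has the bit pattern of a two-byte lead, so both bytes are consumed as one character, even though `C0 80` is an overlong, and therefore invalid, encoding. The sketch below is a hypothetical illustration of that lead-byte-only rule, not mruby's actual source; `naive_utf8_char_count` is an illustrative name.

```ruby
# Hypothetical naive counter (NOT mruby's code): sizes each character from
# its lead byte only and never validates the continuation bytes.
def naive_utf8_char_count(str)
  bytes = str.bytes
  i = 0
  count = 0
  while i < bytes.size
    b0 = bytes[i]
    i += if b0 >= 0xF0 then 4      # looks like a 4-byte lead
         elsif b0 >= 0xE0 then 3   # looks like a 3-byte lead
         elsif b0 >= 0xC0 then 2   # looks like a 2-byte lead (includes invalid 0xC0/0xC1)
         else 1                    # ASCII or stray continuation byte
         end
    count += 1
  end
  count
end

puts naive_utf8_char_count([192, 128].pack("C*")) # prints 1, like mirb above
```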
Reference MRI execution
$ irb
[2.6.6] > xs = [192, 128].pack("C*")
=> "\xC0\x80"
[2.6.6] > xs.bytes
=> [192, 128]
[2.6.6] > xs.length
=> 2
With forced UTF-8 encoding:
$ irb
[2.6.6] > xs = [192, 128].pack("C*")
=> "\xC0\x80"
[2.6.6] > xs = xs.force_encoding(Encoding::UTF_8)
=> "\xC0\x80"
[2.6.6] > xs.encoding
=> #<Encoding:UTF-8>
[2.6.6] > xs.length
=> 2
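MRI reports 2 in both cases, i.e. each byte that does not begin a well-formed UTF-8 sequence is counted as a character on its own. A minimal sketch of such a counting rule follows, assuming the well-formed byte ranges from RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). `utf8_char_count` and `valid_utf8_char_len` are hypothetical helpers, not part of mruby or MRI.

```ruby
# Hypothetical helper: returns the byte length of the well-formed UTF-8
# character starting at bytes[i], or nil if no valid character starts there.
def valid_utf8_char_len(bytes, i)
  b0 = bytes[i]
  return 1 if b0 < 0x80 # ASCII
  # Expected length plus the allowed range of the second byte; the narrowed
  # ranges reject overlong forms, UTF-16 surrogates and values > U+10FFFF.
  len, lo, hi =
    case b0
    when 0xC2..0xDF then [2, 0x80, 0xBF]
    when 0xE0       then [3, 0xA0, 0xBF]
    when 0xED       then [3, 0x80, 0x9F]
    when 0xE1..0xEF then [3, 0x80, 0xBF]
    when 0xF0       then [4, 0x90, 0xBF]
    when 0xF1..0xF3 then [4, 0x80, 0xBF]
    when 0xF4       then [4, 0x80, 0x8F]
    end
  return nil if len.nil?               # 0x80..0xC1 and 0xF5..0xFF never start a char
  return nil if i + len > bytes.size   # truncated sequence
  return nil unless bytes[i + 1].between?(lo, hi)
  (i + 2...i + len).each do |j|        # remaining continuation bytes
    return nil unless bytes[j].between?(0x80, 0xBF)
  end
  len
end

# Hypothetical length function: an invalid byte counts as one character.
def utf8_char_count(str)
  bytes = str.bytes
  i = 0
  count = 0
  while i < bytes.size
    i += valid_utf8_char_len(bytes, i) || 1
    count += 1
  end
  count
end

puts utf8_char_count([192, 128].pack("C*")) # prints 2
puts utf8_char_count("caf\u00e9")           # prints 4
```

Because 0xC0 and 0xC1 can only begin overlong encodings, neither byte of `C0 80` starts a valid character, which yields the expected length of 2.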