rbx 1.9 mode Regexp#match interprets pos arg in terms of bytes instead of chars. #1965

Closed
microu opened this Issue October 20, 2012 · 0 comments

In 1.9 mode, Regexp#match interprets the pos argument as a byte offset instead of a character offset. See the following code.

# encoding: utf-8

s = '一000000' # the first char is Japanese for 'one' and uses 3 bytes in UTF-8
re = /0+/

puts "Encodings #{s.encoding} #{re.encoding}"

# pos 6 is the last character, so only one '0' should match
m = re.match s, 6 # m[0] => '000', should be '0'
tag = m[0] == '0' ? '' : '[BUG]'
puts "#{tag} MATCH1 #{m.inspect}"

# pos 7 is the end of the string, so nothing should match
m = re.match s, 7 # => #<MatchData "00">, should be nil
tag = m.nil? ? '' : '[BUG]'
puts "#{tag} MATCH2 #{m.inspect}"

Tested with rubinius 2.0.0dev (1.9.3 561610b yyyy-mm-dd JI) [x86_64-unknown-linux-gnu]
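
To make the byte/character mismatch concrete, here is a minimal sketch that uses only String methods available in 1.9.3 (`length`, `bytesize`, `[]`, `byteslice`). The helper `byte_offset_for` is a hypothetical name introduced for illustration; it is not part of Ruby or Rubinius and is not meant to describe how the fix was implemented.

```ruby
# encoding: utf-8

s = '一000000'

# Hypothetical helper (an assumption for illustration only):
# returns the byte offset that corresponds to a character position.
def byte_offset_for(str, char_pos)
  str[0, char_pos].bytesize
end

# The string has 7 characters but 9 bytes, because '一' takes 3 bytes.
puts "chars=#{s.length} bytes=#{s.bytesize}"

# Tail of the string starting at character 6 vs. starting at byte 6.
char_tail_6 = s[6..-1]
byte_tail_6 = s.byteslice(6..-1)
puts "char pos 6 -> #{char_tail_6.inspect}, byte pos 6 -> #{byte_tail_6.inspect}"

# Same comparison for position 7: the character tail is empty.
char_tail_7 = s[7..-1]
byte_tail_7 = s.byteslice(7..-1)
puts "char pos 7 -> #{char_tail_7.inspect}, byte pos 7 -> #{byte_tail_7.inspect}"

# Character position 6 corresponds to byte offset 8.
puts "byte offset for char pos 6: #{byte_offset_for(s, 6)}"
```

The byte-based tails ('000' and '00') are exactly the wrong matches reported above, while the character-based tails ('0' and the empty string) correspond to the expected results.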

microu closed this October 20, 2012
microu reopened this October 20, 2012
Dirkjan Bussink dbussink closed this in a3ce5e3 March 19, 2013