rbx 1.9 mode Regexp#match interprets pos arg in terms of bytes instead of chars. #1965

Closed
microu opened this Issue Oct 20, 2012 · 0 comments

In 1.9 mode, Regexp#match interprets its pos argument as a byte offset instead of a character offset. See the following code.

# encoding: utf-8

s = '一000000' # first char is Japanese for 'one' and takes 3 bytes in UTF-8
re = /0+/

puts "Encodings #{s.encoding} #{re.encoding}"

m = re.match s, 6 # m[0] => '000', should be '0'
tag = m[0] == '0' ? '' : '[BUG]'
puts "#{tag} MATCH1 #{m.inspect}"

m = re.match s, 7 # => #<MatchData "00">, should be nil
tag = m.nil? ? '' : '[BUG]'
puts "#{tag} MATCH2 #{m.inspect}"

Tested with rubinius 2.0.0dev (1.9.3 561610b yyyy-mm-dd JI) [x86_64-unknown-linux-gnu]
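
To make the arithmetic behind the two results explicit, here is a minimal sketch of how character positions and byte offsets relate for this string. The helper `char_to_byte_offset` is a hypothetical name used only for illustration; it is not part of Ruby or Rubinius, and this is not meant as the actual fix.

```ruby
# encoding: utf-8

s = '一000000'

# Hypothetical helper (illustration only): converts a character index
# into the byte offset at which that character starts.
def char_to_byte_offset(str, char_pos)
  str[0, char_pos].bytesize
end

puts s.length    # number of characters: 7
puts s.bytesize  # number of bytes: 9 (3 for '一' plus 6 for the zeros)

# Character position 6 is the last '0'; it starts at byte offset 8.
puts char_to_byte_offset(s, 6)

# Reading pos as a byte offset instead: byte 6 is character index 4,
# so the rest of the string is '000', which is what MATCH1 reports.
puts s.byteslice(6, s.bytesize - 6)

# Likewise byte 7 leaves '00', which is what MATCH2 reports.
puts s.byteslice(7, s.bytesize - 7)
```

With a character-based pos, position 6 leaves only the final '0' and position 7 is the end of the string, so /0+/ cannot match there.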

@microu microu closed this Oct 20, 2012
@microu microu reopened this Oct 21, 2012
@dbussink dbussink closed this in a3ce5e3 Mar 19, 2013