StringIO#getc and #getbyte interaction is wrong (compared to MRI) #2282

Closed
solson opened this Issue Apr 16, 2013 · 0 comments

@solson
Member
solson commented Apr 16, 2013

MRI 1.9.3 and 2.0.0:

>> require 'stringio'
=> true
>> s = "עב"; s.chars.to_a
=> ["ע", "ב"]
>> s.chars.map {|c| c.bytes.to_a }
=> [[215, 162], [215, 145]]
>> io = StringIO.new(s); io.getc
=> "ע"
>> io.getbyte
=> 215
>> io.getbyte
=> 145

Rubinius 1.9 mode:

>> require 'stringio'
=> true
>> s = "עב"; s.chars.to_a
=> ["ע", "ב"]
>> s.chars.map {|c| c.bytes.to_a }
=> [[215, 162], [215, 145]]
>> io = StringIO.new(s); io.getc
=> "ע"
>> io.getbyte
=> 162
>> io.getbyte
=> 215

As you can see, Rubinius continues from the wrong byte position after a call to getc: the first getbyte returns 162, the second byte of "ע", instead of 215, the first byte of "ב".
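
For anyone who wants to check this outside irb, here is a small script along the lines of the session above. It is only a sketch: the variable names are my own, and the string is written with `\u` escapes so the result does not depend on the source file encoding. The values it should produce are the ones MRI prints above.

```ruby
require 'stringio'

# "\u05E2\u05D1" is the same two-character string "עב" used in the report.
s = "\u05E2\u05D1"
io = StringIO.new(s)

first_char = io.getc                 # reads one whole character (2 bytes in UTF-8)
pos_after_getc = io.pos              # StringIO#pos is a byte offset
next_bytes = [io.getbyte, io.getbyte]

puts first_char.inspect
puts pos_after_getc
puts next_bytes.inspect
```

On MRI the position after getc is 2, so the following two getbyte calls return both bytes of the second character.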

@dbussink dbussink pushed a commit that closed this issue Apr 16, 2013
Scott Olson Use byte indexes in StringIO#getc.
Previously it treated d.pos as a character index and indexed into the string,
even though StringIO#getbyte treats d.pos as a byte index.

This makes getc work properly with getbyte (fixes #2282).

This also makes it much more efficient, since string indexing for
non-fixed-width encoded strings such as UTF-8 strings takes linear time
(fixes #2281).
53290f9
@dbussink dbussink closed this in 53290f9 Apr 16, 2013
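
To illustrate the idea in the commit message, here is a minimal sketch of a reader that keeps its position as a byte offset for both getc and getbyte. This is not the Rubinius source: the `ByteCursor` class and its internals are hypothetical, and it assumes a UTF-8 string, where a character is at most 4 bytes long.

```ruby
# Hypothetical illustration only, not the actual StringIO implementation.
class ByteCursor
  MAX_UTF8_CHAR_BYTES = 4 # assumption: the string is UTF-8

  def initialize(string)
    @string = string
    @pos = 0 # always a byte offset, never a character index
  end

  attr_reader :pos

  # Returns the byte at the current offset and advances by one byte.
  def getbyte
    return nil if @pos >= @string.bytesize
    byte = @string.getbyte(@pos)
    @pos += 1
    byte
  end

  # Returns the character starting at the current byte offset and
  # advances by that character's size in bytes.
  def getc
    return nil if @pos >= @string.bytesize
    # Slice a small window by byte offset instead of indexing by character,
    # so the cost does not grow with the distance from the string's start.
    char = @string.byteslice(@pos, MAX_UTF8_CHAR_BYTES)[0]
    @pos += char.bytesize
    char
  end
end

io = ByteCursor.new("\u05E2\u05D1") # the string "עב" from the report
puts io.getc.inspect
puts io.getbyte
puts io.getbyte
```

Because both methods share one byte offset, a getbyte after getc starts at the first byte of the next character, which is the MRI behavior shown in the report.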