String#chop is not multibyte aware. #2013

Closed
floere opened this Issue Nov 14, 2012 · 3 comments

floere commented Nov 14, 2012

Hi all,

While running the specs for Picky on Rubinius, I encountered a problem regarding multibyte chopping in Strings.

Compare Rubinius:

~/temp/picky/server $ ruby -v
rubinius 2.0.0rc1 (1.9.3 2242f14b 2012-11-02 JI) [x86_64-apple-darwin12.2.0]
~/temp/picky/server $ ruby -e 'puts "日本語".chop'
日本?
~/temp/picky/server $ ruby -e 'puts "日本語".chop.chop.chop'
日本

with MRI:

~/temp/picky/server $ ruby -v
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12.1.0]
~/temp/picky/server $ ruby -e 'puts "日本語".chop'
日本
~/temp/picky/server $ ruby -e 'puts "日本語".chop.chop.chop'

It seems that Rubinius chops off a single byte rather than a whole character. See:

~/temp/picky/server $ ruby -e 'p "日本語"'
"\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E"
~/temp/picky/server $ ruby -e 'p "日本語".chop'
"\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA"
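
The inspect output above suggests that only the final byte of the three-byte sequence for 語 was removed. A minimal sketch of how that could be checked in plain Ruby follows, assuming a UTF-8 source encoding. It uses `String#byteslice` to imitate a byte-wise chop; this is an illustration, not how Rubinius is actually implemented.

```ruby
# encoding: utf-8
s = "日本語"

# Each of these three characters occupies three bytes in UTF-8.
char_count = s.length
byte_count = s.bytesize

# Imitate a byte-wise chop: drop exactly one byte from the end.
truncated = s.byteslice(0, s.bytesize - 1)

# The last character is now cut in half, so the string is no longer
# valid UTF-8, which is why a terminal would show a replacement
# character such as "?" for it.
still_valid = truncated.valid_encoding?

puts "chars: #{char_count}, bytes: #{byte_count}"
puts "bytes after byte-wise chop: #{truncated.bytesize}"
puts "valid UTF-8 after byte-wise chop: #{still_valid}"
```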

My expectation was that a whole Japanese character is chopped off the end (per #chop).
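
To make that expectation concrete, here is a rough sketch of a character-aware chop written in plain Ruby. The helper name `char_aware_chop` is hypothetical and this is not the code from MRI or from any Rubinius patch; it only relies on the fact that `String#[]` with a range indexes by character in Ruby 1.9 and later.

```ruby
# encoding: utf-8

# Hypothetical helper for illustration only.
# Removes the last character (not the last byte) of a string.
def char_aware_chop(str)
  return str.dup if str.empty?
  # String#chop treats a trailing "\r\n" as one unit and removes both.
  return str[0...-2] if str.end_with?("\r\n")
  # Range indexing counts characters, so a multibyte character
  # is removed as a whole.
  str[0...-1]
end

once  = char_aware_chop("日本語")
twice = char_aware_chop(once)
none  = char_aware_chop(char_aware_chop(twice))

puts once
puts twice
puts none.empty?
```

With this helper, chopping "日本語" once leaves "日本", and chopping three or more times leaves an empty string, which matches the MRI output shown above.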

Cheers and thanks,
Florian

ciniglio commented Dec 5, 2012

I wrote up a fix here. Let me know if you'd like me to make changes.

https://gist.github.com/4218251

Owner

jc00ke commented Dec 5, 2012

PR is in #2079

@jc00ke jc00ke closed this Dec 5, 2012

floere commented Dec 6, 2012

Good to know I might be able to use it for Picky soonish :)
