String#tr isn't encoding aware in 1.9 mode #2157

Closed
dbussink opened this Issue · 0 comments

1 participant

@dbussink
Rubinius member

The following snippet shows the problem:

# encoding: utf-8

str = "椎名深夏"
a = "\u0080\u0082\u0083\u0084\u0085\u0086\u0087\u0088\u0089\u008A\u008B\u008C\u008E\u0091\u0092\u0093\u0094\u0095\u0096\u0097\u0098\u0099\u009A\u009B\u009C\u009E\u009F"
b = "€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ"

p str.tr(a, b)

Output on MRI:

"椎名深夏"

Output on Rubinius:

"\xe6\xa4\xcb名深夏"

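The problem is that String#tr replaces byte by byte rather than character by character, so it corrupts multibyte strings: a byte in the middle of a multibyte character can match and get rewritten. This was extracted from #2108; the pattern is used, for example, in Builder inside Rails.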
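To make the difference concrete, here is a minimal sketch contrasting the two strategies. The helpers `char_tr` and `byte_tr` are hypothetical names invented for illustration; they are not the Rubinius or MRI implementations, and they assume `from` and `to` contain the same number of characters with no ranges or negation.

```ruby
# encoding: utf-8

# Hypothetical helper: translate per character, which is what an
# encoding-aware String#tr should effectively do.
# Assumes from and to have the same number of characters.
def char_tr(str, from, to)
  table = from.each_char.zip(to.each_char).to_h
  str.each_char.map { |c| table.fetch(c, c) }.join
end

# Hypothetical helper: translate per byte, ignoring the encoding.
# This only illustrates the class of bug; it is not the actual
# Rubinius code path and need not reproduce its exact output.
def byte_tr(str, from, to)
  table = {}
  from.bytes.zip(to.bytes) { |f, t| table[f] = t if t }
  translated = str.bytes.map { |b| table.fetch(b, b) }
  translated.pack("C*").force_encoding(str.encoding)
end

str = "椎名深夏"

# "\u008E" never occurs in str as a character, so nothing changes.
p char_tr(str, "\u008E", "Ž") == str

# The UTF-8 encoding of "椎" ends in the byte 0x8E, which is also the
# second byte of "\u008E", so the byte-wise version alters the string.
p byte_tr(str, "\u008E", "Ž") == str
```

The character-based version leaves the string untouched because none of its characters appear in the source set, while the byte-based version rewrites a trailing byte of a multibyte character.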

@dbussink dbussink closed this in 77ad1e5