Add minimal encoding support to String#inspect #1945

Closed
wants to merge 1 commit into
from

3 participants

@ryoqun
Rubinius member

This is not really a pull request, rather a question. Can this kind of incomplete implementation be merged in?

If no, I'm fine.

In my daily use of Rubinius, I need this before @brixen's full encoding support comes, because I often encounter characters outside Ascii.

If yes, I'll continue to work on this.

"a \0二\x7f\xe3\xc7\x61保護 b c".inspect

before: "a \x00\xE4\xBA\x8C\x7F\xE3\xC7a\xE4\xBF\x9D\xE8\xAD\xB7 b c"

after: "a \u0000二\u007F\xE3\xC7a保護 b c" (same behavior with MRI 1.9 in this case)

@brixen brixen and 1 other commented on an outdated diff Oct 5, 2012
kernel/delta/ctype.rb
@@ -32,6 +36,13 @@ def self.toprint(num)
i += 1
end
+ UTF8Printed = Rubinius::Tuple.new 256
+ i = 0
+ while i < 256
+ UTF8Printed[i] = toprint(i, :utf8)
@brixen
Rubinius member
brixen added a line comment Oct 5, 2012

Why is the symbol :utf8 used here? Isn't the flag on toprint boolean?

@ryoqun
Rubinius member
ryoqun added a line comment Oct 6, 2012

This is just a mistake. I looked over it. In my original patch, the approach was more general envisioning to support for other encodings. But in the end, I realized the support isn't actually needed in this time. So I trimmed down the patch. This is the remnant of it....

@ryoqun
Rubinius member
ryoqun added a line comment Oct 6, 2012

By the way, in Japan, we finally almost moved to UTF-8. When we process non-UTF-8 strings, any sane system immediately encodes to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@brixen
Rubinius member

@ryoqun good news about the UTF-8. I'm fine with the patch but could you address the spec failures?

@ryoqun
Rubinius member

@brixen Thanks for accepting this! As I said before, I continued to work on fixing spec failures.

@dbussink
Rubinius member

I think since we have more encoding stuff in place, this should be easier now. Looks like MRI's inspect uses Encoding.default_external if that is an ASCII compatible encoding (which UTF-8) is. We should probably do something similar for that for inspect then and use the current code for all the other encoding cases.

@brixen
Rubinius member

I'll work on this shortly.

@ryoqun
Rubinius member

@brixen Thanks. I'm really excited with the surge of commits for encoding support!

@brixen
Rubinius member

I've got this almost finished. I introduced a Character class so we can work with encoded strings on a character level and use methods like #ascii? or #printable? without putting those on String and dealing with collisions.

@brixen brixen closed this in 16a3156 Nov 19, 2012
@ryoqun
Rubinius member

Hooray! Thanks for correctly fixing this.

@brixen
Rubinius member

@ryoqun no problem, sorry it took so long! Slowly getting encoding completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment