Add minimal encoding support to String#inspect #1945

ryoqun · 2012-10-05T14:41:02Z

This is not really a pull request so much as a question: can an incomplete implementation like this be merged?

If not, that's fine.

In my daily use of Rubinius, I need this before @brixen's full encoding support arrives, because I often encounter characters outside ASCII.

If yes, I'll continue to work on this.

"a \0二\x7f\xe3\xc7\x61保護 b c".inspect

before: "a \x00\xE4\xBA\x8C\x7F\xE3\xC7a\xE4\xBF\x9D\xE8\xAD\xB7 b c"

after: "a \u0000二\u007F\xE3\xC7a保護 b c" (the same behavior as MRI 1.9 in this case)
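The before/after pair above can be read as three rules: printable characters pass through unchanged, valid but unprintable characters become \uXXXX, and bytes that do not form a valid character become \xNN. The following is a minimal sketch of those rules in plain Ruby, assuming a UTF-8 string. The name sketch_inspect is hypothetical; this is not the code from the patch, and it deliberately skips escaping of quotes and backslashes.

```ruby
# Hypothetical helper for illustration; not the Rubinius or MRI implementation.
# Assumes str is tagged as UTF-8 (it may still contain invalid bytes).
def sketch_inspect(str)
  out = String.new('"', encoding: 'UTF-8')
  str.each_char do |ch|
    if !ch.valid_encoding?
      # Bytes that do not form a valid character: escape each byte as \xNN.
      ch.each_byte { |b| out << format('\x%02X', b) }
    elsif ch.match?(/\A[[:print:]]\z/)
      # Printable character, ASCII or not: emit it unchanged.
      out << ch
    else
      # Valid but unprintable (for example NUL or DEL): escape as \uXXXX.
      out << format('\u%04X', ch.ord)
    end
  end
  out << '"'
end
```

Applied to the example string from the description, this sketch should produce the "after" line shown above.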

brixen · 2012-10-05T17:33:24Z

kernel/delta/ctype.rb

+  UTF8Printed = Rubinius::Tuple.new 256
+  i = 0
+  while i < 256
+    UTF8Printed[i] = toprint(i, :utf8)


Why is the symbol :utf8 used here? Isn't the flag on toprint boolean?

ryoqun · 2012-10-06

This is just a mistake; I overlooked it. My original patch took a more general approach, with an eye to supporting other encodings. In the end I realized that support isn't actually needed at this time, so I trimmed the patch down. This is a remnant of that.

By the way, in Japan we have finally almost completely moved to UTF-8. When processing non-UTF-8 strings, any sane system immediately converts them to UTF-8.

brixen · 2012-10-07T20:49:34Z

@ryoqun good news about the UTF-8. I'm fine with the patch but could you address the spec failures?

ryoqun · 2012-10-08T04:16:08Z

@brixen Thanks for accepting this! As I said before, I'll continue working on this and fix the spec failures.

dbussink · 2012-11-07T12:28:38Z

I think this should be easier now that we have more encoding stuff in place. It looks like MRI's inspect uses Encoding.default_external if that is an ASCII-compatible encoding (which UTF-8 is). We should probably do something similar for inspect and use the current code for all other encodings.
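The decision described in this comment can be sketched as a small helper. This is only an illustration of the suggestion as worded here, not MRI's or Rubinius's actual code. The name readable_inspect_encoding is hypothetical, and returning nil to mean "use the existing byte-wise escaping" is an assumption made for the sketch.

```ruby
# Hypothetical helper, not taken from MRI or Rubinius.
# Returns the encoding in which inspect may keep characters readable,
# or nil when inspect should fall back to the existing byte-wise escaping.
def readable_inspect_encoding(external = Encoding.default_external)
  external.ascii_compatible? ? external : nil
end
```

Encoding#ascii_compatible? is a standard Ruby method; it is true for UTF-8 and false for encodings such as UTF-16LE.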

brixen · 2012-11-07T18:17:36Z

I'll work on this shortly.

ryoqun · 2012-11-09T05:29:06Z

@brixen Thanks. I'm really excited about the surge of commits for encoding support!

brixen · 2012-11-17T04:55:18Z

I've got this almost finished. I introduced a Character class so we can work with encoded strings on a character level and use methods like #ascii? or #printable? without putting those on String and dealing with collisions.
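To illustrate the idea of a character-level wrapper, here is a rough sketch in plain Ruby. It is not the Rubinius Character class; the class name SketchCharacter and its implementation are assumptions, and only the method names #ascii? and #printable? come from the comment above.

```ruby
# Hypothetical stand-in for a character-level wrapper; the real Rubinius
# class may differ in every detail except the two method names.
class SketchCharacter
  def initialize(str)
    @str = str # expected to hold exactly one character
  end

  # True when the character consists only of 7-bit ASCII bytes.
  def ascii?
    @str.ascii_only?
  end

  # True when the character is validly encoded and printable.
  def printable?
    @str.valid_encoding? && @str.match?(/\A[[:print:]]\z/)
  end
end
```

Keeping these predicates on a separate class means String itself gains no new methods, which is the collision concern mentioned above.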

ryoqun · 2012-11-29T06:37:26Z

Hooray! Thanks for correctly fixing this.

brixen · 2012-11-29T19:04:16Z

@ryoqun no problem, sorry it took so long! Slowly getting encoding completed.

brixen reviewed Oct 5, 2012

Add minimal encoding support to String#inspect

6f04480

brixen closed this in 16a3156 Nov 19, 2012
