
Add minimal encoding support to String#inspect #1945

Closed
wants to merge 1 commit into from

3 participants

@ryoqun
Collaborator

This is not really a pull request but rather a question: can an incomplete implementation like this be merged?

If not, that's fine.

In my daily use of Rubinius, I need this until @brixen's full encoding support lands, because I often encounter non-ASCII characters.

If so, I'll continue working on this.

"a \0二\x7f\xe3\xc7\x61保護 b c".inspect

before: "a \x00\xE4\xBA\x8C\x7F\xE3\xC7a\xE4\xBF\x9D\xE8\xAD\xB7 b c"

after: "a \u0000二\u007F\xE3\xC7a保護 b c" (matches MRI 1.9 for this input)
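For illustration, the "after" behavior can be sketched in plain Ruby. This is a hypothetical helper (`sketch_utf8_inspect` is not part of Rubinius or MRI) that assumes the string should be read as UTF-8: ASCII control characters become `\uXXXX`, invalid bytes become `\xHH`, valid multibyte characters pass through unchanged, and `#` is escaped only before `$`, `@` or `{`. Unlike MRI, it does not escape non-printable code points above U+007F.

```ruby
# Hypothetical sketch of a UTF-8-aware inspect; not Rubinius or MRI code.
# Assumes the input should be interpreted as UTF-8.
NAMED_ESCAPES = {
  "\n" => '\n', "\t" => '\t', "\a" => '\a', "\v" => '\v',
  "\f" => '\f', "\r" => '\r', "\e" => '\e', "\b" => '\b',
  '"' => '\"', '\\' => '\\\\'
}.freeze

def sketch_utf8_inspect(str)
  # each_char yields each invalid byte of a broken UTF-8 string on its own.
  chars = str.dup.force_encoding(Encoding::UTF_8).each_char.to_a
  out = +'"'
  chars.each_with_index do |ch, i|
    if !ch.valid_encoding?
      ch.each_byte { |b| out << format('\x%02X', b) } # e.g. \xE3
    elsif NAMED_ESCAPES.key?(ch)
      out << NAMED_ESCAPES[ch]
    elsif ch == '#' && ['$', '@', '{'].include?(chars[i + 1])
      out << '\#' # only where '#' would start interpolation
    elsif ch.ord < 0x20 || ch.ord == 0x7F
      out << format('\u%04X', ch.ord) # e.g. \u0000, \u007F
    else
      out << ch # printable ASCII and valid multibyte characters
    end
  end
  out << '"'
end

sample = "a \0二\x7f\xe3\xc7\x61保護 b c"
puts sketch_utf8_inspect(sample) # prints "a \u0000二\u007F\xE3\xC7a保護 b c"
```

Working per character rather than per byte is what lets `二` survive intact while the stray bytes `\xE3` and `\xC7` are still escaped individually.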

kernel/delta/ctype.rb
@@ -32,6 +36,13 @@ def self.toprint(num)
i += 1
end
+ UTF8Printed = Rubinius::Tuple.new 256
+ i = 0
+ while i < 256
+ UTF8Printed[i] = toprint(i, :utf8)
@brixen Owner
brixen added a note

Why is the symbol :utf8 used here? Isn't the flag on toprint boolean?

@ryoqun Collaborator
ryoqun added a note

This is just a mistake; I overlooked it. My original patch took a more general approach, aiming to support other encodings as well. In the end, I realized that support wasn't actually needed this time, so I trimmed the patch down. This is a remnant of that.

@ryoqun Collaborator
ryoqun added a note

By the way, in Japan we have finally (almost) moved to UTF-8. When processing non-UTF-8 strings, any sane system converts them to UTF-8 immediately.
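A minimal sketch of that convert-on-input practice, assuming the incoming bytes are Shift_JIS (chosen only as an example; `to_utf8` is a hypothetical helper): tag the raw bytes with their real encoding, then transcode once at the boundary with `String#encode`, so everything downstream, including `inspect`, only ever sees UTF-8.

```ruby
# Hypothetical input boundary; Shift_JIS is just an example source encoding.
def to_utf8(raw, source_encoding)
  # Label the raw bytes with their actual encoding, then transcode once.
  raw.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# Simulate bytes read from a Shift_JIS file (.b gives a binary copy).
sjis_bytes = "保護".encode(Encoding::Shift_JIS).b
utf8 = to_utf8(sjis_bytes, Encoding::Shift_JIS)
puts utf8.encoding # prints UTF-8
```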

@brixen
Owner

@ryoqun good news about the UTF-8. I'm fine with the patch but could you address the spec failures?

@ryoqun
Collaborator

@brixen Thanks for accepting this! As I said before, I'll keep working on fixing the spec failures.

@dbussink
Owner

I think this should be easier now that we have more encoding support in place. It looks like MRI's inspect uses Encoding.default_external if that is an ASCII-compatible encoding (which UTF-8 is). We should probably do something similar for inspect and use the current code for all other encodings.
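The rule described above can be approximated as a predicate that decides whether printable non-ASCII characters may be shown literally. This is a simplified sketch (`literal_multibyte_ok?` is a hypothetical name): it uses `Encoding.default_internal || Encoding.default_external` as the target, as the patch's `String#inspect` does, falls back to US-ASCII when the target is not ASCII-compatible, and allows literal output only when the string's own encoding matches the target.

```ruby
# Hypothetical predicate modeled on the rule described above.
# Returns true when printable non-ASCII characters of +str+ may be shown as-is.
def literal_multibyte_ok?(str, target = Encoding.default_internal || Encoding.default_external)
  # A target that is not ASCII-compatible (e.g. UTF-16LE) falls back to
  # US-ASCII, which means every non-ASCII character would be escaped.
  target = Encoding::US_ASCII unless target.ascii_compatible?
  str.encoding == target
end
```

Taking the target as an optional argument keeps the predicate testable without mutating the process-wide encoding defaults; when it returns false, `inspect` would keep the existing byte-escaping path.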

@brixen
Owner

I'll work on this shortly.

@ryoqun
Collaborator

@brixen Thanks. I'm really excited about the surge of commits for encoding support!

@brixen
Owner

I've got this almost finished. I introduced a Character class so we can work with encoded strings on a character level and use methods like #ascii? or #printable? without putting those on String and dealing with collisions.
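To make the idea concrete, here is a hypothetical sketch of such a wrapper. `SketchCharacter` and its methods are illustrative assumptions, not the actual Rubinius `Character` API: each instance holds a single encoded character, so predicates like `#ascii?` and `#printable?` live on the character rather than on `String`.

```ruby
# Hypothetical character wrapper; the real Rubinius Character class may differ.
class SketchCharacter
  def initialize(str)
    @str = str
  end

  # Wrap every character of +string+; invalid bytes come out one at a time.
  def self.each_of(string)
    string.each_char.map { |ch| new(ch) }
  end

  def valid?
    @str.valid_encoding?
  end

  def ascii?
    valid? && @str.ascii_only?
  end

  def printable?
    # [[:print:]] is Unicode-aware when matched against a UTF-8 string.
    valid? && @str.match?(/\A[[:print:]]\z/)
  end

  def to_s
    @str
  end
end
```

Checking `valid?` first matters: matching a regexp against a string with an invalid byte sequence raises `ArgumentError`.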

@brixen brixen closed this in 16a3156
@warrenseen warrenseen referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
@Gibheer Gibheer referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
@ryoqun
Collaborator

Hooray! Thanks for fixing this properly.

@brixen
Owner

@ryoqun no problem, sorry it took so long! Slowly getting encoding completed.

Commits on Oct 8, 2012
  1. @ryoqun
4 kernel/common/string.rb
@@ -446,10 +446,6 @@ def insert(index, other)
ControlCharacters = [10, 9, 7, 11, 12, 13, 27, 8]
ControlPrintValue = ["\\n", "\\t", "\\a", "\\v", "\\f", "\\r", "\\e", "\\b"]
- def inspect
- "\"#{transform(Rubinius::CType::Printed, true)}\""
- end
-
def ljust(width, padstr=" ")
justify(width, :left, padstr)
end
4 kernel/common/string18.rb
@@ -685,4 +685,8 @@ def []=(index, replacement, three=undefined)
end
return replacement
end
+
+ def inspect
+ "\"#{transform(Rubinius::CType::Printed, true)}\""
+ end
end
30 kernel/common/string19.rb
@@ -810,4 +810,34 @@ def []=(index, replacement, three=undefined)
end
return replacement
end
+
+ def inspect
+ current_encoding = encoding
+ desired_encoding = Encoding.default_internal || Encoding.default_external
+
+ string = generate_inspected_string(current_encoding, desired_encoding)
+
+ "\"#{string}\""
+ end
+
+ def generate_inspected_string(current_encoding, desired_encoding)
+ if current_encoding == desired_encoding and
+ current_encoding == Encoding::UTF_8
+ table = Rubinius::CType::UTF8Printed
+
+ inspected_string = each_char.collect do |char|
+ if not char.valid_encoding? or char.ord < 256
+ table[char.force_encoding(Encoding::BINARY).ord]
+ else
+ char
+ end
+ end.join
+ inspected_string.gsub!(/(#[$@{])/, '\\\\\1')
+
+ Rubinius::Type.infect(inspected_string, self)
+ inspected_string
+ else
+ transform(Rubinius::CType::Printed, true)
+ end
+ end
end
22 kernel/delta/ctype.rb
@@ -1,7 +1,7 @@
# -*- encoding: us-ascii -*-
module Rubinius::CType
- def self.toprint(num)
+ def self.toprint(num, utf8=false)
# The character literals (?x) are Fixnums in 1.8 and Strings in 1.9
# so we use literal values instead so this is 1.8/1.9 compatible.
case num
@@ -14,11 +14,20 @@ def self.toprint(num)
when 13; '\r'
when 27; '\e'
when 34; '\"'
- when 35; Rubinius::Tuple['#$', '\#$', '#@', '\#@', '#{', '\#{', '#', '#']
+ when 35;
+ unless utf8
+ Rubinius::Tuple['#$', '\#$', '#@', '\#@', '#{', '\#{', '#', '#']
+ else
+ '#' # TODO: '#' escaping is handled by String#generate_inspected_string
+ end
when 92; '\\\\'
else
if num < 32 || num > 126
- unprintable_chr(num)
+ unless utf8
+ unprintable_chr(num)
+ else
+ unprintable_utf8_chr(num)
+ end
else
num.chr
end
@@ -32,6 +41,13 @@ def self.toprint(num)
i += 1
end
+ UTF8Printed = Rubinius::Tuple.new 256
+ i = 0
+ while i < 256
+ UTF8Printed[i] = toprint(i, true)
+ i += 1
+ end
+
def toprint
Printed[self]
end
6 kernel/delta/ctype18.rb
@@ -6,4 +6,8 @@ def self.unprintable_chr(num)
c = num.to_s 8
str.copy_from c, 0, c.size, 4-c.size
end
-end
+
+ def self.unprintable_utf8_chr(num)
+ unprintable_chr(num)
+ end
+end
14 kernel/delta/ctype19.rb
@@ -6,4 +6,16 @@ def self.unprintable_chr(num)
c = num.to_s(16).upcase
str.copy_from c, 0, c.size, 4-c.size
end
-end
+
+ def self.unprintable_utf8_chr(num)
+ if num <= 0x7f
+ str = "\\u0000"
+ str.modify!
+
+ c = num.to_s(16).upcase
+ str.copy_from c, 0, c.size, 6-c.size
+ else
+ unprintable_chr(num)
+ end
+ end
+end
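The `UTF8Printed` table built by the patch (1.9 variant, `ctype19.rb`) can be approximated in plain Ruby, which makes its contents easy to check. This sketch is hypothetical (`sketch_toprint_utf8` and `SKETCH_UTF8_PRINTED` are illustrative names) and assumes that the `case` branches hidden by the diff map 7 through 12 to `\a`, `\b`, `\t`, `\n`, `\v` and `\f`, matching the `ControlCharacters` list in `string.rb`.

```ruby
# Hypothetical plain-Ruby approximation of Rubinius::CType.toprint(num, true).
def sketch_toprint_utf8(num)
  case num
  when 7 then '\a'
  when 8 then '\b'
  when 9 then '\t'
  when 10 then '\n'
  when 11 then '\v'
  when 12 then '\f'
  when 13 then '\r'
  when 27 then '\e'
  when 34 then '\"'
  when 35 then '#' # '#' escaping is left to the caller, as in the patch
  when 92 then '\\\\'
  else
    if num < 32 || num == 127
      format('\u%04X', num) # unprintable_utf8_chr for num <= 0x7f
    elsif num > 127
      format('\x%02X', num) # falls back to unprintable_chr
    else
      num.chr
    end
  end
end

SKETCH_UTF8_PRINTED = Array.new(256) { |i| sketch_toprint_utf8(i) }.freeze
puts SKETCH_UTF8_PRINTED[0x7F] # prints \u007F
```

Because `String#inspect` in `string19.rb` only consults this table for invalid bytes and code points below 256, the `\xHH` entries for 128 to 255 are what surface for stray bytes such as `\xE3`.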