
Add minimal encoding support to String#inspect #1945

Closed
wants to merge 1 commit into from

3 participants

Ryo Onodera Brian Shirai Dirkjan Bussink
Ryo Onodera
Collaborator

This is not really a pull request so much as a question: can this kind of incomplete implementation be merged?

If not, that's fine.

In my daily use of Rubinius, I need this before @brixen's full encoding support arrives, because I often encounter non-ASCII characters.

If yes, I'll continue to work on this.

"a \0二\x7f\xe3\xc7\x61保護 b c".inspect

before: "a \x00\xE4\xBA\x8C\x7F\xE3\xC7a\xE4\xBF\x9D\xE8\xAD\xB7 b c"

after: "a \u0000二\u007F\xE3\xC7a保護 b c" (the same behavior as MRI 1.9 in this case)
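
For comparison, both outputs above can be reproduced on a current MRI. This is a minimal sketch and not part of the patch. It assumes the default external encoding is UTF-8 (set explicitly below), builds the sample string from escapes so the source stays ASCII-only, and uses String#b to get a byte-wise view that matches the "before" output.

```ruby
# Assumption: inspect escapes relative to the default encodings, so pin them.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = nil

# "\xE3\xC7" is not valid UTF-8, so it is built from raw bytes.
invalid = [0xE3, 0xC7].pack("C*").force_encoding(Encoding::UTF_8)

# Same content as the sample in the description:
# U+4E8C, U+4FDD and U+8B77 are the three CJK characters.
sample = "a \0\u4E8C\x7F" + invalid + "a\u4FDD\u8B77 b c"

utf8_view   = sample.inspect    # character-aware escaping ("after")
binary_view = sample.b.inspect  # every non-ASCII byte escaped ("before")

puts utf8_view
puts binary_view
```

The binary view is only a stand-in for the old Rubinius behavior: it shows what byte-by-byte escaping looks like, not how Rubinius produced it.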

kernel/delta/ctype.rb
@@ -32,6 +36,13 @@ def self.toprint(num)
     i += 1
   end

+  UTF8Printed = Rubinius::Tuple.new 256
+  i = 0
+  while i < 256
+    UTF8Printed[i] = toprint(i, :utf8)
Brian Shirai Owner
brixen added a note

Why is the symbol :utf8 used here? Isn't the flag on toprint boolean?

Ryo Onodera Collaborator
ryoqun added a note

This is just a mistake; I overlooked it. My original patch took a more general approach, with support for other encodings in mind. In the end I realized that support isn't actually needed at this time, so I trimmed the patch down. This is a remnant of that.

Ryo Onodera Collaborator
ryoqun added a note

By the way, in Japan we have finally almost completely moved to UTF-8. When processing non-UTF-8 strings, any sane system immediately converts them to UTF-8.

Brian Shirai
Owner

@ryoqun good news about the UTF-8. I'm fine with the patch but could you address the spec failures?

Ryo Onodera
Collaborator

@brixen Thanks for accepting this! As I said before, I'll keep working on fixing the spec failures.

Dirkjan Bussink
Owner

I think this should be easier now that we have more of the encoding infrastructure in place. It looks like MRI's inspect uses Encoding.default_external if that is an ASCII-compatible encoding (which UTF-8 is). We should probably do something similar for inspect and use the current code for all other encodings.
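
The selection rule described here can be sketched roughly as follows. `inspect_result_encoding` is a hypothetical helper, not an MRI or Rubinius API. The preference for the internal encoding mirrors the `Encoding.default_internal || Encoding.default_external` line in this patch, and the US-ASCII fallback for encodings that are not ASCII-compatible is an assumption about how MRI appears to behave.

```ruby
# Hypothetical helper: choose the encoding that inspect output is written
# in, given the two process-wide defaults.
def inspect_result_encoding(internal = Encoding.default_internal,
                            external = Encoding.default_external)
  candidate = internal || external
  # Assumption: an encoding that is not ASCII-compatible cannot carry the
  # quotes and backslash escapes, so fall back to US-ASCII.
  candidate.ascii_compatible? ? candidate : Encoding::US_ASCII
end

puts inspect_result_encoding(nil, Encoding::UTF_8)
puts inspect_result_encoding(nil, Encoding::UTF_16LE)
puts inspect_result_encoding(Encoding::EUC_JP, Encoding::UTF_8)
```

A string whose own encoding matches the chosen one can keep its printable characters as they are; anything else falls back to byte-wise escaping, which is the split proposed above.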

Brian Shirai
Owner

I'll work on this shortly.

Ryo Onodera
Collaborator

@brixen Thanks. I'm really excited about the surge of commits for encoding support!

Brian Shirai
Owner

I've got this almost finished. I introduced a Character class so we can work with encoded strings on a character level and use methods like #ascii? or #printable? without putting those on String and dealing with collisions.
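
To illustrate the idea only: the sketch below is a hypothetical character wrapper written in plain Ruby, not the Character class that went into Rubinius. The class name `CharacterSketch` and both method bodies are assumptions. They show how per-character predicates can live on a small wrapper object instead of on String.

```ruby
# Hypothetical wrapper around a single encoded character.
class CharacterSketch
  def initialize(string)
    @string = string
  end

  # True when the character is a single byte in the 7-bit ASCII range.
  def ascii?
    @string.bytesize == 1 && @string.getbyte(0) < 128
  end

  # True when the character is validly encoded and matches the
  # encoding-aware POSIX class [[:print:]].
  def printable?
    @string.valid_encoding? && @string.match?(/\A[[:print:]]\z/)
  end
end

# Walk a string character by character and report both predicates.
"a\0\u4E8C".each_char do |char|
  wrapped = CharacterSketch.new(char)
  puts "#{char.inspect}: ascii=#{wrapped.ascii?} printable=#{wrapped.printable?}"
end
```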

Warren Seen warrenseen referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
Gibheer Gibheer referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
Ryo Onodera
Collaborator

Hooray! Thanks for fixing this properly.

Brian Shirai
Owner

@ryoqun no problem, sorry it took so long! Slowly getting encoding completed.


Showing 1 unique commit by 1 author.

Oct 08, 2012
Ryo Onodera ryoqun Add minimal encoding support to String#inspect 6f04480
4 kernel/common/string.rb
@@ -446,10 +446,6 @@ def insert(index, other)
   ControlCharacters = [10, 9, 7, 11, 12, 13, 27, 8]
   ControlPrintValue = ["\\n", "\\t", "\\a", "\\v", "\\f", "\\r", "\\e", "\\b"]

-  def inspect
-    "\"#{transform(Rubinius::CType::Printed, true)}\""
-  end
-
   def ljust(width, padstr=" ")
     justify(width, :left, padstr)
   end
4 kernel/common/string18.rb
@@ -685,4 +685,8 @@ def []=(index, replacement, three=undefined)
     end
     return replacement
   end
+
+  def inspect
+    "\"#{transform(Rubinius::CType::Printed, true)}\""
+  end
 end
30 kernel/common/string19.rb
@@ -810,4 +810,34 @@ def []=(index, replacement, three=undefined)
     end
     return replacement
   end
+
+  def inspect
+    current_encoding = encoding
+    desired_encoding = Encoding.default_internal || Encoding.default_external
+
+    string = generate_inspected_string(current_encoding, desired_encoding)
+
+    "\"#{string}\""
+  end
+
+  def generate_inspected_string(current_encoding, desired_encoding)
+    if current_encoding == desired_encoding and
+       current_encoding == Encoding::UTF_8
+      table = Rubinius::CType::UTF8Printed
+
+      inspected_string = each_char.collect do |char|
+        if not char.valid_encoding? or char.ord < 256
+          table[char.force_encoding(Encoding::BINARY).ord]
+        else
+          char
+        end
+      end.join
+      inspected_string.gsub!(/(#[$@{])/, '\\\\\1')
+
+      Rubinius::Type.infect(inspected_string, self)
+      inspected_string
+    else
+      transform(Rubinius::CType::Printed, true)
+    end
+  end
 end
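
The UTF-8 branch above can be approximated in plain Ruby, without Rubinius::CType or Rubinius::Type. This is a sketch under stated assumptions, not the patch itself. `NAMED_ESCAPES`, `UTF8_PRINTED` and `inspect_utf8_sketch` are hypothetical names, and the set of named escapes is assumed from the ControlCharacters list. It also simplifies the `char.ord < 256` test to "single-byte character or invalid bytes" and escapes every byte of an invalid sequence.

```ruby
# Assumed named escapes, keyed by byte value.
NAMED_ESCAPES = {
  7 => '\a', 8 => '\b', 9 => '\t', 10 => '\n', 11 => '\v',
  12 => '\f', 13 => '\r', 27 => '\e', 34 => '\"', 92 => '\\\\'
}.freeze

# Stand-in for the 256-entry UTF8Printed table: one escape per byte value.
UTF8_PRINTED = Array.new(256) do |byte|
  if NAMED_ESCAPES.key?(byte)
    NAMED_ESCAPES[byte]
  elsif byte < 32 || byte == 127
    format('\u%04X', byte)   # ASCII control characters use the \uXXXX form
  elsif byte > 127
    format('\x%02X', byte)   # stray high bytes use the \xXX form
  else
    byte.chr
  end
end.freeze

def inspect_utf8_sketch(string)
  body = string.each_char.map do |char|
    if !char.valid_encoding? || char.bytesize == 1
      # Invalid bytes and single-byte characters go through the table.
      char.each_byte.map { |byte| UTF8_PRINTED[byte] }.join
    else
      char # valid multibyte characters pass through unchanged
    end
  end.join
  # Escape '#' only where it would start an interpolation.
  body = body.gsub(/#(?=[$@{])/) { '\#' }
  "\"#{body}\""
end

invalid = [0xE3, 0xC7].pack("C*").force_encoding(Encoding::UTF_8)
sample  = "a \0\u4E8C\x7F" + invalid + "a\u4FDD\u8B77 b c"
puts inspect_utf8_sketch(sample)
```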
22 kernel/delta/ctype.rb
@@ -1,7 +1,7 @@
 # -*- encoding: us-ascii -*-

 module Rubinius::CType
-  def self.toprint(num)
+  def self.toprint(num, utf8=false)
     # The character literals (?x) are Fixnums in 1.8 and Strings in 1.9
     # so we use literal values instead so this is 1.8/1.9 compatible.
     case num
@@ -14,11 +14,20 @@ def self.toprint(num)
     when 13; '\r'
     when 27; '\e'
     when 34; '\"'
-    when 35; Rubinius::Tuple['#$', '\#$', '#@', '\#@', '#{', '\#{', '#', '#']
+    when 35;
+      unless utf8
+        Rubinius::Tuple['#$', '\#$', '#@', '\#@', '#{', '\#{', '#', '#']
+      else
+        '#' # TODO: '#' escaping is handled by String#generate_inspected_string
+      end
     when 92; '\\\\'
     else
       if num < 32 || num > 126
-        unprintable_chr(num)
+        unless utf8
+          unprintable_chr(num)
+        else
+          unprintable_utf8_chr(num)
+        end
       else
         num.chr
       end
@@ -32,6 +41,13 @@ def self.toprint(num)
     i += 1
   end

+  UTF8Printed = Rubinius::Tuple.new 256
+  i = 0
+  while i < 256
+    UTF8Printed[i] = toprint(i, true)
+    i += 1
+  end
+
   def toprint
     Printed[self]
   end
6 kernel/delta/ctype18.rb
@@ -6,4 +6,8 @@ def self.unprintable_chr(num)
     c = num.to_s 8
     str.copy_from c, 0, c.size, 4-c.size
   end
-end
+
+  def self.unprintable_utf8_chr(num)
+    unprintable_chr(num)
+  end
+end
14 kernel/delta/ctype19.rb
@@ -6,4 +6,16 @@ def self.unprintable_chr(num)
     c = num.to_s(16).upcase
     str.copy_from c, 0, c.size, 4-c.size
   end
-end
+
+  def self.unprintable_utf8_chr(num)
+    if num <= 0x7f
+      str = "\\u0000"
+      str.modify!
+
+      c = num.to_s(16).upcase
+      str.copy_from c, 0, c.size, 6-c.size
+    else
+      unprintable_chr(num)
+    end
+  end
+end
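
The `copy_from` calls above right-align the hex digits inside a fixed escape template. A portable sketch of the same idea in plain Ruby follows. `unprintable_utf8_chr_sketch` is a hypothetical name, and the `"\\x00"` template for the non-ASCII branch is an assumption, since the start of `unprintable_chr` is not shown in the diff.

```ruby
# Hypothetical plain-Ruby equivalent of the template-filling approach.
def unprintable_utf8_chr_sketch(num)
  # ASCII-range values use the six-character \uXXXX template; anything
  # higher uses the (assumed) four-character \xXX template.
  template = num <= 0x7f ? "\\u0000".dup : "\\x00".dup
  digits = num.to_s(16).upcase
  # Overwrite the tail of the template so the digits end up right-aligned,
  # leaving any leading zeros of the template in place.
  template[template.size - digits.size, digits.size] = digits
  template
end

puts unprintable_utf8_chr_sketch(0x00)
puts unprintable_utf8_chr_sketch(0x7f)
puts unprintable_utf8_chr_sketch(0xe3)
```

Filling a preallocated template avoids building the escape through string formatting, which probably matters in the original because the table is populated while the kernel is still loading.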
