-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add minimal encoding support to String#inspect #1945
Conversation
UTF8Printed = Rubinius::Tuple.new 256 | ||
i = 0 | ||
while i < 256 | ||
UTF8Printed[i] = toprint(i, :utf8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the symbol :utf8 used here? Isn't the flag on toprint
boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a mistake. I looked over it. In my original patch, the approach was more general envisioning to support for other encodings. But in the end, I realized the support isn't actually needed in this time. So I trimmed down the patch. This is the remnant of it....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, in Japan, we finally almost moved to UTF-8. When we process non-UTF-8 strings, any sane system immediately encodes to it.
@ryoqun good news about the UTF-8. I'm fine with the patch but could you address the spec failures? |
@brixen Thanks for accepting this! As I said before, I continued to work on fixing spec failures. |
I think since we have more encoding stuff in place, this should be easier now. Looks like MRI's inspect uses Encoding.default_external if that is an ASCII compatible encoding (which UTF-8) is. We should probably do something similar for that for inspect then and use the current code for all the other encoding cases. |
I'll work on this shortly. |
@brixen Thanks. I'm really excited with the surge of commits for encoding support! |
I've got this almost finished. I introduced a Character class so we can work with encoded strings on a character level and use methods like |
Hooray! Thanks for correctly fixing this. |
@ryoqun no problem, sorry it took so long! Slowly getting encoding completed. |
This is not really a pull request, rather a question. Can this kind of incomplete implementation be merged in?
If no, I'm fine.
In my daily use of Rubinius, I need this before @brixen's full encoding support comes, because I often encounter characters outside Ascii.
If yes, I'll continue to work on this.
"a \0二\x7f\xe3\xc7\x61保護 b c".inspect
before:
"a \x00\xE4\xBA\x8C\x7F\xE3\xC7a\xE4\xBF\x9D\xE8\xAD\xB7 b c"
after:
"a \u0000二\u007F\xE3\xC7a保護 b c"
(same behavior with MRI 1.9 in this case)