Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add minimal encoding support to String#inspect #1945

Closed
wants to merge 1 commit into from

Conversation

ryoqun
Copy link
Member

@ryoqun ryoqun commented Oct 5, 2012

This is not really a pull request, rather a question. Can this kind of incomplete implementation be merged in?

If no, I'm fine.

In my daily use of Rubinius, I need this before @brixen's full encoding support comes, because I often encounter characters outside Ascii.

If yes, I'll continue to work on this.

"a \0二\x7f\xe3\xc7\x61保護 b c".inspect

before: "a \x00\xE4\xBA\x8C\x7F\xE3\xC7a\xE4\xBF\x9D\xE8\xAD\xB7 b c"

after: "a \u0000二\u007F\xE3\xC7a保護 b c" (same behavior with MRI 1.9 in this case)

UTF8Printed = Rubinius::Tuple.new 256
i = 0
while i < 256
UTF8Printed[i] = toprint(i, :utf8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the symbol :utf8 used here? Isn't the flag on toprint boolean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a mistake. I looked over it. In my original patch, the approach was more general envisioning to support for other encodings. But in the end, I realized the support isn't actually needed in this time. So I trimmed down the patch. This is the remnant of it....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, in Japan, we finally almost moved to UTF-8. When we process non-UTF-8 strings, any sane system immediately encodes to it.

@brixen
Copy link
Member

brixen commented Oct 7, 2012

@ryoqun good news about the UTF-8. I'm fine with the patch but could you address the spec failures?

@ryoqun
Copy link
Member Author

ryoqun commented Oct 8, 2012

@brixen Thanks for accepting this! As I said before, I continued to work on fixing spec failures.

@dbussink
Copy link
Contributor

dbussink commented Nov 7, 2012

I think since we have more encoding stuff in place, this should be easier now. Looks like MRI's inspect uses Encoding.default_external if that is an ASCII compatible encoding (which UTF-8) is. We should probably do something similar for that for inspect then and use the current code for all the other encoding cases.

@brixen
Copy link
Member

brixen commented Nov 7, 2012

I'll work on this shortly.

@ryoqun
Copy link
Member Author

ryoqun commented Nov 9, 2012

@brixen Thanks. I'm really excited with the surge of commits for encoding support!

@brixen
Copy link
Member

brixen commented Nov 17, 2012

I've got this almost finished. I introduced a Character class so we can work with encoded strings on a character level and use methods like #ascii? or #printable? without putting those on String and dealing with collisions.

@brixen brixen closed this in 16a3156 Nov 19, 2012
@ryoqun
Copy link
Member Author

ryoqun commented Nov 29, 2012

Hooray! Thanks for correctly fixing this.

@brixen
Copy link
Member

brixen commented Nov 29, 2012

@ryoqun no problem, sorry it took so long! Slowly getting encoding completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants