Add support to access unicode bytes or alternative ord formats
#851
Comments
I don't think there should be an
I opted for a
These functionalities sound useful, but implementing them as options of the
It's also worthwhile to give
I can see the point of moving the builtin
BTW, I can't see anything in the Elvish documentation that makes it crystal clear that text strings are assumed to be Unicode. Are these commands expected to work if the locale is something like GB2312?
@krader1961 Niche maybe, but converting a string to and from codepoints and bytes are pretty fundamental operations that can serve as building blocks for higher-level functions. Go treats UTF-8 specially; here is a good introduction. Elvish doesn't work well with other encodings.
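To make the codepoint/byte distinction concrete, here is a minimal Go sketch. The helper names `toCodepoints` and `toUTF8Bytes` are made up for illustration and are not Elvish's implementation; the sketch only relies on Go's built-in string conversions, which treat strings as UTF-8.

```go
package main

import "fmt"

// toCodepoints is a hypothetical helper: it decodes s as UTF-8 and
// returns one rune (Unicode code point) per character.
func toCodepoints(s string) []rune {
	return []rune(s)
}

// toUTF8Bytes is a hypothetical helper: it returns the raw bytes of s,
// which are its UTF-8 encoding when s is valid UTF-8.
func toUTF8Bytes(s string) []byte {
	return []byte(s)
}

func main() {
	s := "🐈€"
	// Two code points: U+1F408 and U+20AC.
	for _, r := range toCodepoints(s) {
		fmt.Printf("U+%04X\n", r)
	}
	// Seven bytes: four for the cat, three for the euro sign.
	for _, b := range toUTF8Bytes(s) {
		fmt.Printf("0x%02x ", b)
	}
	fmt.Println()
}
```

Note that `[]rune(s)` silently substitutes U+FFFD for invalid byte sequences, which is one reason input in other encodings (such as GB2312) would not round-trip.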
I read the Go article you linked to long ago, albeit before I had contributed any Go language changes to any project, private or public. It seems like this is a "here be dragons" situation that would benefit from some clarification in the documentation.
Move the builtin string functions ord and chr to the str module and rename them to str:to-codepoints and str:from-codepoints respectively, as suggested in elves#851.
Add from-utf8-bytes and to-utf8-bytes functions to the str module. These functions differ from their *-codepoints counterparts in that they handle individual UTF-8 bytes instead of whole codepoints. Closes elves#851
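The reverse direction (bytes or codepoints back to a string) is where validation matters. The following Go sketch shows one way such conversions could reject bad input. The function names `fromUTF8Bytes` and `fromCodepoints` and the error messages are assumptions for illustration and do not reproduce the actual commits; only `utf8.Valid` and `utf8.ValidRune` from Go's standard library are relied on.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// fromUTF8Bytes is a hypothetical helper: it builds a string from raw
// bytes, rejecting sequences that are not well-formed UTF-8.
func fromUTF8Bytes(bs []byte) (string, error) {
	if !utf8.Valid(bs) {
		return "", errors.New("byte sequence is not valid UTF-8")
	}
	return string(bs), nil
}

// fromCodepoints is a hypothetical helper: it builds a string from code
// points, rejecting surrogates and out-of-range values.
func fromCodepoints(cps []rune) (string, error) {
	for _, cp := range cps {
		if !utf8.ValidRune(cp) {
			return "", fmt.Errorf("invalid code point: %#x", cp)
		}
	}
	return string(cps), nil
}

func main() {
	s, err := fromUTF8Bytes([]byte{0xe2, 0x82, 0xac})
	fmt.Println(s, err)

	// A truncated sequence (the first two bytes of the euro sign) is rejected.
	_, err = fromUTF8Bytes([]byte{0xe2, 0x82})
	fmt.Println(err)

	s, err = fromCodepoints([]rune{0x1F408, 0x20AC})
	fmt.Println(s, err)
}
```

The byte-level variant has to check the sequence as a whole, because a byte such as 0xe2 is only meaningful together with the bytes that follow it, while each code point can be checked on its own.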
Hi,
it would be nice to be able to either directly access the byte values of Unicode runes or have ord support UTF-8 encoding.
ord currently outputs UTF-32 values. How about adding an option to output UTF-8? E.g.
ord &encoding=utf8 🐈€
▶ [0xf0 0x9f 0x90 0x88]
▶ [0xe2 0x82 0xac]
or
ord &encoding=utf8 🐈€
▶ 0xf09f9088
▶ 0xe282ac
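The two proposed output shapes can be reproduced with a short Go sketch. It is only an illustration of the formats in the request: `byteList` and `packed` are hypothetical names, and the `&encoding` option itself is the proposal above, not an existing Elvish feature.

```go
package main

import (
	"fmt"
	"strings"
)

// byteList is a hypothetical helper: it renders the UTF-8 encoding of r
// as a bracketed list of hex bytes, e.g. "[0xe2 0x82 0xac]".
func byteList(r rune) string {
	bs := []byte(string(r))
	parts := make([]string, len(bs))
	for i, b := range bs {
		parts[i] = fmt.Sprintf("0x%02x", b)
	}
	return "[" + strings.Join(parts, " ") + "]"
}

// packed is a hypothetical helper: it renders the same bytes as a single
// hex number, e.g. "0xe282ac".
func packed(r rune) string {
	return fmt.Sprintf("0x%x", []byte(string(r)))
}

func main() {
	// Print both formats for each character of the example string.
	for _, r := range "🐈€" {
		fmt.Println(byteList(r), packed(r))
	}
}
```

The list form keeps byte boundaries explicit, whereas the packed form is more compact but loses the per-byte structure unless the reader re-splits the digits in pairs.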