Unicode is good, unicode is great, but it has a few undesirable pitfalls sometimes:
- Byte strings: low-level C users know what I'm talking about. A Nit Text object is necessarily Unicode, which leaves anyone wanting to use a string as a byte sequence sadly coping with invalid byte sequences (i.e. you cannot do shellcoding in Nit without expressing everything, including text, as a hex digest).
- Chars used as their value, as explained in "We need an easy service to get the ascii code point from a literal char" #1718.
For these uses, no real easy solution exists now in Nit.
As such, this issue proposes the introduction of prefixed strings and chars.
Prefixes will first be recognized by the grammar, and some of them will be implemented in the compiler later.
Some candidate prefixes on strings are:
- b"Byte\xfe" => for a byte string with potentially invalid unicode chars in it.
- re"\w*?+" => for Regular Expressions
- raw"\n\rEscaped" => For raw strings, where escape sequences are treated as normal characters
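For comparison, Python already has literals close to two of these candidates, so the intended behaviour can be sketched there. This is only an analogy, assuming the Nit prefixes would behave similarly: Python has `b"..."` and `r"..."` but no `re` prefix, so `re.compile` stands in for it, and the pattern `\w+` and the variable names are made up for the illustration.

```python
import re

# b"...": a byte string. \xfe is a single byte that can never appear in valid UTF-8.
byte_string = b"Byte\xfe"
try:
    byte_string.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# r"...": a raw string. The backslashes stay literal characters,
# so this is 11 characters long rather than 9.
raw_string = r"\n\rEscaped"

# No regex prefix in Python: compile the pattern explicitly instead.
# A raw string keeps \w from being read as a string escape.
word_pattern = re.compile(r"\w+")
words = word_pattern.findall("prefixed strings")

print(len(byte_string), is_valid_utf8, len(raw_string), words)
```

The point of the byte prefix is visible in the failed decode: the value is usable as a byte sequence even though it is not valid text.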
Some candidate prefixes on chars are:
- a'\n' => ASCII value of char '\n' as a Byte
- u'𐏓' => Unicode code-point of char '𐏓' as an Int
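As a rough sketch of what the two char prefixes would evaluate to, here is the same idea in Python, which has no char prefixes and uses `ord` instead. The helper `ascii_value` is hypothetical, and rejecting non-ASCII chars in it is an assumption about how `a'...'` might behave, not something this proposal specifies.

```python
def ascii_value(ch: str) -> int:
    """Hypothetical equivalent of a'x': the ASCII value of a single char."""
    code = ord(ch)
    if code > 0x7F:
        # Assumption: a non-ASCII char under the a prefix would be an error.
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code


def code_point(ch: str) -> int:
    """Hypothetical equivalent of u'x': the Unicode code point as an integer."""
    return ord(ch)


newline_value = ascii_value("\n")  # 10, fits in a single byte
sample = "𐏓"
sample_code_point = code_point(sample)  # well outside the ASCII range

print(newline_value, hex(sample_code_point))
```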
This issue is also a call for feature requests: if you wish for more specific or exotic prefixes, this is the place and the moment to express your needs :)