The descriptions of the escape sequences include regular expressions to describe all possible octal, hexadecimal, and Unicode patterns. Curly braces { and } are used with different meanings, making it unclear how to use these.
For octal and hexadecimal, the curly braces { and } are part of the regular expression: they give the number of characters allowed from the preceding set. That means 1 to 3 characters from the set [0-7] for octal, and 1 to 2 characters from the set [0-9A-Fa-f] for hexadecimal.
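To make the quantifier reading concrete, here is a minimal sketch of how the octal and hexadecimal escapes behave in double-quoted strings. The variable names are only illustrative; the point is that PHP consumes at most three octal digits or two hexadecimal digits, and anything after that is ordinary text.

```php
<?php
// Octal: the backslash is followed by one to three digits from [0-7].
$octalA      = "\101";   // octal 101 is 65, the byte for "A"
$octalGreedy = "\1012";  // only three digits are consumed; the "2" stays literal
$octalShort  = "\7";     // a single digit is enough: byte 0x07

// Hexadecimal: \x is followed by one or two digits from [0-9A-Fa-f].
$hexA      = "\x41";     // 0x41 is also the byte for "A"
$hexGreedy = "\x412";    // only two digits are consumed; the "2" stays literal
$hexShort  = "\x4";      // a single digit is enough: byte 0x04

var_dump($octalA === "A");          // bool(true)
var_dump($octalGreedy === "A2");    // bool(true)
var_dump($octalShort === chr(7));   // bool(true)
var_dump($hexA === "A");            // bool(true)
var_dump($hexGreedy === "A2");      // bool(true)
var_dump($hexShort === chr(4));     // bool(true)
```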
However, for Unicode the manual shows \u{[0-9A-Fa-f]+}. Here the curly braces { and } are NOT part of the regular expression; they are literal characters that must be written as part of the sequence in the string. This is not clear, and there are also no examples on that page, even in the comments.
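A small sketch of the Unicode case, assuming PHP 7.0 or later (where the \u{...} escape is available). The variable names are made up for illustration. It shows that the braces are typed literally, that leading zeros are accepted by the + quantifier, and that \u without an opening brace is left as plain text.

```php
<?php
$withBraces    = "\u{41}";     // codepoint U+0041, i.e. "A"
$leadingZeros  = "\u{0041}";   // one or more hex digits, so leading zeros are fine
$withoutBraces = "\u41";       // no "{" after \u, so nothing is interpreted
$banknote      = "\u{1F4B5}";  // stored as the codepoint's UTF-8 bytes

var_dump($withBraces === "A");       // bool(true)
var_dump($leadingZeros === "A");     // bool(true)
var_dump($withoutBraces === '\u41'); // bool(true): backslash, u, 4, 1
var_dump(strlen($withoutBraces));    // int(4)
var_dump(bin2hex($banknote));        // string(8) "f09f92b5"
```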
I think we should clarify which part is the regular expression by repeating it in the description, and give examples of each.
My suggestion:
Octal: the sequence of characters matching the regular expression [0-7]{1,3} is a character in octal notation (e.g. "\101" === "A"), which silently overflows to fit in a byte (e.g. "\400" === "\000")
Hexadecimal: the sequence of characters matching the regular expression [0-9A-Fa-f]{1,2} is a character in hexadecimal notation (e.g. "\x41" === "A")
Unicode: the sequence of characters matching the regular expression [0-9A-Fa-f]+ is a Unicode codepoint, which will be output to the string as that codepoint's UTF-8 representation. The braces are required in the sequence. E.g. "\u{41}" === "A"
While the suggestion seems sensible, I have no idea how you would want to implement/render it right now.
I figure either we (a) go straight PCRE and escape the literal {s, which would be a very easy thing to do to close this out, or (b) consider that many people aren't particularly well versed in regular expressions and so adopt a more human-friendly ABNF-style syntax instead.
The latter being, of course, a little more involved: the docs use regexes for lots of these things (not that they'd all have to be fixed at once), never mind that it should probably get a discussion and consensus before someone dives into changing everything. But that's where my vote would go.
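For reference, option (a) could look roughly like the sketch below. The pattern strings and variable names are hypothetical, not the wording the manual uses; they only show that once the literal braces are escaped, one PCRE can describe each sequence as it is typed in source code. Single-quoted strings are used so that the escape sequences stay as plain text.

```php
<?php
// Hypothetical patterns. In single quotes, '\\\\' becomes \\ in the pattern,
// which PCRE reads as one literal backslash.
$octalPattern   = '/^\\\\[0-7]{1,3}$/';         // braces are a quantifier
$hexPattern     = '/^\\\\x[0-9A-Fa-f]{1,2}$/';  // braces are a quantifier
$unicodePattern = '/^\\\\u\{[0-9A-Fa-f]+\}$/';  // escaped braces are literal

// The escape sequences as they would be typed, held as plain text.
$octalSource   = '\101';
$hexSource     = '\x41';
$unicodeSource = '\u{1F4B5}';
$missingBraces = '\u1F4B5';

var_dump(preg_match($octalPattern, $octalSource));      // int(1)
var_dump(preg_match($hexPattern, $hexSource));          // int(1)
var_dump(preg_match($unicodePattern, $unicodeSource));  // int(1)
var_dump(preg_match($unicodePattern, $missingBraces));  // int(0)
```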
@Girgias I made a pull request here. #2793
I haven't worked with this file format before, so I hope that this is useful.
@damianwadley Maybe there is value in improving this piece of documentation on its own as it is an early topic in learning the language. There may be a more human-friendly syntax for these sequences, but I believe that some examples would show it best. Perhaps a fuller example like this:
echo "The \"banknote\" emoji\n\t\u{1F4B5}\n has a \$ symbol on it.";
// Output:
// The "banknote" emoji
// 💵
// has a $ symbol on it.
https://www.php.net/manual/en/language.types.string.php