Strings Escaped characters section uses {} inconsistently, unclear #2778

boxcleverliam · 2023-09-20T22:10:44Z

The descriptions of the escape sequences include regular expressions to describe all possible octal, hexadecimal, and Unicode patterns. Curly braces { and } are used with different meanings, making it unclear how to use these.

https://www.php.net/manual/en/language.types.string.php

For octal and hexadecimal, the curly braces { and } are part of the regular expression. They indicate the number of allowed occurrences from the preceding set. So for octal, 1 to 3 characters from the set [0-7], and for hexadecimal, 1 to 2 characters from the set [0-9A-Fa-f].

However, for Unicode it shows \u{[0-9A-Fa-f]+}. Here the curly braces { and } are NOT part of the regular expression. They are required as part of the sequence when it is written in the string. This is not clear, and there are also no examples on that page, even in the comments.

I think we should clarify which part is the regular expression by repeating it in the description, and give examples of each.

My suggestion:

Octal: the sequence of characters matching the regular expression [0-7]{1,3} is a character in octal notation (e.g. "\101" === "A"), which silently overflows to fit in a byte (e.g. "\400" === "\000")
Hexadecimal: the sequence of characters matching the regular expression [0-9A-Fa-f]{1,2} is a character in hexadecimal notation (e.g. "\x41" === "A")
Unicode: the sequence of characters matching the regular expression [0-9A-Fa-f]+ is a Unicode codepoint, which will be output to the string as that codepoint's UTF-8 representation. The braces are required in the sequence. E.g. "\u{41}" === "A"
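
The three descriptions above can be checked with a short script. This is only a sketch: the variable names are made up for illustration, and the last case (a "\u" without braces being left as written) reflects my understanding of PHP 7+ behaviour rather than anything stated on the manual page.

```php
<?php
// Octal: the backslash is followed by one to three digits from [0-7].
$octal         = "\101";   // octal 101 = 65 = "A"
$octalOverflow = "\400";   // octal 400 = 256, which wraps around to byte 0
$octalGreedy   = "\1013";  // at most three digits are consumed: "A", then "3"

// Hexadecimal: "\x" is followed by one or two digits from [0-9A-Fa-f].
$hex       = "\x41";       // hex 41 = 65 = "A"
$hexGreedy = "\x414";      // at most two digits are consumed: "A", then "4"

// Unicode: the braces are written literally around one or more hex digits.
$unicode  = "\u{41}";      // U+0041 = "A"
$banknote = "\u{1F4B5}";   // stored as the four UTF-8 bytes F0 9F 92 B5

// Assumption: without braces, "\u" is not an escape and stays as written.
$noBraces = "\u41";

var_dump($octal === "A");
var_dump($octalOverflow === "\000");
var_dump($octalGreedy === "A3");
var_dump($hex === "A");
var_dump($hexGreedy === "A4");
var_dump($unicode === "A");
var_dump($banknote === "\xF0\x9F\x92\xB5");
var_dump($noBraces === '\u41');
```

If the assumptions hold, every line prints bool(true).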

Girgias · 2023-09-22T15:27:40Z

Providing a PR with the suggestion would make it easier to see exactly what you want.

While the suggestion seems sensible, I have no idea how you would want to implement/render it right now.

damianwadley · 2023-09-22T15:56:23Z

> While the suggestion seems sensible, I have no idea how you would want to implement/render it right now.

I figure either we (a) go straight PCRE and escape the literal {s, which would be a very easy thing to do to close this out, or (b) consider that many people aren't particularly well versed in regular expressions and so adopt a more human-friendly ABNF-style syntax instead.
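
For reference, option (a) might look roughly like the patterns below, with the literal braces of the Unicode form escaped. This is a sketch only: the helper name matchesEscape and the exact patterns are my own illustration, not text from the manual. The subjects are single-quoted so PHP leaves the backslashes alone and the regex sees the source text as typed.

```php
<?php
// Hypothetical helper: does $source, as typed in a script, match $pattern?
function matchesEscape(string $pattern, string $source): bool
{
    return preg_match($pattern, $source) === 1;
}

// In single quotes, '\\\\' is two backslashes, which PCRE reads as one literal backslash.
$octalPattern   = '/^\\\\[0-7]{1,3}$/';          // {1,3} is a quantifier
$hexPattern     = '/^\\\\x[0-9A-Fa-f]{1,2}$/';   // {1,2} is a quantifier
$unicodePattern = '/^\\\\u\{[0-9A-Fa-f]+\}$/';   // \{ and \} are literal braces

var_dump(matchesEscape($octalPattern, '\101'));        // bool(true)
var_dump(matchesEscape($hexPattern, '\x41'));          // bool(true)
var_dump(matchesEscape($unicodePattern, '\u{1F4B5}')); // bool(true)
var_dump(matchesEscape($unicodePattern, '\u1F4B5'));   // bool(false): braces missing
```

Escaping the braces removes the ambiguity, but it still assumes the reader can tell a quantifier from a literal, which is the argument for option (b).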

The latter is, of course, a little more involved: the docs use regexes for lots of these things (not that they'd all have to be fixed at once), never mind that it should probably get a discussion and consensus before someone dives into changing everything. But that's where my vote would go.

boxcleverliam · 2023-09-25T17:39:33Z

@Girgias I made a pull request here: #2793
I haven't worked with this file format before, so I hope that this is useful.

@damianwadley Maybe there is value in improving this piece of documentation on its own as it is an early topic in learning the language. There may be a more human-friendly syntax for these sequences, but I believe that some examples would show it best. Perhaps a fuller example like this:

echo "The \"banknote\" emoji\n\t\u{1F4B5}\n has a \$ symbol on it.";
// Output:
//The "banknote" emoji
//	💵
// has a $ symbol on it.

Girgias added the Category: Strings label Sep 22, 2023

Girgias linked a pull request Sep 26, 2023 that will close this issue

Update descriptions of octal, hex and unicode escape sequences #2793

Merged

Girgias closed this as completed in #2793 Sep 26, 2023
