Constant Name Regular Expression Note #2772

Open
Sunthief opened this issue Sep 18, 2023 · 4 comments

@Sunthief

From manual page: https://php.net/language.constants


I think the note concerning the possible names of constants is incorrect.

Note: For our purposes here, a letter is a-z, A-Z, and the ASCII characters from 128 through 255 (0x80-0xff).

As far as I know, 0x80-0xFF does not refer to ASCII or Unicode code points; it applies to any byte from 10000000 to 11111111. At least that is how it works with variables and other names. I also tested it, and emojis work fine as names.

@damianwadley
Member

The note sounds weird to me, and is out of place given it's related to the "The name of a constant..." sentence a couple paragraphs up, but what it is trying to say is correct: the bytes 0x80-0xFF are allowed. That's because PHP doesn't really care about character encoding in source files. Accented characters, emoji, whatever: it's the actual bytes that matter and affect validity.
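To make the "bytes, not characters" point concrete, here is a minimal sketch. It assumes the file is saved as UTF-8, and the names `CAFÉ` and `$naïve` are made up purely for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so "É" is stored as the two
// bytes 0xC3 0x89, both of which fall inside the allowed 0x80-0xFF range.
const CAFÉ = 'coffee';   // hypothetical constant name
$naïve = true;           // the same rule applies to variable names

echo CAFÉ, PHP_EOL;

// Dump the raw bytes of the name to see what the parser actually reads.
echo bin2hex('CAFÉ'), PHP_EOL; // 434146c389 under the UTF-8 assumption
```

Saved in a different encoding, the same visible name would be a different byte sequence, and therefore a different identifier as far as PHP is concerned.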

And the precise range of what "ASCII" covers probably depends on the person: it's mostly 0-127, sure, but 128-255 is part of the "extended" range so they kinda count too.

I think the source of confusion here is going to be mostly around the use of the term "characters". That should be dropped entirely and the unambiguous term "bytes" used instead. But my choice would be to remove this note entirely and rephrase the earlier paragraph to something along the lines of

The name of a constant follows the same rules as any label in PHP. A valid constant name starts with an ASCII letter or underscore, followed by any number of ASCII letters, numbers, or underscores. The bytes 0x80 through 0xFF, used by character encodings like UTF-8 and the ISO 8859 family, are allowed anywhere as well. As a regular expression, it would be expressed thusly: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$
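As a rough sketch of how that regular expression behaves, the snippet below runs it over a few candidate names. `LABEL_PATTERN` and `isValidLabel()` are hypothetical helpers written for this example and are not part of PHP; the non-ASCII names are spelled out as explicit byte escapes so the result does not depend on how the file is saved.

```php
<?php
// The pattern from the proposed wording. No /u modifier is used, so PCRE
// compares raw bytes, which mirrors how the rule is described.
const LABEL_PATTERN = '/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/';

// Hypothetical helper, not part of PHP itself.
function isValidLabel(string $name): bool
{
    return preg_match(LABEL_PATTERN, $name) === 1;
}

$candidates = [
    'MAX_SIZE',      // plain ASCII
    '_private',      // leading underscore
    "CAF\xC3\x89",   // "CAFÉ" encoded as UTF-8
    "CAF\xC9",       // "CAFÉ" encoded as ISO 8859-1
    '2FAST',         // starts with a digit
    'HAS-DASH',      // contains a byte outside the allowed set
];

foreach ($candidates as $name) {
    // Print the bytes in hex so the output is readable in any terminal.
    printf("%s => %s\n", bin2hex($name), isValidLabel($name) ? 'valid' : 'invalid');
}
```

Both encodings of "CAFÉ" pass, but they are different byte sequences and so would name different constants. Note that `$` in PCRE also matches just before a trailing newline; adding the `D` modifier would tighten that if it matters.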

Perhaps something even more than that which calls out PHP's encoding-agnosticism?

Additionally, the Variables > Basics page and the user-defined functions page repeat similar statements. There may be another page or two I'm not thinking of, too.

@Sunthief
Author

You might want to add to your paragraph that this means emojis and accented characters work as well, in case this is not clear enough.

@damianwadley
Member

> You might want to add to your paragraph that this means emojis and accented characters work as well, in case this is not clear enough.

Can we please not tell people that emoji are supported in names?

@Sunthief
Author

Don't get me wrong: I think most would agree it is not advisable. It still might be good to know in order to understand the concept.
