Potentially Confusing Function Description of mb_ord() #2963

Closed
Sunthief opened this issue Nov 23, 2023 · 5 comments

Comments

@Sunthief

From manual page: https://php.net/function.mb-ord


The description is potentially confusing, as Unicode code point values are hexadecimal, whereas the function returns the decimal value. So for 'A', the return value is 65, not 41.

This:

Returns the Unicode code point value of the given character.

Should look like this:

Returns the Unicode code point value of the given character in decimal notation.
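
To make the two numbers in this report concrete, here is a minimal sketch. It assumes the mbstring extension is enabled; the variable name `$codePoint` is only illustrative. It shows that `mb_ord()` returns an integer and that the "41" form only appears once that integer is converted to a hexadecimal string.

```php
<?php
// Assumption: the mbstring extension is loaded (mb_ord() needs PHP 7.2+).
$codePoint = mb_ord("A", "UTF-8");

var_dump($codePoint);         // int(65)
var_dump(dechex($codePoint)); // string(2) "41"
```

The first value is what the function returns; the second is a string produced by a separate conversion step.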

@Sunthief changed the title from "Potentially Confusing Function Description" to "Potentially Confusing Function Description of mb_ord()" on Nov 23, 2023
@damianwadley
Member

A number is a number - there is no "decimal notation" for it. It's like talking about a timezone for a Unix timestamp.

Personally, I'd keep the description, then add a note along the lines of

To convert the code point to a hexadecimal string, use a function like dechex, (s)printf, or base_convert.

and update/add an example like

<?php
printf("This elephant 🐘 is Unicode character U+%04X", mb_ord("🐘"));
?>
This elephant 🐘 is Unicode character U+1F418
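
For comparison, the three conversion routes named in the suggested note could be sketched as follows. This assumes mbstring is enabled and the source file is saved as UTF-8; the variable names are hypothetical. Note that `dechex()` and `base_convert()` return lowercase digits, so `strtoupper()` is applied to match the usual `U+XXXX` style.

```php
<?php
// Assumption: mbstring is loaded and this file is encoded as UTF-8.
$cp = mb_ord("🐘", "UTF-8"); // the integer 128024

// Three ways to get the hexadecimal string "1F418" from that integer.
$viaDechex  = strtoupper(dechex($cp));
$viaSprintf = sprintf("%X", $cp);
$viaBase    = strtoupper(base_convert((string) $cp, 10, 16));

var_dump($viaDechex === $viaSprintf && $viaSprintf === $viaBase); // bool(true)
```

Of the three, `sprintf()` with `%04X` is probably the most convenient here, since it handles both the uppercase digits and the zero padding in one call.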

@Girgias
Member

Girgias commented Nov 24, 2023

Unicode code points are not hexadecimal, they are numbers. Representing them in hexadecimal is just a convenience, like with a lot of things in computing, because it is a somewhat nicer way to represent binary.

@Girgias closed this as not planned on Nov 24, 2023
@Sunthief
Author

Sunthief commented Nov 24, 2023

Decimal notation means just the number in base 10 and not base 16. The comparison with time zones would make more sense if you think of base 10 as one time zone and base 16 as another. Or a Unix timestamp and a Unix timestamp in hexadecimal.

I checked the documentation and the convention is to use hexadecimal numbers: https://www.unicode.org/versions/Unicode15.1.0/appA.pdf
So I would argue it is more than a convenience; it is the standard. The function deviates from that standard in this regard, and also from PHP's usual behaviour, as \u{...} expects a hexadecimal number. An additional clarification could go a long way.
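
The contrast with the `\u{...}` escape can be illustrated with a short sketch. It assumes mbstring is enabled and PHP 7.2 or later; the variable names are hypothetical. The escape is source-code syntax that takes hexadecimal digits, while `mb_chr()` and `mb_ord()` exchange plain integers, which can themselves be written in any literal form.

```php
<?php
// Assumption: mbstring is loaded; "\u{...}" only works in double-quoted strings.
$elephant = "\u{1F418}";             // escape syntax: hexadecimal digits

$fromHex = mb_chr(0x1F418, "UTF-8"); // integer written as a hex literal
$fromDec = mb_chr(128024, "UTF-8");  // the same integer as a decimal literal

var_dump($elephant === $fromHex);                  // bool(true)
var_dump($fromHex === $fromDec);                   // bool(true)
var_dump(mb_ord($elephant, "UTF-8") === 0x1F418);  // bool(true)
```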

@Girgias
Member

Girgias commented Nov 27, 2023

A number is a number, regardless of its representation. And what you are saying makes no sense.

A codepoint is a number; it doesn't care about its representation. Be that binary, octal, decimal, hexadecimal, or base 425, it is still a number. And PHP uses the int type to return an integer, which is again a number. The fact that the default representation of an int in PHP is base 10 is because most humans use base 10 to count.

The fact that something requires a specific representation of a number is, frankly, none of the business of this function, or any other function returning int.
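
A small sketch of this point, assuming mbstring is enabled (the `$literals` name is only illustrative): the same int can be written as a decimal, hexadecimal, binary, or octal literal, and the base only becomes visible when the value is formatted as a string.

```php
<?php
// Four literals in different bases, all denoting the same int.
$literals = [65, 0x41, 0b1000001, 0101];

var_dump(count(array_unique($literals)) === 1); // bool(true)
var_dump(mb_ord("A", "UTF-8") === 0x41);        // bool(true)

// The base is chosen at formatting time, not stored in the value.
printf("%d %X %o %b\n", 65, 65, 65, 65);        // 65 41 101 1000001
```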

@Sunthief
Author

Sunthief commented Nov 27, 2023

Look, all I was trying to say is that Unicode code points are generally given/represented in a hexadecimal notation; it's the Unicode convention. Someone who reads up on the function might be confused that it returns the value in decimal, and I think it makes sense to add a sentence as an explanation to the description. That was my point and to me it makes sense.
If you disagree, that is only fair, but I am looking at it from a learner's perspective. I also appreciate you taking the time to answer, my intention is to be helpful, and I hope I didn't come across the wrong way or upset you.
