C API: Add PyUnicode_EqualToUTF8() function #110289

Closed
serhiy-storchaka opened this issue Oct 3, 2023 · 1 comment
Labels
topic-C-API topic-unicode type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented Oct 3, 2023

Feature or enhancement

There is a public PyUnicode_CompareWithASCIIString() function. Despite its name, it compares a Python string object with an ISO-8859-1 encoded C string. It returns -1, 0 or 1 and never sets an error.
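The ISO-8859-1 behaviour can be observed from Python itself. The following is a minimal sketch that calls the function through `ctypes.pythonapi`, assuming a CPython interpreter; the wrapper name `compare_with_ascii_string` is hypothetical and only exists for this illustration.

```python
import ctypes

# Bind the public C API function: int f(PyObject *uni, const char *string).
_compare = ctypes.pythonapi.PyUnicode_CompareWithASCIIString
_compare.argtypes = [ctypes.py_object, ctypes.c_char_p]
_compare.restype = ctypes.c_int


def compare_with_ascii_string(text: str, raw: bytes) -> int:
    """Hypothetical wrapper; raw is passed as a NUL-terminated C string."""
    return _compare(text, raw)


E_ACUTE = "\u00e9"  # a single non-ASCII code point

# The ISO-8859-1 encoding of U+00E9 is the single byte 0xE9: compares equal.
latin1_result = compare_with_ascii_string(E_ACUTE, E_ACUTE.encode("latin-1"))

# The UTF-8 encoding is the two bytes 0xC3 0xA9: does not compare equal.
utf8_result = compare_with_ascii_string(E_ACUTE, E_ACUTE.encode("utf-8"))

print(latin1_result, utf8_result)
```

For pure ASCII input the two encodings coincide, which is why the mismatch goes unnoticed in most callers.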

There is a private _PyUnicode_EqualToASCIIString() function. It only works with ASCII encoded C strings and crashes in a debug build if the string is not ASCII. It returns 0 or 1 and never sets an error.

_PyUnicode_EqualToASCIIString() is more efficient than PyUnicode_CompareWithASCIIString(), because if the arguments are not equal it can simply return false instead of determining which one is larger. That was the main reason for introducing it. It is also more convenient, because you do not need to add == 0 or != 0 after the call (and if that is not added, the code is difficult to read).
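The difference between the two kinds of comparison can be sketched in a few lines. This is an illustrative pure-Python model over byte strings, not CPython's implementation; `three_way_compare` and `equal_only` are hypothetical names.

```python
def three_way_compare(left: bytes, right: bytes) -> int:
    """Return -1, 0 or 1; must locate the first differing byte to order the inputs."""
    for a, b in zip(left, right):
        if a != b:
            return -1 if a < b else 1
    # The shared prefix is identical, so the shorter input sorts first.
    return (len(left) > len(right)) - (len(left) < len(right))


def equal_only(left: bytes, right: bytes) -> bool:
    """Return True or False; a length mismatch settles it without reading any byte."""
    if len(left) != len(right):
        return False
    return left == right


name = b"__init__"

# With a three-way result, "equal" is spelled as a comparison against zero.
if three_way_compare(name, b"__init__") == 0:
    print("match (three-way)")

# With a boolean result, the call reads directly as a condition.
if equal_only(name, b"__init__"):
    print("match (equality)")
```

Dropping the `== 0` in the first call would silently invert the condition, which is the readability hazard described above.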

I propose to add the latter function to the public C API, but also to extend it to support UTF-8 encoded C strings. While most use cases are ASCII-only, formally almost all C strings in the C API are UTF-8 encoded. PyUnicode_FromString() and PyUnicode_AsUTF8AndSize(), which are used to convert between Python and C strings, use the UTF-8 encoding. PyTypeObject.tp_name, PyMethodDef.ml_name and PyDescrObject.d_name are all UTF-8 encoded. PyUnicode_CompareWithASCIIString() cannot be used to compare a Python string with such names.
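As a rough illustration of the proposed semantics, the sketch below calls PyUnicode_EqualToUTF8() through `ctypes` when the running interpreter exports a function under that name, and otherwise falls back to a pure-Python emulation. Two things are assumptions here: that the function takes a `PyObject *` and a `const char *` and returns an `int`, mirroring the existing functions, and that a string containing lone surrogates should compare as unequal. The helper name `equal_to_utf8` is hypothetical.

```python
import ctypes


def equal_to_utf8(text: str, raw: bytes) -> bool:
    """Hypothetical helper: is text equal to the UTF-8 encoded bytes in raw?

    raw must not contain embedded NUL bytes, because it is handed to C as a
    NUL-terminated string.
    """
    try:
        func = ctypes.pythonapi.PyUnicode_EqualToUTF8
    except AttributeError:
        # Assumption: emulate the proposal on interpreters without the symbol.
        try:
            return text.encode("utf-8") == raw
        except UnicodeEncodeError:
            # Lone surrogates have no strict UTF-8 encoding; treat as unequal.
            return False
    # Assumed signature: int f(PyObject *unicode, const char *string).
    func.argtypes = [ctypes.py_object, ctypes.c_char_p]
    func.restype = ctypes.c_int
    return bool(func(text, raw))


E_ACUTE = "\u00e9"

print(equal_to_utf8("tp_name", b"tp_name"))              # ASCII-only name
print(equal_to_utf8(E_ACUTE, E_ACUTE.encode("utf-8")))    # UTF-8 bytes
print(equal_to_utf8(E_ACUTE, E_ACUTE.encode("latin-1")))  # ISO-8859-1 byte
```

Unlike the three-way function, the result can be used directly as a condition and it agrees with the encoding used by PyUnicode_FromString().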

For PyASCIIObject objects the new function will be as fast as _PyUnicode_EqualToASCIIString().

Linked PRs

@serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement topic-C-API labels Oct 3, 2023
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 3, 2023
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 3, 2023
@serhiy-storchaka serhiy-storchaka changed the title from "C API: Add PyUnicode_EqualToString() function" to "C API: Add PyUnicode_EqualToUTF8() function" Oct 4, 2023
@hugovk
Member

hugovk commented Nov 9, 2023

Thanks!

@hugovk hugovk closed this as completed Nov 9, 2023