C API: Add PyUnicode_EqualToUTF8() function #110289

serhiy-storchaka · 2023-10-03T14:35:42Z

Feature or enhancement

There is a public PyUnicode_CompareWithASCIIString() function. Despite its name, it compares a Python string object with an ISO-8859-1 encoded C string. It returns -1, 0 or 1 and never sets an error.

There is a private _PyUnicode_EqualToASCIIString() function. It only works with an ASCII encoded C string and crashes in a debug build if the string is not ASCII. It returns 0 or 1 and never sets an error.

_PyUnicode_EqualToASCIIString() is more efficient than PyUnicode_CompareWithASCIIString(), because if the arguments are not equal it can simply return false instead of determining which is larger. This was the main reason for introducing it. It is also more convenient, because you do not need to add == 0 or != 0 after the call (and if that is omitted, the code is difficult to read).
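To illustrate the efficiency and readability argument, here is a minimal standalone sketch. It is not CPython's implementation: it models an ASCII-only string as a pointer plus a length, and the names ascii_compare and ascii_equal are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: an ASCII-only string is `data` plus `length`
 * (no terminating NUL required); `cstr` is a NUL-terminated C string. */

/* Three-way comparison: returns -1, 0 or 1. It has to scan to the first
 * differing character to decide the ordering. */
int ascii_compare(const char *data, size_t length, const char *cstr)
{
    size_t i;
    assert(data != NULL && cstr != NULL);
    for (i = 0; i < length; i++) {
        unsigned char a = (unsigned char)data[i];
        unsigned char b = (unsigned char)cstr[i];
        if (b == '\0')
            return 1;                /* cstr ended first: data is longer */
        if (a != b)
            return a < b ? -1 : 1;
    }
    return cstr[i] == '\0' ? 0 : -1; /* equal, or cstr is longer */
}

/* Equality only: returns 1 or 0. After one strlen(), a length mismatch
 * rejects without comparing any characters; otherwise a single memcmp()
 * is enough, because the ordering is never needed. */
int ascii_equal(const char *data, size_t length, const char *cstr)
{
    size_t clen;
    assert(data != NULL && cstr != NULL);
    clen = strlen(cstr);
    return length == clen && memcmp(data, cstr, clen) == 0;
}
```

At a call site the equality form reads as a plain condition, `if (ascii_equal(name, len, "__init__"))`, while the three-way form needs the explicit `if (ascii_compare(name, len, "__init__") == 0)`.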

I propose to add the latter function to the public C API, but also to extend it to support UTF-8 encoded C strings. While most use cases are ASCII-only, formally almost all C strings in the C API are UTF-8 encoded. PyUnicode_FromString() and PyUnicode_AsUTF8AndSize(), which are used to convert between Python and C strings, use the UTF-8 encoding. PyTypeObject.tp_name, PyMethodDef.ml_name and PyDescrObject.d_name are all UTF-8 encoded. PyUnicode_CompareWithASCIIString() cannot be used to compare a Python string with such names.
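The encoding mismatch can be shown with a small standalone model. This is only a sketch under stated assumptions, not CPython code: a Python string is modeled as an array of code points, the names equal_latin1, equal_utf8, decode_utf8 and CAFE are hypothetical, and the UTF-8 decoder is simplified (it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sample data: the code points of "caf" followed by U+00E9. */
const uint32_t CAFE[4] = { 'c', 'a', 'f', 0xE9 };

/* ISO-8859-1 view: every byte of cstr is exactly one code point. */
int equal_latin1(const uint32_t *cps, size_t n, const char *cstr)
{
    size_t i;
    assert(cps != NULL && cstr != NULL);
    for (i = 0; i < n; i++) {
        unsigned char b = (unsigned char)cstr[i];
        if (b == '\0' || cps[i] != b)
            return 0;
    }
    return cstr[i] == '\0';
}

/* Decode one UTF-8 sequence at *p and advance *p past it.
 * Returns the code point, or -1 on malformed input (simplified). */
long decode_utf8(const unsigned char **p)
{
    const unsigned char *s = *p;
    long cp;
    int extra;
    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else return -1;
    s++;
    while (extra-- > 0) {
        if ((*s & 0xC0) != 0x80)    /* also catches a premature NUL */
            return -1;
        cp = (cp << 6) | (*s & 0x3F);
        s++;
    }
    *p = s;
    return cp;
}

/* UTF-8 view: decode cstr one code point at a time and compare. */
int equal_utf8(const uint32_t *cps, size_t n, const char *cstr)
{
    const unsigned char *p = (const unsigned char *)cstr;
    size_t i;
    assert(cps != NULL && cstr != NULL);
    for (i = 0; i < n; i++) {
        long cp;
        if (*p == '\0')
            return 0;
        cp = decode_utf8(&p);
        if (cp < 0 || (uint32_t)cp != cps[i])
            return 0;
    }
    return *p == '\0';
}
```

In this model, the UTF-8 encoded C string "caf\xc3\xa9" matches CAFE only under equal_utf8(); equal_latin1() treats 0xC3 and 0xA9 as two separate characters and reports a mismatch. For pure ASCII input the two views agree, which is why the difference goes unnoticed in most use cases.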

For PyASCIIObject objects the new function will be as fast as _PyUnicode_EqualToASCIIString().

Linked PRs

gh-110289: C API: Add PyUnicode_EqualToUTF8() function #110297


hugovk · 2023-11-09T21:48:12Z

Thanks!

serhiy-storchaka added type-feature A feature request or enhancement topic-C-API labels Oct 3, 2023

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 3, 2023

pythongh-110289: C API: Add PyUnicode_EqualToString() function

f736862

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 3, 2023

pythongh-110289: C API: Add PyUnicode_EqualToString() function

d39945e

bedevere-app bot mentioned this issue Oct 3, 2023

gh-110289: C API: Add PyUnicode_EqualToUTF8() function #110297

Merged

serhiy-storchaka added the topic-unicode label Oct 3, 2023

serhiy-storchaka changed the title ~~C API: Add PyUnicode_EqualToString() function~~ C API: Add PyUnicode_EqualToUTF8() function Oct 4, 2023

serhiy-storchaka added a commit that referenced this issue Oct 11, 2023

gh-110289: C API: Add PyUnicode_EqualToUTF8() and PyUnicode_EqualToUTF8AndSize() functions (GH-110297)

eb50cd3

hugovk closed this as completed Nov 9, 2023
