There is a public `PyUnicode_CompareWithASCIIString()` function. Despite its name, it compares a Python string object with an ISO-8859-1 encoded C string. It returns -1, 0 or 1 and never sets an error.
There is a private `_PyUnicode_EqualToASCIIString()` function. It only works with an ASCII encoded C string and crashes in a debug build if the string is not ASCII. It returns 0 or 1 and never sets an error.
`_PyUnicode_EqualToASCIIString()` is more efficient than `PyUnicode_CompareWithASCIIString()`, because if the arguments are not equal it can simply return false instead of determining which one is larger. That was the main reason for introducing it. It is also more convenient, because you do not need to add `== 0` or `!= 0` after the call (and if the comparison is omitted, the code is difficult to read).
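
To illustrate why an equality-only check can be cheaper, here is a minimal standalone sketch. It is not the CPython implementation: `compare_bytes()` and `equal_bytes()` are hypothetical helpers that work on a plain byte buffer with an explicit length, standing in for a string whose data is stored one byte per character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-way comparison: it has to find the first
   differing byte and decide which side is larger.
   Returns -1, 0 or 1. */
int
compare_bytes(const unsigned char *a, size_t alen, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t blen = strlen(b);
    size_t n = alen < blen ? alen : blen;
    int c = memcmp(a, b, n);
    if (c != 0) {
        return c < 0 ? -1 : 1;
    }
    if (alen == blen) {
        return 0;
    }
    /* Common prefix is equal: the shorter string sorts first. */
    return alen < blen ? -1 : 1;
}

/* Hypothetical equality check: a length mismatch settles the answer
   without looking at the content at all. Returns 0 or 1. */
int
equal_bytes(const unsigned char *a, size_t alen, const char *b)
{
    assert(a != NULL && b != NULL);
    if (strlen(b) != alen) {
        return 0;
    }
    return memcmp(a, b, alen) == 0;
}
```

At a call site, the result of `equal_bytes()` can be used directly as a condition, while the result of `compare_bytes()` has to be compared with 0 first.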
I propose to add the latter function to the public C API, but also to extend it to support UTF-8 encoded C strings. While most use cases are ASCII-only, formally almost all C strings in the C API are UTF-8 encoded. `PyUnicode_FromString()` and `PyUnicode_AsUTF8AndSize()`, which are used to convert between Python and C strings, use the UTF-8 encoding. `PyTypeObject.tp_name`, `PyMethodDef.ml_name` and `PyDescrObject.d_name` are all UTF-8 encoded. `PyUnicode_CompareWithASCIIString()` cannot be used to compare a Python string with such names.
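
As a rough illustration of what comparing against a UTF-8 encoded C string involves, the sketch below checks an array of code points against a NUL-terminated UTF-8 string, decoding one character at a time and stopping at the first mismatch. `codepoints_equal_utf8()` is a hypothetical helper written for this example, not the proposed API or its implementation, and it is simplified: it does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if the `len` code points in `cp`
   are exactly the characters encoded in the NUL-terminated UTF-8
   string `utf8`, and 0 otherwise. Never reports an error. */
int
codepoints_equal_utf8(const uint32_t *cp, size_t len, const char *utf8)
{
    assert(cp != NULL && utf8 != NULL);
    const unsigned char *s = (const unsigned char *)utf8;

    for (size_t i = 0; i < len; i++) {
        uint32_t ch;
        int extra;  /* number of continuation bytes expected */

        if (*s == 0) {
            return 0;  /* the C string ended first */
        }
        if (*s < 0x80) {
            ch = *s;
            extra = 0;
        }
        else if ((*s & 0xE0) == 0xC0) {
            ch = *s & 0x1F;
            extra = 1;
        }
        else if ((*s & 0xF0) == 0xE0) {
            ch = *s & 0x0F;
            extra = 2;
        }
        else if ((*s & 0xF8) == 0xF0) {
            ch = *s & 0x07;
            extra = 3;
        }
        else {
            return 0;  /* invalid lead byte: treat as not equal */
        }
        s++;
        while (extra-- > 0) {
            /* A NUL byte fails this test too, so the loop never
               reads past the terminator. */
            if ((*s & 0xC0) != 0x80) {
                return 0;
            }
            ch = (ch << 6) | (uint32_t)(*s & 0x3F);
            s++;
        }
        if (ch != cp[i]) {
            return 0;  /* first mismatch ends the comparison */
        }
    }
    /* Equal only if the C string has been consumed as well. */
    return *s == 0;
}
```

With the code points of "café", the UTF-8 bytes `"caf\xC3\xA9"` compare equal, while the ISO-8859-1 bytes `"caf\xE9"` do not, because a lone 0xE9 byte is not valid UTF-8.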
For `PyASCIIObject` objects the new function will be as fast as `_PyUnicode_EqualToASCIIString()`.
Feature or enhancement