-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document PyUnicode_IsIdentifier() function #83681
Comments
The PyUnicode_IsIdentifier() function should be documented. Attachd PR documents it. |
Since the behavior was changed, I think we need a versionchanged directive. This function was added in 3.0. Prior to 3.3 it was always successful (if you pass an unicode object, that is required for most of PyUnicode API). Py_FatalError was added in 3.3, because not all unicode object are now valid. But I am not sure that it could be triggered in real code without using some broken extensions. It may be safer to return 0 instead of returning -1 or crashing if PyUnicode_READY() fails. If return -1 and set an exception, it can lead to an unexpected behavior if the calling code expects only 0/1, and maybe even to crash in debug build due to set an exception and returning value. |
Done.
_PyUnicode_Ready() can fail in two cases: the Py_UNICODE* string contains a character outside [U+0000; U+10ffff] character range, or a memory allocation failed. Both cases look very unlikely, knowning that _PyUnicode_Ready() is only called if PyUnicode_IsIdentifier() is called on a string using the Python legacy C API (Py_UNICODE*).
Functions which don't use PyObject* as return type have the tradition of returning -1. A few examples:
I don't see why PyUnicode_IsIdentifier() deserves to behave differently.
I updated my PR to document the behavior change in the "Changes in the C API" section of What's New in Python 3.9 documentation. Previously, the code crashed Python (Py_FatalError). So I'm confident that this case never occurred ever in the wild. We got zero bug report about that. |
Other examples are:
They are bad examples, but can't be changed for backward compatibility. I wonder whether PyUnicode_IsIdentifier should also kept unchanged for backward compatibility. There is also a number of *_Check functions which always succeeds. Other example is _PyUnicode_EqualToASCIIString where the behavior is intentional. PyUnicode_IsIdentifier() was not documented before. It makes easier to change its behavior. |
I don't think that we should follow these bad examples :-) IMO ignoring silently bugs is a bad programming practice. I don't expect PyUnicode_IsIdentifier() to be used outside Python. If it's used, I don't see why it would be a "non-ready string" in practice. The risk of regression is very close to zero. If it happens, it's no longer our fault, since I documented the behavior change ;-) Again, right now, Python does crash if this corner case occurs... |
It is not convenient to check the result for error. After we remove support of old PyUnicode API, PyUnicode_IsIdentifier() will be always succeeded. Note that PyUnicode_IsIdentifier() still can crash if you pass a non-PyUnicode object or NULL. It is a prerequisite of this function that the argument must be a PyUnicode object. Add just yet one condition: it must be a ready PyUnicode object. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: