Promote PyUnicode_AsUTF8AndSize to be available with the limited API (PEP 384) #85950
This function is incredibly useful for efficient interoperability between Python and other languages with UTF-8 based strings (e.g. Rust). Right now it's not possible to do interop without several copies/allocations if you're trying to build an abi3 wheel. This is tactically relevant to me here: PyO3/pyo3#1125. This API has been stable since it was introduced in Python 3.3, so I think making it part of the stable ABI would be appropriate. Inada, Benjamin suggested I should ask you for your feedback on this. I'm happy to send a pull request.
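To make the "copies/allocations" point concrete, here is a minimal sketch of the kind of call an abi3 extension can already use, PyUnicode_AsUTF8String, which hands back a freshly allocated bytes object on every call. It is driven through ctypes only so that it runs without a compiler; a real extension would make the same call from C or Rust. The variable names are illustrative and are not taken from the issue or from PyO3.

```python
import ctypes

# PyUnicode_AsUTF8String is exported by the running CPython interpreter.
_as_utf8_string = ctypes.pythonapi.PyUnicode_AsUTF8String
_as_utf8_string.argtypes = [ctypes.py_object]
_as_utf8_string.restype = ctypes.py_object  # new reference to a bytes object

text = "interop \u2713"

# Each call allocates a new bytes object holding a copy of the UTF-8 data.
first = _as_utf8_string(text)
second = _as_utf8_string(text)

print(first == second)   # same contents
print(first is second)   # but two separate allocations
```

The caller then still has to copy out of the bytes object (or keep it alive) before handing the data to the other language, which is the overhead the request is trying to avoid.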
+1. It is a very important API.
What about PyUnicode_AsUTF8? Should it be made public too or left for internal use only? What about third-party implementations of Python? How hard would it be to implement this API on an implementation without reference counts? It would be interesting to hear the expert opinion of the core developers of PyPy.
I think less is more, one API is plenty :-) It looks to me like the API is already supported on PyPy, so I think it's fine from that perspective: https://foss.heptapod.net/pypy/pypy/-/blob/branch/py3.7/pypy/module/cpyext/unicodeobject.py#L493
PyUnicode_AsUTF8() is used about three times as often as PyUnicode_AsUTF8AndSize():
$ find -type f -name '*.c' -exec egrep 'PyUnicode_AsUTF8AndSize\(' '{}' + | wc -l
35
$ find -type f -name '*.c' -exec egrep 'PyUnicode_AsUTF8\(' '{}' + | wc -l
101
PyUnicode_AsUTF8 is a useful "API". But it can be implemented as a C macro, a C inline function, or a function/macro in any other language on top of PyUnicode_AsUTF8AndSize. PyUnicode_AsUTF8AndSize is the more important one for the "ABI".
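The layering described above can be sketched as follows: a convenience wrapper that returns only the pointer is a thin shim over the size-returning call. This is an illustration driven through ctypes so it runs as-is on CPython; the helper names `as_utf8_and_size` and `as_utf8` are hypothetical, and a real binding would do the same thing in C or Rust.

```python
import ctypes

# PyUnicode_AsUTF8AndSize is exported by the running CPython interpreter.
_as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8_and_size.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
_as_utf8_and_size.restype = ctypes.c_void_p  # raw address; c_char_p would copy the bytes


def as_utf8_and_size(text):
    """Return (address, byte length) of the UTF-8 buffer owned by `text`."""
    size = ctypes.c_ssize_t()
    address = _as_utf8_and_size(text, ctypes.byref(size))
    return address, size.value


def as_utf8(text):
    """Hypothetical convenience wrapper: the same call with the size discarded."""
    address, _ = as_utf8_and_size(text)
    return address
```

A short usage example, assuming the string object stays alive while the address is in use:

```python
s = "h\u00e9llo"
address, length = as_utf8_and_size(s)
data = ctypes.string_at(address, length)  # copies out of the buffer for inspection
```

The reverse direction does not work as cleanly: a pointer-only function cannot report the length of a string that contains embedded NUL characters, which is one reason the size-returning form is the better primitive to expose.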
I agree about PyUnicode_AsUTF8. But I think it would be worth asking the PyPy team about PyUnicode_AsUTF8AndSize. An alternative C API is PyUnicode_GetUTF8Buffer (bpo-39087). It requires explicitly releasing the buffer after use, so it can be used even on implementations with a moving garbage collector.
Py_buffer is not part of the limited API at all, so I don't think it's usable for this.
Oh, would it not be worth adding Py_buffer to the limited API?
It's a big project, I think :-) Py_buffer is allocated on the stack, so either we'd have to agree to never change its ABI (size, alignment, etc.) or we'd need to completely change the interface.
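The stack-allocation concern can be illustrated with a small sketch. The two layouts below are hypothetical stand-ins, not the real Py_buffer definition: they model an extension compiled against one struct layout and then run against an interpreter that has since grown the struct by one field.

```python
import ctypes


class BufferV1(ctypes.Structure):
    """Hypothetical layout the extension was compiled against."""
    _fields_ = [("buf", ctypes.c_void_p), ("len", ctypes.c_ssize_t)]


class BufferV2(ctypes.Structure):
    """Hypothetical later layout with one extra field."""
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("len", ctypes.c_ssize_t),
        ("flags", ctypes.c_int),
    ]


# The caller's compiler reserved this many bytes on its stack ...
caller_reserves = ctypes.sizeof(BufferV1)
# ... but the newer runtime would fill in this many.
runtime_writes = ctypes.sizeof(BufferV2)

overrun = runtime_writes - caller_reserves
print(f"caller reserves {caller_reserves} bytes, runtime writes {runtime_writes}")
```

Because the size is baked into the caller at compile time, any growth in the struct turns into a write past the caller's stack slot. An interface that only hands out pointers, as PyUnicode_AsUTF8AndSize does, avoids exposing a struct layout at all.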
Agreed that there's no way we can make Py_buffer part of the limited ABI. I just looked over the PR and it's missing a What's New entry (e.g. https://docs.python.org/3/whatsnew/3.9.html#c-api-changes). Other than that, looks fine to me.
Thanks, Alex!