create_unicode_buffer() fails on non-BMP strings on Windows #64064
create_unicode_buffer() fails on Windows if the initializer string contains Unicode code points outside the Basic Multilingual Plane (BMP) and no explicit size is given. The problem appears to be rooted in the fact that, since PEP 393, len() returns the number of code points. That does not always equal the number of 16-bit wchar_t units needed to encode the string on Windows, because each non-BMP code point takes two units (a surrogate pair). As a result, the preallocated c_wchar buffer is too short for the UTF-16 string. The following snippet demonstrates the problem:

    from ctypes import create_unicode_buffer
    b = create_unicode_buffer("\U00028318\U00028319")
    print(b)

It fails inside create_unicode_buffer (traceback excerpt):

      File "c:\Python33\lib\ctypes\__init__.py", line 294, in create_unicode_buffer
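To make the length mismatch concrete, this small sketch compares what len() counts with the number of UTF-16 code units the same string needs. It only encodes the string and never calls ctypes, so it behaves the same on any platform; the variable names are illustrative.

```python
# Compare code points (what len() counts) with the number of 16-bit
# units UTF-16 needs (what a 2-byte wchar_t buffer must hold).
s = "\U00028318\U00028319"  # two CJK Extension B characters, both outside the BMP

code_points = len(s)
utf16_units = len(s.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 unit, no BOM

print(code_points)  # 2
print(utf16_units)  # 4
# A buffer sized as len(s) + 1 reserves 3 wchar_t slots, but with a 2-byte
# wchar_t the string needs 4 units plus the terminating NUL, i.e. 5.
```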
I can confirm that this problem still exists. Could someone take a look, please? Thanks.
When sizeof(c_wchar) == 2, it can just count the number of non-BMP ordinals in the string and add that to len(init), since each of them needs a surrogate pair. Another approach would be to use size = pythonapi.PyUnicode_AsWideChar(init, None, 0), but then the whole function may as well be implemented in the _ctypes extension module.
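Both approaches from the comment above can be sketched in pure Python. This is a minimal illustration, not necessarily how the merged pull request implements the fix: `required_wchar_count`, `create_unicode_buffer_sized` and `required_wchar_count_capi` are hypothetical names. The C API variant assumes that, when called with a NULL buffer, CPython's PyUnicode_AsWideChar returns the required length including the terminating NUL; treat that as an assumption to check against the C API docs for your Python version.

```python
from ctypes import c_ssize_t, c_void_p, c_wchar, py_object, pythonapi, sizeof


def required_wchar_count(init: str) -> int:
    """wchar_t slots needed to hold init plus the trailing NUL.

    Hypothetical helper sketching the "count the non-BMP ordinals" idea.
    """
    if sizeof(c_wchar) == 2:
        # 2-byte wchar_t (UTF-16, e.g. Windows): each code point above U+FFFF
        # becomes a surrogate pair, i.e. one extra wchar_t.
        non_bmp = sum(1 for ch in init if ord(ch) > 0xFFFF)
        return len(init) + non_bmp + 1
    # 4-byte wchar_t (e.g. Linux, macOS): one wchar_t per code point.
    return len(init) + 1


def create_unicode_buffer_sized(init: str):
    """Hypothetical stand-in for create_unicode_buffer(init) with no size."""
    buf = (c_wchar * required_wchar_count(init))()
    buf.value = init
    return buf


# Alternative: ask the C API for the size. Assumption: with a NULL buffer,
# PyUnicode_AsWideChar returns the required length including the NUL.
_as_wide_char = pythonapi.PyUnicode_AsWideChar
_as_wide_char.argtypes = [py_object, c_void_p, c_ssize_t]
_as_wide_char.restype = c_ssize_t


def required_wchar_count_capi(init: str) -> int:
    return _as_wide_char(init, None, 0)  # None is passed as a NULL pointer
```

With the string from the report, required_wchar_count returns 5 where wchar_t is 2 bytes and 3 where it is 4 bytes, so on a 4-byte platform the helper reduces to the original len(init) + 1.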
I'm still able to reproduce this issue with ctypes under Python 3.7.0.
I have created a pull request for this issue. Please take a look.
Thanks Zackery Spytz for the fix. Thanks Gergely Erdélyi for the bug report! Sorry for the long delay.