SWIG_AsWCharPtrAndSize does not work correctly on Windows with code point > 2 byte #2909

Open
Daniel-da6a opened this issue May 15, 2024 · 0 comments
Daniel-da6a commented May 15, 2024

Current state on master branch / SWIG v4.2.1:

For wide strings, the fragment SWIG_AsWCharPtrAndSize (Lib/python/pywstrings.swg) is used. On Windows, this function does not return the correct wchar_t array if the original string contains code points that do not fit into a single two-byte wchar_t, i.e. code points above U+FFFF, which UTF-16 encodes as a surrogate pair.

For example, the Python string "🤠ABC" is returned as "🤠AB".

This is caused by the use of PyUnicode_GetSize in combination with PyUnicode_AsWideChar and the fact that wchar_t is only two bytes on Windows.

PyUnicode_GetSize is used to obtain the length of the string; for the example above this is 4, one per code point. PyUnicode_AsWideChar then copies at most size wchar_t elements. This is where the mismatch happens: since wchar_t is only 2 bytes on Windows, the number of wchar_t elements needed (5, because 🤠 takes a surrogate pair) is not the same as the number of code points (4). As a result, not all of the characters are copied.
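The mismatch can be reproduced in plain Python on any platform by treating UTF-16 code units as a stand-in for Windows wchar_t elements. This is only a sketch of the arithmetic, not the SWIG code itself, and the variable names are made up for illustration:

```python
# "\U0001F920" is the cowboy hat face emoji used in the example above.
text = "\U0001F920ABC"

# Length as Python reports it: one per code point.
code_points = len(text)

# UTF-16 code units, standing in for 2-byte wchar_t elements on Windows.
utf16 = text.encode("utf-16-le")
code_units = len(utf16) // 2

# Copying only `code_points` elements drops the last code unit.
truncated = utf16[: code_points * 2].decode("utf-16-le")

print(code_points, code_units, truncated)
```

Here `code_points` is 4 and `code_units` is 5, and the truncated copy loses the trailing "C", which matches the behaviour reported above.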

https://github.com/swig/swig/blob/7c2b245ceafb49552e559f8056c2618e84aad0b7/Lib/python/pywstrings.swg#L31C1-L44C74

Using PyUnicode_AsWideCharString might be a solution. Alternatively, PyUnicode_AsWideChar(SWIGPY_UNICODE_ARG(obj), NULL, 0) could be used to obtain the correct number of wchar_t elements on Windows (the returned size includes the terminating null).
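The size-query approach can be tried out from Python through ctypes, without building an extension. The following is a rough sketch that assumes CPython; `copy_to_wchar_buffer` is a hypothetical helper written for this illustration and is not part of SWIG:

```python
import ctypes

# Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *wstr, Py_ssize_t size)
_as_wide_char = ctypes.pythonapi.PyUnicode_AsWideChar
_as_wide_char.argtypes = [ctypes.py_object, ctypes.c_wchar_p, ctypes.c_ssize_t]
_as_wide_char.restype = ctypes.c_ssize_t


def copy_to_wchar_buffer(text):
    """Hypothetical helper: size the buffer from the API, then copy."""
    # With a NULL buffer the call returns the number of wchar_t elements
    # required, including the terminating null.
    required = _as_wide_char(text, None, 0)
    buf = ctypes.create_unicode_buffer(required)
    copied = _as_wide_char(text, buf, required)
    return required, copied, buf.value


required, copied, value = copy_to_wchar_buffer("\U0001F920ABC")
print(required, copied, value)
```

With a 2-byte wchar_t (Windows) `required` should come out as 6, and with a 4-byte wchar_t (typical on Linux) as 5; in both cases the whole string is copied because the buffer size comes from the API rather than from the code point count.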

Daniel-da6a changed the title from "SWIG_AsWCharPtrAndSize does not work correctly on Windows with code point > 2 bye" to "SWIG_AsWCharPtrAndSize does not work correctly on Windows with code point > 2 byte" on May 15, 2024
ojwb added the Python label on Jun 1, 2024