Solaris: Fix broken Unicode encoding in non-UTF locales #87833
On Linux, wchar_t values are Unicode code points (UTF-32); however, that does not have to be the case, because the C standard leaves the representation of wchar_t implementation-defined, and Solaris takes advantage of this. In Oracle Solaris, the internal form of wchar_t is specific to a locale: in the Unicode locales, wchar_t uses the UTF-32 encoding form, while other locales use different representations [1]. This is a problem because Python expects wchar_t values to be Unicode code points, which on Oracle Solaris with a non-UTF locale results either in errors (values outside the Unicode range) or in output with the wrong characters. Unicode locales work as expected, but switching to one is not an acceptable workaround for some Oracle Solaris users who cannot use a Unicode encoding for various reasons. Because of that, we fixed it a few months ago with a patch to …
Is something like this acceptable, or should it be fixed in a different place or in a different way? All comments are appreciated.
[1] https://docs.oracle.com/cd/E36784_01/html/E39536/gmwkm.html
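To make the locale dependence visible, here is a minimal sketch that asks the C library's mbstowcs() for the raw wchar_t values of a byte string in a given LC_CTYPE locale, and then applies the assumption Python relied on (every value is a code point no larger than U+10FFFF). It assumes a POSIX system whose C library ctypes can load; raw_wchar_values and looks_like_unicode are hypothetical helper names for illustration, not CPython APIs.

```python
from __future__ import annotations

import ctypes
import ctypes.util
import locale


def raw_wchar_values(data: bytes, ctype_locale: str | None = None) -> list[int]:
    """Decode `data` with the C library's mbstowcs() and return the raw
    wchar_t integers, without assuming they are Unicode code points.

    If `ctype_locale` is given, LC_CTYPE is switched to it for the call
    and restored afterwards.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
    # Read results through an unsigned integer type as wide as wchar_t,
    # so ctypes does not try to convert them into a Python str.
    unit = ctypes.c_uint32 if ctypes.sizeof(ctypes.c_wchar) == 4 else ctypes.c_uint16
    mbstowcs = libc.mbstowcs
    mbstowcs.argtypes = [ctypes.POINTER(unit), ctypes.c_char_p, ctypes.c_size_t]
    mbstowcs.restype = ctypes.c_size_t

    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        if ctype_locale is not None:
            locale.setlocale(locale.LC_CTYPE, ctype_locale)
        buf = (unit * (len(data) + 1))()  # room for every byte plus L'\0'
        n = mbstowcs(buf, data, len(buf))
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)

    if n == ctypes.c_size_t(-1).value:  # (size_t)-1 signals an invalid sequence
        raise ValueError("invalid multibyte sequence for this locale")
    return list(buf[:n])


def looks_like_unicode(values: list[int]) -> bool:
    """The assumption in question: every wchar_t value is a Unicode code point."""
    return all(v <= 0x10FFFF for v in values)
```

On Linux, raw_wchar_values(b"i", "C") gives [0x69] and the check passes. Per the report, an Oracle Solaris non-UTF locale can instead yield a value such as 0x30000069 (whose low byte happens to be 0x69, ASCII "i"), which fails looks_like_unicode.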
I forgot to mention: this affects Oracle Solaris. I tested this on SmartOS and cannot reproduce it there, as it seems to use the Unicode representation for all locales. Based on the documentation, this might also affect other systems (e.g., the HP-UX documentation specifically says: 'These values may not be compatible with values obtained by specifying other locales that are supported'), but it's hard to tell without testing. This one-liner fails with ValueError: character U+30000069 is not in range [U+0000; U+10ffff] if the issue is present:
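Separately from that reproducer, the error message itself can be triggered on Linux by handing CPython's wide-character decoder a wchar_t value outside the Unicode range. The sketch below builds a NUL-terminated wchar_t buffer from raw integers and reads it back through ctypes.c_wchar_p, which converts it to str the way C code using PyUnicode_FromWideChar would. It assumes a 32-bit wchar_t (as on Linux); wchar_to_str is a hypothetical helper name.

```python
from __future__ import annotations

import ctypes


def wchar_to_str(values: list[int]) -> str:
    """Convert raw wchar_t values to str via ctypes' c_wchar_p conversion.

    Values above 0x10FFFF are rejected with ValueError, which is what
    happens when a locale-specific wchar_t is treated as a code point.
    """
    if ctypes.sizeof(ctypes.c_wchar) != 4:
        raise RuntimeError("this sketch assumes a 32-bit wchar_t")
    # One extra zero-initialized slot acts as the L'\0' terminator.
    buf = (ctypes.c_uint32 * (len(values) + 1))(*values)
    return ctypes.cast(buf, ctypes.c_wchar_p).value
```

wchar_to_str([0x69]) returns "i", while wchar_to_str([0x30000069]) raises a ValueError reporting that the character is not in range [U+0000; U+10ffff].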
Backporting to 3.8 may be more complicated. It's up to you to decide whether to backport it. I merged your 3.9 backport; it looks very close to the change made in the main branch.
Do you want to attempt to backport the fix to 3.8, or can this issue be closed?
Sorry for the delayed response. Since we are not delivering or using 3.8 in any way and this issue doesn't seem to affect anybody else, we can skip the backport to 3.8. I will prepare another PR with a NEWS fragment; after that, this can be considered solved and closed.
I'm closing the issue, but you can still reference the bpo issue number in your PR with the changelog (NEWS) entry.
I merged your PR and backported it to add a NEWS entry, thanks.