Make encoding="locale" use the locale encoding even when UTF-8 mode is enabled. #91156
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = None
closed_at = None
created_at = <Date 2022-03-13.06:09:52.037>
labels = ['expert-unicode', '3.11']
title = 'Make encoding="locale" uses locale encoding even in UTF-8 mode is enabled.'
updated_at = <Date 2022-04-04.02:47:05.537>
user = 'https://github.com/methane'
activity = <Date 2022-04-04.02:47:05.537>
actor = 'methane'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Unicode']
creation = <Date 2022-03-13.06:09:52.037>
creator = 'methane'
dependencies =
files =
hgrepos =
issue_num = 47000
keywords = ['patch']
message_count = 16.0
messages = ['415028', '415118', '415146', '415147', '415219', '415240', '415767', '415768', '415769', '415851', '415867', '415922', '416329', '416330', '416332', '416649']
nosy_count = 5.0
nosy_names = ['lemburg', 'vstinner', 'ezio.melotti', 'methane', 'eryksun']
pr_nums = ['32003', '32068']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue47000'
versions = ['Python 3.11']
I am not sure whether UTF-8 mode will become the default.
So I think
Currently, UTF-8 mode affects
I created a related topic on discuss.python.org.
If we recommend
If we don't change
There are multiple "locale encodings":
    #if defined(__ANDROID__) || defined(__VXWORKS__)
       // Use UTF-8 as the locale encoding, ignore the LC_CTYPE locale.
       // See _Py_GetLocaleEncoding(), PyUnicode_DecodeLocale()
       // and PyUnicode_EncodeLocale().
    #  define _Py_FORCE_UTF8_LOCALE
    #endif

    #if defined(_Py_FORCE_UTF8_LOCALE) || defined(__APPLE__)
       // Use UTF-8 as the filesystem encoding.
       // See PyUnicode_DecodeFSDefaultAndSize(), PyUnicode_EncodeFSDefault(),
       // Py_DecodeLocale() and Py_EncodeLocale().
    #  define _Py_FORCE_UTF8_FS_ENCODING
    #endif
See bpo-43552 "Add locale.get_locale_encoding() and locale.get_current_locale_encoding()" (rejected).
Marc-Andre Lemburg dislikes the locale.getpreferredencoding(False) API and suggested adding a new function locale.getencoding() with no argument:
I created another topic related to this issue.
If we add another option (e.g. legacy_text_encoding), we do not need to change UTF-8 mode behavior.
FWIW: I don't think the "locale" encoding is a good idea. Instead of
When it comes to encodings, explicit is better than implicit.
If an application wants to work with some user defined locale settings,
There are too many ways this can be done and trying to build
IMO it's a different use case and it should be a different thing. Changing encoding="locale" today is too late, since it's already shipped in Python 3.10 (PEP-597).
I proposed the "current locale" name to distinguish it from the existing "locale":
The unclear part to me is if "current locale" must change if the LC_CTYPE locale is changed, or if it should be read once at startup and then never change.
There *are* use cases for really reading the *current* LC_CTYPE locale encoding. There is already a C API for that:
See also the "current_locale" parameter of the private API _Py_EncodeLocaleEx() and _Py_DecodeLocaleEx().
None of these functions do locale.setlocale(locale.LC_CTYPE, "") to get the user preferred encoding.
Only the locale.getpreferredencoding() function uses locale.setlocale(locale.LC_CTYPE, "").
Usage of locale.getpreferredencoding() should be discouraged in the documentation, but I don't think that it can be deprecated and scheduled for removal right now: too much code relies on it :-(
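To make the difference between these calls concrete, here is a rough sketch that queries them side by side. It only uses functions that exist in the standard library today; the variable names are illustrative, and the values printed will differ by platform, locale settings, and UTF-8 mode.

```python
import locale
import sys

# Encoding Python uses for file names and other OS-level strings.
fs_encoding = sys.getfilesystemencoding()

# Preferred encoding obtained without calling setlocale(LC_CTYPE, "").
# In UTF-8 mode this reports UTF-8 regardless of the LC_CTYPE locale.
no_setlocale = locale.getpreferredencoding(False)

# Encoding of the *current* LC_CTYPE locale, straight from the C library.
# nl_langinfo() is not available on every platform (for example Windows),
# so fall back to None there.
if hasattr(locale, "nl_langinfo"):
    current = locale.nl_langinfo(locale.CODESET)
else:
    current = None

print("UTF-8 mode:", bool(sys.flags.utf8_mode))
print("filesystem encoding:", fs_encoding)
print("getpreferredencoding(False):", no_setlocale)
print("nl_langinfo(CODESET):", current)
```

Running this once normally and once with `python -X utf8` shows which of the values follow UTF-8 mode and which follow the C locale.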
So we have 3 encodings:
Examples of usage:
Yes, although PYTHONLEGACYWINDOWSFSENCODING takes priority.
Hmm, I didn't add it to PEP 686 because it is related to neither UTF-8 mode nor EncodingWarning.
Note that we have
sys.getlocaleencoding() versus locale.getencoding().
For me, the Python locale module should use the C API to access the Unix locales like LC_CTYPE, nl_langinfo(CODESET), etc.
The sys module is more for things specific to Python, like sys.getfilesystemencoding().
Since sys.getlocaleencoding() would be a fixed value for the whole process lifetime, I agree that the sys module is a better place.
I can write a PR adding sys.getlocaleencoding() if we agree on the API.
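As a rough illustration of the "fixed value for the process lifetime" idea, the sketch below caches the locale encoding on first use. The function name `startup_locale_encoding` is hypothetical and is not the proposed sys.getlocaleencoding(); it also fixes the value at the first call rather than at interpreter startup. It assumes locale.getencoding() is present (it is on Python 3.11 and later) and falls back to locale.getpreferredencoding(False) otherwise, which is only an approximation because that call reports UTF-8 when UTF-8 mode is enabled.

```python
import functools
import locale


@functools.lru_cache(maxsize=None)
def startup_locale_encoding() -> str:
    """Hypothetical helper: locale encoding, cached after the first call."""
    getencoding = getattr(locale, "getencoding", None)
    if getencoding is not None:
        # Python 3.11+: reports the locale encoding and ignores UTF-8 mode.
        return getencoding()
    # Older versions: closest available call, but it returns UTF-8
    # whenever UTF-8 mode is enabled.
    return locale.getpreferredencoding(False)


first = startup_locale_encoding()
second = startup_locale_encoding()  # served from the cache
print(first, first == second)
```

Because the result is cached, a later locale.setlocale() call does not change what the helper returns, which mirrors the "read once, never change" option discussed above.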
I am not sure whether we really need "locale encoding at Python startup".
For this issue, I don't want to change
On the other hand, I know Eryk wants to support locale on Windows. So
Please see https://bugs.python.org/issue47000#msg415769 for what Victor
In particular, the locale module uses the "no underscore" convention.
I would like to reiterate my concern with the "locale" encoding, though.
As mentioned earlier, I believe it adds too much magic. It would be better
It's better to expose easy to use APIs to access the various different
After all, Mojibake potentially corrupts important data, without the
Of course, I read it.
Victor didn't mention the "no underscore" convention.
I don't recommend that users use the "locale" encoding.
In some cases, users need to decide to "not change the encoding for now".
Changing the default encoding will temporarily increase this risk.
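A small sketch of the kind of risk involved when the reader and writer disagree on the encoding. The sample string and the choice of cp1252 as the "legacy" encoding are assumptions made for illustration only.

```python
original = "café"  # illustrative sample text

# Bytes as written by a program that used a legacy code page.
legacy_bytes = original.encode("cp1252")

# Reading those bytes as UTF-8 fails loudly ...
try:
    legacy_bytes.decode("utf-8")
    utf8_read_failed = False
except UnicodeDecodeError:
    utf8_read_failed = True

# ... while the opposite mismatch (UTF-8 bytes read as cp1252) silently
# produces mojibake instead of raising an error.
mojibake = original.encode("utf-8").decode("cp1252")

print(utf8_read_failed)  # True
print(mojibake)          # cafÃ©
```

The second case is the dangerous one: nothing fails, so the corrupted text can be written back to disk unnoticed.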