PEP 597: Implement encoding="locale" option and EncodingWarning #87676
Comments
PEP-597 is accepted.
I replied to INADA-san's message on bpo-43552:
In this case, the PEP-597 statement that open(filename, encoding="locale") is the same as open(filename) is wrong. It would mean that users who have the UTF-8 Mode enabled (implicitly or explicitly) would switch to a legacy encoding like latin1 rather than using the UTF-8 encoding if they add encoding="locale" to their open() calls? Since the final goal is to move everybody towards UTF-8, I'm not sure how that is a good thing.
The final goal (the third motivation of PEP-597) is changing the default encoding (i.e. the encoding used when it is not specified) to UTF-8. But forcing people to use UTF-8 even when they specify the locale encoding explicitly is not the goal. That's why I want to ignore UTF-8 mode when encoding="locale" is specified. I think this is almost a Windows-only issue, and "mbcs" can already be used on Windows. It is documented in https://docs.python.org/3/using/windows.html#utf-8-mode So this is not a blocker. Just my preference.
I see different cases when open() is called with no encoding argument:
(A) User wants to use UTF-8: add encoding="utf-8"
(B) Windows user wants to use the ANSI code page of their computer, local file not intended to be shared with other computers: add encoding="mbcs". This makes the code specific to Windows (the "mbcs" alias doesn't exist on Unix).
(C) User wants to use the locale encoding and is fine with the UTF-8 Mode: add encoding=getpreferredencoding(False)
(D) Unix user wants to use the locale encoding but not the UTF-8 Mode: encoding=get_current_locale_encoding() (function proposed in bpo-43552) or nl_langinfo(CODESET) (should work on any Python version). I don't know if nl_langinfo(CODESET) is available on Windows.
(E) User has no idea what they are doing and doesn't understand anything about Unicode: please trust us and explicitly specify UTF-8 :-)
Apart from the encoding="utf-8" case, I understand that there are two main complex cases:
(1) "UTF-8" in the UTF-8 Mode, or the locale encoding
What I don't expect is the current behavior, before PEP-597. Who uses open() without specifying an encoding but always wants to use the locale encoding? (case 2) So this use case is already broken when the UTF-8 Mode is enabled explicitly?
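The cases above can be sketched as a small lookup that resolves each option to a concrete encoding name on the current machine. This is only an illustration: the function name candidate_encodings and the dictionary keys are hypothetical, and get_current_locale_encoding() is omitted because it was only a proposal at this point in the thread.

```python
import locale
import sys


def candidate_encodings():
    """Map cases (A) to (D) to encoding names available on this machine.

    Hypothetical helper for illustration only.
    """
    choices = {
        # (A) Always UTF-8, regardless of platform or locale.
        "A_utf8": "utf-8",
        # (C) Locale encoding, but "utf-8" when the UTF-8 Mode is enabled.
        "C_preferred": locale.getpreferredencoding(False),
    }
    # (B) The "mbcs" codec only exists on Windows.
    if sys.platform == "win32":
        choices["B_mbcs"] = "mbcs"
    # (D) nl_langinfo() is not provided on every platform, so check first.
    if hasattr(locale, "nl_langinfo"):
        choices["D_nl_langinfo"] = locale.nl_langinfo(locale.CODESET)
    return choices


if __name__ == "__main__":
    for case, name in candidate_encodings().items():
        print(case, name)
```

The hasattr() guard is a cautious way to handle the open question about nl_langinfo(CODESET) on Windows without assuming an answer.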
Yes, it is broken already. So they cannot use UTF-8 mode. If [...] That's why it is important if we enable UTF-8 mode by default in the future.
Yeah! Congrats INADA-san for implementing your PEP!
I created bpo-43651 to track fixing EncodingWarning in the Python stdlib.
In bpo-43651, I found a code pattern where it is difficult to use io.text_encoding():
    class OpenWrapper:
        def __new__(cls, *args, **kwargs):
            return open(*args, **kwargs)
I think we should accept encoding="locale" in binary mode.
I'm sorry, I was wrong. Allowing [...] If we use [...] Adding [...] So we must not call [...]
To me, it sounds really weird to accept an encoding when a file is opened in binary mode. open(filename, "rb", encoding="locale") looks like a bug.
On 31.03.2021 11:30, STINNER Victor wrote:
Same here. If encoding is used as an argument and then not used, this is a bug,
encoding="locale" in binary mode. #25103
encoding="locale" in binary mode." #25108
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.