IDLE: make sys.stdxxx.encoding always be utf-8 #85324
Comments
When testing and on Windows, iomenu.encoding and .errors are set to utf-8 and surrogateescape*. When running otherwise, these are set with baroque code I don't understand. (Currently lines 31 to 61.)
I think it makes sense if we want to use the locale encoding for IO streams. On the other hand, it may be worth dropping support for a locale-dependent, configurable IO encoding and always using UTF-8. It is the IO encoding always used on Windows and the encoding of most locales on modern Linux and macOS.
The PR is for 1. The *nix code is a bit clearer without the Windows code in the middle. Is there a good reason why, when encoding is 'utf-8', errors should be 'surrogateescape' on Windows and 'strict' elsewhere? Surrogateescape seems like it is made for use with ascii or another limited encoding.
The main use for the iomenu settings is for the socket-transport file classes, in run.py. The default encoding='utf-8' and errors='strict' are not used but are overridden with the iomenu values, or for stderr, 'backslashreplace'. Since user code can print any unicode, I think the defaults should be used as is, to transparently pass on and possibly display anything the user sends. Such a change should have no back-compatibility issues. Thinking more about errors: with utf-8 encoding of proper strings, there should never be any, but Python does allow construction of 'improper' strings with, say, single surrogates. The transport mechanism should never raise, so maybe surrogateescape or backslashreplace should always be used. What do you two think? Another use is for writing bytes to an OutputWindow, as with find-in-files. But I can think of no case where IDLE sends bytes to an OutputWindow. User files are all opened in an editor. I believe these are all the uses of 'iomenu.encoding' outside of iomenu. 'from iomenu ...' is never used. Within iomenu, the only use is part of reading an encoding cookie. So maybe the encoding calculation is not really needed.
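To make the question about error handlers concrete, here is a minimal sketch (plain Python, not IDLE's transport code; `try_encode` and the sample strings are made up for illustration) showing which handlers raise when an 'improper' string with a single surrogate is encoded as utf-8:

```python
def try_encode(text, errors):
    """Return the utf-8 bytes for text, or None if encoding raises."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError:
        return None

# Two 'improper' strings, each containing one lone surrogate.
lone_low = "a\udcffb"   # in U+DC80..U+DCFF, the range surrogateescape handles
lone_high = "a\ud800b"  # outside that range

results = {
    errors: (try_encode(lone_low, errors), try_encode(lone_high, errors))
    for errors in ("strict", "surrogateescape", "surrogatepass", "backslashreplace")
}

for errors, (low, high) in results.items():
    print(f"{errors:17} low={low!r} high={high!r}")
```

Of these four, only 'surrogatepass' and 'backslashreplace' encode both strings without raising. 'strict' fails on both, and 'surrogateescape' only handles the surrogates it would itself produce when decoding stray bytes.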
I got the 'within iomenu' part a bit wrong. To open a file to edit, iomenu.IOBinding('IO').open tells filelist to use IO.loadfile. This reads bytes 'so that we can handle end-of-line convention ourselves'. (I suspect that this predates 3.x and might not be needed any more.) IO.loadfile calls IO._decode, which looks for a utf-8 BOM, looks for a coding cookie, tries ascii (not needed in 3.x), tries utf-8, and finally asks the user for an encoding, using iomenu.encoding as the initial value in the query box. This box is deprecated in the sense that for 3.x, a python file should either be utf-8 or have an encoding cookie.
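For reference, the same lookup order (BOM, then coding cookie, then utf-8) can be sketched with the stdlib. This is not IO._decode itself: `decode_source` is a hypothetical helper, and it leans on `tokenize.detect_encoding` as a stand-in for IDLE's own BOM and cookie handling.

```python
import codecs
import io
import tokenize

def decode_source(data, fallback="utf-8"):
    """Decode Python source bytes: BOM first, then coding cookie, then fallback.

    Returns (text, encoding), or (None, None) when nothing works, which is
    the point where an editor would have to ask the user.
    """
    try:
        # detect_encoding checks for a utf-8 BOM and a PEP 263 cookie in the
        # first two lines, and reports 'utf-8' when it finds neither.
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    except SyntaxError:
        # Bad cookie, or a first line that is not valid utf-8.
        encoding = fallback
    try:
        return data.decode(encoding), encoding
    except (UnicodeDecodeError, LookupError):
        return None, None

with_bom = codecs.BOM_UTF8 + b"x = 1\n"
with_cookie = b"# -*- coding: latin-1 -*-\ns = '\xe9'\n"
plain_utf8 = "s = '\u00e9'\n".encode("utf-8")
undeclared_latin1 = b"s = '\xe9'\n"

for sample in (with_bom, with_cookie, plain_utf8, undeclared_latin1):
    print(decode_source(sample))
```

Only the last sample, latin-1 bytes with no cookie, falls through to the 'ask the user' case.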
PR 21214 sets the encoding of stdin/stdout/stderr to 'utf-8'. The error handler is set to 'surrogatepass' or 'surrogateescape' because these error handlers are used when converting strings between Python and Tcl. This guarantees that reading from stdin and writing back to stdout will never fail, even if you paste garbage from the clipboard. Printing file paths will never fail either.
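A rough illustration of the round-trip guarantee, assuming ordinary `io.TextIOWrapper` streams rather than IDLE's own stdin/stdout classes (the `echo` helper and the sample bytes are invented for the example): when the reader and the writer use the same handler, bytes that are not valid utf-8 come back out unchanged.

```python
import io

def echo(raw, errors="surrogateescape"):
    """Read raw bytes as utf-8 text, write the text back out, return both."""
    reader = io.TextIOWrapper(
        io.BytesIO(raw), encoding="utf-8", errors=errors, newline=""
    )
    sink = io.BytesIO()
    writer = io.TextIOWrapper(sink, encoding="utf-8", errors=errors, newline="")
    text = reader.read()      # invalid bytes become lone surrogates
    writer.write(text)        # lone surrogates become the original bytes again
    writer.flush()
    return text, sink.getvalue()

garbage = b"ok \xff\xfe caf\xc3\xa9\n"  # two invalid bytes, then valid utf-8
text, written = echo(garbage)
print(repr(text))
print(written == garbage)
```

With errors='strict' the same call would raise UnicodeDecodeError on the read.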
Thank you for this and the next patch.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.