IDLE: make sys.stdxxx.encoding always be utf-8 #85324

terryjreedy · 2020-06-28T17:56:43Z

BPO	41152
Nosy	@terryjreedy, @taleinat, @ned-deily, @serhiy-storchaka, @miss-islington
PRs	bpo-41152: Revise setting idlelib.iomenu.encoding (#21206); bpo-41152: IDLE: always use UTF-8 for standard IO streams (#21214); [3.9] bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214) (#21225); [3.8] bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214) (#21226)

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/terryjreedy'
closed_at = <Date 2020-06-30.12:23:13.900>
created_at = <Date 2020-06-28.17:56:42.823>
labels = ['expert-IDLE', 'type-bug', '3.8', '3.9', '3.10']
title = 'IDLE: make sys.stdxxx.encoding always be utf-8'
updated_at = <Date 2020-06-30.12:23:13.897>
user = 'https://github.com/terryjreedy'

bugs.python.org fields:

activity = <Date 2020-06-30.12:23:13.897>
actor = 'terry.reedy'
assignee = 'terry.reedy'
closed = True
closed_date = <Date 2020-06-30.12:23:13.900>
closer = 'terry.reedy'
components = ['IDLE']
creation = <Date 2020-06-28.17:56:42.823>
creator = 'terry.reedy'
dependencies = []
files = []
hgrepos = []
issue_num = 41152
keywords = ['patch']
message_count = 10.0
messages = ['372527', '372528', '372540', '372543', '372546', '372567', '372642', '372643', '372644', '372684']
nosy_count = 5.0
nosy_names = ['terry.reedy', 'taleinat', 'ned.deily', 'serhiy.storchaka', 'miss-islington']
pr_nums = ['21206', '21214', '21225', '21226']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue41152'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

terryjreedy · 2020-06-28T17:56:43Z

When testing and on Windows, iomenu.encoding and .errors are set to utf-8 and surrogateescape*. When running otherwise, these are set with baroque code I don't understand. (Currently lines 31 to 61.)

1. Combine the two conditional statements for testing and Windows.
2. Ned, on my Catalina MacBook, the 30-line 'else' section sets encoding and errors to 'utf-8' and 'strict'. Should there ever be any other result on Mac that we care about? If not, I would like to set them directly, as on Windows.
3. Serhiy, does the 'baroque code' look right to you for Linux (or *nix in general)?

serhiy-storchaka · 2020-06-28T18:10:28Z

I think it makes sense if we want to use the locale encoding for IO streams.

But on the other hand, it may be worth dropping support for locale-dependent and configurable IO encoding and always using UTF-8. It is the IO encoding always used on Windows and the encoding of most locales on modern Linux and macOS.

terryjreedy · 2020-06-29T04:11:40Z

The PR is for item 1. The *nix code is a bit clearer without the Windows code in the middle.

Is there a good reason why, when encoding is 'utf-8', errors should be 'surrogateescape' on Windows and 'strict' elsewhere? Surrogateescape seems designed for use with ASCII or another limited encoding.
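For comparison, here is a minimal sketch using only the built-in `bytes.decode` and `str.encode` (the byte string is made up for illustration). It suggests that `surrogateescape` is not tied to ASCII: per PEP 383 it exists to round-trip bytes that the chosen encoding cannot decode, and such bytes (e.g. 0xff) can still occur with UTF-8.

```python
# Bytes that are not valid UTF-8: 0xff never appears in UTF-8 output.
raw = b"caf\xc3\xa9 \xff"

# 'strict' refuses to decode the invalid byte.
try:
    raw.decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# 'surrogateescape' maps each undecodable byte 0xXX to the lone
# surrogate U+DCXX, so no information is lost...
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))

# ...and encoding with the same handler restores the original bytes.
assert text.encode("utf-8", "surrogateescape") == raw
```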

terryjreedy · 2020-06-29T05:02:28Z

The main use for the iomenu settings is for the socket-transport file classes in run.py. Their defaults, encoding='utf-8' and errors='strict', are not used; they are overridden with the iomenu values or, for stderr, with 'backslashreplace'.

Since user code can print any Unicode, I think the defaults should be used as is, to transparently pass on and possibly display anything the user sends. Such a change should have no backward-compatibility issues.

Thinking more about errors: with UTF-8 encoding of proper strings there should never be any, but Python does allow construction of 'improper' strings with, say, lone surrogates. The transport mechanism should never raise, so maybe surrogateescape or backslashreplace should always be used.
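To illustrate the 'improper' strings mentioned above, here is a sketch that encodes a string containing an unpaired surrogate under several error handlers (the sample string `lone` and the `results` dict are arbitrary). Note that `surrogateescape` only reverses its own mapping of U+DC80..U+DCFF, so it still raises for a surrogate such as U+D800, while `surrogatepass` and `backslashreplace` do not raise here.

```python
lone = "x\ud800y"  # a str containing an unpaired high surrogate

results = {}
for handler in ("strict", "surrogateescape", "surrogatepass", "backslashreplace"):
    try:
        results[handler] = lone.encode("utf-8", handler)
    except UnicodeEncodeError as exc:
        # Record the failure as text instead of bytes.
        results[handler] = f"UnicodeEncodeError: {exc.reason}"

for handler, outcome in results.items():
    print(f"{handler:17} -> {outcome!r}")
```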

What do you two think?

Another use is for writing bytes to an OutputWindow, as with find-in-files. But I can think of no case where IDLE sends bytes to an OutputWindow. User files are all opened in an editor.

I believe these are all the uses of 'iomenu.encoding' outside of iomenu. 'from iomenu ...' is never used.

Within iomenu, the only use is part of reading an encoding cookie.
# The only use of 'encoding' below is in _decode as initial value
# of deprecated block asking user for encoding.
I am not sure if this use can be reached now. Even if so, I believe this code duplicates code elsewhere in the stdlib that might be used.

So maybe the encoding calculation is not really needed.

terryjreedy · 2020-06-29T05:36:21Z

I got the 'within iomenu' part a bit wrong. To open a file to edit, iomenu.IOBinding('IO').open tells filelist to use IO.loadfile. This reads bytes 'so that we can handle end-of-line convention ourselves'. (I suspect that this predates 3.x and might not be needed any more.) IO.loadfile calls IO._decode, which looks for a UTF-8 BOM, looks for a coding cookie, tries ASCII (not needed in 3.x), tries UTF-8, and finally asks the user for an encoding, using iomenu.encoding as the initial value in the query box. This box is deprecated in the sense that, for 3.x, a Python file should either be UTF-8 or have an encoding cookie.
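The lookup order described above can be sketched roughly as follows. `decode_source` and `CODING_RE` are hypothetical names, not IDLE's actual `_decode`; the cookie pattern follows PEP 263, and falling through on an unknown or wrong cookie is an assumption of this sketch.

```python
import codecs
import re

# PEP 263 coding-cookie pattern, checked on the first two lines only.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def decode_source(data: bytes):
    """Hypothetical sketch of the decode order described above.

    Returns (text, encoding_name), or (None, None) at the point where
    IDLE would instead ask the user for an encoding.
    """
    # 1. A UTF-8 byte order mark: decode and strip it.
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig"), "utf-8-sig"
    # 2. An explicit coding cookie on line 1 or 2.
    for line in data.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            name = match.group(1).decode("ascii")
            try:
                return data.decode(name), name
            except (LookupError, UnicodeDecodeError):
                break  # assumption: ignore a bad cookie and keep going
    # 3. ASCII, then 4. UTF-8 (ASCII is a subset of UTF-8, so step 3
    #    is redundant in Python 3).
    for name in ("ascii", "utf-8"):
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            pass
    # 5. Nothing worked: this is where the deprecated query box appears.
    return None, None
```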

serhiy-storchaka · 2020-06-29T12:15:20Z

PR 21214 sets the encoding of stdin/stdout/stderr to 'utf-8'. The error handler is set to 'surrogatepass' or 'surrogateescape' because these error handlers are used when converting strings between Python and Tcl. This guarantees that reading from stdin and writing back to stdout will never fail, even if you paste garbage from the clipboard. Printing file paths will never fail either.
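A rough way to check this claim is to push text through an in-memory UTF-8 `io.TextIOWrapper` and back. The `roundtrip` helper is hypothetical and only stands in for IDLE's real stream classes; which handler IDLE assigns to which stream is not shown here.

```python
import io


def roundtrip(text: str, errors: str) -> str:
    """Write text through a UTF-8 text stream, then read it back."""
    writer = io.TextIOWrapper(io.BytesIO(), encoding="utf-8",
                              errors=errors, newline="")
    writer.write(text)  # encodes using the given error handler
    # detach() flushes and returns the buffer without closing it.
    data = writer.detach().getvalue()
    reader = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8",
                              errors=errors, newline="")
    return reader.read()


# A path holding an undecodable byte, as os.fsdecode() can produce on POSIX.
print(ascii(roundtrip("file_\udcff.txt", "surrogateescape")))
# Half of a surrogate pair, e.g. from a garbled clipboard paste.
print(ascii(roundtrip("half \ud83d emoji", "surrogatepass")))
```

With errors='strict', the same `write` call raises `UnicodeEncodeError` instead.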

terryjreedy · 2020-06-30T00:18:39Z

New changeset 2515a28 by Serhiy Storchaka in branch 'master':
bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
2515a28

miss-islington · 2020-06-30T00:36:54Z

New changeset 01638ce by Miss Islington (bot) in branch '3.9':
bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
01638ce

miss-islington · 2020-06-30T00:39:06Z

New changeset 00fd04b by Miss Islington (bot) in branch '3.8':
bpo-41152: IDLE: always use UTF-8 for standard IO streams (GH-21214)
00fd04b

terryjreedy · 2020-06-30T12:23:14Z

Thank you for this and the next patch.

terryjreedy added the 3.10 only security fixes label Jun 28, 2020

terryjreedy self-assigned this Jun 28, 2020

terryjreedy added topic-IDLE type-bug An unexpected behavior, bug, or error labels Jun 28, 2020

terryjreedy added 3.8 only security fixes 3.9 only security fixes labels Jun 30, 2020

terryjreedy closed this as completed Jun 30, 2020

terryjreedy changed the title ~~IDLE: revise setting of iomenu.encoding and .errors~~ IDLE: make sys.stdxxx.encoding always be utf-8 Jun 30, 2020


ezio-melotti transferred this issue from another repository Apr 10, 2022
