Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[io] Windows legacy mode mistakenly ignores the device encoding #86427

Open
eryksun opened this issue Nov 4, 2020 · 5 comments
Open

[io] Windows legacy mode mistakenly ignores the device encoding #86427

eryksun opened this issue Nov 4, 2020 · 5 comments
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows topic-IO type-bug An unexpected behavior, bug, or error

Comments

@eryksun
Copy link
Contributor

eryksun commented Nov 4, 2020

BPO 42261
Nosy @pfmoore, @vstinner, @tjguk, @zware, @eryksun, @zooba

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-11-04.15:20:48.448>
labels = ['interpreter-core', 'type-bug', '3.8', '3.9', 'expert-IO', '3.10', 'OS-windows']
title = 'Windows legacy I/O mode mistakenly ignores the device encoding'
updated_at = <Date 2020-11-04.16:50:47.706>
user = 'https://github.com/eryksun'

bugs.python.org fields:

activity = <Date 2020-11-04.16:50:47.706>
actor = 'eryksun'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core', 'Windows', 'IO']
creation = <Date 2020-11-04.15:20:48.448>
creator = 'eryksun'
dependencies = []
files = []
hgrepos = []
issue_num = 42261
keywords = []
message_count = 5.0
messages = ['380329', '380330', '380333', '380335', '380343']
nosy_count = 6.0
nosy_names = ['paul.moore', 'vstinner', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue42261'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

@eryksun
Copy link
Contributor Author

eryksun commented Nov 4, 2020

In Python 3.8+, legacy standard I/O mode uses the process code page from GetACP instead of the correct device encoding from GetConsoleCP and GetConsoleOutputCP. For example:

C:\>chcp 850
Active code page: 850
C:\>set PYTHONLEGACYWINDOWSSTDIO=1

C:\>py -3.7 -c "import sys; print(sys.stdin.encoding)"
cp850
C:\>py -3.8 -c "import sys; print(sys.stdin.encoding)"
cp1252
C:\>py -3.9 -c "import sys; print(sys.stdin.encoding)"
cp1252

This is based on config_init_stdio_encoding() in Python/initconfig.c, which sets config->stdio_encoding via config_get_locale_encoding(). Cannot config->stdio_encoding be set to NULL for default behavior?

Computing this ahead of time would require separate encodings config->stdin_encoding, config->stdout_encoding, and config->stderr_encoding. And _Py_device_encoding would have to be duplicated as something like config_get_device_encoding(PyConfig *config, int fd, wchar_t **device_encoding).

@eryksun eryksun added 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows topic-IO type-bug An unexpected behavior, bug, or error labels Nov 4, 2020
@eryksun
Copy link
Contributor Author

eryksun commented Nov 4, 2020

There's a related issue that affects opening duplicated file descriptors and opening "CON", "CONIN$", and "CONOUT$" in legacy I/O mode, but this case has always been broken. For Windows, _Py_device_encoding needs to be generalized to use _get_osfhandle and GetNumberOfConsoleInputEvents to detect and differentiate console input and output, instead of using isatty() and hard coding file descriptors 0-2.

@vstinner
Copy link
Member

vstinner commented Nov 4, 2020

This is based on config_init_stdio_encoding() in Python/initconfig.c, which sets config->stdio_encoding via config_get_locale_encoding(). Cannot config->stdio_encoding be set to NULL for default behavior?

I would like to get a PyConfig structure fully populated to make the Python initialization more deterministic and reliable. So PyConfig fully control used encodings.

The solution here is to fix config_init_stdio_encoding() to use GetConsoleCP() and GetConsoleOutputCP() to build a "cpXXX" string.

This issue seems to be a regression that I introduced in Python 3.8 with the PEP-587 (PyConfig). I didn't notice this subtle case during my refactoring. Relying on os.device_encoding() when the encoding is NULL is not obvious. That's why I prefer to get PyConfig full populated ;-)

It would be nice to get an unit test for this case.

@eryksun
Copy link
Contributor Author

eryksun commented Nov 4, 2020

The solution here is to fix config_init_stdio_encoding() to use
GetConsoleCP() and GetConsoleOutputCP() to build a "cpXXX" string.

But, as I mentioned, that's only possible by replacing config->stdio_encoding with three separate settings: config->stdin_encoding, config->stdout_encoding, and config->stderr_encoding.

@eryksun
Copy link
Contributor Author

eryksun commented Nov 4, 2020

It would be nice to get an unit test for this case.

The process code page from GetACP() is either an ANSI code page or CP_UTF8 (65001). It should never be a Western OEM code page such as 850. In that case, a reliable unit test would check that the configured encoding is a particular OEM code page. For example, spawn a new interpreter in a windowless console session (i.e. creationflags=CREATE_NO_WINDOW). Set the session's input code page to 850 via ctypes.WinDLL('kernel32').SetConsoleCP(850). Set os.environ['PYTHONLEGACYWINDOWSSTDIO'] = '1'. Then spawn [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'], and verify that the output is 'cp850'.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@eryksun eryksun changed the title Windows legacy I/O mode mistakenly ignores the device encoding [io] Windows legacy mode mistakenly ignores the device encoding Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-windows topic-IO type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants