-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix --debug
and setting of PYTEST_CURRENT_TEST
in non-UTF8 locales
#7787
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I agree with the approach here.
For the debug file, I guess forcing it to be UTF-8 isn't too bad - even on Windows, I suppose most text editors will be able to detect and open UTF-8 (though I'm not sure if Windows Notepad does?)
But for the environment variable, this will just shift the error to the reader of the variable - surely the reader won't expect an UTF-8 encoded variable when the system encoding isn't UTF-8. Also I'm not sure how the situation is on Windows, are environment variables text/UTF-8 there? Otherwise this would still fail there, no?
The only other solution I can see is something like:
encoding = sys.getfilesystemencoding()
value = value.encode(encoding, errors='replace').decode(encoding)
|
True: Python 3.6.7 (default, Mar 4 2020, 17:08:00) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.supports_bytes_environ
False
I think so too, but shouldn't we use the default locale then? |
I agree. The debug file is meant for advanced users, so it is fair to encode it using a known encoding and expect those advanced users to be able to open it. Even in situations where the user can't use a text editor that supports UTF-8, all it will happen is a few wrong characters if output contains non-ASCII output, which shouldn't be too bad. |
if we use the default locale then there's information loss |
Hmm TBH I not sure I follow the underlying semantics here, but I will defer to your expertise that this is the correct approach. Other than that, the rest looks great. 👍 |
Agree with the others that UTF-8 is good for As for
that information loss is preferable over bad encoding. So my suggestion would be to use ( |
I'm going to merge this as-is unless there's stronger opposition python3.8+ in the same situation chooses UTF-8 so I think this choice is consistent and appropriate for cpython -- this is really only a bug in older python versions the filesystem encoding also isn't appropriate for the environ, it's not a file -- there are however two equally bad choices: locale or filesystem |
Do you have a reference for that? Looking at the os.py code it seems the same to me. |
actually, it's 3.7+ $ LANG= python3.6 -c 'import os; os.environ["x"] = "☃"'
Unable to decode the command from the command line:
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 30-32: surrogates not allowed
$ LANG= python3.7 -c 'import os; os.environ["x"] = "☃"'
$ LANG= python3.8 -c 'import os; os.environ["x"] = "☃"'
$ |
That's because these versions changed the interpretation of "C" to UTF-8. But doesn't work for other cases:
|
via #10935 |
Resolves #7781
Resolves #7786