I'm running into problems with the latest (post #201) encoding handling. A "tl;dr" is at the bottom.
Using invoke to run msbuild (the Visual Studio version of make), I get:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python34\lib\threading.py", line 920, in _bootstrap_inner
    self.run()
  File "C:\Program Files (x86)\Python34\lib\threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "C:\...\venv\lib\site-packages\invoke\runner.py", line 211, in display
    dst.write(data)
  File "C:\Program Files (x86)\Python34\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201d' in position 101: character maps to <undefined>
The problem is the default value of encoding in invoke.runners.Local.run_direct(). Like @pfmoore, I also have a system where locale.getpreferredencoding() and sys.stdout.encoding differ (cp1252 vs cp850). In #201, the default encoding was chosen to be locale.getpreferredencoding(False), because that is what Python itself uses when its stdout is redirected and os.device_encoding(sys.stdout.fileno()) can't determine an encoding for the file descriptor.
C:\>chcp
Aktive Codepage: 850.
C:\>type encodingdemo.py
#!/usr/bin/python3
import sys
import locale
print('locale.getpreferredencoding:', locale.getpreferredencoding(), file=sys.stderr)
print('sys.stdout.encoding:', sys.stdout.encoding, file=sys.stderr)
C:\>py -3.4 encodingdemo.py
locale.getpreferredencoding: cp1252
sys.stdout.encoding: cp850
C:\>py -3.4 encodingdemo.py > some_file
locale.getpreferredencoding: cp1252
sys.stdout.encoding: cp1252
Note how the stdout encoding changes when redirected. Since invoke does redirect its child's I/O, it's correct to use locale.getpreferredencoding() to determine the encoding of the child's output.
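The switch from cp850 to cp1252 in the demo follows from a simple fallback order. The sketch below assumes this roughly mirrors how a default io.TextIOWrapper picked its encoding in the Python 3.4 era: ask os.device_encoding() first, which only answers for a terminal or console, then fall back to locale.getpreferredencoding(False). Newer Python versions add details such as UTF-8 mode, and guess_stream_encoding is a hypothetical helper name, not part of Python or invoke.

```python
import locale
import os


def guess_stream_encoding(fd):
    """Hypothetical helper: pick an encoding for file descriptor fd roughly
    the way a default TextIOWrapper does (console/terminal first, then locale)."""
    # os.device_encoding() returns None unless fd is a terminal/console,
    # i.e. it gives up as soon as the stream is redirected to a file or pipe.
    encoding = os.device_encoding(fd)
    if encoding is None:
        # Redirected: fall back to the locale encoding (cp1252 in the demo).
        encoding = locale.getpreferredencoding(False)
    return encoding


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    try:
        # A pipe is never a console, so this always takes the locale fallback,
        # which is the situation invoke's children are in.
        print(guess_stream_encoding(write_end))
    finally:
        os.close(read_end)
        os.close(write_end)
```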
Unfortunately, msbuild, or more generally .NET's Console class, doesn't work this way. When it opens stdout, it uses the console output code page even when it isn't writing directly to the console (see InitializeStdOutError in the .NET Console source code):
int codePage = (int) Win32Native.GetConsoleOutputCP();
Encoding encoding = Encoding.GetEncoding(codePage);
In other words, .NET recovers the console encoding (what sys.stdout.encoding reports in a non-redirected Python process) even when redirected, whereas Python falls back to locale.getpreferredencoding().
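This mismatch is enough to produce the traceback above. The sketch assumes the offending character started life as a German umlaut in msbuild's cp850 output, since the traceback doesn't show the original byte. msbuild encodes 'ö' as the cp850 byte 0x94, invoke decodes that byte with its cp1252 default and gets U+201D (a right double quotation mark), and writing U+201D back to the cp850 console then fails. The helper names are hypothetical.

```python
def decode_like_invoke(text, child_encoding="cp850", reader_encoding="cp1252"):
    """Hypothetical: what a .NET child writes under console code page 850,
    decoded with the locale default (cp1252 on the reporter's machine)."""
    return text.encode(child_encoding).decode(reader_encoding)


def console_can_show(text, console_encoding="cp850"):
    """True if text can be encoded for a console using console_encoding."""
    try:
        text.encode(console_encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    garbled = decode_like_invoke("ö")  # byte 0x94 read as cp1252
    print(ascii(garbled))              # prints '\u201d'
    print(console_can_show(garbled))   # prints False: same failure as the traceback
```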
C:\>chcp
Aktive Codepage: 850.
C:\>msbuild 2>&1 > msbuild.cp850 # German with CP850 chars
C:\>chcp 1250
Aktive Codepage: 1250.
C:\>msbuild 2>&1 > msbuild.cp1250 # English, ASCII
C:\>chcp 65001
Aktive Codepage: 65001.
C:\>msbuild 2>&1 > msbuild.cp65001 # German with UTF-8 chars
(Chcp changes the active codepage but doesn't alter locale.getpreferredencoding()).
This means there is no way to correctly handle both Python apps (which output with locale.getpreferredencoding() encoding when redirected) and .NET apps (which output with the console encoding when redirected). :-(
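Detecting both candidate encodings is the easy part. A minimal sketch, assuming Windows and using the real Win32 call GetConsoleOutputCP through ctypes (the same call the .NET source uses), could report them side by side. The function names are hypothetical. Nothing here reveals which of the two a particular child will actually use.

```python
import ctypes
import locale
import sys


def console_output_encoding():
    """Codec name for the console output code page (what .NET's Console
    uses even when redirected), or None if it can't be determined."""
    if sys.platform != "win32":
        return None  # no Win32 console API outside Windows
    code_page = ctypes.windll.kernel32.GetConsoleOutputCP()
    # A return value of 0 signals failure, e.g. no attached console.
    return "cp%d" % code_page if code_page else None


def candidate_child_encodings():
    """The two encodings a redirected child might plausibly use on Windows."""
    return {
        "dotnet_console": console_output_encoding(),
        "python_locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(candidate_child_encodings())
```

On the reporter's machine after chcp 850 this would presumably report cp850 for .NET and cp1252 for Python, matching the demo above.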
Possible solutions:
- Replace dst.write(data) with dst.buffer.write(data.encode(dst.encoding, errors='replace')) to avoid the UnicodeEncodeError when writing incorrectly decoded output to the console. This should definitely be done, regardless of whether we get the encoding right.
- We could try to decode the input data with both possible encodings on Windows and see what works, but this is super hackish.
- We could set the codepage to locale.getpreferredencoding(), but that's a per-console-window setting (not per process tree), so we'd need to reset it to the previous value on exit (clean or via exception). Also, it's unclear what implications that would have for child processes. As a user, I wouldn't expect invoke to mess with my active codepage.
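The three options in the list can be sketched as follows. This is a minimal illustration, not invoke's code: safe_write, decode_with_fallback and console_output_codepage are hypothetical names. Option 1 flushes the text layer before writing raw bytes so output stays in order. Option 2 shows why the approach is hackish: cp850 assigns a character to every byte and cp1252 decodes all but a handful, so a wrong guess usually "succeeds". Option 3 assumes Windows and uses the Win32 calls GetConsoleOutputCP and SetConsoleOutputCP, taking an integer code page such as 1252 for the locale's cp1252.

```python
import contextlib
import ctypes
import io
import sys


def safe_write(dst, data):
    """Option 1: write str data to text stream dst, replacing characters its
    encoding can't represent instead of raising UnicodeEncodeError."""
    buffer = getattr(dst, "buffer", None)
    if buffer is None:
        dst.write(data)  # str-only stream such as io.StringIO: nothing to encode
        return
    encoding = dst.encoding or "utf-8"
    dst.flush()  # keep earlier text writes ahead of these raw bytes
    buffer.write(data.encode(encoding, errors="replace"))
    buffer.flush()


def decode_with_fallback(raw, encodings):
    """Option 2 (hackish): return the first strict decode that succeeds from a
    non-empty list of encodings. Success does not prove the guess was right."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode(encodings[-1], errors="replace")


@contextlib.contextmanager
def console_output_codepage(code_page):
    """Option 3: temporarily switch the console output code page (Windows
    only), restoring the previous one on clean exit or exception. This affects
    every process attached to the same console window."""
    if sys.platform != "win32":
        yield  # no console code page to change elsewhere
        return
    kernel32 = ctypes.windll.kernel32
    previous = kernel32.GetConsoleOutputCP()
    kernel32.SetConsoleOutputCP(code_page)
    try:
        yield
    finally:
        if previous:
            kernel32.SetConsoleOutputCP(previous)


if __name__ == "__main__":
    out = io.TextIOWrapper(io.BytesIO(), encoding="cp850")
    safe_write(out, "a\u201db")
    print(out.buffer.getvalue())  # prints b'a?b'
```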
tl;dr: Python and .NET output with different encodings when run from invoke. Invoke can't force an encoding on them and it can't know which one a called program will use. We're hosed.