Python Launcher, Windows, fails on scripts w/ non-latin names #64241
Comments
Running:

E:\>set PYLAUNCH_DEBUG=1
E:\>py юникод.py

Note that "Called with command line: .py" in the output shows that the filename was mangled very early on. Invoking

The problem lies in how Windows handles command-line arguments; the fix is simple but non-obvious. Patch attached.
Sorry, fixed the whitespace in the patch.
It looks like the wide-character strings (wchar_t*) are misused. For example:

error(RC_NO_PYTHON, L"Requested Python version (%s) ...", &p[1]);

The %s format specifier is for byte strings (char*); "%ls" should be used instead.

+ _setmode(_fileno(stdout), _O_WTEXT);

Extract from the wprintf() documentation: "The wprintf() and vwprintf() functions perform wide-character output to stdout. stdout must not be byte oriented; see fwide(3) for more information." So _setmode() or fwide() should be used, if I understood correctly. Or should wprintf() be replaced with printf() (still with the "%ls" format)?

wprintf("%ls") replaces unencodable characters in string arguments with ? (U+003F), whereas printf("%ls") and wprintf("%s") truncate the output at the first undecodable/unencodable character: So wprintf() is probably better here.
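A minimal sketch of the "%ls" point above, using only standard C so that it does not depend on the launcher's own error() helper. The function name format_version_message and the message text are hypothetical and are not taken from launcher.c. In standard C, "%ls" in a wide format string always means a wchar_t* argument, so it avoids relying on how a particular CRT interprets "%s".

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of launcher.c): format a message that
   embeds a wide-character string argument. */
static int
format_version_message(wchar_t *buf, size_t count, const wchar_t *version)
{
    assert(buf != NULL && version != NULL);
    /* %ls explicitly requests a wchar_t* argument in a wide format string. */
    return swprintf(buf, count, L"Requested version: %ls", version);
}
```

swprintf returns the number of wide characters written, or a negative value if the buffer is too small, so a caller can check the result before printing the buffer.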
I don't care much about the debug output, though it probably should be fixed. The point is that changing the text mode of stdout has the weird side effect of fixing command-line arguments when invoking interactively from cmd.exe.
There is something weird with my proposed fix. Right after submitting the bug with the patch, I updated the Pythons on my system (2.7.5 to 2.7.6, 3.3.2 to 3.3.3) and installed 3.4.0b1, both 32- and 64-bit. Then my fixed py.exe stopped working. I then added _setmode for stdin/stdout/stderr and rebuilt both debug/release and x86/x64 versions:

E:\>set PYLAUNCH_DEBUG=1
E:\>e:\cpython\PCbuild\py.exe юникод.py
E:\>e:\cpython\PCbuild\py_d.exe юникод.py
E:\>e:\cpython\PCbuild\amd64\py.exe юникод.py
E:\>e:\cpython\PCbuild\amd64\py_d.exe юникод.py

Setting wide mode for stderr fixed the error messages in the debug output. It also looks like the x64 debug build has an off-by-one error, and the CRT behavior is wonky regarding command-line handling. So my patch doesn't really fix the original problem, yet it exposes some underlying CRT bug.
Some more fun stuff with the command line (I'm cutting the output to a few essential lines for easier reading):

e:\cpython\PCbuild\py.exe юникод.py
e:\cpython\PCbuild\py.exe e:\юникод.py
E:\>e:\cpython\PCbuild\py.exe тест\unicode.py
E:\>e:\cpython\PCbuild\py.exe e:\тест\unicode.py
E:\>e:\cpython\PCbuild\py.exe "юникод.py"

IOW, so long as the command line starts with an ASCII character, everything is fine. If not, one or more characters get mangled. Now I'm not sure whether it's a cmd.exe bug or a C runtime one, and whether it's possible to work around it.
Is this actually fixable? I only ask as we seem to have a whole lot of fun with anything involving cmd.exe, as epitomized by bpo-1602.
It should be fixable. In general, Unicode in the console is fine, but the CRT doesn't handle it well (as shown by the _setmode extension being able to fix it). The 'correct' fix for Unicode in the console is at http://www.siao2.com/2010/04/07/9989346.aspx and it basically comes down to "use the Windows API and not the CRT".

It's certainly fixable here, though the general fix for Python itself is more difficult because we want/need to expose the bytes interface as well. That said, bpo-1602 seems to have a good fix right now that just happens to be easily distributable as pure Python code, so there's little motivation to merge it in, especially since it would break back-compat.

I don't know whether _setmode is a correct fix here, or whether the attached patch is sufficient, but it can be fixed.
The problem is that skip_whitespace mistakenly calls isspace instead of iswspace.

http://hg.python.org/cpython/file/c0e311e010fc/PC/launcher.c#l48

isspace has undefined behavior when the argument is "not EOF or in the range of 0 through 0xFF":

http://msdn.microsoft.com/en-us/library/y13z34da%28v=vs.100%29.aspx

The display of debug messages should be handled in its own issue. IMO, setting stderr to _O_WTEXT mode or _O_U16TEXT mode looks reasonable. The launcher already uses wide-character strings and the wprintf family. It's just that the default _O_TEXT mode ends up encoding to the console codepage.

Regarding msg206744, %s in wide-character format strings is OK for VC++ 10:

http://msdn.microsoft.com/en-us/library/hf4y5e3w%28v=vs.100%29.aspx

This will be a legacy mode in VC++ 14 (CPython 3.5?):

P.S. http://hg.python.org/cpython/file/c0e311e010fc/PC/launcher.c#l262
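To illustrate the isspace/iswspace point, here is a minimal sketch of a whitespace-skipping helper for wide strings. It is not the launcher's actual code: the name skip_whitespace_w and its exact shape are assumptions made for this example. The relevant part is that iswspace accepts any wide-character value, whereas passing a value above 0xFF (such as a Cyrillic letter) to isspace is undefined behavior.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for the launcher's skip_whitespace helper:
   return a pointer to the first non-whitespace wide character. */
static const wchar_t *
skip_whitespace_w(const wchar_t *p)
{
    assert(p != NULL);
    /* iswspace is defined for every wint_t value, so characters outside
       0..0xFF are classified safely instead of invoking undefined
       behavior as isspace would. */
    while (*p != L'\0' && iswspace((wint_t)*p)) {
        ++p;
    }
    return p;
}
```

With a helper like this, a command line that begins with a non-ASCII character is returned unchanged, and only genuine leading whitespace is skipped.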
The bug reported in msg225529 has been fixed, but there's another one a few lines up, at https://hg.python.org/cpython/file/bd656916586f/PC/launcher.c#l265, where the format string has only one % conversion but two arguments are passed. Although IIRC we'd get away with this given the way C works, shouldn't we fix it anyway?
Is this still relevant or should it be closed? On Win10, I created a short script, юникод.py, using Save As from IDLE. py -2 юникод.py produces

On 3.5 and 3.6, the file runs without issue. The issue was opened with 3.3; 3.5 switched to a much more recent compiler, and I did not see any indication in the messages that this was tested on 3.5 before it was added. So perhaps for 3.5+, this is out of date.
Terry J. Reedy wrote:
Should be closed.
This outcome is expected, and the error comes from Python itself, not from the launcher.
The launcher works fine now in my testing.