New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use multi-byte variant of CreateProcess on Windows #809
Conversation
This should fix problems when the command line or the working directory of a command on Windows contains non-ASCII characters. Those characters are first converted into a multi-byte string and then passed to CreateProcessW(). Closes geany#784.
Title is misleading, you're actually using the "wide" variant of |
For fun, I converted (most of) the explicit wide stuff to be agnostic of Win32 encoding, and defined https://gist.github.com/codebrainz/fcadf6400f5f4c7263d3 (note: not tested, except that it builds) |
message = g_strdup_printf( | ||
"Failed to convert working directory to multi-byte string: %s", err); | ||
g_free(err); | ||
failed = ""; /* report the message only */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I read the error handler below correctly, it would be shorter, simpler and more consistent to just do
failed = "convert working directory to multi-byte string";
goto leave;
and let the common handler build the full message.
Apart from this, my limited WINAPI knowledge suggests it's mostly OK. Though, the MSDN docs states that MultiByteToWideWhar can also return U+FFFD on invalid input:
Maybe this should be handled too? @zhekov if you don't mind, can we get your opinion on this? |
@codebrainz we probably indeed never should use non-wide versions. Though,is there any harm in specifically using the wide version? e.g. what do we gain by defining |
With
IIUC all functions used to be ASCII, then in Windows NT they added support for Unicode, so they added a |
@codebrainz okay… though, for the non-UNICODE code path, does it really want ASCII, or rather the common Winodws-ish locale code pages? And will the |
glib uses CreateProcessW, and because of that, or of some improper character handling, FiF with non-UTF8 text often fails with "Invalid byte sequence in conversion input" (this is documented in spawn.c). So my first suggestion for this PR would be to test FiF with texts in various encodings. The most likely reason for this failure is as this: CreateProcessA simply passes a sequence of bytes, and grep (make, gcc, whatever) receives them as-is. For CreateProcessW, the chars are first converted to UNICODE, and then back to 8-bit, since grep (and most console programs, especially UNIX tools) expect char *_argv, not wchar_t *_argv. Both conversions may fail, if at least 1 byte is not part of the current 8-bit code page. So, W is not a magic bullet. For #784: the FiF utf8 dir is "converted" to locale with Geany's utils_get_locale_from_utf8(), which is specifically defined to do nothing under Win32. Unless the directory is in UTF-8 encoding, instead of "ANSI", or "UNICODE-not-convertable-to-ANSI" (the standard encodings under Win32), I would suspect this PR will not fix the problem. g_spawn_() specifically expects all text to be passed as UTF-8. Which seems like a good idea, but as soon as you pass any text that is convertable to UNICODE, but not to the current code page, the second W conversion (UNICODE to grep's etc. char *_argv) will fail - or maybe produce strings some replacement characters, so that grep (and the other tools) will be run after all. Which is even worse. TL;DR: The quick fix will be to convert the directory with g_locale_from_utf8(), detect any unconvertable directory names (should be rare), and fail early. The full fix... That will be to implement a proper locale file name support, as much as it's possible under glib (our locale support under POSIX is badly broken). But that is not an easy thing to do, and will require a lot of work. As long as most of the tools started by Geany use char *_argv and not wchar_t *_argv, it would be better to avoid UNICODE, and stick to 8-bit. For fun - a nice excerpt from the g_spawn*() documentation: "For C programs built with Microsoft's tools it is enough to make the [spawned] program have a wmain() instead of main() [1]. wmain() has a wide character argument vector as parameter. At least currently, mingw doesn't support wmain(), so if you use mingw to develop the spawned program, it should call g_win32_get_command_line() to get arguments in UTF-8." Lovely. :) Especially considering that 99.9% of the development tools do not use glib. [1] completely non-standard, of course |
Naw my bad, it's "ANSI" (code pages) for "A" not specifically "ASCII".
It works if it was ASCII, it just replaces all characters > 127 with a question mark (arbitrary replacement character). The code really makes no sense for non ASCII, so I think it would be best to just drop those non-wide code paths. When I get some time I'll try to put together a proper PR instead of hijacking this one :) |
Sorry for the wrong subject, yes, it should have been wide and not multi-byte. To the rest, to my limited knowledge of this stuff, @codebrainz's Gist looks more comprehensive. @zhekov you might be correct and I don't want to argue on that topic because of too limit knowledge on my end, still, the code of this PR worked for my tests with FIF. Anyway, I'll close this PR and hope @codebrainz or @zhekov maybe provide a better one. |
Reminder: I'm not using Geany any more. That being said, implementing reasonable locale support for Geany, both POSIX and Win~1, may be an interesting task by itself. Though it'll require an explanation of what is wrong on the ML, then a discussion how much locale should be supported, a plan how to implement it without breaking too much plugin compatibility, and a significant amount of work and testing. (Note: by "reasonable" I don't mean Win~1 non-ASCII names which are not convertible to "ANSI", because most of the spawned tools won't support them anyway, so the W family of calls will not be required.) |
This should fix problems when the command line or the working directory
of a command on Windows contains non-ASCII characters.
Those characters are first converted into a multi-byte string and then
passed to CreateProcessW().
Closes #784.
I hope this won't have any unwanted side-effects. My tests were successfully and it solves the problem reported in #784.
The error handling seems a bit weird but actually works. The error strings are not marked as translatable on purpose because a) they are very technically and b) unlikely to happen at all.