New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix support for response files & switch the compiler fully to UTF-8 on Windows #3086
Conversation
@rainers: Please have a quick look at the 2nd commit (incl. druntime changes); that was necessary to make |
fa7e4b5
to
cb0f95e
Compare
I suspect _d_run_main is used explicitly by a number of applications. Wouldn't it be easier and less disruptive if _d_run_main is split into two functions, one that does the argument processing and calls the other with UTF8 args?
so that you could call _d_run_main2 with modified arguments. |
The problem with that approach is that |
I don't think you need to change With my suggestion you might have to copy some of the argument handling in dmain2, but I don't think that is too bad. Changing _d_run_main to accept wchar is unlikely to be accepted as it breaks code. |
I could take a step back and only invoke a new |
Wrt. UTF-8, I checked what LLVM does when we use it to start a new child process, such as LDC from LDMD, with narrow args - it uses CreateProcessW and converts the supplied args from UTF-8 to UTF-16. So that's a potential issue with current LDMD/LDC (#611) when forwarding command-line args to a child process, as the source encoding is the current code page. That encoding is expected for the regular narrow MSCRT and Windows APIs (filenames etc.) though, not sure if we (incl. DMD front-end) use those anywhere directly. |
Changes to druntime look much better now. _d_run_wmain doesn't have to be in dmain2.d, maybe just keep it in ldc? BTW: does this handle DFLAGS, too? AFAICT this uses UTF8 filenames internally on Windows, now. Does this work correctly with console output (without telling the user to change the codepage)? |
Thx Rainer, very valid points - reading/setting env variables would need to happen via the wide API + converting to/from UTF8. Wrt. console, I was shocked to see that I've apparently never tried to write non-ASCII stuff to the console (in D). How on earth can a |
757cdac
to
c2cf661
Compare
Seems to work as intended (LDC install dir can contain umlauts, source files in cmdline too etc.); console is switched to UTF8 and then reset before exiting. The only thing I noticed is that |
IIRC switching the console codepage had impact on the full display rendering text shown via the ANSI code page to unreadable characters. I don't see that now, though. I suspect the console has improved with Windows 10. |
Probably related: https://issues.dlang.org/show_bug.cgi?id=1448, but I cannot reproduce on Win10. |
The only negative thing I've noticed is that the whole console switches to some extremely ugly font if the previous one was a raster font, incl. a different window size. Once a proper font is set as default, that's not an issue anymore, but sure, some people will probably wonder.
Thx for digging that out. This should be reproducible: import core.stdc.stdio;
import core.sys.windows.wincon, core.sys.windows.winnls;
void main()
{
const oldCP = SetConsoleOutputCP(CP_UTF8);
scope(exit) SetConsoleOutputCP(oldCP);
fprintf(stdout, "HellöѬ LDC\n");
fflush(stdout);
fprintf(stderr, "HellöѬ LDC\n");
fflush(stderr);
} =>
|
That's what I remember but don't see now. There is a single raster font installed, though.
Works here, Win 10 1809 |
Well that's interesting. I'm at v1803. Such recent improvements in this area after decades of maintaining the crappy status quo would be really surprising. |
Seems like it's happening: https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/ |
Indeed. Running v1903 now (how do they manage to make an OS update that slow?!), and the previously built binary now outputs to stderr correctly. |
druntime addition proposed upstream: dlang/druntime#2701 |
Front-end fixes proposed upstream: dlang/dmd#10288 |
This is required for dub, which appears to always use a response file containing all command-line arguments.
only) This makes _d_wrun_main (cherry-picked from dlang/druntime#2701) use the provided args directly instead of the process's real arguments (on Windows) - if the host D compiler supports it. E.g., this is required when passing --DRT-* options from a response file to _d_wrun_main. As a major change, the encoding of the Windows cmdline arguments is switched from the current codepage to UTF-8. Note that the MinGW-based libs currently only provide narrow CRT entry points.
Conforming to its documentation, although the previous implementation assumed filenames in the current code page.
E.g., this disables it for __entrypoint.d (the C main function(s) in there), which is required. A wmain on Windows is not detected as FuncDeclaration.isCMain() yet, that should be fixed too.
For proper in/output of narrow UTF-8 strings.
There might be more occurrences.
And use the wide API for pure is-env-variable-set checks too, as the first call to a narrow env API function would lead to the C runtime preparing and maintaining both narrow and wide environments.
Thereby preventing to use fopen() on Windows, which expects a narrow string in the current code page, not necessarily UTF-8.
I tried adding a Unicode lit-test, but lit/Python 2.7 (?) doesn't support testfile names with Unicode chars here on my Windows box, so not pursuing that brittle path any further (there was another issue, an LLVM encoding bug when converting |
Previously,
-lowmem
and, on Windows, all--DRT-*
options needed to be direct cmdline args (for DMD too), occurrences in response files were ignored. That's a serious issue for dub, as AFAICT it always uses a response file with all actual args.-lowmem
was an easy fix, as we can expand the response files already in C main and unlike DMD don't need an initialized druntime.The
--DRT-*
options on Windows are more problematic, as_d_run_main
ignored the passed args and instead asked for the process' real arguments (as it needs them in original UTF-16 for proper conversion to UTF-8, as expected for D strings...).