Output to windows console in Unicode mode crashes #2348
Comments
So? Is this really a Nim bug and not just a limitation of Windows? |
To quote Sheldon Cooper: "Is that sarcasm?" ;-) I know about the "peculiarities" of UTF-8 support in the Windows console. But given that this (derived from looking at Nim's generated C code)
works, and Nim's |
Works for me on win32, Windows 8.1; I used the chcp 65001 command to set the code page. |
Sorry, which program works for you, the Nim one? I have Windows 7 64-bit and MinGW64 here, where it definitely shows the error described in the initial post. I'd like to test it in a VM, but I have never cross-compiled with Nim before: how would I cross-compile the source for win32 (my attempt with |
The Nim program works for me.
Something like
|
Doing so gives me:
That's the same result that I got before changing nim.cfg. |
I finally got it to compile with another version of the compiler (x32-4.8.1-release-posix-dwarf-rev5.7z) using
|
Well, if fwrite lies about its return value, there is little we can do except disable this check for your particular machine. Note that your C program simply ignores the return value. |
Hm, here fwrite returns 3 (that would be correct if it were counting Unicode code points, which aren't bytes, I know...), but aside from that, how does echo do it? Btw, I just tested this (using the same EXEs) on Windows 8.1: it completely ignored the chcp 65001 setting and just output in Windows-1252. So it's going to get better with Windows 10, yet it's still inconsistent within Nim (echo vs write). Again, since you said it worked for you on Windows 8.1, which compiler toolchain do you use? Is there a way to bootstrap with VC? |
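The mismatch discussed above comes down to bytes versus code points: a UTF-8 string's byte length (what fwrite is asked to write) differs from its code-point length as soon as it contains non-ASCII characters. A minimal sketch using Nim's std/unicode module, with strings taken from this thread; describe is a hypothetical helper name:

```nim
import std/unicode

# In UTF-8, characters such as "ø", "Я" or "╣" take more than one byte,
# so the byte count and the code-point count of the same string disagree.
proc describe(s: string): string =
  s & ": " & $s.len & " bytes, " & $s.runeLen & " code points"

for s in ["Hellø, wørld!", "iÄäÜüß ЯБГДЖЙ", "╣"]:
  echo describe(s)
```

If a runtime reported code points (or any count other than bytes) as the fwrite result, a check of the form "written == s.len" would fail for every one of these strings.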
--cc:vcc. I tested it with gcc version 4.8.1 (rev5, Built by MinGW-W64 project) |
Just confirming: you're using the command prompt, right? Not MinGW's own console? |
@Varriount: Yes, Windows's console. |
Well, echo doesn't check the return value; write does. I don't want to disable this additional check. So... any ideas? |
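To make the two policies concrete, here is a minimal sketch. checkedWrite and uncheckedWrite are hypothetical helpers, not Nim's actual write or echo implementations; both use the standard writeChars proc, which returns the number of bytes the underlying C runtime reports as written:

```nim
# Hypothetical helper: treat a short write as an I/O error, the kind of
# check this thread attributes to `write`.
proc checkedWrite(f: File, s: string) =
  if s.len == 0: return
  let written = f.writeChars(s, 0, s.len)
  if written != s.len:
    raise newException(IOError,
      "short write: " & $written & " of " & $s.len & " bytes")

# Hypothetical helper: ignore the reported count, which is effectively
# what the thread says `echo` does.
proc uncheckedWrite(f: File, s: string) =
  if s.len == 0: return
  discard f.writeChars(s, 0, s.len)
```

On a regular file both behave the same; they only diverge when the runtime reports fewer bytes than requested, which is exactly the console situation described in this issue.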
I fully understand that you're against reducing the checks, but if the output is the console (and not a "real" file), I think one could do without these extra checks (especially if the counting fails for UTF-8 characters). Usually the output would be visible immediately, so the user would know that something went wrong. |
There are some serious Unicode issues with the following:
import terminal
let OUT_PAT = "╣"
echo(OUT_PAT) # ok
write(stdout, OUT_PAT) # just a box
Other characters like "┨" don't appear at all. It's even worse in Git Bash; I'm not sure what the differences between it and cmd.exe are. |
Confirmed,
Any ideas how to proceed from here? Getting Unicode to work correctly is pretty standard in 2016. We want to avoid any association with PHP and the like, don't we? :) |
As far as I'm concerned, we should use the Win API directly for our I/O layers and get rid of the libc dependency. To answer your question, look at ccgexprs.nim and search for "mEcho" or "genEcho". |
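One way to read "use the Win API directly" is to bypass the C runtime's byte-oriented stdout and hand UTF-16 text straight to the console. A minimal sketch, assuming the kernel32 functions GetStdHandle and WriteConsoleW declared by hand via importc; toUtf16 and consoleWrite are illustrative names (Nim's standard library has its own conversion helper, newWideCString), and WriteConsoleW only works when stdout is an actual console, not a redirected file or pipe:

```nim
import std/unicode

# Encode a UTF-8 string as UTF-16 code units (surrogate pairs above U+FFFF).
proc toUtf16(s: string): seq[uint16] =
  for r in s.runes:
    var c = int32(r)
    if c >= 0x10000:
      c -= 0x10000
      result.add uint16(0xD800 + (c shr 10))
      result.add uint16(0xDC00 + (c and 0x3FF))
    else:
      result.add uint16(c)

when defined(windows):
  const StdOutputHandle = -11'i32   # STD_OUTPUT_HANDLE
  proc getStdHandle(nStdHandle: int32): int {.
    stdcall, dynlib: "kernel32", importc: "GetStdHandle".}
  proc writeConsoleW(hConsoleOutput: int, lpBuffer: pointer,
                     nNumberOfCharsToWrite: int32,
                     lpNumberOfCharsWritten: ptr int32,
                     lpReserved: pointer): int32 {.
    stdcall, dynlib: "kernel32", importc: "WriteConsoleW".}

  # Returns false if the call fails (e.g. stdout is redirected) or if fewer
  # UTF-16 units than requested were written; a caller could then fall back
  # to the byte-oriented stdout.write.
  proc consoleWrite(s: string): bool =
    var units = toUtf16(s)
    if units.len == 0: return true
    var written: int32
    result = writeConsoleW(getStdHandle(StdOutputHandle), addr units[0],
                           int32(units.len), addr written, nil) != 0 and
             written == int32(units.len)
```

Because the count WriteConsoleW reports is in UTF-16 units, comparing it with units.len keeps a strict "did everything get written" check without mixing bytes and characters.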
Switching to high priority as it seems there is demand for a fix here. |
Regarding that HN comment, the original poster shared his workaround, which raises a few interesting issues:
import encodings
var hello1 = convert("Hellø, wørld!", "850", "UTF-8")
# Doesn't work - seems to think current codepage is utf8.
var hello2 = convert("Hellø, wørld!", getCurrentEncoding(), "UTF-8")
# Outputs correct text:
echo hello1
# Outputs corrupted text:
echo hello2
The above code works on my machine as indicated by the comments, because my Windows console defaults to code page 850 (this can be checked with the
The reason why
The following code prints out the values of all three of these code pages, plus the value returned by
import encodings
import windows
var inCP = GetConsoleCP()
echo "Console input codepage: " & $inCP
var outCP = GetConsoleOutputCP()
echo "Console output codepage: " & $outCP
var sysCP = GetACP()
echo "System codepage: " & $sysCP
echo "getCurrentEncoding() " & getCurrentEncoding()
On my system this gives:
Now, the problem is that
proc getCurrentEncoding*(): string =
## retrieves the current encoding. On Unix, always "UTF-8" is returned.
when defined(windows):
result = codePageToName(getACP())
else:
    result = "UTF-8"
I would say
As for our problem, users nowadays expect to be able to print a UTF-8 string to the console without problems. I'm not sure that converting to the console output code page on Windows behind the scenes is a good idea, because if you want to use the same
Maybe simply setting the console output code page to UTF-8 would be the best way to keep things simple? I guess this would make the vast majority of users happy. The only problem with this approach is that
import windows
discard SetConsoleOutputCP(65001)
echo "Hellø, wørld!" # works OK
stdout.write("Hellø, wørld!") # fails with an exception
Also, see this Stack Overflow post for further explanation:
And it seems the Go folks ran into exactly the same problem a while ago: |
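The HN workaround quoted above (converting to "850") can be generalized by asking the console for its output code page instead of relying on getCurrentEncoding(), which reports the system (ANSI) code page. A minimal sketch, assuming that std/encodings accepts a numeric Windows code page name such as "850" (as the HN workaround does) and declaring GetConsoleOutputCP by hand; toConsoleEncoding is a hypothetical name, and on other platforms the sketch assumes the terminal already expects UTF-8:

```nim
when defined(windows):
  import std/encodings

  proc getConsoleOutputCP(): uint32 {.
    stdcall, dynlib: "kernel32", importc: "GetConsoleOutputCP".}

  # Convert UTF-8 text to the code page the console is actually using
  # (e.g. "850"), not the one returned by GetACP().
  proc toConsoleEncoding(s: string): string =
    convert(s, $getConsoleOutputCP(), "UTF-8")
else:
  # Assumption: non-Windows terminals expect UTF-8, so pass text through.
  proc toConsoleEncoding(s: string): string = s

echo toConsoleEncoding("Hellø, wørld!")
```

This is the "convert behind the scenes" option the comment is skeptical about: characters that the console code page cannot represent are still lost, so it complements rather than replaces switching the console to 65001.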
It's the system encoding. I fail to see the advantage in renaming it to
No, some programmers might expect that; actual users do not use terminal apps at all on Windows. ;-) The same programmers who use "Git Bash" (see above), which does not care about
I think we should patch |
👍
Sure, but we should still support it.
How many people actually use this file on Windows? |
It's what the installer puts into the Start menu, FWIW. |
Okay, maybe I got a little carried away with my proposal... I agree that the UTF-8 situation in the Windows console is kind of crap and we won't be able to handle it in a 100% satisfactory way (e.g. even if Nim handled UTF-8 perfectly, the default Lucida Console font only covers a very small subset of Unicode, and most users don't bother installing a better font...). Setting the console output code page to 65001 is probably the best option, I agree. Doing it in
As for @dom96's question, I'm always using
Alternatively, there could be a built-in mechanism in the runtime to set the code page to 65001 at startup and then restore the old value at exit. I'm not totally convinced this is a good idea, though. Something like this (I tested it and it works fine):
import windows
var oldCP = GetConsoleOutputCP()
discard SetConsoleOutputCP(65001)
var s = "iÄäÜüß ЯБГДЖЙ"
echo s # ok
stdout.write(s) # crashes
discard SetConsoleOutputCP(oldCP)
Oh, and the |
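The "set 65001 at startup, restore at exit" mechanism described above can be sketched with std/exitprocs.addExitProc, so the previous code page is restored when the program exits normally or calls quit. This is an illustrative sketch under stated assumptions: the kernel32 functions are declared by hand rather than taken from the old windows module, useUtf8ConsoleOutput is a hypothetical name, and it does nothing about the separate stdout.write crash:

```nim
when defined(windows):
  import std/exitprocs

  proc getConsoleOutputCP(): uint32 {.
    stdcall, dynlib: "kernel32", importc: "GetConsoleOutputCP".}
  proc setConsoleOutputCP(wCodePageID: uint32): int32 {.
    stdcall, dynlib: "kernel32", importc: "SetConsoleOutputCP".}

  var savedOutputCP: uint32   # code page in effect before switching
  var switched = false        # guards against registering the restore twice

  proc restoreOutputCP() {.noconv.} =
    discard setConsoleOutputCP(savedOutputCP)

  # Switch the console to UTF-8 (65001) and register a restore at exit.
  # Returns false if the switch failed (e.g. no console attached).
  proc useUtf8ConsoleOutput(): bool =
    if switched: return true
    savedOutputCP = getConsoleOutputCP()
    result = setConsoleOutputCP(65001) != 0
    if result:
      switched = true
      addExitProc(restoreOutputCP)
else:
  # Assumption: other platforms' terminals already expect UTF-8.
  proc useUtf8ConsoleOutput(): bool = true

if useUtf8ConsoleOutput():
  echo "iÄäÜüß ЯБГДЖЙ"
```

Registering the restore as an exit proc, rather than calling it at the end of main, means an early quit still leaves the user's console in its original code page.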
I think that's a MinGW/libc bug, not easy to fix. |
The original problem is still not reproducible. Closing. |
The following program crashes with the Windows console set to Unicode output (chcp 65001):
With code page 1252 the program correctly outputs: