wrong unicode string output #11618

Closed
retsyo opened this issue Jun 29, 2019 · 4 comments

@retsyo
commented Jun 29, 2019

I am using the latest Nim, cloned and built from source, in msys2/mingw64 on 64-bit Windows 7.

For GCC in mingw64, if we save

// C code
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("你好");
    return 0;
}

into a UTF-8 encoded file, then the generated EXE displays 你好 as expected in both msys2 and the Windows DOS prompt (where we first have to run chcp 65001 to switch the code page to UTF-8).

But, for nim, if we save

# nim code
echo "你好"

into a UTF-8 encoded file, then the generated EXE displays 你好 as expected in msys2, but it displays ???好 in the Windows DOS prompt after we switch the code page to UTF-8.

As for V (vlang.io), which is another language that, like Nim, uses the GCC compiler, if we save

// V code
println('你好')

into a UTF-8 encoded file, then the generated EXE displays 你好 as expected in both msys2 and the Windows DOS prompt (where we first have to run chcp 65001 to switch the code page to UTF-8).

So why does the Nim-generated EXE display the wrong characters?

@jangko

Contributor

commented Jun 30, 2019

When dealing with the file system on Windows, Nim converts strings from UTF-8 to UTF-16 (or from UTF-16 to UTF-8) and uses the Windows Unicode API.
But the same conversion doesn't happen when echoing to the console/terminal; there it uses the libc API, not the Windows Unicode API.
Some setting must be incorrect or missing. It turns out the behavior of libc is not portable.
This is not the first time this bug has occurred. It occurred before, was fixed, and has now appeared again.

I also encountered the same problem when writing my nim-noise. Using the Windows Unicode API with UTF-16 seems to solve the problem, while POSIX OSes keep using the libc API with UTF-8.

Araq added a commit that referenced this issue Jul 1, 2019

@Araq Araq closed this in 9b94985 Jul 1, 2019

@retsyo

Author

commented Jul 1, 2019

Do you mean nim c -d:nimBinaryStdFiles -r a.nim?
The generated EXE behaves exactly as I described above. Nothing changed.

@Araq

Member

commented Jul 2, 2019

I mean without the -d:nimBinaryStdFiles.

narimiran added a commit that referenced this issue Jul 2, 2019

fixes #11618 (#11631)
(cherry picked from commit 9b94985)
@retsyo

Author

commented Jul 2, 2019

I am confused; nothing seems to have changed.

$ nim
Nim Compiler Version 0.20.99 [Windows: amd64]
Compiled at 2019-07-02
Copyright (c) 2006-2019 by Andreas Rumpf

In msys2:

$ nim c -r a.nim   # this will output the corrected unicode characters in msys2
......
Hint: R:\a.exe  [Exec]
你好

which is what we expected.

In the DOS prompt, the Unicode characters are not displayed properly:

R:\>rem change code page to utf8
R:\>chcp 65001   
R:\>a
���好

R:\>rem chinese
R:\>chcp 936   
R:\>a
浣犲ソ

R:\>rem utf16
R:\>chcp 10000  
R:\>a
你好

@Araq Araq reopened this Aug 16, 2019

@Araq Araq added this to the v1 milestone Aug 17, 2019

Araq added a commit that referenced this issue Aug 17, 2019

@Araq Araq closed this in 7cb3145 Aug 17, 2019

Araq added a commit that referenced this issue Aug 18, 2019

more enhancements for #11618 (#11976)
* finish the Windows IO layer changes; refs #11618
* added system.getOsFileHandle which is less error-prone on Windows
* make tests green again