[BUG] Wrong charset after upgrade to msys2-runtime-3.1.4 #1974

Closed
nyfair opened this issue May 23, 2020 · 49 comments · Fixed by msys2/msys2-runtime#15
Comments

@nyfair

nyfair commented May 23, 2020

Steps to Reproduce the Problem

python -c 'print("世界")'
荳也阜

Additional Context

locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

The Following Methods Can All Get Correct Result

  1. Downgrade msys2-runtime to 3.0.7
  2. winpty python -c 'print("世界")'
  3. python -c 'print("世界")' | iconv -f utf-8 -t utf-8
  4. chcp.com 65001 && python -c 'print("世界")'

nyfair added the bug label May 23, 2020
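
The garbled output in the report has the shape of UTF-8 bytes being read in a legacy code page. A minimal Python sketch of that effect follows. It assumes code page 932 (Shift-JIS) as the misapplied code page, which the report itself does not state, and the helper name `misdecode` is made up for illustration.

```python
def misdecode(text: str, wrong_codepage: str) -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong code page."""
    raw = text.encode("utf-8")
    # errors="replace" keeps the sketch from raising on byte sequences
    # that are invalid in the chosen code page.
    return raw.decode(wrong_codepage, errors="replace")


if __name__ == "__main__":
    # Assumption: cp932 is the code page applied by mistake.
    print(misdecode("世界", "cp932"))  # prints 荳也阜
```

Under that assumption the sketch yields the same three characters as the failing command above.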
@mati865
Collaborator

mati865 commented May 23, 2020

Cc @dscho

@dscho
Contributor

dscho commented May 23, 2020

Sorry, I have no idea why this is happening (or why I was Cc:ed).

@mati865
Collaborator

mati865 commented May 23, 2020

@dscho I thought it could be a potential issue for Git for Windows users. I suppose there is no correlation between this and your latest patches for the runtime?

@dscho
Contributor

dscho commented May 23, 2020

It's possible, but python is not shipped with Git for Windows. It's possible that other software is potentially affected, too, of course.

@lazka
Member

lazka commented May 23, 2020

I can't reproduce here

@dscho
Contributor

dscho commented May 23, 2020

I can't reproduce here

Neither can I...

$ python -c 'print("世界")'
世界

@nyfair
Author

nyfair commented May 25, 2020

I suppose it is a Cygwin issue, since Cygwin changed its encoding handling in 3.1.
This affects other programs as well:

cargo new a && cd a && sed -i 's/world/世界/' src/main.rs && cargo run
     Created binary (application) `a` package
   Compiling a v0.1.0 (D:\msys64\opt\crawl\a)
    Finished dev [unoptimized + debuginfo] target(s) in 0.44s
     Running `target\debug\a.exe`
Hello, 涓栫晫!
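
When the expected text and the garbled text are both known, one way to narrow down which code page was misapplied is to decode the UTF-8 bytes with each candidate and compare. The sketch below is only an illustration: the function name `guess_codepage` and the candidate list are assumptions, not something taken from the thread.

```python
# Hypothetical candidate list; extend it with whatever code pages are plausible.
CANDIDATES = ("cp437", "cp850", "cp866", "cp932", "gbk", "cp1252")


def guess_codepage(original: str, garbled: str, candidates=CANDIDATES):
    """Return the first code page that turns original's UTF-8 bytes into garbled."""
    raw = original.encode("utf-8")
    for codepage in candidates:
        try:
            if raw.decode(codepage) == garbled:
                return codepage
        except UnicodeDecodeError:
            # These bytes are not valid in this code page; try the next one.
            continue
    return None


if __name__ == "__main__":
    print(guess_codepage("世界", "涓栫晫"))
```

For the `cargo run` output above this returns `gbk`, which suggests (but does not prove) that a Chinese code page was active in that run.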

@dscho
Contributor

dscho commented May 25, 2020

@nyfair it would be helpful if you could figure out how others can reproduce the problem you are seeing. So far the score is that two others cannot, while nobody else reproduced your issue yet.

@elieux
Member

elieux commented May 25, 2020

@nyfair, your original post mentions various methods to get the correct result, but doesn't say if you're running mintty in the failing case. Other useful information would be the result of command -v python and output of chcp.com.

@fasterthanlime

fasterthanlime commented May 26, 2020

Hi! I'm able to reproduce the issue (in mintty).

Here's a very simple reproduction, just run the following bash command:

$ chcp.com
Active code page: 437

$ msg="υπολογιστή"; echo $msg; cmd //c echo $msg
υπολογιστή
?π?????στ?

With the chcp.com 65001 workaround, the output is as expected:

$ chcp.com 65001
Active code page: 65001

$ msg="υπολογιστή"; echo $msg; cmd //c echo $msg
υπολογιστή
υπολογιστή

I also discovered this via C/Rust/Go programs, all using a mingw (not msys) toolchain. They all end up using the WriteConsoleW API. MSYS programs (like bash) still handle UTF-8 just fine.
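
The question marks in the code page 437 output are what a lossy conversion into that code page looks like: cp437 contains only a few Greek letters (π, σ and τ among them), so the other letters are replaced. A small Python sketch of that effect, with the helper name `through_codepage` chosen here for illustration; this models only the character-set loss, not what the MSYS2 runtime or `cmd` actually do internally:

```python
import unicodedata


def through_codepage(text: str, codepage: str) -> str:
    """Round-trip text through a code page, replacing unrepresentable characters."""
    # Normalize first so accented letters are single code points.
    text = unicodedata.normalize("NFC", text)
    # errors="replace" substitutes '?' for characters the code page lacks.
    return text.encode(codepage, errors="replace").decode(codepage)


if __name__ == "__main__":
    print(through_codepage("υπολογιστή", "cp437"))  # prints ?π?????στ?
    print(through_codepage("υπολογιστή", "utf-8"))  # text survives unchanged
```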

--

System info

Windows version: Version 10.0.18363 Build 18363

msys2-runtime version: 3.1.4-3

Mintty settings: (I can also reproduce on another computer where locale is set to en_US and character set is set to UTF-8 (Unicode)).

[Screenshot: Mintty settings]

PowerShell Get-WinSystemLocale:

PS C:\Users\amos> Get-WinSystemLocale

LCID             Name             DisplayName
----             ----             -----------
1033             en-US            English (United States)

Region settings (Unicode beta not enabled):

[Screenshot: Windows region settings]

chcp.com shows a 437 codepage in all of mintty mingw64 bash, cmd.exe, and PowerShell. (And Windows programs using Windows Unicode APIs show proper output in cmd.exe/PowerShell)

@fasterthanlime

Since Cygwin 3.1 changes are a potential suspect, I tried installing the latest base Cygwin set of packages, and I'm not able to reproduce the issue there.

Here's a comparison screenshot - the only affected combo is MSYS2+mintty:

[Screenshot: comparison of Cygwin and MSYS2 output]

@lazka
Member

lazka commented May 26, 2020

$ msg="υπολογιστή"; echo $msg; cmd //c echo $msg
υπολογιστή
?π?????στ?

I can reproduce this. And MSYS=enable_pcon fixes things. So maybe cygwin broke the non conpty fallback case which we default to

@dscho
Contributor

dscho commented May 28, 2020

So maybe cygwin broke the non conpty fallback case which we default to

Since #1974 (comment) reported that Cygwin is fine, that's unlikely. To be sure, I just tested and can confirm that Cygwin is unaffected.

@dscho
Contributor

dscho commented May 28, 2020

Oh, but then, Cygwin always uses ConHost, and that has no problems now, does it? FWIW both MSYS2 and Git for Windows show that problem.

@dscho
Contributor

dscho commented May 28, 2020

Wow, I just tried Cygwin's MinTTY with CYGWIN=disable_pcon and the prompt is already broken, it does not echo any command when navigating the command line history.

@dscho
Contributor

dscho commented Jun 2, 2020

I just tried Cygwin's MinTTY with CYGWIN=disable_pcon and the prompt is already broken, it does not echo any command when navigating the command line history.

Okay, Cygwin v3.1.5 seems to have fixed MinTTY (it was in their release notes that a segfault was fixed). With this, I can confirm that it actually is Cygwin that broke non-pseudo console mode in MinTTY:

$ msg="υπολογιστή"; echo $msg; cmd /c echo $msg
υπολογιστή
?π?????στ?

@lazka
Member

lazka commented Aug 24, 2020

Still broken with 3.1.7 (and pcon disabled)

@Archer73

C code to reproduce the bug:

#include <stdio.h>

int main(){
  puts("Привет мир! Hello world!");
  return 0;
}

Output:

$ ./test1.exe
╨Я╤А╨╕╨▓╨╡╤В ╨╝╨╕╤А! Hello world!
$ ./test1.exe | cat
Привет мир! Hello world!
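
The garbled line above can be turned back into the intended text by reversing the suspected wrong decoding. The Python sketch below assumes code page 866 (the Russian OEM code page) was the one misapplied; the thread does not say which code page was active, and the helper name `repair` is hypothetical.

```python
def repair(garbled: str, wrong_codepage: str) -> str:
    """Undo a wrong decoding: recover the raw bytes, then decode them as UTF-8."""
    raw = garbled.encode(wrong_codepage)
    return raw.decode("utf-8")


if __name__ == "__main__":
    # Assumption: cp866 is the code page that was applied by mistake.
    print(repair("╨Я╤А╨╕╨▓╨╡╤В ╨╝╨╕╤А! Hello world!", "cp866"))
```

Because the round trip succeeds with cp866, the bytes written by the program were most likely correct UTF-8, and the damage happened on the way to the terminal.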

@dscho
Contributor

dscho commented Aug 30, 2020

I think there was a fix for that in Cygwin. I cherry-picked into Git for Windows' fork for testing; @Archer73 would you mind testing the current usr/bin/msys-2.0.dll from https://github.com/git-for-windows/git-sdk-64/?

@lazka
Member

lazka commented Aug 30, 2020

Still the same with the copied usr/bin/msys-2.0.dll

@Archer73

@dscho Confirmed. Still the same error.

@dscho
Contributor

dscho commented Aug 31, 2020

I verified that this is also happening in Cygwin, and reported it to the Pseudo Console developer: https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html

@dscho
Contributor

dscho commented Sep 1, 2020

Okay, we're coming closer. There is a work-around: cmd //c chcp 65001. I think we will essentially want to do that in the MSYS2 runtime already, something along the lines of the patch I proposed in https://cygwin.com/pipermail/cygwin-developers/2020-September/011962.html. I'm still waiting on an explanation from the Pseudo Console support developer, to see whether I am completely off the mark or not.

@dscho
Contributor

dscho commented Sep 3, 2020

@tyan0 due to msys2/msys2-runtime@b757a21, the problems occur regardless, whether Pseudo Console support is enabled or not.

I would really like to see some effort to help users better.

@tyan0

tyan0 commented Sep 3, 2020 via email

@tyan0

tyan0 commented Sep 3, 2020 via email

@dscho
Contributor

dscho commented Sep 3, 2020

For the record, reverting that commit works around that problem even when rebasing on top of Cygwin's main branch (i.e. the pre-v3.2.0 version). @lazka what do you think? Should we just write off Pseudo Console support as unsalvageable, revert that commit in the MSYS2 runtime, and be done with it?

@dscho
Contributor

dscho commented Sep 3, 2020

Did you really set MSYS=enable_pcon BEFORE starting mintty? Pseudo console is initialized in master open, so it is necessary to set MSYS=enable_pcon before opening pty.

@tyan0 yes, I did that. Or at least a variation (which works, as I have verified): I use the /etc/git-bash.config method implemented here: https://github.com/git-for-windows/MINGW-packages/blob/7e7ea08ebc12ba7e80a1551cd77ea9fcba4d330b/mingw-w64-git/git-wrapper.c#L610-L648. Essentially, when the file /etc/git-bash.config is present, and contains a line starting with MSYS=, that environment variable is augmented (or initialized) accordingly.
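
The `/etc/git-bash.config` behavior described above can be sketched in Python as follows. This is not the linked `git-wrapper.c` code, only a rough model of the description. The function name `apply_msys_config` is hypothetical, and joining an existing value and the new one with a space is an assumption about what "augmented" means.

```python
def apply_msys_config(config_text: str, env: dict) -> dict:
    """Return a copy of env with MSYS set or extended from config lines."""
    result = dict(env)
    for line in config_text.splitlines():
        # Only lines starting with "MSYS=" are considered, as described above.
        if not line.startswith("MSYS="):
            continue
        value = line[len("MSYS="):].strip()
        current = result.get("MSYS", "")
        # Assumption: an existing value is extended with a space separator.
        result["MSYS"] = f"{current} {value}" if current else value
    return result


if __name__ == "__main__":
    print(apply_msys_config("MSYS=enable_pcon\n", {}))
```

The point relevant to the discussion is ordering: whatever sets `MSYS` has to run before MinTTY is spawned for `enable_pcon` to take effect.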

@lazka
Member

lazka commented Sep 3, 2020

@lazka what do you think? Should we just write off Pseudo Console support as unsalvageable, revert that commit in the MSYS2 runtime, and be done with it?

Can we just skip the conversion in the non-pcon case, so that we get the old behavior without pcon while still being able to enable pcon and get the unchanged upstream behavior?

@dscho
Contributor

dscho commented Sep 3, 2020

Can we just skip the conversion in the non-pcon case, so that we get the old behavior without pcon while still being able to enable pcon and get the unchanged upstream behavior?

That's exactly what reverting this commit means ;-)

@dscho
Contributor

dscho commented Sep 3, 2020

On Thu, 03 Sep 2020 02:52:00 -0700 Johannes Schindelin wrote:

@tyan0 due to msys2/msys2-runtime@b757a21, the problems occur regardless, whether Pseudo Console support is enabled or not.

It should not. That code is not executed if pseudo console is enabled. The code just before that code is as follows.

if (get_ttyp ()->h_pseudo_console) {
  ...
  mb_str_free (buf);
  continue;
}

I am talking about v3.1.7, not about the upcoming v3.2.0. And there, you will see this call:

https://github.com/cygwin/cygwin/blob/d72ea86d41e1295839de8cd9564bb8b44a8d862a/winsup/cygwin/fhandler_tty.cc#L2231-L2232

This call was introduced by cygwin/cygwin@b757a21 and is distinctly not guarded by any Pseudo Console-specific condition. That is the reason why, at least in my testing, reverting that commit did fix the issue reported in this here ticket (but only with disable_pcon, apparently the code is not even executed in Pseudo Console mode).

@lazka
Member

lazka commented Sep 3, 2020

That's exactly what reverting this commit means ;-)

ok, then go ahead

@tyan0

tyan0 commented Sep 3, 2020 via email

@tyan0

tyan0 commented Sep 3, 2020 via email

@tyan0

tyan0 commented Sep 4, 2020 via email

@tyan0

tyan0 commented Sep 4, 2020 via email

@dscho
Contributor

dscho commented Sep 7, 2020

I am talking about v3.1.7, not about the upcoming v3.2.0. And there, you will see this call:

Both are the same for this code.

That call might be the same for both versions, but it is only hit in the non-pseudo console path.

What I am driving at here is that your patch that made it into v3.2.0 and that creates a new Pseudo Console for every spawned non-Cygwin console process, that patch seems to make it impossible to set a default code page that reflects Cygwin's idea of the locale.

What about just adding SetConsoleCP (get_ttyp ()->term_code_page); SetConsoleOutputCP (get_ttyp ()->term_code_page); at the end of setup_locale()?

I tried that, but in my tests with v3.2.0 and Pseudo Console enabled, that did not fix the issue reported in this ticket.

With this modification, the charset conversion is disabled because GetConsoleOutputCP() returns get_ttyp ()->term_code_page by default. Moreover, if user changes code page by chcp.com, the conversion will be done.

No, it did not fix it in my tests. I hope to find some time to investigate further, as it really looks more and more like there are critical bugs lurking.

On Thu, 03 Sep 2020 06:48:34 -0700 Johannes Schindelin wrote:

Did you really set MSYS=enable_pcon BEFORE starting mintty? Pseudo console is initialized in master open, so it is necessary to set MSYS=enable_pcon before opening pty.

@tyan0 yes, I did that. Or at least a variation (which works, as I have verified): I use the /etc/git-bash.config method implemented here: https://github.com/git-for-windows/MINGW-packages/blob/7e7ea08ebc12ba7e80a1551cd77ea9fcba4d330b/mingw-w64-git/git-wrapper.c#L610-L648. Essentially, when the file /etc/git-bash.config is present, and contains a line starting with MSYS=, that environment variable is augmented (or initialized) accordingly.

No. This cannot enable pseudo console. It seems that MSYS is set not before opening mintty, but before opening bash.

You misunderstand. The code I referenced executes before spawning MinTTY. So yes, it enables the Pseudo Console support, as my many tests verified.
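
The conversion rule quoted in this comment (no charset conversion while the console output code page equals the terminal's code page, and conversion once the user switches with chcp.com) can be modeled roughly as below. This is a Python sketch of the described behavior, not the Cygwin code. The function name `forward_output` is hypothetical, and the direction of the conversion (from console code page to terminal charset) is an assumption based on the discussion.

```python
def forward_output(data: bytes, console_cp: str, term_cp: str) -> bytes:
    """Return the bytes that would be passed on to the terminal."""
    if console_cp == term_cp:
        # Code pages agree: pass the bytes through untouched.
        return data
    # Code pages differ: interpret the bytes in the console code page
    # and re-encode them in the terminal's character set.
    text = data.decode(console_cp, errors="replace")
    return text.encode(term_cp, errors="replace")


if __name__ == "__main__":
    utf8_bytes = "Привет".encode("utf-8")
    # Matching code pages: output is unchanged.
    print(forward_output(utf8_bytes, "utf-8", "utf-8").decode("utf-8"))
    # Mismatch (cp866 is an assumed example): UTF-8 bytes get converted again.
    print(forward_output(utf8_bytes, "cp866", "utf-8").decode("utf-8"))
```

In this model, a program that already writes UTF-8 while the console code page is still a legacy one gets its output converted a second time, which resembles the garbled output shown earlier in the thread. Setting both code pages to the terminal's value makes the function a pass-through.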

@tyan0

tyan0 commented Sep 8, 2020 via email

@dscho
Contributor

dscho commented Sep 8, 2020

Do you mean the issue with python by "the issue reported in this ticket"?

Try msg="υπολογιστή"; echo $msg; cmd //c echo $msg

Do you mean the case where pseudo console is disabled? Or enabled?

Ideally, I would want to have the expected default code page in both modes. But if all I can get is in disable_pcon mode, then that's what I'll take (and in that case I'd remove the option to enable Pseudo Console support from Git for Windows' installer).

Could you please provide msys-2.0.dll v3.2.0 with the patch I proposed?

Which one?

Or is there a git repository of v3.2.0 that I can clone?

No, the MSYS2 runtime is still based on v3.1.7. What I did was to pull the main branch of https://github.com/cygwin/cygwin into a clone of https://github.com/msys2/msys2-runtime (and resolve the trivial merge conflict).

For v3.2.0, just adding SetConsoleCP (get_ttyp ()->term_code_page); SetConsoleOutputCP (get_ttyp ()->term_code_page); at the end of setup_locale() is enough.

In my experiments, this did exactly nothing when Pseudo Console support was enabled. It did not work around the encoding issues.

Not in v3.2.0.

In v3.1.7, yes. But in v3.2.0, you changed the architecture such that the Pseudo Consoles are created specifically for each spawned console application, and I did not find any way to set the code page for those consoles.

@tyan0

tyan0 commented Sep 9, 2020

Do you mean the issue with python by "the issue reported in this ticket"?

Try msg="υπολογιστή"; echo $msg; cmd //c echo $msg

[msys2 v3.0.7]
$ uname -a; echo MSYS=$MSYS; msg="υπολογιστή"; echo $msg; cmd //c "chcp && echo $msg"
MINGW32_NT-10.0-18363 Express5800-S70 3.0.7-338.x86_64 2019-05-23 05:39 UTC x86_64 Msys
MSYS=
υπολογιστή
▒▒▒݂̃R▒[▒h ▒y▒[▒W: 932
▒҃΃̓Ƀ̓▒▒ǃЃ▒?

[msys2 v3.1.7 + patch (pseudo console disabled)]
$ uname -a; echo MSYS=$MSYS; msg="υπολογιστή"; echo $msg; cmd //c "chcp && echo $msg"
MINGW32_NT-10.0-18363 Express5800-S70 3.1.7-340.x86_64 2020-09-08 04:35 UTC x86_64 Msys
MSYS=
υπολογιστή
Active code page: 65001
υπολογιστή

[msys2 v3.1.7 + patch (pseudo console enabled)]
$ uname -a; echo MSYS=$MSYS; msg="υπολογιστή"; echo $msg; cmd //c "chcp && echo $msg"
MINGW32_NT-10.0-18363 Express5800-S70 3.1.7-340.x86_64 2020-09-08 04:35 UTC x86_64 Msys
MSYS=enable_pcon
υπολογιστή
現在のコード ページ: 932
υπολογιστή

Only v3.0.7 causes garbled output. That is very different from the report above.
Is the difference due to the default code page?

[msys2 v3.0.7]
$ uname -a; echo MSYS=$MSYS; msg="υπολογιστή"; echo $msg; cmd //c "chcp && echo $msg"
MINGW32_NT-10.0-18363 Express5800-S70 3.0.7-338.x86_64 2019-05-23 05:39 UTC x86_64 Msys
MSYS=
υπολογιστή
Active code page: 437
?▒?????▒▒?

[msys2 v3.1.7 + patch (pseudo console enabled)]
$ uname -a; echo MSYS=$MSYS; msg="υπολογιστή"; echo $msg; cmd //c "chcp && echo $msg"
MINGW32_NT-10.0-18363 Express5800-S70 3.1.7-340.x86_64 2020-09-08 04:35 UTC x86_64 Msys
MSYS=enable_pcon
υπολογιστή
Active code page: 437
υπολογιστή

The results are similar even under codepage 437.

Do you mean the case where pseudo console is disabled? Or enabled?

Ideally, I would want to have the expected default code page in both modes. But if all I can get is in disable_pcon mode, then that's what I'll take (and in that case I'd remove the option to enable Pseudo Console support from Git for Windows' installer).

In my environment, the problem does not occur when pseudo console is enabled.

Could you please provide msys-2.0.dll v3.2.0 with the patch I proposed?

Which one?

The msys-2.0.dll v3.2.0 with the patch

+ SetConsoleCP (get_ttyp ()->term_code_page);
+ SetConsoleOutputCP (get_ttyp ()->term_code_page);

with which you experience the issue.

In my experiments, this did exactly nothing when Pseudo Console support was enabled. It did not work around the encoding issues.

Not in v3.2.0.

In v3.1.7, yes. But in v3.2.0, you changed the architecture such that the Pseudo Consoles are created specifically for each spawned console application, and I did not find any way to set the code page for those consoles.

Again, it seems that it is not necessary to change code page if pseudo console is enabled.

@tyan0

tyan0 commented Sep 9, 2020

It seems that v3.0.7 also needs chcp 65001 in my environment.

[msys2 v3.0.7]
$ uname -a; echo MSYS=$MSYS; msg="υπολογιστή"; echo $msg; cmd //c "chcp && echo $msg"
MINGW32_NT-10.0-18363 Express5800-S70 3.0.7-338.x86_64 2019-05-23 05:39 UTC x86_64 Msys
MSYS=
υπολογιστή
Active code page: 65001
υπολογιστή

@dscho
Contributor

dscho commented Sep 22, 2020

@tyan0 thank you for all your help resolving this rather vexing issue. In my tests of the current Cygwin tip merged into the MSYS2 runtime, the υπολογιστή test passes with and without Pseudo Console support.

@fasterthanlime

Hurray! Thanks for the hard work on this all 🎉
