-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Wrong charset after upgrade to msys2-runtime-3.1.4 #1974
Comments
Cc @dscho |
Sorry, I have no idea why this is happening (or why I was Cc:ed). |
@dscho I though it could be potential issue for Git for Windows users. I suppose there is no correlation between this and your latest patches for the runtime? |
It's possible, but |
I can't reproduce here |
Neither can I... $ python -c 'print("世界")'
世界 |
I suppose it is a cygwin issue since it changed encoding handing from 3.1
|
@nyfair it would be helpful if you could figure out how others can reproduce the problem you are seeing. So far the score is that two others cannot, while nobody else reproduced your issue yet. |
@nyfair, your original post mentions various methods to get the correct result, but doesn't say if you're running mintty in the failing case. Other useful information would be the result of |
Hi! I'm able to reproduce the issue (in mintty). Here's a very simple reproduction, just run the following bash command: $ chcp.com
Active code page: 437
$ msg="υπολογιστή"; echo $msg; cmd //c echo $msg
υπολογιστή
?π?????στ? With the
I also discovered this via C/Rust/Go programs, all using a mingw (not msys) toolchain. They all end up using the -- System infoWindows version: msys2-runtime version: Mintty settings: (I can also reproduce on another computer where locale is set to en_US and character set is set to Powershell Win-GetSystemLocale: PS C:\Users\amos> Get-WinSystemLocale
LCID Name DisplayName
---- ---- -----------
1033 en-US English (United States) Region settings (Unicode beta not enabled):
|
I can reproduce this. And |
Since #1974 (comment) reported that Cygwin is fine, that's unlikely. To be sure, I just tested and can confirm that Cygwin is unaffected. |
Oh, but then, Cygwin always uses ConHost, and that has no problems now, does it? FWIW both MSYS2 and Git for Windows show that problem. |
Wow, I just tried Cygwin's MinTTY with |
Okay, Cygwin v3.1.5 seems to have fixed MinTTY (it was in their release notes that a segfault was fixed). With this, I can confirm that it actually is Cygwin that broke non-pseudo console mode in MinTTY: $ msg="υπολογιστή"; echo $msg; cmd /c echo $msg
υπολογιστή
?π?????στ? |
Still broken with 3.1.7 (and pcon disabled) |
C code to reproduce the bug: #include <stdio.h>
int main(){
puts("Привет мир! Hello world!");
return 0;
} Output:
|
I think there was a fix for that in Cygwin. I cherry-picked into Git for Windows' fork for testing; @Archer73 would you mind testing the current |
Still the same with the copied |
@dscho Сonfirm. Still the same error |
I verified that this is also happening in Cygwin, and reported it to the Pseudo Console developer: https://cygwin.com/pipermail/cygwin-developers/2020-August/011951.html |
Okay, we're coming closer. There is a work-around: |
@tyan0 due to msys2/msys2-runtime@b757a21, the problems occur regardless, whether Pseudo Console support is enabled or not. I would really like to see some effort to help users better. |
On Thu, 03 Sep 2020 02:52:00 -0700 Johannes Schindelin wrote:
@tyan0 due to msys2/msys2-runtime@b757a21, the problems occur regardless, whether Pseudo Console support is enabled or not.
It should not. That code is not executed if pseudo console is enabled.
The code just before that code is as follows.
if (get_ttyp ()->h_pseudo_console)
{
...
mb_str_free (buf);
continue;
}
…--
Takashi Yano <takashi.yano@nifty.ne.jp>
|
On Thu, 03 Sep 2020 02:52:00 -0700 Johannes Schindelin wrote:
@tyan0 due to msys2/msys2-runtime@b757a21, the problems occur regardless, whether Pseudo Console support is enabled or not.
Did you really set MSYS=enable_pcon _BEFORE_ starting mintty?
Pseudo console is initialized in master open, so it is
necessary to set MSYS=enable_pcon before opening pty.
…--
Takashi Yano <takashi.yano@nifty.ne.jp>
|
For the record, reverting that commit works around that problem even when rebasing on top of Cygwin's main branch (i.e. the pre-v3.2.0 version). @lazka what do you think? Should we just write off Pseudo Console support as unsalvageable, revert that commit in the MSYS2 runtime, and be done with it? |
@tyan0 yes, I did that. Or at least a variation (which works, as I have verified): I use the |
Can we just skip the conversion in the non-pcon case, so that we get the old behavior without pcon while still being able to enable pcon and get the unchanged upstream behavior? |
That's exactly what reverting this commit means ;-) |
I am talking about v3.1.7, not about the upcoming v3.2.0. And there, you will see this call: This call was introduced by cygwin/cygwin@b757a21 and is distinctly not guarded by any Pseudo Console-specific condition. That is the reason why, at least in my testing, reverting that commit did fix the issue reported in this here ticket (but only with |
ok, then go ahead |
On Thu, 03 Sep 2020 13:40:04 -0700
Johannes Schindelin wrote:
> > On Thu, 03 Sep 2020 02:52:00 -0700 Johannes Schindelin wrote: @tyan0 due to ***@***.***(msys2/msys2-runtime@b757a21), the problems occur regardless, whether Pseudo Console support is enabled or not.
> It should not. That code is not executed if pseudo console is enabled. The code just before that code is as follows.
> ```c
> if (get_ttyp ()->h_pseudo_console) {
> ...
> mb_str_free (buf);
> continue;
> }
>```
I am talking about v3.1.7, not about the upcoming v3.2.0. And there, you will see this call:
Both are the same for this code.
https://github.com/cygwin/cygwin/blob/d72ea86d41e1295839de8cd9564bb8b44a8d862a/winsup/cygwin/fhandler_tty.cc#L2231-L2232
This call was introduced by cygwin/cygwin@b757a21 and is distinctly _not_ guarded by any Pseudo Console-specific condition. That is the reason why, at least in _my_ testing, reverting that commit did fix the issue reported in this here ticket (but _only_ with `disable_pcon`, apparently the code is not even executed in Pseudo Console mode).
If the pseudo console is enabled, get_ttyp()->h_pseudo_console
is set to non NULL.
Then, there fore the following if block will be executed.
https://github.com/cygwin/cygwin/blob/d72ea86d41e1295839de8cd9564bb8b44a8d862a/winsup/cygwin/fhandler_tty.cc#L2164-L2229
This block is end with "continue" as:
https://github.com/cygwin/cygwin/blob/d72ea86d41e1295839de8cd9564bb8b44a8d862a/winsup/cygwin/fhandler_tty.cc#L2228
therefore,
https://github.com/cygwin/cygwin/blob/d72ea86d41e1295839de8cd9564bb8b44a8d862a/winsup/cygwin/fhandler_tty.cc#L2231-L2232
will not be executed if pseudo console is enabled.
…--
Takashi Yano <takashi.yano@nifty.ne.jp>
|
On Thu, 03 Sep 2020 09:16:03 -0700 Christoph Reiter wrote:
> @lazka what do you think? Should we just write off Pseudo Console support as unsalvageable, revert that commit in the MSYS2 runtime, and be done with it?
Can we just skip the conversion in the non-pcon case, so that we get the old behavior without pcon while still being able to enable pcon and get the unchanged upstream behavior?
What about just adding
SetConsoleCP (get_ttyp ()->term_code_page)
SetConsoleOutputCP (get_ttyp ()->term_code_page)
at the end of setup_locale()?
With this modification, the charset conversion is disabled
because GetConsoleOutputCP() returns get_ttyp ()->term_code_page
by default. Moreover, if user changes code page by chcp.com,
the conversion will be done.
If the conversion code
https://github.com/cygwin/cygwin/blob/d72ea86d41e1295839de8cd9564bb8b44a8d862a/winsup/cygwin/fhandler_tty.cc#L2231-L2232
is deleted, garbled output will occur when user chages the
code page to other than get_ttyp ()->term_code_page.
…--
Takashi Yano <takashi.yano@nifty.ne.jp>
|
On Thu, 03 Sep 2020 06:48:34 -0700 Johannes Schindelin wrote:
> Did you really set MSYS=enable_pcon _BEFORE_ starting mintty? Pseudo console is initialized in master open, so it is necessary to set MSYS=enable_pcon before opening pty.
@tyan0 yes, I did that. Or at least a variation (which works, as I have verified): I use the `/etc/git-bash.config` method implemented here: https://github.com/git-for-windows/MINGW-packages/blob/7e7ea08ebc12ba7e80a1551cd77ea9fcba4d330b/mingw-w64-git/git-wrapper.c#L610-L648. Essentially, when the file `/etc/git-bash.config` is present, and contains a line starting with `MSYS=`, that environment variable is augmented (or initialized) accordingly.
No. This cannot enable pseudo console. It seems that
MSYS is not set before opening mintty, but opening
bash.
Please try:
1) Open command prompt.
2) set MSYS=enable_pcon
3) Start git-bash.exe from that command prompt.
Or set MSYS=enable_pcon in system environment.
…--
Takashi Yano <takashi.yano@nifty.ne.jp>
|
On Fri, 4 Sep 2020 08:29:10 +0900 Takashi Yano ***@***.***> wrote:
What about just adding
SetConsoleCP (get_ttyp ()->term_code_page)
SetConsoleOutputCP (get_ttyp ()->term_code_page)
at the end of setup_locale()?
Sorry, I meant:
if (!get_ttyp->h_pseudo_console)
{
SetConsoleCP (get_ttyp ()->term_code_page);
SetConsoleOutputCP (get_ttyp ()->term_code_page);
}
…--
Takashi Yano <takashi.yano@nifty.ne.jp>
|
That call might be the same for both versions, but it is only hit in the non-pseudo console path. What I am driving at here is that your patch that made it into v3.2.0 and that creates a new Pseudo Console for every spawned non-Cygwin console process, that patch seems to make it impossible to set a default code page that reflects Cygwin's idea of the locale.
I tried that, but in my tests with v3.2.0 and Pseudo Console enabled, that did not fix the issue reported in this ticket.
No, it did not fix it in my tests. I hope to find some time to investigate further, as it really looks more and more like there are critical bugs lurking.
You misunderstand. The code I referenced executes before spawning MinTTY. So yes, it enables the Pseudo Console support, as my many tests verified. |
On Mon, 07 Sep 2020 14:03:56 -0700
Johannes Schindelin <notifications@github.com> wrote:
> > I am talking about v3.1.7, not about the upcoming v3.2.0. And there, you will see this call:
> Both are the same for this code.
That call might be the same for both versions, but it is _only_ hit in the non-pseudo console path.
What I am driving at here is that your patch that made it into v3.2.0 and that creates a new Pseudo Console for every spawned non-Cygwin console process, that patch seems to make it impossible to set a default code page that reflects Cygwin's idea of the locale.
> What about just adding SetConsoleCP (get_ttyp ()->term_code_page) SetConsoleOutputCP (get_ttyp ()->term_code_page) at the end of setup_locale()?
I tried that, but in my tests with v3.2.0 and Pseudo Console enabled, that did not fix the issue reported in this ticket.
Do you mean the issue with python by "the issue reported in this ticket"?
In my environment, that case causes garbled output even in msys2 3.0.7.
Setting PYTHONUTF8=1 seems to be necessary for me.
With this setting, python test case works in:
- msys 3.0.7
- msys 3.1.7 + patch with pseudo console disabled
- msys 3.1.7 + patch with pseudo console enabled
without garbled output in my environment.
> With this modification, the charset conversion is disabled because GetConsoleOutputCP() returns get_ttyp ()->term_code_page by default. Moreover, if user changes code page by chcp.com, the conversion will be done.
No, it did not fix it in my tests. I hope to find some time to investigate further, as it really looks more and more like there are critical bugs lurking.
Do you mean the case where pseudo console is disabled? Or enabled?
Could you please provide msys-2.0.dll v3.2.0 with the patch I proposed?
Or, is there git repository of v3.2.0 where I can clone?
I would like to look into the issue you experienced.
For v3.2.0, just adding
SetConsoleCP (get_ttyp ()->term_code_page);
SetConsoleOutputCP (get_ttyp ()->term_code_page);
at the end of setup_locale() is enough.
(Checking get_ttyp()->h_pseudo_console is not necessary.)
> > On Thu, 03 Sep 2020 06:48:34 -0700 Johannes Schindelin wrote:
> > > Did you really set MSYS=enable_pcon _BEFORE_ starting mintty? Pseudo console is initialized in master open, so it is necessary to set MSYS=enable_pcon before opening pty.
> > @tyan0 yes, I did that. Or at least a variation (which works, as I have verified): I use the `/etc/git-bash.config` method implemented here: https://github.com/git-for-windows/MINGW-packages/blob/7e7ea08ebc12ba7e80a1551cd77ea9fcba4d330b/mingw-w64-git/git-wrapper.c#L610-L648. Essentially, when the file `/etc/git-bash.config` is present, and contains a line starting with `MSYS=`, that environment variable is augmented (or initialized) accordingly.
> No. This cannot enable pseudo console. It seems that MSYS is not set before opening mintty, but opening bash.
You misunderstand. The code I referenced executes _before_ spawning MinTTY. So yes, it enables the Pseudo Console support, as my _many_ tests verified.
Sorry. I was something wrong. I tried /etc/git-bash.config with
MSYS=enable_pcon again, and confirmed that pseudo console is enabled.
…--
Takashi Yano <takashi.yano@nifty.ne.jp>
|
Try
Ideally, I would want to have the expected default code page in both modes. But if all I can get is in
Which one?
No, the MSYS2 runtime is still based on v3.1.7. What I did was to pull the main branch of https://github.com/cygwin/cygwin into a clone of https://github.com/msys2/msys2-runtime (and resolve the trivial merge conflict).
In my experiments, this did exactly nothing when Pseudo Console support was enabled. It did not work around the encoding issues. Not in v3.2.0. In v3.1.7, yes. But in v3.2.0, you changed the architecture such that the Pseudo Consoles are created specifically for each spawned console application, and I did not find any way to set the code page for those consoles. |
[msys2 v3.0.7] [msys2 v3.1.7 + patch (pseudo console disabled)] [msys2 v3.1.7 + patch (pseudo console enabled)] Only v3.0.7 cause garbled output. It is very different from the report above. [msys2 v3.0.7] [msys2 v3.1.7 + patch (pseudo console enabled)] The results are similar even under codepage 437.
In my environment, the problem does not occur when pseudo console is enabled.
The msys-2.0.dll v3.2.0 with the patch
with which you experience the issue.
Again, it seems that it is not necessary to change code page if pseudo console is enabled. |
It seems that v3.0.7 also needs chcp 65001 in my environment. [msys2 v3.0.7] |
@tyan0 thank you for all your help resolving this rather vexing issue. In my tests of the current Cygwin tip merged into the MSYS2 runtime, the υπολογιστή test passes with and without Pseudo Console support. |
Hurray! Thanks for the hard work on this all 🎉 |
Steps to Reproduce the Problem
python -c 'print("世界")'
荳也阜
Additional Context
locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
The Following Methods Can All Get Correct Result
winpty python -c 'print("世界")'
python -c 'print("世界")' | iconv -f utf-8 -t utf-8
chcp.com 65001 && python -c 'print("世界")'
The text was updated successfully, but these errors were encountered: