'é' is shown as a double-width character #1050

Closed
k-takata opened this issue Oct 25, 2020 · 46 comments

@k-takata
Contributor

Since mintty 3.4.1, 'é' (U+00E9) is shown as a double-width character.

Font=Consolas
Term=xterm-256color
Locale=ja_JP
Charset=UTF-8
Language=@
@mintty
Owner

mintty commented Oct 25, 2020

Handling of locale and character encoding had grown historically and had a number of inconsistencies, so it was substantially revised in 3.4.1. The parameters -o Locale=ja_JP -o Charset=UTF-8 now invoke locale ja_JP.UTF-8, as might be expected.

The reason for the changed behaviour is actually that CJK locales with UTF-8 encoding treat ambiguous-width characters as double-width in cygwin/newlib. I have already submitted a newlib patch to change that and make it consistent with the respective Linux/glibc locales, with no response so far; maybe you'd like to report this to the cygwin mailing list.

For now, there are two workarounds in mintty: -o Charwidth=ambig-narrow or -o OldLocale=yes.

@k-takata
Contributor Author

When I set Charwidth=ambig-narrow, 'é' is shown as a single-width character; however, a backspace deletes 2 cells.
I'll use OldLocale=yes for now.

@mintty
Owner

mintty commented Oct 27, 2020

I cannot reproduce that. With Locale=ja_JP Charset=UTF-8 Charwidth=ambig-narrow, mintty sets the locale to ja_JP.UTF-8@cjknarrow, which indicates proper single width to the shell. Maybe there is some additional setup in your shell startup scripts that overrides this?
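The locale name mentioned here is assembled from the individual settings. A minimal C sketch of that composition, assuming the scheme is simply Locale, a dot, Charset, plus the @cjknarrow modifier when Charwidth=ambig-narrow is in effect (compose_locale is a hypothetical helper, not mintty's code, and charset aliases such as utf8 are not normalised):

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not mintty's implementation: build a locale name
 * such as "ja_JP.UTF-8@cjknarrow" from the Locale and Charset settings.
 * Returns the length of the name, or -1 on error or truncation. */
static int compose_locale(char *buf, size_t size, const char *locale,
                          const char *charset, bool ambig_narrow)
{
  int n = snprintf(buf, size, "%s.%s%s", locale, charset,
                   ambig_narrow ? "@cjknarrow" : "");
  if (n < 0 || (size_t)n >= size)
    return -1;  /* output error or buffer too small */
  return n;
}
```

For example, compose_locale(buf, sizeof buf, "ja_JP", "UTF-8", true) fills buf with ja_JP.UTF-8@cjknarrow.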

@k-takata
Contributor Author

I tried Charwidth=ambig-narrow again, and now it works fine. Not sure why it didn't work.

@mintty
Owner

mintty commented Oct 27, 2020

Thanks. Would you report the locale issue to the cygwin mailing list (so I don't have to nag about my own patch...)?

@k-takata
Contributor Author

> I tried Charwidth=ambig-narrow again, and now it works fine. Not sure why it didn't work.

It seems that I have found the problem.

When I run set LANG=ja_JP.UTF-8 in cmd.exe and then execute mintty -o Charwidth=ambig-narrow, mintty sets $LC_CTYPE to ja_JP.UTF-8@cjknarrow, but $LANG is still ja_JP.UTF-8.
Deleting 'é' in bash works fine.

When I run set LANG= in cmd.exe and then execute mintty -o Charwidth=ambig-narrow, mintty doesn't set $LC_CTYPE, and $LANG is also empty.
Deleting 'é' in bash doesn't work well.

@mintty
Owner

mintty commented Oct 27, 2020

> mintty sets $LC_CTYPE to ja_JP.UTF-8@cjknarrow, but $LANG is still ja_JP.UTF-8

That should be OK: by definition of the locale mechanism, a specific LC_ variable overrides LANG for its own locale category, and LC_ALL overrides them all.
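The precedence rule can be modelled in a few lines of C. This sketch (effective_locale is a hypothetical name) follows the POSIX order, LC_ALL, then the category variable, then LANG, with empty values treated as unset; it only reads environment variables and does not call setlocale():

```c
#include <stdlib.h>

/* Return the locale value in effect for one category, e.g. "LC_CTYPE":
 * a non-empty LC_ALL wins, then the category's own variable, then LANG.
 * Empty values count as unset. NULL means nothing is set, so the
 * implementation default (usually "C") applies. */
static const char *effective_locale(const char *category_var)
{
  const char *names[] = { "LC_ALL", category_var, "LANG" };
  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
    const char *value = getenv(names[i]);
    if (value && *value)
      return value;
  }
  return NULL;
}
```

With LANG=fr_FR.UTF-8 and LC_CTYPE=ja_JP.UTF-8@cjknarrow, effective_locale("LC_CTYPE") yields the Japanese value while effective_locale("LC_MESSAGES") falls back to LANG.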

> When I do set LANG= on cmd.exe, then I execute mintty -o Charwidth=ambig-narrow, mintty doesn't set $LC_CTYPE, and $LANG is also empty.
> Deleting 'é' on bash doesn't work well.

I could reproduce that an hour ago, but I can't reproduce it anymore. Weird.

@mintty
Owner

mintty commented Oct 27, 2020

env -i /bin/mintty -o Locale=ja_JP -o Charset=utf8 produces the issue.

@k-takata
Contributor Author

Is this condition correct?

mintty/src/charset.c

Lines 457 to 458 in 7573e73

      if (lc && strcmp(lc, loc) == 0) // if LANG is not set properly
        setenv("LC_CTYPE", loc, true); // set LC_CTYPE

I think LC_CTYPE should be set when LANG is not set or LANG is different from the specified loc value.
i.e.

      if (!lc || strcmp(lc, loc) != 0)
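For comparison, a minimal standalone sketch of the suggested condition, assuming the intent stated in the charset.c comment: export LC_CTYPE only when LANG does not already carry the requested locale. set_ctype_if_needed is a hypothetical name; the surrounding code in charset.c differs.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>
#include <string.h>

/* Export LC_CTYPE=loc unless LANG already names exactly that locale.
 * An unset or empty LANG counts as "not set properly". */
static void set_ctype_if_needed(const char *loc)
{
  const char *lang = getenv("LANG");
  if (!lang || strcmp(lang, loc) != 0)  /* LANG unset or different */
    setenv("LC_CTYPE", loc, 1);         /* 1: overwrite any existing value */
}
```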

@k-takata
Contributor Author

k-takata commented Oct 30, 2020

Hmm, now $LANG becomes unset with:
mintty.exe -o Locale=ja_JP -o Charset=UTF-8 -

When I execute the following from a command prompt, $LANG becomes ja_JP.UTF-8:
c:\cygwin64\bin\bash.exe --login

@k-takata
Contributor Author

It seems that 7dd20ba causes the issue, but I'm not sure why setting LC_CTYPE causes LANG to be unset.

@k-takata
Copy link
Contributor Author

Ah, /etc/profile.d/lang.sh has the following lines:

# if no locale variable is set, indicate terminal charset via LANG
test -z "${_LC_ALL_SET_:-${LC_CTYPE:-$LANG}}" && export LANG=$(/usr/bin/locale -uU)

Therefore, if LC_CTYPE is set, LANG won't be exported.
Should LANG also be updated in set_locale_env()?
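The shell test only fires when the whole nested expansion is empty. A small C model of that condition (function names are hypothetical, for illustration only) makes explicit why a preset LC_CTYPE suppresses the LANG export:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Like shell ${VAR:-...}, an empty value counts as unset. */
static bool is_set(const char *name)
{
  const char *value = getenv(name);
  return value != NULL && value[0] != '\0';
}

/* Mirrors the lang.sh test
 *   test -z "${_LC_ALL_SET_:-${LC_CTYPE:-$LANG}}" && export LANG=...
 * LANG is exported only if _LC_ALL_SET_, LC_CTYPE and LANG are all
 * empty or unset. */
static bool lang_sh_sets_lang(void)
{
  return !is_set("_LC_ALL_SET_") && !is_set("LC_CTYPE") && !is_set("LANG");
}
```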

@mintty
Owner

mintty commented Oct 30, 2020

No, the intent is to modify the locale environment as little as possible and only enforce the LC_CTYPE category.
Suppose someone has set LANG=fr_FR to get French messages from applications and then runs mintty with Locale=ja_JP Charset=whatever. The LC_CTYPE category would need to be updated to set up the requested Charset, but if LANG were overwritten, all other locale categories would change too, including LC_MESSAGES, and the user would get Japanese messages rather than French.

@mintty
Owner

mintty commented Oct 30, 2020

Actually, this was different up to 3.4.0; see https://github.com/mintty/mintty/blob/master/src/child.c#L288.
However, I believe this approach was wrong as it would override all locale settings. Arguably, people might want just that if they use the Locale option, but I think mintty should constrain itself to handling the character encoding; other locale matters are not the concern of a terminal. People can set LANG="${LC_ALL:-${LC_CTYPE:-$LANG}}" in their shell profile if they want.

@k-takata
Contributor Author

Then, I have to set LANG explicitly?

@mintty
Owner

mintty commented Oct 30, 2020

You don't need to if you don't care about other locale categories.
I have just fixed the documentation to describe the new behaviour, please check.

@k-takata
Contributor Author

The problem is that now LANG becomes empty when mintty is used. It doesn't matter whether Locale is explicitly set (in .minttyrc or by -o) or not.

@mintty
Owner

mintty commented Oct 30, 2020

LANG does not really "become" empty, mintty does not clear it. It is just not set explicitly anymore.

@mintty
Owner

mintty commented Oct 30, 2020

And, as I tried to point out in my comment before, I do not see this as a problem.

@k-takata
Contributor Author

Hmm, still unclear to me.

  1. When I execute bash.exe --login without using mintty, LANG is set to the default value by /etc/profile.d/lang.sh. (ja_JP.UTF-8 in my environment.)
  2. When I use mintty 3.4.0 or before with mintty -o Locale=ja_JP -o Charset=UTF-8 -, LANG is set to ja_JP.UTF-8@cjknarrow by mintty (which was wrong, right?).
  3. When I use the latest commit with mintty -o Locale=ja_JP -o Charset=UTF-8 -, LANG is kept unset and LC_CTYPE is set to ja_JP.UTF-8@cjknarrow by mintty.

So, if I want to set LANG to the default value (in my case ja_JP.UTF-8, though ja_JP.UTF-8@cjknarrow is also okay), what should I do? Add LANG="${LC_ALL:-${LC_CTYPE:-$LANG}}" to my shell profile?

It would be nice if LANG were automatically set to the default value without editing my shell profile, though.

@mintty
Owner

mintty commented Oct 30, 2020

Why do you want mintty to set LANG? One alternative I can imagine is to use the setlocale API call only and not set any locale variable (unless necessary to fix previous wrong settings by overwriting them).
(However, my quick first attempt to achieve that yields a wrong locale setting effectively...)

@k-takata
Contributor Author

I actually don't care whether mintty sets LANG or not. I just want LANG to be set to the default value when bash is opened. (But the startup script doesn't seem to allow it.)

@mintty
Owner

mintty commented Oct 30, 2020

The cygwin shell startup scripts look a bit weird to me, but I'm pretty confident that what mintty does now (with the fix you suggested above) is correct for locale setup as far as character encoding is concerned.

@mintty
Owner

mintty commented Oct 30, 2020

> One alternative I can imagine is to use the setlocale API call only and not set any locale variable.

That does not work, as /etc/profile.d/lang.sh will not respect such a setting but will overwrite it from the Windows system locale via locale -uU. So we'd have to get that changed in cygwin first...

@mintty
Owner

mintty commented Oct 30, 2020

Trying to get the problem solved: what do you need LANG for?
And couldn't you set LANG="${LC_ALL:-${LC_CTYPE:-$LANG}}" in ~/.profile?

@k-takata
Contributor Author

k-takata commented Oct 31, 2020

Yes, of course, adding that to ~/.bash_profile can also solve it.
But adding OldLocale=yes to ~/.minttyrc is easier than adding Charwidth=ambig-narrow to it plus LANG="${LC_ALL:-${LC_CTYPE:-$LANG}}" to ~/.bash_profile to bring back the old behavior.

Edit:
I had to add export LANG="${LC_ALL:-${LC_CTYPE:-$LANG}}" to ~/.bash_profile.

@mintty
Owner

mintty commented Oct 31, 2020

I'd still like to know what the benefit of the old behaviour is.
Apart from that, I'm thinking about an alternative solution:
Revert to the previous behaviour, or approximate it, only if parameter Locale is set; keep the current behaviour otherwise.
"Approximate" would mean: if LC_ALL is set, mintty overwrites it and does not touch the others, if it is not set, it sets LANG and clears the others.

@mintty
Owner

mintty commented Oct 31, 2020

Or rather like this: if parameter Locale is used, in addition to the current settings, always set LANG to the proper locale value, but do not clear any other LC_ variables in case they were preset on purpose.
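A hypothetical sketch of this proposal in C, assuming it is layered on top of the LC_CTYPE condition suggested earlier in the thread; apply_locale_env and locale_option_used are illustrative names, and mintty's actual implementation may differ:

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* loc: locale derived from the Locale/Charset/Charwidth settings.
 * locale_option_used: whether the Locale parameter was given. */
static void apply_locale_env(const char *loc, bool locale_option_used)
{
  const char *lang = getenv("LANG");

  /* Current behaviour: enforce the LC_CTYPE category. */
  if (!lang || strcmp(lang, loc) != 0)
    setenv("LC_CTYPE", loc, 1);

  /* Proposed addition: with an explicit Locale, also set LANG. */
  if (locale_option_used)
    setenv("LANG", loc, 1);

  /* LC_MESSAGES and other LC_ variables are deliberately left untouched. */
}
```

Called with a preset LC_MESSAGES=fr_FR.UTF-8, the sketch sets LANG and LC_CTYPE to the requested locale but keeps French messages.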

@k-takata
Contributor Author

Thank you. I think this behavior is good.

@k-takata
Contributor Author

k-takata commented Nov 4, 2020

Adding export might be better?

--- a/docs/mintty.1
+++ b/docs/mintty.1
@@ -1485,7 +1485,7 @@ If you prefer basic locale setup for all categories to be affected by
 the LC_CTYPE locale, it is suggested to add the following to the 
 shell startup scripts:
 .br
-	LANG="${LC_ALL:-${LC_CTYPE:-$LANG}}"
+	export LANG="${LC_ALL:-${LC_CTYPE:-$LANG}}"
 
 .TP
 \fBCharacter set\fP (Charset=)

Or perhaps this line can be removed, because the behavior was changed by b54f202.

mintty added a commit that referenced this issue Nov 4, 2020
@mintty
Owner

mintty commented Nov 4, 2020

Amended the comment and moved it above the now final legacy behaviour paragraph.

@k-takata
Contributor Author

k-takata commented Nov 4, 2020

Thank you. LGTM!
So, ready for a new release?

@mintty
Owner

mintty commented Nov 4, 2020

Released 3.4.2.

mintty closed this as completed on Nov 4, 2020
@mintty
Owner

mintty commented Nov 4, 2020

Recent changes have an unpleasant side effect: default invocation (without the Locale option) now sets the locale to C.UTF-8 rather than the system locale, e.g. de_DE.UTF-8. I expect complaints soon, so this needs to be fixed.

@mintty
Owner

mintty commented Nov 5, 2020

Uploaded another fix. Planning to release today, please check.

@k-takata
Contributor Author

k-takata commented Nov 5, 2020

It looks almost okay.
One question: When I don't set Locale and set Charwidth to ambig-narrow, $LANG is set to ja_JP.UTF-8 (on my environment) instead of ja_JP.UTF-8@cjknarrow and $LC_CTYPE is kept unset. Is this intended?
I think that this is acceptable, but it might be better to document that Charwidth may not work when Locale is not set.

@mintty
Copy link
Owner

mintty commented Nov 5, 2020

So mintty -o Charwidth=ambig-narrow with no locale environment variable set (like from a shortcut) leads to ja_JP.UTF-8, probably as set by locale -uU in /etc/profile.d/lang.sh, derived from your Japanese Windows system setting? I cannot easily test this as my setting is different (well, I could change lang.sh for testing...). Either way, was that different in 3.4.0?

@k-takata
Contributor Author

k-takata commented Nov 6, 2020

> So mintty -o Charwidth=ambig-narrow with no locale environment variable set (like from a shortcut) leads to ja_JP.UTF-8, probably as set by locale -uU in /etc/profile.d/lang.sh deriving from your Japanese Windows system setting?

Yes, but the - option is needed to invoke bash as a login shell; $LANG is empty without the -.

> Either way, was that different in 3.4.0?

Ah, I was not aware of that behavior, but it is the same as in 3.4.0.
I think it is ready for release.

@mintty
Owner

mintty commented Nov 8, 2020

There's also quite the opposite problem: If you start mintty from scratch (i.e. with empty locale environment, e.g. from desktop) with a CJK system locale detected from Windows in /etc/profile.d/lang.*, your terminal locale will be UTF-8 narrow while your shell locale will be UTF-8 ambig-wide. So backspacing over é will move left one position too far (not in 3.4.2 with that bug... but again with its "fix").
This problem, however, is not related to recent changes; it has existed for years already. I can imagine 3 solutions:

  • an ugly workaround in mintty that mimics the locale -uU logic, foreseeing what lang.sh will do and setting up mintty to match in advance...
  • changing the cygwin CJK UTF-8 locales to be ambig-narrow (patch already submitted)
  • changing cygwin to move the lang.sh logic into the cygwin dll, so that it already affects the terminal at startup
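The mismatch boils down to width arithmetic. In the sketch below (hypothetical helpers, illustration only), é (U+00E9), which Unicode classifies as East Asian Ambiguous width, occupies one cell in a terminal set to ambig-narrow, while a shell whose locale treats ambiguous characters as wide assumes two cells and moves the cursor back two columns when erasing it, roughly one column too far:

```c
#include <stdbool.h>

/* Simplification: only U+00E9 is classified as ambiguous here and every
 * other code point is treated as narrow; a real implementation would
 * consult the full East Asian Width data. */
static int cell_width(unsigned int cp, bool ambig_wide)
{
  bool ambiguous = (cp == 0x00E9);
  return (ambiguous && ambig_wide) ? 2 : 1;
}

/* Columns the cursor lands too far left when the shell erases one
 * character using its own width assumption while the terminal drew it
 * with a different one (negative: not far enough). */
static int backspace_overshoot(unsigned int cp, bool terminal_ambig_wide,
                               bool shell_ambig_wide)
{
  return cell_width(cp, shell_ambig_wide) - cell_width(cp, terminal_ambig_wide);
}
```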

One of us should present the problem to the cygwin mailing list...

@k-takata
Contributor Author

> One of us should present the problem to the cygwin mailing list...

I haven't joined the cygwin mailing list, so I'll leave it to you.

@mintty
Owner

mintty commented Nov 10, 2020

Uploaded a workaround for the empty-locale situation, which is now handled consistently with the shell initialisation of an empty locale (solution 1 above). Testing appreciated.

@k-takata
Contributor Author

I summarized the behavior in a table:
https://docs.google.com/spreadsheets/d/1WIrJKRaX2rOoAthyBx0yZgnxS-3uaxXSbEbyBr2VX-c/edit?usp=sharing
It looks almost okay except that $LANG is not set in the cells D5 and F5.
Do you think it's okay? (I don't have any strong opinion about this.)

@mintty
Owner

mintty commented Nov 11, 2020

Thanks a lot for making this overview.
Yes, if -o Locale is not used, mintty now sets LC_CTYPE, which causes LANG not to be set in /etc/profile... anymore.
This is on purpose; it enforces a proper LC_CTYPE locale setting without affecting other locale categories.

@k-takata
Contributor Author

Thank you for the explanation. I got it.

@mintty
Owner

mintty commented Nov 11, 2020

Released 3.4.3.
