Consider making ConPTY and Windows Terminal treat all ambiguous-width characters as 1 cell instead of asking the font #2066

Closed
DHowett-MSFT opened this issue Jul 23, 2019 · 15 comments · Fixed by #2928
Labels
  • Area-Rendering: Text rendering, emoji, complex glyph & font-fallback issues
  • Issue-Task: It's a feature request, but it doesn't really need a major design.
  • Needs-Tag-Fix: Doesn't match tag requirements
  • Product-Conpty: For console issues specifically related to conpty
  • Product-Terminal: The new Windows Terminal.
  • Resolution-Fix-Committed: Fix is checked in, but it might be 3-4 weeks until a release.

Comments

@DHowett-MSFT
Contributor

> Note that the SCS escape sequence doesn't work in the Linux text console [...]

You're absolutely right here.

I also realized I was wrong about PuTTY. Up to version 0.70 (which I tested), PuTTY didn't support line drawing in UTF-8 (following Markus Kuhn's recommendation that UTF-8 be stateless). You need either a legacy charset, or version 0.71 with "Window -> Translation -> Enable VT100 line drawing even in UTF-8 mode" enabled. I have now tried the latter, and it does indeed convert the underscore to a space.

So it looks like Windows Terminal and VTE are the buggy ones here. I've just filed VTE 157.
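For readers unfamiliar with the behavior being tested here, a minimal sketch of SCS handling follows. It assumes only the two designations `ESC ( 0` (DEC Special Graphics into G0) and `ESC ( B` (back to ASCII), and it covers only a handful of the line-drawing mappings. The function name `render` and the table name are made up for illustration; this is not how any of the terminals mentioned implement it.

```python
ESC = "\x1b"

# Small subset of the DEC Special Graphics set. While this set is active,
# the underscore (0x5F) is rendered as a blank, which is the case at issue.
DEC_SPECIAL_GRAPHICS = {
    "_": " ",   # blank
    "`": "◆",   # diamond
    "j": "┘", "k": "┐", "l": "┌", "m": "└",
    "q": "─", "x": "│",
}


def render(stream: str) -> str:
    """Translate a character stream, honoring ESC ( 0 and ESC ( B only."""
    out = []
    line_drawing = False
    i = 0
    while i < len(stream):
        if stream.startswith(ESC + "(0", i):
            line_drawing = True    # G0 now holds DEC Special Graphics
            i += 3
        elif stream.startswith(ESC + "(B", i):
            line_drawing = False   # G0 is back to ASCII
            i += 3
        else:
            ch = stream[i]
            out.append(DEC_SPECIAL_GRAPHICS.get(ch, ch) if line_drawing else ch)
            i += 1
    return "".join(out)
```

With this sketch, `render("\x1b(0lqk\x1b(B")` draws the top edge of a box, and an underscore sent between the two escape sequences comes out as a space.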

Just out of curiosity: Are you aware of any application which emits this? Why would any app do so, given that the regular space is also a space? :)

> As for the choice of diamond character, I don't think the width is something that can be "fixed" in the terminal code. I believe the dimensions of an ambiguous width character are decided by the font.

I firmly disagree here. In terminal emulation, apps have to be able to print something and keep track of the cursor, whereas they by design have no idea of the font being used. In many terminals the font can also be changed at runtime, and it's absolutely not feasible to then rearrange the cells. In some other cases there is no font at all (e.g. the libvterm headless terminal emulation library, or a detached screen/tmux), or there are multiple fonts at once (a screen/tmux attached from multiple graphical emulators).

The only way to do that is via some external agreement on the number of cells, which is typically the Unicode EastAsianWidth, often accessed via wcwidth(). It's not perfect (changes through Unicode versions, has ambiguous characters, etc.) but is still the best we have.

glibc's wcwidth() reports 1 for ambiguous width characters, so the de facto standard is that in terminals they are narrow.
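As a rough illustration of that agreement, here is a sketch that derives a cell count from the Unicode East_Asian_Width property using Python's standard `unicodedata` module. It is not glibc's `wcwidth()` and skips much of what a real implementation handles (control characters, zero-width format characters, emoji sequences). The names `cell_width`, `text_width`, and the `ambiguous_wide` switch are assumptions made for this example.

```python
import unicodedata


def cell_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Approximate number of terminal cells for one code point."""
    if unicodedata.combining(ch):
        return 0                      # combining marks share the base cell
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):             # Wide and Fullwidth
        return 2
    if eaw == "A":                    # Ambiguous: narrow unless asked otherwise
        return 2 if ambiguous_wide else 1
    return 1                          # Na, H, N


def text_width(text: str, ambiguous_wide: bool = False) -> int:
    """Sum of per-code-point widths; ignores multi-code-point sequences."""
    return sum(cell_width(ch, ambiguous_wide) for ch in text)
```

The diamond (U+25C6) is classified Ambiguous, so `cell_width("◆")` is 1 here, and it only becomes 2 when `ambiguous_wide=True` is passed to mimic the "legacy" behavior.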

If the glyph is wider, then the terminal has to figure out what to do. It could crop it (newer versions of Konsole, as far as I know), let it overflow to the right (VTE), shrink it (Kitty does this, I believe), etc.
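The three options can be compared with a small sketch. The policy names `"crop"`, `"overflow"`, and `"shrink"` are just labels for the behaviors listed above, and `Placement` and `place_glyph` are hypothetical names; none of this reflects the actual rendering code of Konsole, VTE, or Kitty.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    scale: float        # horizontal scale applied to the glyph
    drawn_px: float     # width painted inside the glyph's own cell
    clipped_px: float   # pixels cut off at the right edge of the cell
    overflow_px: float  # pixels painted over the neighboring cell


def place_glyph(glyph_px: float, cell_px: float, policy: str) -> Placement:
    """Decide how a glyph of width glyph_px is drawn into a cell of width cell_px."""
    excess = max(0.0, glyph_px - cell_px)
    if excess == 0.0:
        return Placement(1.0, glyph_px, 0.0, 0.0)       # fits, nothing to do
    if policy == "crop":
        return Placement(1.0, cell_px, excess, 0.0)     # cut at the cell edge
    if policy == "overflow":
        return Placement(1.0, cell_px, 0.0, excess)     # spill to the right
    if policy == "shrink":
        return Placement(cell_px / glyph_px, cell_px, 0.0, 0.0)  # squeeze to fit
    raise ValueError(f"unknown policy: {policy!r}")
```

In every case the cell grid stays untouched: only the painting of one glyph changes, which is why the choice can be left to each terminal.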

Originally posted by @egmontkob in #2049 (comment)

@ghost added the Needs-Triage (It's a new issue that the core contributor team needs to triage at the next triage meeting) and Needs-Tag-Fix (Doesn't match tag requirements) labels Jul 23, 2019
@DHowett-MSFT
Contributor Author

From @egmontkob's note above, and from seeing how some other terminal emulators do this, it looks like this might be the correct choice. There are some affordances in certain projects for supporting "legacy" ambiguous character widths, but by and large terminals have agreed that they should be a single cell wide.

@DHowett-MSFT
Contributor Author

And for what it's worth, here's what I get when I try it:

[Screenshots: two before/after comparison pairs]

@DHowett-MSFT added the Area-Rendering (Text rendering, emoji, complex glyph & font-fallback issues), Issue-Task (It's a feature request, but it doesn't really need a major design.), Product-Conpty (For console issues specifically related to conpty), and Product-Terminal (The new Windows Terminal.) labels Jul 23, 2019
@ghost removed the Needs-Tag-Fix (Doesn't match tag requirements) label Jul 23, 2019
@DHowett-MSFT added this to the Terminal v1.0 milestone Jul 23, 2019
@zadjii-msft
Member

@DHowett-MSFT how does this play with emoji? Aren't they usually ambiguous, but actually double wide?

@DHowett-MSFT
Contributor Author

Nah, emoji are specifically double-width:
[Screenshot]
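The distinction can also be checked against the East_Asian_Width property itself. The sketch below uses Python's `unicodedata` module, whose answers depend on the Unicode version bundled with the interpreter; it assumes a version where emoji such as U+1F600 are classified Wide. The helper name `width_classes` is made up for this example.

```python
import unicodedata


def width_classes(chars: str) -> dict:
    """Map each character to its East_Asian_Width class (W, F, A, Na, H, N)."""
    return {ch: unicodedata.east_asian_width(ch) for ch in chars}


# U+1F600 GRINNING FACE, U+25C6 BLACK DIAMOND, U+0061, U+4E2D
classes = width_classes("😀◆a中")
for ch, eaw in classes.items():
    print(f"U+{ord(ch):04X} {eaw}")
```

The emoji comes back as Wide rather than Ambiguous, so treating Ambiguous as one cell does not affect it. Some older symbols that can be shown with emoji presentation are not classified Wide, so this check should not be read as covering every emoji.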

@egmontkob

@DHowett-MSFT removed the Needs-Triage (It's a new issue that the core contributor team needs to triage at the next triage meeting) label Jul 29, 2019
@minonl

minonl commented Aug 4, 2019

This is a good approach. It seems to solve part of the Unicode rendering problem, which might fix the Chinese/double-width character issues and quite a lot of the emoji issues. But I wonder if it only solves some of them, as Unicode 9 will soon become a headache.

@minonl

minonl commented Aug 4, 2019

VS Code and hyper.js use xterm.js as their terminal engine, and they are working on a similar Unicode handling solution here. They had a long history with only a wcwidth-ish solution, and now UTS #51 is a big issue, especially the missing support for Unicode 8/9 (up to the latest, 13) and for Unicode modifiers/sequences.

Also, iTerm2, a popular terminal app on macOS, made a lot of changes years back to support Unicode.

Since terminal/console/WSL is a system app, I hope a more mature, comprehensive solution is discussed, proposed, reviewed, and implemented so that it can be extended later. Current Unicode support is partial and amounts to individual bug fixes.

@reli-msft

@DHowett-MSFT
Maybe you can make an option to run WT in “old far east application mode” to keep CP 932/936/949/950 compatibility:

  • Character width (counted in cells) is identical to the number of bytes used in these code pages;
  • For CP 932, turn \ and ~ into ¥ and ;
  • For CP 949, turn \ into ;
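The byte-count rule in the first bullet can be sketched with Python's built-in code page codecs (`cp932`, `cp936`, `cp949`, `cp950`). The function name `legacy_width` is hypothetical, and the sketch says nothing about how conhost actually computes widths; it only shows what "width equals byte count" would mean.

```python
def legacy_width(ch: str, codepage: str = "cp932") -> int:
    """Cells a character would occupy if width equals its encoded byte count.

    Raises UnicodeEncodeError if the character does not exist in the code page.
    """
    return len(ch.encode(codepage))


for ch in "Aあ◆ｱ":
    print(f"U+{ord(ch):04X} -> {legacy_width(ch)} cell(s) under CP 932")
```

Under this rule the diamond (U+25C6) takes two cells in CP 932 because it is a double-byte character there, while halfwidth katakana such as U+FF71 take one.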

@DHowett-MSFT
Contributor Author

No.

@DHowett-MSFT reopened this Aug 9, 2019
@ghost added the Needs-Tag-Fix (Doesn't match tag requirements) label Aug 9, 2019
@reli-msft

@DHowett-MSFT
So keep all the weird CP things in CONHOST (V1)?

@DHowett-MSFT
Contributor Author

Codepages have proven, almost without exception, to be an unmitigable disaster. They complicate the text buffer, they complicate the handling of DBCS characters, they provide little to no value in modern UTF-8-aware applications.

The codepage stuff will stay on the far side of ConPTY and be rendered to the terminal in nice good and clean UTF-8. 😄
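As an illustration of the idea only (not of ConPTY's implementation), the sketch below converts a stream of code page bytes into UTF-8 at a boundary, so that everything downstream sees UTF-8 only. It uses Python's standard `codecs` incremental decoder so that a double-byte character split across two chunks is still decoded correctly. The function name `to_utf8` is an assumption.

```python
import codecs
from typing import Iterable, Iterator


def to_utf8(chunks: Iterable[bytes], codepage: str = "cp932") -> Iterator[bytes]:
    """Re-encode a chunked code page byte stream as UTF-8."""
    decoder = codecs.getincrementaldecoder(codepage)()
    for chunk in chunks:
        text = decoder.decode(chunk)      # buffers an incomplete lead byte
        if text:
            yield text.encode("utf-8")
    tail = decoder.decode(b"", final=True)  # raises if a character is left incomplete
    if tail:
        yield tail.encode("utf-8")


# "A" followed by a CP 932 double-byte character split across two chunks.
utf8 = b"".join(to_utf8([b"A\x82", b"\xa0"]))
```

Keeping the conversion on one side of the boundary means the text buffer and width logic on the other side never have to reason about lead and trail bytes.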

@reli-msft

@DHowett-MSFT Well, what I mean is that some far east console applications may assume that a character's width follows its code page byte count, so turning such characters into single-width ones may break these applications (though... you can still throw them into ConHost V1).

Another issue may include:

  • Characters like : Many fonts (afaik Pragmata Pro) will make it double-width since they are “complex”.

@DHowett-MSFT
Contributor Author

DHowett-MSFT commented Aug 9, 2019

I get that, but to quote the initial post that spawned this issue:

I firmly disagree here. In terminal emulation, apps have to be able to print something and keep track of the cursor, whereas they by design have no idea of the font being used.

@reli-msft

reli-msft commented Aug 9, 2019

@DHowett-MSFT
Hmmm, can we make use of OpenType tags?

  • If a text run is considered to contain only 0- or 1-cell characters, we apply hwid to it, so font makers can switch their glyphs to narrower ones.
  • For a text run considered to contain only 0- or 2-cell characters, apply fwid instead.

This is somewhat like how UAX #50 works: analyze runs first, then apply vert to upright runs and vrtr to rotated runs.

@ghost removed the Needs-Tag-Fix (Doesn't match tag requirements) label Sep 24, 2019
DHowett-MSFT pushed a commit that referenced this issue Sep 27, 2019
From Egmont Koblinger:
> In terminal emulation, apps have to be able to print something and
keep track of the cursor, whereas they by design have no idea of the
font being used. In many terminals the font can also be changed runtime
and it's absolutely not feasible to then rearrange the cells. In some
other cases there is no font at all (e.g. the libvterm headless terminal
emulation library, or a detached screen/tmux), or there are multiple
fonts at once (a screen/tmux attached from multiple graphical
emulators).

> The only way to do that is via some external agreement on the number
of cells, which is typically the Unicode EastAsianWidth, often accessed
via wcwidth(). It's not perfect (changes through Unicode versions, has
ambiguous characters, etc.) but is still the best we have.

> glibc's wcwidth() reports 1 for ambiguous width characters, so the de
facto standard is that in terminals they are narrow.

> If the glyph is wider then the terminal has to figure out what to do.
It could crop it (newer versions of Konsole, as far as I know), overflow
to the right (VTE), shrink it (Kitty I believe does this), etc.

See Also:
https://bugzilla.gnome.org/show_bug.cgi?id=767529
https://gitlab.freedesktop.org/terminal-wg/specifications/issues/9
https://www.unicode.org/reports/tr11/tr11-34.html

Salient point from proposed update to Unicode Standard Annex #11:
> Note: The East_Asian_Width property is not intended for use by modern
terminal emulators without appropriate tailoring on a case-by-case
basis.

Fixes #2066
Fixes #2375
@ghost added the In-PR (This issue has a related PR) label Sep 27, 2019
DHowett-MSFT pushed a commit that referenced this issue Oct 15, 2019
Fixes #2066
Fixes #2375 

Related to #900
@ghost added the Needs-Tag-Fix (Doesn't match tag requirements) and Resolution-Fix-Committed (Fix is checked in, but it might be 3-4 weeks until a release.) labels and removed the In-PR (This issue has a related PR) label Oct 15, 2019
@ghost

ghost commented Oct 23, 2019

🎉 This issue was addressed in #2928, which has now been successfully released as Windows Terminal Preview v0.6.2951.0. 🎉
