Dual text/emoji presentation characters change width with VS15 (U+FE0E) and VS16 (U+FE0F) #3998
Comments
This is by design. wcwidth() is utterly broken. Any terminal or terminal Fish needs to be fixed to use the actual unicode standard for character |
Using In any case, the use of If Kitty insists on allowing VS15 and VS16 to actually change the width of the previous character, then it disagrees with other terminal emulators and demands that all CLI tools that need to measure strings have two separate implementations and selects between them based on I will admit that it's already not possible to do this accurately in all cases as programs do not know if e.g. the terminal emulator supports emoji fitzpatrick modifiers (Terminal.app does, Alacritty doesn't). But that's much more of an edge case than using these dual text/emoji characters. For example, FWIW Fish has already considered switching to a Question: Are there any other terminal emulators that you know of that matches Kitty's behavior of having U+FE0E/U+FE0F actually affect layout and not just glyph rendering? |
I'm afraid that your argument, which is, X terminal emulators do it The width of text cannot be determined a character at a time. That is The only way terminal programs and terminal emulators can agree on And yes unicode does change over time, however the widths of 99% of |
Then users of Kitty will experience broken text width calculations by any program which relies on being able to calculate text, unless you can convince the program authors to special-case Kitty.
> The width of text cannot be determined a character at a time. That is simply reality. Terminal programs need to face that reality. We don't live in an ASCII only world anymore.
It pretty much can, if you define it to be so. Most combining characters do not affect text width. The ones that do can typically just be considered to have an intrinsic width of 1 (or 2 in certain cases, such as U+0DDE SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA). The dual text/emoji characters are the odd ones out, in which the VS16 causes the character to grow if and only if it's one of a set of text presentation characters, and VS15 causes the character to shrink if and only if it's one of a set of Emoji_Presentation characters.
For these dual emoji/text characters, other terminals have decided the solution is to treat VS15 and VS16 as having no effect on layout, just on glyph selection, and to have these dual text/emoji characters have a width of 1 or 2 depending on the Emoji_Presentation property. That solution is predictable, maintains layout between terminals, and is very straightforward for CLI programs to handle.
Really the place where things get weird is when you get into ZWJ emoji sequences or things like fitzpatrick modifiers. For example, Alacritty doesn't seem to support combining characters at all for some reason, so it renders [U+1F44D U+1F3FB] (👍🏻) as 👍🏻 instead. I'm not sure what a portable solution for this is; it does not seem right for terminals to explicitly avoid supporting things like fitzpatrick modifiers, or ZWJ sequences (e.g. my terminal should not choose to not support 🏳️⚧️). However, these are cases where characters that otherwise have widths are joined together in a sequence that causes the width to be discounted, versus VS15 and VS16 which have no intrinsic width at all and are considered combining characters and yet Kitty gives them conditionally width or even negative width.
Speaking of 🏳️⚧️, Kitty actually has the worst behavior here in that it renders the glyph in 2 columns (as is expected) and yet it considers it to take 4 columns. Which means Kitty's own width calculations don't understand the ZWJ sequences that its text renderer handles.
This is not a case where all terminals agree though; Terminal.app and iTerm2 both give it 1 column (and render it correctly), whereas Alacritty gives it 2 columns (but doesn't render it correctly, it renders 🏳 and ⚧ separately, though both only get one column). Kitty's behavior is most likely explained as it thinks it's rendering as 🏳⚧, which would take 4 columns, but it actually renders as 🏳️⚧️.
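To make the 4-versus-2 column arithmetic above concrete, here is a minimal Python sketch. It is not Kitty's code; the helper names are hypothetical, and it assumes each piece of the sequence is a single base character optionally followed by VS16, and that a joined ZWJ sequence is drawn as one 2-column glyph.

```python
ZWJ = "\u200d"
VS16 = "\ufe0f"

# Transgender flag: WHITE FLAG + VS16 + ZWJ + TRANSGENDER SYMBOL + VS16
FLAG = "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F"

def codepoints(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

def width_ignoring_zwj(text):
    """Measure each ZWJ-separated piece on its own.

    Assumption for this sketch: a piece ending in VS16 is an emoji
    (2 columns); any other piece is 1 column.
    """
    pieces = [p for p in text.split(ZWJ) if p]
    return sum(2 if p.endswith(VS16) else 1 for p in pieces)

def width_joining_zwj(text):
    """Treat a ZWJ sequence as a single emoji glyph of 2 columns."""
    return 2 if ZWJ in text else width_ignoring_zwj(text)

print(codepoints(FLAG))
print(width_ignoring_zwj(FLAG), width_joining_zwj(FLAG))
```

Under these assumptions the first measure gives 4 (two emoji-presentation pieces of 2 columns each) and the second gives 2, which is the mismatch between layout and rendering described above.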
Unicode does not define the width of characters for terminal emulators. It can be used to classify characters into "narrow" or "wide" but that's it, and even that narrow/wide split is really just for CJK stuff. The only thing we have that's even close to standard right now is "look what the other terminal emulators do, and copy that when there seems to be consensus". In this case, the consensus is that VS15 and VS16 have no effect on layout. |
On Fri, Sep 10, 2021 at 10:46:00PM -0700, Lily Ballard wrote:
> I'm afraid that your argument, which is, X terminal emulators do it wrong, therefore everyone must continue to do it wrong forever, holds absolutely no water with me.
Then users of Kitty will experience broken text width calculations by any program which relies on being able to calculate text, unless you can convince the program authors to special-case Kitty.
Let me worry about that.
The width of text _cannot_ be determined a character at a time. That is simply reality. Terminal programs need to face that reality. We don't live in an ASCII only world anymore.
It pretty much can, if you define it to be so. Most combining characters do not affect text width. The ones that do can typically just be considered to have an intrinsic width of 1 (or 2 in certain cases, such as U+0DDE SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA). The dual text/emoji characters are the odd ones out, in which the VS16 causes the character to grow if and only if it's one of a set of text presentation characters, and VS15 causes the character to shrink if and only if it's one of a set of Emoji_Presentation characters.
No, it pretty much cannot, as the rest of your own post amply
demonstrates.
For these dual emoji/text characters, other terminals have decided the solution is to treat VS15 and VS16 as having no effect on layout, just on glyph selection, and to have these dual text/emoji characters have a width of 1 or 2 depending on the Emoji_Presentation property. That solution is predictable, maintains layout between terminals, and is very straightforward for CLI programs to handle.
And also wrong, and leading to text layout that differs between
terminals and every other program on the planet just because a handful
of terminal authors are too lazy to do it right.
Really the place where things get weird is when you get into ZWJ emoji sequences or things like fitzpatrick modifiers. For example, Alacritty doesn't seem to support combining characters at all for some reason, so it renders [U+1F44D U+1F3FB] (👍🏻) as 👍🏻 instead. I'm not sure what a portable solution for this is; it does not seem right for terminals to explicitly avoid supporting things like fitzpatrick modifiers, or ZWJ sequences (e.g. my terminal should not choose to not support 🏳️⚧️). However, these are cases where characters that otherwise have widths are joined together in a sequence that causes the width to be discounted, versus VS15 and VS16 which have no intrinsic width at all and are considered combining characters and yet Kitty gives them conditionally width or even negative width.
kitty does not give them width, kitty gives strings width. And VS15/16
change the width of strings. Deal with it.
Speaking of 🏳️⚧️, Kitty actually has the worst behavior here in that it renders the glyph in 2 columns (as is expected) and yet it considers it to take 4 columns. Which means Kitty's own width calculations don't understand the ZWJ sequences that its text renderer handles.
You are most welcome to open a bug report for it. Though it's likely to
be a duplicate of #3810
This is not a case where all terminals agree though; Terminal.app and iTerm2 both give it 1 column (and render it correctly), whereas Alacritty gives it 2 columns (but doesn't render it correctly, it renders 🏳 and ⚧ separately, though both only get one column). Kitty's behavior is most likely explained as it thinks it's rendering as 🏳⚧, which would take 4 columns, but it actually renders as 🏳️⚧️.
> The _only_ way terminal programs and terminal emulators can agree on widths is if they both follow some standard. The only standard available is unicode.
Unicode does not define the width of characters for terminal emulators. It can be used to classify characters into "narrow" or "wide" but that's it, and even that narrow/wide split is really just for CJK stuff. The only thing we have that's even close to standard right now is "look what the other terminal emulators do, and copy that when there seems to be consensus". In this case, the consensus is that VS15 and VS16 have no effect on layout.
Unicode defines how text should be clustered into graphemes and it
defines the nature of those graphemes. East asian and emoji presentation
graphemes are rendered with width two in terminal emulators. VS15/16
change the emoji presentation nature of graphemes as per the unicode
standard, ergo they change the width of those strings.
And no, a bunch of terminal emulators that don't bother addressing a
question and do the easiest thing are not a consensus, they are
simply a bunch of people that haven't thought about the problem.
You want to claim there is some consensus about this issue, point to
some actual discussion of it that leads to an actual intention to do
something, by an actual group of terminal developers/spec body.
Most of these terminals were written before VS15/16 existed and simply
have never been updated. And my goal with kitty is to move this shitshow
of an ecosystem forward. And forward in this instance means getting
terminal text rendering into the 21st century.
|
If you're deliberately attempting to influence the handling of emoji in terminals, then you really need to document the behavior Kitty has that's intentional, the behavior that is a bug (such as the incorrect layout of ZWJ sequences), the goal here (i.e. "Kitty behaves differently than other emulators for these reasons"), and the fact that you think other emulators should match Kitty. And then this document can be taken and presented to other terminal emulator developers (whether by you or by users who agree with you). In the absence of this document, it's impossible to tell where you've made a decision to behave differently versus what's a bug, it's rather difficult for other emulators to match Kitty. All I can find right now is a FAQ entry that says
This isn't quite accurate. Unicode does not concern itself with layout in most cases, and does not concern itself with terminal cells either. AFAICT the only place it cares about actual horizontal text layout (as opposed to grapheme/word/line breaking) is UAX #11 East Asian Width, which defines the concept of "narrow" and "wide" characters for East Asian text. Digging through this, I finally found the spot in UAX #11 §5 Recommendations that says
This is finally an explicit recommendation that says the sequence In any case, a document as described above that explains the way in which Kitty handles layout for emoji and how it intentionally differs from existing terminals would be very much appreciated. This would not just be something that could be given to other terminal emulator developers, but also used by CLI tools (such as Fish) that need to measure string width. |
Oh hey, UAX #11 §2 Scope also says
So this is explicitly addressing the terminal emulator case as saying that the behavior I've described of other emulators is not necessarily wrong. |
On Wed, Sep 15, 2021 at 03:57:12PM -0700, Lily Ballard wrote:
> And my goal with kitty is to move this shitshow of an ecosystem forward. And forward in this instance means getting terminal text rendering into the 21st century.
If you're deliberately attempting to influence the handling of emoji in terminals, then you really need to document the behavior Kitty has that's intentional, the behavior that is a bug (such as the incorrect layout of ZWJ sequences), the goal here (i.e. "Kitty behaves differently than other emulators for these reasons"), and the fact that you think other emulators should match Kitty. And then this document can be taken and presented to other terminal emulator developers (whether by you or by users who agree with you). In the absence of this document, it's impossible to tell where you've made a decision to behave differently versus what's a bug, it's rather difficult for other emulators to match Kitty.
It's on my TODO list, contributions are most welcome.
|
Hey guys, I do not want to artificially revive a dead kitten, but there are a few things to say. So... Let's go: Kitty TE is NOT alone. Contour does support correct handling of grapheme clusters including ZWJ and VS15/VS16 too, for the same intuitive (my opinion) reasons as the Kitty author. This may not be to your liking and I apologize for that. But the trend (thank god for that) recently seems to lean toward proper support for correct handling, even though it means that the road is bumpy. It used to be much more bumpy in the past already. All the kudos to Kitty for trying to break out of what (I think as well) is pure laziness but, most importantly, fear of Unicode. And I cannot blame them. I struggled myself; especially if you come into this subject with zero prior knowledge, it feels (don't take my word for it) like you need 3 PhDs in order to fully understand Unicode. In the end, I'm glad my terminal isn't the only one trying to move forward here. Even if Kitty isn't perfect in that regard yet (e.g. ZWJ handling and cursor placement for at least that gender flag), that's all fixable and I can welcome everybody to contribute to any project of their liking that would benefit from improving Unicode support.
Precisely. But what is
Let me check what handles VS16 (on
Which failed
I strongly disagree. VS16 forces emoji presentation on what would probably be emoji text presentation by default otherwise. And emoji presentation is recommended (not mandatory, I know) to be rendered in square, which conveniently maps to 2 grid cells in the TE.
I agree. And I'm sorry to state that doing it right actually does imply much more work. I actually strongly agree with @kovidgoyal here that the old age of ASCII is over and we should start thinking forward. The "we've always done it like that" argument doesn't sit well with me either :-)
Actively developed / maintained programs (TEs as well as client apps) that are affected by complex textual input seem to have proven themselves already in trying to be as up-to-date as possible with the Unicode standards. I'm for example getting Unicode update PRs ahead of the actual release of any Unicode standard (happened to 15 as well as 14 for me). I don't think that width is anything we should be severely concerned about, even though it's not guaranteed to be 100% stable for a decade or more. :)
This sounds narrow visioned. I'd suggest to definitely keep thinking forward here (especially with the list of terminals I gave you earlier :) ). What you could do (even much more accurate and future proof) is to detect at startup how the connected TE is treating cursor placement. Just write one of those emoji in question, call CPR, and then deal with the heuristics.
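The startup probe suggested above could look roughly like the following Python sketch. It writes a sample string, asks for a Cursor Position Report (CPR, requested with ESC [ 6 n and answered as ESC [ row ; col R), and takes the column difference as the width. The function names are made up for illustration, `probe_width` is POSIX-only and needs a real terminal on the given file descriptor, and the sketch assumes the sample does not wrap at the right margin and that no keystrokes arrive while the reply is being read.

```python
import os
import re
import sys

# A CPR reply has the form ESC [ <row> ; <col> R
CPR_RE = re.compile(rb"\x1b\[(\d+);(\d+)R")

def parse_cpr(response):
    """Extract (row, col) from a raw CPR reply."""
    match = CPR_RE.search(response)
    if match is None:
        raise ValueError("no cursor position report found")
    return int(match.group(1)), int(match.group(2))

def read_cpr(fd):
    """Request a CPR on fd and read bytes until the terminating 'R'."""
    os.write(fd, b"\x1b[6n")
    buf = b""
    while not buf.endswith(b"R"):
        chunk = os.read(fd, 1)
        if not chunk:
            raise EOFError("terminal closed before replying")
        buf += chunk
    return parse_cpr(buf)

def probe_width(sample, fd=None):
    """Return how many columns the terminal advanced for `sample`."""
    import termios  # POSIX only, imported lazily
    import tty

    fd = sys.stdin.fileno() if fd is None else fd
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)              # so the reply is not echoed or line-buffered
        os.write(fd, b"\r")         # start from column 1
        _, start = read_cpr(fd)
        os.write(fd, sample.encode("utf-8"))
        _, end = read_cpr(fd)
        os.write(fd, b"\r\x1b[K")   # erase the probe text again
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return end - start
```

A tool could call something like `probe_width("\u26a0\ufe0f")` once at startup and pick its width model based on whether the answer is 1 or 2.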
Alacritty doesn't support anything at all about grapheme clusters simply due to the always same reason: potential performance degradation. I just checked the source code, it seems to be aware of multiple codepoints per grid cell but only appends to the previous cell iff the codepoint to be written has a width of 0. This may change in the future, who knows. ;)
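As a toy model of the rule just described (a code point is appended to the previous cell if and only if its width is 0), here is a short Python sketch. It is not Alacritty's code; `toy_width` is a rough stand-in for a real wcwidth-style table and the function names are hypothetical.

```python
import unicodedata

def toy_width(ch):
    """Rough per-code-point width: 0 for marks/format characters,
    2 for East Asian Wide/Fullwidth, otherwise 1. A real terminal
    uses fuller tables; this is only an approximation."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def place_in_cells(text, width_of=toy_width):
    """Lay text out in grid cells, one list entry per column.

    Zero-width code points are attached to the most recent base cell;
    everything else starts a new cell. Wide characters are followed by
    empty spacer entries.
    """
    cells = []
    last_base = None  # index of the cell that zero-width marks attach to
    for ch in text:
        width = width_of(ch)
        if width == 0:
            if last_base is not None:
                cells[last_base] += ch
            continue
        last_base = len(cells)
        cells.append(ch)
        cells.extend([""] * (width - 1))
    return cells
```

With this rule a combining accent or VS16 joins the preceding cell, but a Fitzpatrick modifier such as U+1F3FB has a nonzero width of its own and therefore lands in a separate cell, and the two halves of a ZWJ sequence stay in separate cells as well.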
Yeah that sounds like a bug.
printf "[\U0001F3F3\uFE0F\u200D\u26A7\uFE0F\U0001F3F3\uFE0F\u200D\u26A7\uFE0F]\n"
TEs I tested that do this correctly:
iTerm and Terminal.app render correctly but position the cursor as if the flag were of width 1. Finally, I can only welcome you (not sure how much you are affiliated with the Fish shell) and the Fish shell developers to properly implement grapheme cluster segmentation, probably do some heuristic at process startup to detect how width is treated, and be done with it. Users using Alacritty know how broken it is with respect to complex grapheme clusters; they will not be using them until they switch terminals. That's not really an issue (same for other terminals). I can generally just recommend you to think forward and not be stuck in the "good old times". For reference, I was once trying to formalize how terminals and Unicode could live together more peacefully in the future (again: I'm not talking about xterm-aged terminals here) because I still fully believe that at least the most common problems we're facing today can actually be solved pretty safely. The biggest problem however is the developers holding back, as they all seem to be afraid due to the implied complexity. (My opinion here!). and p.s.: If any terminal claims to conform to the modern Unicode world but doesn't do so in some corner case, I'd rather file a bug or even a PR to get that fixed than live with the broken world. (my opinion). Have a nice day, |
Describe the bug
Kitty allows U+FE0E (Variation Selector-15) and U+FE0F (Variation Selector-16) to affect the width of the preceding character if it's a dual text/emoji presentation character. This seems technically correct, but it's hard for other tools to predict, as the typical model of terminal emulator layout among the terminals I've tested has been based on calculating the width of each character in isolation (e.g. using wcwidth()).
I am unsure how the classic Linux terminal emulators handle this, as I'm on macOS, but I've tested macOS Terminal.app, iTerm2, Visual Studio Code's integrated terminal, and Alacritty. All four of these terminals have the following behavior:
- Characters with the Emoji_Presentation property are classified as emoji and have width 2
- Other dual text/emoji characters (those without the Emoji_Presentation property) have width 1
Some of these terminals ignore VS15 and VS16 entirely, forcing text presentation on any dual text/emoji characters, but still count their width as 2. Some of these terminals allow VS15 and VS16 to control whether the character renders as text or emoji, but the character still retains its original width. This does mean that e.g. printing the sequence [U+26A0 U+FE0F x] will render the emoji warning sign with the "x" overlapping its right half.
Kitty behaves differently. It allows VS15 and VS16 to affect the previous character, including that character's width. U+26A0 (⚠︎) takes one column by default, but [U+26A0 U+FE0F] (⚠️ ) takes two columns. Similarly, U+26A1 (⚡️) takes two columns by default, but [U+26A1 U+FE0E] (⚡︎) takes one column.
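The two layout models can be compared side by side with a small Python sketch. The tables below are a hypothetical, hand-picked two-entry subset of the Unicode emoji data (a real tool would load the full data files), and the function names are invented for illustration; neither function is taken from Kitty or any other terminal.

```python
VS15, VS16 = 0xFE0E, 0xFE0F

# Hypothetical tiny subset of the Unicode emoji data, for illustration only.
EMOJI_PRESENTATION = {0x26A1}   # HIGH VOLTAGE SIGN: emoji by default
TEXT_DEFAULT_DUAL = {0x26A0}    # WARNING SIGN: text by default, emoji variant exists

def base_width(cp):
    """Width of one code point considered in isolation."""
    if cp in (VS15, VS16):
        return 0
    return 2 if cp in EMOJI_PRESENTATION else 1

def width_per_codepoint(text):
    """Model used by the other terminals: selectors never affect layout."""
    return sum(base_width(ord(ch)) for ch in text)

def width_vs_aware(text):
    """Kitty-like model: VS16 widens a text-default dual character,
    VS15 narrows an emoji-default one."""
    total = 0
    prev = None  # previous base code point a selector could act on
    for ch in text:
        cp = ord(ch)
        if cp == VS16 and prev in TEXT_DEFAULT_DUAL:
            total += 1   # 1 column grows to 2
        elif cp == VS15 and prev in EMOJI_PRESENTATION:
            total -= 1   # 2 columns shrink to 1
        if cp in (VS15, VS16):
            prev = None  # a selector applies at most once
            continue
        total += base_width(cp)
        prev = cp
    return total

for sample in ("\u26a0", "\u26a0\ufe0f", "\u26a1", "\u26a1\ufe0e"):
    print([f"U+{ord(c):04X}" for c in sample],
          width_per_codepoint(sample), width_vs_aware(sample))
```

The second function needs to remember the previous character, which is the state that a purely per-character wcwidth()-style lookup cannot carry.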
The problem with this is that this behavior is hard to predict by CLI tools. I'm not sure what tools like tmux do for width here, but what I have been testing is Fish's behavior, as Fish needs to know string widths for prompt reasons. Fish does not currently have the correct behavior (see fish-shell/fish-shell#8276), but fixing Fish requires being able to predict how the terminal emulator handles character widths.
From a user standpoint, Kitty's behavior is nice in that text characters have width 1 and emoji have width 2, regardless of the specific unicode details. However, behaving like Kitty is complicated and requires processing text with a state machine (or wcswidth(), if it can actually be trusted). It also potentially adds more of a dependency on Unicode version support (both in the CLI tool, and in guessing what the Terminal knows). We already do have the issue where newly-assigned emoji aren't known to be emoji without that Unicode support (though hopefully Unicode support is handled via a shared library such that the CLI tool and terminal emulator agree), but it introduces the possibility that emoji presentation variants added to existing text characters would cause width differences when paired with U+FE0F, which may be more of an issue.
As user-friendly as Kitty's behavior is, having predictable width calculations is really important for a variety of tools. Tools that use alternate screen mode can potentially set the position for every character independently (e.g. Vim seems to do this), but they also may not (e.g. tmux seems to defer to the terminal for character positioning), and tools that don't use alternate screen mode (e.g. Fish) do not generally have precise control over layout. The standard model here is calculating width on a per-character basis (e.g. with wcwidth()) and the terminal's layout model should match that.
Additional context
I've tested Terminal.app, iTerm2, Alacritty, VSCode's integrated terminal, and Kitty (all on macOS). The first four all calculate widths the same; Kitty is the only odd one out. I am unsure of what other terminals I can test on macOS, and I don't have easy access to Linux. I did manage to test LXTerminal, the default terminal on Raspberry Pi, and it also matches the behavior of the non-Kitty terminals (at least up through Unicode 11; it doesn't seem to recognize any Unicode 12 characters as emoji, and I'm unsure how to test whether this is true for the whole OS or is a property of the terminal emulator).