Unicode modifiers break width calculations #8276
Comments
There is no way for us to detect this, sorry. The terminal is wrong. Pick a different character.
This is correct, emoji are wide. "emoji_width" is a bit of a misnomer - it specifies if we should use the unicode 8 or 9 widths. |
@faho This is a very offputting response. I detailed exactly what’s going on here, and you’re just saying “sorry, the terminal emulator is wrong”.
Even ignoring the terminal emulator’s handling of this codepoint as 1-wide instead of 2-wide, Fish is still wrong in that it treats the presence of variation selector-16 as meaning an emoji that is 2 columns wide, even when fish is otherwise configured to assume emoji are 1 column wide. So at the very least, Fish should be doing something like “variation selector-16 counts as
But beyond that, it should not be unreasonable to say that Fish should have a heuristic for determining whether variation selector-16 expands codepoints to columns for a given terminal. Fish has Terminal-based heuristics for “how wide do we think emoji are”, so this is not breaking new ground here.
I do think Terminal.app’s behavior here is buggy, but it’s also consistent with “the terminal emulator doesn’t know the specific details about unicode rendering, it just knows that emoji codepoints are 2 columns and others are 1” and relies on the OS to do the actual details of rendering the text. This simplistic model of emoji rendering is something I would not be at all surprised to find other terminal emulators reproducing. Really, any emulator that relies on the OS to do the actual details of rendering the glyphs instead of implementing it directly is one that I would expect to implement a simplified model like this. |
The fact is, a very popular cross-shell prompt solution ( So the net effect of saying “this is the Terminal’s bug, we won’t do anything about it” is to make fish feel broken for users of |
See my edit: "emoji_width" is a bit of a misnomer - it specifies if we should use the unicode 8 or 9 widths. In other words it only affects those emoji that were specified as narrow in unicode < 9 and wide after. It's a compatibility hack for old terminals, not a configuration knob.
and a buggy terminal. We can't, in general, work around all terminal bugs. Keeping a database of all codepoints that terminals misrender (and correctly detecting all those terminals) is infeasible. Sorry. The only feasible solution I see is for starship to pick a different character. |
Just to be clear:
It is. Because this is a much larger thing. Now we would need a per-terminal database of mistreated codepoints (and a good detection of those terminals and their versions, and the version it's fixed in, which is often impossible!). That's not in the same ballpark as “how wide do we think all emoji, as a category, are”. |
That's still a fair number of emoji.
This is a wild mischaracterization of what I'm asking for. I have not at all suggested keeping a database of characters.
The fact is, right now Fish has code that explicitly says "Treat variation selector-16 as a width of 1 so that way we end up with an emoji width of 2". It does this no matter what the preceding character is. For Terminal.app, this will be wrong 100% of the time. For Terminal.app, the right answer is always "treat variation selector-16 as a width of 0". Heck, the "treat it as 1" screams "compatibility hack" because variation selector-16 is typically a zero-width character. It modifies the previous character, and for 99.9% of preceding characters, the modification has no effect.
Fish's behavior is also wrong across the board for all terminals when used on characters that typically default to emoji presentation but have a text form, such as U+26A1 (⚡). These characters typically show up as emoji, but can show up as text in some contexts (for example, the GitHub comment compose textarea). Adding Variation Selector-16 will force it to emoji presentation, but should not affect the width. Fish correctly identifies U+26A1 as having width 2, and Terminal.app even assigns it width 2 despite defaulting to text presentation. Adding Variation Selector-16 does not affect the width used by the terminal, just the glyph, and yet Fish thinks
Oh, and here's a fun fact: while writing this up, I downloaded iTerm2.app to test, and it has the exact same behavior as Terminal.app with regard to Variation Selector-16 widths (it differs in defaulting to emoji presentation for U+26A1 but otherwise has no actual differences in widths for the characters I'm testing). I don't know about other terminal emulators, but both major terminal emulators on macOS agree: Variation Selector-16 always has width 0. Fish is strictly in the wrong here. |
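To make the disagreement concrete, here is a minimal sketch of a per-codepoint width model, in the spirit of summing wcwidth() results. It is not fish's actual code: `char_width`, `string_width`, and the `vs16_width` parameter are hypothetical names, and Python's `unicodedata` tables stand in for whatever width tables a shell really uses.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16

def char_width(ch, vs16_width=0):
    """Width of a single codepoint under a simplified per-character model.

    vs16_width is a hypothetical knob: 1 mimics the "VS16 counts as one
    column" behavior described in this thread, 0 mimics the terminals that
    treat VS16 as zero width.
    """
    if ch == VS16:
        return vs16_width
    # Combining marks, enclosing marks, and format characters take no column.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide / Fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def string_width(text, vs16_width=0):
    """Sum of per-codepoint widths, with no context between codepoints."""
    return sum(char_width(ch, vs16_width) for ch in text)

check = "\u2714\ufe0f"    # U+2714 (text presentation by default) + VS16
voltage = "\u26a1\ufe0f"  # U+26A1 (emoji presentation by default) + VS16

print(string_width(check, vs16_width=1), string_width(check, vs16_width=0))
print(string_width(voltage, vs16_width=1), string_width(voltage, vs16_width=0))
```

Under this model, counting VS16 as one column yields 2 and 3 for the two strings, while counting it as zero yields 1 and 2, which is what the terminals tested above are reported to do.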
If you are asking for us to work around specific terminals misrendering specific characters, that's tantamount to asking for a per-terminal quirks database. If we can find a way to make this independent of a terminal, sure, it's not. If we can find a way to treat entire classes of characters differently, that's also much simpler. But we'd have to figure out which classes those are.
To be clear: It is. Yes. We should keep the context.
Fixing that would require, again, switching to wcswidth - that's #8275 (this is the issue with filing multiple connected bug reports at the same time - I prefer keeping them in one place and then deciding where it should be split up).
Where do you get that "should" from? My experience (and e.g. #5583) says otherwise, but I'm happy to be corrected on that. If we can assign a width of 0 on that it would fix the issue for now. (but if we actually need the context for what codepoint the VS applies to, that's #8275 again)
It has a width of 3 here in Windows Terminal. Without the VS it has a width of 2. Which means it's terminal-specific again.
Or both terminals are wrong in the same way. Which would also happen if the bug is in the underlying text rendering - which is even less fixable because we don't even have a version of that (also why we can't handle font differences - we have no information about the font). |
No it won’t, as it seems that fish’s behavior is wrong in all contexts, not just this one. In fact, in the context of “following anything other than a codepoint with both text and emoji presentation” it’s obviously wrong, as variation selector-16 does nothing in other contexts (and naturally has zero width).
Manual testing in Terminal.app and a bit in iTerm2. I took a look at #5583 and it’s a little difficult to figure out what it’s trying to say. The original asciinema demonstrates some input issues, but I don’t know if that’s the same emoji width calculation issue or not as I’ve been focused on non-interactive testing (e.g. fish’s idea of how wide a string printed to the terminal is) to avoid any potential confounding issues in interactive input handling. Skimming the conversation I see discussion of emoji ZWJ sequences, which are a separate issue and probably not solvable without cooperation from the terminal emulator. And there was a mention of 🛠 and 🐛 having different widths, which very well could be this bug, but I’m on my phone right now and can’t look up the details on these characters.
My current belief, based on Terminal.app and iTerm2, is that this is the correct solution.
Windows Terminal thinks VS16 takes up a cell all by itself? Sounds like a terminal bug. It strictly modifies the previous character, it does not have an intrinsic width. In fact, now I’m curious if it’s literally classified as a combining character. Again, I’d look it up but I’m on my phone. |
Like I said: It's not. Possibly on macOS, but I've seen multiple terminals, in multiple contexts, handle it differently. And we've introduced this behavior because it fixed it in some cases, so it can't be "wrong in all contexts". It did fix problems. Okay, so this looks more and more like we'd introduce one quirk "variation selector adds nothing". That can be done and handled, unlike "terminal won't combine it with these specific codepoints". |
Honestly, it sounds like this should actually be "variation selector has non-zero width", as that sounds like the buggy behavior. Incidentally, the VSCode integrated terminal also counts it as zero width. |
Same with Alacritty. U+FE0F is zero width. Kitty has different behavior. It treats U+FE0F as zero width in most contexts, but it modifies text presentation characters to render as emoji with width 2 (but when applied to Emoji_Presentation characters it has zero width as those already have width 2 to begin with). Which is to say, Kitty's behavior cannot be handled with wcwidth(), it requires wcswidth(), and also I'm inclined to file an issue against them about how this behavior diverges from other tested emulators and is harder to predict by CLI tools. |
Honestly, unless we can point to some standard, I don't think we can claim either way. Because characters in "emoji presentation" being of width 2 even if the text version has width 1 makes sense. You seem to believe that Apple is correct by default, and I really really cannot agree with that. So: Whatever sounds nicer as a variable name in the code. If we can avoid e.g. a double-negative by turning it around? Let's do that. |
The characters with Emoji_Presentation have width 2 even when rendered as text, in all terminals I've tested except Kitty. In fact, for Kitty, adding VS15 to an Emoji_Presentation character changes it to width 1, which is not something Fish can handle (not without wcswidth()). No other terminal I've tested has this behavior.
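The reason a per-character wcwidth() cannot express the Kitty behavior described here is that the selector's effect depends on the preceding character. The following is a rough sketch of a string-level (wcswidth()-style) calculation that models that description. It is not Kitty's implementation: `contextual_width` and `base_width` are hypothetical names, and `DUAL_PRESENTATION` is a tiny illustrative subset of the characters that have both a text and an emoji form.

```python
import unicodedata

VS15 = "\ufe0e"  # VARIATION SELECTOR-15, requests text presentation
VS16 = "\ufe0f"  # VARIATION SELECTOR-16, requests emoji presentation

# Illustrative subset only; the real set comes from Unicode's emoji data files.
DUAL_PRESENTATION = {"\u26a0", "\u26a1", "\u2714"}

def base_width(ch):
    """Context-free width of one codepoint (simplified)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def contextual_width(text):
    """Width of a whole string, letting VS15/VS16 resize the previous base."""
    total = 0
    prev = None   # most recent base character with non-zero width
    prev_w = 0    # the width currently assigned to that character
    for ch in text:
        if ch in (VS15, VS16):
            # The selector itself never occupies a cell; it may only
            # change the width of a dual-presentation base before it.
            if prev in DUAL_PRESENTATION:
                new_w = 2 if ch == VS16 else 1
                total += new_w - prev_w
                prev_w = new_w
            continue
        w = base_width(ch)
        total += w
        if w > 0:
            prev, prev_w = ch, w
    return total

print(contextual_width("\u26a0\ufe0f"))  # text-default base widened by VS16
print(contextual_width("\u26a1\ufe0e"))  # emoji-default base narrowed by VS15
```

Because the result for a selector depends on state carried from the previous codepoint, no fixed per-codepoint value for U+FE0F or U+FE0E reproduces it.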
No, this is not about Apple being correct by default, and nothing I've said should lead you to that conclusion. It's about how VS16 has an intrinsic width of zero. In typical text rendering contexts, if it follows a character with both emoji and text presentation, it forces the emoji presentation. This usually changes the width of that character, but additional VS16 characters tacked on still have width zero. And in terminal emulators, where predictable width is important and is typically calculated on a character-by-character basis, the most obvious behavior is to have VS16 have zero width in all contexts and to not modify the width of the preceding character.
Terminal.app, iTerm2, VSCode's integrated terminal, and Alacritty all seem to agree here. VS16 has no width and does not affect the width of the preceding character. Characters with Emoji_Presentation have a width of 2 even if they're rendering in text presentation. So far Kitty is the only terminal I've tested that disagrees, and it still thinks VS16 has zero width in most contexts, and it also introduces behavior for VS15 that Fish cannot possibly support without wcswidth().
You described Windows Terminal behavior where U+FE0F always has width 1. This seems really broken. I would like to confirm this behavior though. My usual tests here have been with U+26A1 (which has Emoji_Presentation plus a text form) and U+26A0 (which has text presentation plus an emoji form), so I'm echoing various combinations of
I don't care what we call it in code, I just care what it looks like when exposed to the user. |
It's a consequence of the model of the |
This would only barely be exposed to the user. It's like $fish_emoji_width, a variable you never want to have to touch. (and all your messing around here with $fish_emoji_width was in vain! you don't want to touch it, it was already correct before. If we had named it something less appealing, things would have been better!) Ideally, this would not even exist! Calling it "$fish_variation_selector_hack" would work. Or "$fish_vs16_widens"? |
Sure, either one works. The default behavior should be to treat VS16 as having zero width though, barring any heuristics for detecting terminal behavior (e.g. testing Incidentally, I just tested LXTerminal (the default terminal on my Raspberry Pi) and it agrees with Terminal.app et al. So far Kitty is the only terminal I've tested with different behavior, which I just filed kovidgoyal/kitty#3998 for. |
I'm with @kovidgoyal on this: attempting to assign a width to a specific character is a fool's errand. Unicode codepoints in-and-of themselves don't have a width, only strings composed of those codepoints can be assigned a width. Anything other than that (including what we do here in fish) is just a hack w/ the intention of getting as many common inputs right and you shouldn't navel gaze at it too long for fear of falling into the abyss. Bike-shedding over the name of a variable isn't going to change the fact that the approach itself is fundamentally wrong but until there's some way for the shell + the terminal + the OS or text renderer to agree on the width of a string (taking into account not only its components but also the font's support for the desired glyph) the situation isn't going to change. |
This comment was marked as off-topic.
I feel like most of these problems could be solved with an option to use |
There's a standard that lists all possible/recognised unicode code point combinations, though at times they get kinda long, like this one: https://www.emojiall.com/en/code/1F9DC-1F3FF-200D-2640-FE0F
Which makes me wonder what the range the scalar
The official list is here, yet it lacks the black cat (cat+zwj+black square): |
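For readers unfamiliar with the notation, the hyphen-separated code in the emojiall URL above is just a list of hexadecimal code points. A small sketch of how it could be decoded follows; `decode_sequence` is a hypothetical helper name, and the character names printed depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

def decode_sequence(code):
    """Turn a hyphen-separated list of hex code points into a string."""
    return "".join(chr(int(part, 16)) for part in code.split("-"))

seq = decode_sequence("1F9DC-1F3FF-200D-2640-FE0F")

# List each codepoint in the sequence with its Unicode name.
for ch in seq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

The sequence consists of five codepoints, including a zero width joiner in the middle and Variation Selector-16 at the end, yet it is meant to render as a single glyph.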
The best list is https://unicode.org/Public/emoji/15.0/emoji-test.txt |
fish 3.3.1
Also reproduced on latest master (3.3.1-288-g139b74d8e)
macOS 11.5.2 (20G95)
macOS Terminal.app renders most emoji as 2 characters, but emoji created using Variation Selector-16 (U+FE0F) are still rendered as 1 character. Unfortunately, fish treats this as 2 characters (in particular, it treats Variation Selector-16 as 1 character, with a comment saying this is equivalent to treating emoji as 2).
There are actually two issues here:
The first is that the handling of Variation Selector-16 assumes an emoji width of 2, even when $fish_emoji_width is set to 1 or when the guessed width is 1.
The second is that macOS Terminal.app does not treat emoji created by Variation Selector-16 as a width of 2 even though they visually render in 2 columns. It appears that Terminal.app simply treats Variation Selector-16 as having a width of zero.
I do not know how other terminals handle this problem.
This issue is affecting the default output of starship when the status.pipestatus config flag is set to true, as it uses ✔️ in the output. This is causing fish to miscalculate the column to start input at, causing it to appear as though there are spurious extra spaces in between the prompt and the input position.
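The size of the visible gap can be estimated with a back-of-the-envelope sketch. It assumes, as described in this report, that the shell counts each Variation Selector-16 as one column while the terminal counts it as zero. `prompt_drift` and its parameters are hypothetical names used only for illustration.

```python
VS16 = "\ufe0f"  # VARIATION SELECTOR-16

def prompt_drift(prompt, shell_vs16_width=1, terminal_vs16_width=0):
    """Columns by which the shell's idea of the prompt width exceeds the
    terminal's, assuming they disagree only about the width of VS16."""
    return prompt.count(VS16) * (shell_vs16_width - terminal_vs16_width)

# A prompt fragment containing one check mark followed by VS16.
print(prompt_drift("\u2714\ufe0f > "))
```

Each VS16 in the prompt contributes one column of drift under these assumptions, which would show up as one extra space before the input position.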