Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fish calculating prompt length incorrectly with Unicode emoji (☁️ - cloud) #10461

Closed
Advait-M opened this issue Apr 26, 2024 · 4 comments
Closed
Labels

Comments

@Advait-M
Copy link

Advait-M commented Apr 26, 2024

Metadata

Fish version: fish, version 3.6.1
OS: macOS Sonoma 14.3.1
Terminal: tried on default Mac Terminal, iTerm 2 and Warp

Seeing what I'd classify as "incorrect" behavior across all terminals, and this is specific to fish (compared against zsh/bash).

Summary

Fish seems to be incorrectly calculating the width of "☁️" i.e. cloud Unicode character, resulting in incorrect cursor positions across terminals.

Intro

Hey folks! I'm a Warp engineer that came across this bug when debugging an issue for Warp (I've been working on better custom prompt support). This led me down a rabbit-hole of font rendering and Unicode + how fish calculates widths for characters with wcwidth haha 😅.

Looking for some advice here and perhaps configuration levers for fish that I missed, when it comes to dealing with emoji widths (are there other variables I can modify here?)! Or whether this is something that can be tackled at the fish layer? Thanks!

What I have tried

I tried playing around with setting set -U fish_emoji_width X and set -U fish_ambiguous_width X for various combinations to try to fix this across the terminals below. These settings didn't seem to affect the terminals.

I also cloned Starship locally and ran my terminals against a local instance to confirm it was specifically the ☁️ character which caused issues.

As noted below, I repro'ed this issue without Starship, using a vanilla fish_prompt across these terminals.

Starship Repro

Specifically, Starship uses the ☁️ character in its popular gcloud prompt chip module. This chip is included by default in Starship. It seems like fish is incorrectly calculating the width of the cloud Unicode character. This doesn't cause issues in bash/zsh, presumably since they don't attempt to layout text in the same manner that fish does.

Interestingly, fish handles Powerline characters correctly (e.g.  - image below, which is 0xE0A0).
image

To repro - use this starship.toml:

format = """$gcloud"""

Behavior on default Mac terminal - note the 2 spaces between end of prompt and start of input, when we are only expecting 1 space.
image

Behavior on iTerm 2, again note the 2 spaces.
image

Behavior I see with local Warp instance (where I'm working on better prompt support). Note that the double character is due to Warp skipping certain cursor instructions being sent in-between lprompt/rprompt or just post rprompt. However, the underlying cause here is the same as the terminals above, I believe. Using the Hack font here.
image

Expected behavior (zsh on iTerm 2):
image

Minimal Repro

Note that I can repro this without Starship entirely, for a minimal repro. For example:

function fish_prompt
    echo -n "☁️1234567890" 
end

Behavior I see in default Mac Terminal (1 is entirely overwritten, but this seems more on the Terminal-specific side. Notably, seeing an extra space after the 0 here).
image

Note that more clouds results in more spaces:
image

Behavior I see with iTerm 2 (same as above).
image

Behavior I see with local Warp instance (extra space affects Warp's cursor position understanding here):

image

Expected behavior (zsh, default Mac terminal):
image

Conclusion

I'm happy to figure out fixing this on the Warp-side, if that makes most sense (I get the sense fish folks have looked at this a bit in the past, but it's a deep rabbit-hole) - but I wanted to flag this since I'm seeing, and ask for advice on how a terminal emulator should best tackle this, if it's on the terminal emulator side. Notably, in Warp, we treat this as a single cell character, rather than iTerm 2 or the default Mac terminal that treat the cloud as 2 cells wide.

Related issues

Related issues I was reading through:

@faho
Copy link
Member

faho commented Apr 26, 2024

Fish uses our widecharwidth script that parses the unicode data files to come up with a width representation.

Notably this includes "EastAsianWidth.txt", which lists:

2600..2604;N # So [5] BLACK SUN WITH RAYS..COMET

That means anything from U+2600 to U+2604 is classified as "N", which stands for "Neutral", which means they are "narrow" and occupy one cell. (tbh I don't really understand why "neutral" even exists, and TR11 is unhelpful here)

It is also listed as "Emoji" in emoji-data.txt, but not as "Emoji_Presentation". That means you can use the U+FE0F emoji variation selector to change it from text presentation (narrow because EastAsianWidth.txt applies) to emoji presentation (wide).

Fish will treat this combination as being two cells wide. This is also what I get when I copy your ☁️ here from the browser into my terminal - not just U+2601, but U+2601 followed by U+FE0F. This is also what I get from starship's default config (but again via a browser, it's always possible the U+FE0F was inserted there).

You can see what your fish thinks the width is with string length -V:

string length -V ☁️ \U2601 \U2601\UFE0F

This should print "2", "1" and "2" - if that cloud glyph is U+2601 U+FE0F. If it prints "1" "1" "2" the glyph is just U+2601.

This seems correct to me from what I know of Unicode and TR11 specifically. It's not a great document, and it's not meant for terminals specifically, but it is the best we have and fish's interpretation seems reasonable to me.

Fish's width handling isn't perfect either (notably it does not handle full grapheme clusters but only codepoint-by-codepoint with some hacks like for variation selectors), but the easy and medium difficulty cases usually work, and I would call this a medium case.


Note: The names of $fish_emoji_width and $fish_ambiguous_width are perhaps too simple. They affect very specific things:

  • fish_ambiguous_width affects all codepoints EastAsianWidth.txt classifies as "Ambiguous" width
  • fish_emoji_width is about a change that happened in Unicode 9: It declared that all codepoints with "Emoji_Presentation" should default to wide. So $fish_emoji_width defines whether fish should honor that (because the terminal does), and so it only affects all codepoints with Emoji_Presentation by default (listed in emoji-data.txt) that were introduced before Unicode 9 (because anything after that won't be supported by a program that doesn't support Unicode 9 anyway). It is not of much use anymore because Unicode 9 is pretty old by now.

Neither applies to U+2601 because it is classified as neutral and not Emoji_Presentation.


So, in summary:

Terminal.app and iTerm obviously misrender the text. They show ☁️ (which I assume is U+2601 with the emoji selector) as being two cells wide but then don't use the second cell.

Warp apparently ignores the emoji selector and draws U+2601 with the emoji selector as occupying one cell. I believe that is wrong and would therefore call this a bug in Warp.

Note that the zsh comparison doesn't help much because zsh doesn't care about the width at all here, it just prints the prompt without repositioning. The issue comes up because fish repositions the cursor to do syntax highlighting, suggestions, right prompt handling, etc.

Our guidance: Please draw U+2601 alone into one cell, and U+2601 U+FE0F into two cells.

@faho faho added the question label Apr 26, 2024
@faho
Copy link
Member

faho commented Apr 26, 2024

So, since I just tested this in 6 more terminals and 5 failed the same test (with various failure modes), here are some quotes from Unicode TRs to support our reading:

From TR51:

default emoji presentation character — A character that, by default, should appear with an emoji presentation, rather than a text presentation.
[...]
These characters have the Emoji_Presentation property.

[...]

default text presentation character — A character that, by default, should appear with a text presentation, rather than an emoji presentation.
[...]
These characters do not have the Emoji_Presentation property; that is, their Emoji_Presentation property value is No.

So a character either has default emoji or text presentation. U+2601 does not have Emoji_Presentation so it defaults to text presentation.

emoji presentation selector — The character U+FE0F VARIATION SELECTOR-16 (VS16), used to request an emoji presentation for an emoji character.

So U+FE0F can be used to get emoji presentation for something that otherwise has text presentation.

emoji presentation sequence — A variation sequence consisting of an emoji character followed by a emoji presentation selector.

So U+2601 U+FE0F is an emoji presentation sequence (it's also listed in the corresponding file).

From TR11:

East Asian Wide (W): [...] This category includes [...] characters that have the [UTS51] property Emoji_Presentation, with the exception of characters that have the [UCD] property Regional_Indicator

So anything that has Emoji_Presentation is "wide".

emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

So U+2601 U+FE0F is wide.

Neutral (Not East Asian): All other characters. Neutral characters do not occur in legacy East Asian character sets.

So, since U+2601 is text presentation by default, the emoji stuff doesn't apply, so the "Neutral" from EastAsianWidth.txt applies.


I think that should answer your question?

@Advait-M
Copy link
Author

Awesome - thank you @faho, and thanks for digging into this! Just wanted to ack these comments in the interim - I'm working through reading up on this world and understanding this more deeply 😅.

I'll likely have some follow-up questions, but this is super helpful! 🙌

@Advait-M
Copy link
Author

Advait-M commented Apr 26, 2024

Updates:

  • Got a better handle on how variation selectors work 👍
  • unicode-width, the crate we use for helping calculate widths for Unicode characters, very recently added emoji presentation support (literally 3 days ago - Support emoji presentation sequences unicode-rs/unicode-width#41 haha) 🔥.
    • Not in a release cut yet but I confirmed this fixes the width calculation for "\u{2601}\u{FE0F}" (previously 1, now gives correct result of 2).
  • Figured out a path forward on the Warp-side with how we should handle the full-width character followed by zero-width character to ultimately lead to a double-width character (on the rendering + spacing calculations side). We've got a rough prototype working here - I'm gonna continue working on this, which should actually help us better support Unicode emojis more broadly in Warp (outside of the fish use case too) 🙌
    • There's both the rendering + width calculations piece of this for us to tackle correctly, as you mentioned.

Really appreciate your help on this @faho and I might have some follow-ups!

Also, curious, which terminal succeeded when you tested this in 6 mentioned above haha? Hopefully Warp will be another one to add soon 😄 !!

EDIT: for anyone wondering, I believe the terminal that handles this correctly is Kitty! But Warp is coming soon!

@faho faho closed this as completed Apr 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants