Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emoji issues in Fish 3.0 #5583

Closed
jerr0328 opened this issue Jan 24, 2019 · 42 comments
Closed

Emoji issues in Fish 3.0 #5583

jerr0328 opened this issue Jan 24, 2019 · 42 comments
Labels
bug Something that's not working as intended
Milestone

Comments

@jerr0328
Copy link
Contributor

Fish version: 3.0.0
macOS Mojave: 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64
iTerm2 Build 3.2.7, "Use Unicode 9 widths" unchecked
Ran without customizations (sh -c 'env HOME=$(mktemp -d) fish')
Asciinema: https://asciinema.org/a/O8w4MnrYyj95MHQlXrtbdZatm

Emojis are still causing issues. #2652 claims this issue is fixed, but different combinations of fish_emoji_width (none, 1, 2) seem to affect the behavior without really fixing the problem. In Fish 2.7 I seemed to have avoided the problem with the iTerm workaround, but after updating to Fish 3.0, it started breaking. I can set the emoji width to 1 or 2, but it only works on some emojis (e.g. with 🛠and 🐛, one breaks and the other is ok when setting the value to 1, and the breakage swaps when I change the value). Using a newer emoji (e.g. 🥳) also causes a lot of issues. I have tried as well the ambiguous width values, but it also seems to cause the same random issue.

I installed Fish via brew.

@faho
Copy link
Member

faho commented Jan 24, 2019

"Use Unicode 9 widths" unchecked

Check that, set $fish_emoji_width to 2, set $fish_ambiguous_width to whatever your terminal says - 2 for wide, 1 for narrow.

@faho faho added the question label Jan 24, 2019
@jerr0328
Copy link
Contributor Author

With those settings: https://asciinema.org/a/wq8gSqYoe0TIovNHPjRMJUfvR?t=20
I also tried with both 2 and 1 for ambiguous width, did not help with the 🛠 character at all.

@faho
Copy link
Member

faho commented Jan 24, 2019

Soo.. I'm not sure if you noticed or it its the same for you, but your 🛠️ "character" doesn't show as a character here. It shows as an image. So I can't copy it to figure out which character it is. Googling "hammer and wrench emoji" sends me to https://emojipedia.org/hammer-and-wrench/, which explains that it is:

Codepoints
🛠 U+1F6E0
️ U+FE0F

and indeed, copying that one gets me 2 codepoints in the terminal (as confirmed by od -t x1z or string length). The latter is a variation selector. Which applies to another codepoint to change it somehow.

Which we already mention in our code:

    // Check for VS16 which selects emoji presentation. This "promotes" a character like U+2764
    // (width 1) to an emoji (probably width 2). So treat it as width 1 so the sums work. See #2652.
    const int variation_selector_16 = 0xFE0F;
    if (wc == variation_selector_16) return 1;

So together with U+2764, a U+FE0F variation selector "upgrades" it to an emoji. So both together have a width of 2. The former separately has a width of 1, so we give U+FE0F a width of 1 as well.

Now this hammer_and_wrench thingy (🛠) together with U+FE0F forms 🛠️. Which on my system looks absolutely identical. Not just the same width, the same actual symbol.

So there's no change in width. So the U+FE0F would in this case have to have a width of 0.


So, the easiest thing for you to do is to just use a different character. Maybe echo \U1F6E0 | pbcopy gives you a hammer_and_wrench without the variation selector that just works.

What we should do, I have no clue. That bit above was from our version of a function called wcwidth that has this quaint notion that characters are independent and have a certain width. So it's called with just one character (well, "codepoint" would probably be the correct term here), without any context.

@floam
Copy link
Member

floam commented Jan 24, 2019

I was able to copy and paste 🛠 just fine from his comment, it's not an image. It does mess up fish's rendering with correctly-configured emoji widths.

@floam floam added bug Something that's not working as intended and removed question labels Jan 24, 2019
@jerr0328
Copy link
Contributor Author

Interesting, that shows as a perfectly normal character to me here in Chrome, I don't know if the browser or Github is trying to render as an image, I see the character itself both in my comment and in yours.

Running the printf \U1F6E0 | pbcopy (printf to avoid the newline thrown in by echo) doesn't help too much, I get the character but there's still some weirdness (like going up in the history and editing the command). I can work around this character, knowing that the codepoint is problem. For both, though string length 🛠 shows a length of 1. Here's the output of printf \U1F6E0 | god -t x1z (prefix of g for gnu version, macOS otherwise has the BSD version which doesn't have z type):

0000000 f0 9f 9b a0                                      >�..�<
0000004

I'm not really sure if this means anything...

@faho
Copy link
Member

faho commented Jan 25, 2019

I was able to copy and paste hammer_and_wrench just fine from his comment, it's not an image.

It is for me, but that's probably due to me using firefox and/or linux. It's not all that important.

It does mess up fish's rendering with correctly-configured emoji widths.

How wide is it?

If you do printf '%s\n' aa \U1F6E0, is it as wide as one or two a?

Because our wcwidth says it's one of those that changed in Unicode 9, so if you're using Unicode 9 widths, it should be as wide as 2 a.

(And it's probably good news that it wasn't the variation selector)

@floam
Copy link
Member

floam commented Jan 25, 2019

It's as wide as two a's.

@jerr0328
Copy link
Contributor Author

I see it as two, as well.
printf '%sx\n' aa \U1F6E0 shows that it's being treated as 1 wide, with the x on the second line in the second line (first line it's on the third, following the two x characters).
screenshot 2019-01-25 at 18 19 52

@faho
Copy link
Member

faho commented Jan 25, 2019

So... I'm afraid your terminal is wrong.

@jerr0328
Copy link
Contributor Author

jerr0328 commented Jan 25, 2019

I saw the same behavior (with the printf) in the standard macOS Terminal as well, with different fonts as well. It could be a system-wide issue. I'm not sure how Vim handles it because I see it rendered correctly there

@floam
Copy link
Member

floam commented Jan 25, 2019

Yes, I should have said that I'm on macOS 10.14, Terminal.app and iTerm. If we can figure out the range of emojis rendering like this we should be able to add a special case to our wcwidth.

@floam
Copy link
Member

floam commented Jan 25, 2019

And no, it's not the case that the libc wcwidth does any better:
wcwidth

@ridiculousfish
Copy link
Member

Bug is U+1F41B, Hammer and Wrench is U+1F6E0. These both fall in the "widened in Unicode 9" category (so should respect "Use Unicode 9 widths" and $fish_emoji_width, which should agree).

The bug appears to be that iTerm2 is treating 'Hammer and Wrench' (U+1F6E0 ) as width 1, even if "use Unicode 9 widths" is set. @gnachman does that seem right to you?

@faho
Copy link
Member

faho commented Jan 25, 2019

The bug appears to be that iTerm2 is treating 'Hammer and Wrench' (U+1F6E0 ) as width 1, even if "use Unicode 9 widths" is set. @gnachman does that seem right to you?

Moreover the rendering is buggy - a character following hammer and wrench overlaps it, as can be seen in #5583 (comment):

image

@floam
Copy link
Member

floam commented Jan 25, 2019

Ah, I didn't notice that. Terminal.app does the same thing. This probably should be reported to Apple.

@ridiculousfish
Copy link
Member

The overlapping text is a natural consequence of rendering emojis as width 1; there's nothing particularly buggy about that.

@gnachman
Copy link

@ridiculousfish Doesn't look full width to me. From https://www.unicode.org/Public/11.0.0/ucd/EastAsianWidth.txt

1F6E0..1F6EA;N   # So    [11] HAMMER AND WRENCH..NORTHEAST-POINTING AIRPLANE

@faho
Copy link
Member

faho commented Jan 26, 2019

Doesn't look full width to me

@gnachman: Our wcwidth (https://github.com/ridiculousfish/widecharwidth) assumes that everything in emoji-data.txt that was introduced prior to Unicode 9 was widened in Unicode 9 (no matter what EastAsianWidth.txt says).

The rendering seems to agree - it's drawn 2 wide.

@jerr0328
Copy link
Contributor Author

@ridiculousfish Doesn't look full width to me. From https://www.unicode.org/Public/11.0.0/ucd/EastAsianWidth.txt

1F6E0..1F6EA;N   # So    [11] HAMMER AND WRENCH..NORTHEAST-POINTING AIRPLANE

TR11 Recommendations does mention that emoji should always be Wide, regardless of East_Asian_Width property value:

[UTS51] emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value.

In digging through Unicode pages about this, I saw here: http://unicode.org/emoji/charts/emoji-variants.html#1f6e0 that there's the text version and the emoji version (now I understand the variation selector codepoint and what it does). Since the emoji version is selected, it should behave as a wide character. The non-emoji version should behave as a narrow character. Not sure if this helps, but it's at least helped me understand a bit more about what the standard is trying to do, so hopefully it'll help someone else reading this thread.

@ridiculousfish
Copy link
Member

ridiculousfish commented Jan 27, 2019

Ugh I think I made a mistake - I believed that all emoji became width 2 in Unicode 9 but apparently that's not true.

I retract my mistake; I now think fish is right here and iTerm2 is wrong.

The basic emoji set are characters that have the Emoji property; I'm led to believe that Emoji by default implies Emoji_Presentation=Yes, though this may be switched with the text variation selector:

The emoji code points are those with property values Emoji=Yes, Emoji_Component=No, and Emoji_Presentation=Yes

Emoji_Presentation=Yes implies that the character renders as East Asian Wide regardless of EastAsianWidth.txt:

emoji presentation sequences behave as though they were East Asian Wide, regardless of their assigned East_Asian_Width property value

But what about:

1F6E0..1F6EA;N # So [11] HAMMER AND WRENCH..NORTHEAST-POINTING AIRPLANE

this implies that Hammer and Wrench has width 'N' (neutral, i.e. 1) when not rendered as emoji which is possible via the text variation selector U+FE0E. Here I'll try (good luck to your browser):

Hammer and Wrench default: 🛠
Hammer and Wrench with Text VS: 🛠︎

So Hammer and Wrench should be width 2 by default, and width 1 with Text VS. (But please don't do that second thing, there's no way fish could support that!)

(Also TextEdit renders default Hammer and Wrench as width 2 and it usually gets these things right.)

Thanks to @jerr0328 for finding the buried language.

@jerr0328
Copy link
Contributor Author

I did find an open issue regarding the rendering of these emoji with VS.
https://gitlab.com/gnachman/iterm2/issues/7239

Interesting to read in the issue:

I think bash/readline is treating the heart symbol and the emoji variant selector as two separate typed/deletable characters, which is a little surprising.

I wonder if this also relates to the fact that the quote closing change goes a bit haywire with this.

Also, I looked again at the asciinema video on a different computer (running Windows) and the different rendering of emoji causes differences. In this case I think I really need to record the screen as a video to properly capture the issue,

@faho
Copy link
Member

faho commented Jan 27, 2019

So Hammer and Wrench should be width 2 by default, and width 1 with Text VS. (But please don't do that second thing, there's no way fish could support that!)

(Dumb idea: Assume that the Text VS is used "correctly" and assign width "-1" to it iff emoji-width is 2? Just like we assign width 1 to emoji VS?)

@mqudsi
Copy link
Contributor

mqudsi commented Jan 27, 2019

What if the user is on an OS that does not support Unicode 9.0 (or higher)? e.g. I believe while U+1F469 (woman) and U+1F3EB (school) are both in Unicode 6.0, U+1F569,U+200D,U+1F3EB (female teacher) is only in Unicode 9.0. If fish "always" deduces that the combination is 1 wide, it'll be wrong when running under an older OS (even with the same terminal emulator and version and font), no?


Also, I know the asciicinema stream depends on your OS and browser, but fwiw the following did not display correctly in the asciicinema for me but locally works out-of-the-box in both iTerm2 and Terminal.app on macOS Yosemite 10.10:

sparkles emoji + text

@faho
Copy link
Member

faho commented Jan 27, 2019

If fish "always" deduces that the combination is 1 wide

That's what $fish_emoji_width is for. You set it to 1 if you don't have Unicode 9, and 2 if you do.

@mqudsi
Copy link
Contributor

mqudsi commented Jan 27, 2019

That's what $fish_emoji_width is for. You set it to 1 if you don't have Unicode 9, and 2 if you do.

Maybe I'm missing something, but each release (and we're now on Unicode version 11) brings new combinations (no?). What if your OS release supports Unicode 9 but you're trying to use a new combination from Unicode 11?

@faho
Copy link
Member

faho commented Jan 27, 2019 via email

@mqudsi
Copy link
Contributor

mqudsi commented Jan 27, 2019

Not if fish_emoji_width were replaced with a fish_unicode_version and its value were quizzed to determine supported combinations. Personally, I don't use emoji in the terminal, but my personal feelings about that don't seem to have dampened the demand for emoji support in the terminal.

(Also, to share a tangentially related issue that I found enlightening some months back: kovidgoyal/kitty#461)

@faho
Copy link
Member

faho commented Jan 27, 2019 via email

@jerr0328
Copy link
Contributor Author

I do see issues if I use emoji with lots of ZWJs (e.g. this class of emoji: 👨‍❤️‍💋‍👨), where basically the cursor jumps ahead but the text is rendered behind. I'm not sure if this is exactly the same issue, since I can backspace out the ZWJ and subcomponents of the emoji (which is more an issue with the terminal).

I wonder if a short-term fix is have another flag (build or runtime) to handle a broken variation selector in the terminal by assuming the VS is width 0 and the emojis with VS follow the east_asian_widths? I think the issue I see is that the quote-closing color change is updating quotes in the wrong position because of the widths being inconsistent (but I don't quite know how the code works so I could be totally wrong).

@gnachman
Copy link

@faho I'd like to get consensus around how terminal emulators handle width. wcwidth() does not seem adequate. fish seems to have made some progress on defining a solution to the problem. A group of terminal emulator authors are participating in the Terminals Working Group and are tasked with writing specifications for terminal emulators. I've opened an issue here, and if I've overlooked anything please jump in with suggestions: https://gitlab.freedesktop.org/terminal-wg/specifications/issues/9

@floam
Copy link
Member

floam commented Feb 19, 2019

Given that we've encountered VS 15 in the wild in #5668, I'm going to commit a change to have our function return 0 for that case. It's only right some of the time - sometimes it probably needs to subtract 1 depending on the previous character, perhaps possible in our wcswidth implementation - but returning 1 which would increase the sum is always wrong.

@floam
Copy link
Member

floam commented Mar 13, 2019

It really seems like we do need something more clever that takes entire strings into account. It seems to me that wcswidth could theoretically do this but does not. I assume ICU can figure it out? What's there short of that?

@floam
Copy link
Member

floam commented Mar 13, 2019

This is leaving aside that it seems like you actually need insight into the fonts available to figure out some things. But one could get much closer with a function given an entire string.

@faho
Copy link
Member

faho commented Mar 13, 2019

This is leaving aside that it seems like you actually need insight into the fonts available to figure out some things.

Sorry, if the font changes things there's nothing we can do. E.g. non-monospace fonts are broken here.

@floam
Copy link
Member

floam commented Mar 13, 2019

Of course. Or if you don't have a graphical representation available for the slightly smiling emoji. Point is, a wcswidth taking into account which sequences come after other sequences, it seems like we could get closer to the right answer more of the time.

@ridiculousfish
Copy link
Member

We're completely at the mercy of the terminal's rendering here. bash doesn't have these issues because bash doesn't care: it doesn't need to worry about character widths because it doesn't attempt to do layout. The price of this is that it can't support e.g. a right prompt. However I wonder if we can move fish towards caring less; if fish required the character width less often, it would help reduce this sort of issue.

@gnachman
Copy link

Bash has lots of problems with character width because it likes to redraw the prompt under various circumstances, such as when you control-R or the window resizes. It's a fairly frequent source of bug reports for iTerm2.

@mqudsi
Copy link
Contributor

mqudsi commented Mar 28, 2019

Lol. I can sympathize. It feels like I spend most of my free time filing bug reports related to various resizing and and repainting issues in neovim where one patch fixes things for a terminal only to break it for ten others.

What we need is a unified API in the opposite direction where a program can ask the terminal "tell me about the currently visible layout" (or "tell me how this changeset would be rendered") and get meaningful results back. There are too many variables beyond our control in 2019 that were simply not an issue in the 80s (let alone earlier). It's not just the number of bytes not correlating to the number of glyphs or the width of a glyph not being a single cell, even accounting for the same combination of terminal and shell. The font in use, the operating systems rendering features, accessibility options at play.. there has to be a better option than throwing it all away and bundling a web browser in a gtk toolkit and calling it a terminal that solves these problems.

@floam
Copy link
Member

floam commented Apr 5, 2019

The primary thing that would be nice to make better is the way the computed cursor position being off really makes things go to hell interactively in fish, it can make it hard to even enter commands because the user has to guess where fish is going to insert a character when they type, and guess what character will go away when they hit backspace. The stair stepping, text getting entered wrong, and way you start typing over the printed prompt is bonkers for a user to figure out. Could we do anything to cause fish not to care about the computed cursor position quite so much or limit the damage?

A right prompt being misaligned is annoying but what's really maddening is an emoji appearing in your current command-line or being flashed from an autosuggestion and totally corrupting the experience. Could cursor position save/restores or querying the cursor position on capable terminals help?

@floam
Copy link
Member

floam commented Apr 6, 2019

I wrote this function to determine the actual rendered widths of given strings. It'll give the number of columns a character actually took after being printed out by the actual terminal regardless of wcwidth. It's been useful for testing things:

function realwidth
    # enter alt screen mode, sorry
    tput smcup
    # print argument and then ask terminal to report the cursor position
    printf "%s%s" $argv[1] \e\[6n\;
    # the response is as if user typed it
    read -l out -n5
    # exit alt screen mode, sorry
    tput rmcup
    # cursor is in the column after the thing was rendered
    set -l width (math (string match -r '\[\d;(\d+)R' $out)[2] - 1)
    printf "%d\n" $width
end
$ realwidth 😳
2

It uses the alternate screen only because I couldn't figure out how to make read -s not echo any characters like bash's can (we print the asterisks/bullet points instead of changing modes), and stty -echo didn't work in fish.

If this could be done without the alternate screen hacks and by diffing the before and after row/column addresses, in c++, it seems like we could do one-off checks on launch or first launch (in likeness to how we generate manpage completions) to determine the correct emoji width value and ambiguous width value, for capable terminals. Without the alternate screen junk/tput commands it takes 4ms in fish script currently to test 1 string.

@floam
Copy link
Member

floam commented Apr 6, 2019

Or maybe it could be triggered by fish_config.

@faho
Copy link
Member

faho commented May 28, 2019

I think the current situation is the best we can do with changes just in fish, so I'll close this.

It's also not that bad anymore - use an up-to-date terminal on an up-to-date system and it should mostly "just work".

@faho faho closed this as completed May 28, 2019
@faho faho added this to the fish 3.1.0 milestone May 28, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something that's not working as intended
Projects
None yet
Development

No branches or pull requests

6 participants