Fish mis-identifies rightmost point of prompt with utf8 characters #2199

nyarly · 2015-07-13T22:18:50Z

If I start entering a command:

  0😉 judson@dijkstra /home/judson ⮀ ls <tab>

and start moving through the pager:

  0😉 judson@dijkstra /home/judson ⮀ l1.png 
1.png                                      francis_portrait.gif                  qrcode-jlinfo.png

where what I'd expect would be:

  0😉 judson@dijkstra /home/judson ⮀ ls 1.png 
1.png                                      francis_portrait.gif                  qrcode-jlinfo.png

Hitting ^c removes the last few characters of the prompt.

I've experimented with my fish_prompt - if I replace the 😉 and ⮀ with simple ASCII characters, all behaves as expected.

I'd tried instead removing all terminal coloring from the prompt - helps not at all, and in the absence of UTF8 characters, works fine (I'm using set_color everywhere.)

Using one of the git_status prompts is worse, since they use checkmarks etc.

Relevant environment:

I'm on Gentoo Linux, using urxvt.

LANG=en_US.utf-8
LC_CTYPE=en_US.UTF-8
TERM=rxvt-unicode-256color

I've done a quick tryout of a couple of other terminals and get similarly broken results. Weirdest is st, which doesn't display the characters but gets really confused about the correct insertion point.

The text was updated successfully, but these errors were encountered:

ridiculousfish · 2015-07-13T23:40:04Z

Is the emoji double width? We may need to teach wcwidth about emoji.

zanchey · 2015-07-14T01:31:02Z

I don't have those characters in any of my fonts! Certainly some UTF-8 characters are okay - ✔ and ✚ work fine in my git prompt.

ridiculousfish · 2015-07-14T01:40:50Z

The happy face emoji is definitely double width. I'm not sure what the other one is. So it looks like wcwidth() needs to be updated.

pickfire · 2015-07-14T06:50:26Z

What font are you using? It have rare characters that I seldom found in terminal.

nyarly · 2015-07-14T06:54:21Z

I'm using a monospace font and both the happy face and the triangle are single width in the font. (I edited a local copy of monofur for my own use for the happy face - the triangle is coming from Deja Vu Mono) I'll admit that I'm somewhat beyond my depth re the width of characters etc. That said, the prompt is all me, so I should be able to dig out e.g. codepoints if that'll help.

In the meantime, is there some way I can determine what characters would be safe to use? I suspect the branch symbol I'm using in my git status is "wide" - it makes the offset worse.

ridiculousfish · 2015-07-14T07:01:46Z

There's this comment:

/* The following two functions define the column width of an ISO 10646
 * character as follows:
 *
 *    - The null character (U+0000) has a column width of 0.
 *
 *    - Other C0/C1 control characters and DEL will lead to a return
 *      value of -1.
 *
 *    - Non-spacing and enclosing combining characters (general
 *      category code Mn or Me in the Unicode database) have a
 *      column width of 0.
 *
 *    - SOFT HYPHEN (U+00AD) has a column width of 1.
 *
 *    - Other format characters (general category code Cf in the Unicode
 *      database) and ZERO WIDTH SPACE (U+200B) have a column width of 0.
 *
 *    - Hangul Jamo medial vowels and final consonants (U+1160-U+11FF)
 *      have a column width of 0.
 *
 *    - Spacing characters in the East Asian Wide (W) or East Asian
 *      Full-width (F) category as defined in Unicode Technical
 *      Report #11 have a column width of 2.
 *
 *    - All remaining characters (including all printable
 *      ISO 8859-1 and WGL4 characters, Unicode control characters,
 *      etc.) have a column width of 1.
 *
 * This implementation assumes that wchar_t characters are encoded
 * in ISO 10646.
 */

That probably answers your question in far more detail than you want :)

pickfire · 2015-07-14T07:06:08Z

In the meantime, is there some way I can determine what characters would be safe to use? I suspect the branch symbol I'm using in my git status is "wide" - it makes the offset worse.

Do you mean ? In My Humble Opinion, it is safe as long as you have the patched font but I think you should look at @ridiculousfish's comment.

I personally uses CJK double-width font and I did not find any issue with them when using fish on st with the recent patch.

nyarly · 2015-07-14T21:02:23Z

@pickfire - do you mean a recent patch to st? I tried st-0.5 direct from suckless out for a second but specifically couldn't get these characters to render and dropped it. For what it's worth, the positioning is worse in st than urxvt. At least urxvt is consistent; in st, cycles of deleting and re-completing put the cursor in different places. 😱

Comparing directory-specific behavior of my prompt provides two more data points:

  0😉 judson@dijkstra /home/judson ⮀ l|s
  0😉 judson@dijkstra …4/MindSwarms/web ⭠ video_processing_int ⮀| ls

It's not conclusive, but the number of > 7bit characters in the prompt == the offset from the correct position of the cursor (synthesized here by |)

Per charmap and http://www.unicode.org/versions/Unicode7.0.0/ch02.pdf
😉 is in Symbol, so I think general category "S"
… is in "Punctuation", so general category "P"
⭠ is "not assigned"
⮀ is "not assigned"

So: none are in C0/C1, Mn, Me, Cf, medial vowels or CJK. The latter two, charmap reports as "non printable" but according to the above comment I think they should all be width 1.

Here's a thing: I'm on Gentoo Linux, which for better or worse means that fish and the libs it depends on can have different e.g. configure settings. (For instance, I just tried recompiling ncurses with the tinfo use variable set, which also didn't help...) Are there versions and configurations of fish's deps that might impact how it calculates where to move the cursor and draw?

nyarly · 2015-07-15T00:27:10Z

Okay, looking deeper: my build sets HAS_WCWIDTH, so it seems like I'm using glibc's wcwidth() function. Which leads to the next question: I have glibc-2.19-r1 installed. I'd think that would be current enough, but who can say for sure. Maybe I'll have a few minutes later tonight to build a wcwidth tester and see how it goes.

nyarly · 2015-07-15T23:11:14Z

I'm plainly out of my depth. After 30 minutes of hacking I've got:

#define _XOPEN_SOURCE
  #include <stdio.h>
  #include <wchar.h>
  #include <string.h>
  #include <errno.h>

  int main(int argc, char **argv){
    int i, j, n;
    mbstate_t *ps;
    wchar_t wchar;

    for(j = 0; j < argc; j++){
      memset(&ps, 0, sizeof(ps));
      printf("%s len %d\n", argv[j], strlen(argv[j]));                                                                                                  
      n = 1;
      for(i = 0; i < 200 && n > 0;){
        n = mbrtowc(&wchar, argv[j] + i, 8, ps);
        i = i + n;
        printf("%d - %d/%d\n", wchar, n, wcwidth(wchar));
        if(n == -1 && errno == EILSEQ){
          printf("Invalid multibyte sequence\n");
        }
        if(n < 0){
          printf("Errno: %d\n", errno);
        }
      }
    }
  }

Which gets me:

> ./a.out …2/ xyz
./a.out len 7
46 - 1/1
47 - 1/1
97 - 1/1
46 - 1/1
111 - 1/1
117 - 1/1
116 - 1/1
0 - 0/0
…2/ len 5
0 - -1/0
Invalid multibyte sequence
Errno: 84
xyz len 3
120 - 1/1
121 - 1/1
122 - 1/1
0 - 0/0

I've tried switching my LC_CTYPE around to no avail. The whole point here was to try to see what widths the various characters are reported with, but I seem to have some fundamental misunderstanding of processing wide characters in C.

nyarly · 2015-07-15T23:14:36Z

One thing that came of this experiment: fish gets even more confused when commands have multibyte characters in them.

ridiculousfish · 2015-07-16T05:27:43Z

I tried looping through every value and could not find any that returned a wcwidth above 1, in the en_US.UTF-8 locale, even though there are plenty of double-width forms. I thought I understood this stuff but maybe not? Anyways it's tempting to either "augment" or drop entirely the system wcwidth on Linux in favor of our own. We already do on OS X.

nyarly · 2015-07-16T16:36:45Z

The upshot there would be forcing HAVE_WCWIDTH 0 (or undefined) or BROKEN_WCWIDTH, right? That might be doable.

zanchey · 2015-07-23T10:51:17Z

How does this look since #2217 was merged?

ghost · 2015-10-10T05:56:04Z

Please also check this: https://github.com/JuliaLang/utf8proc

faho · 2016-05-26T14:29:47Z

@nyarly: Any improvement here with 2.3.0?

joehillen · 2016-06-14T21:07:31Z

I wasn't having this issue until 2.3.0. Now I am.

floam · 2016-06-14T21:47:36Z

Does #3143 fix it?

joehillen · 2016-06-14T21:59:18Z

@floam Yes! 💯

Strange that those changes are related. I just happened to have ➤ as my prompt char.

floam · 2016-06-14T22:35:43Z

Cool!

I am closing this as a duplicate of #3124 - we'll follow it there.

Minor cleanup related to issue #2199.

krader1961 · 2016-06-15T00:20:42Z

Remaining places that call wcwidth() rather than fish_wcwidth() have been fixed by commit d70be18.

floam · 2016-06-15T00:22:56Z

Handling of e.g. some color emojis does seem wrong. Do we simply need to update our rules for newer unicode specs?

krader1961 · 2016-06-15T03:29:13Z

Do we simply need to update our rules for newer unicode specs?

Maybe. I checked the source of our wcwidth implementation and it hasn't been modified since we copied it. Googling "markus kuhn wcwidth" didn't turn up any documents saying it was out of date. I don't think we need to spend more effort on this until someone reports a problem that isn't due to my screwup in the 2.3.0 release setting the locale when a local locale var goes out of scope.

floam · 2016-06-15T04:12:33Z

Googling "markus kuhn wcwidth" didn't turn up any documents saying it was out of date.

Google "Unicode 7" or crib off https://github.com/jquast/wcwidth/ - we're several revisions out of date I think?

I think it'd serve us well to follow what they figured out in the above github project.

floam · 2016-06-15T04:20:31Z

CharWidths.txt

krader1961 · 2016-06-15T04:29:52Z

I looked at the changes in https://github.com/jquast/wcwidth/ and they appear to be refinements to how combining characters are handled. An edge case for us that probably only affects Cygwin users using characters outside the BMP. I'm also quite leery of using random sources of "the truth" such as the CharWidths.txt link you added.

If you're passionate about this please open a new issue and take ownership of it. Personally I think we should stop using our own implementation of wcwidth. If someone is using a distro that has a broken implementation they should be demanding the distro maintainer to fix their implementation. Unicode is no longer bleeding edge technology. Having our own implementation might have made sense in 2007. It doesn't make any sense now.

ghost · 2016-06-15T04:56:15Z

This issue is not fixed at all. I cannot understand why this issue is eagerly closed for another issue just because nobody keeps complaining and for someone's indifference. It is just lucky that those emojis do not cause layout problems on some terminals that some people are using. An update to the hard-coded width table may cause problems again, for the terminal may use another table.

If someone is using a distro that has a broken implementation they should be demanding the distro maintainer to fix their implementation.

I don't see any hope for such an infamous bug about wcwidth.

Unicode is no longer bleeding edge technology.

But it is a monster, hence the ICU project. Do you know I mentioned another similar project utf8proc before?

The solution mentioned by @floam in #2484 is quite flexible, although fish needs to update the width table once the users change the font settings of their terminals. I think that gives a good direction.

(EDIT: fix typos)

krader1961 · 2016-06-15T05:23:08Z

@jakwings, I don't know what you expect us to do. We should be able to rely on the unicode implementation provided by a given distro at this point in time. As you said in your last update to issue #2484, which is still open,

it is impossible to solve the spacing problem perfectly.

I'm happy to review any changes you, or anyone else, submits to improve how fish handles the problematic characters. Since issue #2484 is still open I don't understand what you're complaining about.

krader1961 · 2016-06-15T05:29:13Z

Also, this issue was not "eagerly closed". It's been open for 11 months. And the original problem was fixed seven days after the issue was opened. That the fix may not solve all unicode character width problems isn't relevant to this issue.

ghost · 2016-06-15T05:30:36Z

Sorry, in #2484, I was just replying to floam for that multibyte issue. Now I've appended the name floam to it.

We should be able to rely on the unicode implementation provided by a given distro at this point in time.

I don't think so, even @ridiculousfish (he added the code for mk_wcwidth).

And the original problem was fixed seven days after the issue was opened. That the fix may not solve all unicode character width problems isn't relevant to this issue.

Really? I am suspicious.

ghost · 2016-06-15T05:46:24Z

Ok I remember the fix submitted by the reporter. I've no complaint for closing this issue now. But wcwidth is definitely a nasty thing.

krader1961 · 2016-06-15T05:57:59Z

The issue of character cell width reported by wcwidth() is definitely "a nasty thing". It is not the responsibility of fish to get every single case correct if the libraries we depend on get it wrong. The only reason for fish to have its own private implementation is in the very early days when we could not reasonably expect the platforms we run on to have a correct implementation.

Given that various terminals, including pseudo terminals like tmux, have their own private wcwidth implementations it is impossible for fish to do the right thing in every single circumstance.

I'm more than happy to review and merge any changes to make fish friendlier to people using characters not in the ASCII or European char sets. If you're someone who cares about this issue and has expertise on the topic I encourage you to create pull-requests for review or open issues with enough information to allow someone else to make the necessary changes.

ghost · 2016-06-15T06:14:05Z

Not decided yet, the solution from floam may not be universal. Need investigation.

llllvvuu · 2017-09-20T07:57:13Z

Forgive me for my ignorance, but might there a way to have the right prompt positioned correctly without getting wcwidth right? zsh somehow does this (I've managed to trip up their wcwidth but the rprompt is still positioned correctly). Is there an obvious reason that whatever they're doing is infeasible here, or would I have a fair shot of figuring it out?

faho · 2017-09-20T11:34:53Z

zsh somehow does this (I've managed to trip up their wcwidth but the rprompt is still positioned correctly)

Did you trip it up in the right prompt? Was the total width wrong (i.e. you didn't have two wrong characters, one over- and one undercounted)?

Here's what fish does to write the right prompt:

s_move(scr, &output, (int)(screen_width - right_prompt_width), (int)i);
s_set_color(scr, &output, 0xffffffff);
s_write_str(&output, right_prompt);

i.e. it moves the cursor to the starting position of the right prompt (which is screen_width - right_prompt_width), sets the color (which by definition has a width of 0 since no character shows up on screen) and then writes the prompt.

This depends on three things:

The color sequence needs to be one that the terminal either recognizes or throws away
The screen-width needs to be correct.
The right_prompt_width needs to be correct.

Now, there might be other ways of doing it (not that I can come up with one), but that's essentially what zsh does as well (at least that's what an asciinema recording tells me).

However, note that this bug is closed. I'd direct you to #2652 instead. Or open a new one and explain your actual problem there in detail.

tsujigiri · 2017-10-13T10:13:05Z

FWIW, I had an issue with color emoji in the left prompt, which messed things up, but turned out to be an issue with tmux. Making tmux use libutf8proc version >2 solved that one for me. The right prompt is still not aligned with the right border, though, so I searched here for leads. Maybe this helps.

JoshCheek · 2018-06-20T02:56:51Z

Heeeeeey, everybody!!

So yeah, I hit this issue, in iTerm2 on a Mac. Eventually figured out that iTerm has an option under "Text" to "Use Unicode version 9 widths", which is now checked by default. After unchecking, the problem was fixed. I imagine this sort of thing will keep coming up until everyone standardizes on the same widths, but at least I understand why it was wonky now and will have a sense of where to go looking next time it hits me.

Oh, and for more context, read iTerm's tooltip, which I captured in the screenshot.

ridiculousfish · 2018-06-23T21:03:17Z

We will never standardize on the same widths 😢

fish 3.0 has the fish_emoji_width variable which is effectively that same setting (Unicode 9 widths). So you can either tweak the iTerm2 setting or tweak fish, depending on whether you prefer 1 or 2 cell emoji. This should probably be a FAQ.

ridiculousfish added this to the fish-future milestone Jul 13, 2015

zanchey mentioned this issue Jul 15, 2015

Weird behavior with hamburger character #2069

Closed

faho mentioned this issue Jul 16, 2015

History function garbles output #895

Closed

nyarly mentioned this issue Jul 17, 2015

2.2.0 without sys wcwidth #2217

Closed

faho mentioned this issue Jan 4, 2016

Unicode support broken (was This ⸻ unicode ⸻ character ⸻ makes ⸻ weird ⸻ things ⸻ happen) #2652

Closed

ridiculousfish mentioned this issue Mar 11, 2016

Update pcre2 to 10.21 release and track as a git submodule instead of hosting a copy in version control #2808

Closed

ghost mentioned this issue Apr 10, 2016

fish_config prompt rendering is wrong #2924

Closed

faho added the needs more info label Apr 16, 2016

floam added duplicate and removed needs more info labels Jun 14, 2016

krader1961 added bug Something that's not working as intended and removed needs more info labels Jun 15, 2016

krader1961 added a commit that referenced this issue Jun 15, 2016

use fish_wcwidth rather than wcwidth

d70be18

Minor cleanup related to issue #2199.

krader1961 closed this as completed Jun 15, 2016

zanchey modified the milestones: next-2.x, 2.3.0 Jun 15, 2016

krader1961 modified the milestones: fish 2.3.0, next-2.x Sep 4, 2016

faho mentioned this issue Aug 1, 2017

Multi bytes characters breaking the prompt in Termux #4269

Closed

zanchey mentioned this issue Nov 28, 2017

Disable HAVE_BROKEN_WCWIDTH #4554

Closed

1 task

nitely mentioned this issue May 8, 2018

east_asian_width ? nitely/nim-unicodedb#3

Closed

github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020

Fish mis-identifies rightmost point of prompt with utf8 characters #2199

Fish mis-identifies rightmost point of prompt with utf8 characters #2199

Comments

nyarly commented Jul 13, 2015

ridiculousfish commented Jul 13, 2015

zanchey commented Jul 14, 2015

ridiculousfish commented Jul 14, 2015

pickfire commented Jul 14, 2015

nyarly commented Jul 14, 2015

ridiculousfish commented Jul 14, 2015

pickfire commented Jul 14, 2015

nyarly commented Jul 14, 2015

nyarly commented Jul 15, 2015

nyarly commented Jul 15, 2015

nyarly commented Jul 15, 2015

ridiculousfish commented Jul 16, 2015

nyarly commented Jul 16, 2015

zanchey commented Jul 23, 2015

ghost commented Oct 10, 2015

faho commented May 26, 2016

joehillen commented Jun 14, 2016

floam commented Jun 14, 2016

joehillen commented Jun 14, 2016

floam commented Jun 14, 2016

krader1961 commented Jun 15, 2016

floam commented Jun 15, 2016 • edited Loading

krader1961 commented Jun 15, 2016

floam commented Jun 15, 2016 • edited Loading

floam commented Jun 15, 2016

krader1961 commented Jun 15, 2016

ghost commented Jun 15, 2016 • edited by ghost Loading

krader1961 commented Jun 15, 2016

krader1961 commented Jun 15, 2016

ghost commented Jun 15, 2016

ghost commented Jun 15, 2016

krader1961 commented Jun 15, 2016

ghost commented Jun 15, 2016

llllvvuu commented Sep 20, 2017

faho commented Sep 20, 2017

tsujigiri commented Oct 13, 2017

JoshCheek commented Jun 20, 2018 • edited Loading

ridiculousfish commented Jun 23, 2018 • edited Loading

floam commented Jun 15, 2016 •

edited

Loading

floam commented Jun 15, 2016 •

edited

Loading

ghost commented Jun 15, 2016 •

edited by ghost

Loading

JoshCheek commented Jun 20, 2018 •

edited

Loading

ridiculousfish commented Jun 23, 2018 •

edited

Loading