support emojis with ZWJ and variant selectors #30014
@@ -146,7 +146,7 @@ CharSize charsize_regular(CharsizeArg *csarg, char *const cur, colnr_T const vco
   } else if (cur_char < 0) {
     size = kInvalidByteCells;
   } else {
-    size = char2cells(cur_char);
+    size = ptr2cells(cur);
This kind of makes it pointless to pass in cur_char, as utf_ptr2cells() already handles illegal bytes, and the cur_char >= 0x80 check can be replaced with MB_BYTE2LEN(*cur) > 1.
Or the logic in utf_ptr2cells() can be replicated here without the first utf_ptr2char() to avoid decoding the first char twice, but then that will also require passing in ci.chr.len, so not sure if that's worth it.
Yes, multiple decoding also happens in other places, like the main win_line() loop. We probably want a specialized version of CharInfo that also includes the ptr2cells() width, calculated at the same time as the byte length. I am thinking of that as a follow-up perf PR, though, while focusing only on correctness (and no larger regressions) in this PR.
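The combined decode discussed in this thread could look roughly like the sketch below. It is a standalone illustration, not Neovim's code: `CharInfoW`, `decode_with_width()`, `toy_cells()` and `INVALID_BYTE_CELLS` are hypothetical names, the width rule is a toy stand-in for the real width tables, and the value 4 for an invalid byte is an assumption (an invalid byte drawn as `<xx>`). Overlong encodings are not rejected.

```c
#include <stdint.h>

// Assumption: an invalid byte is drawn as "<xx>", i.e. 4 cells.
#define INVALID_BYTE_CELLS 4

// Hypothetical result type: code point, byte length and width in one pass.
typedef struct {
  int32_t value;  // decoded code point, or -1 for an invalid byte
  int len;        // bytes consumed (always >= 1)
  int cells;      // display width in terminal cells
} CharInfoW;

// Toy width rule (NOT the real tables): some CJK and emoji ranges are wide.
static int toy_cells(int32_t c)
{
  if ((c >= 0x4E00 && c <= 0x9FFF) || (c >= 0x1F300 && c <= 0x1FAFF)) {
    return 2;
  }
  return 1;
}

// Decode the UTF-8 sequence at p (NUL-terminated input) exactly once and
// report its byte length and width together.
static CharInfoW decode_with_width(const unsigned char *p)
{
  CharInfoW ci = { -1, 1, INVALID_BYTE_CELLS };
  unsigned char b = p[0];
  int len;
  int32_t c;

  if (b < 0x80) {
    len = 1; c = b;
  } else if ((b & 0xE0) == 0xC0) {
    len = 2; c = b & 0x1F;
  } else if ((b & 0xF0) == 0xE0) {
    len = 3; c = b & 0x0F;
  } else if ((b & 0xF8) == 0xF0) {
    len = 4; c = b & 0x07;
  } else {
    return ci;  // stray continuation byte or invalid lead byte
  }

  for (int i = 1; i < len; i++) {
    if ((p[i] & 0xC0) != 0x80) {
      return ci;  // truncated sequence (this also stops at the NUL)
    }
    c = (c << 6) | (p[i] & 0x3F);
  }

  ci.value = c;
  ci.len = len;
  ci.cells = toy_cells(c);
  return ci;
}
```

A caller would then read `ci.len` and `ci.cells` from the same result instead of decoding the bytes a second time to get the width.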
@@ -352,7 +352,7 @@ static inline CharSize charsize_fast_impl(win_T *const wp, bool use_tabstop, col
   if (cur_char < 0) {
     width = kInvalidByteCells;
   } else {
-    width = char2cells(cur_char);
+    width = ptr2cells(cur);
ditto
 {
   if (cur_char == TAB && use_tabstop) {
     return tabstop_padding(vcol, buf->b_p_ts, buf->b_p_vts_array);
   } else if (cur_char < 0) {
     return kInvalidByteCells;
   } else {
-    return char2cells(cur_char);
+    return ptr2cells(cur);
ditto
Use the grapheme break algorithm from utf8proc to support grapheme clusters from recent unicode versions. Handle variant selector VS16 turning some codepoints into double-width emoji. This means we need to use ptr2cells rather than char2cells when possible.
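As a rough illustration of why the width has to be computed from the pointer rather than from a single decoded code point, the sketch below compares the two on a base character followed by VS16 (U+FE0F). This is not the code from this PR: `decode()`, `toy_emoji_capable()`, `toy_char_cells()` and `toy_ptr_cells()` are hypothetical helpers, the three-entry emoji table is a toy stand-in for the real Unicode data, and the input is assumed to be well-formed, NUL-terminated UTF-8.

```c
#include <stdbool.h>
#include <stdint.h>

#define VS16 0xFE0F  // variation selector-16: requests emoji presentation

// Decode one UTF-8 sequence at p and store its byte length in *len.
// Assumption: the input is well-formed, so no validation is done here.
static int32_t decode(const unsigned char *p, int *len)
{
  if (p[0] < 0x80) {
    *len = 1;
    return p[0];
  }
  if ((p[0] & 0xE0) == 0xC0) {
    *len = 2;
    return ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
  }
  if ((p[0] & 0xF0) == 0xE0) {
    *len = 3;
    return ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
  }
  *len = 4;
  return ((p[0] & 0x07) << 18) | ((p[1] & 0x3F) << 12)
         | ((p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}

// Toy table: a few code points that have an emoji variation sequence.
static bool toy_emoji_capable(int32_t c)
{
  return c == 0x2600 || c == 0x263A || c == 0x2764;
}

// Width from the code point alone: it cannot see what follows.
static int toy_char_cells(int32_t c)
{
  return (c >= 0x1F300 && c <= 0x1FAFF) ? 2 : 1;
}

// Width from the pointer: it can look ahead for a trailing VS16.
static int toy_ptr_cells(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;
  int len;
  int32_t base = decode(p, &len);
  int cells = toy_char_cells(base);

  if (base != 0) {
    int next_len;
    if (decode(p + len, &next_len) == VS16 && toy_emoji_capable(base)) {
      cells = 2;  // VS16 turns the narrow text symbol into a wide emoji
    }
  }
  return cells;
}
```

With these toy rules, `toy_char_cells(0x2764)` can only answer 1, while `toy_ptr_cells()` on the bytes for U+2764 U+FE0F sees the selector and answers 2. A base that is not in the emoji table, such as `a` followed by U+FE0F, stays at width 1.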
@bfredl sorry to ping you but I have been searching everywhere to try and figure out a solution to my problem and I believe it is similar to the problem(s) you were aiming to fix with this PR. I posted the repro and details in a post in the Neovim reddit here: https://www.reddit.com/r/neovim/comments/1f6z9da/help_with_1_keycap_digit_1_emoji_sequence_with/ I am happy to give you more details or move this conversation somewhere else if you prefer, but the TLDR is that the emoji 1️⃣ (and the other similar numbers 2-9) are having problems and I believe it is due to the multiple code points. It is comprised of |
That probably can't be fixed due to performance reasons.
@zeertzjq thanks for the quick reply! Is there some sort of "fallback" workaround that I could implement in my config? Like an autocmd that would render emojis like these as a broken icon or an alternative emoji (something I would choose)? My team uses these number emojis a lot in comments, so it is not an option for me to just remove or replace them. But I am totally fine if I just render emojis like these as compatible ones ("replace" how it renders on the client side but not alter the actual emoji, as I don't want to create / commit a change). I would just create a mapping list and add to it anytime these pop up. I'm just not sure if there is a reasonable way to do this (presumably an autocmd)? Thanks!!
we could at least mark anything + 0xFE0F as having ambiguous terminal width ( |
Thanks @bfredl ! That sounds great to me! My issue is not the display of the emoji itself but the fact it throws the rest of the line (and often surrounding lines) off. A couple questions:
Thanks so much for the quick response!
PSA: Please don't leave tangential comments on a PR (especially a merged one)! If you have a problem, open an issue (yes, that means filling out the template; it's annoying but there for a reason!). That would also allow a PR to be linked to it, so you wouldn't have missed #30232.
The implementation of grapheme clusters was upgraded to closely follow extended grapheme clusters as defined by UAX#29 in the Unicode standard. Notably, this enables proper display of many more emoji characters than before, including those encoded with multiple emoji codepoints combined with ZWJ (zero width joiner) codepoints and variant selectors.
Fix #7151
Fix #22014