Stop ascribing meaning to codepoints #787

Closed
ghost opened this issue Aug 4, 2018 · 3 comments

Comments

@ghost

ghost commented Aug 4, 2018

Manish Goregaokar explains this better than I ever will: https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/. The issues described in his blog post also show up in kitty. For example, ZWJs (U+200D ZERO WIDTH JOINER) are rendered as <200d> at the prompt and ignored in the output. This is what echo "👨‍👩‍👧‍👦" looks like at the prompt:

$ echo "👨<200d>👩<200d>👧<200d>👦"
👨👩👧👦
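To make the codepoint versus glyph distinction concrete, here is a small Python sketch (an illustration using only the standard `unicodedata` module, not anything from kitty) that builds the family emoji from its four member emoji joined by U+200D and lists the codepoints. Each `<200d>` shown at the prompt is one of those joiners.

```python
import unicodedata

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER

# The family emoji is four person emoji glued together with ZWJs.
members = ["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"]  # man, woman, girl, boy
family = ZWJ.join(members)

def codepoints(s: str) -> list[str]:
    """Return each codepoint of s as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]

for line in codepoints(family):
    print(line)

print(len(family))  # 7 codepoints, yet meant to display as a single glyph
```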

On the other hand, 👩🏽 renders correctly but takes up two cells (both at the prompt and in the output), which is consistent with #461 (comment).

Text in non-Latin scripts also behaves oddly: 각 shows up at the prompt as ᄀ<1161><11a8> but renders correctly in the output, where it takes up only one cell even though it is made up of three codepoints. That is not consistent with #461 (comment).
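The ᄀ<1161><11a8> shown at the prompt is the decomposed (NFD) spelling of 각: a leading consonant, a vowel and a trailing consonant written as three separate conjoining jamo. Python's standard `unicodedata` module can convert between the two forms; this only illustrates the Unicode side and says nothing about how kitty or the shell handles them.

```python
import unicodedata

precomposed = "\uac01"  # 각 as a single precomposed syllable (NFC form)
decomposed = unicodedata.normalize("NFD", precomposed)

# NFD splits the syllable into three conjoining jamo.
print([f"U+{ord(c):04X}" for c in decomposed])
# ['U+1100', 'U+1161', 'U+11A8']

# NFC recombines the jamo into the single precomposed codepoint.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```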

All in all, this is probably a (really) hard problem to solve, but it would be greatly appreciated if you could look into it.

@kovidgoyal
Owner

ZWJ is not currently supported. This is something that can be
implemented, but it is not something I am particularly keen to implement,
since it is a performance hit in general use for a pretty useless
feature (its only use is for combining emojis and Arabic scripts). It
means that one now has to store a potentially unbounded number of Unicode
codepoints per cell instead of the current three. This will have a
terrible impact on common-case performance, which makes it not
worthwhile.
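To see why a fixed per-cell budget and ZWJ sequences clash, here is a hypothetical Python model of a grid cell with room for three codepoints (the number given in the comment). The `Cell` class and `CELL_CAPACITY` constant are purely illustrative and do not reflect kitty's real cell layout. Short combining sequences fit; the ZWJ family emoji does not.

```python
from dataclasses import dataclass, field

CELL_CAPACITY = 3  # codepoints per cell, per the comment; hypothetical model only

@dataclass
class Cell:
    """Hypothetical grid cell with a fixed codepoint budget."""
    chars: list[str] = field(default_factory=list)

    def store(self, grapheme: str) -> bool:
        """Store grapheme if it fits; return False if it would overflow the cell."""
        if len(grapheme) > CELL_CAPACITY:
            return False
        self.chars = list(grapheme)
        return True

samples = {
    "e + combining acute": "e\u0301",             # 2 codepoints
    "woman + skin tone": "\U0001F469\U0001F3FD",  # 2 codepoints
    "family (ZWJ)": "\u200d".join(["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"]),  # 7
}

for label, g in samples.items():
    print(f"{label}: {len(g)} codepoints, fits={Cell().store(g)}")
```

Supporting arbitrary sequences would mean either enlarging every cell or adding some out-of-line storage for long clusters; either way the ordinary case pays for it, which is presumably the trade-off the comment is weighing.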

As for the skin tone combining char, that looks like a bug and will
probably be fixable. Feel free to open a separate issue for that.

Finally, 각 not showing up at the prompt is a bug in whatever
shell you are running. Run cat instead of a shell and it will render
correctly, although it will take up three cells, not one, which is a
fundamental limitation of kitty's design as a character grid.

I should note that in general, kitty will never work well with the more
complex scripts such as Arabic/Indic languages, since it is designed as
a character grid and those scripts do not fit into the character grid
paradigm. See #704.

@kovidgoyal
Owner

Emoji skin tone modifiers are now properly recognized as combining chars. 000c1cf
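The emoji skin tone modifiers are the five codepoints U+1F3FB to U+1F3FF (EMOJI MODIFIER FITZPATRICK TYPE-1-2 through TYPE-6). Below is a minimal Python sketch of treating them as attaching to the preceding codepoint, in the spirit of the change above; the function names are hypothetical and this is not kitty's code.

```python
SKIN_TONE_FIRST, SKIN_TONE_LAST = 0x1F3FB, 0x1F3FF  # Fitzpatrick modifier range

def is_skin_tone_modifier(ch: str) -> bool:
    """True if ch is one of the five emoji skin tone modifiers."""
    return SKIN_TONE_FIRST <= ord(ch) <= SKIN_TONE_LAST

def group_clusters(text: str) -> list[str]:
    """Split text into units, attaching each skin tone modifier to the previous codepoint."""
    clusters: list[str] = []
    for ch in text:
        if clusters and is_skin_tone_modifier(ch):
            clusters[-1] += ch  # combine with the previous unit, like a combining mark
        else:
            clusters.append(ch)
    return clusters

print(len(group_clusters("\U0001F469\U0001F3FDa")))  # 2: the toned emoji, then 'a'
```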

@kovidgoyal
Owner

kovidgoyal commented Aug 4, 2018

Oh, and your issue with ZWJ is also caused by your shell. Use cat or an up-to-date bash and 👨‍👩‍👧‍👦 will render without the <200d>.
