Hide the <U200C> character. #1851

Closed
rnmhdn opened this issue Jul 26, 2019 · 9 comments

@rnmhdn

rnmhdn commented Jul 26, 2019

In Persian writing we use the <U200C> character (ZERO WIDTH NON-JOINER) a lot, because we sometimes need to separate two letters that are in the same word.
For example:
میروم is a single word in Persian, but we don't write it like that. We write it like:
می‌روم
As you can see, the character 'ی' here is supposed to look exactly like it does at the end of a word, right before a whitespace. People who don't know how to type that character write the word like:
می روم
Now the problem is that kitty shows the second version, which is the correct one, like this:
روممی
And that's not particularly great. Would you be kind enough to fix it, if it's easy enough?
If you can't make it look the way it should (the correct way would be to show nothing for that character, while the character before it still looks as if a space came after it), it would be good if you could at least make it look just like a space.
Many thanks for your wonderful terminal :)

@rnmhdn
Author

rnmhdn commented Jul 26, 2019

Or maybe I should have asked: how can I specify different fonts for different languages? :))

@SolitudeSF
Contributor

@kovidgoyal
Owner

Sorry, this is not making much sense to me; my eyes have a hard time distinguishing the Persian script characters. Can you please explain it using Unicode codepoints? Something like: this sequence of Unicode codepoints should render like X, but instead kitty renders it like Y.

@rnmhdn
Author

rnmhdn commented Jul 28, 2019

The following sequence of Unicode characters:
U+0645
U+06CC
U+200C
U+0631
U+0648
U+0645
Should render like: می‌روم
But kitty renders it like: روم<2008>می
(Replace the '8' with 'c'. I can't make GitHub display a combination of Persian and English characters correctly.)
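To double-check the sequence above, the codepoints can be rebuilt and inspected with Python's standard `unicodedata` module. This is a minimal sketch added for illustration. It only prints each character's official Unicode name and its bidirectional class.

```python
import unicodedata

# The six codepoints listed above, in logical (typing) order.
codepoints = [0x0645, 0x06CC, 0x200C, 0x0631, 0x0648, 0x0645]
word = "".join(chr(cp) for cp in codepoints)

for ch in word:
    # Official Unicode name and bidirectional class of each character.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<28}  bidi={unicodedata.bidirectional(ch)}")
```

U+200C comes out as ZERO WIDTH NON-JOINER with bidi class BN (boundary neutral), so it carries no direction of its own. The five letters are class AL (Arabic letter), which is strongly right-to-left.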
What kitty is doing is rendering the ZERO WIDTH NON-JOINER character as the sequence <200c>. Vim also does this, on purpose, to all Unicode characters that can't be displayed. When it is rendered like that, something else happens too: the two parts of the single word, 'می' and 'روم', are swapped. But that is a whole different issue that kitty has with RTL text.
The issue is that kitty doesn't exactly display RTL as RTL. What kitty does is render each sequence of characters between two LTR characters as RTL.
By RTL characters I mean characters that are specific to an RTL language like Arabic or Persian. Every other character, even the space, is considered LTR. So kitty renders each Persian word (a continuous sequence of Persian characters) correctly, but places the words in reversed order. So if I have the text:
سلام دوست عزیز من از شما بابت این شبیه ساز ترمینالی که ساختید ممنونم
kitty will render it like:
ممنونم ساختید که ترمینالی ساز شبیه این بابت شما از من عزیز دوست سلام
or, to make it more readable for you and less fun for the future Persian people who are going to read this:
گگگ ررر ببب ههه
would render as:
ههه ببب ررر گگگ
As you can see the order of the words is reversed.
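The behaviour described above can be reproduced with a small toy model. This is a sketch of what the comment reports, not kitty's or HarfBuzz's code. It assumes three things:

1. Every character that is not strongly right-to-left, including the space, is treated as LTR.
2. Each RTL run is reversed in place.
3. The runs themselves stay in left-to-right order.

The function names are hypothetical.

```python
import unicodedata

def is_rtl(ch):
    # Strong right-to-left bidi classes: R (e.g. Hebrew) and AL (Arabic letters).
    return unicodedata.bidirectional(ch) in ("R", "AL")

def naive_display(text):
    """Toy model of the reported behaviour: returns the characters in
    left-to-right screen-cell order. Each maximal RTL run is reversed, and
    everything else (including spaces) stays in place."""
    cells, run = [], []
    for ch in text:
        if is_rtl(ch):
            run.append(ch)
        else:
            cells.extend(reversed(run))  # flush the pending RTL run, reversed
            run = []
            cells.append(ch)
    cells.extend(reversed(run))
    return "".join(cells)

def rtl_line_display(text):
    """Left-to-right cell order for a line containing only RTL letters and spaces:
    the whole line is reversed, so the first word ends up at the right edge."""
    return text[::-1]

line = "\u0633\u0644\u0627\u0645 \u062f\u0648\u0633\u062a"  # "سلام دوست"
print(naive_display(line))
print(rtl_line_display(line))
```

In the naive model the first word lands at the left edge of the line. A right-to-left line needs it at the right edge, which matches the swapped word order described above.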
So there are two issues:
1. Each non-RTL character causes the surrounding words to swap.
2. The non-displayable \u200c character renders as <200c>.
There is a point to consider when fixing the second problem: Persian characters are context sensitive, so the Unicode character U+06CC renders as یـ if it is placed before a normal Persian character, but as ی if it is placed before a whitespace.

So if you make the character disappear, it's important that the ی before it doesn't turn into یـ, as if the next character were something ی can connect to. The ی should still look like ی, as if the next character were a whitespace, even though the next character is not visible. This is unlikely to cause a problem, though, so disregard what I said about ی and یـ; I will tell you about it if it actually becomes a problem.
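The joining concern can be illustrated with a short sketch. It assumes a deliberately tiny, hypothetical joining table that covers only the letters of the example word. Real shapers such as HarfBuzz use the full Unicode joining data instead.

In this model, a letter's contextual form depends only on whether it links to its neighbours. ZWNJ is in neither table, so it links to nothing. As a result, ی before U+200C gets the same form as ی before a space.

```python
# Hypothetical, deliberately tiny joining table for the example word only.
DUAL_JOINING = {"\u0645", "\u06cc"}   # meem, farsi yeh: can join on both sides
RIGHT_JOINING = {"\u0631", "\u0648"}  # reh, waw: join only to the preceding letter

def joins_forward(ch):
    # Can ch connect to the letter that follows it (in logical order)?
    return ch in DUAL_JOINING

def joins_backward(ch):
    # Can ch connect to the letter that precedes it?
    return ch in DUAL_JOINING or ch in RIGHT_JOINING

def contextual_form(text, i):
    """Return 'isolated', 'initial', 'medial' or 'final' for text[i].
    ZWNJ (U+200C) and space are in neither table, so they break joining."""
    ch = text[i]
    prev_ch = text[i - 1] if i > 0 else None
    next_ch = text[i + 1] if i + 1 < len(text) else None
    link_prev = prev_ch is not None and joins_forward(prev_ch) and joins_backward(ch)
    link_next = next_ch is not None and joins_forward(ch) and joins_backward(next_ch)
    if link_prev and link_next:
        return "medial"
    if link_prev:
        return "final"
    if link_next:
        return "initial"
    return "isolated"

with_zwnj = "\u0645\u06cc\u200c\u0631\u0648\u0645"  # meem, yeh, ZWNJ, reh, waw, meem
without_zwnj = "\u0645\u06cc\u0631\u0648\u0645"
print(contextual_form(with_zwnj, 1), contextual_form(without_zwnj, 1))
```

Without the ZWNJ, ی links to the following ر and takes its medial (connecting) form, which is the یـ-like shape the comment warns about.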

@kovidgoyal
Owner

kitty never displays any characters as <200c> or similar. That will be
your shell/vim/other terminal program. Try running

printf 'a\u200Cb'

and it will be displayed as

ab

as expected.

I suspect both your issues are caused by whatever is converting U+200C
to the escaped form. When it does that, kitty sees the <200c>
sequence instead of the Unicode character, so the text becomes a
sequence of RTL chars followed by LTR chars followed by RTL chars.
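The point about the escaped form can be checked by classifying each character by its strong bidi direction. This simplified sketch ignores most of the Unicode Bidirectional Algorithm.

It shows that the raw word contains no LTR characters. The escaped version, by contrast, contains the ASCII letter 'c', which is strongly LTR and splits the word into two RTL runs. The `replace` call is a hypothetical stand-in for whatever program does the escaping.

```python
import unicodedata

def strong_directions(text):
    """One symbol per character: 'R' for strong RTL (bidi class R or AL),
    'L' for strong LTR, '-' for everything else (neutral or weak)."""
    out = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            out.append("R")
        elif bidi == "L":
            out.append("L")
        else:
            out.append("-")
    return "".join(out)

word = "\u0645\u06cc\u200c\u0631\u0648\u0645"  # meem, yeh, ZWNJ, reh, waw, meem
escaped = word.replace("\u200c", "<200c>")     # hypothetical escaping program

print(strong_directions(word))
print(strong_directions(escaped))
```

The first line has only R and one neutral (the ZWNJ). The second has an L in the middle, which is where the RTL word gets split.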

@rnmhdn
Author

rnmhdn commented Jul 28, 2019

You are right, the <200c> I get does not come from kitty. I'm still very confused, though, because it works fine in the browser and other apps; it only fails in the terminal.

There is still the other issue: kitty treats the space and most punctuation that is not specific to Persian as LTR characters, and that breaks every RTL text, because almost every RTL text contains at least one space. Maybe the direction of the space and similar characters could be determined from the context.
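Determining the direction of spaces from context is what the Unicode Bidirectional Algorithm (UAX #9) does for neutral characters. What follows is a heavily simplified sketch of just its rule N1. A run of "neutrals" between two strong characters of the same direction takes that direction. In this sketch, "neutral" means anything that is not strongly L or R, which includes spaces, punctuation, digits and ZWNJ.

Real implementations also handle weak types, embedding levels, and the paragraph direction for unresolved neutrals (rule N2). None of that is modelled here, and the function names are hypothetical.

```python
import unicodedata

def char_class(ch, neutral_is_ltr=False):
    """Classify ch as 'R' (strong RTL), 'L' (strong LTR) or 'N' (everything else).
    With neutral_is_ltr=True, non-RTL characters are all forced to 'L'."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "R"
    if bidi == "L" or neutral_is_ltr:
        return "L"
    return "N"

def directional_runs(text, neutral_is_ltr=False):
    """Split text into (direction, substring) runs, resolving neutrals that sit
    between two strong characters of the same direction (simplified rule N1).
    Other neutrals are left as 'N'."""
    classes = [char_class(ch, neutral_is_ltr) for ch in text]
    resolved = list(classes)
    i = 0
    while i < len(classes):
        if classes[i] != "N":
            i += 1
            continue
        j = i
        while j < len(classes) and classes[j] == "N":
            j += 1  # find the end of this sequence of neutrals
        before = classes[i - 1] if i > 0 else None
        after = classes[j] if j < len(classes) else None
        if before is not None and before == after:
            for k in range(i, j):
                resolved[k] = before
        i = j
    runs = []
    for ch, d in zip(text, resolved):
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((d, ch))              # start a new run
    return runs

text = "\u06af\u06af\u06af \u0631\u0631\u0631 \u0628\u0628\u0628 \u0647\u0647\u0647"  # "گگگ ررر ببب ههه"
print(len(directional_runs(text)), len(directional_runs(text, neutral_is_ltr=True)))
```

With context-resolved spaces, the whole line stays a single RTL run. Treating the space as LTR (`neutral_is_ltr=True`) splits it into seven runs, which is the situation described in this comment.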

@kovidgoyal
Owner

That's because many (most) terminal apps have no support for complex
scripts. As for the punctuation/space issue that's going to be far from
trivial to deal with, more work than I am willing to put in certainly.

@rnmhdn
Author

rnmhdn commented Oct 12, 2019

kitty tries to render RTL text correctly, unlike other, more simplistic terminals such as tilda. The problem with this is that if you do :set rl in vim, RTL text renders correctly in simple terminals, but not in kitty. Is there a way to tell kitty to treat RTL text as LTR?

@kovidgoyal
Owner

kitty doesn't try to do RTL; that comes from harfbuzz, which is used to
shape the text in kitty. I don't know whether there is a flag one can pass to
harfbuzz to turn it off. If there is, I will be happy to add a
kitty setting for it.
