Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Right-to-Left text displayed in incorrect column #1392

Closed
dotancohen opened this issue Jun 6, 2022 · 8 comments
Closed

Right-to-Left text displayed in incorrect column #1392

dotancohen opened this issue Jun 6, 2022 · 8 comments

Comments

@dotancohen
Copy link

Given the following CSV file:

$ cat test.csv
hello world,drink beer!,love people.
שלום עולם,שתה בירה!,תאהב אנשים.

The expected output would have the text שלום עולם under hello world, would have שתה בירה! under "dring beer!, and would have תאהב אנשים.underlove people`.

However, this is the display with the cursor on the first column:
Screenshot_20220606_095658

This is the display with the cursor on the second column:
Screenshot_20220606_095730

And this is the display with the cursor on the third column:
Screenshot_20220606_095805

It seems that the directionality is not reset between cells, so consecutive Right-to-Left cells are displayed from right to left instead of the expected left to right.

This is on Kubuntu 20.04 with Konsole 19.12.3 on KDE 5.18.8.

@dotancohen
Copy link
Author

dotancohen commented Jun 6, 2022

Perhaps this example will be easier for people who don't speak Hebrew to understand. Here I've used Hebrew letters that look similar to Latin letters.

A is under A. This is the same Latin letter, used for reference at the beginning of the line.
ק is under P.
ו is under I.
ס is under O.
Z is under Z. This is the same Latin letter, used for reference at the end of the line.

Here is the file:

$ cat test.csv
A,P,I,O,Z
A,ק,ו,ס,Z

This is the display with the cursor on the first column:
Screenshot_20220606_100750

This is the display with the cursor on the second column:
Screenshot_20220606_100809

This is the display with the cursor on the third column:
Screenshot_20220606_100824

This is the display with the cursor on the forth column:
Screenshot_20220606_100841

This is the display with the cursor on the fifth column:
Screenshot_20220606_100857

The only screenshot which looks as expected is with the cursor on the third column, because in that screenshot there are no unhighlighted consecutive Right-to-Left columns.

@saulpw
Copy link
Owner

saulpw commented Jun 12, 2022

Thanks so much for this detailed bug report, @dotancohen. I committed a potential fix which wraps all text with FSI/PDI as per Unicode TR9: Bidirectional Algorithm. Let's try this on all manner of terminals and see if it works for RTL text, and make sure it doesn't regress anything on weaker platforms (looking at you, Windows Terminal).

@MatanZ
Copy link

MatanZ commented Jun 14, 2022

I believe this is more a konsole bug than a visidata bug.

As a workaround (assuming you use UTF-8 terminal): In visidata config file change

options.disp_column_sep      =  '│'                  # separator between columns
options.disp_keycol_sep      =  '│'                  # separator between key columns and rest of columns
options.disp_status_sep      =  '│'                  # separator between statuses

The line character is U+2502, not ascii 0x7c. You can use any character from this unicode block.

Make sure "Use line characters contained in font" (in profile::Appearance) is unchecked.

@saulpw
Copy link
Owner

saulpw commented Jun 28, 2022

@MatanZ I think your solution is actually the better one. FSI/PDI corrupts the display on at least xterm and Windows, while using characters from the U+25xx block seems to do the better thing in nearly all cases. I have a preference for using the basic ASCII | character to separate (mainly because I like how it visually adds separation between the rows), but I think out-of-the-box RTL support is a lot more important. So we're going to change the default. Thanks for the suggestion!

@saulpw saulpw closed this as completed Jun 28, 2022
@dotancohen
Copy link
Author

dotancohen commented Jun 28, 2022

@MatanZ

Thank you for the suggestion.

In fact, just configuring the U+2502 did not resolve the issue for me, so as stated perhaps it is a Konsole / KDE issue. However, setting the separator as U+200EU+2502U+200E (wrapping the Box Drawings Light Vertical with two Left-to-Right Mark did in fact resolve the issue for me on Konsole, I don't know how well it will work with other systems or other workflows. Here is the config file, there are non-printing characters in there so you might want to copy-paste:

$ cat ~/.visidatarc
options.disp_column_sep = '│'
options.disp_keycol_sep = '│'
options.disp_status_sep = '│'

$ hexdump -C ~/.visidatarc
00000000  6f 70 74 69 6f 6e 73 2e  64 69 73 70 5f 63 6f 6c  |options.disp_col|
00000010  75 6d 6e 5f 73 65 70 20  3d 20 27 e2 80 8e e2 94  |umn_sep = '.....|
00000020  82 e2 80 8e 27 0a 6f 70  74 69 6f 6e 73 2e 64 69  |....'.options.di|
00000030  73 70 5f 6b 65 79 63 6f  6c 5f 73 65 70 20 3d 20  |sp_keycol_sep = |
00000040  27 e2 80 8e e2 94 82 e2  80 8e 27 0a 6f 70 74 69  |'.........'.opti|
00000050  6f 6e 73 2e 64 69 73 70  5f 73 74 61 74 75 73 5f  |ons.disp_status_|
00000060  73 65 70 20 3d 20 27 e2  80 8e e2 94 82 e2 80 8e  |sep = '.........|
00000070  27 0a                                             |'.|
00000072
  • e2 80 8e is the Left-to-Right Mark.
  • e2 94 82 is the Box Drawings Light Vertical

@saulpw
Copy link
Owner

saulpw commented Jul 4, 2022

@MatanZ does the recent change to U+2502 fix all RTL issues in cells for you?

@MatanZ
Copy link

MatanZ commented Jul 10, 2022

Yes. With the latest git it works displays correctly in konsole 22.04.1 (Fedora 36).

But note that in Unicode those characters are defined as neutral bidi class, so this fix will not work in terminals that implement the Unicode algorithm for bidi (which konsole does not).

I hope that konsole 22.12 will have a more standard compliant bidi rendering, but it will override those characters to strong LTR, so there won't be a problem.

@saulpw
Copy link
Owner

saulpw commented Jul 11, 2022

Hm, I'm wondering if there really is any 'general' solution, or if we have to do something different for RTL on each terminal. Or recommend a terminal on each platform that just works for RTL/bidi--which terminal is "best" from this standpoint?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants