Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

viewing binary files breaks output to kitty terminal #2084

Closed
unxed opened this issue Mar 19, 2024 · 22 comments · Fixed by #2107
Closed

viewing binary files breaks output to kitty terminal #2084

unxed opened this issue Mar 19, 2024 · 22 comments · Fixed by #2107

Comments

@unxed
Copy link
Contributor

unxed commented Mar 19, 2024

Noticed here:

When viewing binary files, far2l consistently blows the minds of the kitty terminal (it starts drawing weird things). On the one hand, the terminal itself probably has bugs, but mc under similar conditions does not produce such an effect.

And here:

visual artifacts and audio beeps

Kitty has this behavior documented and suggests using cat -v for binary data:

I catted a binary file and now kitty is hung?

Never output unknown binary data directly into a terminal.

Terminals have a single channel for both data and control. Certain bytes are control codes. Some of these control codes are of arbitrary length, so if the binary data you output into the terminal happens to contain the starting sequence for one of these control codes, the terminal will hang waiting for the closing sequence. Press ctrl+shift+delete to reset the terminal.

If you do want to cat unknown data, use cat -v.

mc does not have this bug, it just replaces all control codes with . char.

How to reproduce?

  1. sudo apt install kitty
  2. run kitty
  3. far2l --tty in kitty
  4. view initrd.img
  5. hold Down key
  6. wait for beeps
@akruphi
Copy link
Contributor

akruphi commented Mar 19, 2024

Dub #2068

@anta999
Copy link
Contributor

anta999 commented Mar 19, 2024

I scrolled through the entire initrd.img and did not reproduce it :))

@unxed
Copy link
Contributor Author

unxed commented Mar 19, 2024

Under kitty? Kovidgoyal's kitty?

@anta999
Copy link
Contributor

anta999 commented Mar 19, 2024

Yes.. But i got it after switching to UTF-8

@unxed
Copy link
Contributor Author

unxed commented Mar 19, 2024

Please note that even with -a option beeps are still happening. Looks like -a do not work as expected.

@unxed
Copy link
Contributor Author

unxed commented Mar 20, 2024

@Dazzar56 was planning to look into this.

unxed added a commit to unxed/far2ltricks that referenced this issue Mar 22, 2024
@unxed
Copy link
Contributor Author

unxed commented Mar 22, 2024

Actually, it's several separate issues.

  1. Some terminals produce sound on BELL char, so we probably should not output it.

  2. Visual artifacts in TTY mode (tested in GNOME terminal and kitty). To reproduce:
    A) view this file as UTF-8, wrap disabled
    https://upload.wikimedia.org/wikipedia/commons/a/af/K%C3%B6lner_Dom_-_Westfassade_2022_ohne_Ger%C3%BCst-0968.jpg
    B) hold Down key
    C) look at the bottom of the screen
    D) be patient

  3. Visual artifacts in GNOME terminal. To reproduce:
    A) view the same file as UTF-8, wrap disabled
    B) press Page Down
    C) press Page Up

  4. visual artifacts in tty mode #2095

@unxed
Copy link
Contributor Author

unxed commented Mar 22, 2024

This solves problems 1, 2 and 4 for me

diff --git a/WinPort/src/Backend/TTY/TTYOutput.cpp b/WinPort/src/Backend/TTY/TTYOutput.cpp
index a49cd5a0..ff8a1842 100644
--- a/WinPort/src/Backend/TTY/TTYOutput.cpp
+++ b/WinPort/src/Backend/TTY/TTYOutput.cpp
@@ -296,6 +296,12 @@ void TTYOutput::FinalizeSameChars()
 
 void TTYOutput::WriteWChar(WCHAR wch)
 {
+	for (int i = 0; i < 32; i++) {
+		if ((i != 27) && (wch == i)) {
+			wch = L'?';
+		}
+	}
+			
 	if (_same_chars.count == 0) {
 		_same_chars.wch = wch;
 
@@ -395,6 +401,15 @@ void TTYOutput::ChangeCursor(bool visible, bool force)
 void TTYOutput::MoveCursorStrict(unsigned int y, unsigned int x)
 {
 // ESC[#;#H Moves cursor to line #, column #
+
+	const char *tp = getenv("TERM_PROGRAM");
+	//if (tp && strcasecmp(tp, "WezTerm") == 0) {
+		Format(ESC "[%d;%dH", y, x);
+		_cursor.x = x;
+		_cursor.y = y;
+		return;
+	//}
+
 	if (x == 1) {
 		if (y == 1) {
 			Write(ESC "[H", 3);
@@ -412,6 +427,13 @@ void TTYOutput::MoveCursorStrict(unsigned int y, unsigned int x)
 
 void TTYOutput::MoveCursorLazy(unsigned int y, unsigned int x)
 {
+	// workaround for https://github.com/elfmz/far2l/issues/1889
+	const char *tp = getenv("TERM_PROGRAM");
+	//if (tp && strcasecmp(tp, "WezTerm") == 0) {
+		MoveCursorStrict(y, x);
+		return;
+	//}
+
 	if (_cursor.y != y && _cursor.x != x) {
 		MoveCursorStrict(y, x);
 

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

test1.txt
test2.txt

Some test files. I'll just leave them here for now.

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

test1 issues are solved by patch above.

test2 in TTY shows "?" in panels after closing viewer (need to have some free space on them; short file list, for example). Not reproduced on latest Mint Cinnamon, reproduces on Ubuntu 23.10. Font and x11 or Wayland does not matter.

Also pattern from test2 produce another bug (reproduces in Mint and in wx version also):
#2096

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

Hypothesis: the bugs remaining after the patch above are due to incorrect calculation of the width of some wide characters. So it manifests itself differently on different systems, because... Unicode libs are different.

That is, this is not the output of control characters 0-31, as we thought. Simply invalidating parts of the screen that need redrawing does not seem to work correctly because some characters that should be counted as double width are counted as single width. Or they should have a single width, but are considered to be zero width.

But this is only a hypothesis for now.

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

If we put the same sequence, say, in php:

<?php
$bytes = pack('C*', 0xE2, 0x80, 0xA2, 0xDA, 0x92, 0x3F);
echo $bytes . "\n";
echo mb_strwidth($bytes) . "\n";

we will see:

~/1111$ php ./test.php
•?
3

far2l, apparently, considers this construction to have a width of two characters, and, accordingly, sees no point in invalidating the 3rd one, so it remains on the screen.

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

GNOME Terminal and GNOME Text editor also treat 0xE2, 0x80, 0xA2, 0xDA, 0x92, 0x3F as 3 char width.
But far2l only shows two.

GNOME text editor also recognizes this sequence as RTL that is not supported by far2l, see
#1380

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

Shortest way to reproduce: try this in far2l and in GNOME Terminal:

echo -e "\xda\x92\x3f"

GNOME shows 2 chars, far2l shows 1

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

Похоже, мы имеем дело со сложным багом, вернее даже тремя:

  1. бипы пролезают в терминал, хотя не должны

  2. lazy смена координат курсора не работает в части терминалок (это требует внимательной перепроверки)

  3. вот эта вот последовательность
    \xda\x92\x3f
    при интерпретации в качестве utf-8 теряет один символ

Может и ещё что-то есть. Это только те причины, которые пока что удалось идентифицировать.

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

Another way to reproduce: try to paste this in editor:

•ڒ?

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

Fix, works for me. But this why file is generated wrongly?

diff --git a/utils/src/CharClasses.cpp b/utils/src/CharClasses.cpp
index aff680e5..3e2d28ab 100644
--- a/utils/src/CharClasses.cpp
+++ b/utils/src/CharClasses.cpp
@@ -141,7 +141,7 @@ bool IsCharPrefix(wchar_t c)
 		case 0x648:
 		case 0x671 ... 0x673:
 		case 0x675 ... 0x677:
-		case 0x688 ... 0x699:
+//		case 0x688 ... 0x699:
 		case 0x6c0:
 		case 0x6c3 ... 0x6cb:
 		case 0x6cd:

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

#2097 fixes #2084, #2095 and #2096 completely. No need to touch TTYOutput.cpp actually.

@unxed
Copy link
Contributor Author

unxed commented Mar 23, 2024

А, не, не всё так идеально. Часть артефактов ушла, но часть осталась.

Причем на mint не воспроизводится, а на ubuntu воспроизводится. Есть версия, что такое происходит, когда версия юникода, под которую собрана ICU, которой генерится CharClasses.cpp, отличается от версии, с которой терминал собран, например.

Если сделать вообще вот так:

bool IsCharFullWidth(wchar_t c)
{
	return (wcwidth(c) == 2);
}

bool IsCharPrefix(wchar_t c)
{
	return (wcwidth(c) == 0);
}

bool IsCharSuffix(wchar_t c)
{
	return (wcwidth(c) == 0);
}

, то артефактов становится ещё меньше, но всё равно часть остаётся. Проверять можно на бинарнике far2l, например: в районе 58-60% проблемы и 62-64%.

@unxed
Copy link
Contributor Author

unxed commented Mar 27, 2024

Состояние на сейчас: бипы ушли. Артефакты, увы, остались. Видны справа на бинарнике far2l при включённом wrap'е (с самого начала файла, и вниз-вверх несколько экранов пролистать), но тут влиять размер окна может, надо пробовать на разных.

#2107 проблему решает, хотя я и не уверен, что идеалогически правильно использовать wcwidth() для дополнительной проверки. Это просто то, что было найдено эмпирически, и в паре с обновлением CharClasses.cpp на последней Убунте сработало на практике.

@unxed
Copy link
Contributor Author

unxed commented Apr 7, 2024

На самом деле там артифакты даже с этим фиксом есть, просто меньше гораздо. Эту тему ещё поизучать предстоит более пристально. Хотел инструментов для тестирования дождаться спрева, прежде чем трогать, чтоб не сломать что-нибудь случайно.

@unxed
Copy link
Contributor Author

unxed commented Apr 8, 2024

По оставшемуся — вот:
#2132

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants