Fix the issue with Windows code points from Supplementary Planes #392

Konstantin-Glukhov · 2023-06-25T06:28:14Z

This pull request is to address the issue #391 "Windows code points from Supplementary Planes (U+010000 to U+10FFFF) are not accepted from keyboard"

Please provide feedback.

gwsw · 2023-06-26T21:32:27Z

screen.c

+				ip.Event.KeyEvent.wVirtualScanCode == 0 ||
+				ip.Event.KeyEvent.dwControlKeyState & (RIGHT_ALT_PRESSED|LEFT_CTRL_PRESSED)
+				)
+			) ||


Why was the dwControlKeyState logic changed? I don't really know what the whole purpose of this RIGHT_ALT/LEFT_CTRL thing is.

I did not touch the logic, the only thing I did is moving it into the loop and grouping by dwControlKeyState in one "if" statement to break out of the loop if dwControlKeyState is set.

Previously the test was for both RIGHT_ALT_PRESSED and LEFT_CTRL_PRESSED to be set. After your change it only requires either one to be set.

Apology, I misunderstood the question. I thought it was about the other part with dwControlKeyState.
I think I just optimized the code here, but please point me where I am wrong. Here is my thinking:
A & (B | C) == (B | C) is equivalent to A & (B | C) == TRUE
To illustrate, let's say A = 1111, B = 0001 and C = 0010
1111 & (0001 | 0010) == 0011
1111 & 0011 == 0011
0011 == 0011
0011 == TRUE

Suppose A=0001, B=0001, C=0010.
Then A&(B|C) == (B|C)
is (0001 & 0011) == 0011
is 0001 == 0011
which is false.

But A&(B|C) == TRUE
is (0001 & 0011) == TRUE
is 0001 == TRUE
which is true.

I will fix it.

gwsw · 2023-06-26T21:33:29Z

screen.c

+		) continue;
+
+		// TODO: Document what the following block does
+		if (ip.Event.KeyEvent.dwControlKeyState) {


Why was this conditional added here? That is, why are we doing this stuff only when dwControlKeyState is set?

The grouping of all checks is to break the loop (actually return to the caller) if dwControlKeyState is set. This logic was just moved from the bottom of the function into the loop.

gwsw · 2023-06-26T21:36:53Z

screen.c

+			// UTF-16 character
+			} else {
+				char16_t utf16[UTF16_MAX_LENGTH];
+				int utf16_nwords;


The utf16 buffer is used when the high surrogate is received and then again when the low surrogate is received, and this code expects it, and utf16_nwords, to be unchanged when the low is received. But these are locals and are reinstantiated between the two calls. I think these two variables should be moved to currentKey, with the rest of the stuff that persists while receiving characters.

utf16 buffer is local because it is used only in one invocation of the function. The reason everything was moved to the loop is to achieve exactly that. The utf16 buffer can be filled only in one call to the function. utf16_nwords can have only two values: 1 (non-surrogate) or 2 (surrogate). This local scope isolate handling non-ascii characters.

utf16 is defined in the scope between lines 2921 and 2947. But on line 2926 it sets utf16[0] to the high surrogate and immediately jumps to 2824, which destructs utf16 since 2824 is outside the 2921-2947 scope. Then when the low surrogate is received, it instantiates a new utf16 array at 2922, and stores the low surrogate in utf16[1] on 2930. At that point the code seems to expect utf16[0] to still be populated with the high surrogate. Am I reading something wrong?

utf16 could be local, but it has to be defined before line 2824 so that it remains in existence throughout the whole loop.

If it was the case the code wouldn't work. Out of scope in C does not destroy anything, this is not a function call where stack is destroyed on return, but I have no problem moving it higher.

The compiler is not required to destroy an object when it goes out of scope, but it is allowed to do so. See 6.2.4 paragraph 6 here.

Got it. Thank you for digging this.

screen.c

Konstantin-Glukhov · 2023-06-27T22:20:55Z

Let me try again from scratch. I will take into consideration all the comments and leave the prior logic untouched except moving everything into the loop.

…ementary Planes (U+010000 to U+10FFFF) are not accepted from keyboard

avih · 2023-06-28T06:24:26Z

Obligatory question: did you test that it works in the windows console (not windows terminal)?

Sample tese case:

Create a UTF8 text file which contains some text which includes this string: Ԁ߀ಀ😀 (U+0500 U+07C0 U+0C80 U+1F600)
Open this file in less with this PR patch, press / to open the search prompt, paste that string and press enter.

Expected result: that string is found and highlighted at text.

As for the code, note that keyCount is not updated when returning UTF8 bytes.

This means that after all the UTF8 bytes are returned, keyCount will stay whatever value it was assigned in win32_kbhit from ip.Event.KeyEvent.wRepeatCount.

This will have several unwanted results:

Next time win32_kbhit is called, it will NOT read a new console input, because keyCount is not 0.
Next time W32getch is called, it will return whatever was at currKey.ascii before the UTF8 was returned - keyCount times. This could also be 0, at which case whoever called W32getch will think that the next value is a scan code.
keyCount is not be respected, for instance if some (non-surrogate pair) unicode value had a count of 2 then W32getch will not return the UTF8 bytes twice.
Note that with a surrogate pair, keyCount of the 1st half is completely ignored. Not sure what the solution is, but I don't know if it should include ignoring the repeat count in some cases.

That's just one thing which caught my eye. There could be other issues too.

avih · 2023-06-28T07:01:28Z

The real problem here is that the keyboard handling is highly delicate code, with multiple states and buffers, each with its own buffer variables and state variables:

Normal ASCII-7 inputs code wants to return one byte. It's stored at currKey.ascii.
Scan code inputs code wants to return two bytes: 0 and the scan code. It's stored as currKey.ascii == 0 && curKey.scan == VALUE.
The x11 mouse code wants to return 6 values. The first one (ESC) is stored at currKey.ascii, and the other five at x11mousebuf with their state at x11mousePos and x11mouseCount.
The unicode code (which doesn't include ASCII-7...) wants to return 2-4 bytes. The bytes are stored at an array, and there are two state variables (pos and length), which were static at W32getch before this PR, and this PR wants to move them to currKey.

Additionally, there's keyCount which we do want to respect.

Clearly, all of those systems want to do a similar thing:

Enqueue some sequence of bytes, with length of 1-6, and with some repeat count.
Ensure that win32_kbhit returns true if something was buffered.
Ensure that W32getch returns this sequence key-count times.

But the current implementation is so cumbersome, and touching it in any way is highly likely to introduce state bugs, because the state is so complex.

It wasn't designed that way. Initially it was simple, but over time, it was enhanced further and further, and it reached its current state.

I really do think that before it becomes yet even more complex, all those buffers (including ASCII-7 which can be considered a buffer of 1) should be unified.

And then, once it's possible to look at the code and grasp what it does, then it can be extended further.

Otherwise, these incremental increases in complexity are bound to introduce delicate state bugs, even if they're not identified immediately.

avih · 2023-06-28T07:09:32Z

I really do think that before it becomes yet even more complex, all those buffers (including ASCII-7 which can be considered a buffer of 1) should be unified.

I'm pretty confident I can do this cleanly. I already had one go at unifying the buffers, but the approach was a bit lacking WRT key count (it was a single bytes buffer and that's it). This can be better (6 bytes buffer and key-count).

I'm willing to spend the time and do that if I get a green light that the maintainers are interested.

Konstantin-Glukhov · 2023-06-29T18:03:21Z

I took a shoot at unifying all console events into one buffer.
It requires rolling back eebf1b5 2023-06-17 Windows: fix iread (#384) for os.c
Please take a look.

avih · 2023-06-30T03:07:42Z

It requires rolling back eebf1b5

That commit is not a goal. Handling the issue which it fixes is the goal. That commit was already replaced in git master with something else which fixes the same issue.

If you roll this back without providing an alternative solution to the original problem, then you're probably re-introducing the bug which it tried to fix.

Please take a look.

Please first test that it works in all the cases which your patch touches.

Also, don't combine the surrogate pairs patch with the buffer unification patch. They're two diferent things, which deserve at the very least two different commits.

Konstantin-Glukhov · 2023-06-30T04:15:20Z

@avih, this is not a final commit. As you can notice at the bottom of this page there is an exclamation message "This pull request is still a work in progress". I did not say that the rollback has to stay. I am still working on the code and looking for constructive feedback instead of parental lecturing.

Konstantin-Glukhov · 2023-06-30T22:14:37Z

I think I found the way for the unified buffer to work with holding PgDn key. Please take a look.
I also want to test editchar() function since it depends on win32_kbhit(). Do you know how to trigger its invocation?

gwsw · 2023-09-01T15:26:19Z

This has been fixed as of 010d808.

gwsw reviewed Jun 26, 2023

View reviewed changes

screen.c Show resolved Hide resolved

Konstantin-Glukhov force-pushed the surrogate branch from 7929586 to a95ee63 Compare June 27, 2023 01:56

Konstantin-Glukhov force-pushed the surrogate branch from a95ee63 to 35eb999 Compare June 27, 2023 22:29

Updated screen.c to fix the issue with Windows code points from Suppl…

60eb2a9

…ementary Planes (U+010000 to U+10FFFF) are not accepted from keyboard

Konstantin-Glukhov force-pushed the surrogate branch from 35eb999 to 60eb2a9 Compare June 27, 2023 22:55

Konstantin-Glukhov force-pushed the surrogate branch from fd4da73 to 1aebd55 Compare June 30, 2023 22:09

Unify MS Windows console events into one buffer

16023b4

Konstantin-Glukhov force-pushed the surrogate branch from 1aebd55 to 16023b4 Compare July 2, 2023 01:48

Konstantin-Glukhov marked this pull request as ready for review August 15, 2023 05:42

gwsw closed this Sep 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix the issue with Windows code points from Supplementary Planes #392

Fix the issue with Windows code points from Supplementary Planes #392

Konstantin-Glukhov commented Jun 25, 2023

gwsw Jun 26, 2023

Konstantin-Glukhov Jun 27, 2023

gwsw Jun 27, 2023

Konstantin-Glukhov Jun 27, 2023 •

edited

gwsw Jun 27, 2023

Konstantin-Glukhov Jun 27, 2023

gwsw Jun 26, 2023

Konstantin-Glukhov Jun 27, 2023

gwsw Jun 26, 2023

Konstantin-Glukhov Jun 27, 2023

gwsw Jun 27, 2023

gwsw Jun 27, 2023

Konstantin-Glukhov Jun 27, 2023

gwsw Jun 27, 2023

Konstantin-Glukhov Jun 27, 2023

Konstantin-Glukhov commented Jun 27, 2023

avih commented Jun 28, 2023

avih commented Jun 28, 2023 •

edited

avih commented Jun 28, 2023

Konstantin-Glukhov commented Jun 29, 2023

avih commented Jun 30, 2023 •

edited

Konstantin-Glukhov commented Jun 30, 2023

Konstantin-Glukhov commented Jun 30, 2023

gwsw commented Sep 1, 2023

Fix the issue with Windows code points from Supplementary Planes #392

Fix the issue with Windows code points from Supplementary Planes #392

Conversation

Konstantin-Glukhov commented Jun 25, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Konstantin-Glukhov Jun 27, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Konstantin-Glukhov commented Jun 27, 2023

avih commented Jun 28, 2023

avih commented Jun 28, 2023 • edited

avih commented Jun 28, 2023

Konstantin-Glukhov commented Jun 29, 2023

avih commented Jun 30, 2023 • edited

Konstantin-Glukhov commented Jun 30, 2023

Konstantin-Glukhov commented Jun 30, 2023

gwsw commented Sep 1, 2023

Konstantin-Glukhov Jun 27, 2023 •

edited

avih commented Jun 28, 2023 •

edited

avih commented Jun 30, 2023 •

edited