Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue with Windows code points from Supplementary Planes #392

Closed
wants to merge 2 commits into from

Conversation

Konstantin-Glukhov
Copy link
Contributor

This pull request is to address the issue #391 "Windows code points from Supplementary Planes (U+010000 to U+10FFFF) are not accepted from keyboard"

Please provide feedback.

screen.c Outdated
ip.Event.KeyEvent.wVirtualScanCode == 0 ||
ip.Event.KeyEvent.dwControlKeyState & (RIGHT_ALT_PRESSED|LEFT_CTRL_PRESSED)
)
) ||
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the dwControlKeyState logic changed? I don't really know what the whole purpose of this RIGHT_ALT/LEFT_CTRL thing is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not touch the logic, the only thing I did is moving it into the loop and grouping by dwControlKeyState in one "if" statement to break out of the loop if dwControlKeyState is set.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the test was for both RIGHT_ALT_PRESSED and LEFT_CTRL_PRESSED to be set. After your change it only requires either one to be set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apology, I misunderstood the question. I thought it was about the other part with dwControlKeyState.
I think I just optimized the code here, but please point me where I am wrong. Here is my thinking:
A & (B | C) == (B | C) is equivalent to A & (B | C) == TRUE
To illustrate, let's say A = 1111, B = 0001 and C = 0010
1111 & (0001 | 0010) == 0011
1111 & 0011 == 0011
0011 == 0011
0011 == TRUE

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose A=0001, B=0001, C=0010.
Then A&(B|C) == (B|C)
is (0001 & 0011) == 0011
is 0001 == 0011
which is false.

But A&(B|C) == TRUE
is (0001 & 0011) == TRUE
is 0001 == TRUE
which is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it.

screen.c Outdated
) continue;

// TODO: Document what the following block does
if (ip.Event.KeyEvent.dwControlKeyState) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this conditional added here? That is, why are we doing this stuff only when dwControlKeyState is set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grouping of all checks is to break the loop (actually return to the caller) if dwControlKeyState is set. This logic was just moved from the bottom of the function into the loop.

screen.c Outdated
// UTF-16 character
} else {
char16_t utf16[UTF16_MAX_LENGTH];
int utf16_nwords;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The utf16 buffer is used when the high surrogate is received and then again when the low surrogate is received, and this code expects it, and utf16_nwords, to be unchanged when the low is received. But these are locals and are reinstantiated between the two calls. I think these two variables should be moved to currentKey, with the rest of the stuff that persists while receiving characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utf16 buffer is local because it is used only in one invocation of the function. The reason everything was moved to the loop is to achieve exactly that. The utf16 buffer can be filled only in one call to the function. utf16_nwords can have only two values: 1 (non-surrogate) or 2 (surrogate). This local scope isolate handling non-ascii characters.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utf16 is defined in the scope between lines 2921 and 2947. But on line 2926 it sets utf16[0] to the high surrogate and immediately jumps to 2824, which destructs utf16 since 2824 is outside the 2921-2947 scope. Then when the low surrogate is received, it instantiates a new utf16 array at 2922, and stores the low surrogate in utf16[1] on 2930. At that point the code seems to expect utf16[0] to still be populated with the high surrogate. Am I reading something wrong?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utf16 could be local, but it has to be defined before line 2824 so that it remains in existence throughout the whole loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was the case the code wouldn't work. Out of scope in C does not destroy anything, this is not a function call where stack is destroyed on return, but I have no problem moving it higher.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler is not required to destroy an object when it goes out of scope, but it is allowed to do so. See 6.2.4 paragraph 6 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank you for digging this.

screen.c Show resolved Hide resolved
@Konstantin-Glukhov
Copy link
Contributor Author

Let me try again from scratch. I will take into consideration all the comments and leave the prior logic untouched except moving everything into the loop.

…ementary Planes (U+010000 to U+10FFFF) are not accepted from keyboard
@avih
Copy link
Contributor

avih commented Jun 28, 2023

Obligatory question: did you test that it works in the windows console (not windows terminal)?

Sample tese case:

  • Create a UTF8 text file which contains some text which includes this string: Ԁ߀ಀ😀 (U+0500 U+07C0 U+0C80 U+1F600)
  • Open this file in less with this PR patch, press / to open the search prompt, paste that string and press enter.

Expected result: that string is found and highlighted at text.

As for the code, note that keyCount is not updated when returning UTF8 bytes.

This means that after all the UTF8 bytes are returned, keyCount will stay whatever value it was assigned in win32_kbhit from ip.Event.KeyEvent.wRepeatCount.

This will have several unwanted results:

  1. Next time win32_kbhit is called, it will NOT read a new console input, because keyCount is not 0.
  2. Next time W32getch is called, it will return whatever was at currKey.ascii before the UTF8 was returned - keyCount times. This could also be 0, at which case whoever called W32getch will think that the next value is a scan code.
  3. keyCount is not be respected, for instance if some (non-surrogate pair) unicode value had a count of 2 then W32getch will not return the UTF8 bytes twice.
  4. Note that with a surrogate pair, keyCount of the 1st half is completely ignored. Not sure what the solution is, but I don't know if it should include ignoring the repeat count in some cases.

That's just one thing which caught my eye. There could be other issues too.

@avih
Copy link
Contributor

avih commented Jun 28, 2023

The real problem here is that the keyboard handling is highly delicate code, with multiple states and buffers, each with its own buffer variables and state variables:

  • Normal ASCII-7 inputs code wants to return one byte. It's stored at currKey.ascii.
  • Scan code inputs code wants to return two bytes: 0 and the scan code. It's stored as currKey.ascii == 0 && curKey.scan == VALUE.
  • The x11 mouse code wants to return 6 values. The first one (ESC) is stored at currKey.ascii, and the other five at x11mousebuf with their state at x11mousePos and x11mouseCount.
  • The unicode code (which doesn't include ASCII-7...) wants to return 2-4 bytes. The bytes are stored at an array, and there are two state variables (pos and length), which were static at W32getch before this PR, and this PR wants to move them to currKey.

Additionally, there's keyCount which we do want to respect.

Clearly, all of those systems want to do a similar thing:

  • Enqueue some sequence of bytes, with length of 1-6, and with some repeat count.
  • Ensure that win32_kbhit returns true if something was buffered.
  • Ensure that W32getch returns this sequence key-count times.

But the current implementation is so cumbersome, and touching it in any way is highly likely to introduce state bugs, because the state is so complex.

It wasn't designed that way. Initially it was simple, but over time, it was enhanced further and further, and it reached its current state.

I really do think that before it becomes yet even more complex, all those buffers (including ASCII-7 which can be considered a buffer of 1) should be unified.

And then, once it's possible to look at the code and grasp what it does, then it can be extended further.

Otherwise, these incremental increases in complexity are bound to introduce delicate state bugs, even if they're not identified immediately.

@avih
Copy link
Contributor

avih commented Jun 28, 2023

I really do think that before it becomes yet even more complex, all those buffers (including ASCII-7 which can be considered a buffer of 1) should be unified.

I'm pretty confident I can do this cleanly. I already had one go at unifying the buffers, but the approach was a bit lacking WRT key count (it was a single bytes buffer and that's it). This can be better (6 bytes buffer and key-count).

I'm willing to spend the time and do that if I get a green light that the maintainers are interested.

@Konstantin-Glukhov
Copy link
Contributor Author

I took a shoot at unifying all console events into one buffer.
It requires rolling back eebf1b5 2023-06-17 Windows: fix iread (#384) for os.c
Please take a look.

@avih
Copy link
Contributor

avih commented Jun 30, 2023

It requires rolling back eebf1b5

That commit is not a goal. Handling the issue which it fixes is the goal. That commit was already replaced in git master with something else which fixes the same issue.

If you roll this back without providing an alternative solution to the original problem, then you're probably re-introducing the bug which it tried to fix.

Please take a look.

Please first test that it works in all the cases which your patch touches.

Also, don't combine the surrogate pairs patch with the buffer unification patch. They're two diferent things, which deserve at the very least two different commits.

@Konstantin-Glukhov
Copy link
Contributor Author

@avih, this is not a final commit. As you can notice at the bottom of this page there is an exclamation message "This pull request is still a work in progress". I did not say that the rollback has to stay. I am still working on the code and looking for constructive feedback instead of parental lecturing.

@Konstantin-Glukhov
Copy link
Contributor Author

I think I found the way for the unified buffer to work with holding PgDn key. Please take a look.
I also want to test editchar() function since it depends on win32_kbhit(). Do you know how to trigger its invocation?

@gwsw
Copy link
Owner

gwsw commented Sep 1, 2023

This has been fixed as of 010d808.

@gwsw gwsw closed this Sep 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants