-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the issue with Windows code points from Supplementary Planes #392
Conversation
screen.c
Outdated
ip.Event.KeyEvent.wVirtualScanCode == 0 || | ||
ip.Event.KeyEvent.dwControlKeyState & (RIGHT_ALT_PRESSED|LEFT_CTRL_PRESSED) | ||
) | ||
) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the dwControlKeyState logic changed? I don't really know what the whole purpose of this RIGHT_ALT/LEFT_CTRL thing is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not touch the logic, the only thing I did is moving it into the loop and grouping by dwControlKeyState in one "if" statement to break out of the loop if dwControlKeyState is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the test was for both RIGHT_ALT_PRESSED and LEFT_CTRL_PRESSED to be set. After your change it only requires either one to be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apology, I misunderstood the question. I thought it was about the other part with dwControlKeyState.
I think I just optimized the code here, but please point me where I am wrong. Here is my thinking:
A & (B | C) == (B | C) is equivalent to A & (B | C) == TRUE
To illustrate, let's say A = 1111, B = 0001 and C = 0010
1111 & (0001 | 0010) == 0011
1111 & 0011 == 0011
0011 == 0011
0011 == TRUE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose A=0001, B=0001, C=0010.
Then A&(B|C) == (B|C)
is (0001 & 0011) == 0011
is 0001 == 0011
which is false.
But A&(B|C) == TRUE
is (0001 & 0011) == TRUE
is 0001 == TRUE
which is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix it.
screen.c
Outdated
) continue; | ||
|
||
// TODO: Document what the following block does | ||
if (ip.Event.KeyEvent.dwControlKeyState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this conditional added here? That is, why are we doing this stuff only when dwControlKeyState is set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grouping of all checks is to break the loop (actually return to the caller) if dwControlKeyState is set. This logic was just moved from the bottom of the function into the loop.
screen.c
Outdated
// UTF-16 character | ||
} else { | ||
char16_t utf16[UTF16_MAX_LENGTH]; | ||
int utf16_nwords; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The utf16 buffer is used when the high surrogate is received and then again when the low surrogate is received, and this code expects it, and utf16_nwords, to be unchanged when the low is received. But these are locals and are reinstantiated between the two calls. I think these two variables should be moved to currentKey, with the rest of the stuff that persists while receiving characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utf16 buffer is local because it is used only in one invocation of the function. The reason everything was moved to the loop is to achieve exactly that. The utf16 buffer can be filled only in one call to the function. utf16_nwords can have only two values: 1 (non-surrogate) or 2 (surrogate). This local scope isolate handling non-ascii characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utf16 is defined in the scope between lines 2921 and 2947. But on line 2926 it sets utf16[0] to the high surrogate and immediately jumps to 2824, which destructs utf16 since 2824 is outside the 2921-2947 scope. Then when the low surrogate is received, it instantiates a new utf16 array at 2922, and stores the low surrogate in utf16[1] on 2930. At that point the code seems to expect utf16[0] to still be populated with the high surrogate. Am I reading something wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utf16 could be local, but it has to be defined before line 2824 so that it remains in existence throughout the whole loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it was the case the code wouldn't work. Out of scope in C does not destroy anything, this is not a function call where stack is destroyed on return, but I have no problem moving it higher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler is not required to destroy an object when it goes out of scope, but it is allowed to do so. See 6.2.4 paragraph 6 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank you for digging this.
7929586
to
a95ee63
Compare
Let me try again from scratch. I will take into consideration all the comments and leave the prior logic untouched except moving everything into the loop. |
a95ee63
to
35eb999
Compare
…ementary Planes (U+010000 to U+10FFFF) are not accepted from keyboard
35eb999
to
60eb2a9
Compare
Obligatory question: did you test that it works in the windows console (not windows terminal)? Sample tese case:
Expected result: that string is found and highlighted at text. As for the code, note that This means that after all the UTF8 bytes are returned, This will have several unwanted results:
That's just one thing which caught my eye. There could be other issues too. |
The real problem here is that the keyboard handling is highly delicate code, with multiple states and buffers, each with its own buffer variables and state variables:
Additionally, there's Clearly, all of those systems want to do a similar thing:
But the current implementation is so cumbersome, and touching it in any way is highly likely to introduce state bugs, because the state is so complex. It wasn't designed that way. Initially it was simple, but over time, it was enhanced further and further, and it reached its current state. I really do think that before it becomes yet even more complex, all those buffers (including ASCII-7 which can be considered a buffer of 1) should be unified. And then, once it's possible to look at the code and grasp what it does, then it can be extended further. Otherwise, these incremental increases in complexity are bound to introduce delicate state bugs, even if they're not identified immediately. |
I'm pretty confident I can do this cleanly. I already had one go at unifying the buffers, but the approach was a bit lacking WRT key count (it was a single bytes buffer and that's it). This can be better (6 bytes buffer and key-count). I'm willing to spend the time and do that if I get a green light that the maintainers are interested. |
That commit is not a goal. Handling the issue which it fixes is the goal. That commit was already replaced in git master with something else which fixes the same issue. If you roll this back without providing an alternative solution to the original problem, then you're probably re-introducing the bug which it tried to fix.
Please first test that it works in all the cases which your patch touches. Also, don't combine the surrogate pairs patch with the buffer unification patch. They're two diferent things, which deserve at the very least two different commits. |
@avih, this is not a final commit. As you can notice at the bottom of this page there is an exclamation message "This pull request is still a work in progress". I did not say that the rollback has to stay. I am still working on the code and looking for constructive feedback instead of parental lecturing. |
fd4da73
to
1aebd55
Compare
I think I found the way for the unified buffer to work with holding PgDn key. Please take a look. |
1aebd55
to
16023b4
Compare
This has been fixed as of 010d808. |
This pull request is to address the issue #391 "Windows code points from Supplementary Planes (U+010000 to U+10FFFF) are not accepted from keyboard"
Please provide feedback.