Invalid bit masking in KeyCode - KeyCode.CharMask is missing 65536 values #3287
I have other problems with that enum that probably won't matter until a future version of Unicode, but I'm ignoring those for now because this is fine for v2. So long as the masks are defined correctly, shifting things left by 1 shouldn't be terribly difficult as a break-fix if that ever becomes necessary before some other solution is implemented, or before Microsoft switches to UTF-8 and back-ports it to all Windows versions (hey - we all have dreams, right?).

That said, .NET 8 uses UTF-8 by default for console encoding. So... has anyone tried treating input as UTF-8 natively yet?
I think this belongs in this issue more than #3275, but it's still related to both. So, here's what I did on this end of things and what the result is:

As with any time I've touched multi-type files, the types in CursesDriver and ConsoleDriver got split out to their own files. That's of course a non-change to the program. But I ended up looking at CursesDriver.ProcessInput (I don't remember the original reason, but it doesn't matter because the problem is fixed). There were lots of points where cleanup was possible, but there was also a ton of copying going on.

It looks like the HandleEscSeqResponse method had well-intentioned thoughts on performance applied, given that everything was passed by ref (kudos to whoever did that in the first place). Unfortunately, that's not how the compiler actually compiled that code, and there were still plenty of places where value types were being copied just for temporary use and then discarded. So, I created a ref struct.

A ref struct is a stack-only type. Neither an instance of a ref struct nor any of its properties or fields can ever escape its ref-safe context, which basically means it can never outlive the original referent.

Anyway... this allows the method and the methods it calls to operate on the same references to stack memory, resulting in a serious speedup and a reduction in allocations. A ref struct can never exist on the heap, so it's a guarantee that there are no heap allocations if you're working with one. They can't even be referenced BY things on the heap (which is why they cannot be elements of an array, for example).

So... I turned the 5 values that ProcessInput handles and passes to HandleEscSeqResponse into this ref struct:

```csharp
/// <summary>
/// A private byRefLike struct to simplify calls and keep everything really on the stack.
/// </summary>
/// <remarks>
/// This is a ref struct. It cannot escape the stack and the compiler will enforce that if you try.
/// </remarks>
private ref struct EscapeSequenceData
{
    public ref int code;
    public ref KeyCode k;
    public ref int wch;
    public Key keyEventArgs;
    public ConsoleKeyInfo [] cki;
}
```

It has two reference types in it, which is why those two aren't also `ref` fields.

Then I refactored HandleEscSeqResponse into a method that takes one ref to that type and returns the same ref at the end, doing all its work in place on that struct. Just to show progression, I also initially kept the same variable names in the ProcessInput method, but made them refs into the struct.

Once that was done and verified passing, I removed those now-redundant references, one by one, until all that was left was the struct and one strange place where a separate int (wch2) is defined that I didn't want to mess with in the moment, so it was left alone. I'm sure it can probably get nuked too, but whatever. Then I did some more cleanup in ProcessInput, mostly around using switches instead of large if/else constructs.

The end result is that, versus the code before those changes, the tests covering them show somewhere between a 10-20x speedup, plus a reduction in heap allocations and copies significant enough to be visible in the profiler, even though they're all small and seemingly innocent values. This basically means the compiler must have been generating wrapper classes for the old handler that took 5 refs. With the ref struct and the arrays being collection expressions, the compiler can very aggressively optimize all of that now.

Now... not to oversell that speedup... On the test machine, we're talking those tests taking 0.004 seconds instead of 0.07 seconds, for example. That's within the realm of human perception, especially for a fast typist, but it isn't like this made the whole program noticeably faster beyond a few dozen milliseconds of input lag. But hey, every bit helps, right? I have plenty more to do in there before I'm actually finished with it and go back to the TextFormatter class.
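For illustration, here's a minimal, self-contained sketch of that pattern (hypothetical names like `EscData` and `Handle`; this is not the actual driver code, and the real struct carries more state): a ref struct whose `ref` fields point into the caller's locals, so a helper can mutate them in place with no copies.

```csharp
using System;

// Hypothetical, simplified sketch of the ref-field pattern (requires C# 11).
int code = -1;
int wch = 0;

var data = new EscData (ref code, ref wch);
Handle (ref data);

// The helper wrote through the refs straight into the caller's locals.
Console.WriteLine ($"{code} {wch}"); // prints "0 65"

static void Handle (ref EscData d)
{
    d.Code = 0;   // writes through to the caller's 'code'
    d.Wch = 'A';  // writes through to the caller's 'wch'
}

ref struct EscData
{
    public ref int Code; // ref into the caller's stack frame -- no copy
    public ref int Wch;

    public EscData (ref int code, ref int wch)
    {
        Code = ref code;
        Wch = ref wch;
    }
}
```

The compiler enforces the ref-safety rules here: try to store `data` in a field of a class or put it in an array, and it simply won't compile.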
I've already reduced the number of warnings, suggestions, and tips from the analyzer for CursesDriver from an almost full scroll bar (when I started) down to this, at this point: [screenshot of remaining analyzer messages]

After I finish with this, though, I think I'll put in a PR at a working commit so others can rebase and whatnot, if it isn't too difficult to separate (it might be).
Re: the above - that code is fragile, not well tested, and hard to understand. We need a robust ANSI esc sequence parsing engine...
Oh, and of all the driver refactoring work I did in the past 6 months, CursesDriver has been the most neglected. Note that it still does not support TrueColor.
Ha, yeah, there's a lot of original code in there - it's some of the most obviously different. And the PInvoke mechanism with pre-caching the delegates and whatnot did make sense back then, but it's now just needless complexity. I'm finding it hard not to just go ahead and address whatever I run into in there. 😅
Honestly, some of the old generated code in there makes some sense, and the intent behind the KeyCode enum also makes some sense (more than the old stuff). Obviously, the class full of constants is a C-ism; the code generator that was used didn't understand that it makes more sense in C# as an enum. But oddly enough there's value both ways, so I'm torn on that one.

But, with the way the compiler is treating the uint, it's actually treating literals of the enum as values to be cast at run-time, not as constants, and that's not something I expected to see. I think the best solution lies somewhere in between, honestly. The way we use that enum all over the place, we need to be able to treat it as a numeric value, but uint enums can't do that without a run-time cast (oddly, unless they are subjected to an arithmetic operation, in which case they ARE implicitly numeric). I'm wondering if it's a caveat of uint (seems likely), so I'm gonna run to the docs to check the spec on that. I know uint and ulong aren't CLS compliant, at least, so I wouldn't be surprised at all if they're less capable than their signed brethren. If that turns out to be the case, we might be able to just turn it into a long, for now, and isolate both 32-bit halves for each purpose.

But... I think the best ultimate solution is a well-defined struct that keeps things as more primitive structures and exposes an enum (with implicit cast operators) for seamless use both as a numeric value and as an enumerated value, for the convenience enums give for things like configuration. Now... since it's all just 32-bit binary values in the current code...
Hell, it may even be advantageous to store two 32-bit values inside a Vector64 or a wider type, to enable some wicked fast operations over sequences of values (which would end up getting SSE-ified by the compiler at that point). Not to mention we'd have direct access to all those wide instructions via those classes, which are great for a lot of that bitwise math. However the data is stored, we can expose it any way we like in the public API. It's probably a good idea for us to abstract away the implementation a little bit anyway, to protect against breakage around version changes.

Actually... thinking about it a little more... that's very likely a case where the strategy of computing multiple possible results and just selecting the right one is significantly faster than anything else. What I mean is: if there's a method like the one I'm working on right now, with a bunch of conditionals based on the same or slightly different values which then do more math based on that decision, you load the values and constants into a Vector128 or something and apply all of those operations at once, up front. Then each conditional is just a compare and a select, and the value is already computed - all in fewer clock cycles, even though the processor threw away 3/4 of the values it calculated.
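The compute-everything-then-select idea can be sketched with the cross-platform `Vector128` API (hypothetical values and operations, not code from the driver; requires .NET 7+): both candidate results are computed for every lane up front, then a compare mask picks the right one per lane with no branching.

```csharp
using System;
using System.Runtime.Intrinsics;

// Four hypothetical 32-bit input values, processed at once.
Vector128<uint> values    = Vector128.Create (0x41u, 0x10FFFFu, 0x1Bu, 0x7Fu);
Vector128<uint> threshold = Vector128.Create (0x80u);

// Compute BOTH candidate results for every lane up front...
Vector128<uint> ifBelow = values + Vector128.Create (1u);
Vector128<uint> ifAbove = values & Vector128.Create (0x1F_FFFFu);

// ...then pick per lane with a compare mask: no branches, and the
// "wasted" lanes cost nothing extra because they ran in parallel.
Vector128<uint> mask   = Vector128.LessThan (values, threshold);
Vector128<uint> result = Vector128.ConditionalSelect (mask, ifBelow, ifAbove);

Console.WriteLine (result.GetElement (0)); // 0x41 < 0x80, so 0x41 + 1 = 66
Console.WriteLine (result.GetElement (1)); // 0x10FFFF >= 0x80, so masked: 1114111
```

On hardware with SSE/AdvSIMD support the compare, the arithmetic, and the select each lower to a single wide instruction.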
Damn. The compiler behavior around those enums is exactly as the first line describing enum conversions states (emphasis mine):
That seriously sucks! :( It even makes things like that Curses class with the switch that converts one enum to another more expensive than it looks. Extremely minor, of course, in the grand scheme of things, but I'm just wondering how I didn't know this (or, more likely, how I forgot it lol) in the 23+ years I've been using C#. 😆 I suppose I do tend to avoid large enums in favor of formal structs or classes, because of the lack of flexibility in enums - like not being able to write operators. Extension methods are a pity prize. I want operators, damn it!
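The cast behavior is easy to see in a tiny example. Here's a hypothetical minimal uint-backed flags enum (not the real KeyCode definition) showing where explicit casts are forced and where the bitwise operators still work directly on the enum type:

```csharp
using System;

uint raw = (uint)MiniKeyCode.ShiftMask; // enum -> uint needs an explicit cast
MiniKeyCode k = (MiniKeyCode)0x41u;     // uint -> enum needs one too
// MiniKeyCode bad = 0x41u;             // error: no implicit numeric conversion

// The bitwise operators, though, are defined directly on the enum type:
MiniKeyCode shifted = k | MiniKeyCode.ShiftMask;
uint ch = (uint)(shifted & MiniKeyCode.CharMask);

Console.WriteLine (raw); // 268435456 (0x1000_0000)
Console.WriteLine (ch);  // 65

// Hypothetical stand-in for KeyCode, with the same underlying type.
[Flags]
enum MiniKeyCode : uint
{
    CharMask  = 0x000F_FFFF,
    ShiftMask = 0x1000_0000,
}
```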
On the topic of the Curses interop... That old code is from the Mono days. I bet we can do a clean implementation in less work than it would take to untangle that thing. The majority of that class is PInvokes plus some code that is honestly partially redundant with other, newer, better code we have elsewhere. Maybe I should shift my strategy in that direction, rather than trying too hard to fix this thing up?
Side note: This is one of several open issues and some unreported issues that #3443 is partially aimed at helping to resolve. |
So, while working on the hotkey stuff, Key and KeyCode became very relevant.
And simplified logic that follows the Unicode rules resulted in a small number of test failures - yet the binary math was saying the tests were wrong, not the code.
So I took a look at the values.
KeyCode.CharMask is defined as 0x000F_FFFF, which is 20 bits.
KeyCode.MaxCodePoint is 0x0010_FFFF, which is correct per the Unicode spec, but that's 21 bits.
KeyCode.CharMask, however, is not the correct mask for that. It drops the high bit, resulting in 65536 missing values, because, as written, it ignores the entire range from U+100000 through U+10FFFF.
Here's the binary explanation:

    CharMask     = 0x000F_FFFF = 0 1111 1111 1111 1111 1111  (20 bits)
    MaxCodePoint = 0x0010_FFFF = 1 0000 1111 1111 1111 1111  (21 bits)

So it's an off-by-one error, but the mask is short by one whole bit position - it's missing bit 20, which puts 65536 (2^16) valid code points out of reach.
As a result, anything that uses CharMask or SpecialMask will lose (or steal, for SpecialMask) the highest-order bit of the character value, if that bit is set.
That's a data corruption bug, and treating the binary values correctly leads to 2 KeyTests test cases failing, plus almost 200 other tests in other fixtures, because either the test case values or the code depend on the broken values. I have not yet begun that investigation, but I imagine it's a little of both, and that a fairly small number of changes in a small number of places will probably fix them.
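The corruption is easy to reproduce standalone with the literal values described above (the constant names mirror KeyCode's members here, but this is not the library's code):

```csharp
using System;

// Standalone reproduction of the masking bug; names mirror KeyCode's
// members but this is not Terminal.Gui code.
const uint CharMask     = 0x000F_FFFF; // the buggy 20-bit mask
const uint MaxCodePoint = 0x0010_FFFF; // highest valid code point (21 bits)

uint masked = MaxCodePoint & CharMask;
Console.WriteLine ($"0x{masked:X}");                 // 0xFFFF: bit 20 silently dropped
Console.WriteLine (MaxCodePoint - 0x0010_0000u + 1); // 65536 code points affected
```

U+10FFFF masked through CharMask comes out as U+FFFF, which is exactly the kind of silent character corruption described above.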
Fixing only the values for CharMask and SpecialMask to be

    CharMask    = 0x1F_FFFF
    SpecialMask = 0xFFE0_0000

(so, moving the 21st bit to CharMask, where it belongs, and dropping it from SpecialMask) resolves the KeyTests failures and quite a few of the other failures, leaving a few dozen more to track down.

Now I just need to diagnose the root cause of the other test failures. Most seem to be around the F keys, which account for a fair amount of what's typically used in non-printable keyboard input test cases.
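A quick sanity check of those corrected values (standalone constants mirroring the proposed fix, not the library's code):

```csharp
using System;

// Standalone check of the corrected masks proposed above.
const uint CharMask     = 0x001F_FFFF; // 21 bits: now includes bit 20
const uint SpecialMask  = 0xFFE0_0000; // bits 21..31
const uint MaxCodePoint = 0x0010_FFFF;

Console.WriteLine ((MaxCodePoint & CharMask) == MaxCodePoint); // True: max code point round-trips intact
Console.WriteLine (CharMask & SpecialMask);                    // 0: the masks are disjoint
Console.WriteLine (CharMask | SpecialMask);                    // 4294967295: together they cover all 32 bits
```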
The fixes for this are important to the work I'm already doing in TextFormatter, so they're just going in that branch for now.