Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windows: unexpected abort on fast paging with color content #378

Closed
avih opened this issue Jun 7, 2023 · 10 comments · Fixed by #384
Closed

windows: unexpected abort on fast paging with color content #378

avih opened this issue Jun 7, 2023 · 10 comments · Fixed by #384

Comments

@avih
Copy link
Contributor

avih commented Jun 7, 2023

Tested less versions: latest official windows build (v633), but also my own builds since v570 (specifically, since 26e91bd "Allow ^X to exit F mode on Windows") upto and including v635.

Tested windows versions: Windows 10 and Windows 7.

Steps to reproduce:

  • Set Windows autorepeat to a high value (if win10 - via the classic control panel). I set it to the maximum, which is about 30 repeats/sec.
  • Maximize the console window (cmd.exe).
  • Invoke less -R file where file is a big file with color escape sequences.
  • Hold down page-down so that it auto-repeats and pages continuously.

Expected result:
It pages till the end of the file.

Actual result:
less aborts sooner rather than later. For me that's typically after less than 10 pages.

Here's a sample C file which was highlighted using GNU source-highlite, and reproduces the issue for me:
tccgen.c.zip

Additional info:

The offending code seems to be this at os.c, which was originally added at 26e91bd:

#if MSDOS_COMPILER==WIN32C
        if (win32_kbhit() && WIN32getch() == intr_char)
        {
                sigs |= S_INTERRUPT;
                reading = 0;
                return (READ_INTR);
        }
#endif

If I disable this code completely, e.g. by replacing WIN32C with WIN32C_XXXBUG, then the issue goes away.

When not disabled, and when it aborts, win32_kbhit() is true, and WIN32getch() returns 0, which apparently is not intr_char (which seem to be 24 by default), so the code block of sigs |= S_INTERRUPT; etc is NOT entered, and yet less aborts.

The issue is more likely to happen with a big console window.

I'm guessing that the cause is that less renders slower than requested (high PGDN input rate, big console, slow-to-render colors), and something gets messed up, but I didn't dig deep enough to figure out what.

Another option is that it needs ungetc() to revert the WIN32getch() call, but I couldn't figure out how (i.e. which function to use to unget it). But also, WIN32getch() returning 0 also seems a bit unexpected...

@avih avih changed the title windows: unexpected abort on repeated page-down with color content windows: unexpected abort on fast paging with color content Jun 7, 2023
@gwsw
Copy link
Owner

gwsw commented Jun 13, 2023

I would think that if the char is not intr_char, it should be ungotten by calling ungetcc_back() like check_poll() does in the non-MSDOS case. However it does seem strange that the char is 0. That could be the first byte of an extended key (pending_scancode) or it could be that the WideCharToMultiByte in WIN32getch returned 0. I'm not sure if ungetting the char would help in either of those cases. @adoxa Jason, any thoughts?

BTW, just to confirm, when you say that less "aborts", you mean that the scrolling stops, not that less terminates, correct?

@avih
Copy link
Contributor Author

avih commented Jun 13, 2023

BTW, just to confirm, when you say that less "aborts", you mean that the scrolling stops, not that less terminates, correct?

No. It exits to the shell, with exit code 0.

or it could be that the WideCharToMultiByte in WIN32getch returned 0

It could be, but it happens since v570 (specifically, since the ^X handler was added), while the windows unicode hander was added at v605.

Also, it would help if someone other than me can reproduce the issue too.

@avih
Copy link
Contributor Author

avih commented Jun 13, 2023

Here are some interesting data points:

output.c has this code at the very begining of win_flush():

	if (ctldisp != OPT_ONPLUS || (vt_enabled && sgr_mode))
		WIN32textout(obuf, ob - obuf);
	else

when running on windows 10 vt_enabled is true, and setting -Da enables sgr_mode.

Combined, on win10 with -Da it sends the content directly and lets the terminal handle the SGR sequences, which bypasses the whole SGR processing in less, making it presumably much quicker.

But the issue remains.

Going further, commenting out the WIN32textout(...) call disables the output completely (and indeed, less displays an empty screen).

But even with the output completely disabled, the issue still happens (holding down page-down aborts less after less than a second).

I'm guessing that the cause is that less renders slower than requested (high PGDN input rate, big console, slow-to-render colors), and something gets messed up, but I didn't dig deep enough to figure out what.

So I think this hypothesis doesn't hold up.

Interestingly, still with the output completely disabled, it can also happen (but harder to reproduce) with non-highlighted content, e.g. with less -R -Da screen.c (without LESSOPEN, LESSPIPE or LESS set) and holding page-down.

I still think there could be some timing issue, but I no longer thinks it's because the rendering is too slow.

it should be ungotten by calling ungetcc_back() ... I'm not sure if ungetting the char would help in either of those cases

Indeed, capturing the value returned by WIN32getch() (which is indeed 0) and then calling ungetcc_back(c) if it's not intr_char did not help.

I did check how many times this win32_kbhit() returns true (while temporarily disabling the WIN32getch() call), and it's about once per second when holding page-down with a repeat rate of about 30/sec.

I did manage to figure out why WIN32getch() return 0 when it aborts:

  • In win32_kbhit sees a KEY_EVENT, with UnicodeChar and AsciiChar of 0, and wVirtualScanCode of 81 (0121 which is PCK_PAGEDOWN at pckeys.h), thus win32_kbhit returns TRUE with only the scancode set to 81.
  • WIN32getch() sees that ascii and unicode are the same, and sets ascii = currentKey.ascii; - which is 0.
  • It then sets pending_scancode at the end of this loop:
 		pending_scancode = (ascii == 0x00);
	} while (pending_scancode &&
		(currentKey.scan == PCK_CAPS_LOCK || currentKey.scan == PCK_NUM_LOCK));
  • But because the scancode is neither PCK_CAPS_LOCK nor PCK_NUM_LOCK, the loop is aborted, and the next thing is to return ascii - which is 0.

Hopefully someone which understands the code better could figure out what the actual bug is and how to fix it.

@avih
Copy link
Contributor Author

avih commented Jun 13, 2023

I did check how many times this win32_kbhit() returns true (while temporarily disabling the WIN32getch() call), and it's about once per second when holding page-down with a repeat rate of about 30/sec.

My hunch is that there are two places which try to read a key, and they're interleaved and racy:

  • The main place (where is that?) which gets the bulk of the keys and actually applies the page-down press (and scrolls the content accordingly).
  • The place which checks intr_char.

The fact that win32_kbhit where intr_char is tested (but without calling WIN32getch()) is rarely TRUE while holding page-down suggests that something else is dequeuing the key presses.

Some comment suggests that when the scan code is read, it needs two reads, so my money is on a race where these two parts gets split between the two places which try to read a key, and one of them freaks out (probably the main place which reads the key - not the intr_char check).

@gwsw
Copy link
Owner

gwsw commented Jun 13, 2023

In win32_kbhit sees a KEY_EVENT, with UnicodeChar and AsciiChar of 0, and wVirtualScanCode of 81

81 is 'Q', which is a less command to quit. That is probably related to why it quits after the zero byte is consumed. I think the solution is indeed to unget the 0 byte, but in a way that WIN32getch will see it. ungetcc_back doesn't suffice to do that. I think a new unget mechanism may be needed here.

@avih
Copy link
Contributor Author

avih commented Jun 13, 2023

81 is 'Q'

Huh, 81 is Q in ASCII, but 81 is also page down at pckeys.h:

#define PCK_PAGEDOWN            '\121'

Right?

So I'm holding page down, but it ends up interpreted as Q?

How does it choose whether to interpret 81 as 81-the-ASCII-value-Q or 81-the-scan-code-page-down?

@gwsw
Copy link
Owner

gwsw commented Jun 13, 2023

See the comment at line 2949 in WIN32getch. Pressing PageDown will send the two byte sequence 00,81, while pressing Q just sends 81. If you press PageDown but the 00 gets consumed unexpectedly and only the 81 is read, it will get interpreted as a 'Q'. Actually I'm not sure I completelly understand the sequence of events; I think there might be some interaction between win32_kbhit and WIN32getch involved. I don't have a Windows build environment to investigate this right now.

@avih
Copy link
Contributor Author

avih commented Jun 13, 2023

Thanks.

I don't have a Windows build environment to investigate this right now.

Well, I can test some patches/printouts if you want.

adoxa added a commit to adoxa/less that referenced this issue Jun 14, 2023
The Windows version of iread removed the character, causing an extended
sequence to be incorrectly split.  Add an unget function to put the
character back.  Resolves gwsw#378.
@adoxa
Copy link
Contributor

adoxa commented Jun 14, 2023

iread is getting the 00, then the command gets the 81. PR submitted.

@gwsw gwsw closed this as completed in #384 Jun 16, 2023
gwsw pushed a commit that referenced this issue Jun 16, 2023
The Windows version of iread removed the character, causing an extended
sequence to be incorrectly split.  Add an unget function to put the
character back.  Resolves #378.
@avih
Copy link
Contributor Author

avih commented Jun 16, 2023

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants