Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows Terminal, Command Prompt, OpenConsole processes cannot be suspended #9704

Closed
dmex opened this issue Apr 4, 2021 · 9 comments
Closed
Labels
Needs-Tag-Fix Doesn't match tag requirements Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting

Comments

@dmex
Copy link

dmex commented Apr 4, 2021

Windows Terminal version (or Windows build number)

10.0.19042.906

Other Software

Command Prompt
Windows Terminal
Open Console

Steps to reproduce

  1. Run Process Explorer and/or resmon.exe /res
  2. Suspend cmd.exe
  3. Suspend WindowsTerminal.exe
  4. Suspend OpenConsole.exe

Expected Behavior

The processes are suspended and show with a grey background in Process Explorer like other software.

For example:
image

Actual Behavior

cmd.exe and WindowsTerminal.exe cannot be suspended. Additionally OpenConsole.exe shows suspended for 5ms before magically becoming unsuspended.

For example:
image

This is especially problematic with the built-in resource monitor tool since the console process becomes deadlocked after selecting suspend and the process continues running:

image

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Apr 4, 2021
@DHowett
Copy link
Member

DHowett commented Apr 4, 2021

We do not control this. Also, why?

@DHowett DHowett closed this as completed Apr 4, 2021
@dmex
Copy link
Author

dmex commented Apr 4, 2021

@DHowett

You can suspend other processes perfectly fine except the command prompt so that's a bug that should be looked into??

@DHowett
Copy link
Member

DHowett commented Apr 4, 2021

This is likely due to the job object that is created for WT thanks to it being launched inside a package, which is out of our control and isn’t something we can fix. It’s also just not a supported use case: individually stomping processes on your system to make sure they “behave” isn’t a normal part of system administration...

@dmex
Copy link
Author

dmex commented Apr 4, 2021

@DHowett

This is likely due to the job object that is created for WT

I can reproduce the same issue with a lot of other software without job objects... Chrome, Msedge, Visual Studio, Discord, Steam, Skype even Windbg which needs to suspend the thread/process to properly capture thread contexts when debugging the console project.

The whole point of creating this ticket was so you guys could follow up the problem internally with the kernel team since A) It affects the console and debugging the console, so it's a very relevant bug for this project and B) because it's a recent issue with Windows 10 that breaks everything using the SuspendThread function since 2000 (21 years of backwards compatibility).

It’s also just not a supported use case: individually stomping processes on your system to make sure they “behave” isn’t a normal part of system administration

SuspendThread has some very valid uses cases outside of administration especially for debugging. I reported this for the above reasons not because of administration.

@DHowett
Copy link
Member

DHowett commented Apr 4, 2021

It would help us build a case for getting this fixed if we understood your use case, you know? VS and WinDbg don’t seem to have any trouble with Terminal or its subprocesses, and if I understand them properly they’re not using any special APIs for doing so. We’ve been debugging things inside terminal or hosted by OpenConsole (which services console API functions and does not “host” a process in any meaningful way related to its process tree) for the past two years on windows versions 17763 to 21310.

I’m not embellishing this point for no reason. It seems like totally normal debugging tools work just fine for us. I’m worried that we’re looking at an XY Problem.

recent issue

I must have missed in your original report that this was a regression. Sorry about that! 😄

@dmex
Copy link
Author

dmex commented Apr 4, 2021

@DHowett

I must have missed in your original report that this was a regression

My bad. I should have mentioned earlier that it's a regression and it worked fine prior to Windows 10. For example:

Resmon:
image

Process Explorer:
image

Compared to Windows 10:
image

We’ve been debugging things inside terminal or hosted by OpenConsole for the past two years

It's probably not something that's been noticed/reported until now? Windbg uses GetThreadContext and per MSDN you cannot get a proper valid context for a running thread because of race conditions/other issues which is why suspension is required. If you attach windbg to the command prompt/windows terminal processes then it should suspend the process but doesn't and using some other tools you can see 1 thread never transitions to the Suspended wait state. For example:

The suspend count is 1 but that value is just a counter unrelated to the actual suspension:
image

The thread hasn't actually suspended and the KWAIT_REASON for the thread permanently remains Executive instead of transitioning to Suspended (the first thread with ntdll.dll!DbgUiRemoteBreakin is windbg and should be ignored)
image

It seems like totally normal debugging tools work just fine for us

The thread is the console calling ReadFile on a pipe and that thread is probably not something that gets debugged very often and since windbg is showing an incorrect suspension state its probably gone unnoticed?

It's also the same problem causing Process Explorer and the built-in Resmon tool to not show cmd.exe/WindowsTerminal.exe processes' as suspended because that one thread doesn't transition to the Suspended state like previous versions of Windows. Context switches are also supposed to trigger the suspension but queuing a kernel APC which does execute and causes a context switch/increments the counters also doesn't trigger the suspension.

I’m worried that we’re looking at an XY Problem.

I guess that's somewhat related. I'm using cmd.exe and WindowsTerminal.exe as an example of the problem but it's not just those two processes that can be used to demonstrate this issue on Windows 10 - I'm only mentioning cmd.exe and WindowsTerminal.exe since this tracker is only for issues related to the console. I did expect the issue to get closed but sort of expected someone would follow up/report the bug internally with whomever owns that kernel code especially since it can be easily reproduced with cmd.exe and Process Explorer when compared to previous versions of Windows?

These non-suspendable threads end up deadlocked inside IopSynchronousServiceTail which is also the same code apparently responsible for handling the CTRL+C signals for the console and that function also handles synchronous waits on file objects (such as when cmd.exe is calling ReadFile on a pipe handle - the same thread that doesn't get suspended)... So there might have been some changes for console support on Win10 that also accidently broke thread suspension as a side effect - I can't be sure what changed but these threads would successfully suspend on Win7/Win8 but are now permanently running on Win10 despite calling SuspendThread.

It would help us build a case for getting this fixed if we understood your use case

The primary use case is debugging since multiple programs/debuggers are reporting incorrect information. The other primary issue is backwards compatibility since this change on Windows 10 breaks Microsoft's own software (including built-in applications like resmon) and also lots of third party tools released since 2000 (including every version of our own software released over the last 12 years).

Hope this helps
-dmex

@eryksun
Copy link

eryksun commented Apr 5, 2021

The thread hasn't actually suspended and the KWAIT_REASON for the thread permanently remains Executive instead of transitioning to Suspended (the first thread with ntdll.dll!DbgUiRemoteBreakin is windbg and should be ignored)

I think this only applies to a synchronous I/O wait, which is an alertable kernel wait in IopSynchronousServiceTail for the Event of a synchronous-mode file object. The thread should switch to the suspended state as soon as the synchronous I/O wait is completed or canceled.

If I had to guess, I'd say this is related to the implementation of NtCancelSynchronousIoFile (i.e. WinAPI CancelSynchronousIo). I suggest asking about this on a more appropriate forum, such as OSR Developer Community.

These non-suspendable threads end up deadlocked inside IopSynchronousServiceTail which is also the same code apparently responsible for handling the CTRL+C signals

What deadlock? Also, what's the relevance of Ctrl+C? The only connection I see is that, for a conpty session, the Ctrl+C character is read from a pipe, which is a synchronous read. But handling this input by sending a console control event has nothing to do with I/O. The console host simply sends a request to the desktop-session server (csrss.exe in the current session) to request the creation of a control thread in the client.

@dmex
Copy link
Author

dmex commented Apr 5, 2021

the thread should switch to the suspended state as soon as the synchronous I/O wait is completed

That's what should occur in theory but in reality it's continuing to schedule and wait indefinitely without ever transitioning to the suspended state.

I'd say this is related to the implementation of NtCancelSynchronousIoFile

The cancellation feature has exited since Vista (2006) but the issue with suspension has only started with recent versions of Win10.

What deadlock?

The suspension APC is scheduled to the thread but it never actually executes regardless of how many days/weeks/months you're waiting for the state change which is the a textbook definition of a deadlock.

handling this input by sending a console control event has nothing to do with I/O

It does and always has.

what's the relevance of Ctrl+C?

The sourcecode for these functions where this deadlock occurs includes specific comments that mention the CTRL+C handling 🤷‍♂️

image

I suggest asking about this on a more appropriate forum, such as OSR Developer Community.

This isn't a development issue so they won't be much help and they use my project source as a reference so it would basically become a circle jerk with suggestions of switching to linux without anything on Windows ever getting fixed... The reason I created the issue here was so someone like @DHowett would hopefully follow up/raise this issue internally with his co-workers since this issue affects cmd.exe and debugging cmd.exe compared to OSR where people would just be upset and not able to do anything about it.

The only other place for reporting the issue would be using this repository (https://github.com/microsoft/Windows-Dev-Performance) but since the code for the deadlock references the console it might have been a recent change for console support that broke suspension so again I figured reporting it here would be the best place so it could be looked into?

@eryksun
Copy link

eryksun commented Apr 6, 2021

I poked around a bit. There are a couple of interesting changes relevant to suspending threads.

The first change should be a transparent optimization. If KiDisableLightWeightSuspend is 0, KiSuspendThread() doesn't call KiSignalThreadForApc() to deliver the suspend APC. If the thread is already waiting, the APC remains in the queue until the wait ends. The thread's Tcb.WaitReason does not change to Suspended (5). However, the SYSTEM_THREAD_INFORMATION for the thread reports that the wait state is Suspended, which effectively is the case for this light-weight suspend. When the thread's suspend count is decremented to 0, the original wait continues. The suspend APC gets delivered when the thread is dispatched, but there's nothing for it to do by then.

The change that's problematic is that, for non-alertable synchronous I/O, system calls such as NtReadFile are now implemented to disable kernel APCs before acquiring the file lock. The alertable kernel wait in IopSynchronousServiceTail() thus only allows delivery of special kernel APCs, such as the I/O completion APC. When the wait ends because the I/O request is completed or canceled (e.g. a console read completes), the file lock is released via IopReleaseFileObjectLock(). This in turn calls KeLeaveCriticalRegion(), which enables kernel APCs (i.e. increments the thread's KernelApcDisable value) and calls KiCheckForKernelApcDelivery(), which delivers the suspend APC. At this point, the thread wait state should become Suspended.

For example, if you suspend cmd.exe, pressing enter in its console or terminal window should complete its ReadConsoleW() call (internally an NtDeviceIoControlFile call) and allow the suspend APC to run. Suspending the host process for a conpty (pseudconsole) session is a bit more complicated. Pressing enter in the console is necessary but not sufficient because the host process may have two threads doing a synchronous read, one for the input pipe and one for the signal pipe. For the latter, resizing the console/terminal window should write to the signal pipe and complete the read in the host. Once both threads become suspended, the conhost.exe or openconsole.exe process should report as suspended in Process Explorer or Resource Monitor.

That said, I do not know why non-alertable synchronous I/O is disabling delivery of normal kernel APCs, such as the suspend APC, while the file object's lock is held.

What I do know is that, starting with Windows Vista, pended synchronous creates were changed to use an alertable kernel-mode wait in order to support cancellation when the requesting thread is terminated. Moreover, it seems that all synchronous I/O waits were changed to use an alertable kernel-mode wait. (Formerly, for an I/O system call from user mode, the wait for non-alertable synchronous I/O was a non-alertabled user-mode wait.) Thread termination in this case relies on KeAlertThread() to interrupt the alertable kernel-mode wait. When the wait in IopSynchronousServiceTail() is alerted, it calls IopCancelAlertedRequest() if the thread is terminating or the file mode is alertable synchronous I/O. Otherwise, since an arbitrary alert shouldn't be allowed to interrupt non-alertable synchronous I/O, it calls IopCheckIrpCancelled() and resumes waiting if the request isn't canceled.

The sourcecode for these functions where this deadlock occurs includes specific comments that mention the CTRL+C handling man_shrugging

The Interix (SUA) subsystem has some way to use APCs to implement POSIX signals. The Windows subsystem (or rather its designers), on the other hand, is opposed to the way Unix signals work, which entails executing handlers asynchronously in user mode on a thread that's not in a alertable wait state. Instead, Windows delivers a console control event, such as CTRL_C_EVENT, by creating a new thread in the target process that begins at the internal CtrlRoutine function.

This isn't a development issue so they won't be much help and they use my project source as a reference so it would basically become a circle jerk with suggestions of switching to linux without anything on Windows ever getting fixed...

The OSR community is for developers of NT device drivers. Like any online forum, I'm sure it has problems with trolls and difficult personalities. But the founding members there, who have been writing NT device drivers for almost 30 years, are better situated to get the attention of the kernel developers at Microsoft. I don't think the terminal/conhost team has the time or inclination to internally report an issue like this and relay feedback to you, but maybe you lucked out here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs-Tag-Fix Doesn't match tag requirements Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting
Projects
None yet
Development

No branches or pull requests

3 participants