Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential lags/deadlocks during tab close #14041

Merged
4 commits merged into from
Sep 21, 2022
Merged

Conversation

lhecker
Copy link
Member

@lhecker lhecker commented Sep 19, 2022

ConptyClosePseudoConsole blocks until OpenConsole exits.
This is problematic for the changes in 666c446, which stopped calling that
function on a background thread to solve a race condition. This commit fixes
the potential lags/deadlocks from waiting on OpenConsole's exit, by adding
ConptyClosePseudoConsoleNoWait which only closes the IO handles and allows
OpenConsole to exit naturally. This uncovered another potential deadlock
in ServiceLocator::RundownAndExit which might call itself recursively.

Closes #14032

Validation Steps Performed

  • Print tons of text and concurrently close the tab.
    Tab closes, OpenConsole/pwsh exits instantly ✅
  • Use Enter-VsDevShell and close the tab.
    Tab closes instantly, OpenConsole/pwsh exits after ~5 seconds ✅

@ghost ghost added the Issue-Bug It either shouldn't be doing this or needs an investigation. label Sep 19, 2022
@lhecker lhecker added Product-Conpty For console issues specifically related to conpty Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Product-Terminal The new Windows Terminal. Priority-1 A description (P1) labels Sep 20, 2022
@@ -631,12 +647,6 @@ namespace winrt::Microsoft::Terminal::TerminalConnection::implementation
const auto result{ til::u8u16(std::string_view{ _buffer.data(), read }, _u16Str, _u8State) };
if (FAILED(result))
{
if (_isStateAtOrBeyond(ConnectionState::Closing))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we certain that we'll never make it here while a connection is closing? As long as we never transition a connection into the Failed state after we transition it into Closing...

Copy link
Member Author

@lhecker lhecker Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually entirely possible and the source of the race condition that I fixed.
While _isStateAtOrBeyond itself might be atomic, that doesn't mean it protects us from other threads changing the state under us concurrently. Unless I misunderstood your question, I don't believe that this is any more or less correct than before. (But the change is required to make the CancelSynchronousIo handling work correctly.)

// MSFT:40146639
// The premise of this function is that 1 thread enters and 0 threads leave alive.
// We need to prevent anyone from calling us until we actually ExitProcess(),
// so that we don't TriggerTeardown() twice. LockConsole() can't be used here,
// because doing so would prevent the render thread from progressing.
AcquireSRWLockExclusive(&s_shutdownLock);
if (locked.exchange(true, std::memory_order_relaxed))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe on all architectures? Is relaxed correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's safe on all architectures. Atomic operations themselves are always exactly as atomic as any other, no matter what memory order is specified. Relaxed only means that no additional memory is synchronized/flushed (which is fine here).

src/interactivity/base/ServiceLocator.cpp Outdated Show resolved Hide resolved
AcquireSRWLockExclusive(&s_shutdownLock);
if (locked.exchange(true, std::memory_order_relaxed))
{
Sleep(INFINITE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to confirm.

If we get here, it is the first entry and we take the lock, we proceed .We might cause either...

  1. A recursive call to RundownAndExit
  2. Another thread to RundownAndExit

The first one is fixed by preventRecursion, and the second one is fixed by locked -> Sleep(INFINITE)?

What if we trigger another thread to RundownAndExit while we're waiting on it to complete? Is that possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the change to s_shutdownLock is optional. I did it because I initially tried to implement this with a recursive mutex (which SRWLOCK can't do), failed and then wrote the atomic+sleep code, which I kept because I felt like it's easier to see what it does compared to abusing a SRWLOCK for "don't actually lock, only suspend anyone but me".

@@ -39,14 +39,30 @@ void ServiceLocator::SetOneCoreTeardownFunction(void (*pfn)()) noexcept
s_oneCoreTeardownFunction = pfn;
}

[[noreturn]] void ServiceLocator::RundownAndExit(const HRESULT hr)
void ServiceLocator::RundownAndExit(const HRESULT hr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes here are scary. Are we sure this won't cause the OneCore console to regress?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I'm not sure, but I don't believe so. The only effective difference is the addition of the thread_local and I could revert the change to s_shutdownLock if you'd like.

@DHowett
Copy link
Member

DHowett commented Sep 20, 2022

What behavior do you see for process management when you open a hundred tabs and then close the window (f.ex)?

@lhecker
Copy link
Member Author

lhecker commented Sep 20, 2022

What behavior do you see for process management when you open a hundred tabs and then close the window (f.ex)?

Closing ~40 pwsh tabs took about 2-3s for WT to disappear. The pwsh/OpenConsole processes however quit instantly. (Maybe some sort of WinUI lag? Didn't investigate it any further.)
Closing ~20 tabs of the VS dev shell made WT disappear after 1-2s. The pwsh/OpenConsole processes got detached and disappeared after 5-10s (some were faster, some took slightly longer).

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this. I'll pack it into a 1.16 build and ship it out for folks...!

@DHowett DHowett added this to To Cherry Pick in 1.16 Servicing Pipeline via automation Sep 20, 2022
Copy link
Member

@carlos-zamora carlos-zamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ngl, it's a bit hard to follow. I think with a few answered questions I'll feel comfortable approving though.

@@ -382,7 +382,7 @@ void ConPtyTests::DiesOnClose()
Log::Comment(NoThrowString().Format(L"Sleep a bit to let the process attach"));
Sleep(100);

_ClosePseudoConsoleMembers(&pty);
_ClosePseudoConsoleMembers(&pty, TRUE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y'all don't like default params, huh? :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a C function - that's why it can't have that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also FWIW no, I personally don't like default parameters! I also don't like underdocumented boolean parameters for reasons I've mentioned before, but since this is an internal/in-file interface I don't care as much.

@@ -86,7 +86,6 @@ namespace Microsoft::Console::Interactivity

static HWND LocatePseudoWindow(const HWND owner = nullptr /*HWND_DESKTOP = 0*/);

protected:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove protected? Should we keep protected for some of these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class is final so that protected has no effect. It's also not usual / best practice to make classes any more private than necessary. A non-public copy constructor for instance is pretty unusual since you would only be able to use them inside the class. It gets even more unusual when you consider that the functions are explicitly deleted, so not even the class itself can use them. In short, this code was kinda whacky. 😅

AcquireSRWLockExclusive(&s_shutdownLock);
if (locked.exchange(true, std::memory_order_relaxed))
{
Sleep(INFINITE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put a big ol' comment here why it's ok to put Sleep(INFINITE);. I think it'd be nice instead of having to find this PR and read through the discussion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It ended up being a bit of a shorter comment. But I think that one, plus the longer explanation in front of the if condition should sufficiently explain what this does... 🤔

@DHowett DHowett added the AutoMerge Marked for automatic merge by the bot when requirements are met label Sep 21, 2022
@ghost
Copy link

ghost commented Sep 21, 2022

Hello @DHowett!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 274bdb3 into main Sep 21, 2022
@ghost ghost deleted the dev/lhecker/14032-close-lag branch September 21, 2022 22:26
@DHowett DHowett moved this from To Cherry Pick to Cherry Picked in 1.16 Servicing Pipeline Sep 21, 2022
DHowett pushed a commit that referenced this pull request Sep 21, 2022
`ConptyClosePseudoConsole` blocks until OpenConsole exits.
This is problematic for the changes in 666c446, which stopped calling that
function on a background thread to solve a race condition. This commit fixes
the potential lags/deadlocks from waiting on OpenConsole's exit, by adding
`ConptyClosePseudoConsoleNoWait` which only closes the IO handles and allows
OpenConsole to exit naturally. This uncovered another potential deadlock
in `ServiceLocator::RundownAndExit` which might call itself recursively.

Closes #14032

## Validation Steps Performed
* Print tons of text and concurrently close the tab.
  Tab closes, OpenConsole/pwsh exits instantly ✅
* Use `Enter-VsDevShell` and close the tab.
  Tab closes instantly, OpenConsole/pwsh exits after ~5 seconds ✅

(cherry picked from commit 274bdb3)
Service-Card-Id: 85767341
Service-Version: 1.16
@ghost
Copy link

ghost commented Sep 23, 2022

🎉Windows Terminal Preview v1.16.2641.0 has been released which incorporates this pull request.:tada:

Handy links:

@DHowett DHowett added this to To Cherry Pick in 1.15 Servicing Pipeline via automation Oct 13, 2022
@DHowett DHowett moved this from To Cherry Pick to Cherry Picked in 1.15 Servicing Pipeline Oct 13, 2022
DHowett pushed a commit that referenced this pull request Oct 13, 2022
`ConptyClosePseudoConsole` blocks until OpenConsole exits.
This is problematic for the changes in 666c446, which stopped calling that
function on a background thread to solve a race condition. This commit fixes
the potential lags/deadlocks from waiting on OpenConsole's exit, by adding
`ConptyClosePseudoConsoleNoWait` which only closes the IO handles and allows
OpenConsole to exit naturally. This uncovered another potential deadlock
in `ServiceLocator::RundownAndExit` which might call itself recursively.

Closes #14032

## Validation Steps Performed
* Print tons of text and concurrently close the tab.
  Tab closes, OpenConsole/pwsh exits instantly ✅
* Use `Enter-VsDevShell` and close the tab.
  Tab closes instantly, OpenConsole/pwsh exits after ~5 seconds ✅

(cherry picked from commit 274bdb3)
Service-Card-Id: 86209670
Service-Version: 1.15
@ghost
Copy link

ghost commented Oct 18, 2022

🎉Windows Terminal v1.15.2874 has been released which incorporates this pull request.:tada:

Handy links:

@DHowett DHowett moved this from Cherry Picked to Shipped in 1.15 Servicing Pipeline Dec 1, 2022
@DHowett DHowett moved this from Cherry Picked to Shipped in 1.16 Servicing Pipeline Dec 1, 2022
ghost pushed a commit that referenced this pull request Dec 16, 2022
2 new ConPTY APIs were added as part of this commit:
* `ClosePseudoConsoleTimeout`
  Complements `ClosePseudoConsole`, allowing users to override the `INFINITE`
  wait for OpenConsole to exit with any arbitrary timeout, including 0.
* `ConptyReleasePseudoConsole`
  This releases the `\Reference` handle held by the `HPCON`. While it makes
  launching further PTY clients via `PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE`
  impossible, it does allow conhost/OpenConsole to exit naturally once all
  console clients have disconnected. This makes it unnecessary to having to
  monitor the exit of the spawned shell/application, just to then call
  `ClosePseudoConsole`, while carefully continuing to read from the output
  pipe. Instead users can now just read from the output pipe until it is
  closed, knowing for sure that no more data will come or clients connect.
  This is especially useful in combination with `ClosePseudoConsoleTimeout`
  and a timeout of 0, to allow conhost/OpenConsole to exit asynchronously.

These new APIs are used to fix an array of bugs around Windows Terminal exiting
either too early or too late. They also make usage of the ConPTY API simpler in
most situations (when spawning a single application and waiting for it to exit).

Depends on #13882, #14041, #14160, #14282

Closes #4564
Closes #14416
Closes MSFT-42244182

## Validation Steps Performed
* Calling `FreeConsole` in a handoff'd application closes the tab ✅
* Create a .bat file containing only `start /B cmd.exe`.
  If WT stable is the default terminal the tab closes instantly ✅
  With these changes included the tab stays open with a cmd.exe prompt ✅
* New ConPTY tests ✅
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Closing tab hangs when using VS Dev tools and Powershell Core
3 participants