Conversation

@oldnewthing (Member) commented:
In PR #662, we changed the way we switch from STA to NA by bouncing through the threadpool so that we don't run the NA on an STA thread.

We extend this behavior to switches from STA to STA, to avoid RO_E_BLOCKED_CROSS_ASTA_CALL exceptions, which have been identified as the source of some crashes.

Interestingly, we add this support by deleting code. We would normally add tests for STA and MAINSTA to the NA test, but since MTA was ruled out by the previous test, and NA+STA+MAINSTA completely cover all the non-MTA possibilities, there's no need to perform any test at all. The fact that we got here means that the destination is NA/STA/MAINSTA.

Note also that the first test in this block checks whether we are resuming in the same apartment that we are currently in. That test is important because it prevents us from bouncing through the threadpool just to return to our own thread, which would needlessly burn a threadpool thread.
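
To make the decision order concrete, here is a minimal, self-contained toy sketch of the control flow just described. Everything in it (the `kind` enum, `apartment_ref`, `resume_on`) is a hypothetical model of the tests, not the actual cppwinrt implementation:

```cpp
#include <cstdio>

// Toy model only, not cppwinrt code: apartment kinds plus an id, so that
// two distinct STA threads compare as different apartments.
enum class kind { sta, main_sta, mta, nta };
struct apartment_ref { kind k; int id; };

void resume_on(apartment_ref current, apartment_ref destination)
{
    if (destination.id == current.id)
    {
        // First test: resuming in the apartment we are already in, so do
        // it inline rather than burn a threadpool thread just to come
        // back to our own thread.
        std::puts("resume inline");
    }
    else if (destination.k == kind::mta)
    {
        // An earlier test takes care of the MTA.
        std::puts("resume in the MTA");
    }
    else
    {
        // Whatever remains is NA/STA/MAINSTA, which together cover every
        // non-MTA possibility, so no further test is needed: always
        // bounce through the threadpool before entering the destination.
        std::puts("bounce through the threadpool");
    }
}

int main()
{
    resume_on({ kind::sta, 1 }, { kind::sta, 1 });      // same apartment
    resume_on({ kind::sta, 1 }, { kind::mta, 2 });      // MTA path
    resume_on({ kind::sta, 1 }, { kind::main_sta, 3 }); // threadpool bounce
}
```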

Added a new unit test to verify that STA-to-STA transfers release the source STA. The test failed before the change, and passes after.
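
For illustration, here is a rough sketch of the shape of such a test; it is not the actual test added by this PR, and the names (`hop_to`, `context_a`, `b_released`) are hypothetical. It assumes C++/WinRT's `apartment_context` and `fire_and_forget`, plus a plain Win32 event with a 1-second timeout so a regression fails rather than hangs:

```cpp
#include <windows.h>
#include <winrt/Windows.Foundation.h>
#include <optional>
#include <thread>

// The hop itself: with this change, co_await on another STA's context
// bounces through the threadpool, so the calling thread is released
// immediately instead of blocking until the destination STA responds.
winrt::fire_and_forget hop_to(winrt::apartment_context destination)
{
    co_await destination;
}

int main()
{
    winrt::handle captured{ CreateEventW(nullptr, TRUE, FALSE, nullptr) };
    winrt::handle b_released{ CreateEventW(nullptr, TRUE, FALSE, nullptr) };
    std::optional<winrt::apartment_context> context_a;

    std::thread a([&]
    {
        winrt::init_apartment(winrt::apartment_type::single_threaded);
        context_a.emplace();        // capture STA thread "A"'s context
        SetEvent(captured.get());
        Sleep(2000);                // simulate a hung STA (not pumping)
        MSG msg;                    // wake up and pump so the hop completes
        while (PeekMessageW(&msg, nullptr, 0, 0, PM_REMOVE))
        {
            DispatchMessageW(&msg);
        }
    });

    std::thread b([&]
    {
        winrt::init_apartment(winrt::apartment_type::single_threaded);
        WaitForSingleObject(captured.get(), INFINITE);
        hop_to(*context_a);         // returns at the first suspension point
        SetEvent(b_released.get()); // proves "B" was not blocked by the hop
    });

    // Before the change, "B" hung inside the hop and this wait timed out;
    // after the change, "B" signals well within the 1-second budget.
    bool const released =
        WaitForSingleObject(b_released.get(), 1000) == WAIT_OBJECT_0;
    a.join();
    b.join();
    return released ? 0 : 1;
}
```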

Note however that this introduces a possibly undesirable behavior change: Suppose STA thread "A" hangs and STA thread "B" has many tasks, some of which want to resume on STA thread "A".

Under the old behavior, STA thread "B" would run the first task, and then that task would try to switch to STA thread "A" and hang. STA thread "B" would remain hung and be unavailable to process tasks. When STA thread "A" finally wakes up, STA thread "B" also wakes up and resumes doing work. The number of hung threads is capped at the number of STA threads.

Under the new behavior, STA thread "B" would run the first task, and then ask the threadpool to resume on STA thread "A". The threadpool thread hangs instead of STA thread "B". STA thread "B" is now available to do other work (yay), but at the cost of a hung threadpool thread. One of the things STA thread "B" might do is create more work items that want to switch to STA thread "A", so we slowly (or maybe quickly) exhaust the threadpool.
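
Assuming the hypothetical `hop_to` and `context_a` from the test sketch above, this worst case boils down to a loop like the following: each call returns promptly on STA thread "B", but parks one more threadpool thread inside the blocked hop into apartment "A".

```cpp
// Hypothetical worst case, reusing hop_to from the sketch above: "B" keeps
// generating work items that resume on the hung "A". Every iteration frees
// "B" right away but leaves one threadpool thread blocked, so the pool
// drains one thread per work item.
for (int i = 0; i < 1000; ++i)
{
    hop_to(*context_a);
}
```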

A mitigating factor for this worst-case scenario is that we sort of had that problem anyway: If any task on a threadpool thread wants to switch to STA thread "A", it will hang with or without these changes. Since it is common to pass through a threadpool thread at some point, you were likely to exhaust the threadpool even under the old algorithm.

A review comment was left on this snippet of the new unit test:

```cpp
// processes work, it will unstick the first STA thread.
// This test verifies that the second STA thread does become available
// to do work and is not hung waiting for the first STA thread.
co_await context1;
```

Contributor:
So if this were broken (i.e., before your changes), this co_await would block forever if not for the 1000ms timeout on line 85. Is that right?

@oldnewthing (Member, Author) replied:
Right. My initial version had INFINITE, and the test hung. We don't like it when tests hang, so I had it detect that it had been stuck for 1 second (should be plenty) and declare failure if so.

@kennykerr (Collaborator) left a comment:
Looks good - thanks Raymond!

@kennykerr merged commit 0835afb into microsoft:master on Feb 11, 2021.
@oldnewthing deleted the sta-to-sta branch on Feb 16, 2021.