Properly report failure to switch to target apartment #1026

oldnewthing · 2021-09-27T03:49:31Z

If we were unable to switch to a target apartment (either by an explicit co_await apartment_context or implicitly at the completion of co_await IAsyncInfo), we now report the exception in the co_await'ing coroutine, so it can be caught and handled.

Note that when this error occurs, we are stuck in limbo. We left the original apartment for the MTA, and now we can't get from the MTA to the target apartment. We have no choice but to raise the exception on the MTA, even though the coroutine may not have expected to be running there.

IAsyncAction Trapped()
{
    Object o; // some object with a destructor
    co_await destroyed_ui_context; // throws an exception
}

The destructor for Object will run on the MTA, even though the Trapped coroutine never runs on the MTA under normal conditions.

This is making the best of a bad situation. We need to raise the exception into the coroutine so that the coroutine can enter a failure state and allow the coroutine chain to proceed.

Since the object o already had to be prepared for destruction on the original apartment or the switched-to apartment (in case an exception occurs in either apartment), we know that the object cannot have apartment affinity anyway. So we're probably okay to destruct it on the MTA.

There is still a gotcha: If the co_await completes successfully, but is unable to return to the original apartment, the completed value of the co_await is lost, since the success is transformed to an apartment-switching exception. If the awaited coroutine's return value requires special cleanup, that cleanup will not occur. Example:

IAsyncAction Trapped()
{
    auto lamp = co_await TurnOnAsync();
    auto ensure_off = wil::scope_exit([&] { lamp.TurnOff(); });
    /* do more stuff with the lamp */
}

If the TurnOnAsync succeeds, but we are unable to return to the original apartment, the lamp is not returned by co_await, which means it never gets turned off. There's not really much we can do about this since co_await cannot both return a value and throw an exception.

One workaround is to do the co_await on the MTA:

IAsyncAction Trapped()
{
    apartment_context original_apartment;
    auto lampAction = TurnOnAsync();
    co_await resume_background();
    auto lamp = co_await lampAction; // await on the MTA
    auto ensure_off = wil::scope_exit([&] { lamp.TurnOff(); });
    co_await original_apartment;
    /* do more stuff with the lamp */
}

Another is to clean up the lamp in an exception handler.

IAsyncAction Trapped()
{
    auto lampAction = TurnOnAsync();
    Lamp lamp{ nullptr };
    try
    {
        lamp = co_await lampAction;
    }
    catch (...)
    {
        if (lampAction.Status() == AsyncStatus::Completed) lamp = lampAction.GetResults().TurnOff();
        throw;
    }
    auto ensure_off = wil:;scope_exit([&] { lamp.TurnOff(); });
    /* do more stuff with the lamp */
}

Implementation notes

The old code raised the exception on whatever thread happened to notice that something bad happened. Sometimes that's okay, for example if you do a co_await destroyed_ui_context from an MTA thread, in which case the exception is thrown into the coroutine and can be caught, or left uncaught and send the coroutine into an error state.

However, all of the other cases weren't quite so happy. To avoid deadlocks, the apartment switch is routed through the threadpool, and that means that the failure is raised on a threadpool thread. There's nobody around to catch that exception, so the process fails fast.

We solve the problem by having an hresult in the awaiter, which is initialized to 0 (S_OK) and transitions to error when we are about to resume the coroutine on the wrong apartment, so that the error can be propagated during await_resume(), which happens in the context of the awaiting coroutine and therefore can be caught by the coroutine (or sent to unhandled_exception).

This requires apartment_context to have a separate awaiter, rather than being its own awaiter, because we need a place to communicate the apartment switch failure. We can't store it in the apartment_context because we support awaiting a const apartment_context.

If we were unable to switch to a target apartment (either by an explicit `co_await apartment_context` or implicitly at the completion of ` co_await IAsyncInfo`), we now report the exception in the `co_await`'ing coroutine, so it can be caught and handled. Note that when this error occurs, we are stuck in limbo. We left the original apartment for the MTA, and now we can't get from the MTA to the target apartment. We have no choice but to raise the exception on the MTA, even though the coroutine may not have expected to be running there. ```cpp IAsyncAction Trapped() { Object o; // some object with a destructor co_await destroyed_ui_context; // throws an exception } ``` The destructor for `Object` will run on the MTA, even though the `Trapped` coroutine never runs on the MTA under normal conditions. This is making the best of a bad situation. We need to raise the exception into the coroutine so that the coroutine can enter a failure state and allow the coroutine chain to proceed. Since the object `o` already had to be prepared for destruction on the original apartment or the switched-to apartment (in case an exception occurs in either apartment), we know that the object cannot have apartment affinity anyway. So we're probably okay to destruct it on the MTA. The old code raised the exception on whatever thread happened to notice that something bad happened. Sometimes that's okay, for example if you do a `co_await destroyed_ui_context` from an MTA thread, in which case the exception is thrown into the coroutine and can be caught, or left uncaught and send the coroutine into an error state. However, all of the other cases weren't quite so happy. To avoid deadlocks, the apartment switch is routed through the threadpool, and that means that the failure is raised on a threadpool thread. There's nobody around to catch that exception, so the process fails fast. We solve the problem by having an atomic hresult in the awaiter, which is initialized to 0 (`S_OK`) and transitions to error when we are about to resume the coroutine on the wrong apartment, so that the error can be propagated during `await_resume()`, which happens in the context of the awaiting coroutine and therefore can be caught by the coroutine (or sent to `unhandled_exception`). The hresult in which we record the failure needs to be atomic to avoid data races in case multiple threads detect failure simultaneously. However, the access to it can be relaxed, because it is set and consumed by the same thread. The current implementation pulls a sneaky trick: To avoid having to create an `operator co_await` for the `apartment_context` object, we continue to let it be its own awaiter. This means that if the apartment switch fails, we record the failure in the `apartment_context` itself. We are making the assumption that once an apartment context goes bad, it is unrecoverably broken.

strings/base_coroutine_threadpool.h

This also removes the need for an atomic `int32_t`, and removes the assumption that context failure is permanent.

Should be initializing the int32_t, not the pointer

antmor · 2021-09-28T22:17:09Z

nit: the description's scope_exit have typos, :;scope_exit instead of ::scope_exit

antmor · 2021-09-28T22:26:57Z

So the only drawback here is that, if you have a STA-bound object, the Release and possibly the destructor of the object is not guaranteed to be freed on a STA if the co_await fails. We should at least make sure that in our documentation, we say that objects crossing a co_await need to be agile enough to not fail out if destruction doesn't happen happen on the source STA, if the thread can't be guaranteed by the developer.

oldnewthing · 2021-09-28T22:30:07Z

@antmor If you have an STA-bound object, you're already in trouble, because it might destruct on the wrong apartment even in the absence of the MTA weirdness.

fire_and_forget Victim()
{
   StaObject object;
   DoSomething1();
   co_await resume_foreground(dispatcher);
   DoSomething2();
}

If DoSomething1() or resume_foreground() raise an exception, then the StaObject destructs on the old apartment. But if the co_await or the DoSomething2() raise an exception, then the StaObject destructs on the new apartment.

So the StaObject needs to be prepared to be destructed in either the old apartment or the new one. In other words, it must already be agile.

oldnewthing requested a review from kennykerr September 27, 2021 03:49

oldnewthing commented Sep 27, 2021

View reviewed changes

strings/base_coroutine_threadpool.h Show resolved Hide resolved

oldnewthing added 2 commits September 27, 2021 06:53

Allow await a const apartment_context

7d25848

This also removes the need for an atomic `int32_t`, and removes the assumption that context failure is permanent.

Initialized the wrong member variable

0bb1781

Should be initializing the int32_t, not the pointer

kennykerr approved these changes Sep 29, 2021

View reviewed changes

kennykerr merged commit ae411cc into microsoft:master Sep 30, 2021

oldnewthing deleted the failed-apartment-switch branch September 30, 2021 17:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Properly report failure to switch to target apartment #1026

Properly report failure to switch to target apartment #1026

Uh oh!

oldnewthing commented Sep 27, 2021 •

edited

Loading

Uh oh!

Uh oh!

antmor commented Sep 28, 2021

Uh oh!

antmor commented Sep 28, 2021

Uh oh!

oldnewthing commented Sep 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Properly report failure to switch to target apartment #1026

Properly report failure to switch to target apartment #1026

Uh oh!

Conversation

oldnewthing commented Sep 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Implementation notes

Uh oh!

Uh oh!

antmor commented Sep 28, 2021

Uh oh!

antmor commented Sep 28, 2021

Uh oh!

oldnewthing commented Sep 28, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

oldnewthing commented Sep 27, 2021 •

edited

Loading