-
Notifications
You must be signed in to change notification settings - Fork 262
Properly report failure to switch to target apartment #1026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly report failure to switch to target apartment #1026
Conversation
If we were unable to switch to a target apartment
(either by an explicit `co_await apartment_context`
or implicitly at the completion of ` co_await IAsyncInfo`),
we now report the exception in the `co_await`'ing coroutine,
so it can be caught and handled.
Note that when this error occurs, we are stuck in limbo.
We left the original apartment for the MTA, and now we can't
get from the MTA to the target apartment. We have no choice but
to raise the exception on the MTA, even though the coroutine
may not have expected to be running there.
```cpp
IAsyncAction Trapped()
{
Object o; // some object with a destructor
co_await destroyed_ui_context; // throws an exception
}
```
The destructor for `Object` will run on the MTA, even though
the `Trapped` coroutine never runs on the MTA under normal
conditions.
This is making the best of a bad situation. We need to raise
the exception into the coroutine so that the coroutine can
enter a failure state and allow the coroutine chain to proceed.
Since the object `o` already had to be prepared for destruction
on the original apartment or the switched-to apartment (in case
an exception occurs in either apartment), we know that the
object cannot have apartment affinity anyway. So we're probably
okay to destruct it on the MTA.
The old code raised the exception on whatever thread happened
to notice that something bad happened. Sometimes that's okay,
for example if you do a `co_await destroyed_ui_context` from
an MTA thread, in which case the exception is thrown into the
coroutine and can be caught, or left uncaught and send the
coroutine into an error state.
However, all of the other cases weren't quite so happy.
To avoid deadlocks, the apartment switch is routed through
the threadpool, and that means that the failure is raised
on a threadpool thread. There's nobody around to catch that
exception, so the process fails fast.
We solve the problem by having an atomic hresult in the
awaiter, which is initialized to 0 (`S_OK`) and transitions
to error when we are about to resume the coroutine on the wrong
apartment, so that the error can be propagated during
`await_resume()`, which happens in the context of the awaiting
coroutine and therefore can be caught by the coroutine
(or sent to `unhandled_exception`).
The hresult in which we record the failure needs to be atomic
to avoid data races in case multiple threads detect failure
simultaneously. However, the access to it can be
relaxed, because it is set and consumed by the same thread.
The current implementation pulls a sneaky trick: To avoid
having to create an `operator co_await` for the `apartment_context`
object, we continue to let it be its own awaiter. This means
that if the apartment switch fails, we record the failure in
the `apartment_context` itself. We are making the assumption
that once an apartment context goes bad, it is unrecoverably
broken.
This also removes the need for an atomic `int32_t`, and removes the assumption that context failure is permanent.
Should be initializing the int32_t, not the pointer
|
nit: the description's scope_exit have typos, |
|
So the only drawback here is that, if you have a STA-bound object, the |
|
@antmor If you have an STA-bound object, you're already in trouble, because it might destruct on the wrong apartment even in the absence of the MTA weirdness. fire_and_forget Victim()
{
StaObject object;
DoSomething1();
co_await resume_foreground(dispatcher);
DoSomething2();
}If So the StaObject needs to be prepared to be destructed in either the old apartment or the new one. In other words, it must already be agile. |
If we were unable to switch to a target apartment (either by an explicit
co_await apartment_contextor implicitly at the completion ofco_await IAsyncInfo), we now report the exception in theco_await'ing coroutine, so it can be caught and handled.Note that when this error occurs, we are stuck in limbo. We left the original apartment for the MTA, and now we can't get from the MTA to the target apartment. We have no choice but to raise the exception on the MTA, even though the coroutine may not have expected to be running there.
The destructor for
Objectwill run on the MTA, even though theTrappedcoroutine never runs on the MTA under normal conditions.This is making the best of a bad situation. We need to raise the exception into the coroutine so that the coroutine can enter a failure state and allow the coroutine chain to proceed.
Since the object
oalready had to be prepared for destruction on the original apartment or the switched-to apartment (in case an exception occurs in either apartment), we know that the object cannot have apartment affinity anyway. So we're probably okay to destruct it on the MTA.There is still a gotcha: If the
co_awaitcompletes successfully, but is unable to return to the original apartment, the completed value of theco_awaitis lost, since the success is transformed to an apartment-switching exception. If the awaited coroutine's return value requires special cleanup, that cleanup will not occur. Example:If the
TurnOnAsyncsucceeds, but we are unable to return to the original apartment, the lamp is not returned byco_await, which means it never gets turned off. There's not really much we can do about this sinceco_awaitcannot both return a value and throw an exception.One workaround is to do the
co_awaiton the MTA:Another is to clean up the lamp in an exception handler.
Implementation notes
The old code raised the exception on whatever thread happened to notice that something bad happened. Sometimes that's okay, for example if you do a
co_await destroyed_ui_contextfrom an MTA thread, in which case the exception is thrown into the coroutine and can be caught, or left uncaught and send the coroutine into an error state.However, all of the other cases weren't quite so happy. To avoid deadlocks, the apartment switch is routed through the threadpool, and that means that the failure is raised on a threadpool thread. There's nobody around to catch that exception, so the process fails fast.
We solve the problem by having an hresult in the awaiter, which is initialized to 0 (
S_OK) and transitions to error when we are about to resume the coroutine on the wrong apartment, so that the error can be propagated duringawait_resume(), which happens in the context of the awaiting coroutine and therefore can be caught by the coroutine (or sent tounhandled_exception).This requires
apartment_contextto have a separate awaiter, rather than being its own awaiter, because we need a place to communicate the apartment switch failure. We can't store it in theapartment_contextbecause we support awaiting aconst apartment_context.