
We need a mechanism to crash Trio with a given exception; how should it work? #1607

njsmith opened this issue Jun 10, 2020 · 2 comments

njsmith (Member) commented Jun 10, 2020

There are a few places where we want to crash trio.run with a given exception. Meaning something like: cancel the main task, unwind everything, and raise the given exception as the final result. This is needed for:

It's not quite clear what the semantics should be though!

One option: when asked to "crash", we:

  • cancel the main task, if it's still running (if not, that means we're already shutting down, so we can skip this step)
  • save the "crash" exception in the Runner state somewhere
  • on the way out of run, check if we have any saved exceptions, and if so, do... something with them. If main_task_outcome is a Value, it's easy, just raise a MultiError(saved_exceptions). If the main_task_outcome is an Error, I'm not sure what the best approach is. MultiError(saved_exceptions) with the main_task_outcome as __context__? That could be confusing since the exception stored in main_task_outcome might have only occurred after and as a result of a sequence like "crash" → cancel main task → something on the cancel path raises an exception. I guess the only alternatives though are (a) discarding the main task exception, (b) MultiError(saved_exceptions + [main_task_exception]). Not sure if either of those is actually better.

Note that we're already doing something kinda similar at the end of unrolled_run to handle ki_pending.
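
To make that concrete, here's a rough sketch of how those three steps might fit together. This is not Trio's actual internals: the Runner fields, crash(), final_outcome(), and cancel_main_task() are all illustrative names.

```python
import outcome

from trio import MultiError


class Runner:
    def __init__(self):
        self.crash_exceptions = []
        self.main_task_finished = False

    def crash(self, exc):
        # Save the "crash" exception for later...
        self.crash_exceptions.append(exc)
        # ...and cancel the main task if it's still running; if not,
        # we're already shutting down and can skip this step.
        if not self.main_task_finished:
            self.cancel_main_task()  # hypothetical helper

    def final_outcome(self, main_task_outcome):
        # On the way out of run(): merge any saved crash exceptions
        # into the final result.
        if not self.crash_exceptions:
            return main_task_outcome
        crash_exc = MultiError(self.crash_exceptions)
        if isinstance(main_task_outcome, outcome.Error):
            # One of the options discussed above: expose the main
            # task's exception as __context__ of the crash exception.
            crash_exc.__context__ = main_task_outcome.error
        return outcome.Error(crash_exc)
```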

njsmith (Member, Author) commented Jun 14, 2020

I guess I can also see an argument that if there are some queued-up crasher exceptions and the main task also raises an exception, then maybe we should discard the crasher exceptions?

For most of the listed use cases, we probably want to dump some error message to the console as well as injecting the exception, because they represent bugs that might break unwinding, so we can't count on the injected exception making its way out. And console message + whatever exception resulted from that seems like a reasonable final outcome.

That doesn't exactly apply to KeyboardInterrupt, though.

oremanj (Member) commented Jun 26, 2020

This analysis all seems generally correct to me, and I've been thinking about this feature as well.

I think there are maybe even two different operations we want, which I'll call "fail" and "crash". "fail" is the thing you've been describing here, where everything gets unwound nicely. "crash" is for deliberately abandoning running tasks when we've lost faith in our ability to unwind them properly. I think there's some discussion of the latter in #1056, and the fork-survival techniques in #1614 are probably applicable as well.

How do we decide when we won't be able to unwind properly? For a deadlock or test timeout, I think it's something like "when wait_all_tasks_blocked(1) returns" -- i.e., let the unwinding continue until it seems "stuck" (on a shield, a never-rescheduled task, etc.). For KeyboardInterrupt, a KI that arrives while we're already unwinding on failure should probably produce a crash. For system task and run_sync_soon callback exceptions, I'm not sure there's a good general approach. Maybe we just log the error and continue with the fail path, without any fallback to crash. If the issue prevents unwinding, then either the user's Ctrl+C or the deadlock detector / test timeout will get us into the crashing state eventually.
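
For the deadlock / test-timeout case, that heuristic might look roughly like this; wait_all_tasks_blocked is the real trio.testing API, while escalate_to_crash is a hypothetical callback:

```python
import trio
import trio.testing


async def watch_for_stuck_unwind(escalate_to_crash):
    # Returns once every other task has been blocked for a full
    # second straight; if we're supposed to be unwinding at that
    # point, the unwind is presumably stuck (shielded cleanup, a
    # never-rescheduled task, etc.), so escalate from fail to crash.
    await trio.testing.wait_all_tasks_blocked(cushion=1.0)
    escalate_to_crash()
```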

Under this model, TrioInternalError of the form "unexpected exception escaped from the run loop" would be implemented as a crash.

Do we need to actually leak tasks on crash, or can we do some kind of forced teardown? I'm imagining something like: throw GeneratorExit (or another BaseException, or even just Cancelled) in at every checkpoint until the coroutine is exhausted, in an environment with no GLOBAL_RUN_CONTEXT (so the task can't wreak too much havoc as it unwinds). Fall back to leaking if this goes on for too long. This is better than leaking if it works, but I can also see the argument that it's too complicated for what's effectively a last-ditch failsafe option.
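
A sketch of that forced-teardown loop, with the caveat that the cutoff constant and function name are made up for illustration:

```python
MAX_THROW_ATTEMPTS = 1000  # arbitrary "going on too long" cutoff


def force_teardown(coro):
    # The real version would run this with GLOBAL_RUN_CONTEXT unset,
    # per above, so the task can't wreak too much havoc as it unwinds.
    for _ in range(MAX_THROW_ATTEMPTS):
        try:
            # Throw in at the current checkpoint. If throw() returns,
            # the coroutine caught the exception and yielded again at
            # another checkpoint, so we loop and throw once more.
            coro.throw(GeneratorExit)
        except (GeneratorExit, StopIteration):
            return True   # coroutine exhausted: teardown succeeded
        except BaseException:
            return True   # it died with some other exception: also done
    return False          # give up and fall back to leaking the task
```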

Result of run():

  • We should probably preserve the current behavior that KeyboardInterrupt "wins" and pushes everything else to __context__. That makes it easy to catch KeyboardInterrupt reliably. None of the other failure exceptions are likely to be specifically caught.
  • For everything else, I probably favor bundling everything (including the main task error) into a single MultiError, so that once we have ExceptionGroup we can label where each thing came from. I don't feel strongly on this one though; see the sketch after this list.
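
Concretely, that precedence might look something like this; combine_final_exceptions is not a real or proposed API, just an illustration of the rule:

```python
from trio import MultiError


def combine_final_exceptions(excs):
    # Hypothetical helper; assumes excs is non-empty.
    kis = [e for e in excs if isinstance(e, KeyboardInterrupt)]
    others = [e for e in excs if not isinstance(e, KeyboardInterrupt)]
    if kis:
        # KeyboardInterrupt "wins" so it can be caught reliably;
        # everything else gets pushed into __context__.
        ki = kis[0]
        if others:
            ki.__context__ = MultiError(others)
        return ki
    # Otherwise bundle everything, main task error included, into a
    # single MultiError, so each piece can be labeled once we have
    # ExceptionGroup.
    return MultiError(others)
```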
