
We need a mechanism to crash Trio with a given exception; how should it work? #1607

njsmith opened this issue Jun 10, 2020 · 2 comments

njsmith (Member) commented Jun 10, 2020

There are a few places where we want to crash trio.run with a given exception. Meaning something like: cancel the main task, unwind everything, and raise the given exception as the final result. This is needed for:

It's not quite clear what the semantics should be though!

One option: when asked to "crash", we:

  • cancel the main task, if it's still running (if not, that means we're already shutting down, so we can skip this step)
  • save the "crash" exception in the Runner state somewhere
  • on the way out of run, check if we have any saved exceptions, and if so, do... something with them. If main_task_outcome is a Value, it's easy, just raise a MultiError(saved_exceptions). If the main_task_outcome is an Error, I'm not sure what the best approach is. MultiError(saved_exceptions) with the main_task_outcome as __context__? That could be confusing since the exception stored in main_task_outcome might have only occurred after and as a result of a sequence like "crash" → cancel main task → something on the cancel path raises an exception. I guess the only alternatives though are (a) discarding the main task exception, (b) MultiError(saved_exceptions + [main_task_exception]). Not sure if either of those is actually better.

Note that we're already doing something kinda similar at the end of unrolled_run to handle ki_pending.
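
To make that concrete, here's a rough sketch of how those three steps might fit together. This is not Trio's actual internals: the Runner fields, crash(), final_outcome(), and cancel_main_task() are all illustrative names.

```python
import outcome

from trio import MultiError


class Runner:
    def __init__(self):
        self.crash_exceptions = []
        self.main_task_finished = False

    def crash(self, exc):
        # Save the "crash" exception for later...
        self.crash_exceptions.append(exc)
        # ...and cancel the main task if it's still running; if not,
        # we're already shutting down and can skip this step.
        if not self.main_task_finished:
            self.cancel_main_task()  # hypothetical helper

    def final_outcome(self, main_task_outcome):
        # On the way out of run(): merge any saved crash exceptions
        # into the final result.
        if not self.crash_exceptions:
            return main_task_outcome
        crash_exc = MultiError(self.crash_exceptions)
        if isinstance(main_task_outcome, outcome.Error):
            # One of the options discussed above: expose the main
            # task's exception as __context__ of the crash exception.
            crash_exc.__context__ = main_task_outcome.error
        return outcome.Error(crash_exc)
```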

njsmith (Member, Author) commented Jun 14, 2020

I guess I can also see an argument that if there are some queued-up crasher exceptions and the main task also raises an exception, then maybe we should discard the crasher exceptions?

For most of the listed use cases, we probably want to dump some error message to the console as well as injecting the exception, because they represent bugs that might break unwinding, so we can't count on the injected exception making its way out. And console message + whatever exception resulted from that seems like a reasonable final outcome.

That doesn't exactly apply to KeyboardInterrupt, though.

oremanj (Member) commented Jun 26, 2020

This analysis all seems generally correct to me, and I've been thinking about this feature as well.

I think there are maybe even two different operations we want, which I'll call "fail" and "crash". "fail" is the thing you've been describing here, where everything gets unwound nicely. "crash" is for deliberately abandoning running tasks when we've lost faith in our ability to unwind them properly. I think there's some discussion of the latter in #1056, and the fork-survival techniques in #1614 are probably applicable as well.

How do we decide when we won't be able to unwind properly? For a deadlock or test timeout, I think it's something like "when wait_all_tasks_blocked(1) returns" -- i.e., let the unwinding continue until it seems "stuck" (on a shield, a never-rescheduled task, etc.). For KeyboardInterrupt, a KI that arrives while we're already unwinding on failure should probably produce a crash. For system task and run_sync_soon callback exceptions, I'm not sure there's a good general approach. Maybe we just log the error and continue with the fail path, without any fallback to crash. If the issue prevents unwinding, then either the user's Ctrl+C or the deadlock detector / test timeout will get us into the crashing state eventually.
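
For the deadlock / test-timeout case, that heuristic might look roughly like this; wait_all_tasks_blocked is the real trio.testing API, while escalate_to_crash is a hypothetical callback:

```python
import trio
import trio.testing


async def watch_for_stuck_unwind(escalate_to_crash):
    # Returns once every other task has been blocked for a full
    # second straight; if we're supposed to be unwinding at that
    # point, the unwind is presumably stuck (shielded cleanup, a
    # never-rescheduled task, etc.), so escalate from fail to crash.
    await trio.testing.wait_all_tasks_blocked(cushion=1.0)
    escalate_to_crash()
```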

Under this model, TrioInternalError of the form "unexpected exception escaped from the run loop" would be implemented as a crash.

Do we need to actually leak tasks on crash, or can we do some kind of forced teardown? I'm imagining something like: throw GeneratorExit (or another BaseException, or even just Cancelled) in at every checkpoint until the coroutine is exhausted, in an environment with no GLOBAL_RUN_CONTEXT (so the task can't wreak too much havoc as it unwinds). Fall back to leaking if this goes on for too long. This is better than leaking if it works, but I can also see the argument that it's too complicated for what's effectively a last-ditch failsafe option.
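
A sketch of that forced-teardown loop, with the caveat that the cutoff constant and function name are made up for illustration:

```python
MAX_THROW_ATTEMPTS = 1000  # arbitrary "going on too long" cutoff


def force_teardown(coro):
    # The real version would run this with GLOBAL_RUN_CONTEXT unset,
    # per above, so the task can't wreak too much havoc as it unwinds.
    for _ in range(MAX_THROW_ATTEMPTS):
        try:
            # Throw in at the current checkpoint. If throw() returns,
            # the coroutine caught the exception and yielded again at
            # another checkpoint, so we loop and throw once more.
            coro.throw(GeneratorExit)
        except (GeneratorExit, StopIteration):
            return True   # coroutine exhausted: teardown succeeded
        except BaseException:
            return True   # it died with some other exception: also done
    return False          # give up and fall back to leaking the task
```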

Result of run():

  • We should probably preserve the current behavior that KeyboardInterrupt "wins" and pushes everything else to __context__. That makes it easy to catch KeyboardInterrupt reliably. None of the other failure exceptions are likely to be specifically caught.
  • For everything else, I probably favor bundling everything (including the main task error) into a single MultiError, so that once we have ExceptionGroup we can label where each thing came from. I don't feel strongly on this one though; see the sketch after this list.
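
Concretely, that precedence might look something like this; combine_final_exceptions is not a real or proposed API, just an illustration of the rule:

```python
from trio import MultiError


def combine_final_exceptions(excs):
    # Hypothetical helper; assumes excs is non-empty.
    kis = [e for e in excs if isinstance(e, KeyboardInterrupt)]
    others = [e for e in excs if not isinstance(e, KeyboardInterrupt)]
    if kis:
        # KeyboardInterrupt "wins" so it can be caught reliably;
        # everything else gets pushed into __context__.
        ki = kis[0]
        if others:
            ki.__context__ = MultiError(others)
        return ki
    # Otherwise bundle everything, main task error included, into a
    # single MultiError, so each piece can be labeled once we have
    # ExceptionGroup.
    return MultiError(others)
```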
