Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A nursery might inject a Cancelled even when none of the tasks received a Cancelled #1457

Closed
vxgmichel opened this issue Apr 19, 2020 · 15 comments

Comments

@vxgmichel
Copy link
Contributor

vxgmichel commented Apr 19, 2020

I noticed that nurseries might inject Cancelled exception even when none of the tasks have received a Cancelled. I have two different examples reproducing this issue.

A nursery with a single task raising an exception

The following code runs a nursery with a single task raising an exception while the outer scope has been cancelled:

async def main():
    with trio.CancelScope() as scope:
        try:
            async with trio.open_nursery() as nursery:
                scope.cancel()
                raise RuntimeError
        except trio.MultiError as exc:
            print("Exceptions:", exc)
            print("Cancelled caught:", nursery.cancel_scope.cancelled_caught)

It prints:

Exceptions: RuntimeError(), Cancelled()
Cancelled caught: False

The Cancelled() exception is injected here:

trio/trio/_core/_run.py

Lines 845 to 851 in fbe3bd3

# Nothing to wait for, so just execute a checkpoint -- but we
# still need to mix any exception (e.g. from an external
# cancellation) in with the rest of our exceptions.
try:
await checkpoint()
except BaseException as exc:
self._add_exc(exc)

A potential fix could to be to use cancel_shielded_checkpoint() instead of checkpoint().

A nursery with a two tasks, both raising an exception

The following code runs a nursery with two tasks both raising an exception while the outer scope has been cancelled:

async def main():
    with trio.CancelScope() as scope:
        try:
            async with trio.open_nursery() as nursery:
                nursery.start_soon(_raise, RuntimeError(1))
                scope.cancel()
                raise RuntimeError(2)
        except trio.MultiError as exc:
            print("Exceptions:", exc)
            print("Cancelled caught:", nursery.cancel_scope.cancelled_caught)

It prints:

Exceptions: RuntimeError(2), Cancelled(), RuntimeError(1)
Cancelled caught: False

The Cancelled() exception is injected here:

trio/trio/_core/_run.py

Lines 835 to 840 in fbe3bd3

# If we get cancelled (or have an exception injected, like
# KeyboardInterrupt), then save that, but still wait until our
# children finish.
def aborted(raise_cancel):
self._add_exc(capture(raise_cancel).error)
return Abort.FAILED

I don't see the point of injecting a Cancelled here, since the cancellation is going to be ignored anyway: isn't it a rule of trio that a cancelled (in the sense of cancel_called) but completed operation should not raise a Cancelled?

Raising Cancelled means that the operation did not happen.
https://trio.readthedocs.io/en/latest/reference-core.html#cancellable-primitives

I feel like it would be more consistent with the trio cancellation semantics to see the nursery cleanup as:

try:
    yield nursery
finally:
    with trio.CancelScope(shield=True):
        await nursery.join_tasks()
@vxgmichel vxgmichel changed the title Nurserys injecting Cancelled even when none of the tasks received a Cancelled A nursery might inject a Cancelled even when none of the tasks received a Cancelled Apr 19, 2020
@njsmith
Copy link
Member

njsmith commented Aug 25, 2020

This bit @catern today: https://gitter.im/python-trio/general?at=5f447dd3ec534f584fbcf7d7

I think this issue is correct, and nursery __aexit__ should only raise Cancelled if one of the child tasks raises Cancelled.

catern added a commit to catern/trio that referenced this issue Aug 26, 2020
Previously, a Nursery would checkpoint on exit, regardless of whether
it was running anything. This is a bit annoying, see python-trio#1457, so let's
change it to not checkpoint on exit. Now it won't be a checkpoint at
all if there's no tasks running inside.

However, this doesn't fully solve python-trio#1457, because we'll still inject a
cancel even if none of the child tasks raise Cancelled.

This seems to break
trio/_core/tests/test_asyncgen.py::test_last_minute_gc_edge_case
such that it no longer ever hits the edge case.

That seems like a pretty complicated test, so maybe @oremanj can say
better why exactly it's breaking with this change.
@oremanj
Copy link
Member

oremanj commented Aug 26, 2020

I think exiting a nursery should always be a schedule point, and should only be a cancellation point if the nursery didn't run any tasks (note this is different from the "no tasks were still running when we exited the async with block" in #1696). That means an async function that opens a nursery and runs zero or more tasks in it will have at least one full checkpoint as long as each individual task had at least one full checkpoint. And that makes it easier to reason about whether we're upholding the "all async functions provided by Trio execute at least one checkpoint" property.

@catern
Copy link
Contributor

catern commented Aug 26, 2020

For me, my problems wouldn't be solved if the nursery injected a Canceled when none of the tasks exited with a Canceled. I think injecting a cancel in that way makes it harder for user code to provide the correct Canceled semantics...

Raising Cancelled means that the operation did not happen.

...as mentioned at the start of this issue. Most code using nurseries, that wants to provide the correct Cancelled semantics, will need to catch Cancelled, as I do here https://github.com/catern/rsyscall/blob/concur/python/rsyscall/tasks/clone.py#L83 and here https://github.com/catern/rsyscall/blob/concur/python/rsyscall/concurrency.py#L213 and probably need to do in other places as well.

So I think nurseries shouldn't be a cancellation point at all, because otherwise I have this pain. I don't really care about whether it's a schedule point, but it seems unnecessary for it to be a schedule point, so I'd also get rid of that.

@oremanj
Copy link
Member

oremanj commented Aug 26, 2020

My goal is to depart as little as possible from Trio's semantics of "every async function in the trio namespace executes at least one checkpoint, and every async context manager executes at least one on either entry or exit". I don't care that much what happens in a nursery that has already run any tasks, because those tasks should have provided the checkpoint. I do care that an infinite loop of "run all these tasks in a nursery" doesn't hang the program if there are no tasks. (And that requires that a nursery-containing-no-tasks be a full checkpoint.)

@catern
Copy link
Contributor

catern commented Aug 26, 2020

Hm, okay, how about a checkpoint on nursery entry, then?

@oremanj
Copy link
Member

oremanj commented Aug 26, 2020

Checkpoint on nursery entry is a non-starter because there is lots of code that relies on the current behavior of checkpointing on exit only. "Start a task in a cancelled scope" is very different from "don't even start the task", especially when you consider that aclose() methods have well-defined semantics (that are not "do nothing") when run in a cancelled scope.

It sounds like we're in agreement that a nursery that runs any tasks shouldn't raise Cancelled if none of the tasks raised Cancelled. Can you explain why the case of a nursery that runs no tasks is important to you?

I would even be willing to consider letting a nursery that runs no tasks skip the checkpoint if any checkpoints occurred inside the async with block, though that's getting a little magical for my tastes.

@njsmith
Copy link
Member

njsmith commented Aug 26, 2020

@oremanj How do you feel about this not containing any checkpoints?

async def noop():
    pass

async with trio.open_nursery() as nursery:
   nursery.start_soon(noop)

?

Every nursery that contains a child task has to contain at least one schedule point (proof: the nursery can't exit until the child task has been scheduled), so adding a schedule point to the __aexit__ for always-empty-nurseries does strengthen that guarantee to "every nursery contains at least one schedule point". But adding a cancel point to always-empty-nurseries doesn't.

I would hesitate to add a checkpoint to nursery entry because:

async def handler(stream: Stream):
    async with stream:
        ...

stream = await open_stream(...)
async with trio.open_nursery() as nursery:
    nursery.start_soon(handler, stream)

Right now, the above code guarantees that stream will be closed; if we add a checkpoint to nursery entry then that guarantee goes away.

@oremanj
Copy link
Member

oremanj commented Aug 26, 2020

@njsmith I'm fine with the noop example not containing any cancel points, because I think of nurseries as a compositional primitive: if the elements (tasks) obey some property of interest, then their composition should too. I'm more worried about patterns like:

async def run_all(funcs):
    async with trio.open_nursery() as nursery:
        for func in funcs:
            nursery.start_soon(func)

and then await run_all([]) now not executing any checkpoints. It's not a hard no, just a concern.

Maybe a more realistic example of "has child tasks, but can't actually take a cancellation" involves child tasks that run in a shielded scope.

@catern
Copy link
Contributor

catern commented Aug 26, 2020

It sounds like we're in agreement that a nursery that runs any tasks shouldn't raise Cancelled if none of the tasks raised Cancelled. Can you explain why the case of a nursery that runs no tasks is important to you?

Ah, I didn't understand. Yes, that's fine - I don't care about the case of a nursery that runs no tasks, it can have a cancellation point or not.

I thought you were referring to the situation where a nursery was exiting when all its tasks had already exited. But I see now that we agree that in that situation, a nursery should not provide a cancellation point.

await run_all([])

IMO, this shouldn't raise Cancelled, because it successfully performed its task: It did nothing, successfully. But I don't feel too strongly about it, since it isn't an issue for me right now.

@njsmith
Copy link
Member

njsmith commented Aug 26, 2020

I think of nurseries as a compositional primitive: if the elements (tasks) obey some property of interest, then their composition should too.

Yeah, that's also the intuition for why, maybe, an always-empty nursery should not do a checkpoint :-)

I.e. in general we expect these to be mostly equivalent:

await foo()

async with trio.open_nursery() as nursery:
    nursery.start_soon(foo)

Right now they're not, b/c the nursery injects a checkpoint. Injecting a schedule point is unavoidable, but more-or-less harmless, since schedule points have minimal effect on user-visible behavior. Injecting a cancel point is avoidable, though, and as this issue shows it causes practical problems, and it's just kind of surprising b/c of how it breaks the equivalence above.

Generalizing, we expect these to be more-or-less similar:

for f in fs:
    await f()

async with trio.open_nursery() as nursery:
    for f in fs:
        nursery.start_soon(f)

Obviously the latter runs the functions concurrently rather than sequentially, and there are some semantic differences that follow directly from that – in particular, it means that if two calls raise exceptions, we have to report them both, inside of the first one breaking out of the loop. And again it forces at least one schedule point. But again again, you wouldn't expect it to inject a cancellation point.

So by this argument, you would expect these two trivial cases to be equivalent:

pass  # don't call anything

async with trio.open_nursery() as nursery:
    pass  # don't start anything

...but of course that's exactly what putting a checkpoint in always-empty-nurseries would change, so we have two sets of principles that we all agree on but that contradict each other here.

I guess one way to look at it is: is a nursery block an "operation" (so should be a checkpoint), or just scaffolding for arranging other operations?

@oremanj
Copy link
Member

oremanj commented Aug 26, 2020

Mm, I see your point. We've generally taken the point of view that an async context manager must perform a checkpoint-y "operation" on either entry or exit (or both), but I think nurseries deserve to break that rule if anyone does. I'm happy for nursery exit to be an unconditional schedule point and unconditional lack of a cancellation point, if that sounds good to you. (We could drop the schedule point if the nursery has ever started a task, but I'm not sure that helps any practical use case enough to pay for the weirdness -- dropping it is maybe better for performance, but that seems better served by having a more general mechanism for dropping schedule points if we've scheduled recently enough.)

@njsmith
Copy link
Member

njsmith commented Sep 1, 2020

I'm happy for nursery exit to be an unconditional schedule point and unconditional lack of a cancellation point, if that sounds good to you.

Let's do it.

We could drop the schedule point if the nursery has ever started a task, but I'm not sure that helps any practical use case enough to pay for the weirdness -- dropping it is maybe better for performance, but that seems better served by having a more general mechanism for dropping schedule points if we've scheduled recently enough.

Yeah, let's not bother with trying to micro-optimize this right now. (And yeah, eliding schedule points if we've scheduled recently is probably a good idea, but orthogonal to the rest of this.)

catern added a commit to catern/trio that referenced this issue Sep 1, 2020
Previously, a Nursery could raise Cancelled even if all its children
had completed successfully.

This is undesirable, as discussed in python-trio#1457, so now a Nursery will
shield itself from cancels in __aexit__.

A Nursery with children can still be cancelled fine: if any of its
children raise Cancelled, then the whole Nursery will raise Cancelled.
If none of the Nursery's children raise Cancelled, then the Nursery
will never raise Cancelled.

As another consequence, a Nursery which never had any child tasks will
never raise Cancelled.

As yet another consequence, since an internal Nursery is used in the
implementation of Nursery.start(), if start() is cancelled, but the
task it's starting does not raise Cancelled, start() won't raise
Cancelled either.
@catern
Copy link
Contributor

catern commented Sep 1, 2020

I went ahead and made the additional changes in #1696 but there are some additional implications.

As yet another consequence, since an internal Nursery is used in the
implementation of Nursery.start(), if start() is cancelled, but the
task it's starting does not raise Cancelled, start() won't raise
Cancelled either.

See the updated tests in the PR as well. I think I argue that this is correct.

@catern
Copy link
Contributor

catern commented Sep 1, 2020

Incidentally, all this also reduces the rate of MultiErrors, which is nice.

@richardsheridan
Copy link
Contributor

Fixed in #1696

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants