New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

idea: unbound cancel scopes #607

Open
njsmith opened this Issue Aug 16, 2018 · 11 comments

Comments

Projects
None yet
3 participants
@njsmith
Member

njsmith commented Aug 16, 2018

I've recently run into a few places where I want a cancel scope for some code that may or may not already be running. For example, in pytest-trio, if a fixture crashes you want to cancel the main test... but these run in different tasks, so it's tricky to find the main task's cancel scope and put it somewhere that a fixture can get at it, without race conditions.

It's possible to create an object that sort of acts like a cancel scope, but where you can call cancel before or after the code inside the scope starts running. But it's fairly tricky to get all the cases right, e.g.:

@attr.s
class UnboundCancelScope:
    cancel_called = attr.ib(default=False)
    _cancel_scope = attri.ib(default=None)

    def cancel(self):
        self.cancel_called = True
        if self._cancel_scope is not None:
            self._cancel_scope.cancel()

    def __enter__(self):
        self._cancel_scope = trio.open_cancel_scope().__enter__()
        if self.cancel_called:
            self._cancel_scope.cancel()

    def __exit__(self, *args):
        return self._cancel_scope.__exit__(*args)

# Creation:
unbound_cancel_scope = UnboundCancelScope()

# Entering:
with unbound_cancel_scope:
    ...

Maybe we should make this just... how cancel scopes work, always? Right now open_cancel_scope is a context manager, so it forces you to immediately enter the scope it creates. But we could reinterpret it as returning an unbound cancel scope object, and then the with cancel_scope: ... as entering that scope – it'd even be backwards compatible!

Implementation-wise, I think it'd be almost trivial. The one thing to watch out for is that it'd become possible to attempt to re-enter a scope that you're already inside, which would be complicated (e.g. instead of keeping a set of which tasks are inside the scope, we'd have to keep a dict of task → refcount). For now we should just error out if someone tries to do this. (OTOH, I think having multiple independent tasks entering the same scope is fine and would Just Work.)

Maybe we should also make CancelScope public? Right now it's hidden in order to keep the constructor private, but that would be unnecessary in this approach – in fact open_cancel_scope(...) would just be return CancelScope(...), so maybe we'd even want to deprecate it or something.

One limitation of this approach is that cancelled_caught would become ambiguous if multiple tasks can enter the same scope. It might not matter.

Alternatives

There's a larger design space here of course. Cancel scopes are inspired in part by C#'s cancellation system, which has "cancel sources" – which let you call .cancel(), and set deadlines – and "cancel tokens" – which are read-only objects that let you check whether the corresponding source has been cancelled and what its deadline is. You can also combine multiple tokens together to create a new cancel source, that automatically becomes cancelled when any of the original tokens are cancelled. (I'm not sure why this creates a new source, rather than creating a new token. I think it doesn't matter which way you define the API though, each version can basically be implemented in terms of the other. Also, for some reason C# doesn't actually provide any API for querying for the current deadline given a source or a token, but this is silly so I'm going to ignore it.)

In Trio's current system, cancel scopes = cancel sources, and there is no reified object corresponding to cancel tokens – they're implicit on the cancel stack associated with a task, and you can query this implicit state using current_effective_deadline(). So in addition to introducing the idea of an "ambient" cancel token, we're also quite aggressive about collapsing together the different ideas here.

If we wanted to fully decompose the space, you can imagine operations:

  • create a cancel source (like an unbound cancel scope, in the proposal above)
  • given a cancel source, produce a cancel token
  • use a with block to bind a given cancel token to the current task, which produces a "cancel binding"
  • given a "cancel binding", query for whether the code inside the block was actually cancelled (the cancelled_caught attribute)
  • given a task's ambient context, produce a new cancel token that becomes cancelled if the original context becomes cancelled.
  • given a cancel token, query for current deadline and cancelled state

This is almost certainly too fine-grained a decomposition, but I find it useful to see it all laid out like that... and it does allow for things we can't do right now, like check whether another task's ambient context has been cancelled (by extracting its cancel token and then querying it later). Or a minor feature that curio has, and I envy: if you enter a thread with run_sync_in_worker_thread, and then the thread comes back into trio with BlockingTrioPortal.run, and the original run_sync_in_worker_thread is cancelled... it would be neat if this caused the code inside the BlockingTrioPortal.run call to raise a Cancelled error that propagated all the way back out of trio, through the thread, and back into trio.

Though actually... the "fully-decomposed" design is still not powerful enough to allow that! I was thinking you could do it by having run_sync_in_worker_thread capture the ambient token and then inside BlockingTrioPortal.run we could do with the_ambient_token... but this doesn't quite work, because it would create a new binding. If run_sync_in_worker_thread was cancelled, then the code inside the BTP.run call would raise Cancelled but that exception would be caught at the with the_ambient_token, instead of propagating into the thread and then back into trio. Cancelled exceptions are associated with cancel bindings, not cancel tokens or cancel scopes. Hmm! Well, at least the decomposed design gives us useful vocabulary :-).

It's not clear whether propagating cancellation across threads is really that important. But if we do want to do it... [longish text split off into #606, since it doesn't seem to be too related to this issue after all].

Other things to consider: as noted in #285, we might want to capture actual exceptions for each binding, which has the same issues as cancelled_caught, but even more so.

I'm not sure how shielding fits into the above picture. In the fully-decomposed picture, I think a shield would be a separate kind of thing, where you just do with shielded(): ..., since it's above managing the binding stack. Having a .shield attribute on cancel sources or cancel tokens doesn't make much sense conceptually.

Given the above, I'm having trouble thinking of cases where capturing a task's ambient context state in the form of a token is actually useful.

I'm not sure how useful the source/token distinction is for trio, given that the actual message delivery is via the ambient state (unlike C# where the token object is important because you have to manually examine it all the time to check if you're cancelled). And current_effective_deadline is sufficient for examining the current ambient state. Also, since a token's functionality is a subset of a source's functionality, we always add tokens later without breaking anything. (So e.g. we'd still have to support with source: ..., but that's fine, it'd just be a shorthand for with source.token: ....)

So I think the 'unbound scopes' idea captures most of the valuable parts of the "fully decomposed" design, except that I'm a little nervous about bindings – it's a little weird to have cancelled_caught / #285 state and shielding associated with the scope rather than with a with block.

If CancelScope became a public class with a public constructor, and we want to transition from with open_cancel_scope() as scope: ... to scope = CancelScope(); with scope: ... as being the primitive operations... then we have the opportunity to make scope.__enter__ return whatever we want, it doesn't have to return self. It could return something like a binding object. Or return None for now, and we reserve the right to add something like a binding object later.

This would cause some disruption for move_on_after etc., though, since they return the actual cancel scope object, and it is fairly ordinary to call .cancel() on this object, as well as to check .cancelled_caught. I suppose if we had to we could in the future declare that there's both CancelScope.cancelled_caught and CancelBinding.cancelled_caught, and the former says something like "did any binding catch something" and the latter is more specific.

For shielding... it's a bit weird to have a shielded cancel scope you enter later, or in multiple tasks, or where your scope's shield attribute can get toggled by someone somewhere else who you wanted to let cancel you... but maybe there's no harm in allowing these things? I guess it's worth at least taking a peek at how hard it would be to split shielding off into its own thing. FWIW, currently every non-test use of shielding in trio is exactly with trio.open_cancel_scope(shield=True): .... (This would also let us move shielding into hazmat!)

Possibly the shielding discussion should be split into a separate issue, too, since it's kind of orthogonal to the unbound cancel scopes idea. The cancelled_caught part is more closely related.

CC: @1st1, on the theory that you're probably thinking about similar issues

@njsmith

This comment has been minimized.

Member

njsmith commented Aug 17, 2018

Shielding also introduces quite a bit of complexity internally – especially since for any cancel scope, at any time, you can mutate it's .shield attribute. AFAICT there aren't any actual use cases for this. So I'm leaning towards splitting it off. To start with, we can keep the internals the same, but change the public APIs to give us more flexibility later. I'm thinking:

  • make the CancelScope class public, with a constructor that takes deadline= but not shield=. with cancel_scope works, and the __enter__ method returns None.
  • add a shield-only context manager to trio.hazmat (with open_cancel_shield()? with cancel_shield()? with cancel_shielded()?). This __enter__ returns None.
  • deprecate the CancelScope.shield attribute
  • deprecate open_cancel_scope

For now we can do all this without changing the internals (e.g. open_cancel_shield will still make a CancelScope object internally, but this is a hidden implementation detail).

That still leaves the question about cancelled_caught and #285. We could make it like, with cancel_scope as cancel_status: ... or with cancel_scope.enter(save_exceptions=True) as cancel_status: ..., and then instead of cancel_scope.cancelled_caught, it could be cancel_status.cancelled_caught. That'd be fairly pleasing in isolation. The big issue is what to do with move_on_after and friends. We could tell people to suck it up, and make them yield cancel status objects instead of cancel scopes:

@contextmanager
def move_on_at(deadline, *, save_exceptions=False):
    cancel_scope = CancelScope(deadline=deadline)
    with cancel_scope.enter(save_exceptions=save_exceptions) as cancel_status:
        yield cancel_status

And then anyone who wants to set a deadline, AND ALSO be able to adjust the deadline or explicitly cancel the scope, would need to drop down to the lower-level API instead of using move_on_after. So for example:

# Currently you can do this:
with move_on_after(10) as cancel_scope:
    ...
    # Whoops, need some more time:
    cancel_scope.deadline += 10
    ...
if cancel_scope.cancelled_caught:
    ...

# But if we went this way, you'd have to go fully explicit:
cancel_scope = CancelScope(deadline=trio.current_time() + 10)
with cancel_scope as cancel_status:
    ...
    # Whoops, need some more time
    cancel_scope.deadline += 10
    ...
if cancel_status.cancelled_caught:
    ...

That's not so terrible, but it's a bit cumbersome. This case is probably relatively rare; in trio's own source code, the only thing we do with the cancel scope from move_on_after is to check its cancelled_caught attribute.

We could make it so the cancel_status object re-exports CancelScope's attributes, while adding a few more. That'd be .deadline, .cancel(), and .cancel_called. Then move_on_after and friends would basically continue to work the same way they do now. The downside is that the distinction between "cancel scope" and "cancel status" is already pretty subtle and technical (= confusing), and this would make the distinction even more confusing – the first time you reach past the high-level helpers to use a raw CancelScope, you suddenly have this object that's almost-but-not-quite the same as the objects you've used before.

I'm not real happy with any of these options. So let's back up: if all the solutions are bad, maybe we can change the problem. The issue here is that if multiple tasks enter the same cancel scope, then the exception-handling related attributes become ambiguous. What if we ... don't let multiple tasks enter the same cancel scope? Up above I said that this would be fine, and except for this wrinkle it would be; I also suspect it would be useful in some cases. (E.g., you could have a cancel scope that you use to shut down all tasks of type X, where each task of type X enters it when it starts up.) Can we do an end run around that problem... what if we say each cancel scope can only be entered once, but you can somehow clone a cancel scope and then each clone can be entered once?

Maybe: cancel_scope_2 = CancelScope(bind=cancel_scope_1) creates a new cancel scope that automatically becomes cancelled whenever cancel_scope_1 becomes cancelled. Or call it link=, maybe? There are some details to work out (unidirectional versus bidirectional linkage, __init__ argument versus new_scope = orig_scope.split(), etc.). But in general this seems like a plausible approach... so maybe we should restrict CancelScope objects to only be entered once for now, and later on we can figure out how to extend this if there's a desire to do so.

If that's the plan, we could potentially go back to having CancelScope.__enter__ return self. Maybe better to wait on that a bit though to keep our options open.

@njsmith

This comment has been minimized.

Member

njsmith commented Aug 17, 2018

Another reason why it'd be nice not to break up the cancel scope object is that then we'd have to figure out what to do with nursery.cancel_scope. (I guess we could have both nursery.cancel_scope and nursery.cancel_status or something, it's not insoluble, but it is additional complication.)

@njsmith njsmith referenced this issue Aug 17, 2018

Merged

Rework fixtures #50

@xgid

This comment has been minimized.

xgid commented Aug 19, 2018

@njsmith Really interesting ideas! I specially like the "full decomposed design" route as a way to think about it all and explore possibilities. I also agree in generat that it's somehow necessary to "decompose" and "expand" de possibilities before trying to "simplify" the "user experience" or trying to simplify the API too soon (I mean, to soon in the design process. I see it as a kind of "too-early optimization" in the design process, with all that means!).

As further help to reason about what "elements" and "responsabilities" are needed and why, I think it would be useful to have a set of emblematic use cases or examples making different uses of the "available" (fully decomposed) APIs. You have already given some examples above, but it would be useful to explore some other use cases trying to make use of the new "available" elements. Some ideas in the form of questions could be:

  • Given a CancelScope, what if I needed to ask it which task (or which "cancel binding") triggered the cancellation of the scope?
  • What if a "cancel binging" could store some more "data" (or meta-data) about the moment it was created (which may not be the same moment that the CancelScope was created)? Or about the task "it was created for"?
  • What if I wanted to be able to shield a CancelScope from "some kind of "cancel bindings" and not others"?? I think this last sentence makes no sense at all if we see the "cancel bindings" (or "cancel tokens") as simple references to the "cancel scope" but... what if we see them as some kind of "access rights" to cancel the CancelScope ("cancel rights" may be a better term here). Could shielding then be expanded to be some kind of chmod for cancel scopes? Does it help to think in new directions?

@njsmith njsmith referenced this issue Aug 21, 2018

Open

[WIP] Trio event loop #298

4 of 6 tasks complete
@njsmith

This comment has been minimized.

Member

njsmith commented Aug 21, 2018

This PR is another use case where unbound scopes would help, I think: urwid/urwid#298
Interesting detail: I think it might actually want a cancel scope that you can exit and then re-enter...

@njsmith

This comment has been minimized.

Member

njsmith commented Aug 21, 2018

@xgid Interesting questions!

Given a CancelScope, what if I needed to ask it which task (or which "cancel binding") triggered the cancellation of the scope?

Generally, cancellation is triggered by either a timeout or someone calling .cancel() on the scope. (Or in C#'s terminology, on the "source".) It's not associated with any binding or task in particular – binding a scope to a task makes it so that when a cancellation occurs, that task is notified.

What if a "cancel binging" could store some more "data" (or meta-data) about the moment it was created (which may not be the same moment that the CancelScope was created)? Or about the task "it was created for"?

What sort of data?

What if I wanted to be able to shield a CancelScope from "some kind of "cancel bindings" and not others"?? I think this last sentence makes no sense at all if we see the "cancel bindings" (or "cancel tokens") as simple references to the "cancel scope" but... what if we see them as some kind of "access rights" to cancel the CancelScope ("cancel rights" may be a better term here). Could shielding then be expanded to be some kind of chmod for cancel scopes? Does it help to think in new directions?

You could in principle have a shield that protects a block of code from specific surrounding scopes/bindings but not others. I'm a bit nervous about this because it feels like it would encourage lots of confusing/fragile hacks, and is complicated to implement. Generally the idea of functions is to be an abstraction over callers – and trio's cancellation system nudges you in that direction by not allowing fine-grained access to information about caller scopes.

This does remind me a bit of #147, which discusses the idea of making "soft cancellation" a new state that's in between not-cancelled and cancelled – by default it wouldn't do anything, but a particular piece of code could request that soft-cancellation be upgraded to full cancellation. This could be useful for implementing graceful shutdown. So that's adding additional granularity in a slightly different way: the canceller can send slightly different information (though you still can't select on source of the cancellation).

I suppose a super-fancy version of this would be to allow user-defined cancellation states, like allow scope.cancel(tag=X) to mark it "cancelled in the X manner", and then code could opt-in or out-out to that on a tag-by-tag basis.

I'm having trouble coming up with anything that would justify so much complexity, though :-)

@smurfix

This comment has been minimized.

Contributor

smurfix commented Aug 22, 2018

Two issues I frequently struggle with:

  • figure out where the cancellation originated (store the stack trace from .cancel somewhere?)
  • A fairly common pattern is
    try:
        await something()
    finally:
        await cleanup()

Unfortunately, cleanup will probably do no such thing when running inside a cancelled scope.

@njsmith

This comment has been minimized.

Member

njsmith commented Aug 23, 2018

@smurfix

figure out where the cancellation originated (store the stack trace from .cancel somewhere?)

Interesting problem! It has some challenges because of the cost of capturing the stack... and I think it's otherwise orthogonal to the issues here, so can you open a separate bug so we can discuss it properly?

Unfortunately, cleanup will probably do no such thing when running inside a cancelled scope.

Yeah, this is one of the downsides of "stateful cancellation". I think stateful cancellation is still (probably?) the right way to do things, but it does force us to come up with solutions to issues like this. One thing you're probably aware of is AsyncResource's guarantee that aclose always succeeds even if cancelled; so if your block is async with stream: ..., then it does work inside a cancelled scope. (And in some cases, e.g. SSLStream.aclose, we take special measures to make sure of this.) Maybe there are other things we can do beyond that too? If you have ideas, or just think this is an important pain point that hasn't been solved adequately and should be discussed more, then maybe open an issue for that too?

@xgid

This comment has been minimized.

xgid commented Aug 25, 2018

It's not associated with any binding or task in particular – binding a scope to a task makes it so that when a cancellation occurs, that task is notified.

I see. I totally missunderstood the idea of the "cancel binding"! I thought it was a sort of bidirectional link: for the task to be able to send a "cancel" to a cancel scope and for the cancel scope to notify the cancellation (in-process) to the task. According to your comments, it is only for the "Cancel scope to task" notification means.

What sort of data?

My idea was for data that the cancel scope could use to provide information about the task that requested the cancellation or to even block the cancellation process according to some properties of this data... but that only made sense when I thought that the cancel binding was bidirectional. Now that I know your idea of a cancel binding I don't see any useful data to "store" there.

Generally the idea of functions is to be an abstraction over callers – and trio's cancellation system nudges you in that direction by not allowing fine-grained access to information about caller scopes.

I agree to not giving access to information about "caller scopes"... but I don't see a problem in letting the caller send to the cancel scope some information about him or about the cancellation "reason" when he invokes the cancel operation on the cancel scope. This is seeing the cancel operation as a method of the cancel scope that can be explicitly called.

So that's adding additional granularity in a slightly different way: the canceller can send slightly different information (though you still can't select on source of the cancellation).

That's more or less the idea. Although I don't understand your final: "though you still can't select on source of the cancellation". What do you mean?

I suppose a super-fancy version of this would be to allow user-defined cancellation states, like allow scope.cancel(tag=X) to mark it "cancelled in the X manner", and then code could opt-in or out-out to that on a tag-by-tag basis.

You can see it as "cancellation states" or just as "relevant information that the canceller wants to broadcast to the other tasks affected by the cancellation scope so they can better decide what to do". Does that make sense to anyone when seen this way?

I'm having trouble coming up with anything that would justify so much complexity, though :-)

First, it doesn't have to be complex. It is not complex if we see it just as a "way to send extra/contextual information to all tasks affected by the cancellation process" by doing it through the cancel scope itself.

And second, I insist in not trying to desperately simplify the overall design too soon. It would only deter creativity. We will have time for that. Let's give us some more time to find examples and situations where it may be useful. I neither have them yet, but let the river flow... and we'll see where it takes us.

@smurfix

This comment has been minimized.

Contributor

smurfix commented Aug 25, 2018

I don't see a problem in letting the caller send to the cancel scope some information about him or about the cancellation "reason" when he invokes the cancel operation on the cancel scope.

What's the use case? Why should a cancelled task need to know who triggered the cancellation or why? It might need to distinguish hard vs. soft cancel (terminate the connection vs. abort the current operation), but (a) that's a tightly-constrained value instead of an arbitrary reason, (b) often we just don't know (if a nursery cancels you because of an exception in another task, how would it know which type of cancellation to trigger?), (c) you can run the connection in a different cancel scope than the operation, thus the problem may not be that prevalent in the first place.

You want to distinguish different types of cancellation? you create different scopes and then decide which scope(s) to run the task under – @njsmith 's ideas already let you do that.

What's more, I don't want a task to be able to dynamically decide to ignore a cancellation, or what to do depending on an unconstrained tag value. It's a cancellation. You get cancelled. End of story. If you want to do something else, call it something else – i.e. add code to throw an exception into a scope. That would allow the code within the scope to recover, which a cancellation intentionally does not.

And second, I insist in not trying to desperately simplify the overall design too soon.

I disagree. The current design is "simple". We already are complicating it by splitting off a cancel_status from a cancel_scope.

Trio works because it intentionally limits things you're able to do, which enables new concepts to emerge (cf. "Go[to] Considered Harmful"). We shouldn't add interesting things to it before we have identified a use case that requires them. I don't see a use case for tags. Even if one should emerge, we can add it later. That's better than allowing people to (ab)use a tagged cancel for things cancellation was never intended to do.

@njsmith

This comment has been minimized.

Member

njsmith commented Aug 26, 2018

I thought it was a sort of bidirectional link: for the task to be able to send a "cancel" to a cancel scope and for the cancel scope to notify the cancellation (in-process) to the task. According to your comments, it is only for the "Cancel scope to task" notification means.

Oh, I see! Yeah, by "cancel binding" I just mean, like "a specific with cancel_scope statement". If a single cancel scope can be used with multiple withs, then it's useful to be able to talk about those as individuals – e.g. they might catch different exceptions.

but I don't see a problem in letting the caller send to the cancel scope some information about him or about the cancellation "reason" when he invokes the cancel operation on the cancel scope. This is seeing the cancel operation as a method of the cancel scope that can be explicitly called.

When initially designing cancel scopes, I considered trying to track the "reason" for a cancellation (e.g., timeout versus someone calling .cancel), or passing extra state like an exception type to inject. This gets really tricky, though, because: what if .cancel gets called twice with different reasons, oh and also the timeout expired? And everyone wants to use a different exception type? It's a mess. So instead, we treat cancellation as a simple boolean state, and that acts as a barrier between the mechanisms for mutating that state versus the mechanisms for notifying about that state.

That said, it would be possible to expand from a single boolean to like... multiple booleans. (The "tagged boolean" idea.)

Although I don't understand your final: "though you still can't select on source of the cancellation". What do you mean?

I mean you could say "I only care about cancel scopes tagged with BLAH", but not, "I only care about cancels from from cancel scope object X, not cancel scope object Y".

And second, I insist in not trying to desperately simplify the overall design too soon. It would only deter creativity. We will have time for that. Let's give us some more time to find examples and situations where it may be useful. I neither have them yet, but let the river flow... and we'll see where it takes us.

Sure, that kind of low-stakes exploring ideas is exactly what this kind of issue is for. But I'm just noting that this kind of complexity will need to find some pretty compelling use cases before it can graduate from "let the river flow" to something we actually do. Cancellation is inherently a viciously complicated thing to understand and use. Trio works really hard to make it as simple as possible, and it's still far from trivial. So when brainstorming my main goal is to find ideas that will make it easier to understand or solve common practical problems.

@njsmith njsmith referenced this issue Sep 2, 2018

Merged

Reduce extraneous exception frames #640

3 of 3 tasks complete
@njsmith

This comment has been minimized.

Member

njsmith commented Sep 20, 2018

In the discussion in #658, we realized that these two features seem to be essentially incompatible:

  • unbound cancel scopes, or "linked" cancel scopes (like at the bottom of #607 (comment))
  • the ability to wait for a cancel scope to finish executing

See #658 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment