Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential API changes to help users who don't realize parenting is a full-time job #136

Closed
merrellb opened this issue Apr 23, 2017 · 17 comments · Fixed by #375
Closed

Potential API changes to help users who don't realize parenting is a full-time job #136

merrellb opened this issue Apr 23, 2017 · 17 comments · Fixed by #375

Comments

@merrellb
Copy link

I've managed to block in a nursery scope several times (eg "while True:" loops) while learning trio, not realizing I had not hit __aexit__. For long-running processes, this will appear to succeed as we don't expect to exit until KeyboardInterrupt. However, this will have disabled all of the reaping machinery in __aexit__ and will be silently accumulating zombie tasks indefinitely.

The documentation is fairly clear on this so I am hesitant to suggest the need to add idiot-proofing, but if there is an easy way to detect the nursery is running but not in __aexit__ it might prevent some grief.

@njsmith
Copy link
Member

njsmith commented May 6, 2017

The challenge here is that we do want to allow for things like custom supervisors, where code inside the nursery takes over for __aexit__. For example, see trio._core._run.Runner.init, which is the actual root of trio's task tree. It's nursery supervises. the user's main task + system tasks, and it has a special bit of semantics: in addition to the normal rule that it cancels everything when any task crashes, it also cancels everything when the user's main task exits successfully.

But, it just occurred to me that because implementing a custom supervisor is a pretty unusual thing to be doing, we might be able to define some heuristic that catches most accidental misuses, and then if you intentionally want to block inside the nursery then there's a special flag you can enable. Like maybe it would make sense to have open_nursery by default raise an error if the body tries to execute a checkpoint, but you can do open_nursery(allow_checkpoints=True) to allow it.

I guess the question then is whether "contains a checkpoint" is a good heuristic for "something is wrong"... thinking out loud here, the cases we want to catch are when someone starts doing arbitrary amounts of work directly inside the nursery. That's almost never the right thing to do because "parenting is a full time job". In trio, if you're doing arbitrary amounts of work then you pretty much have to execute some checkpoints in the process. It's possible to avoid this (e.g. time.sleep(10000)), but there's not much trio can do about this in general (it literally has no control while the sleep is running), it's a much more general problem than just nurseries, and people are relatively unlikely to do this by accident. So "no checkpoints" does probably rule out the cases we want to rule out.

Are there any good reasons to block inside a nursery scope (i.e., does this heuristic have false positives)? There are several tests in trio's test suite that do it, but they're a bit sloppy and mostly just do things like wait_all_tasks_blocked to make sure one child has blocked before spawning the next. I think I've been moving away from this over time too (moving the wait_all_tasks_blocked into the individual children instead of doing it in the parent). And I wouldn't mind having to pass an extra kwarg to open_nursery in the occasional test that wants to be sloppy. So... I think probably this is a reasonable heuristic?

(Note: there's another way of implementing the above idea that's equivalent except for cosmetics: make open_nursery a low-level API that most people don't use, and have the ordinary version be something like async with trio.multi_spawn() as spawner:, where multi_spawn enforces the no-checkpoint rule. Sort of like the distinction between open_cancel_scope vs move_on_after-and-friends. But I think I'm not a big fan; the "nursery" name is relatively evocative, and you need to use and pass around the object in very common situations. I guess it could be open_nursery() and open_supervisor_nursery() or something, if we want to follow Guido's heuristic that if a function has a bool argument and its callers always pass a constant, then it should be a separate function.)

My feeling for right now is that maybe it's best to wait a bit before making any decisions, since we're still learning about how we want to use open_nursery and about how much trouble newcomers have with this. But it does seem plausible that we could do something here and maybe should.

@njsmith
Copy link
Member

njsmith commented May 7, 2017

maybe it would make sense to have open_nursery by default raise an error if the body tries to execute a checkpoint, but you can do open_nursery(allow_checkpoints=True) to allow it.

Note: it occurs to me that if we decide we want these semantics, they'd be pretty simple to implement: it basically just means running the nursery body inside a pre-cancelled cancel scope. (I guess that would mean most nurseries had 3 associated cancel scopes, but oh well, it doesn't harm anything and 2 would be invisible to the user.)

Minor complications:

  • Getting the right exception when someone messes up. Could be lazy and do it like fail_after (user code gets Cancelled, then it's converted to BadParentError after we catch it), or could use some magic to make sure the right exception gets raised to start with (potentially less confusing... And not that hard actually since the exception type here would still be constant per scope)

  • current_deadline becomes wonky inside nursery scope bodies. I guess this is also easy to fix if we flag these scopes as having special behavior.

@merrellb
Copy link
Author

merrellb commented May 8, 2017

A few thoughts:

  • I would hesitate to add a new term like spawner or supervisor_nursery for such a minor distinction. There would definitely need to be some good examples of when you would want to block.
  • Do we need to worry about anyone ambitious enough to write a custom supervisor forgetting it is a full time job? :-)
  • Your idea of starting it as a canceled scope sounds slick as long as the complications you mention can be resolved to make it clear what is happening.
  • Would "callers always pass a constant"? My thought would be that this was enforced by default to prevent beginners from making this mistake.
  • Would it be clearer if open_nursery simply took a list of functions and arguments to spawn?

@njsmith
Copy link
Member

njsmith commented May 9, 2017

I would hesitate to add a new term ... Do we need to worry about anyone ambitious enough to write a custom supervisor

Yeah, the idea would be that writing a custom supervisor is a pretty rare and special thing to be doing, so it's OK if you need to go read about some API you never heard of before and that is trickier to use than the usual one. It could even live in trio.hazmat if necessary.

Would "callers always pass a constant"?

Right, what I mean is: any given nursery call is either going to look like async with open_nursery(allow_checkpoints=True) or async with open_nursery(allow_checkpoints=False), never async with open_nursery(allow_checkpoints=some_var). Like there's literally no case where this makes sense, b/c checkpoints are syntactically marked with await, so either you have them in the nursery body or you don't. [OK technically it's possible to have an await without a checkpoint, but we try to avoid that.]

Would it be clearer if open_nursery simply took a list of functions and arguments to spawn?

I experimented with this originally, but it turns out some extremely common cases require passing around the nursery object! And I'd rather not have two ways to do it if in practice everyone has to learn both.


All that said... I also realized what might well be a fatal flaw in this idea :-(. I haven't quite reached the point of implementing high-level helpers for servers (#73), but I think the idiom will be something like

async with trio.open_nursery() as nursery:
    info, task = await start_tcp_server(nursery, host, port, handler)
    print("Listening on {host}:{port}".format(**info))

The main idea here is that start_tcp_server (or start_http_server or...) doesn't return until the server is ready to accept connections. This solves a really common nasty problem where you have no way to know when a server has actually started so you can connect to it (e.g. this comes up in tests a lot), and even worse if you're waiting to get some information from it (e.g. if let it pick an unused port but now you want to know which port it actually ended up with).

For start_tcp_server, this means that it needs to allocate the socket, bind it, call listen, spawn the actual listener task. All of these things can be done synchronously, so actually start_tcp_server doesn't need to be async. But in general, bringing up a server may well require doing some I/O (maybe you need to read a config file or something), so I think we probably should assume that the generic version of this idiom does do await.

The alternative way to write this to avoid doing await inside the nursery scope would be something like:

async def bootstrap_server(nursery):
    info, task = await start_tcp_server(nursery, host, port, handler)
    print("Listening on {host}:{port}".format(**info))  # or whatever you want to do after the server is up

async with trio.open_nursery() as nursery:
    nursery.spawn(bootstrap_server, nursery)

But this seems... annoyingly convoluted? And I don't think it's too risky to call something like await start_*_server from inside the nursery scope – it's something that should finish in bounded time, so even if you're spawning multiple tasks and one of the earlier ones crashes while the later ones are being started, you'll still get to the nursery exit and notice that pretty promptly?

@njsmith njsmith changed the title Detecting if the nursery is not in __aexit__ Potential API changes to help users who don't realize parenting is a full-time job May 9, 2017
@merrellb
Copy link
Author

merrellb commented May 9, 2017

A few additional thoughts:

  • My thought on passing the nursery a list of functions would be that all nursery functions would be passed (and required to accept) the nursery as their first argument (which they could always ignore). Perhaps even an object oriented approach where we pass classes or instances and the nursery would be accessible through self.nursery. Although this gets away from some of the nice simplicity.
  • I am not sure I am sold on awaiting server startup being a special enough case to break the full-time parenting rule. Are we really that confident a poorly written handler wouldn't enter while True: loop and happily accept connections while never allowing __aexit__ to reap the tasks of closed connections?
  • I am surprised your second example is potentially too convoluted as I thought it was somewhat the textbook approach :-)
  • Not sure if you saw my pull request on h11 (https://github.com/njsmith/h11/pull/46) but I took a rough stab at changing the curio example to support a very simple trio tcp_server

@njsmith
Copy link
Member

njsmith commented Jul 7, 2017

I had an idea.

Simplified nurseries

We could make a version of nurseries where the code inside the async with block is treated as "just another supervised task". The semantics would just be, if a background task raises an exception, we immediately cancel the nursery whether the parent task is in __aexit__ or not. Since the body of the async with is inside the nursery's cancel scope, this would immediately cancel the body as well, along with all the tasks.

This could even make some simple kinds of supervision patterns easier. For example, the root task (init) in trio has a special supervision pattern where it's just like a normal nursery, except that there's one special task (the "main" task) where if it exits normally, then we cancel everything, just like when any task raises an error. Here's another case where this pattern comes up. With this "simplified" nursery design, this can just be written like:

async def init_task():
    async with trio.open_simplified_nursery() as nursery:
        try:
            return await main_task_fn()
        finally:
            nursery.cancel_scope.cancel()

This could be implemented on top of the current system by having a "simplified" nursery that immediately spawns a background task to watch for crashed children, which runs until the parent task hits __aexit__, at which point the background task goes away and the parent takes over again. (This is useful to have correct semantics for when the nursery should be closed -- you don't want the background task holding the nursery open after everything else has exited.) And the simplified nursery would only expose spawn and cancel_scope attributes – all the other stuff in the nursery interface is really just for custom supervisors.

Alternatively, if we baked this into the core, it could be even simpler: we could hard-code in the task exit path that when a task exits with an error, we immediately cancel the parent nursery, and that would avoid any need for background tasks etc. It could also do some auto-reaping, which basically would mean, we know that the only thing nurseries actually need to keep around is a list of raised exceptions, so we can throw away the rest of each child task object as it exits. This probably isn't a huge win in practice though because if a task raised an exception then the exception keeps its stack frames in memory, and the task object itself is not much beyond that. ...or, no, wait, it is important, because we have long-running nurseries that have an unbounded number of successful tasks cycle through them, so you can't buffer all those in memory forever. But that would all be doable; probably simpler implementation-wise than what we have now.

But.

The main problem with this design is that it really is simplified, in the sense that it's strictly less capable than what we have now: you can implement the simplified nursery on top of the original-flavor nursery, but not vice-versa. In particular, original-flavor nurseries allow you to implement arbitrary custom supervision policies, like "restart any tasks that exit", but simplified nurseries cannot; they just have the one hard-coded policy. We need a way to implement custom supervisors.

One tempting approach would be to have the "simplified" nursery be the default, and then if you want to implement a custom supervisor, you tell trio "hey I got this" and it disables the auto-reaping. Something like:

async with trio.open_nursery() as nursery:
    with nursery.custom_supervision:
        ...

An elegant further tweak would be to have the with nursery.custom_supervision block return the object that gives you monitor, zombies, etc., instead of having them available on the nursery object itself.

One thing I don't much like about this is that in practice, I think you'd always want to put the with nursery.custom_supervision line as the very first one inside the nursery block. The problem is that until you call this, the default supervisor is in play, so if you spawn some child and it crashes before you enter the supervision block, whoops, everything just died before your supervisor could start handling it. In principle you could avoid this by just being careful never to await in between the start of the nursery block and the start of the nursery.custom_supervisor block, but that's error-prone and doesn't really have any advantages AFAICT. (Is there any case where you'd want the auto-reaping behavior to start out enabled, then be disabled? Seems pretty weird.)

The API also makes it look like you could put the with nursery.custom_supervision line inside a child task and have them do the supervising, but again this has a nasty race condition where if another task crashes before the supervising child gets scheduled, then everyone gets cancelled immediately, and this race condition is impossible to avoid if you're spawning multiple tasks at once.

So this all argues that we should merge the open_nursery and enablement-of-custom-supervision into a single operation, and then Guido's rule against constant bool arguments says that we should have async with open_nursery() as nursery and async with open_supervised_nursery() as (nursery, supervisor). (Assuming we want to keep the monitor and other attributes off the nursery object, which seems wise.)

Internally hopefully these could share code, though?

We might also want to change the supervised nursery semantics a bit, like maybe you want the rule that if it hits __aexit__ with tasks still pending then that's an error, the supervisor was supposed to handle that? But eh, probably we still want to handle the case where supervisor code has a bug and itself crashes, and probably the simplest way to do that is still to require that the supervisor code run in the main task? Downside: you then can't emulate a simplified nursery using a supervisor nursery, because the simplified nursery can handle running arbitrary code in the main task. (This is why I'm worrying about putting supervisor loops into a child task.)

But if you want to handle supervisor crashes, and you want the put the supervisor in a child task... you have a real chicken-and-egg problem. I don't think there's any way to do it except by specifically nominating the supervisor task when setting up the nursery:

async with trio.open_nursery_with_background_supervision(supervisor_afn) as nursery:
    ...

Question: does putting supervisor code in a child task violate the no-secret-background-tasks design principle? It's not as bad as a real implicit background task because there is a clear point where it's spawned, when you're creating the supervised nursery. But concretely I'm thinking of code like this, where from the caller's perspective, you just open the special nursery, and this secretly spawns a task:

@asynccontextmanager
async def open_my_special_nursery():
    async with trio.open_supervised_nursery() as (nursery, supervisor):
        nursery.spawn(supervision_loop, supervisor)
        try:
            yield nursery
        finally:
            ... do... something... to push any main task exception to supervisor code and otherwise let it know that exiting is ok

...this is also rather tricky, isn't it, with the exception handling and everything. Maybe it really is better to just mandate that custom supervisors claim the job that starts them, so you can't willy-nilly spawn things from the body, and instead write something like

# On cancellation, raises Cancelled out of the nursery body, which triggers fallback cleanup
async def run_with_restarts_supervised(*thunks):
    restart_table = {}
    async with trio.open_supervised_nursery() as (nursery, supervisor):
        for thunk in thunks:
            task = nursery.spawn(thunk)
            restart_table[task] = thunk
        async for task_batch in supervisor.monitor:
            for task in task_batch:
                # obviously in a real version of this we'd want rate limiting etc., this is just a toy
                # plus some kind of logging of why the task exited
                thunk = restart_table.pop(task)
                replacement_task = nursery.spawn(thunk)
                restart_table[replacement_task] = thunk

# Usage
await run_with_restarts(thunk1, thunk2)

It would be kinda cool though to be able to do

async with open_restart_nursery(**config) as nursery:
    nursery.spawn(some_fn, *args, restart_limit=10)

Oh, and of course, you need the nursery object to be able to do things like start_tcp_server(nursery). So yeah, we really do need to be able to run stuff in the main task, don't we. And if the default nurseries can run arbitrary blocking code in the main task, it's pretty weird if custom ones can't :-(

I guess the designated task version would look like

async def supervisor_afn(nursery, supervisor_controller):
    ...

async with trio.open_supervised_nursery(supervisor_afn) as nursery:
    ...

...and maybe there'd be some special way we report issues in the main task to the supervisor function?

Not that this makes any sense for a nursery that restarts child tasks, since you can't restart the main task.

Are there actually any other use cases for custom supervision logic?

Ignore and log errors? Can be done with a simplified nursery + a custom spawn that wraps child tasks in a try/except Exception. Or just not letting errors escape in the first place.

Googling for erlang custom supervisors:

  • http://erlang.org/pipermail/erlang-questions/2009-January/041092.html (states the "parenting is a full time job" rule; see also the surrounding thread)
  • It sounds like the main reason people write custom supervisors is when they have some per-task state they want to track across restarts, when they need to do rate limiting on process spawns, or when they need to cut out some of the OTP behavior to get more scalability.

Meh, lots of questions here. I'm going to sleep on it.

@njsmith
Copy link
Member

njsmith commented Jul 16, 2017

A few more thoughts:

The mess in #214, and the way I resolved it by making a bunch of the tests in test_ssl.py ignore the parenting-is-a-full-time-job rule, seems like more evidence that the "simplified" parenting idea is a good one.

Here's another idea for API support for custom supervisors: add a spawn_supervisor method, which is like spawn but treats the spawned task specially in two ways: (1) it gets access to the supervisor functionality (maybe this would also let us avoid even allocating the UnboundedQueue for regular nurseries, IIRC that's by far the heaviest part of a nursery), (2) as long as this task is alive, dead tasks go into the queue; if it dies, we revert to the default dead task handling. (We might also want to raise an error if the supervisor exits while there are still tasks living... not sure.)

There are two issues that I still see, both revolving around the asymmetry between the parent task and the child tasks.

First, if we have a supervisor child, then how do we signal to them that the parent task is still inside the nursery block, and how do we signal when it leaves? That block is this weird thing, like a child task but not quite the same.

And second, what if someone does want to put their supervision logic inside the parent task's nursery block? The whole point of this thread is that it's convenient to be able to put logic there; the asymmetry is part of the appeal. And supervisor logic is inherently a special "child", so maybe it's natural to want these two kinds of asymmetry to line up? Like I think the example supervisor in the docs is a "race" function that spawns several children and then whichever one finishes first gets returned. This would look like:

# supervisor in parent task
async def race(*afns):
    async with trio.open_nursery() as nursery:
        for afn in afns:
            nursery.spawn(afns)
        with nursery.supervisor_powers() as s:
            async for batch in s.monitor:
                for task in batch:
                    nursery.cancel_scope.cancel()
                    return s.reap_and_unwrap(task)

# supervisor in child
async def race(*afns):
    async def supervise(s):
        async for batch in s.monitor:
            for task in batch:
                s.nursery.cancel_scope.cancel()
                return s.reap_and_unwrap(task)

    async with trio.open_nursery() as nursery:
        supervisor_task = nursery.spawn_supervisor(supervise)
        for afn in afns:
            nursery.spawn(afn)
        supervisor_task.wait()

    return supervisor_task.result()

Hmm, not sure how representative these are. Both have some trickery, but the first one does seem somewhat nicer.

@merrellb
Copy link
Author

There is a lot to unpack here. It seems like a lot of complication comes from moving supervision code into children. Is that a common use case? I haven't done much with custom supervisors but they always imagined them being implemented by parents. I like the idea of simple and let-me-supervise nurseries (or all simple nurseries but one can grab supervise_powers when wanted).

Having slept on this does it still seem like something worth pursuing?

@njsmith
Copy link
Member

njsmith commented Jul 22, 2017

@merrellb: In current trio, yeah, supervision is always implemented by parents – that's true for custom supervisors (b/c there are race conditions otherwise), and it's true for the default supervisor (the one that runs inside the nursery's __aexit__ block). BUT the idea here is that having the default supervisor running in the parent task is confusing, and to make it run invisibly in the background. So basically the whole idea here is to allow supervisors to run in a background task – at least for the default supervisors :-). We could say that it's only the default supervisor that gets this superpower, and custom supervisors have to live in the parent task. I'm just unsure if that's the right way to go.

A handy rule of thumb for designing low-level system primitives like this is that the user should be able to replicate the default behavior "by hand" (i.e., the system isn't cheating and doing things that you couldn't do if you wanted to). And if having the parent task available to run regular code is so important that we want to rewrite the API to make it possible, then maybe it's important for custom supervisors too! I don't really know what the landscape of custom supervisors is going to look like, so I'm hesitant to cut off options. And I would guess that something like an erlang-style restart-on-failure supervisor API might look like:

from trio_supervisor_library import open_restart_nursery

async with open_restart_nursery() as nursery:
    await nursery.start_restarted_job(start_http_listener, nursery, "localhost", 80)
    await nursery.start_restarted_job(start_https_listener, nursery, "localhost", 443)

So notice that here the actual supervision code is hidden inside the special "RestartNursery" object's __aexit__, so we're back to the rule that supervising children is a full-time job, but now only for a subset of nurseries, which is potentially even more confusing and error-prone than what we have currently.

...On the other hand, for a restart nursery in particular it makes no sense to put any logic in the parent task, because if it fails you can't exactly restart it.

And there certainly are a number of things that get simpler if only the parent task can do the supervision – no need to signal parent task death to the supervisor, no need to keep track of two types of children (supervisor child and regular children), etc.

I'm still thinking about it :-). I think the benefits in the non-custom-supervisor case are fairly compelling, but it's tricky to figure out how everything should fit together.

@njsmith
Copy link
Member

njsmith commented Jul 26, 2017

Another radical idea: dropping supervision/monitoring as a builtin concept

I'm not sure if this is a good idea, but it's certainly a big enough conceptual shift that it deserves headline font :-)

What I've started wondering is if we would be better off just dropping the specialized supervision APIs, and the Task attributes wait, result, add_monitor, remove_monitor, in favor of a system where:

  • task return values are always discarded (or possibly returning a non-None value is an error)
  • task exceptions are always propagated the way they are now (no custom supervision logic)

The idea would be that if you want a notification when a task exits, then call event.set() at the end, or queue.put(...), or whatever. If you want to implement a supervisor that intercepts task crashes and does something different with them than the default, then wrap them in a try/finally and put whatever logic you want there.

Motivations

First, this would certainly avoid all those gnarly API design problems I'm grappling with in the comments above.

Second, well. I certainly don't have that much experience writing programs on top of trio, but ... so far I don't think I've encountered a single sitaution where I wanted to use any of the "traditional" task management APIs. Happy eyeballs is a good example -- it requires spawning a collection of tasks that interact in some complicated way, and it turned out to be way simpler to just ignore all that stuff and put my own coordination logic in the task.

Or, for the toy race supervisor, it can be done as:

async def race(*afns):
    q = trio.Queue(1)
    async def jockey(afn):
        await q.put(await afn())

    async with trio.open_nursery() as nursery:
        await for afn in afns:
            nursery.spawn(jockey, afn)
        winner = await q.get()
        nursery.cancel_scope.cancel()
        return winner

...which is maybe a little bit more complicated than the original version? But it's very straightforward; it only uses obvious things we learn about in the tutorial, instead of bringing in UnboundedQueue and its weird interface.

And if one of the goals is "one obvious way to do it"... well, the above code works right now (modulo parenting being a full-time job), so we clearly have two ways to do it. And the dedicated supervision APIs are in some sense the "obvious" one (they're documented as being the way you supervise things!), but they're often worse than doing things "by hand" (the happy eyeballs case is a really compelling example of this), and I don't know any cases where they're much better.

It is a little unfortunate that you have to make a little closure and call it; that's getting back towards the kind of callback twistyness that trio is trying to avoid. But eh, that's task spawning, and defining a custom supervisor is going to be somewhat tricky one way or another.

(Also, huh, if we do this I think we can move UnboundedQueue to trio.hazmat! That's probably a better place for it; we need it for some of the low-level event loop APIs but AFAICT its use for task death notification is the only reason it was in the top-level trio namespace.)

@merrellb
Copy link
Author

merrellb commented Jul 26, 2017

I must say you are doing a poor job of convincing me of the potential downsides of this approach :-) I really like your proposal as it seems much more explicit and clean. Although I haven't interacted much with the supervisor in my code, knowing it was back there managing things made trio feel a bit heavier and opaque. The fact that it caused such reasoning headaches with simplified parenting and was the only thing keeping UnboundedQueues out of hazmat seems like gravy. I haven't tried writing it but I imagine supervision could be layered on top of this pretty easily (await HappyEyeballs(afns).run()). Would implementing this mainly be additional guidance in the documentation and deprecating supervisors?

@njsmith
Copy link
Member

njsmith commented Aug 1, 2017

Heck, I guess we could even move current_task() and Task objects in general into hazmat. [Edit: and even Result, maybe... if the only public usage is to pass to reschedule...]

njsmith added a commit to njsmith/trio that referenced this issue Aug 21, 2017
start_soon is just a new name for spawn, except it doesn't return the
new task (in preparation for python-triogh-136, where we're going to stop
emphasizing task objects in the main api)

start is a major new feature: it provides a very simple way to start
up a long running task, while blocking until it's finished whatever
initialization it wants to do. At least... it's simple from the user's
point of view. Internally it's quite tricky indeed. The whole _run.py
file probably needs some refactoring and splitting up, but this is one
of those cases where I think it's best to first get the new
functionality working and nailed down, and then we can see what shape
the new abstractions should be.

Fixes python-triogh-284.
njsmith added a commit to njsmith/trio that referenced this issue Aug 21, 2017
start_soon is just a new name for spawn, except it doesn't return the
new task (in preparation for python-triogh-136, where we're going to stop
emphasizing task objects in the main api)

start is a major new feature: it provides a very simple way to start
up a long running task, while blocking until it's finished whatever
initialization it wants to do. At least... it's simple from the user's
point of view. Internally it's quite tricky indeed. The whole _run.py
file probably needs some refactoring and splitting up, but this is one
of those cases where I think it's best to first get the new
functionality working and nailed down, and then we can see what shape
the new abstractions should be.

Fixes python-triogh-284.
@njsmith njsmith mentioned this issue Aug 21, 2017
17 tasks
njsmith added a commit to njsmith/trio that referenced this issue Aug 22, 2017
For python-triogh-136

This commit contains:

- the actual deprecations
- changes to _run.py to prevent warnings
- changes to test_run.py to prevent warnings

Still need to clean up everything else.
@njsmith
Copy link
Member

njsmith commented Aug 23, 2017

Another small point: our task tree introspection APIs aren't fully fleshed out, but the old concept was to represent it as a tree of tasks, with nurseries an invisible details. And this was fine; it didn't really make sense for a task to have more than one nursery in it, because once it had one nursery it had to park in __aexit__ ("parenting is a full-time job"). But if we're changing things so that parenting isn't a full time job, and the code inside the nursery block is more-or-less like a little inline child task, then there's no particular reason that nested nurseries can't happen, so our introspection API should reflect that reality. Plus we were already tracking task→parent nursery→parent task links anyway, just hiding it in the introspection API, and for #284 I had to start tracking task→child nursery links, so we might as well provide this information.

So tasks should have parent_nursery and child_nurseries attributes, and nurseries should parent_task and child_tasks attributes.

njsmith added a commit to njsmith/trio that referenced this issue Aug 23, 2017
New API:

- nursery.child_tasks
- nursery.parent_task
- task.child_nurseries
- task.parent_nursery (may be None)

Rationale: python-trio#136 (comment)
njsmith added a commit that referenced this issue Aug 31, 2017
@smurfix
Copy link
Contributor

smurfix commented Nov 29, 2017

A lot to unpack here.

Another data point: I really dislike the idea that a nursery's __aexit__ silently waits for the child tasks. Explicit is better than implicit, and people don't expect the end of a __with__ block to hang around indefinitely. (At least I don't.)

Given that, I'd rather have __aexit__ always call nursery.cancel_scope.cancel() before it does anything else. If you want to wait for the remaining tasks, well, as soon as we have introspection the caller can iterate over the running tasks and wait for them the way it wants to. For instance, when I end a TCP server I'd like to cancel the remaining tasks after a couple of seconds, not immediately.

NB: instead of explicitly hooking up a supervisor task which replaces some default behavior, I'd rather subclass the nursery (cf. #340 – we don't need an explicit open_nursery() function, just use Nursery()). That way you get the default behavior when you want it – simply call super().

@njsmith
Copy link
Member

njsmith commented Nov 29, 2017

Another data point: I really dislike the idea that a nursery's __aexit__ silently waits for the child tasks. Explicit is better than implicit, and people don't expect the end of a __with__ block to hang around indefinitely. (At least I don't.)

I agree that this is surprising the first time you see it. But it's absolutely essential to how Trio works: if the parent task can continue on past the with block while child tasks are still running, then we lose our ability to reliably propagate exceptions, we lose our guarantees about with block nesting (e.g. with my_socket: async with open_nursery(): ... guarantees that any tasks spawned inside that nursery can use my_socket freely and it won't be closed until they exit), and we lose the guarantee that functions can't spawn background tasks that outlive them unless explicitly passed a nursery (i.e., no "implicit concurrency", which AFAICT is the root cause of a lot of what's confusing about twisted/asyncio).

Basically, nurseries are where Trio spends its weirdness budget: they have a weird name and they have weird semantics, but at least they're like... the number one most prominent feature that people have to learn to use Trio. So even if they're surprising the first time you see them, that presumably happened within about 5 minutes of starting to read the tutorial, and after that you can adjust your expectations. And in return we basically get to skip all the fractal weirdness that plagues async libraries, which I think is a fair trade. Obviously I'd rather not have any weirdness at all, but, this is a concurrency library – if the worst our critics can say is it's a bit confusing if you don't look at any docs ever, then I think we're doing pretty darn well :-).

Given that, I'd rather have __aexit__ always call nursery.cancel_scope.cancel() before it does anything else.

It would be possible to alter __aexit__ to do this. But even if we did we'd still have to wait for the tasks to exit for the reasons mentioned above. (Cancellation isn't instantaneous, and child tasks can even ignore it entirely.)

So the choice is between having the with block end imply cancel+wait, or just wait (like now). I like "just wait" better. This is for several reasons:

Obviously, doing one thing is simpler than doing two things.

More importantly, if the default is wait, and someone wants cancel+wait, that's trivial, just put in a call to nursery.cancel_scope.cancel() at the bottom of the block. OTOH, if the default is cancel+wait and want to just wait... how do you do that? We could have some sort of nursery.wait() call that you could block in -- but the natural definition would be to make nursery.wait() block until the nursery has finished and is not accepting any new tasks. Except... then this code would deadlock:

async with open_nursery() as nursery:
    ...
    await nursery.wait()

because the code inside the with block acts like a child task, so the nursery isn't finished until the with block exits, but the with block doesn't exit until the nursery is finished...

Alternatively we could define wait as blocking until the next moment when there are 0 child tasks, but possibly the with block is still open. This is pretty surprising though? It's certainly not what you'd expect if calling nursery.wait from another task. And this is all to try and nail down the semantics of a feature that Trio currently doesn't even need at all. Simple is better than complex.

Also, blocking waiting for calls to complete is actually very natural from a certain point of view. When I write a function call like foo(), I expect my program to stop and wait until that function finishes. Nurseries are the same way: they let you start several function calls, and then you wait for them all to finish.

And finally, my experience so far is that waiting for all the children is very often useful. I haven't found any cases yet where I'd want to cancel at the end of the with block (I'd be interested to see yours!), but can think of several off the top of my head where blocking is exactly what you want.

Here's one: implementing a two-way bytestream proxy in Trio. I like this example a lot in general, because it's such an obvious use case, yet Trio makes it dramatically simpler than ... well, any other language or library that I know of, period. This proxies from a→b and b→a, returns after both incoming streams are closed, correctly supports cancellation, correctly propagates errors, and it's even made out of independently useful and testable pieces:

async def one_way_proxy(source, sink):
    while True:
        chunk = await source.read_some(MAX_CHUNK_SIZE)
        if not chunk:
            await sink.send_eof()
            return
        await sink.send_all(chunk)

async def two_way_proxy(a, b):
    async with a, b:
        async with trio.open_nursery() as nursery:
            nursery.start_soon(one_way_proxy, a, b)
            nursery.start_soon(one_way_proxy, b, a)

How would you write this?

For instance, when I end a TCP server I'd like to cancel the remaining tasks after a couple of seconds, not immediately.

You may find #147 of interest too. This is definitely an important use case, but note that you probably also want to immediately stop accepting new connections, and notify any connections to shut down soon -- e.g. HTTP keepalive should be disabled -- it's a bit tricky. In any case it seems obvious that this will need some sort of non-default handling no matter what the nursery's default behavior is, so I don't think it tells us much about what that default behavior should be?

NB: instead of explicitly hooking up a supervisor task which replaces some default behavior, I'd rather subclass the nursery (cf. #340 – we don't need an explicit open_nursery() function, just use Nursery()). That way you get the default behavior when you want it – simply call super().

The reason we have open_nursery instead of Nursery is what I said at the top: the whole design of Trio is based around the idea that every nursery's lifetime must be tied to a with block, and that block's __aexit__ must wait for the nursery's tasks to it's exit. So it's intentional that you have to call __aenter__ to get a nursery object, and that there's no opportunity to subclass and override __aexit__, and even the Nursery class itself is hidden away. When people write custom supervisors, we want them to work "inside" a regular nursery in some sense, so that even if the supervisor has bugs and crashes, then the nursery logic still acts as a safety net to preserve Trio's core invariants.

Also subclassing is, like... bad.

njsmith added a commit to njsmith/trio that referenced this issue Dec 19, 2017
- Parenting is no longer a full time job
- This commit also removes a bunch of deprecated functions

Fixes: python-triogh-136
njsmith added a commit to njsmith/trio that referenced this issue Dec 19, 2017
- Parenting is no longer a full time job
- This commit also removes a bunch of deprecated functions

Fixes: python-triogh-136
@bluetech
Copy link
Member

Came across an article which is somewhat related to the discussion here: http://250bpm.com/blog:71

The first part defines "structured concurrency". Trio already strictly enforces this property (except the pass-nursery-to-child trick which is clearly documented to violate it -- the author of the article also describes a way to violate the property, by analogy to stack vs. heap) so this is already familiar.

The second part discusses cancellation. The author chose to go with "exiting scope cancels all tasks with a graceful cancellation mechanism" rather than Trio's "exiting scope waits for all tasks" approach. Though this might just be due to the Author's starting point (golang-inspired) or their choice of programming language (C).

@njsmith
Copy link
Member

njsmith commented Dec 23, 2017

@bluetech omg, what an amazing find. I've seen other posts on that blog, but missed that one. And it's eerie how much of what he's saying overlaps with concepts I came up with independently, a year later... guess I have some reading to do :-)

Another interesting parallel I recently learned about is that the Rust library Rayon has a construct called rayon::scope, which is essentially exactly identical to a Trio nursery except that they don't have any cancellation concept.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants