Synopsis

A somewhat convoluted case I discovered while refining a "service manager" style system in `tractor` is one where a so-called "service nursery" experiences the following situation:

- a first task is requested to start asynchronously in the background via `Nursery.start_soon()`
- that same nursery's `.start()` is called to start a second task, which can execute (in theory asynchronously) up to its required `.started()` call, where it unblocks the caller; between the parent's `.start()` and the child's `.started()`, the second task may need to synchronize with the first background task (with the syncing implemented using a `trio.Event`, for example)
- the first task errors, and the error would normally be processed on `Nursery` exit
For further background on what the real-world multi-actor daemon service system looks like, see this test. The TL;DR is that keeping a service nursery in the global state of a process (in `tractor` terms we call this "actor local") is fairly useful for spawning long-lived daemon service-actors which can be started and stopped from some (other) remote program (not in the process tree), where the lifetimes of the daemons (async tasks running in processes) are still managed using persistent `trio` nurseries.
Example of the issue
Here is a small pytest-ready module which demonstrates how this scenario causes a hang:
```python
import pytest
import trio
from trio_typing import TaskStatus


def test_stashed_child_nursery():

    _child_nursery = None

    async def waits_on_signal(
        ev: trio.Event,
        task_status: TaskStatus[trio.Nursery] = trio.TASK_STATUS_IGNORED,
    ):
        '''
        Do some stuff, then signal other tasks, then yield back to "starter".
        '''
        await ev.wait()
        task_status.started()

    async def mk_child_nursery(
        task_status: TaskStatus = trio.TASK_STATUS_IGNORED,
    ):
        '''
        Allocate a child sub-nursery and stash it as a global.
        '''
        nonlocal _child_nursery

        async with trio.open_nursery() as cn:
            _child_nursery = cn
            task_status.started(cn)

            # block until cancelled by parent.
            await trio.sleep_forever()

    async def sleep_and_err(ev: trio.Event):
        await trio.sleep(0.5)
        doggy()  # `doggy` is undefined, so this raises a NameError
        ev.set()

    async def main():
        async with (
            trio.open_nursery() as pn,
        ):
            cn = await pn.start(mk_child_nursery)
            assert cn

            ev = trio.Event()
            cn.start_soon(sleep_and_err, ev)

            # this causes an inf hang
            await cn.start(waits_on_signal, ev)

            # this does not.
            # cn.start_soon(waits_on_signal, ev)

    with pytest.raises(NameError):
        trio.run(main)
```
Analysis and discussion

So what ends up happening is: `sleep_and_err` crashes with an error -> `cn` cancels everything inside it and tries to propagate the `NameError` -> but `cn` can't exit, because `waits_on_signal` is in a weird state: it's still executing in the parent, but `cn` knows that a `start` call is in progress. So `cn` waits to see what `waits_on_signal` is going to do -- if it exits, or if it calls `started()`, then either way `cn` can continue and propagate the `NameError`. But because it does neither, `cn` gets stuck forever, so the `NameError` can't propagate, so it can't reach `pn` and cause the `start` call to be cancelled.

I'm not sure if this suggests a problem in trio or not!
@njsmith mentioned in chat:

> right now `started` always succeeds -- `start` makes sure that the nursery is open and that it will stay open until the child calls `started`
>
> what I could imagine doing instead is to say that if the nursery closes while `start` is running, then `started` raises a `RuntimeError`, same as if you call `start_soon` on a closed nursery
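For reference, here is a minimal sketch (mine, not from the thread) of the existing behavior that proposal is modeled on: calling `start_soon()` on a nursery which has already exited raises a `RuntimeError` today.

```python
import trio

async def main():
    async with trio.open_nursery() as nursery:
        pass  # the nursery exits (closes) here

    # spawning into the closed nursery fails loudly instead of hanging
    try:
        nursery.start_soon(trio.sleep, 1)
    except RuntimeError as err:
        print(f"start_soon on a closed nursery: {err!r}")

trio.run(main)
```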
The main question being which of the following two approaches `trio` should take:

1. keep the `.start()` -> `.started()` sequence never interruptible by nursery errors, or
2. the opposite, such that this example test would not hang (my personal preference).
I agree that this example test should not hang. @njsmith said:

> when you use `nursery.start`, the function runs "under" the call to `start` until it calls `started`, and only then does it move into the nursery
>
> so if you're calling `nursery.start` from some other task outside of the nursery, and you want to cancel `ev.wait()`, then you have to cancel the `nursery.start` call, not the nursery itself
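To illustrate that semantics, here is a hedged workaround sketch (mine, not from the thread): cancelling the `start()` call itself, here via a timeout, cancels a child which has not yet called `started()`, while the nursery stays healthy.

```python
import trio
from trio_typing import TaskStatus

async def never_starts(
    task_status: TaskStatus = trio.TASK_STATUS_IGNORED,
):
    # never calls task_status.started(), so it stays "under" the start() call
    await trio.sleep_forever()

async def main():
    async with trio.open_nursery() as nursery:
        # cancelling the start() call (not the nursery) cancels the child,
        # since it has not yet migrated into the nursery proper
        with trio.move_on_after(0.1) as cs:
            await nursery.start(never_starts)
        assert cs.cancelled_caught
        print("start() was cancelled; the nursery itself is unaffected")

trio.run(main)
```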
That behavior is really surprising. In my opinion, in this test, `nursery.start` should raise an exception, just like `MemoryReceiveChannel.receive` does if the other side is closed; and `TaskStatus.started(something)` should act like `MemorySendChannel.send(something)`, which can be cancelled easily.
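To make the analogy concrete, here is a minimal sketch (mine) of the channel behavior being referenced: once the send side is closed, `MemoryReceiveChannel.receive()` raises `trio.EndOfChannel` instead of blocking forever.

```python
import trio

async def main():
    send, recv = trio.open_memory_channel(0)
    await send.aclose()  # close the "other side"

    try:
        await recv.receive()
    except trio.EndOfChannel:
        # receive() fails fast rather than hanging, which is the
        # behavior being suggested for nursery.start()
        print("receive() raised EndOfChannel")

trio.run(main)
```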