Should `Nursery.start()` be error-interruptible by cancellation? #2258

goodboy · 2022-02-26T20:01:41Z

Synopsis

A somewhat convoluted/complex case I discovered while refining a "service manager" style system in tractor is where a so called "service nursery" experiences the following situation:

a first task is requested to start async in the background via Nursery.start_soon()
that same nursery's .start() is called to start another 2nd task which can execute (in theory async) up to its required .started() where it unblocks the caller but, between the parent:.start() -> child:.started() sequence may need synchronization with the previous first bg task (the syncing implemented with a trio.Event for example).
the first task errors and normally would be processed on Nursery exit

For further background on what the real-world multi-actor daemon service system looks like see this test. The TL;DR is that having a service nursery in the global state of a process (in tractor terms we call this actor local) is fairly useful for spawning long lived daemon service-actors which can be started and stopped from some (other) remote program (not in the process tree) where the lifetime of the daemons (async tasks running in processes) are still lifetime-managed using persistent trio nurseries.

Example of the issue

Here is a small pytest ready module which demonstrates how this scenario causes a hang:

import pytest
import trio
from trio_typing import TaskStatus


def test_stashed_child_nursery():

    _child_nursery = None

    async def waits_on_signal(
        ev: trio.Event(),
        task_status: TaskStatus[trio.Nursery] = trio.TASK_STATUS_IGNORED,
    ):
        '''
        Do some stuf, then signal other tasks, then yield back to "starter".

        '''
        await ev.wait()
        task_status.started()

    async def mk_child_nursery(
        task_status: TaskStatus = trio.TASK_STATUS_IGNORED,
    ):
        '''
        Allocate a child sub-nursery and stash it as a global.

        '''
        nonlocal _child_nursery

        async with trio.open_nursery() as cn:
            _child_nursery = cn
            task_status.started(cn)

            # block until cancelled by parent.
            await trio.sleep_forever()

    async def sleep_and_err(ev: trio.Event):
        await trio.sleep(0.5)
        doggy()
        ev.set()

    async def main():

        async with (
            trio.open_nursery() as pn,
        ):
            cn = await pn.start(mk_child_nursery)
            assert cn

            ev = trio.Event()
            cn.start_soon(sleep_and_err, ev)

            # this causes inf hang
            await cn.start(waits_on_signal, ev)

            # this does not.
            # cn.start_soon(waits_on_signal, ev)

    with pytest.raises(NameError):
        trio.run(main)

Analysis and discussion

@njsmith mentioned in chat

so what ends up happening is: sleep_and_err crashes with an error -> cn cancels everything inside it and tries to propagate the NameError -> but cn can't exit, because waits_on_signal is in a weird state where it's still executing in the parent, but cn knows that a start call is in progress. So cn waits to see what waits_on_signal is going to do -- if it exited, or if it called started, then either way cn can continue and propagate the NameError. But because start does neither, cn gets stuck forever, so the NameError can't propagate, so it can't reach pn and cause the start call to be cancelled.

I'm not sure if this suggests a problem in trio or not!

right now started always succeeds -- if start makes sure that the nursery is open and that it will stay open until the child calls started.

what I could imagine doing instead, is to say that the nursery closes while start is running, then started raises a RuntimeError, same as if you do start_soon into a closed nursery

The main question being which of the following 2 approaches should trio take:

keep .start() -> .started() never nursery-error-interruptible
the opposite such that this example test would not hang (my personal preference)

The text was updated successfully, but these errors were encountered:

GalaxySnail · 2022-03-07T17:25:37Z

I agree that this example test would not hang. Smith said:

when you use nursery.start, the function runs "under" the call to start until it calls started, and only then does it move into nursery

so if you're calling nursery.start from some other task outside of the nursery, and you want to cancel ev.wait(), then you have to cancel the nursery.start call, not the nursery itself

That's really suprising. In my opinion, in this test, nursery.start should raise an exception just like MemoryReceiveChannel.receive if the other side is closed; And a TaskStatus.started(something) acts like a MemorySendChannel.send(something), which can be cancelled easily.

goodboy · 2022-05-12T19:21:49Z

I agree that this example test would not hang.

@GalaxySnail i presume s/would/should ?

The test definitely hangs 😂

GalaxySnail · 2022-05-12T19:35:40Z

Thank you for correcting me! 😂 I'm not a native English speeker. I meant it really hangs.

goodboy added design discussion user happiness exception handling labels Feb 26, 2022

goodboy assigned njsmith Feb 26, 2022

goodboy mentioned this issue Feb 26, 2022

LinkedTaskChannel.subscribe(), explicit asyncio task cancel logging, test_trioisms.py goodboy/tractor#304

Merged

2 tasks

goodboy added a commit to pikers/piker that referenced this issue Feb 28, 2022

Adjust and add notes for python-trio/trio#2258

5507461

goodboy added a commit to pikers/piker that referenced this issue Feb 28, 2022

Adjust and add notes for python-trio/trio#2258

b1cce8f

goodboy mentioned this issue Feb 28, 2022

Async hist loading pikers/piker#287

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should `Nursery.start()` be error-interruptible by cancellation? #2258

Should `Nursery.start()` be error-interruptible by cancellation? #2258

goodboy commented Feb 26, 2022 •

edited

GalaxySnail commented Mar 7, 2022

goodboy commented May 12, 2022 •

edited

GalaxySnail commented May 12, 2022

Should Nursery.start() be error-interruptible by cancellation? #2258

Should Nursery.start() be error-interruptible by cancellation? #2258

Comments

goodboy commented Feb 26, 2022 • edited

Synopsis

Example of the issue

Analysis and discussion

GalaxySnail commented Mar 7, 2022

goodboy commented May 12, 2022 • edited

GalaxySnail commented May 12, 2022

Should `Nursery.start()` be error-interruptible by cancellation? #2258

Should `Nursery.start()` be error-interruptible by cancellation? #2258

goodboy commented Feb 26, 2022 •

edited

goodboy commented May 12, 2022 •

edited