-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure trio service shutdown follows DAG cancellation logic #12
Comments
After spending a whole day on this, I'm starting to think it might not make sense to have our trio implementation try to cancel tasks in any specific order because that will be tricky to implement (as we don't handle task cancellation directly like in our asyncio implementation) but more importantly because it wouldn't prevent the very problem you describe in your example. I say that because I believe a more realistic example would have an async function that performs any cleanup, like in the code below:
In the case above, even if we do explicitly cancel the Having said all that, it is late here and I've been staring at this for too long, so I may be missing something, but If nobody chips in to clarify what, tomorrow I'll try to write a test to see if the asyncio implementation prevents the problem you describe in the example above |
Hope this helps. I think I have it working. So one thing I realized right at the end is that we don't actually need sub-nurseries for each child task, but rather just a As for whether we need this or not. In your example above, specifically in the If you take a look at my implementation you'll see I added a I did however break something in the lifecycle that I'm trying to clean up.... |
I did a bit more work to get the test suite passing again. I think that the My goal for this library is very concretely define the lifecycle of services and the way the test is currently written is suggests that the child service should register as being cancelled which I'm not entirely sure is correct. I've re-written the test so that it runs the child service as an actual child service, and I've updated the cancellation handling logic to include explicit cancellation of child services. I'm still not 100% confident that #22 represents everything exactly right but I think it's close enought to iterate from. So my questions:
|
Actually, that was the impression I had while debugging yesterday, but I couldn't confirm that with a unit test |
Trying to summarize things here... The approach I tried in #21 requires accessing trio internals and doesn't ensure tasks are cancelled in the order we want. Then I tried a simpler approach in #26, which awaits for tasks in the order we want, but they can still be cancelled in any order. As I said there, I can't find a way to test the order of task termination, and even if I did I'm not sure this approach would work as we expect because a task might terminate immediately when cancelled, before we have a chance to wait for all of its children Piper's approach in #22 ensures order of cancellation of tasks, but at the expense of us having to be completely in charge of task cancellation, which is far from ideal as that is not trivial stuff and likely to bite us in the long run, as we've seen with CancelToken. Then there's the new test we've been discussing, which is supposed to verify that we don't let parents terminate before their children. All approaches above pass that test, but so does the current TrioManager implementation in master, so unless that test is wrong, we might not really need any of the changes above because trio's task cancellation mechanism is smart enough Oh, and all PRs above break trinity tests, so it'd be good to investigate that before merging any of them |
Sorry for spreading conversation across multiple issues.
I think #27 demonstrates that the test isn't testing the thing it's trying to test. I'm not 100% sure so feel free to poke holes in my testing logic. I think we must do something like what I did in #22 because I'm pretty sure we have to control both the cancellation order and the termination order to be able to ensure task lifecycle follows the expected rules. |
Those tests we're coming up with are almost as complicated as the code they test, and that's a big problem because it is very hard to find out whether or not they're testing the right thing. Unit tests are supposed to be simple and straightforward as they can't, by definition, be unit tested and so need to be validated by humans. I don't mean to assign blame or anything, as I'm also guilty of writing overly complex tests, but after spending a significant part of the last few days debugging the tests themselves, I thought I'd remind us all of this. About the task cancellation order, I'm afraid you're right that we may have to assume full control if we want to enforce a certain order, so I guess we can move forward with #22 |
Probably it can solve the dag issue: ethereum/async-service#12
What is wrong
See: https://github.com/ethereum/async-service/blob/04bbca368b0dfdeb87e8f9ff871ffb23a27b01ee/tests-asyncio/test_task_dag_cancellation_order.py
The asyncio service builds a DAG of tasks. The idea is that all of the tasks a service runs can be modeled as a dag
In the example above we have a DAG of
run() -> consume_socket()
. Theconsume_socket
task needs the socket to be open. Therun()
method will close the socket when it gets cancelled.IF
run()
is cancelled beforeconsume_socket()
then whenconsume_socket
get's cancelled it will try to dodo_socket_cleanup
which will error because the socket will have already been closed.In order to address this we need to ensure that we unroll our task DAG by cancelling them from the leaf's back towards the root.
How can it be fixed
The test for the asyncio version should be replicated in the trio version and then if trio doesn't do this for us automatically we will need to do something like lazily spawning new nurseries for each task to house it's sub-tasks and then doing the nursery cancellations in the proper order.
The text was updated successfully, but these errors were encountered: