Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-46752: Introduce task groups in asyncio #31270

Merged
merged 24 commits into from Feb 15, 2022
Merged

Conversation

gvanrossum
Copy link
Member

@gvanrossum gvanrossum commented Feb 11, 2022

This is EdgeDB's TaskGroup class, adapted for Python 3.11.

In the individual commits you can see how I evolved this from the version in EdgeDB.

Here's a to-do list:

  • Figure out why in test_taskgroup_14 I get a nested ExceptionGroup, while the original got a flat MultiError.
  • Fix the test framework's complaint about the event loop policy being changed.
  • Design question: Do we need TaskGroupError? Implementing it correctly is awkward. We could just raise BaseExceptionGroup.
    • If we decide to keep it, we need to fix it to do the hack where it inherits from BaseException iff at least one of the exceptions does. We also need to override derive().
  • Add tests for handling BaseException, KeyboardInterrupt and SystemExit.
  • Make test_taskgroup_21 work? (It wasn't working in EdgeDB either, so it was crudely disabled.)
  • Add a public API to tasks.py to replace __cancel_requested__, and get rid of the monkey-patching.
  • Design question: Should we allow creating new tasks while __aexit__() is already waiting?
  • Add a test showing the need for the .uncancel() call in __aexit__() (currently on line 97). Dropping that line does not cause any tests to fail.
  • Ensure the taskgroup tests are run with the C and Python Task implementations.
  • Rename tests to have meaningful names.
  • I have a few ideas for minor cleanups that I will do later.
  • What should we do with the copyright header? @1st1.
  • Create a bpo issue.
  • Add a NEWS blurb.
  • Documentation and What's New entry (in a separate PR, probably).

I also learned a few things about the ergonomics of ExceptionGroup:

  • The BaseException hacks are clever but awkward to replicate in a subclass.
  • The str() of an EG is rather sparse (Yury's version adds the number of sub-exceptions and their types).
  • Yury's version also has a convenience method that returns the set of sub-exception types.
  • Unsure about the nesting issue (see above).

CC: @iritkatriel, @1st1

Co-authored-by: Yury Selivanov yury@edgedb.com
Co-authored-by: Andrew Svetlov andrew.svetlov@gmail.com

https://bugs.python.org/issue46752

My plan is roughly the following:

- [x] Copy the files from EdgeDb without modifications
      (named following asyncio conventions)
- [ ] Bang on the tests until they will run
- [ ] Bang on the tests until they pass
- [ ] Remove pre-3.11 compatibility code
- [ ] Switch from MultiError to ExceptionGroup
- [ ] Other cleanup
- [ ] Add a public API to tasks.py to replace `__cancel_requested__`
Two remaining issues:

- [ ] _test_taskgroup_21 doesn't work (it didn't in EdgeDb either)
- [ ] the test framework complains about a change to the event loop policy
This required some changes to the tests since EdgeDb's MultiError
has some ergonomic conveniences that ExceptionGroup doesn't:

- A helper method to get the types of the exceptions
- It puts the number and types of the exceptions in the message

Also, in one case (test_taskgroup_14) an EG nested inside another
EG was raised, whereas the original code just raised one EG.
This remains to be investigated.
@iritkatriel
Copy link
Member

iritkatriel commented Feb 11, 2022

Add to todo list: give the tests meaningful names.

Yury's version also has a convenience method that returns the set of sub-exception types.

That's probably useful only for tests. We could move assertExceptionIsLike to a mixin in test.support.

The BaseException hacks are clever but awkward to replicate in a subclass.

Subclassing was an afterthought. We assumed it's not needed, now I think we're assuming it would be rare. If you do subclass here, make sure you define derive(). Otherwise split() will not create parts of your type.

The str() of an EG is rather sparse (Yury's version adds the number of sub-exceptions and their types).

We could, it might be too long then though.

@asvetlov
Copy link
Contributor

Regarding __cancel_requested__, I support the change and have a follow-up proposal: return False from cancel() call if __cancel_requested__ was set and don't override the cancellation msg in this case. I believe it makes Task.cancel() design in line with Future.cancel().

If we have an agreement for it, I can prepare a pull request on the weekend.

@gvanrossum
Copy link
Member Author

Regarding __cancel_requested__, I support the change and have a follow-up proposal: return False from cancel() call if __cancel_requested__ was set and don't override the cancellation msg in this case. I believe it makes Task.cancel() design in line with Future.cancel().

So it would, but I am worried about breaking code that depends on the current behavior -- often developers have no idea how to debug such a failure, and it would give 3.11 a bad rep "randomly breaks async apps".

But maybe it's not so bad? Would any tests break?

If we have an agreement for it, I can prepare a pull request on the weekend.

Give it a whir and we can see if it's viable. If not, we can do something less drastic to just expose the cancel-requested state (maybe that's just t.cancelled()) and add an API to reset it.

@gvanrossum
Copy link
Member Author

gvanrossum commented Feb 11, 2022

Add to todo list: give the tests meaningful names.

Done (i.e., added to the todo list :-).

Yury's version also has a convenience method that returns the set of sub-exception types.

That's probably useful only for tests. We could move assertExceptionIsLike to a mixin in test.support.

Perhaps. We'll see how often this comes up. I find my current solution manageable, even though it isn't as concise as Yury's code.

The BaseException hacks are clever but awkward to replicate in a subclass.

Subclassing was an afterthought. We assumed it's not needed, now I think we're assuming it would be rare. If you do subclass here, make sure you define derive(). Otherwise split() will not create parts of your type.

So we have a real decision to make here: do we need TaskGroupException or can we just raise ExceptionGroup?

The str() of an EG is rather sparse (Yury's version adds the number of sub-exceptions and their types).

We could, it might be too long then though.

Yeah, we might have to truncate if there are too many. But I already know that I've been confused regularly when I did print(exc) where exc was an EG, and it printed just the message. (repr(exc) is much better.) Maybe just put the count in there? Presumably this would be a change in the PEP based on practical experience during the alpha/beta period.

@gvanrossum
Copy link
Member Author

Figure out why in test_taskgroup_14 I get a nested ExceptionGroup, while the original got a flat MultiError.

Mystery solved: test_taskgroup_14 does not get a flat MultiError! It's just that assertRaisesRegex(E, pat) looks for pat anywhere in the exception's error message, and MultiError includes the complete traceback for all suberrors in the message! (Because it has to work with Python versions that don't print nested tracebacks.)

I wrote a small test program and found that the full error message contains the nested MultiError:

edb.common.taskgroup.TaskGroupError: unhandled errors in a TaskGroup; 1 sub errors: (TaskGroupError)
 + TaskGroupError: unhandled errors in a TaskGroup; 1 sub errors: (ZeroDivisionError)
 + ZeroDivisionError: division by zero
 |   File "t.py", line 7, in crash_after
 |     1/0


 |   File "t.py", line 13, in amain
 |     h.create_task(crash_after(0.5))
 |   File "/Users/guido/edgedb/edb/common/taskgroup.py", line 170, in __aexit__
 |     raise me from None

So the assertRaisesRegex() call matched on the second line of that. (@1st1: It's not obvious from the way the test was written that this was intentional.)

@gvanrossum
Copy link
Member Author

gvanrossum commented Feb 11, 2022

FWIW I am strongly leaning towards not having a custom TaskGroupError. The asyncio package only defines a few custom exceptions, and those seem semantically important (e.g. they convey that a very specific error occurred). It usually raises builtin exceptions like TypeError, ValueError, RuntimeError or ConnectionResetError.

UPDATE: Done.

@gvanrossum
Copy link
Member Author

I looked more into handling of BaseException.

This is the core of test 20:

        async def crash_soon():
            await asyncio.sleep(0.1)
            1 / 0

        async def nested():
            try:
                await asyncio.sleep(10)
            finally:
                raise KeyboardInterrupt

        async def runner():
            async with taskgroups.TaskGroup() as g:
                g.create_task(crash_soon())
                await nested()

We expect the runner() to terminate with KeyboardInterrupt. This works, since the KeyboardInterrupt transfers control directly into __aexit__, which squirrels it away and eventually raises it (without using an exception group).

Test 21 is almost the same except the crash_soon() coroutine raises KeyboardInterrupt and nested() raises a regular exception. We would still like runner() to exit with KeyboardInterrupt, but in this case the KeyboardInterrupt exception has to travel through asyncio, which (intentionally) treats KeyboardInterrupt and SystemExit special so that they immediately terminate the event loop. Therefore, test 21 doesn't work. (To make it work, asyncio would have to stop making KeyboardInterrupt and SystemExit special cases.)

I did add new tests that parallel test 20 and 21 but use MyBaseExc instead of KeyboardInterrupt, and those raise, as expected, a BaseExceptionGroup wrapping both the MyBaseExc and the other exception.

I think there are good enough reasons why asyncio has these special cases for KeyboardInterrupt and SystemExit, so for now I'll leave test 21 disabled.

@gvanrossum
Copy link
Member Author

Design question: Should we allow creating new tasks while __aexit__() is already waiting?

I just chatted with some Trio folks and they had decent use cases for allowing this. I'm going to look into how easy it would be to add this functionality. @1st1

@iritkatriel
Copy link
Member

iritkatriel commented Feb 12, 2022

Yeah, we might have to truncate if there are too many. But I already know that I've been confused regularly when I did print(exc) where exc was an EG, and it printed just the message. (repr(exc) is much better.) Maybe just put the count in there? Presumably this would be a change in the PEP based on practical experience during the alpha/beta period.

I made bpo46729 for this and a draft PR so we can see the impact on tracebacks in the tests.

We stop when there are no unfinished tasks,
and then no new tasks can be created.

(TODO: more thorough testing of edge cases?)
@gvanrossum
Copy link
Member Author

I just chatted with some Trio folks and they had decent use cases for allowing this. I'm going to look into how easy it would be to add this functionality. @1st1

Looked pretty easy, so I made this change.

(However, as soon as cancel scopes came up they started saying asyncio was dumb so I gave up asking more about that.)

@agronholm
Copy link
Contributor

I didn't say asyncio was dumb. I said the way cancellation works is. You noted this yourself here. My generalization was unjust and I apologize for that.

@agronholm
Copy link
Contributor

To get back to the matter at hand: localized timeouts are a pattern I often see in the wild, and they're somewhat problematic without cancel scopes. Take this example:

async with timeout(5):
   ...

Such a construct is provided not only by AnyIO and trio, but asyncio_timeout as well. The problem arises when the task is cancelled as a whole and the timeout expires, both before the event loop has a chance to raise the cancellation exception in the task. How, then, do you know if you need to simply exit the timeout() context manager or the entire task?

@gvanrossum
Copy link
Member Author

@agronholm If you want to discuss cancel scopes please open a new bpo issue.

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit 9712241 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Feb 15, 2022
@asvetlov
Copy link
Contributor

asvetlov commented Feb 15, 2022

Agree. I personally don't see any blocker.
But I want to improve tests a little, sure.

@gvanrossum
Copy link
Member Author

Okay, let's merge once buildbots are green, then we can improve tests and docs in later PRs.

BTW, are you interested in having something like your async-timeout in the stdlib as well? This might satisfy the Trio folks' desire for cancel scopes. (In order to work properly it would have to use .uncancel().)

@asvetlov
Copy link
Contributor

asvetlov commented Feb 15, 2022

BTW, are you interested in having something like your async-timeout in the stdlib as well? This might satisfy the Trio folks' desire for cancel scopes. (In order to work properly it would have to use .uncancel().)

Yes, sure. I think the current API is stable and consistent. The only known problem is a case when timeout occurs for a task that was cancelled on the previous loop iteration.
Now, this bug can be fixed easily because of the task.cancelling() method and changed .cancel() behavior. I can prepare a pull request for timeout async context manager inclusion into asyncio.

upd the mentioned async-timeout bug is actually auto-fixed after this PR merging, no timeout source code change is required. The same probably is true for asyncio.wait_for().

@1st1
Copy link
Member

1st1 commented Feb 15, 2022

BTW, are you interested in having something like your async-timeout in the stdlib as well? This might satisfy the Trio folks' desire for cancel scopes. (In order to work properly it would have to use .uncancel().)

Yes, sure.

Andrew, I also think it's time for us to add a context manager for timeouts to asyncio.

Lib/asyncio/taskgroups.py Outdated Show resolved Hide resolved
Copy link
Member

@1st1 1st1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gvanrossum I think this is ready. With this merged we'll need to update the docs in a few places to de-prioritize asyncio.gather() and steer people towards TaskGroups.

@gvanrossum gvanrossum merged commit 602630a into python:main Feb 15, 2022
@Tinche
Copy link
Contributor

Tinche commented Feb 16, 2022

Are you folks acquainted with https://pypi.org/project/quattro/, since there's talk of timeout context managers in the stdlib?

@Tinche
Copy link
Contributor

Tinche commented Feb 16, 2022

It's also a little unfortunate Quattro task groups weren't used instead, they are identical but with type annotations (and I think I fixed a minor issue with something, pickling?)

@gvanrossum
Copy link
Member Author

A simple "thank you" would have sufficed.

@gvanrossum gvanrossum deleted the taskgroups branch February 16, 2022 04:54
@1st1
Copy link
Member

1st1 commented Feb 16, 2022

@Tinche

It's also a little unfortunate Quattro task groups weren't used instead, they are identical but with type annotations (and I think I fixed a minor issue with something, pickling?)

I've taken a quick look, the file seems identical with what Guido committed (except typing). If there's something you fixed and you can remember what it was, please file a bug or create a PR.

Nit: I've also noticed that you removed the original copyright comment from the file. Not that I care in this specific case, but you shouldn't remove copyright notices from files you borrow from other projects unless you're explicitly permitted to do so. (Thanks for crediting in README though!)

@agronholm
Copy link
Contributor

It's also a little unfortunate Quattro task groups weren't used instead, they are identical but with type annotations

Type annotations are still not a thing in the standard library. Were you hoping for the code to be type annotated?

@Tinche On another note, I am currently writing a BPO for potential low level asyncio cancellation machinery updates to support the implementation of cancel scopes and level cancellation in asyncio. The writing process has largely turned into a rubber ducking session and led to new insights on my part. The interesting aspect of Quattro is that it implements cancel scopes along with task groups, which this PR does not. It would be interesting to compare our implementations privately. I'll reach out to you by Gitter so as not to unnecessarily flood this PR.

@Tinche
Copy link
Contributor

Tinche commented Feb 16, 2022

Nit: I've also noticed that you removed the original copyright comment from the file.

Sorry, I restored it just now. To me it's just clutter since the repo already has a license included. I see the TaskGroup here doesn't have it, so I might switch to that then ;)

@gvanrossum
Copy link
Member Author

gvanrossum commented Feb 16, 2022

@Tinche I've created a separate issue for cancel scopes: bpo-46771.

@1st1
Copy link
Member

1st1 commented Feb 16, 2022

Sorry, I restored it just now. To me it's just clutter since the repo already has a license included. I see the TaskGroup here doesn't have it, so I might switch to that then ;)

You should include the full copyright comment.

@gvanrossum
Copy link
Member Author

gvanrossum commented Feb 17, 2022

HOWEVER, this introduces a change in behavior around task cancellation:

  • A task that catches CancelledError is allowed to run undisturbed (ignoring further .cancel() calls and allowing any number of await calls!) until it either exits or calls .uncancel().

While this change was merged, it is still being debated, and we may revise this. See e.g. bpo-46771. (In general, what was merged is intended as a starting point for further experiments and discussion, at least until feature freeze / beta 1, late May.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants