
GH-95704: Don't suppress errors from tasks when TG is cancelled #95761

Merged
merged 6 commits on Aug 17, 2022

Conversation

gvanrossum
Member

@gvanrossum gvanrossum commented Aug 7, 2022

When a task catches CancelledError and raises some other error,
the other error should not silently be suppressed.

Exception: when uncancel() != 0, always propagate CancelledError.

NOTE: This represents a change in behavior (hence the need to
change three tests). But it is only an edge case.

We need to discuss with the RM whether to backport to 3.11RC2,
or 3.11.1, or not at all.
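
For illustration, a minimal sketch of the kind of scenario this change affects (the helper names and timings are made up, not taken from the test suite). Previously the child's error was silently dropped in favor of a bare CancelledError; with this change it surfaces in an ExceptionGroup:

import asyncio

async def misbehaving_child():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Turn the cancellation into a different error during cleanup.
        raise RuntimeError("crashed while cleaning up after cancellation")

async def group_runner():
    async with asyncio.TaskGroup() as tg:
        tg.create_task(misbehaving_child())
        await asyncio.sleep(10)

async def main():
    task = asyncio.create_task(group_runner())
    await asyncio.sleep(0.1)
    task.cancel()  # cancel the whole TaskGroup "from the outside"
    try:
        await task
    except asyncio.CancelledError:
        print("old behavior: bare CancelledError, the child's error was suppressed")
    except BaseExceptionGroup as eg:
        print("new behavior: the child's error surfaces:", eg.exceptions)

asyncio.run(main())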

@gvanrossum gvanrossum marked this pull request as ready for review August 7, 2022 15:42
@gvanrossum
Member Author

@Tinche Please review as well.

Contributor

@Tinche Tinche left a comment

In the spirit of the old boy scouts rule, 'Always leave the campground cleaner than you found it', can we give the changed tests meaningful names and docstrings?

The functionality change looks ok, although I was a little confused at first as to why we're now also checking self._parent_cancel_requested, so might want to add a small comment there?

@gvanrossum
Member Author

gvanrossum commented Aug 8, 2022

In the spirit of the old boy scouts rule, 'Always leave the campground cleaner than you found it', can we give the changed tests meaningful names and docstrings?

Hm, I have some issues with boy scouts. :-)

I know I let you do that in the past, but when we started comparing the code to the original in EdgeDb I realized that it would be better to keep the names and style consistent with the original.

Eventually we could do a separate cleanup PR that adds docstrings (actually for tests I prefer comments, since the unittest package does something funny with docstrings) and maybe add something to the name (e.g. test_taskgroup_08_cancellation_in_body) but I am loath to clean up just the changed tests -- it will look weird (PEP 8 has a rule about respecting the local style over almost everything else: "Consistency within one module or function is the most important").

The functionality change looks ok, although I was a little confused at first as to why we're now also checking self._parent_cancel_requested, so might want to add a small comment there?

Yeah, that I can do.

@gvanrossum
Member Author

Not sure if the comment explains things well enough, but I am loath to insert a novella. Everything about propagating cancellation is confusingly complicated.

# Propagate CancelledError if there is one, except if there
# are other errors and we're not required by uncancel() != 0.
# (The latter means we *must* propagate the cancellation.)
if (propagate_cancellation_error is not None and
Contributor

@graingert graingert Aug 8, 2022

Doesn't this now suppress the TimeoutError when used with async with timeout():?

I'll try and make a repro

Member Author

Since the condition is true in fewer cases than the original code (where it was just if propagate_cancellation_error is not None:) I don't see how it could suppress something that wouldn't be suppressed previously. But I'm not sure what you're thinking of, so please make a repro.

@Tinche
Contributor

Tinche commented Aug 8, 2022

@gvanrossum I'm trying to grok your newly added comment, finding the state machine a little complicated.

IIUC, the comments are saying if uncancel() returns non-zero (i.e. there are others up-stack that have also cancelled and are now waiting for the CancelledError), we should swallow child exceptions? But I think we should always propagate, so exceptions don't pass silently. WYT?

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
@gvanrossum
Member Author

@gvanrossum I'm trying to grok your newly added comment, finding the state machine a little complicated.

Me too, but it's the only thing that makes all the original tests pass.

IIUC, the comments are saying if uncancel() returns non-zero (i.e. there are others up-stack that have also cancelled and are now waiting for the CancelledError), we should swallow child exceptions? But I think we should always propagate, so exceptions don't pass silently. WYT?

Note that "propagate" here only refers to CancelledError. When we don't "propagate" because there are other errors, those will be bundled in an ExceptionGroup and raised below.

The remaining case, where we swallow child exceptions (this is the self._parent_cancel_requested part), exists because otherwise test_taskgroup_{15,16} fail. Maybe we need to consider these tests in detail. The first one does the following:

  • Create a task group with a child that will crash after 0.3s
  • in the runner (parent) task, in the body of the async with TaskGroup()... statement, sleep 10s
  • after 0.1s the parent task is cancelled
  • it catches the CancelledError, and sleeps 0.5s more (this is followed by a bare raise but that's not reached)
  • 0.2s into the latter sleep (0.3 into the whole thing), the child crashes
  • This cancels the parent task again, setting _parent_cancel_requested
  • TaskGroup.__aexit__ sets propagate_cancellation_error and calls uncancel(), finding the latter returns 1 (so we don't clear propagate_cancellation_error)
  • We collect the ZeroDivisionError from the child task
  • We reach the code with the comment, and decide to propagate the CancelledError, ignoring the child error

It looks like test 16 is similar but with a nested runner task.

So what this does at a high level is, it propagates CancelledError in favor of ZeroDivisionError because (through the magic of uncancel()) it knows that the parent task was cancelled "from the outside".

Are these the right semantics? We can debate that, I'm not sure myself. I feel that given there were two tests that validated this behavior (with not a race condition in sight, unlike tests 08, 11, 12) it is defensible. OTOH what we did with tests 08, 11, 12 was that we decided that if a child raised an error we should propagate that, and so maybe as long as we are changing tests and semantics anyway, we may as well decide to propagate the child's error in this case as well. It would certainly simplify the logic a bit.

Thoughts? I'd really like @1st1's opinion here, too. And maybe this is what @graingert was alluding to?
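
A rough, illustrative reconstruction of the shape of that scenario (not the actual test_taskgroup_15 code; the helper names and timings are placeholders):

import asyncio

async def crash_soon():
    await asyncio.sleep(0.3)
    1 / 0  # the child crashes with ZeroDivisionError

async def runner():
    async with asyncio.TaskGroup() as tg:
        tg.create_task(crash_soon())
        try:
            await asyncio.sleep(10)      # the body of the "async with"
        except asyncio.CancelledError:
            await asyncio.sleep(0.5)     # swallow the outside cancellation for a while...
            raise                        # ...never reached: the child's crash re-cancels us first

async def main():
    task = asyncio.create_task(runner())
    await asyncio.sleep(0.1)
    task.cancel()                        # cancel the parent "from the outside"
    try:
        await task
    except BaseException as e:
        # Whether a bare CancelledError or the child's ZeroDivisionError (in a
        # group) should surface here is exactly the question under discussion.
        print("runner finished with", type(e).__name__)

asyncio.run(main())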

@graingert
Contributor

graingert commented Aug 8, 2022

So what this does at a high level is, it propagates CancelledError in favor of ZeroDivisionError because (through the magic of uncancel()) it knows that the parent task was cancelled "from the outside".

I think what you need to do here is raise CancelledError wrapped in a BaseExceptionGroup along with the other self._errors

This happens here in trio https://github.com/python-trio/trio/blob/2eed34c9e01ce279bfe30fd9d8f7ae897f31a515/trio/_core/_run.py#L930-L936

Here await checkpoint() is equivalent to the uncancel() == 0 check

@gvanrossum
Member Author

I think what you need to do here is raise CancelledError wrapped in a BaseExceptionGroup along with the other self._errors

But that is exactly what we're contorting ourselves not to do -- mainly because except CancelledError would not catch that, and we don't want to require people to use except* CancelledError (especially since that would not work in pre-3.11 code).

We have to decide what's more important -- the CancelledError or the child task errors. (Maybe I'll write more later.)
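
A small stand-alone illustration of that point (not from the PR): a plain except CancelledError does not see a CancelledError once it has been wrapped in a group:

import asyncio

try:
    raise BaseExceptionGroup("wrapped", [asyncio.CancelledError()])
except asyncio.CancelledError:
    print("caught by except CancelledError")  # not reached
except BaseExceptionGroup:
    print("only the group type matches")      # this branch runs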

@graingert
Contributor

mainly because except CancelledError would not catch that,

We don't want people to do that anyway

especially since that would not work in pre-3.11 code).

Right, but code can't get itself into this situation without using 3.11 features anyway. The problem only presents itself when an except* is sandwiched between a timeout and a taskgroup

@Tinche
Contributor

Tinche commented Aug 8, 2022

Maybe I'll take a minute and see what Trio does here. Might be an interesting data point.

My gut feeling is: we never swallow child errors. But I haven't thought about it much.

@graingert
Contributor

graingert commented Aug 8, 2022

Maybe I'll take a minute and see what Trio does here. Might be an interesting data point.

My gut feeling is: we never swallow child errors. But I haven't thought about it much.

Ah I built a tool to do this. I'll edit this comment to link it

https://gist.github.com/graingert/898a76796c925da7b26a183811a410fa#file-demo-py-L76

@gvanrossum
Member Author

What Trio does is not necessarily what we should do, they have different constraints (its MultiError or whatever behaves differently than ExceptionGroup).

Anyway, here's a scenario that I was thinking of:

t = asyncio.create_task(run())
<do other stuff>
t.cancel()
try:
    res = await t  # Wait for task to complete
except CancelledError:
    <handle it>
except ZeroDivisionError:
    <handle that>
# etc.

This is perfectly legit code in 3.10 and before (except for the syntax to wait for a task), and I don't want to break it.

Now suppose run() was not completely under our control (say it invoked some other library that might be upgraded to use TaskGroup), and if that ends up somehow wrapping the CancelledError in an exception, we'd have broken this code.

This is why CancelledError is always propagated unwrapped.

So we just have to decide whether to raise CancelledError or the child error(s) -- we can't raise both.
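
A runnable version of the sketch above, with the elided pieces filled in by hypothetical placeholders (run() here is just a stand-in workload):

import asyncio

async def run():
    # Hypothetical workload; in the scenario above it might internally start
    # using a TaskGroup after a library upgrade.
    await asyncio.sleep(10)

async def main():
    t = asyncio.create_task(run())
    await asyncio.sleep(0)                   # <do other stuff>
    t.cancel()
    try:
        await t                              # Wait for task to complete
    except asyncio.CancelledError:
        print("handled the cancellation")    # <handle it>
    except ZeroDivisionError:
        print("handled the division error")  # <handle that>

asyncio.run(main())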

@1st1
Member

1st1 commented Aug 9, 2022

Sorry for being hard to reach. It took me some time to unpack this discussion (I glanced over it a few times and may have missed a few details, sorry!) and then to debug the example from #95704 (comment) on the latest taskgroups.py in the main branch. I think I'm more or less up to date now.

I think the correct solution is to always propagate all errors, otherwise asyncio programs would be too hard to debug. Thomas' example is a perfect illustration of that--without AttributeErrors in the printed traceback it's almost impossible to understand why his program is abruptly crashing.

The current taskgroups.py implementation ignores side errors and propagates a single CancelledError. This could be a simple oversight by me (and now I feel dumb because I think I remember debugging pretty complex async/await code wondering why some errors appear to be silenced). It could also be the result of the current TaskGroup design predating exception groups. Who knows...

How do we fix the current behavior? The key thing we should probably keep is supporting the except CancelledError blocks that some libraries may have. A solution would be to introduce the following subclass of BaseException that we'll use whenever we're wrapping a set of errors that also contains a CancelledError:

class CancelledErrorGroup(BaseExceptionGroup, CancelledError):
    """A group of errors containing a CancelledError."""

    # This error type is needed for "backwards compatibility" with code that
    # has `except CancelledError: ...; raise`-style blocks.

    def derive(self, excs):
        et = BaseExceptionGroup
        if any(e for e in excs if isinstance(e, CancelledError)):
            et = CancelledErrorGroup
        return et(self.message, excs)

The most important fix then is in these lines of taskgroups.py:

diff --git a/Lib/asyncio/taskgroups.py b/Lib/asyncio/taskgroups.py
index 097b4864f7..68391d080c 100644
--- a/Lib/asyncio/taskgroups.py
+++ b/Lib/asyncio/taskgroups.py
@@ -115,14 +121,14 @@ async def __aexit__(self, et, exc, tb):
         if self._base_error is not None:
             raise self._base_error
 
-        if propagate_cancellation_error is not None:
-            # The wrapping task was cancelled; since we're done with
-            # closing all child tasks, just propagate the cancellation
-            # request now.
-            raise propagate_cancellation_error
+        group_type = BaseExceptionGroup
 
-        if et is not None and et is not exceptions.CancelledError:
-            self._errors.append(exc)
+        if propagate_cancellation_error is not None:
+            self._errors.append(propagate_cancellation_error)
+            group_type = exceptions.CancelledErrorGroup
+        else:
+            if erroring_out and not issubclass(et, exceptions.CancelledError):
+                self._errors.append(exc)
 
         if self._errors:
             # Exceptions are heavy objects that can have object
@@ -131,7 +137,7 @@ async def __aexit__(self, et, exc, tb):
             errors = self._errors
             self._errors = None
 
-            me = BaseExceptionGroup('unhandled errors in a TaskGroup', errors)
+            me = group_type('unhandled errors in a TaskGroup', errors)
             raise me from None

This seems to do the trick and all asyncio tests pass unmodified. Thomas' program now outputs a correct and debuggable traceback:

Click to see the better traceback
  + Exception Group Traceback (most recent call last):
  |   File "/Users/yury/dev/python/cpython/t.py", line 31, in <module>
  |     asyncio.run(main())
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/runners.py", line 187, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/runners.py", line 119, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/base_events.py", line 650, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/t.py", line 20, in main
  |     async with asyncio.TaskGroup() as tg:
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/taskgroups.py", line 156, in __aexit__
  |     raise me from None
  | asyncio.exceptions.CancelledErrorGroup: unhandled errors in a TaskGroup (3 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t.py", line 6, in child
    |     with contextlib.closing(sock):
    |   File "/Users/yury/dev/python/cpython/Lib/contextlib.py", line 345, in __exit__
    |     self.thing.close()
    |     ^^^^^^^^^^^^^^^^
    | AttributeError: 'Sock' object has no attribute 'close'. Did you mean: 'aclose'?
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t.py", line 6, in child
    |     with contextlib.closing(sock):
    |   File "/Users/yury/dev/python/cpython/Lib/contextlib.py", line 345, in __exit__
    |     self.thing.close()
    |     ^^^^^^^^^^^^^^^^
    | AttributeError: 'Sock' object has no attribute 'close'. Did you mean: 'aclose'?
    +---------------- 3 ----------------
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t.py", line 25, in main
    |     await asyncio.sleep(2)
    |   File "/Users/yury/dev/python/cpython/Lib/asyncio/tasks.py", line 644, in sleep
    |     return await future
    |            ^^^^^^^^^^^^
    | asyncio.exceptions.CancelledError
    |
    | During handling of the above exception, another exception occurred:
    |
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t.py", line 24, in main
    |     async with contextlib.aclosing(sock):
    |   File "/Users/yury/dev/python/cpython/Lib/asyncio/tasks.py", line 644, in sleep
    |     return await future
    |            ^^^^^^^^^^^^
    | asyncio.exceptions.CancelledError
    +------------------------------------

And the old-style code that would expect a CancelledError should also be happy.
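
A minimal stand-alone check of that claim, re-declaring a local stand-in for the proposed class rather than using a real asyncio.exceptions.CancelledErrorGroup (which does not exist):

import asyncio

# Local stand-in mirroring the class proposed above (not part of asyncio).
class CancelledErrorGroup(BaseExceptionGroup, asyncio.CancelledError):
    """A group of errors containing a CancelledError."""

try:
    raise CancelledErrorGroup(
        "unhandled errors in a TaskGroup",
        [asyncio.CancelledError(), AttributeError("boom")],
    )
except asyncio.CancelledError as e:
    # Caught: the group is itself a CancelledError instance, so old-style
    # "except CancelledError: ...; raise" blocks still trigger.
    print(type(e).__name__, e.exceptions)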

Here's the full diff that I have with a couple more fixes & nits.
diff --git a/Lib/asyncio/exceptions.py b/Lib/asyncio/exceptions.py
index 5ece595aad..961d039aed 100644
--- a/Lib/asyncio/exceptions.py
+++ b/Lib/asyncio/exceptions.py
@@ -2,7 +2,8 @@
 
 
 __all__ = ('BrokenBarrierError',
-           'CancelledError', 'InvalidStateError', 'TimeoutError',
+           'CancelledError', 'CancelledErrorGroup',
+           'InvalidStateError', 'TimeoutError',
            'IncompleteReadError', 'LimitOverrunError',
            'SendfileNotAvailableError')
 
@@ -11,6 +12,16 @@ class CancelledError(BaseException):
     """The Future or Task was cancelled."""
 
 
+class CancelledErrorGroup(BaseExceptionGroup, CancelledError):
+    """A group of errors containing a CancelledError."""
+
+    # This error type needed for "backwards compatibility" with code that
+    # has `except CancelledError: ...; raise`-style blocks.
+
+    def derive(self, excs):
+        return CancelledErrorGroup(self.message, excs)
+
+
 TimeoutError = TimeoutError  # make local alias for the standard exception
 
 
diff --git a/Lib/asyncio/taskgroups.py b/Lib/asyncio/taskgroups.py
index 097b4864f7..9505793b03 100644
--- a/Lib/asyncio/taskgroups.py
+++ b/Lib/asyncio/taskgroups.py
@@ -55,29 +55,35 @@ async def __aenter__(self):
     async def __aexit__(self, et, exc, tb):
         self._exiting = True
 
-        if (exc is not None and
+        # True if the context manager is aborting with an error.
+        erroring_out = et is not None
+
+        if (erroring_out and
                 self._is_base_error(exc) and
                 self._base_error is None):
             self._base_error = exc
 
-        propagate_cancellation_error = \
-            exc if et is exceptions.CancelledError else None
+        propagate_cancellation_error = None
+        if et is not None and issubclass(et, exceptions.CancelledError):
+            propagate_cancellation_error = exc
+
         if self._parent_cancel_requested:
-            # If this flag is set we *must* call uncancel().
+            # If this flag is set it means that this TaskGroup requested
+            # the parent task to be cancelled. We now *must* call uncancel().
             if self._parent_task.uncancel() == 0:
                 # If there are no pending cancellations left,
                 # don't propagate CancelledError.
                 propagate_cancellation_error = None
 
-        if et is not None:
+        if erroring_out:
             if not self._aborting:
-                # Our parent task is being cancelled:
+                # Either the parent task is being cancelled:
                 #
                 #    async with TaskGroup() as g:
                 #        g.create_task(...)
-                #        await ...  # <- CancelledError
+                #        await ...  # <- errors out with CancelledError
                 #
-                # or there's an exception in "async with":
+                # or there's an exception the "async with" block:
                 #
                 #    async with TaskGroup() as g:
                 #        g.create_task(...)
@@ -97,7 +103,7 @@ async def __aexit__(self, et, exc, tb):
                 await self._on_completed_fut
             except exceptions.CancelledError as ex:
                 if not self._aborting:
-                    # Our parent task is being cancelled:
+                    # The parent task is being cancelled:
                     #
                     #    async def wrapper():
                     #        async with TaskGroup() as g:
@@ -113,16 +119,31 @@ async def __aexit__(self, et, exc, tb):
         assert not self._tasks
 
         if self._base_error is not None:
-            raise self._base_error
+            # asyncio currently can't handle SystemExit or KeyboardIterrupt
+            # without corrupting its state. In an event of such an error
+            # we simply immediately propagate it ignoring other errors
+            # (but we still want to at least log them).
+            try:
+                for error in self._errors:
+                    if isinstance(error, exceptions.CancelledError):
+                        continue
+                    cause = type(self._base_error).__name__
+                    self._loop.call_exception_handler({
+                        'message': f'ignoring an error due to {cause}',
+                        'exception': error,
+                        'task': self._parent_task,
+                    })
+            finally:
+                raise self._base_error
+
+        group_type = BaseExceptionGroup
 
         if propagate_cancellation_error is not None:
-            # The wrapping task was cancelled; since we're done with
-            # closing all child tasks, just propagate the cancellation
-            # request now.
-            raise propagate_cancellation_error
-
-        if et is not None and et is not exceptions.CancelledError:
-            self._errors.append(exc)
+            self._errors.append(propagate_cancellation_error)
+            group_type = exceptions.CancelledErrorGroup
+        else:
+            if erroring_out and not issubclass(et, exceptions.CancelledError):
+                self._errors.append(exc)
 
         if self._errors:
             # Exceptions are heavy objects that can have object
@@ -131,7 +152,7 @@ async def __aexit__(self, et, exc, tb):
             errors = self._errors
             self._errors = None
 
-            me = BaseExceptionGroup('unhandled errors in a TaskGroup', errors)
+            me = group_type('unhandled errors in a TaskGroup', errors)
             raise me from None
 
     def create_task(self, coro, *, name=None, context=None):
@@ -195,10 +216,10 @@ def _on_task_done(self, task):
             return
 
         if not self._aborting and not self._parent_cancel_requested:
-            # If parent task *is not* being cancelled, it means that we want
-            # to manually cancel it to abort whatever is being run right now
-            # in the TaskGroup.  But we want to mark parent task as
-            # "not cancelled" later in __aexit__.  Example situation that
+            # If the parent task *is not* being cancelled, it means that we
+            # want to manually cancel it, to abort whatever is being currently
+            # awaited in the TaskGroup.  But we want to mark the parent task as
+            # "not cancelled" later in __aexit__.  An example situation that
             # we need to handle:
             #
             #    async def foo():
@@ -206,13 +227,14 @@ def _on_task_done(self, task):
             #            async with TaskGroup() as g:
             #                g.create_task(crash_soon())
             #                await something  # <- this needs to be canceled
-            #                                 #    by the TaskGroup, e.g.
+            #                                 #    by the TaskGroup, i.e.
             #                                 #    foo() needs to be cancelled
             #        except Exception:
             #            # Ignore any exceptions raised in the TaskGroup
             #            pass
             #        await something_else     # this line has to be called
-            #                                 # after TaskGroup is finished.
+            #                                 # after the TaskGroup is
+            #                                 # finished.
             self._abort()
             self._parent_cancel_requested = True
             self._parent_task.cancel()

@gvanrossum Guido I'm happy to push to your branch or make a new one with this diff if you want me to. I think it would need a couple new tests (and CancelledErrorGroup.derive looks wrong to me)

@graingert
Contributor

graingert commented Aug 9, 2022

The key thing we should probably keep is supporting the except CancelledError blocks that some libraries may have.

I don't think you actually need to support this use case, because the only way to end up in this situation is when you use the new TaskGroup code, don't handle the grouped exceptions, and want to catch a single CancelledError only.

Consider what would happen if someone replaced their asyncio.gather calls with TaskGroup: in this situation with the old style, when gather is cancelled and any of its child tasks throws an error, that error replaces the CancelledError entirely, which means the except CancelledError: block wouldn't run anyway.

If TaskGroup now takes special effort to re-raise a bare CancelledError when faced with additional errors from crashing tasks, it would actually break existing code by reaching except CancelledError: clauses that were previously unreachable.

import asyncio
import contextlib

class Sock:
    async def aclose(self):
        pass

async def foo():
    with contextlib.closing(Sock()):
        await asyncio.sleep(1)
        print("hello")


async def demo():
    task = asyncio.current_task()
    task.get_loop().call_later(0.1, task.cancel)
    try:
        await asyncio.gather(foo(), foo())
    except asyncio.CancelledError:
        pass # the intent here is to only reach it if asyncio.gather was cancelled cleanly


asyncio.run(demo())

@1st1
Member

1st1 commented Aug 9, 2022

Because the only way to end up in this situation is when you use the new TaskGroup code, don't handle the grouped exceptions, and want to catch a single CancelledError only.

That's not the only way. You can have a library that takes an arbitrary task/coroutine and awaits on it, reacting specifically to CancelledError and re-raising it immediately. It's those scenarios that I care about.

If TaskGroup now takes special effort to re-raise a bare CancelledError when faced with additional errors from crashing tasks, it would actually break existing code by reaching except CancelledError: clauses that were previously unreachable.

Perfect compatibility with asyncio.gather() was never a design goal. As a user I would expect your example code to actually propagate a CancelledError (amongst others) when used with a TaskGroup.

Also "break existing code" isn't an argument here, because a user would be switching from old API to new API that's not declared as a drop-in replacement.

@graingert
Contributor

graingert commented Aug 9, 2022

That's not the only way. You can have a library that takes an arbitrary task/coroutine and awaits on it, reacting specifically to CancelledError and re-raising it immediately. It's those scenarios that I care about.

so this would be something like this?

async def some_library(async_fn):
    try:
        await async_fn()
    except asyncio.CancelledError:
        raise
    except BaseException:
        # accidentally catching a BaseExceptionGroup('unhandled errors in a TaskGroup', [asyncio.CancelledError()])
        await cleanup()
        raise

As a user I would expect your example code to actually propagate a CancelledError (amongst others) when used with a TaskGroup.

yeah that's the behavior I'm suggesting here

@1st1
Member

1st1 commented Aug 9, 2022

so this would be something like this?

Yeah, along the lines of this.

FWIW I don't want to push hard for the approach I suggested in #95761 (comment) -- the CancelledErrorGroup error feels hacky. I'd be OK if we just say: (a) manually handling CancelledError is an anti-pattern, and (b) we don't want to make that easier for existing or future code that uses regular (non-group) try..except. In which case we can simplify the approach and use a regular BaseExceptionGroup instead of CancelledErrorGroup.

I do feel strongly about including all errors into the group though--my own opinion is that that's key for the usability of asyncio. I always viewed exception groups as a mechanism to deliver the cancellation signal without hiding other errors and IMO we should use the mechanism fully.

@gvanrossum
Member Author

We've established one thing clearly: even though there were several tests that seemed to be written to ignore "side errors" (let's call them that) accompanying a cancellation, we shouldn't assume that's the best design, and it's not too late to do something better -- in particular, we should not ignore side errors.

We're past RC1, and I don't think that a bug in the cancel behavior of the brand new TaskGroup and timeout async context managers is worth disrupting the release cycle. Therefore I'd like to do something smaller that we can consider a bug fix when it lands in 3.11.1. I don't want to wait for 3.12 because by the time that rolls around, the existing 3.11 behavior will be set in stone. But 3.11.1 (which I expect around two months after 3.11) seems soon enough.

My proposal is to keep the PR as it is, except remove the self._parent_cancel_requested or part from the condition for propagating the CancelledError, and fix the tests that this breaks (I think 15 and 16 -- I added that condition specifically to make those pass).

Now, this will mean that code (old or new) using except CancelledError: will miss a cancellation when one of the tasks crashes during cleanup. I think that's a reasonable price to pay -- for example, if we write

async with asyncio.timeout(1):
    try:
        await asyncio.sleep(2)  # Will be cancelled by the timeout
    finally:
        1/0  # oops, crash in cleanup

this will report ZeroDivisionError, not TimeoutError. So if we move the try/finally block to a task and wait for it using a TaskGroup and then wrap that in a timeout(1), it's reasonable that we get the same effect (but wrapped in an ExceptionGroup).

Note that a TaskGroup wrapping another TaskGroup will still call uncancel() when needed, due to #95602 (landed 6 days ago, just before RC1).

I considered proposing adding the CancelledError to the ExceptionGroup as just another exception, but that just complicates the logic, and I think it's a cleaner rule that CancelledError is either propagated singly, or not at all.
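
For convenience, the timeout example above wrapped in a runner so it can be executed as-is; per the description above it reports ZeroDivisionError rather than TimeoutError:

import asyncio

async def main():
    async with asyncio.timeout(1):
        try:
            await asyncio.sleep(2)  # will be cancelled by the timeout
        finally:
            1 / 0  # oops, crash in cleanup

# Reports ZeroDivisionError (with the CancelledError as context), not TimeoutError.
asyncio.run(main())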

Any scenario where a task crashes in cleanup upon cancellation
will now result in an ExceptionGroup wrapping the crash(es)
instead of propagating CancelledError and ignoring the side errors.
@gvanrossum
Member Author

Why do you think it's cleaner to only propagate CancelledError alone? IMO it's totally OK to have it propagated in a group with other errors. I don't see the downsides of that or the complications (we don't have to implement my idea of CancelledErrorGroup, fwiw).

Remember that when we designed exception groups, we decided that changing any API from raising FooError() to raising ExceptionGroup("msg", [FooError()]) is a backwards incompatible API change?

When we cancel a task (or a future -- remember tasks are futures), and the task handles the cancellation cleanly, then we expect waiting for the task to raise CancelledError. If under certain circumstances it will now instead raise an ExceptionGroup wrapping a CancelledError, that would be an API change. OTOH, if the task crashes during cleanup, other errors can be raised, so that is not a new outcome.

I looked at the timeout() code, and it checks for CancelledError in a way that would not work for an EG wrapping a CE:

            if self._task.uncancel() == 0 and exc_type is exceptions.CancelledError:

I must assume there's plenty of careful cancellation handling code out there that similarly needs CE to be not wrapped.

Anyway, this PR is updated, please have a look.

@1st1
Member

1st1 commented Aug 12, 2022

Remember that when we designed exception groups, we decided that changing any API from raising FooError() to raising ExceptionGroup("msg", [FooError()]) is a backwards incompatible API change?

Yes, I agree. Although TaskGroups is a new API, so we can tweak it within certain limits.

When we cancel a task (or a future -- remember tasks are futures), and the task handles the cancellation cleanly, then we expect waiting for the task to raise CancelledError. If under certain circumstances it will now instead raise an ExceptionGroup wrapping a CancelledError, that would be an API change.

Yes. Unless we implement what I'm suggesting in #95761 (comment) -- to make a special EG type that is a subclass of CancelledError.

It's really quite elegant -- if an error group contains a CancelledError it itself becomes a CancelledError.

if self._task.uncancel() == 0 and exc_type is exceptions.CancelledError:

This can be fixed to work with my proposal - we just need to replace is with issubclass. Using is for checks like this is an anti-pattern anyways (we never documented CancelledError as a type that must not be subclassed).


I'm approving this PR because it's better to land something than nothing in this case.

But I think that it's suboptimal in its design because it obstructs the cancellation in certain cases and can make things very hard to understand.

Consider the traceback of Thomas' script. With this PR:

child task got error: type(e)=<class 'AttributeError'> e=AttributeError("'Sock' object has no attribute 'close'")
child task got error: type(e)=<class 'AttributeError'> e=AttributeError("'Sock' object has no attribute 'close'")
handled attribute error

With the approach I'm suggesting in #95761 (comment):

child task got error: type(e)=<class 'AttributeError'> e=AttributeError("'Sock' object has no attribute 'close'")
child task got error: type(e)=<class 'AttributeError'> e=AttributeError("'Sock' object has no attribute 'close'")
handled attribute error
  + Exception Group Traceback (most recent call last):
  |   File "/Users/yury/dev/python/cpython/t3.py", line 30, in <module>
  |     asyncio.run(main())
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/runners.py", line 187, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/runners.py", line 119, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/base_events.py", line 650, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/yury/dev/python/cpython/t3.py", line 20, in main
  |     async with asyncio.TaskGroup() as tg:
  |   File "/Users/yury/dev/python/cpython/Lib/asyncio/taskgroups.py", line 157, in __aexit__
  |     raise me from None
  | asyncio.exceptions.CancelledErrorGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t3.py", line 25, in main
    |     await asyncio.sleep(2)
    |   File "/Users/yury/dev/python/cpython/Lib/asyncio/tasks.py", line 644, in sleep
    |     return await future
    |            ^^^^^^^^^^^^
    | asyncio.exceptions.CancelledError
    |
    | During handling of the above exception, another exception occurred:
    |
    | Traceback (most recent call last):
    |   File "/Users/yury/dev/python/cpython/t3.py", line 24, in main
    |     async with contextlib.aclosing(sock):
    |   File "/Users/yury/dev/python/cpython/Lib/asyncio/tasks.py", line 644, in sleep
    |     return await future
    |            ^^^^^^^^^^^^
    | asyncio.exceptions.CancelledError
    +------------------------------------

Guido, I'm curious what you think specifically about my idea.


That said this PR is still a huge improvement over what's in the main branch. I'm OK to land it now and continue the discussion if you are open to it.

@graingert
Contributor

@1st1 can you make a draft PR for your CancelledError as an ExceptionGroup change? I suspect it will end up hiding exceptions again in async with timeout() and I think changing EG semantics would actually be easier

@gvanrossum
Member Author

When we cancel a task (or a future -- remember tasks are futures), and the task handles the cancellation cleanly, then we expect waiting for the task to raise CancelledError. If under certain circumstances it will now instead raise an ExceptionGroup wrapping a CancelledError, that would be an API change.

Yes. Unless we implement what I'm suggesting in #95761 (comment) -- to make a special EG type that is a subclass of CancelledError.

It's really quite elegant -- if an error group contains a CancelledError it itself becomes a CancelledError.

I think it will be full of surprises. I did a few experiments, and it seems that if you have a subclass of SomeException and ExceptionGroup, and you raise one of those, if you write except* SomeException: then it will catch just that -- other except* blocks that match one or more of the exceptions in the group will not trigger.

(Even if we were to decide that that is a bug in the exception group implementation, it would be too late to fix in 3.11.)
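
A sketch of the kind of experiment referred to above, using an illustrative WeirdGroup stand-in (combining ExceptionGroup with an ordinary exception, in the spirit of the proposed CancelledErrorGroup):

# Illustrative only: an exception-group subclass that is also a ValueError.
class WeirdGroup(ExceptionGroup, ValueError):
    pass

try:
    raise WeirdGroup("demo", [TypeError("leaf")])
except* ValueError as eg:
    # The group itself is a ValueError, so this clause consumes the whole
    # group -- nothing is left over for the later clause.
    print("except* ValueError caught:", eg.exceptions)
except* TypeError as eg:
    # Never reached, even though the group contains a TypeError.
    print("except* TypeError caught:", eg.exceptions)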

if self._task.uncancel() == 0 and exc_type is exceptions.CancelledError:

This can be fixed to work with my proposal - we just need to replace is with issubclass. Using is for checks like this is an anti-pattern anyways (we never documented CancelledError as a type that must not be subclassed).

It may be an antipattern, but the taskgroup code I inherited from you also uses this, and I suspect it's a common pattern, because people don't expect CancelledError to be subclassed.

In addition, in the scenario where we bubble up the CancelledError plus some other "side exceptions", it would not be okay for timeout() to recognize the CancelledError and raise TimeoutError, dropping the side errors on the floor. The same would be true for other scenarios where CancelledError is being handled.

So I think that, alas, your idea is not going to work, and we're better off with my "conservative" approach which replaces the CancelledError with an exception group bundling the side errors (if there are any).

I'm approving this PR because it's better to land something than nothing in this case.

Thanks -- I will give it a little more time but not too much since RC2 is already looming.

@graingert
Contributor

graingert commented Aug 12, 2022

It may be an antipattern, but the taskgroup code I inherited from you also uses this, and I suspect it's a common pattern, because people don't expect CancelledError to be subclassed.

I believe this anti-pattern comes from people trying to emulate the except T as e: vs isinstance(e, T) differences exposed to implementers of __exit__ https://discuss.python.org/t/expose-pyerr-givenexceptionmatches-in-inspect/17420/4

@graingert
Contributor

graingert commented Aug 12, 2022

(Even if we were to decide that that is a bug in the exception group implementation, it would be too late to fix in 3.11.)

As long as the door is open for a fix in 3.12, 3.11.1? We can fix it in the backport; people are going to need to use the backport until Oct 2026 already, so hopefully they're happy to wait till October 2027

@graingert
Contributor

What Trio does is not necessarily what we should do, they have different constraints (its MultiError or whatever behaves differently than ExceptionGroup).

We're porting to EG already so I'm not sure what you mean by this

@gvanrossum
Member Author

It may be an antipattern, but the taskgroup code I inherited from you also uses this, and I suspect it's a common pattern, because people don't expect CancelledError to be subclassed.

I believe this anti-pattern comes from people trying to emulate the except T as e: vs isinstance(e, T) differences exposed to implementers of __exit__ https://discuss.python.org/t/expose-pyerr-givenexceptionmatches-in-inspect/17420/4

And what, in your opinion, is the difference between except T as e: vs isinstance(e, T)? There are differences related to virtual base classes but nothing that would lead people to write type(e) is T. The example by Steven D'Aprano is also irrelevant (it tries to do something clever about raise T vs. raise T(), but that is also irrelevant to this discussion).

@gvanrossum
Member Author

(Even if we were to decide that that is a bug in the exception group implementation, it would be too late to fix in 3.11.)

As long as the door is open for a fix in 3.12, 3.11.1?

That depends on whether it is backwards compatible. And whether we actually think it is a useful feature. I wouldn't get my hopes up.

We can fix it in the backport; people are going to need to use the backport until Oct 2026 already, so hopefully they're happy to wait till October 2027

And by "the backport" you mean the emulation on PyPI, right? That seems of limited relevance given that we're talking about the semantics of except* here (and in particular the bit of semantics that allows us to write except* FooError and catch a bare FooError, i.e. one that isn't wrapped in an ExceptionGroup).

My personal opinion: give up on this idea, it is more complicated than you think.

@gvanrossum
Member Author

What Trio does is not necessarily what we should do, they have different constraints (its MultiError or whatever behaves differently than ExceptionGroup).

We're porting to EG already so I'm not sure what you mean by this

IIRC this was when we were still deciding whether to throw away "side errors" or not. That has been decided (we don't), so I see no reason to further litigate this.

@graingert
Contributor

I'm talking about with exceptiongroup.catch(): where we have the same flexibility as except*

@graingert
Contributor

graingert commented Aug 12, 2022

The example by Steven D'Aprano is also irrelevant (it tries to do something clever about raise T vs. raise T(), but that is also irrelevant to this discussion).

🤦 I meant to link to the top of the discussion, but scrolling discourse changes the URL in the browser. It is indeed totally irrelevant

@gvanrossum
Member Author

gvanrossum commented Aug 12, 2022

@graingert Please focus. Do you agree with Yury that the current PR is an improvement over what’s in main? Do you have any specific feedback on the PR?

(Do any others still reading have feedback? @Tinche ?)

Contributor

@Tinche Tinche left a comment

This LGTM.

I was warming up to the cleverness of Yury's idea but if you think there be dragons I'm convinced.

@gvanrossum
Member Author

@pablogsal told me that he's open to backporting this to 3.11, so I'll go through the movements.

@gvanrossum gvanrossum added the needs backport to 3.11 label Aug 17, 2022
@miss-islington
Contributor

Thanks @gvanrossum for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@bedevere-bot bedevere-bot removed the needs backport to 3.11 label Aug 17, 2022
@bedevere-bot

GH-96048 is a backport of this pull request to the 3.11 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Aug 17, 2022
…pythonGH-95761)

When a task catches CancelledError and raises some other error,
the other error should not silently be suppressed.

Any scenario where a task crashes in cleanup upon cancellation
will now result in an ExceptionGroup wrapping the crash(es)
instead of propagating CancelledError and ignoring the side errors.

NOTE: This represents a change in behavior (hence the need to
change several tests).  But it is only an edge case.

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
(cherry picked from commit f51f54f)

Co-authored-by: Guido van Rossum <guido@python.org>
miss-islington added a commit that referenced this pull request Aug 17, 2022
…5761)

When a task catches CancelledError and raises some other error,
the other error should not silently be suppressed.

Any scenario where a task crashes in cleanup upon cancellation
will now result in an ExceptionGroup wrapping the crash(es)
instead of propagating CancelledError and ignoring the side errors.

NOTE: This represents a change in behavior (hence the need to
change several tests).  But it is only an edge case.

Co-authored-by: Thomas Grainger <tagrain@gmail.com>
(cherry picked from commit f51f54f)

Co-authored-by: Guido van Rossum <guido@python.org>