
Support async cancellations. #642

Closed
Zac-HD opened this issue Dec 22, 2022 · 8 comments · Fixed by #726
Labels
bug Something isn't working

Comments

@Zac-HD

Zac-HD commented Dec 22, 2022

Over at @anthropics we're keen users of httpx on Trio, and I've noticed that we sometimes have issues where cancelling a task doesn't get all the teardown logic right. For example, if you try to await (or async for/with) in an except BaseException: or finally: block, Trio will immediately re-raise a Cancelled exception instead of running your teardown logic.

This can be easy to miss, since it happens as you're abandoning the task anyway, but it can cause a variety of subtle or not-so-subtle problems; those problems motivated us to build flake8-trio, which includes the TRIO102 check for exactly this case. I'm pretty sure that httpcore has multiple such issues, e.g.

except BaseException as exc:
    await self.response_closed(status)
    raise exc

doesn't look cancel-safe to me. Do you consider "this doesn't close the connection properly when cancelled" a bug? If so, please consider this a fairly general bug-report!

(I've also found it infeasible to consistently get this right without tool support. If you want to try flake8-trio many of the checks are framework-agnostic; and if it'd be useful for httpcore and httpx we'd be happy to add anyio support to the linter 🙂)

@tomchristie
Member

Hi Zac,

Good to hear from you.

Do you consider "this doesn't close the connection properly when cancelled" a bug?

It sounds like one, yup.

I'd expect there to be plenty of subtleties around this...

How can we start narrowing this down? What broken state can we end up in? Are we able to replicate this in a test case at all?

@Zac-HD
Author

Zac-HD commented Jan 24, 2023

(sorry, this dropped into the holiday void; followups should be faster)

Pulling out https://github.com/encode/httpcore/blob/f0657cb43cb707d1672b93b61bb53b4cfb166820/httpcore/_async/connection_pool.py#LL226-L234C30 as a self-contained example:

            try:
                connection = await status.wait_for_connection(timeout=timeout)
            except BaseException as exc:
                # If we timeout here, or if the task is cancelled, then make
                # sure to remove the request from the queue before bubbling
                # up the exception.
                async with self._pool_lock:
                    self._requests.remove(status)
                    raise exc

Suppose that we're running this task inside a Trio nursery or anyio task group, which gets cancelled while we await status.wait_for_connection(). Then we'll be in a cancelled state during the except: block, which means that the attempt to acquire the lock will fail (by raising Cancelled), because in this state Trio only allows a checkpoint (i.e. an await, async for, or the checkpoint on enter or exit of an async with) if you've opened a shielded cancel scope.

The impact in this case is that we leak status into self._requests, which probably isn't a huge problem. I'm confident that you could reproduce this with a careful test that started waiting for a connection, slept part of the timeout, and then cancelled the nursery/taskgroup. The self.response_closed(status)-not-running case in my top post is the same principle, just for a larger chunk of code.

This is easily solved by wrapping with anyio.CancelScope(shield=True): around the async with self._pool_lock: block, and though I expect more serious problems are latent if we go looking, they should also be pretty easy to fix. The real problem is consistently remembering all these fiddly invariants, which is why I've been working on flake8-trio to flag them for us.
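For concreteness, a minimal sketch of that shielded-scope fix applied to the pool snippet above (assuming anyio is imported in this module; this is a sketch of the suggestion, not the patch that eventually landed):

try:
    connection = await status.wait_for_connection(timeout=timeout)
except BaseException as exc:
    # Shield the cleanup so the already-delivered cancellation cannot
    # interrupt removing the request from the queue.
    with anyio.CancelScope(shield=True):
        async with self._pool_lock:
            self._requests.remove(status)
    raise exc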

So my suggested solution is to add flake8-trio to your CI system, and then we'll get to work on python-trio/flake8-async#61 so that the currently-trio-specific checks also handle anyio - and then it should 'just' be a matter of fixing lint warnings as they come up!

@PJ-Schulz

Hello,
I have found a similar problem with trio.Cancelled and catching BaseException. I want to make requests with httpx.AsyncClient and cancel them with a cancel scope after some time, for example:

with trio.move_on_after(5):
    response = await client.get(url)

This snippet is called regularly with the same client. After some time a PoolTimeout exception is thrown: the client can't get a connection from the connection pool because none of the 100 default connections are available.

What happened? If trio.Cancelled is raised while the request is being sent and we are waiting for the response, the connection ends up in an unusable state. At the point below, the trio.Cancelled exception is caught by the except BaseException clause, but the asynchronous code in the block is then never executed. The connection is not cleaned up, and after a while every connection in the pool is occupied.

except BaseException as exc:
    async with Trace("http11.response_closed", request) as trace:
        await self._response_closed()
    raise exc

A PoolTimeout then occurs when a new request is made, since no more connections are available.

This behavior can be shown in this example:

import trio

async def bar():
    await trio.sleep(0.5)
    print("World")

async def foo():
    try:
        print("Hello")
        await trio.sleep(2)
    except BaseException as ex:
        await bar()

async def main():
    with trio.move_on_after(1):
        await foo()
        
trio.run(main)
# >>> Hello

The example prints 'Hello' but not 'World'.
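For contrast, here is a sketch of the same example with the cleanup wrapped in a shielded cancel scope (an illustration of the shielding idea discussed above, not code taken from httpx or httpcore); this variant does print 'World':

import trio

async def bar():
    await trio.sleep(0.5)
    print("World")

async def foo():
    try:
        print("Hello")
        await trio.sleep(2)
    except BaseException:
        # The shield lets this cleanup checkpoint run even though the
        # surrounding scope has already been cancelled.
        with trio.CancelScope(shield=True):
            await bar()
        raise

async def main():
    with trio.move_on_after(1):
        await foo()

trio.run(main)
# >>> Hello
# >>> World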

@Zac-HD
Author

Zac-HD commented Mar 7, 2023

See also trio.abc.AsyncResource.aclose() and trio.aclose_forcefully() (anyio has the same functions, but less commentary in the docs):

IMPORTANT: This method may block in order to perform a “graceful” shutdown. But, if this fails, then it still must close any underlying resources before returning. An error from this method indicates a failure to achieve grace, not a failure to close the connection.

So self._response_closed() could be considered buggy, because if the task has been cancelled then it will raise when entering async with self._state_lock: and never call the aclose() method at all. I think this can be fixed by wrapping it in a shielded cancelscope, and maybe that could even go inside the method?

@PJ-Schulz

The problem with the shielded cancelscope is that when aclose() errors or hangs, the whole app gets stuck at that position.

Furthermore, if aclose() then needs, say, 2 seconds for whatever reason, the timeout I set to 5 seconds in my example effectively becomes 7 seconds.

So you need a timeout and the shielded cancelscope:

with trio.move_on_after(CLEANUP_TIMEOUT) as cleanup_scope:
    cleanup_scope.shield = True
    await aclose()

The question here is which value is sensible to choose for CLEANUP_TIMEOUT.
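Whatever value is chosen, the combined pattern can be exercised on its own; a rough self-contained sketch (with hypothetical names, not httpcore code) showing that a hung aclose() is bounded by CLEANUP_TIMEOUT instead of stalling the whole app:

import math
import trio

CLEANUP_TIMEOUT = 2  # an arbitrary value, just for illustration

async def hung_aclose():
    await trio.sleep(math.inf)  # simulate a cleanup that never completes

async def request():
    try:
        await trio.sleep(10)  # the "real" work; cancelled after 1 second
    except BaseException:
        # Shield the cleanup from the outer cancellation, but give it its
        # own deadline so a hung aclose() cannot block forever.
        with trio.move_on_after(CLEANUP_TIMEOUT) as cleanup_scope:
            cleanup_scope.shield = True
            await hung_aclose()
        raise

async def main():
    with trio.move_on_after(1):
        await request()
    print("done")  # reached after roughly 1 + CLEANUP_TIMEOUT seconds

trio.run(main)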

@Zac-HD
Author

Zac-HD commented Mar 8, 2023

In this case I'd probably go for fail_after(5) - pick any small number that seems to work - but I was wrong to suggest a shielded cancel scope above. The linter warning assumes that you always use async resources as context managers, rather than explicitly .aclose()ing them 😅

The proper solution is to ensure that if await aclose() is cancelled, it still closes the resource before control flow returns. In this case, that means returning the connection to the pool.
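A minimal sketch of that principle, using a stand-in resource rather than httpcore's actual connection class: the graceful part may be cancelled, but a synchronous finally block still releases the underlying resource before the cancellation propagates.

import trio

class Resource:
    closed = False

    async def aclose(self):
        try:
            await trio.sleep(1)  # "graceful" shutdown; may be cancelled
        finally:
            self.closed = True   # synchronous, so it runs even when cancelled

async def main():
    resource = Resource()
    with trio.move_on_after(0.1):
        await resource.aclose()
    print(resource.closed)  # True: closed despite the cancellation

trio.run(main)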

@stereobutter

I have not dug too deep into this, but maybe it is possible to split the connection-pool logic (i.e. whether or not to return the connection to the pool) from the async machinery? Also, when a BaseException like trio.Cancelled is encountered in

except BaseException as exc:
    async with Trace("http11.response_closed", request) as trace:
        await self._response_closed()
    raise exc

I believe the checks in

async def _response_closed(self) -> None:
    async with self._state_lock:
        if (
            self._h11_state.our_state is h11.DONE
            and self._h11_state.their_state is h11.DONE
        ):
            self._state = HTTPConnectionState.IDLE
            self._h11_state.start_next_cycle()
            if self._keepalive_expiry is not None:
                now = time.monotonic()
                self._expire_at = now + self._keepalive_expiry
        else:
            await self.aclose()

are not needed anyway, and the connection should always be closed?
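One hedged reading of that suggestion, sketched against the snippet above (assuming anyio is available in the module; this is not the fix that was actually merged): in the exception path, skip the keep-alive bookkeeping entirely and close unconditionally, under a shield so the close itself cannot be interrupted.

except BaseException as exc:
    # Always close on the error path; the h11 state checks only matter
    # when deciding whether the connection can be reused.
    with anyio.CancelScope(shield=True):
        await self.aclose()
    raise exc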

tomchristie added the bug (Something isn't working) label on May 19, 2023
tomchristie changed the title from "Teardown errors when tasks are cancelled or time out" to "Support async cancellations." on Jun 14, 2023
Zaczero added a commit to Zaczero/osm-relatify referencing this issue on Jun 20, 2023
@tomchristie
Member

Closed by #726
