In some cases asyncio.wait_for can lead to socket leak. #81839
Comments
In some cases asyncio.wait_for can lead to a socket leak.

Condensed example:

```python
async def _create_connection(ssl_obj, timeout=60):
    loop = asyncio.get_event_loop()
    connector = loop.create_connection(MyEchoClientProtocol, '127.0.0.1', 5000, ssl=ssl_obj)
    connector = asyncio.ensure_future(connector)
    tr, pr = await asyncio.wait_for(connector, timeout=timeout, loop=loop)
    return tr, pr

async def main():
    ...
    res = await asyncio.wait_for(_acquire_impl(), timeout=timeout, loop=loop)
```

If my understanding is correct …
I provided source code for the client and server so the problem can be easily reproduced on your system. The certificate and key can be easily generated with …

I found out that if I catch the cancellation:

```python
try:
    tr, pr = await asyncio.wait_for(connector, timeout=timeout, loop=loop)
    return tr, pr
except asyncio.CancelledError as e:
    connector.add_done_callback(_done_callback)
    raise e
```

then inside of …

I run …

The output depends on your hardware, so you might need to tweak the timeout parameter.

server example: https://gist.github.com/hexrain/bc92aa70eebc229365f0ce4bcccf7fc4
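The body of `_done_callback` is not preserved in this thread. A minimal sketch, assuming the intent is to close the transport produced by the already-completed connection future, might look like this (only the callback name comes from the snippet above; the checks and the body are assumptions):

```python
def _done_callback(fut):
    # Hypothetical cleanup: if the connection future finished despite the
    # surrounding wait_for being cancelled, close the transport it produced
    # so the underlying socket is not leaked.
    if not fut.cancelled() and fut.exception() is None:
        transport, protocol = fut.result()
        transport.close()
```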
Issue bpo-42130 that was recently filed appears related to this change.
The current solution doesn't look correct. It swallows the cancellation, causing the task to hang: https://bugs.python.org/issue42130. The proposed test case calls cancel() for the inner future and set_result() for the outer task in the same loop step. The old behaviour (prior to this patch) looks correct to me in this case. We have to elaborate on the original problem to find the source of the actual race. Therefore I've tried to reproduce the problem with the original code sample and published the adapted version of it here: https://github.com/ods/bpo-37658. Nikita Ilyasov, please correct me if I've misunderstood what you meant. My result is that the proposed solution doesn't actually solve the problem, but rather lowers the chance of hitting it. Namely, I get the traceback quoted in the README with 3.8.5, 3.9.4, and the current code in main.
Some discussion leading up to that change is here: MagicStack/asyncpg#548 and in the issues it links.
The original problem can be fixed by wrapping the await into a try/except block:
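The code block that originally accompanied this comment is not preserved in this thread. A minimal sketch of that kind of wrapping, reusing the names from the issue's example above (`MyEchoClientProtocol`, `_create_connection`) and assuming the cleanup logic, might look like:

```python
import asyncio

async def _create_connection(ssl_obj, timeout=60):
    loop = asyncio.get_event_loop()
    connector = asyncio.ensure_future(
        loop.create_connection(MyEchoClientProtocol, '127.0.0.1', 5000, ssl=ssl_obj)
    )
    try:
        return await asyncio.wait_for(connector, timeout=timeout)
    except asyncio.CancelledError:
        # If the inner future managed to finish while the outer wait was being
        # cancelled, close the transport it created before re-raising;
        # otherwise the socket stays open with no owner.
        if connector.done() and not connector.cancelled() and connector.exception() is None:
            transport, _protocol = connector.result()
            transport.close()
        raise
```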
I've updated my example to reproduce this: ods/bpo-37658@eca3d81. I believe it's a general problem with maintaining atomicity in async code, and not a bug in wait_for.
Wrapping every resource-allocating call like that is what we were trying to avoid, since it makes wait_for go from a simple one-line helper to something you have to be very careful with. Conceptually, a user should expect wait_for to behave exactly the same as awaiting the underlying awaitable, just with auto-cancellation.

The problem with the current wait_for is that there is a gap where the underlying task may have completed but a cancellation arrives. In this case, we need to raise the cancellation to be a good asyncio citizen, but the underlying task has no opportunity to act on the cancellation (to free the resource) since it is already complete and cannot be re-entered. So the resource returned by the completed task gets stuck in limbo, since we can't return it and we can't assume a generic 'close' behaviour.

See my comment in the PR for a suggestion about an alternative structure for wait_for, which may avoid this gap and hence prevent the leak (but I have not tested it!)
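To make that gap concrete, here is a heavily simplified, hypothetical rendering of the structure being discussed (not the actual CPython implementation of asyncio.wait_for): once the inner future has already completed, the CancelledError branch has no good option left.

```python
import asyncio

async def wait_for_sketch(awaitable, timeout):
    # Heavily simplified illustration only -- not asyncio.wait_for's real code.
    inner = asyncio.ensure_future(awaitable)
    try:
        done, _pending = await asyncio.wait({inner}, timeout=timeout)
    except asyncio.CancelledError:
        if not inner.done():
            # Normal case: cancelling the inner task gives it a chance to
            # clean up after itself.
            inner.cancel()
            raise
        # The gap: the inner task already finished, so inner.result() may hold
        # an open resource (e.g. a transport).  The cancellation still has to
        # be re-raised, and nothing here knows how to release that result --
        # this is where the leak happens.
        raise
    if inner in done:
        return inner.result()
    inner.cancel()
    raise asyncio.TimeoutError
```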
I did a little research on how the … The repository (on the linked branch) has tox set up and a test that asserts the behaviour of the builtin and alternate implementations. Quick summary: …
About the actually working alternate implementation I made:

```python
async def create_something():
    # this acquires some resource that needs explicit cleanup after some work
    return object()

def cleanup_something(inst):
    inst.close()

t = asyncio.ensure_future(create_something())
x = await asyncio.wait_for(t, timeout=10, cancel_handler=cleanup_something)
try:
    pass  # normal work with x
finally:
    cleanup_something(x)  # cleanup at normal execution
```

The inner task is still responsible for handling the resource before it returns, which means that if the inner task is cancelled, there must be no leak if the implementation is correct. If no cancellation happens, everything is "fine". Finally, the waiter task would be able to handle its cancellation and the simultaneous completion of the inner task if the caller code provides a callback. Unfortunately this requires the user of …
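The alternate implementation referred to here is not reproduced in the thread. A sketch of what a wait_for variant accepting such a `cancel_handler` parameter could look like (the parameter name comes from the usage example above; the body is an assumption that mirrors the simplified structure sketched earlier):

```python
import asyncio

async def wait_for_alt(awaitable, timeout, cancel_handler=None):
    # Sketch only: like asyncio.wait_for, but when the outer wait is cancelled
    # after the inner task has already produced a result, that result is handed
    # to `cancel_handler` so the caller can release it.
    inner = asyncio.ensure_future(awaitable)
    try:
        done, _pending = await asyncio.wait({inner}, timeout=timeout)
    except asyncio.CancelledError:
        if inner.done() and not inner.cancelled() and inner.exception() is None:
            if cancel_handler is not None:
                cancel_handler(inner.result())
        else:
            inner.cancel()
        raise
    if inner in done:
        return inner.result()
    inner.cancel()
    raise asyncio.TimeoutError
```

With this shape, the usage above would pass `cleanup_something` as the `cancel_handler`, so the resource is released even when the cancellation and the inner task's completion land on the same loop iteration.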
Re: msg393586
Unfortunately this didn't close the gap, so the leak persisted. I did some research on the source of the error. There are circular references between instances of _SelectorSocketTransport and _SSLProtocolTransport (see the attached image), so these objects are destroyed by the garbage collector, and the order can be arbitrary. When _SelectorSocketTransport is destroyed after _SSLProtocolTransport everything is OK, but with the opposite order we get the error printed. Here is a description of how to reproduce this: …
Can I get a review? It seems like a simple mistake given the original description of this issue:

The try/except blocks clearly add a third condition, where the inner task is completed and a TimeoutError is raised without returning the result.
@asvetlov Anything left to do here, or can this be closed?
There is still an issue here. I proposed a possible solution at #28149 (comment). You can also scroll up through the lengthy discussion to see how I reached that conclusion. I haven't had time to actually try implementing it yet.
Any questions about this issue? If not, I think it is better to close it.
Yes, see above.