asyncio as_completed() thrashes adding and removing callbacks #64765
Comments
In asyncio, tasks.py as_completed() appears to trigger adding and removing callbacks multiple times for the pending set of futures, each time a single future completes. For example, to wait for 5 futures which complete at different times:
The worst case is when as_completed() is called to wait for all of a larger number of futures that complete at different times. For example, with 100 futures in the worst case, ~10,000 callback adds and removes would be performed. (I am very new to the asyncio code, so I don't have a patch to offer at this point.)
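The churn described above can be reproduced with a small counting sketch. It does not call as_completed() itself; it assumes the pattern under discussion is "wait on the whole pending set again after every single completion" and models that with repeated asyncio.wait() calls. `CountingFuture` and `count_callback_churn` are hypothetical names introduced only for this illustration.

```python
import asyncio


class CountingFuture(asyncio.Future):
    """Future that tallies callback registrations (illustrative helper)."""

    adds = 0
    removes = 0

    def add_done_callback(self, fn, *, context=None):
        CountingFuture.adds += 1
        return super().add_done_callback(fn, context=context)

    def remove_done_callback(self, fn):
        CountingFuture.removes += 1
        return super().remove_done_callback(fn)


async def count_callback_churn(n):
    """Re-wait on the whole pending set after every single completion."""
    loop = asyncio.get_running_loop()
    CountingFuture.adds = CountingFuture.removes = 0
    pending = {CountingFuture(loop=loop) for _ in range(n)}
    while pending:
        # Complete exactly one future while the wait below is in progress.
        loop.call_soon(next(iter(pending)).set_result, None)
        _done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
    return CountingFuture.adds, CountingFuture.removes


if __name__ == "__main__":
    for n in (5, 100):
        print(n, asyncio.run(count_callback_churn(n)))
```

Each round registers and later removes one callback per still-pending future, so n futures cost n(n+1)/2 adds and the same number of removes: 15 of each for 5 futures, and 5,050 of each (about 10,000 operations in total) for 100.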
Yup, I remember feeling a bit guilty doing it this way, but at least the semantics are correct, and there were bigger fish to fry. Thanks for the test code that proves the issue! I assume it can be fixed by not using _wait() but some custom approach. If we get this done by RC2, fine; otherwise we'll just have to document that this is O(N**2) and not to use it for large numbers of tasks until the fix lands, perhaps in 3.4.1. (Usually there's a solution that avoids as_completed() altogether.) I've created upstream bug http://code.google.com/p/tulip/issues/detail?id=127 to track this.
BTW, just curious: Glenn, what led you to discover this?
It was found by code inspection. I recently started looking at concurrent.futures, really just curious as to how futures were implemented. Because one of the concurrent.futures bugs I raised also applied to asyncio, I started poking into the source for it as well.
I'm looking into a solution for this. The idea is pretty straightforward: http://codereview.appspot.com/61210043. This needs more code to support the optional timeout feature, and it now returns Futures instead of coroutines (which I think is fine). But to my surprise, test_as_completed_reverse_wait() failed. After nearly an hour of debugging my own code I realized that this test specifically verifies the following weird behavior: if you get two values (futures/coroutines) out of as_completed() without waiting for either, and then wait for the *second* one, it will wait for the *first* result. I guess this is defensible because it is the first one you wait for, but I find it hard to believe that this is desirable behavior -- even though I wrote the code and the test! (http://code.google.com/p/tulip/source/detail?r=674355412f33.) So I'd like permission to just change these semantics. They aren't clear from the docs or from PEP-3156, and concurrent.futures.as_completed() doesn't have the same issue (there, __next__() on the iterator blocks until the result is ready).
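The "reverse wait" behavior described above can be illustrated with a short sketch. It assumes that plain (non-async) iteration over asyncio.as_completed() hands out awaitables that each resolve to the next finished result, whichever one is awaited first; the names `slow`, `fast`, and `reverse_wait_demo` are made up for the example.

```python
import asyncio


async def reverse_wait_demo():
    loop = asyncio.get_running_loop()
    slow, fast = loop.create_future(), loop.create_future()
    # call_soon runs callbacks in FIFO order, so `fast` finishes before `slow`.
    loop.call_soon(fast.set_result, "fast")
    loop.call_soon(slow.set_result, "slow")

    it = iter(asyncio.as_completed([slow, fast]))
    first, second = next(it), next(it)  # take both without awaiting either
    got_from_second = await second      # awaited first: receives the first result
    got_from_first = await first        # awaited last: receives the later result
    return got_from_second, got_from_first


if __name__ == "__main__":
    print(asyncio.run(reverse_wait_demo()))
```

The position of an awaitable in the iteration order does not decide which result it yields; the order in which the awaitables are awaited does.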
Everyone interested, I plan to push the latest version on view in Rietveld tomorrow: http://codereview.appspot.com/61210043. It's not as drastic a rewrite as my original attempt; Glenn's idea of using a Queue worked out great!
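A minimal sketch of the Queue idea mentioned above, not the code under review: it assumes each future gets exactly one done-callback that pushes it onto an asyncio.Queue, and that each yielded awaitable pops the next finished future. Timeout handling is omitted, and `as_completed_sketch` and `demo` are hypothetical names.

```python
import asyncio


def as_completed_sketch(futures):
    """Yield awaitables that resolve in completion order (no timeout support)."""
    done = asyncio.Queue()
    todo = set(futures)
    for f in todo:
        # One callback per future, registered once; it receives the future itself.
        f.add_done_callback(done.put_nowait)

    async def _wait_for_one():
        f = await done.get()
        return f.result()  # re-raises if the future failed

    for _ in range(len(todo)):
        yield _wait_for_one()


async def demo():
    loop = asyncio.get_running_loop()
    futs = {name: loop.create_future() for name in "abc"}
    # Complete the futures in the order c, a, b (call_soon is FIFO).
    for name in "cab":
        loop.call_soon(futs[name].set_result, name)
    return [await nxt for nxt in as_completed_sketch(futs.values())]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this shape the number of callback registrations grows linearly with the number of futures, since nothing is re-registered when a future completes.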
Note: The new version does *not* change the semantics as mentioned in msg210709. Nobody should depend on those semantics anyway.
New changeset 6e04027ed53e by Guido van Rossum in branch 'default':
New changeset b52113fb58a5 by Guido van Rossum in branch '3.4':