Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncio as_completed() thrashes adding and removing callbacks #64765

Closed
glangford mannequin opened this issue Feb 8, 2014 · 9 comments
Closed

asyncio as_completed() thrashes adding and removing callbacks #64765

glangford mannequin opened this issue Feb 8, 2014 · 9 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@glangford
Copy link
Mannequin

glangford mannequin commented Feb 8, 2014

BPO 20566
Nosy @gvanrossum, @pitrou, @vstinner, @1st1
Files
  • test_thrash.py: Test code
  • output.txt: Test output (with instrumented code in _wait()
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-02-13.01:58:53.772>
    created_at = <Date 2014-02-08.20:18:14.254>
    labels = ['library', 'performance']
    title = 'asyncio as_completed() thrashes adding and removing callbacks'
    updated_at = <Date 2014-03-17.06:30:46.450>
    user = 'https://bugs.python.org/glangford'

    bugs.python.org fields:

    activity = <Date 2014-03-17.06:30:46.450>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-02-13.01:58:53.772>
    closer = 'gvanrossum'
    components = ['Library (Lib)']
    creation = <Date 2014-02-08.20:18:14.254>
    creator = 'glangford'
    dependencies = []
    files = ['34000', '34001']
    hgrepos = []
    issue_num = 20566
    keywords = []
    message_count = 9.0
    messages = ['210679', '210680', '210683', '210690', '210709', '211054', '211055', '211122', '213801']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'pitrou', 'vstinner', 'python-dev', 'yselivanov', 'glangford']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue20566'
    versions = ['Python 3.4', 'Python 3.5']

    @glangford
    Copy link
    Mannequin Author

    glangford mannequin commented Feb 8, 2014

    In asyncio, tasks.py as_completed() appears to trigger adding and removing callbacks multiple times for the pending set of futures, each time a single future completes.

    For example, to wait for 5 futures which complete at different times:

    • as_completed() uses _wait()
    • _wait() will add 5 callbacks, one for each future
    • when one future completes, a callback is triggered and all 5 callbacks are removed; this is because _wait() was called with FIRST_COMPLETED
    • for the 4 remaining futures - 4 callbacks have to be added back again
    • when the next future completes, the 4 callbacks are removed
      etc…

    The worst case is if as_completed() is called to wait for all of a larger number of futures, which complete at different times. For example, with 100 futures worst case, ~10,000 callback adds and removes would be performed.

    (I am very new to the asyncio code, so I don't have a patch to offer at this point).

    @glangford glangford mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Feb 8, 2014
    @gvanrossum
    Copy link
    Member

    Yup, I remember feeling a bit guilty doing it this way, but at least the semantics are correctly, and there were bigger fish to fry. Thanks for the test code that proves the issue!

    I assume it can be fixed by not using _wait() but some custom approach. If we get this done by RC2, fine, otherwise we'll just have to documented that this is O(N**2) and not to use it for large numbers of tasks until the fix lands, perhaps in 3.4.1. (Usually there's a solution that avoids as_completed() altogether.)

    I've created upstream bug http://code.google.com/p/tulip/issues/detail?id=127 to track this.

    @gvanrossum
    Copy link
    Member

    BTW, just curious: Glenn, what led you to discover this?

    @glangford
    Copy link
    Mannequin Author

    glangford mannequin commented Feb 8, 2014

    Glenn, what led you to discover this?

    It was found by code inspection. I recently started looking at concurrent.futures, really just curious as to how futures were implemented. Because one of the concurrent.futures bugs I raised also applied to asyncio, I started poking into the source for it as well.

    @gvanrossum
    Copy link
    Member

    I'm looking into a solution for this. The idea is pretty straightforward: http://codereview.appspot.com/61210043.

    This needs more code to support the optional timeout feature, and it now returns Futures instead of coroutines (which I think is fine).

    But to my surprise, test_as_completed_reverse_wait() failed. After nearly an hour of debugging my own code I realized that this test specifically verifies the following weird behavior: if you get two values (futures/coroutines) out of as_completed() without waiting for either, and then wait for the *second* one, it will wait for the *first* result. I guess this is defensible because it is the first one you wait for, but I find it hard to believe that this is desirable behavior -- even though I wrote the code and the test! (http://code.google.com/p/tulip/source/detail?r=674355412f33.)

    So I'd like permission to just change these semantics. They aren't clear from the docs or from PEP-3156, and concurrent.futures.as_completed() doesn't have the same issue (there, __next__() on the iterator blocks until the result is ready).

    @gvanrossum
    Copy link
    Member

    Everyone interested, I plan to push the latest version on view in Rietveld tomorrow: http://codereview.appspot.com/61210043

    It's not as drastic a rewrite as my original attempt; Glenn's idea of using a Queue worked out great!

    @gvanrossum
    Copy link
    Member

    Note: The new version does *not* change the semantics as mentioned in msg210709. Nobody should depend on those semantics anyway.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 13, 2014

    New changeset 6e04027ed53e by Guido van Rossum in branch 'default':
    asyncio: Change as_completed() to use a Queue, to avoid O(N**2) behavior. Fixes issue bpo-20566.
    http://hg.python.org/cpython/rev/6e04027ed53e

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 17, 2014

    New changeset b52113fb58a5 by Guido van Rossum in branch '3.4':
    asyncio: Change as_completed() to use a Queue, to avoid O(N**2) behavior. Fixes issue bpo-20566.
    http://hg.python.org/cpython/rev/b52113fb58a5

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant