Skip to content

bpo-34210: Small improvements in heapq (refactoring)#8439

Closed
Amper wants to merge 1 commit intopython:masterfrom
Amper:small_heapq_improvements
Closed

bpo-34210: Small improvements in heapq (refactoring)#8439
Amper wants to merge 1 commit intopython:masterfrom
Amper:small_heapq_improvements

Conversation

@Amper
Copy link
Copy Markdown
Contributor

@Amper Amper commented Jul 24, 2018

@Amper Amper requested a review from rhettinger as a code owner July 24, 2018 15:08
@Amper Amper changed the title bpo-34210: Small improvements in heapq (refatoring) bpo-34210: Small improvements in heapq (refactoring) Jul 24, 2018
@rhettinger rhettinger closed this Jul 24, 2018
@Amper
Copy link
Copy Markdown
Contributor Author

Amper commented Jul 24, 2018

@rhettinger I'm sorry, maybe my mistake is that I sent too many things in one commit.
And I agree, paragraph 2 degrades readability in favor of speed. And code formatting was superfluous. I'll take these mistakes into account in the future.

But still the main change that I wanted to focus on in paragraph 1. It's about code:

if n >= size:
    return sorted(iterable, key=key)[:n]

In my opinion, using a slice here is unnecessary (because the number of elements is always known to be no greater than n) and it makes the code slower.

According to my measurements this branch in the code faster by 12-15% if you remove the slice using:

def nlargest(n, iterable, key=None):
    ...
    try:
        size = len(iterable)
    except (TypeError, AttributeError):
        pass
    else:
        if n >= size:
            return sorted(iterable, key=key, reverse=True)[:n]
    ...

def nlargest_new(n, iterable, key=None):
    ...
    try:
        size = len(iterable)
    except (TypeError, AttributeError):
        pass
    else:
        if n >= size:
            return sorted(iterable, key=key, reverse=True)
    ...

if __name__ == "__main__":

    ...

    from timeit import timeit
    from random import randrange

    data = [randrange(10000) for i in range(10000)]

    test = """
for i in (10000, 5000, 1000, 500, 100, 10, 5, 1):
    data = data[:i]
    for n in (i, i + 1, i + 10, i + 100, i + 1000, i + 5000, i + 10000):
        result = list(nlargest(n, data))
"""

    print('before:', timeit(test, setup="from __main__ import data, nlargest", number=100000))
    print('after:', timeit(test, setup="from __main__ import data, nlargest_new as nlargest", number=100000))

Here are the results of one of the launches:

before: 5.075591027
after: 4.450232958999999

I apologize for your time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants