Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nlargest() and nsmallest() to heapq. #40369

Closed
rhettinger opened this issue Jun 9, 2004 · 5 comments
Closed

Add nlargest() and nsmallest() to heapq. #40369

rhettinger opened this issue Jun 9, 2004 · 5 comments
Assignees
Labels
stdlib Python modules in the Lib dir

Comments

@rhettinger
Copy link
Contributor

BPO 969791
Nosy @tim-one, @rhettinger
Files
  • heap2.diff: Patch for nsmallest() and nlargest()
  • heapperf.py: Comparison counting suite
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/tim-one'
    closed_at = <Date 2004-06-10.04:53:26.000>
    created_at = <Date 2004-06-09.17:59:15.000>
    labels = ['library']
    title = 'Add nlargest() and nsmallest() to heapq.'
    updated_at = <Date 2004-06-10.04:53:26.000>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2004-06-10.04:53:26.000>
    actor = 'rhettinger'
    assignee = 'tim.peters'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2004-06-09.17:59:15.000>
    creator = 'rhettinger'
    dependencies = []
    files = ['6020', '6021']
    hgrepos = []
    issue_num = 969791
    keywords = ['patch']
    message_count = 5.0
    messages = ['46148', '46149', '46150', '46151', '46152']
    nosy_count = 2.0
    nosy_names = ['tim.peters', 'rhettinger']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue969791'
    versions = ['Python 2.4']

    @rhettinger
    Copy link
    Contributor Author

    This patch adds a function that encapsulates a principal
    use case for heaps, finding the n largest from a dataset
    without doing a full sort.

    @rhettinger rhettinger added the stdlib Python modules in the Lib dir label Jun 9, 2004
    @rhettinger rhettinger added the stdlib Python modules in the Lib dir label Jun 9, 2004
    @rhettinger
    Copy link
    Contributor Author

    Logged In: YES
    user_id=80475

    Revised the patch to also include nsmallest().

    Note, the API intentionally does not provide default values of
    n. This is to encourage use of min() and sorted() for corner
    cases.

    @tim-one
    Copy link
    Member

    tim-one commented Jun 9, 2004

    Logged In: YES
    user_id=31435

    +1 on nlargest().

    -0 on nsmalles(), because it has radically different time and
    space requirements than nlargest() for the typical case (n is
    much less than the total # of elements). If the user can
    afford all that space, it will probably be faster for the user to
    sort a materialized list then peel off the first n. That's a
    simple one-liner. If so, that makes nsmallest an "attractive
    nuisance". If heapq had a more abstract interface, so that
    we had both min-heaps and max-heaps, this objection would
    of course go away. But we don't, so ...

    @rhettinger
    Copy link
    Contributor Author

    Logged In: YES
    user_id=80475

    For many values of n, nsmallest() still makes substantially
    fewer comparisons than sort(). Attaching comparison count
    test suite.

    Here are the comparison counts for a list with a 1000 random
    entries and various values of n:

    8649 sort

    999 1 nlargest
    1667 1 nsmallest

    1088 5 nlargest
    1709 5 nsmallest

    1238 10 nlargest
    1759 10 nsmallest

    1487 20 nlargest
    1860 20 nsmallest

    2052 40 nlargest
    2066 40 nsmallest

    2953 80 nlargest
    2466 80 nsmallest

    4516 160 nlargest
    3262 160 nsmallest

    6835 320 nlargest
    4811 320 nsmallest

    9177 640 nlargest
    7736 640 nsmallest

    9742 1000 nlargest
    10315 1000 nsmallest

    P.S. Using itertools, the nsmallest() can be made to run at
    C speed:

    def nsmallest(iterable, n):
        """Find the n smallest elements in a dataset.
    Equivalent to:  sorted(iterable)[:n]
    """
    h = list(iterable)
    heapify(h)
    return map(heappop, repeat(h, min(n, len(h))))
    

    @rhettinger
    Copy link
    Contributor Author

    Logged In: YES
    user_id=80475

    Taking this one back off Tim's busy plate.

    If I deluded myself with the attached tests, he can give me a
    hard time later ;-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants