Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a key parameter (like sorted) to heapq.merge #57951

Closed
ssapin mannequin opened this issue Jan 9, 2012 · 21 comments
Closed

Add a key parameter (like sorted) to heapq.merge #57951

ssapin mannequin opened this issue Jan 9, 2012 · 21 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ssapin
Copy link
Mannequin

ssapin mannequin commented Jan 9, 2012

BPO 13742
Nosy @rhettinger, @terryjreedy, @mdickinson, @giampaolo, @merwok, @ericsnowcurrently, @berkerpeksag, @serhiy-storchaka
Files
  • heapq_merge_key.patch: 'hg diff' output against rev ca2a35140e6a
  • benchmark_heapq_merge.py: benchmark several implementations
  • heapq_merge_key_duplicate.patch: 'hg diff' output against rev ca2a35140e6a
  • heap.diff: Rough draft (untested) for a Heap() class
  • heap2.diff: Update the draft Heap() class
  • keymerge.diff: Draft patch without doc updates
  • keymerge2.diff: Add both key=func and reverse=True options
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2014-05-30.09:29:28.655>
    created_at = <Date 2012-01-09.10:35:41.807>
    labels = ['type-feature', 'library']
    title = 'Add a key parameter (like sorted) to heapq.merge'
    updated_at = <Date 2015-01-10.08:47:47.975>
    user = 'https://bugs.python.org/ssapin'

    bugs.python.org fields:

    activity = <Date 2015-01-10.08:47:47.975>
    actor = 'terry.reedy'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2014-05-30.09:29:28.655>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2012-01-09.10:35:41.807>
    creator = 'ssapin'
    dependencies = []
    files = ['24184', '24185', '24248', '29578', '30063', '35374', '35399']
    hgrepos = []
    issue_num = 13742
    keywords = ['patch', 'needs review']
    message_count = 21.0
    messages = ['150927', '150928', '150931', '150954', '150969', '150983', '151369', '152802', '152984', '168070', '168116', '185259', '188066', '188067', '188080', '188085', '219380', '233792', '233793', '233795', '233804']
    nosy_count = 12.0
    nosy_names = ['rhettinger', 'terry.reedy', 'mark.dickinson', 'giampaolo.rodola', 'stutzbach', 'eric.araujo', 'python-dev', 'eric.snow', 'berker.peksag', 'ssapin', 'serhiy.storchaka', 'Tommy.Carstensen']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13742'
    versions = ['Python 3.5']

    @ssapin
    Copy link
    Mannequin Author

    ssapin mannequin commented Jan 9, 2012

    Hi,

    The attached patch adds a 'key' optional parameter to the heapq.merge function that behaves as in sorted().

    Related discussion: http://mail.python.org/pipermail/python-ideas/2012-January/013295.html

    This is my first contribution to CPython.

    @ssapin ssapin mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 9, 2012
    @ssapin
    Copy link
    Mannequin Author

    ssapin mannequin commented Jan 9, 2012

    The attached script benchmarks the basline (current implementation) against 3 new implementations, as suggested on http://mail.python.org/pipermail/python-ideas/2012-January/013296.html

    On my machine, the output is:

    merge_baseline
    per run, min of 3 = 7.527 ms
    
    merge_1
    per run, min of 3 = 9.894 ms
    131.449 % of baseline
    
    merge_2
    per run, min of 3 = 7.948 ms
    105.594 % of baseline
    
    merge_3
    per run, min of 3 = 7.581 ms
    100.716 % of baseline
    

    On this particular input, merge_2 adds 6% of overhead when the key parameter is not used. While merge_3 only adds 1% of overhead, it almost doubles the amount of code. (Which was admittedly not that long to begin with.)

    The patch in the previous message is with the merge_2 implementation, which seemed like the best compromise to me.

    @ssapin
    Copy link
    Mannequin Author

    ssapin mannequin commented Jan 9, 2012

    Oops, the patch to the documentation would also need 'New in 3.3: the key parameter', with the right Sphinx directive. But that depends on whether this change ends up in 3.3 or 3.4.

    Does 3.3 still get new features?

    @merwok
    Copy link
    Member

    merwok commented Jan 9, 2012

    Yes, 3.3 is still in the early development stage, and new features will be accepted until the first beta (in June, see PEP-398). “.. versionadded:: 3.3 The *key* parameter” will do.

    @rhettinger
    Copy link
    Contributor

    Simon, please keep the original version fast by creating two code paths:

       if key is None:
          original_code
       else:
          new_code using the key_function

    @ssapin
    Copy link
    Mannequin Author

    ssapin mannequin commented Jan 9, 2012

    Raymond, please have a look at merge_3 in benchmark_heapq_merge.py. It is implemented as you say.

    Do you think the speed is worth the code duplication?

    @ssapin
    Copy link
    Mannequin Author

    ssapin mannequin commented Jan 16, 2012

    heapq_merge_key_duplicate.patch is a new patch with two code path. It also updates the function’s docstring (which the previous patch did not). Raymond, do you think the speed is worth the DRY violation?

    @rhettinger
    Copy link
    Contributor

    I'll look at this in the next couple of weeks. Hang tight :-)

    @terryjreedy
    Copy link
    Member

    FWIW, Guido approves of the idea, msg152969 in bpo-4356

    @ssapin
    Copy link
    Mannequin Author

    ssapin mannequin commented Aug 13, 2012

    I just remembered about this. I suppose it is too late for 3.3?

    @merwok
    Copy link
    Member

    merwok commented Aug 13, 2012

    Yes, 3.3 is already in beta.

    @rhettinger
    Copy link
    Contributor

    Attaching a rough draft implementation for a fully encapsulated Heap() class that is thread-safe, supports minheaps and maxheaps, and efficiently implements key-functions (called no more than once per key).

    @mdickinson
    Copy link
    Member

    heap2.diff contains only a single line's change. Wrong file attached?

    @mdickinson
    Copy link
    Member

    Ah, I see the new file now (I'd failed to refresh my browser); sorry for the noise.

    @mdickinson
    Copy link
    Member

    Looks pretty good to me.

    • There's a bonus print call in the diff.

    • Should the "len(self._data)" call be protected by the lock? I can't immediately think of any reason why that would be necessary (e.g., pushpop nd poppush never change the size of self._data, so there's no risk of getting a bogus length there), but the lack of the lock makes me nervous.

    • Support for iter() seems a bit out of place to me. What are the use-cases for this? Would it make sense to leave this out (for now)?

    @serhiy-storchaka
    Copy link
    Member

    There is already one heap class in the stdlib: queue.PriorityQueue. Why create a duplicate instead extend queue.PriorityQueue with desired features?

    May be name the maxheap parameter as reverse?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 30, 2014

    New changeset f5521f5dec4a by Raymond Hettinger in branch 'default':
    Issue bpo-13742: Add key and reverse parameters to heapq.merge()
    http://hg.python.org/cpython/rev/f5521f5dec4a

    @TommyCarstensen
    Copy link
    Mannequin

    TommyCarstensen mannequin commented Jan 10, 2015

    I noticed 3.5 alpha1 is not released until February 1st. Is there any way I can get my hands on this new functionality?

    @berkerpeksag
    Copy link
    Member

    Hi Tommy, the patch is already committed to Python 3.5. See https://docs.python.org/3.5/library/heapq.html#heapq.merge

    @TommyCarstensen
    Copy link
    Mannequin

    TommyCarstensen mannequin commented Jan 10, 2015

    Yes, but 3.5 has not been pre-released yet.

    @terryjreedy
    Copy link
    Member

    You can set up mecurial on your machine, make a read-only clone of the cpython repository, and compile it just as do other people, whether core-developers or otherwise. See docs.python.org/devguide for details.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants