Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

heapq should provide iterators: itersmallest, iterlargest #41663

Closed
scoder opened this issue Mar 7, 2005 · 4 comments
Closed

heapq should provide iterators: itersmallest, iterlargest #41663

scoder opened this issue Mar 7, 2005 · 4 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@scoder
Copy link
Contributor

scoder commented Mar 7, 2005

BPO 1158313
Nosy @rhettinger, @scoder

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2005-03-08.16:15:34.000>
created_at = <Date 2005-03-07.14:15:01.000>
labels = ['type-feature', 'library']
title = 'heapq should provide iterators: itersmallest, iterlargest'
updated_at = <Date 2005-03-08.16:15:34.000>
user = 'https://github.com/scoder'

bugs.python.org fields:

activity = <Date 2005-03-08.16:15:34.000>
actor = 'rhettinger'
assignee = 'none'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2005-03-07.14:15:01.000>
creator = 'scoder'
dependencies = []
files = []
hgrepos = []
issue_num = 1158313
keywords = []
message_count = 4.0
messages = ['54410', '54411', '54412', '54413']
nosy_count = 2.0
nosy_names = ['rhettinger', 'scoder']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1158313'
versions = []

@scoder
Copy link
Contributor Author

scoder commented Mar 7, 2005

The current C-based heapq implementation provides
implementations of min-heaps and max-heaps for the
"nsmallest" and "nlargest" functions. I would like to
see them used to provide iterators over the
smallest/largest items of a heap: "itersmallest(heap)"
and "iterlargest(heap)".

Why:

The functions nsmallest() and nlargest() are efficient
(and sufficient) if n is known. However, if n is *not*
known in advance, it would still be more efficient than
sorting to run heapify() over a list and then have an
iterator run heappop on each iteration. While this is
easily implemented for min-heaps using heappop,
max-heaps are not part of the Python-Implementation.
Currently, they are only implemented in C.

Possible counter-arguments:

The straight-forward iterator implementation will
manipulate the heap. This could be seen as a
side-effect. It should, however, be enough to document
that. Similar problems are documented for mutating
sequences during iteration.

@scoder scoder closed this as completed Mar 7, 2005
@scoder scoder added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Mar 7, 2005
@scoder scoder closed this as completed Mar 7, 2005
@scoder scoder added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Mar 7, 2005
@rhettinger
Copy link
Contributor

Logged In: YES
user_id=80475

The idea for introducing heapiter() function was
unsuccessfully proposed on python-dev:

http://mail.python.org/pipermail/python-dev/2004-June/045348.html

The use cases were rare and the pure python solution was
both easy and fast.

If you need C coded max-heaps, that could be a separate
feature request. There would need to be sufficient use
cases to warrant building out the module's API.

@scoder
Copy link
Contributor Author

scoder commented Mar 8, 2005

Logged In: YES
user_id=313935

"easy and fast" was the python solution for min-heaps, you
mean. Their API is sufficient for a few-line iterator
implementation. The possible solutions for max-heaps are
comparatively ugly and generally less efficient, the best
ways being either a custom per-case solution
(decorate-useheap-undecorate) or a generic indirection
through an item wrapper class that mirrors the __le__
operator with the additional overhead of python's method
call infrastructure.

I don't think max-heaps are less common than min-heaps in
any way. It just doesn't show that well since custom
solutions are easy to write most of the time (each and every
time).

The major issue with the current heapq implementation is the
design choice of not making it a class. I agree that it is a
major advantage to have it operate on generic lists, but it
come at the price of preventing easy integration of keyword
arguments as in sort() and sorted(). A usage like

h = heap(myIterator, reverse=true)
for item in h: print item

would be so handy...

For the use-cases: As I said before, nsmallest/nlargest are
enough in many cases, but they actually handle a very
special case where n is known. On the other hand, iterators
have become a major first-level construct for Python and I
consider iteration over the ordered elements of a heap a
very comon use case. The step towards an augmented API that
provides efficient iterators both for min-heaps and
max-heaps would both underline Python's paradigm shift and
considerably simplify code that currently suffers from
custom solutions.

And again: most of the code is already there.

Another possible solution: what about a module function
heapified(seq_or_iter, key=..., cmp=..., reverse=...)
in the style of sorted() but returning an iterator?
Advantage over sorted() being the higher efficiency if the
iterator is thrown away after a small number of calls.

Just an idea...

@rhettinger
Copy link
Contributor

Logged In: YES
user_id=80475

I am -1 on this feature request. The design of the module
(as functions on lists rather than as a class with
encapsulation) does not mesh well these proposals.

Building out the API for maxheap functions makes the module
harder to learn and it introduces a risk of a user
mismatching functions (i.e. heappop with maxheapify).
Maxheap functionality can already be obtained by decorating
with class which redefines __cmp__.

The case for an iterator wrapper is not strong. It was not
well received when proposed on python-dev. The risks of
mutating the heap during iteration are greater than the
equivalent situation for lists because there is no simple
way to verify that the heap condition has remained intact
between iteration steps. If truly needs, a iterator around
the existing minheap is trivial to write,

Most user needs are met by sort(), nlargest(), nsmallest(),
the existing minheap functions, or the existing minheap
functions applied to decorated data.

IMO, introducing iterators and maxheaps do not add enough
value to warrant complicating the module and introducing new
risks of misuse (altering the heapcondition during iteration
or mismatching minheap and maxheap functions).

The story would be somewhat different if the heaps had
originally been encapsulated in a class. If they had,
iterators could be added and given protection from mutation.
Also, if there were a heap class, the API would easily
support key= and reversed= options. But in the absence of a
heap class, it would be a mistake to force these buildouts.

The OP's corner use case presented on comp.lang.python was
met with solutions using the existing toolset. The corner
case was wanting a maxheap to read all data into memory,
heapify it, and extract n largest elements AND n is not
known in advance AND n is small enough that sort(data) was
deemed to have too many comparisons. Not knowing n in
advance arose because the data contained duplicate records
which the OP was unwilling to filter-out in advance with a
single pass O(n) step using a dictionary to condense
equivalence groups to a single element.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants