Description
Feature or enhancement
Proposal:
Introduction
The discussion has not suffered any terminal blows since its inception and has been silent for a while now, so I am taking the opportunity to make one more attempt to propose adding an `ilen` function.
Given the current state and coverage of the `itertools` module, I think `ilen` can be a useful and timely addition.
Past attempts
#53756 (comment). There is a clear indication that this is not the first time this has been proposed. Although repeated rejection does not shine a positive light on this proposal, it can also indicate a recurring attempt to satisfy a need.
Addressing criticism
Statement: "The core issue is that it is not a very useful operation because it consumes the iterator." - #53756 (comment)
Response: "I think the utility of consuming an iterator immediately to find its length, without regard to the contents, is greater than some might expect. After all, there’s a Unix utility (wc) dedicated to exactly that." - https://discuss.python.org/t/itertools-ilen-iterable/53002/23
Support by Use Case count
56 stdlib: `ripgrep '\Wlen\(list\(' | wc -l`
305K github: `/.*\Wlen\(list\(.*/ language:Python`
1.3K github: `/.*\Wilen\(.*\).*/ language:Python`
Also, `len(list(...))` is most likely the most commonly used solution, and it is the least memory-efficient of them all (see below).
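For readers who want something concrete to compare against `len(list(...))`, here is a minimal pure-Python sketch of a constant-memory counter. The name `ilen` matches the proposal, but the body is only an illustrative stand-in built from `zip`, `itertools.count` and `collections.deque`; it is not the proposed C implementation.

```python
from collections import deque
from itertools import count


def ilen(iterable):
    """Count the items of an iterable without storing them (illustrative stand-in)."""
    counter = count()
    # zip() pulls one item from `iterable` first, then one from `counter`;
    # when `iterable` is exhausted, `counter` is not advanced again.
    # deque(maxlen=0) discards every pair as soon as it is produced.
    deque(zip(iterable, counter), maxlen=0)
    # `counter` was advanced once per item, so its next value is the count.
    return next(counter)


n_odd = ilen(x for x in range(10) if x % 2)
print(n_odd)  # 5
```

Unlike `len(list(...))`, this never holds more than one item at a time, at the cost of consuming the iterator just the same.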
My personal experience of usefulness
- `ilen` is used 6 times in the module of iterator recipes which contains 40 functions.
- Often used as infinite consumer (instead of `collections.deque(it, maxlen=0)`)
- Often in iterator recipe benchmarks. Although I mostly use `black_hole = collections.deque(maxlen=0).extend` now.
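As a small illustration of the consumer idiom mentioned above, the sketch below drains a generator with a zero-length deque. The `seen` list and the generator are made up for the demonstration; only `black_hole` comes from the text.

```python
import collections

# A bound `extend` of a zero-length deque: consumes any iterable, keeps nothing.
black_hole = collections.deque(maxlen=0).extend

seen = []  # records side effects so we can observe the consumption
it = (seen.append(x) or x for x in range(3))

black_hole(it)               # runs the generator to exhaustion
leftover = next(it, None)    # the generator has nothing left to yield

print(seen, leftover)
```

The difference from a counting function is only the return value: the consumer discards the length, whereas a counter reports it.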
Python packages that implement it
Any iterator library will inevitably re-implement it, as `more_itertools.ilen` and `iteration_utilities.count_items` do.
Implementations in other languages
- Rust: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.count
- C++: https://cplusplus.com/reference/iterator/distance/
- JavaScript: https://github.com/stdlib-js/iter-length
- Java (Guava): https://guava.dev/releases/snapshot/api/docs/com/google/common/collect/Iterators.html#size(java.util.Iterator)
Stack Overflow threads
- https://stackoverflow.com/questions/3345785/getting-number-of-elements-in-an-iterator-in-python
- https://stackoverflow.com/questions/390852/is-there-any-built-in-way-to-get-the-length-of-an-iterable-in-python
- https://stackoverflow.com/questions/5384570/how-can-i-count-the-number-of-items-in-an-arbitrary-iterable-such-as-a-generato
Solutions that are currently used (or have been considered)
import collections as coll
import itertools as itl

import cython
import iteration_utilities
import more_itertools

@cython.locals(i=cython.int)
def cython_ilen(iterable):
    # Plain counting loop; the counter is a C int when Cython-compiled.
    i = 0
    for _ in iterable:
        i += 1
    return i

# proposed_ilen below refers to the proposed implementation, which is not shown here.
# The %timeit lines are IPython magics.
a = range(100_000)
%timeit sum(1 for _ in iter(a)) # 6.1 ms
%timeit sum(map(lambda i: 1, iter(a))) # 5.5 ms
%timeit coll.deque(enumerate(iter(a)), maxlen=1) # 4.4 ms
%timeit more_itertools.ilen(iter(a)) # 3.5 ms
%timeit sum(map(len, itl.batched(iter(a), 5))) # 2.9 ms (suboptimal memory consumption)
%timeit len(list(iter(a))) # 2.6 ms (highest memory consumption)
%timeit sum(map(len, itl.batched(iter(a), 1000))) # 2.2 ms (suboptimal memory consumption)
%timeit iteration_utilities.count_items(iter(a)) # 1.5 ms (more complex than it needs to be)
%timeit cython_ilen(iter(a)) # 1.5 ms
%timeit proposed_ilen(iter(a)) # 1.45 ms
%timeit coll.deque(iter(a), maxlen=0) # 1.42 ms (fastest consumer in stdlib)
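For anyone who wants to rerun a subset of these measurements outside IPython and without third-party packages, a rough sketch using only the standard library follows. The helper `loop_ilen`, the choice of candidates, and the repeat counts are my own assumptions; absolute numbers will differ by machine and Python version.

```python
import collections
import timeit


def loop_ilen(iterable):
    """Plain-Python counting loop (uncompiled analogue of cython_ilen)."""
    n = 0
    for _ in iterable:
        n += 1
    return n


a = range(100_000)

# Each candidate gets a fresh iterator per call, as in the original benchmark.
candidates = {
    "sum(1 for _ in it)": lambda: sum(1 for _ in iter(a)),
    "len(list(it))": lambda: len(list(iter(a))),
    "loop_ilen(it)": lambda: loop_ilen(iter(a)),
    "deque(it, maxlen=0)": lambda: collections.deque(iter(a), maxlen=0),
}

timings = {}
for label, fn in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call.
    timings[label] = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{label:22s} {timings[label] * 1e3:.2f} ms")
```

Note that the `deque` line only consumes the iterator and does not return a count, so it serves as a lower bound for a stdlib consumer rather than as a counting solution.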
A couple of use cases
os.path.commonprefix
import operator as opr
from itertools import takewhile

# Current:
def commonprefix(m):
    ...
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1

# Alternative:
def commonprefix(m):
    ...
    s1 = min(m)
    # Count leading positions where s1 and max(m) agree.
    i = ilen(takewhile(bool, map(opr.eq, s1, max(m))))
    return s1[:i]
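To check that the two variants agree, here is a self-contained sketch. The `ilen` below is a hypothetical stand-in, and the `if not m` guard is my simplification of the part elided with `...` above, so this is not the actual stdlib code.

```python
import operator as opr
import os.path
from itertools import takewhile


def ilen(iterable):
    # Hypothetical stand-in for the proposed function.
    return sum(1 for _ in iterable)


def commonprefix_loop(m):
    if not m:  # assumed guard, standing in for the elided part
        return ''
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1


def commonprefix_ilen(m):
    if not m:  # assumed guard, standing in for the elided part
        return ''
    s1 = min(m)
    # map() stops at the shorter string, so no index check is needed.
    i = ilen(takewhile(bool, map(opr.eq, s1, max(m))))
    return s1[:i]


samples = [
    ["interspecies", "interstellar", "interstate"],
    ["abc", "abcd"],
    ["x", "y"],
    [],
]
results = [(commonprefix_loop(s), commonprefix_ilen(s)) for s in samples]
print(results)
```

Both versions rely on the same observation: the common prefix of the lexicographic minimum and maximum is the common prefix of the whole list.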
takewhile_select
from itertools import islice, takewhile

def takewhile_select(selectors, iterable):
    """Take while parallel elements in selectors are true

    Examples:
        >>> list(takewhile_select([1, 1, 0, 0], [0, 1, 2, 3]))
        [0, 1]
    """
    # Length of the leading run of true selectors.
    n = ilen(takewhile(bool, selectors))
    return islice(iterable, n)
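A runnable version of this recipe, with a hypothetical stand-in for `ilen`, makes one property visible: the selectors are consumed eagerly up to and including the first false value, while the data iterable stays lazy.

```python
from itertools import islice, takewhile


def ilen(iterable):
    # Hypothetical stand-in for the proposed function.
    return sum(1 for _ in iterable)


def takewhile_select(selectors, iterable):
    """Take while parallel elements in selectors are true."""
    n = ilen(takewhile(bool, selectors))
    return islice(iterable, n)


selectors = iter([1, 1, 0, 1])
taken = takewhile_select(selectors, [0, 1, 2, 3])

rest_of_selectors = list(selectors)  # the first false value was already consumed
print(list(taken), rest_of_selectors)  # [0, 1] [1]
```

Because the count is computed up front, an endless stream of true selectors would never return; the recipe assumes the selectors eventually turn false or run out.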
Implementation considerations
Flexibility such as that offered by `iteration_utilities.count_items` has been considered. However, I think it is best to keep it simple: preprocessing of any kind can be done before calling `ilen`.
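A brief sketch of what such preprocessing could look like, assuming the extra flexibility in question is counting by predicate or by equality. The `ilen` here is a hypothetical stand-in and the sample data is made up.

```python
import operator
from functools import partial


def ilen(iterable):
    # Hypothetical stand-in for the proposed function.
    return sum(1 for _ in iterable)


data = [1, 2, 3, 4, 5, 6, 2]

# Count items satisfying a predicate: filter first, then count.
n_even = ilen(filter(lambda x: x % 2 == 0, data))

# Count items equal to a value: same pattern with an equality test.
n_twos = ilen(filter(partial(operator.eq, 2), data))

print(n_even, n_twos)  # 4 2
```

Keeping the counter argument-free leaves composition to `filter`, `map` and friends, which is the design choice argued for above.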
Maintenance cost
Minimal. It would be the simplest function in `itertools`.
Final thoughts
To me personally, this is the one missing function in `itertools`. Currently, I am using `cython_ilen` when the module is Cython-compiled and a copy of `more_itertools.ilen` when it is not. This addition would simplify matters.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
https://discuss.python.org/t/itertools-ilen-iterable/53002