
itertools.ilen addition #120478

@dg-pb


Feature or enhancement

Proposal:

Introduction

The discussion has not met any decisive objections since it started, and it has been quiet for a while now. So I am taking the opportunity to make one more attempt at suggesting the addition of an ilen function.
Given the current state and coverage of the itertools module, I think ilen would be a useful and timely addition.

Past attempts

#53756 (comment). This makes it clear that this is not the first time it has been proposed. Although repeated rejection does not cast the proposal in a positive light, it can also indicate a recurring attempt to satisfy a real need.

Addressing criticism

Statement: "The core issue is that it is not a very useful operation because it consumes the iterator." - #53756 (comment)
Response: "I think the utility of consuming an iterator immediately to find its length, without regard to the contents, is greater than some might expect. After all, there’s a Unix utility (wc) dedicated to exactly that." - https://discuss.python.org/t/itertools-ilen-iterable/53002/23
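As a rough illustration of the `wc -l` analogy, the sketch below counts the lines of a file-like object without keeping them, and shows that the iterator is exhausted afterwards (which is exactly the point: the contents are not needed). `ilen` here is a hypothetical stand-in written as a plain loop, not the proposed C implementation.

```python
import io


def ilen(iterable):
    """Hypothetical stand-in for the proposed itertools.ilen: count items by exhausting the iterable."""
    n = 0
    for _ in iterable:
        n += 1
    return n


# A file-like object iterates over its lines, much like the input to `wc -l`.
text = io.StringIO("alpha\nbeta\ngamma\n")
print(ilen(text))             # 3
print(repr(text.readline()))  # '' because the lines were consumed while counting
```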
Support by use case count

Occurrences   Where    Search used
56            stdlib   `ripgrep '\Wlen\(list\(' | wc -l`
305K          GitHub   `/.*\Wlen\(list\(.*/ language:Python`
1.3K          GitHub   `/.*\Wilen\(.*\).*/ language:Python`

Also, len(list(...)) is most likely the most commonly used solution, yet it is the least memory-efficient of the solutions compared below.
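For contrast with len(list(...)), a constant-memory count can be sketched by pairing each item with an itertools.count and draining the pairs into a zero-length deque. I believe this is close to how more_itertools.ilen works, but treat it as an illustrative sketch rather than that library's exact code.

```python
from collections import deque
from itertools import count


def ilen(iterable):
    """Count items using O(1) extra memory (sketch)."""
    counter = count()
    # zip() pulls from `iterable` first, so `counter` advances exactly once per item
    # and is never advanced once `iterable` is exhausted.
    deque(zip(iterable, counter), maxlen=0)  # consume the pairs without storing them
    return next(counter)


def ilen_via_list(iterable):
    """The common idiom: materializes every item just to count them (O(n) extra memory)."""
    return len(list(iterable))


print(ilen(iter(range(100_000))), ilen_via_list(iter(range(100_000))))
```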

My personal experience of usefulness

  • ilen is used 6 times in my module of iterator recipes, which contains 40 functions.
  • It is often used as a plain consumer that exhausts an iterator (instead of collections.deque(it, maxlen=0)).
    • This happens often in iterator recipe benchmarks, although I now mostly use black_hole = collections.deque(maxlen=0).extend.
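The two consumer idioms mentioned above can be sketched as follows. The generators here are run only for their side effects (appending to a list, an assumption chosen for illustration), and every yielded value is thrown away.

```python
import collections

# deque(maxlen=0) keeps nothing, so its bound .extend method simply drains its argument.
black_hole = collections.deque(maxlen=0).extend


def consume(iterator):
    """Exhaust `iterator` for its side effects (the deque idiom from the text)."""
    collections.deque(iterator, maxlen=0)


log = []
black_hole(log.append(i) for i in range(5))  # each step appends i as a side effect
print(log)  # [0, 1, 2, 3, 4]

consume(log.append(i) for i in range(5, 8))
print(log)  # [0, 1, 2, 3, 4, 5, 6, 7]
```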

Python packages that implement it

Any iterator library inevitably ends up re-implementing it:

Implementations in other languages

Stack threads

Solutions that are currently used (or have been considered)

import collections as coll
import itertools as itl

import cython              # Cython's pure-Python mode: the decorator is a no-op unless compiled
import more_itertools
import iteration_utilities

@cython.locals(i=cython.int)
def cython_ilen(iterable):
    i = 0
    for _ in iterable:
        i += 1
    return i

# IPython session; itl.batched requires Python 3.12+.
# proposed_ilen is the proposed C implementation and cannot be imported from released Python.
a = range(100_000)
%timeit sum(1 for _ in iter(a))                     # 6.1 ms
%timeit sum(map(lambda i: 1, iter(a)))              # 5.5 ms
%timeit coll.deque(enumerate(iter(a)), maxlen=1)    # 4.4 ms
%timeit more_itertools.ilen(iter(a))                # 3.5 ms
%timeit sum(map(len, itl.batched(iter(a), 5)))      # 2.9 ms (suboptimal memory consumption)
%timeit len(list(iter(a)))                          # 2.6 ms (highest memory consumption)
%timeit sum(map(len, itl.batched(iter(a), 1000)))   # 2.2 ms (suboptimal memory consumption)
%timeit iteration_utilities.count_items(iter(a))    # 1.5 ms (more complex than it needs to be)
%timeit cython_ilen(iter(a))                        # 1.5 ms
%timeit proposed_ilen(iter(a))                      # 1.45 ms
%timeit coll.deque(iter(a), maxlen=0)               # 1.42 ms (fastest consumer in stdlib)
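Outside IPython, where `%timeit` is unavailable, a comparable measurement of the stdlib-only candidates can be sketched with the `timeit` module. This does not reproduce the timings above: third-party and compiled candidates are left out, `proposed_ilen` has no importable implementation, and absolute numbers will vary by machine and Python version. The sketch also checks that every candidate returns the same count before timing anything.

```python
import collections
import itertools
import timeit

N = 100_000


def loop_ilen(iterable):
    """Plain-Python counting loop (same logic as cython_ilen, uncompiled)."""
    n = 0
    for _ in iterable:
        n += 1
    return n


def deque_enumerate_ilen(iterable):
    """Keep only the last (index, item) pair; its index + 1 is the length."""
    last = collections.deque(enumerate(iterable), maxlen=1)
    return last[0][0] + 1 if last else 0


# Each candidate receives a fresh iterator and returns the item count.
CANDIDATES = {
    "sum(1 for _ in it)": lambda it: sum(1 for _ in it),
    "sum(map(lambda i: 1, it))": lambda it: sum(map(lambda i: 1, it)),
    "deque(enumerate(it), maxlen=1)": deque_enumerate_ilen,
    "len(list(it))": lambda it: len(list(it)),
    "pure-Python loop": loop_ilen,
}
if hasattr(itertools, "batched"):  # itertools.batched exists on Python 3.12+
    CANDIDATES["sum(map(len, batched(it, 1000)))"] = (
        lambda it: sum(map(len, itertools.batched(it, 1000)))
    )


def check(n=N):
    """Return each candidate's count for range(n), to confirm they agree."""
    return {name: fn(iter(range(n))) for name, fn in CANDIDATES.items()}


def bench(n=N, number=20):
    data = range(n)
    for name, fn in CANDIDATES.items():
        # iter(data) is created inside the timed call, so every run starts fresh.
        total = timeit.timeit(lambda: fn(iter(data)), number=number)
        print(f"{name:<34} {total / number * 1e3:.2f} ms")


if __name__ == "__main__":
    assert set(check().values()) == {N}
    bench()
```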

A couple of use cases

  1. os.path.commonprefix
# Current:
def commonprefix(m):
    ...
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1

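# Alternative (assumes: import operator as opr; from itertools import takewhile; ilen as proposed):
def commonprefix(m):
    ...
    s1 = min(m)
    # Length of the common prefix of min(m) and max(m)
    i = ilen(takewhile(bool, map(opr.eq, s1, max(m))))
    return s1[:i]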
  2. takewhile_select
# Assumes: from itertools import islice, takewhile; ilen as proposed
def takewhile_select(selectors, iterable):
    """Take while parallel elements in selectors are true
    Examples:
        >>> list(takewhile_select([1, 1, 0, 0], [0, 1, 2, 3]))
        [0, 1]
    """
    n = ilen(takewhile(bool, selectors))
    return islice(iterable, n)
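Both use cases can be checked end to end with a stand-in for the proposed function. In this sketch `ilen` is a hypothetical zip/count implementation, and `commonprefix_ilen` is a renamed copy of the alternative above that reproduces only the empty-input guard from the elided part of os.path.commonprefix; it is compared against the real os.path.commonprefix on string input.

```python
import operator as opr
import os.path
from collections import deque
from itertools import count, islice, takewhile


def ilen(iterable):
    """Hypothetical stand-in for the proposed itertools.ilen."""
    counter = count()
    deque(zip(iterable, counter), maxlen=0)  # drain without storing items
    return next(counter)


def commonprefix_ilen(m):
    """ilen-based variant of os.path.commonprefix (only the empty-input guard is kept)."""
    if not m:
        return ''
    s1 = min(m)
    # map() over two iterables stops at the shorter one, like zip().
    i = ilen(takewhile(bool, map(opr.eq, s1, max(m))))
    return s1[:i]


def takewhile_select(selectors, iterable):
    """Take while parallel elements in selectors are true."""
    n = ilen(takewhile(bool, selectors))
    return islice(iterable, n)


words = ["interspecies", "interstellar", "interstate"]
print(commonprefix_ilen(words), os.path.commonprefix(words))  # inters inters
print(list(takewhile_select([1, 1, 0, 0], [0, 1, 2, 3])))     # [0, 1]
```

Note that takewhile_select drains `selectors` eagerly (up to and including the first false element) before islice starts, while `iterable` itself stays lazy.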

Implementation considerations

Flexibility such as that offered by iteration_utilities.count_items has been considered. However, I think it is best to keep ilen simple: any kind of preprocessing (for example, filtering) can be done before calling it.
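To illustrate preprocessing before the call, the sketch below expresses a predicate count, a truthy count, and an equality count by filtering first and then counting. `ilen` is a hypothetical plain-loop stand-in, and the data is made up for illustration; this is my reading of the kind of flexibility meant, not a description of count_items' actual parameters.

```python
import operator
from functools import partial


def ilen(iterable):
    """Hypothetical stand-in for the proposed itertools.ilen."""
    n = 0
    for _ in iterable:
        n += 1
    return n


data = [3, 0, 8, 3, 0, 3]

n_odd = ilen(filter(lambda x: x % 2, data))             # items satisfying a predicate
n_truthy = ilen(filter(None, data))                     # filter(None, ...) keeps truthy items
n_threes = ilen(filter(partial(operator.eq, 3), data))  # items equal to a given value
print(n_odd, n_truthy, n_threes)  # 3 4 3
```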

Maintenance cost

Minimal. It would be the simplest function in itertools.

Final thoughts

To me personally, this is the one missing function in itertools. Currently, I am using cython_ilen when the module is Cython-compiled and a copy of more_itertools.ilen when it is not. This addition would simplify matters.
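If the proposal were accepted, the current two-implementation setup could collapse into a single import with a fallback for older Pythons, roughly as sketched below. `itertools.ilen` is hypothetical here (it is only proposed), so on released Python versions the `ImportError` branch runs; the fallback body is a zip/count sketch, not a verbatim copy of more_itertools.ilen.

```python
from collections import deque
from itertools import count

try:
    # Hypothetical: only available if itertools.ilen is ever added.
    from itertools import ilen
except ImportError:
    def ilen(iterable):
        """Fallback: count items without storing them."""
        counter = count()
        deque(zip(iterable, counter), maxlen=0)
        return next(counter)


print(ilen(x for x in range(10) if x % 3 == 0))  # 4
```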

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/itertools-ilen-iterable/53002

Labels: type-feature (a feature request or enhancement)
