itertools: takedowhile() #88737

pavel-lexyr · 2021-07-05T22:33:29Z

BPO	44571
Nosy	@tim-one, @rhettinger, @serhiy-storchaka, @miss-islington, @pavel-lexyr
PRs	bpo-44571: Add itertool recipe for a variant of takewhile() #28167 [3.10] bpo-44571: Add itertool recipe for a variant of takewhile() (GH-28167) #28173
Files	tmp_takewhile.py: API experiments

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2021-09-05.05:10:54.130>
created_at = <Date 2021-07-05.22:33:28.631>
labels = ['type-feature', 'library', '3.11']
title = 'itertools: takedowhile()'
updated_at = <Date 2021-10-11.09:15:19.771>
user = 'https://github.com/pavel-lexyr'

bugs.python.org fields:

activity = <Date 2021-10-11.09:15:19.771>
actor = 'phr'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2021-09-05.05:10:54.130>
closer = 'rhettinger'
components = ['Library (Lib)']
creation = <Date 2021-07-05.22:33:28.631>
creator = 'pavel-lexyr'
dependencies = []
files = ['50142']
hgrepos = []
issue_num = 44571
keywords = ['patch']
message_count = 17.0
messages = ['397025', '397028', '397030', '397032', '397165', '397202', '397205', '397229', '397253', '397312', '397336', '397337', '397338', '401068', '401069', '403623', '403638']
nosy_count = 6.0
nosy_names = ['tim.peters', 'rhettinger', 'phr', 'serhiy.storchaka', 'miss-islington', 'pavel-lexyr']
pr_nums = ['28167', '28173']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue44571'
versions = ['Python 3.11']

pavel-lexyr · 2021-07-05T22:33:29Z

As described in the documentation, itertools.takewhile() returns all the elements until the first one that does not match the provided criterion. In case of a destructive iterator, or one with side effects, not yielding an element downstream may render takewhile() unsuitable for use.

Proposed is itertools.takedowhile() - an alternate function that yields the first false element as well, and returns after. The behaviour is identical otherwise.

rhettinger · 2021-07-05T23:25:59Z

Thanks for the suggestion. I agree that the loss of the non-matching element is an irritant. The suggestion to return the first false element would solve that problem but is itself hard to work with. The result would be difficult to reason about because all the elements are except one are true, the last is false, and you can't know that you have gotten a false element until one more call to next() to determine that no more elements are forthcoming.

Also, I'm reluctant to create any variants for takewhile() or dropwhile(). Those have been the least successful itertools. If I had it to do over again, they would not have been included. For the most part, generator based solutions are superior in terms of readability, flexibility, and performance.

pavel-lexyr · 2021-07-05T23:34:45Z

I see. If the syntax allows for better ways to do it now, perhaps a move towards deprecation would be a better idea then? This would agree with the Zen.

Also, please elaborate more on the generator-based solutions you have in mind. The suggestion stems from a very real use case - and the lambda function we ended up using looks like a poor hack.

serhiy-storchaka · 2021-07-06T08:26:03Z

What if set the last item as an attribute of the takewhile iterator?

rhettinger · 2021-07-08T19:31:27Z

What if set the last item as an attribute of the takewhile iterator?

Perhaps raise an attribute error unless the falsifying element is set?

rhettinger · 2021-07-09T16:42:44Z

I've done some API experiments using a data munging example. See attached file.

The proposed API for takewhile() to save the last attribute is somewhat awkward to use:

    it = iter(report)
    tw_it = takewhile(is_header, it)
    for line in takewhile(is_header, tw_it):
        print('Header:', repr(line))
    if hasattr(tw_it, 'odd_element'):
        it = chain([tw_it.odd_element], it)
    print(mean(map(int, it)))

What is needed is a new itertool recipe to cover this use case:

headers, data = before_and_after(is_header, report)
for line in headers:
    print('Header:', repr(line))
print(mean(map(int, data)))

serhiy-storchaka · 2021-07-09T17:44:59Z

For convenience, the takewhile iterator can also have additional attributes: a boolean attribute which indicates that the falsifying element is set, and dynamic attribute which is equal to orig_iterator or chain([odd_element], orig_iterator).

rhettinger · 2021-07-09T22:31:53Z

For convenience, the takewhile iterator can also have
additional attributes: a boolean attribute which indicates
that the falsifying element is set, and dynamic attribute
which is equal to orig_iterator
or chain([odd_element], orig_iterator).

Rather than graft a funky and atypical API onto takewhile(), it would be better to have a new tool that returns two iterators, the true steam, and a stream for the remaining values. Either stream may be empty. There is no need for a boolean flag attribute or a remaining stream attribute. This design fits in better with the other itertools.

FWIW, we can already do this using groupby(), but it is only easy if we assume the first part of the stream is all true and the remainder of the stream is all false. That isn't good enough for general application.

tim-one · 2021-07-11T01:17:32Z

I agree Raymond's before_and_after() looks like an elegant, efficient, and usable approach to this.

One minor nit: there's no need for the iter() call in:

    yield from iter(transition)

Indeed, it confused me at first, because yield from x does its own iter(x) call under the covers, and since everyone knows that ;-) , I wondered what deep trickery calling it again was intended to pull off.

But I persuaded myself there was no such subtle intent - it's just redundant.

pavel-lexyr · 2021-07-12T14:16:43Z

There is a core part of the takedowhile proposal's use case that I am having trouble envisioning via the alternative before_and_after proposal. If the after part of the iterator the user does not engage with, the transitional elements will be stuck indefinitely. What would a correct usage be, in case one wants the following two conditions to hold true:

the amount of elements after the first falsifying one is minimal, i.e. 0
all the yielded elements are processed no matter what?

tim-one · 2021-07-12T16:30:51Z

If you don't use the 'after` iterator, then of course you'll never see the values (if any) it would have yielded.

How could it possibly be otherwise? By design and construction, the before iterator ends before yielding the first (if any) transitional element.

As Raymond said at the start, the takedowhile() proposal appears much harder to use correctly, since there's no reasonably sane way to know that the last value it yields is the transitional element (or, perhaps, that there was no transitional element, and the underlying iterable was just exhausted without finding one).

If the proposal were instead for takewhile_plus_one_more_if_any(), then at least the ugly name would warn about the surprising intended behavior ;-)

tim-one · 2021-07-12T16:40:43Z

That said, if you really do want those semantics, it's easy to build on top of Raymond's API:

def takewhile_plus_one_more_if_any(pred, iterable):
    from itertools import islice, chain
    before, after = before_and_after(pred, iterable)
    return chain(before, islice(after, 1))

pavel-lexyr · 2021-07-12T16:53:57Z

Thank you - that answers the questions. The use case where we would want to know if the last element is transitional or not completely slipped my mind for some reason.

rhettinger · 2021-09-05T05:09:33Z

New changeset 91be41a by Raymond Hettinger in branch 'main':
bpo-44571: Add itertool recipe for a variant of takewhile() (GH-28167)
91be41a

miss-islington · 2021-09-05T05:30:44Z

New changeset 656b0bd by Miss Islington (bot) in branch '3.10':
bpo-44571: Add itertool recipe for a variant of takewhile() (GH-28167)
656b0bd

phr · 2021-10-11T07:09:53Z

Oh wow, before_and_after will go into the itertools module per that patch? I found this issue while looking for a way to this, but had written the following implementation:

def span(pred, xs):
    # split xs into two iterators a,b where a() is the prefix of xs             
    # that satisfies the predicate, and b() is the rest of xs.                  
    # Similar to Data.List.Span in Haskell.                                     

    ixs = iter(xs)
    t = None
    def a():
        nonlocal t
        for x in ixs:
            if pred(x): yield x
            else: break
        t = x
    def b():
        return itertools.chain([t], ixs)
    return a, b

def tspan():  # test
    xs = [1,3,5,2,4,6,8]
    def odd(x): return x%2==1
    # This should print [1,3,5] then [2,4,6,8]                                  
    for p in span(odd, xs):
        print(list(p()))

phr · 2021-10-11T09:15:20Z

Bah, the above doesn't work in the cases where the iterator is empty or (different symptom) where the tail part is empty. Rather than posting a corrected version (unless someone wants it) I'll just say not to use that code fragment, but that the intended API still looks reasonable. I do support having some kind of solution but don't want to keep stretching out an already closed discussion unless people think there is more to say.

pavel-lexyr mannequin added type-feature A feature request or enhancement labels Jul 5, 2021

rhettinger assigned rhettinger Jul 5, 2021

rhettinger added stdlib Python modules in the Lib dir 3.11 only security fixes labels Jul 5, 2021

rhettinger closed this as completed Sep 5, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

itertools: takedowhile() #88737

itertools: takedowhile() #88737

pavel-lexyr mannequin commented Jul 5, 2021

pavel-lexyr mannequin commented Jul 5, 2021

rhettinger commented Jul 5, 2021

pavel-lexyr mannequin commented Jul 5, 2021

serhiy-storchaka commented Jul 6, 2021

rhettinger commented Jul 8, 2021

rhettinger commented Jul 9, 2021

serhiy-storchaka commented Jul 9, 2021

rhettinger commented Jul 9, 2021

tim-one commented Jul 11, 2021

pavel-lexyr mannequin commented Jul 12, 2021

tim-one commented Jul 12, 2021

tim-one commented Jul 12, 2021

pavel-lexyr mannequin commented Jul 12, 2021

rhettinger commented Sep 5, 2021

miss-islington commented Sep 5, 2021

phr mannequin commented Oct 11, 2021

phr mannequin commented Oct 11, 2021

itertools: takedowhile() #88737

itertools: takedowhile() #88737

Comments

pavel-lexyr mannequin commented Jul 5, 2021

pavel-lexyr mannequin commented Jul 5, 2021

rhettinger commented Jul 5, 2021

pavel-lexyr mannequin commented Jul 5, 2021

serhiy-storchaka commented Jul 6, 2021

rhettinger commented Jul 8, 2021

rhettinger commented Jul 9, 2021

serhiy-storchaka commented Jul 9, 2021

rhettinger commented Jul 9, 2021

tim-one commented Jul 11, 2021

pavel-lexyr mannequin commented Jul 12, 2021

tim-one commented Jul 12, 2021

tim-one commented Jul 12, 2021

pavel-lexyr mannequin commented Jul 12, 2021

rhettinger commented Sep 5, 2021

miss-islington commented Sep 5, 2021

phr mannequin commented Oct 11, 2021

phr mannequin commented Oct 11, 2021