Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

itertools: takedowhile() #88737

Closed
pavel-lexyr mannequin opened this issue Jul 5, 2021 · 17 comments
Closed

itertools: takedowhile() #88737

pavel-lexyr mannequin opened this issue Jul 5, 2021 · 17 comments
Assignees
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@pavel-lexyr
Copy link
Mannequin

pavel-lexyr mannequin commented Jul 5, 2021

BPO 44571
Nosy @tim-one, @rhettinger, @serhiy-storchaka, @miss-islington, @pavel-lexyr
PRs
  • bpo-44571: Add itertool recipe for a variant of takewhile() #28167
  • [3.10] bpo-44571: Add itertool recipe for a variant of takewhile() (GH-28167) #28173
  • Files
  • tmp_takewhile.py: API experiments
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2021-09-05.05:10:54.130>
    created_at = <Date 2021-07-05.22:33:28.631>
    labels = ['type-feature', 'library', '3.11']
    title = 'itertools: takedowhile()'
    updated_at = <Date 2021-10-11.09:15:19.771>
    user = 'https://github.com/pavel-lexyr'

    bugs.python.org fields:

    activity = <Date 2021-10-11.09:15:19.771>
    actor = 'phr'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2021-09-05.05:10:54.130>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2021-07-05.22:33:28.631>
    creator = 'pavel-lexyr'
    dependencies = []
    files = ['50142']
    hgrepos = []
    issue_num = 44571
    keywords = ['patch']
    message_count = 17.0
    messages = ['397025', '397028', '397030', '397032', '397165', '397202', '397205', '397229', '397253', '397312', '397336', '397337', '397338', '401068', '401069', '403623', '403638']
    nosy_count = 6.0
    nosy_names = ['tim.peters', 'rhettinger', 'phr', 'serhiy.storchaka', 'miss-islington', 'pavel-lexyr']
    pr_nums = ['28167', '28173']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue44571'
    versions = ['Python 3.11']

    @pavel-lexyr
    Copy link
    Mannequin Author

    pavel-lexyr mannequin commented Jul 5, 2021

    As described in the documentation, itertools.takewhile() returns all the elements until the first one that does not match the provided criterion. In case of a destructive iterator, or one with side effects, not yielding an element downstream may render takewhile() unsuitable for use.

    Proposed is itertools.takedowhile() - an alternate function that yields the first false element as well, and returns after. The behaviour is identical otherwise.

    @pavel-lexyr pavel-lexyr mannequin added type-feature A feature request or enhancement labels Jul 5, 2021
    @rhettinger
    Copy link
    Contributor

    Thanks for the suggestion. I agree that the loss of the non-matching element is an irritant. The suggestion to return the first false element would solve that problem but is itself hard to work with. The result would be difficult to reason about because all the elements are except one are true, the last is false, and you can't know that you have gotten a false element until one more call to next() to determine that no more elements are forthcoming.

    Also, I'm reluctant to create any variants for takewhile() or dropwhile(). Those have been the least successful itertools. If I had it to do over again, they would not have been included. For the most part, generator based solutions are superior in terms of readability, flexibility, and performance.

    @rhettinger rhettinger added stdlib Python modules in the Lib dir 3.11 only security fixes labels Jul 5, 2021
    @pavel-lexyr
    Copy link
    Mannequin Author

    pavel-lexyr mannequin commented Jul 5, 2021

    I see. If the syntax allows for better ways to do it now, perhaps a move towards deprecation would be a better idea then? This would agree with the Zen.

    Also, please elaborate more on the generator-based solutions you have in mind. The suggestion stems from a very real use case - and the lambda function we ended up using looks like a poor hack.

    @serhiy-storchaka
    Copy link
    Member

    What if set the last item as an attribute of the takewhile iterator?

    @rhettinger
    Copy link
    Contributor

    What if set the last item as an attribute of the takewhile iterator?

    Perhaps raise an attribute error unless the falsifying element is set?

    @rhettinger
    Copy link
    Contributor

    I've done some API experiments using a data munging example. See attached file.

    The proposed API for takewhile() to save the last attribute is somewhat awkward to use:

        it = iter(report)
        tw_it = takewhile(is_header, it)
        for line in takewhile(is_header, tw_it):
            print('Header:', repr(line))
        if hasattr(tw_it, 'odd_element'):
            it = chain([tw_it.odd_element], it)
        print(mean(map(int, it)))   

    What is needed is a new itertool recipe to cover this use case:

    headers, data = before_and_after(is_header, report)
    for line in headers:
        print('Header:', repr(line))
    print(mean(map(int, data)))
    

    @serhiy-storchaka
    Copy link
    Member

    For convenience, the takewhile iterator can also have additional attributes: a boolean attribute which indicates that the falsifying element is set, and dynamic attribute which is equal to orig_iterator or chain([odd_element], orig_iterator).

    @rhettinger
    Copy link
    Contributor

    For convenience, the takewhile iterator can also have
    additional attributes: a boolean attribute which indicates
    that the falsifying element is set, and dynamic attribute
    which is equal to orig_iterator
    or chain([odd_element], orig_iterator).

    Rather than graft a funky and atypical API onto takewhile(), it would be better to have a new tool that returns two iterators, the true steam, and a stream for the remaining values. Either stream may be empty. There is no need for a boolean flag attribute or a remaining stream attribute. This design fits in better with the other itertools.

    FWIW, we can already do this using groupby(), but it is only easy if we assume the first part of the stream is all true and the remainder of the stream is all false. That isn't good enough for general application.

    @tim-one
    Copy link
    Member

    tim-one commented Jul 11, 2021

    I agree Raymond's before_and_after() looks like an elegant, efficient, and usable approach to this.

    One minor nit: there's no need for the iter() call in:

        yield from iter(transition)
    

    Indeed, it confused me at first, because yield from x does its own iter(x) call under the covers, and since everyone knows that ;-) , I wondered what deep trickery calling it again was intended to pull off.

    But I persuaded myself there was no such subtle intent - it's just redundant.

    @pavel-lexyr
    Copy link
    Mannequin Author

    pavel-lexyr mannequin commented Jul 12, 2021

    There is a core part of the takedowhile proposal's use case that I am having trouble envisioning via the alternative before_and_after proposal. If the after part of the iterator the user does not engage with, the transitional elements will be stuck indefinitely. What would a correct usage be, in case one wants the following two conditions to hold true:

    1. the amount of elements after the first falsifying one is minimal, i.e. 0
    2. all the yielded elements are processed no matter what?

    @tim-one
    Copy link
    Member

    tim-one commented Jul 12, 2021

    If you don't use the 'after` iterator, then of course you'll never see the values (if any) it would have yielded.

    How could it possibly be otherwise? By design and construction, the before iterator ends before yielding the first (if any) transitional element.

    As Raymond said at the start, the takedowhile() proposal appears much harder to use correctly, since there's no reasonably sane way to know that the last value it yields is the transitional element (or, perhaps, that there was no transitional element, and the underlying iterable was just exhausted without finding one).

    If the proposal were instead for takewhile_plus_one_more_if_any(), then at least the ugly name would warn about the surprising intended behavior ;-)

    @tim-one
    Copy link
    Member

    tim-one commented Jul 12, 2021

    That said, if you really do want those semantics, it's easy to build on top of Raymond's API:

    def takewhile_plus_one_more_if_any(pred, iterable):
        from itertools import islice, chain
        before, after = before_and_after(pred, iterable)
        return chain(before, islice(after, 1))

    @pavel-lexyr
    Copy link
    Mannequin Author

    pavel-lexyr mannequin commented Jul 12, 2021

    Thank you - that answers the questions. The use case where we would want to know if the last element is transitional or not completely slipped my mind for some reason.

    @rhettinger
    Copy link
    Contributor

    New changeset 91be41a by Raymond Hettinger in branch 'main':
    bpo-44571: Add itertool recipe for a variant of takewhile() (GH-28167)
    91be41a

    @miss-islington
    Copy link
    Contributor

    New changeset 656b0bd by Miss Islington (bot) in branch '3.10':
    bpo-44571: Add itertool recipe for a variant of takewhile() (GH-28167)
    656b0bd

    @phr
    Copy link
    Mannequin

    phr mannequin commented Oct 11, 2021

    Oh wow, before_and_after will go into the itertools module per that patch? I found this issue while looking for a way to this, but had written the following implementation:

    def span(pred, xs):
        # split xs into two iterators a,b where a() is the prefix of xs             
        # that satisfies the predicate, and b() is the rest of xs.                  
        # Similar to Data.List.Span in Haskell.                                     
    
        ixs = iter(xs)
        t = None
        def a():
            nonlocal t
            for x in ixs:
                if pred(x): yield x
                else: break
            t = x
        def b():
            return itertools.chain([t], ixs)
        return a, b
    
    def tspan():  # test
        xs = [1,3,5,2,4,6,8]
        def odd(x): return x%2==1
        # This should print [1,3,5] then [2,4,6,8]                                  
        for p in span(odd, xs):
            print(list(p()))

    @phr
    Copy link
    Mannequin

    phr mannequin commented Oct 11, 2021

    Bah, the above doesn't work in the cases where the iterator is empty or (different symptom) where the tail part is empty. Rather than posting a corrected version (unless someone wants it) I'll just say not to use that code fragment, but that the intended API still looks reasonable. I do support having some kind of solution but don't want to keep stretching out an already closed discussion unless people think there is more to say.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants