itertools: takedowhile() #88737
Comments
As described in the documentation, itertools.takewhile() returns all the elements until the first one that does not match the provided criterion. takewhile() consumes that first non-matching element but never yields it, so with a destructive iterator, or one with side effects, the lost element may render takewhile() unsuitable for use. Proposed is itertools.takedowhile(), an alternative function that also yields the first false element and then returns. The behaviour is otherwise identical.
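A minimal sketch of the proposed semantics as a plain generator. The name `takedowhile` comes from the proposal; it is not part of `itertools`.

```python
def takedowhile(predicate, iterable):
    """Hypothetical takewhile() variant that also yields the first false element."""
    for element in iterable:
        yield element              # every element is passed downstream, even the false one
        if not predicate(element):
            return                 # stop immediately after the falsifying element


print(list(takedowhile(lambda x: x < 3, [1, 2, 3, 4, 1])))  # prints [1, 2, 3]
```

Note that the output alone does not tell the caller whether the last item satisfied the predicate or was the falsifying one.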
Thanks for the suggestion. I agree that the loss of the non-matching element is an irritant. The suggestion to return the first false element would solve that problem but is itself hard to work with. The result would be difficult to reason about because all the elements except one are true, the last is false, and you can't know that you have gotten a false element until one more call to next() determines that no more elements are forthcoming. Also, I'm reluctant to create any variants of takewhile() or dropwhile(). Those have been the least successful itertools. If I had it to do over again, they would not have been included. For the most part, generator-based solutions are superior in terms of readability, flexibility, and performance.
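As one possible illustration of a generator-style alternative (an assumption about what such a solution could look like, not code from the thread), an explicit loop over a shared iterator can keep the falsifying element and push it back in front of the unconsumed remainder. The data, `is_header()`, and `split_headers()` are hypothetical.

```python
from itertools import chain
from statistics import mean

# Hypothetical sample data: header lines start with "#", the rest are numbers.
report = ["# sensor A", "# units: ms", "12", "15", "9"]


def is_header(line):
    return line.startswith("#")


def split_headers(lines):
    """Collect the leading header lines; return them plus an iterator over the rest."""
    it = iter(lines)
    headers = []
    for line in it:
        if not is_header(line):
            # Put the falsifying line back in front of the unconsumed remainder.
            return headers, chain([line], it)
        headers.append(line)
    return headers, it  # every line was a header, so the remainder is empty


headers, data = split_headers(report)
print(headers)               # ['# sensor A', '# units: ms']
print(mean(map(int, data)))  # 12
```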
I see. If the syntax now allows better ways to do it, perhaps moving towards deprecation would be a better idea? That would agree with the Zen of Python. Also, please elaborate on the generator-based solutions you have in mind. The suggestion stems from a very real use case, and the lambda function we ended up using looks like a poor hack.
What if we set the last item as an attribute of the takewhile iterator?
Perhaps raise an AttributeError unless the falsifying element is set?
I've done some API experiments using a data munging example. See attached file. The proposed API for takewhile() to save the falsifying element as an attribute is somewhat awkward to use:

```python
from itertools import chain, takewhile
from statistics import mean

# report and is_header come from the attached example
it = iter(report)
tw_it = takewhile(is_header, it)
for line in tw_it:
    print('Header:', repr(line))
# odd_element is the proposed attribute; the real takewhile() does not have it
if hasattr(tw_it, 'odd_element'):
    it = chain([tw_it.odd_element], it)
print(mean(map(int, it)))
```

What is needed is a new itertool recipe to cover this use case.
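The itertools documentation's recipe section includes a `before_and_after()` recipe, and later comments here build on a tool of that name. The version below is a sketch in that spirit, not a verbatim copy of the official recipe. It returns two iterators: the leading true elements, and everything from the first false element onward. The true iterator must be fully consumed before the remainder iterator is used. The sample data is hypothetical.

```python
from statistics import mean


def before_and_after(predicate, iterable):
    """Split iterable into (leading elements where predicate is true, everything after)."""
    it = iter(iterable)
    transition = []  # holds the first false element once it has been seen

    def true_iterator():
        for element in it:
            if predicate(element):
                yield element
            else:
                transition.append(element)  # keep it instead of dropping it
                return

    def remainder_iterator():
        yield from transition  # the falsifying element, if any
        yield from it          # then everything that was never examined

    return true_iterator(), remainder_iterator()


report = ["# sensor A", "# units: ms", "12", "15", "9"]  # hypothetical data
headers, data = before_and_after(lambda line: line.startswith("#"), report)
for line in headers:
    print("Header:", repr(line))
print(mean(map(int, data)))  # 12
```

Unlike the attribute-based API, no flag or extra attribute is needed: either returned iterator may simply be empty.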
For convenience, the takewhile iterator can also have additional attributes: a boolean attribute indicating whether the falsifying element is set, and a dynamic attribute equal to orig_iterator or chain([odd_element], orig_iterator).
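To make the attribute-based proposal concrete, here is a sketch of a takewhile()-like class exposing `odd_element` (absent, so `AttributeError` on access, until a false element is seen), a boolean flag, and a dynamic remainder. The class name and the names `has_odd_element` and `remaining` are hypothetical; this API is not part of `itertools`.

```python
from itertools import chain
from statistics import mean


class TakeWhileWithLeftover:
    """Hypothetical takewhile() variant that records the falsifying element."""

    def __init__(self, predicate, iterable):
        self._predicate = predicate
        self._it = iter(iterable)
        self._done = False
        self.has_odd_element = False  # hypothetical boolean flag attribute

    def __iter__(self):
        return self

    def __next__(self):
        if self._done:
            raise StopIteration
        for element in self._it:
            if self._predicate(element):
                return element
            # First false element: store it instead of discarding it.
            self.odd_element = element
            self.has_odd_element = True
            break
        self._done = True
        raise StopIteration

    @property
    def remaining(self):
        """Hypothetical dynamic attribute: the rest of the input, odd element first.

        Each access builds a new chain around the same iterator, so read it once.
        """
        if self.has_odd_element:
            return chain([self.odd_element], self._it)
        return self._it


report = ["# sensor A", "# units: ms", "12", "15", "9"]  # hypothetical data
tw_it = TakeWhileWithLeftover(lambda line: line.startswith("#"), report)
for line in tw_it:
    print("Header:", repr(line))
print(mean(map(int, tw_it.remaining)))  # 12
```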
Rather than graft a funky and atypical API onto takewhile(), it would be better to have a new tool that returns two iterators: the true stream, and a stream for the remaining values. Either stream may be empty. There is no need for a boolean flag attribute or a remaining-stream attribute. This design fits in better with the other itertools. FWIW, we can already do this using groupby(), but it is only easy if we assume the first part of the stream is all true and the remainder of the stream is all false. That isn't good enough for general application.
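A small illustration of the groupby() limitation, using hypothetical data: groupby() starts a new group every time the key value changes, so a remainder that switches back to true values is split across several groups instead of forming one "after" stream.

```python
from itertools import groupby

data = [1, 3, 5, 2, 4, 7, 9]  # hypothetical sample: odd run, even run, odd run again


def is_odd(x):
    return x % 2 == 1


# Materialize each group immediately; groupby() groups are invalidated on advance.
groups = [(key, list(group)) for key, group in groupby(data, key=is_odd)]
print(groups)  # [(True, [1, 3, 5]), (False, [2, 4]), (True, [7, 9])]
```

Here the intended remainder `[2, 4, 7, 9]` is spread over two groups, and if the first element were false, the first group would be a false one.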
I agree Raymond's One minor nit: there's no need for the
Indeed, it confused me at first, because But I persuaded myself there was no such subtle intent; it's just redundant.
There is a core part of the
If you don't use the `after` iterator, then of course you'll never see the values (if any) it would have yielded. How could it possibly be otherwise? By design and construction, the As Raymond said at the start, the If the proposal were instead for
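That said, if you really do want those semantics, it's easy to build on top of Raymond's API:

```python
from itertools import chain, islice

def takewhile_plus_one_more_if_any(pred, iterable):
    # before_and_after() is Raymond's proposed recipe, not a built-in
    before, after = before_and_after(pred, iterable)
    # The true prefix, followed by at most one more element (the first false one)
    return chain(before, islice(after, 1))
```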
Thank you, that answers the questions. The use case where we would want to know whether the last element is transitional or not completely slipped my mind for some reason.
Oh wow, before_and_after will go into the itertools module per that patch? I found this issue while looking for a way to do this, but had written the following implementation:

```python
import itertools

def span(pred, xs):
    # split xs into two iterators a,b where a() is the prefix of xs
    # that satisfies the predicate, and b() is the rest of xs.
    # Similar to Data.List.span in Haskell.
    ixs = iter(xs)
    t = None
    def a():
        nonlocal t
        for x in ixs:
            if pred(x): yield x
            else: break
        t = x
    def b():
        return itertools.chain([t], ixs)
    return a, b

def tspan():  # test
    xs = [1,3,5,2,4,6,8]
    def odd(x): return x%2==1
    # This should print [1,3,5] then [2,4,6,8]
    for p in span(odd, xs):
        print(list(p()))
```
Bah, the above doesn't work in the cases where the iterator is empty or (different symptom) where the tail part is empty. Rather than posting a corrected version (unless someone wants it) I'll just say not to use that code fragment, but that the intended API still looks reasonable. I do support having some kind of solution but don't want to keep stretching out an already closed discussion unless people think there is more to say.
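One way to repair the `span()` fragment above while keeping its closure-based API (this is an editorial sketch, not the commenter's code): record the transition element only when the loop actually breaks, and use a sentinel instead of `None` so that a `None` element in the input is not mistaken for "nothing seen". As before, `a()` must be consumed before `b()` is called. `_MISSING` is a hypothetical helper name.

```python
import itertools

_MISSING = object()  # sentinel meaning "no transition element was seen"


def span(pred, xs):
    """Split xs into a() (prefix satisfying pred) and b() (the rest), like Haskell's span."""
    ixs = iter(xs)
    t = _MISSING

    def a():
        nonlocal t
        for x in ixs:
            if pred(x):
                yield x
            else:
                t = x  # only set when the loop actually breaks on a false element
                break

    def b():
        if t is _MISSING:
            return ixs  # empty input or empty tail: nothing to push back
        return itertools.chain([t], ixs)

    return a, b


def odd(x):
    return x % 2 == 1


for p in span(odd, [1, 3, 5, 2, 4, 6, 8]):
    print(list(p()))  # [1, 3, 5] then [2, 4, 6, 8]
```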