Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

itertools: clearer all_equal recipe #92628

Closed
pochmann opened this issue May 10, 2022 · 17 comments
Closed

itertools: clearer all_equal recipe #92628

pochmann opened this issue May 10, 2022 · 17 comments
Labels
docs Documentation in the Doc dir

Comments

@pochmann
Copy link
Contributor

pochmann commented May 10, 2022

The Itertools Recipes have this recipe:

def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

The next(g, True) tries to read a first group and if there isn't one, then... uh... actually next(g, True) is always true, so that's squeezed into the return expression just for the side effect of skipping the first group. Psych!

I think for official recipes, it would be better to write clearer code. Two ideas for replacing it:

Proposal 1:

def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    next(g, None)
    return not next(g, False)

This makes the first next call a statement that makes it obvious we're not using its result and just do it for skipping the first group. I also replaced True with None to further indicate this.

Proposal 2:

def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return not next(g, False) or not next(g, False)

This reads as "There is no first group or there is no second group". This actually uses the result of the first next(g, False): If it shows that there is no first group, then the function returns True immediately. It doesn't even try to look for a second group (unlike the original recipe and my first proposal, which both do). Also, it's simpler and nice to have the exact same expression not next(g, False) twice.

Really my point is clarity, but for completeness let me also show benchmarks. My first proposal was faster than the original in every case, and the most significant difference is my second proposal being faster for empty iterables.

Python 3.10.4 on Debian (on a Google Compute Engine instance):

iterable = ()
 235 ns   235 ns   235 ns  proposal2 True
 265 ns   265 ns   265 ns  proposal1 True
 265 ns   265 ns   265 ns  original True

iterable = (1,)
 308 ns   308 ns   308 ns  proposal1 True
 314 ns   314 ns   314 ns  original True
 316 ns   316 ns   316 ns  proposal2 True

iterable = (1, 2)
 361 ns   361 ns   361 ns  proposal1 False
 366 ns   366 ns   366 ns  original False
 384 ns   384 ns   384 ns  proposal2 False

Python 3.10.4 on Windows (my laptop):

iterable = ()
 926 ns   928 ns   929 ns  proposal2 True
1063 ns  1065 ns  1065 ns  proposal1 True
1067 ns  1068 ns  1069 ns  original True

iterable = (1,)
1225 ns  1228 ns  1230 ns  proposal1 True
1251 ns  1253 ns  1253 ns  original True
1257 ns  1258 ns  1261 ns  proposal2 True

iterable = (1, 2)
1397 ns  1409 ns  1410 ns  proposal1 False
1417 ns  1417 ns  1418 ns  proposal2 False
1427 ns  1429 ns  1432 ns  original False

The benchmark code:

from timeit import timeit
from random import shuffle
from bisect import insort
from itertools import groupby

def original(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

def proposal1(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    next(g, None)
    return not next(g, False)

def proposal2(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return not next(g, False) or not next(g, False)

funcs = [original, proposal1, proposal2]

for iterable in (), (1,), (1, 2):
    print(f'{iterable = }')
    times = {func: [] for func in funcs}
    for _ in range(1000):
        shuffle(funcs)
        for func in funcs:
            number = 1000
            t = timeit(lambda: func(iterable), number=number) / number
            insort(times[func], t)
    for func in sorted(funcs, key=times.get):
        print(*('%4d ns ' % round(t * 1e9) for t in times[func][:3]), func.__name__, func(iterable))
    print()
@pochmann pochmann added the docs Documentation in the Doc dir label May 10, 2022
@rhettinger
Copy link
Contributor

Thank you for the suggestion, but I find the current one to be more clear (perhaps that's just me).

Also, part of the purpose of the examples is to show various techniques for using the language. The use of "and" is one such technique. Python's "and" is different from some other languages in that it can return non-boolean values. It is okay to use that feature as designed.

Thanks again for the suggestion, but I will decline.

@pochmann
Copy link
Contributor Author

pochmann commented May 10, 2022

it can return non-boolean values. It is okay to use that

Not sure why you're saying that. That's not what happens there. The result of that and and thus the result of the function is boolean. Also, in general I have no issue with using and/or on non-booleans. That's not at all my complaint about it.

The result of the next(g, True) is completely irrelevant for the outcome. Like I said its true in all cases (either a non-empty tuple or True), so in all cases, the and evaluates its right-hand-side and returns the value of that.

That and is rather used for "control flow" there, but not even that, really, since it's always true. It's really just abused to stuff a statement into the expression to save a line of code (at the expense of creating a longer misleading more complicated line).

I've certainly done that myself, I'm no stranger to it. But that was mostly inside list comprehensions, where I don't have the "luxury" of being able to write a separate statement.

I'm interested to see what others think. Should I post this in some mailing list for discussion instead?

@pochmann
Copy link
Contributor Author

One more thing, what do you think about these?

def proposal3(iterable):
    g = groupby(iterable)
    return not any(g) or not any(g)

def proposal4(iterable):
    g = groupby(iterable)
    return not (any(g) and any(g))

(same speed as proposal2)

@blhsing
Copy link
Contributor

blhsing commented Feb 20, 2024

def proposal4(iterable):
    g = groupby(iterable)
    return not (any(g) and any(g))

proposal4 looks the best to me. It's concise, readable, and involves the least number of operations.

I would like to respond to @rhettinger by pointing out that the main problem with the current recipe is that it does not short-circuit when given an empty iterable, making an unnecessary second call to next. @pochmann's proposals remedy this logical issue.

I would also like to add that the wording of the current docstring makes it unclear that the function would return True when given an empty iterable. I suggest that the docstring can instead read: "Returns False only if any element differs from the others."

@alicederyn
Copy link

alicederyn commented Feb 20, 2024

FWIW I find the use of any to partially use up an iterator more obfuscating than using next, which is clearly intended to have that side-effect. I would need to think hard about exactly what any guarantees to do to an iterator every time I read that code.

@pochmann3
Copy link
Contributor

pochmann3 commented Feb 20, 2024

@blhsing To be clear, for me it's really a clarity issue, I find squeezing the always-true next(g, True) into the and misleading. When I see return x and y, I expect x to be possibly false. When that's not the case, it steers me in the wrong direction and takes me extra mental effort to recognize that and course-correct. Makes it harder to understand.

I don't see the unnecessary second next call as the main problem or even really a problem at all. That's rare (only unnecessary if the iterable is empty) and doesn't cost that much time. But yes, avoiding it (without causing other issues) would be nice.

I disagree with the docstring change. The current docstring is fine, matches the function name, and is "positive" instead of "doubly negative" like the suggestion.

(For clarity: I'm the issue creator, new account due to current PC trouble.)

@pochmann3
Copy link
Contributor

pochmann3 commented Feb 23, 2024

FWIW I find the use of any to partially use up an iterator more obfuscating than using next, which is clearly intended to have that side-effect. I would need to think hard about exactly what any guarantees to do to an iterator every time I read that code.

Hmm, I agree that next is even simpler, but any does exactly what you'd expect it to, what you'd naturally do yourself as well: go through the values, stop when you find a true one or reach the end.

And actually, people being less familiar with any could be a reason for using it there. As @rhettinger said above, "part of the purpose of the examples is to show various techniques for using the language." While next is showcased in four recipes, any/all don't appear in any :-(

Yet another, which I think reads particularly easily, as it tells you in words exactly what it does:

def all_equal(iterable):
    groups = groupby(iterable)
    for first in groups:
        for second in groups:
            return False
    return True

But I already know this isn't everyone's cup of tea.

Again, speed is not my point, but I'm curious, so here's an updated benchmark:

iterable = ()
True  242 ± 0.1 ns  for_for_False_True
True  242 ± 0.5 ns  for_for_False_True_True
True  261 ± 0.1 ns  not_any_or_not_any
True  261 ± 0.2 ns  not__any_and_any
True  264 ± 0.2 ns  not__next_and_next
True  264 ± 0.3 ns  not_next_or_not_next
True  296 ± 0.4 ns  next__not_next
True  300 ± 0.2 ns  original

iterable = (1,)
True  343 ± 0.2 ns  for_for_False_True_True
True  359 ± 0.6 ns  for_for_False_True
True  375 ± 0.2 ns  next__not_next
True  380 ± 0.4 ns  not__any_and_any
True  384 ± 0.3 ns  not_any_or_not_any
True  387 ± 0.2 ns  original
True  387 ± 0.4 ns  not__next_and_next
True  388 ± 0.2 ns  not_next_or_not_next

iterable = (1, 2)
False  457 ± 0.3 ns  for_for_False_True
False  457 ± 0.3 ns  for_for_False_True_True
False  484 ± 0.4 ns  next__not_next
False  490 ± 0.7 ns  not__any_and_any
False  495 ± 0.2 ns  not__next_and_next
False  495 ± 0.7 ns  not_any_or_not_any
False  496 ± 0.2 ns  not_next_or_not_next
False  496 ± 0.4 ns  original

Python: 3.12.0 (main, Oct  7 2023, 10:42:35) [GCC 13.2.1 20230801]
Benchmark script

Attempt This Online!

def original(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

def next__not_next(iterable):
    g = groupby(iterable)
    next(g, None)
    return not next(g, False)

def not_next_or_not_next(iterable):
    g = groupby(iterable)
    return not next(g, False) or not next(g, False)

def not_any_or_not_any(iterable):
    g = groupby(iterable)
    return not any(g) or not any(g)

def not__any_and_any(iterable):
    g = groupby(iterable)
    return not (any(g) and any(g))

# blhsing's from https://discuss.python.org/t/slight-improvement-to-the-all-equal-recipe-in-itertools-doc/46390
def not__next_and_next(iterable):
    g = groupby(iterable)
    return not (next(g, False) and next(g, False))

def for_for_False_True(iterable):
    groups = groupby(iterable)
    for first in groups:
        for second in groups:
            return False
    return True

def for_for_False_True_True(iterable):
    groups = groupby(iterable)
    for first in groups:
        for second in groups:
            return False
        return True
    return True

funcs = [original, next__not_next, not_next_or_not_next, not_any_or_not_any, not__any_and_any, not__next_and_next, for_for_False_True, for_for_False_True_True]

iterables = (), (1,), (1, 2)

from timeit import timeit
from statistics import mean, stdev
import sys
import random
from itertools import groupby

for iterable in iterables:
    print(f'{iterable = }')
    times = {f: [] for f in funcs}
    def stats(f):
        ts = [t * 1e9 for t in sorted(times[f])[:10]]
        return f' {mean(ts):3.0f} ± {stdev(ts):3.1f} ns '
    for _ in range(100):
        random.shuffle(funcs)
        for f in funcs:
            t = timeit(lambda: f(iterable), number=1000) / 1000
            times[f].append(t)
    for f in sorted(funcs, key=stats):
        print(f(iterable), stats(f), f.__name__)
    print()

print('Python:', sys.version)

@alicederyn
Copy link

Use of any here relies on knowing that function's return value, its exact side-effect, how truthiness works and how it interacts with what this function is returning in its iterator. If your intention is to set a trick puzzle for code golf then it's very clever, but I didn't think that was the goal of Python's sample code.

@pochmann3
Copy link
Contributor

In my opinion, those things are all worth knowing. Do you not agree? No, neither trick puzzle nor code golf is the intention. Clarity is. Like I said at Stack Overflow, it reads like "All elements are equal iff ... there is no first group or there is no second group.". Or specifically for the any solution: "iff ... there isn't any group or there isn't any (second) group." Very close to natural language. In contrast, return next(g, True) and not next(g, False) has no resemblance to anything I'd say, and I have to carefully decipher that step by step even despite being familiar with it.

@alicederyn
Copy link

alicederyn commented Feb 23, 2024

In my opinion, those things are all worth knowing.

Knowing, yes. Having to think about all four at once just to work out the code is checking the length of an iterator? Disagree.

any (second) group.

That "second" you parenthesized is kind of the point though. The code doesn't say what you say it says. It says "x and x". And "any" is not normally read like that anyway, it's read as "any true". So what you have actually written reads at first as "is not (any group true or is any group true)" which naturally converts to "is any group false or is any group false" to anyone used to booleans. Relying on side-effects and truthiness to turn that into what you meant is what I consider unreadable code golf about it.

I don't think I'm going to convince you at this point, though :)

@alicederyn
Copy link

Oh just to add! I really appreciate your taking the time to explain, even when I don't agree. Thanks!

@CAM-Gerlach CAM-Gerlach reopened this Feb 24, 2024
@pochmann3
Copy link
Contributor

@alicederyn To me it does read like I said, but that's indeed with me already understanding it. Not how it reads "at first", as you said. Thanks for your thoughtful feedback. I'd like to ask for another, about the original issue, which still bothers me. The current recipe and my proposal 1 and proposal 2. How would you judge those three regarding clarity?

@alicederyn
Copy link

I think proposal 2 is the clearest. I like that the empty case is handled explicitly rather than "falling through" the first next call, which is a little bit more mental load otherwise.

@pochmann3
Copy link
Contributor

Thanks. For me, proposal 1 is clearest. Its standalone next call makes it clear that it's done solely to skip the first group. And I don't have to think about what it returns and how the result is used, since it isn't. Also, two short lines look less intimidating than one long one, and allow a full mental stop between them.

But yes, I also agree with your reason for finding proposal 2 clearest. For me, it's in the middle of the three. Meaningfully using the first next call instead of "overruling" whatever it finds like the current recipe makes more sense to me.

@blhsing Do you also find proposal 2 clearer than the current recipe? It's equivalent to yours (except you De Morgan'ed it), but your point was efficiency, you didn't comment on its clarity.

@pochmann3
Copy link
Contributor

This was also discussed on Discourse and the recipe has been changed to the following, making this issue obsolete:

def all_equal(iterable, key=None):
    "Returns True if all the elements are equal to each other."
    return len(take(2, groupby(iterable, key))) <= 1

@hugovk
Copy link
Member

hugovk commented Mar 4, 2024

PR for reference: #116081

@hugovk hugovk closed this as completed Mar 4, 2024
@alicederyn
Copy link

Very clean. Love it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir
Projects
None yet
Development

No branches or pull requests

7 participants