Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior of isdistinct #89

Closed
eriknw opened this issue Nov 11, 2013 · 2 comments
Closed

Behavior of isdistinct #89

eriknw opened this issue Nov 11, 2013 · 2 comments

Comments

@eriknw
Copy link
Member

eriknw commented Nov 11, 2013

itertoolz.isdistinct does not accept iterators, and it does not short-circuit. Both of these issues can be resolved as follows:

def isdistinct(seq):
    seen = set()
    for item in seq:
        if item in seen:
            return False
        seen.add(item)
    return True

Here are benchmarks when all elements are distinct:

l1 = range(1)
l10 = range(10)
l100 = range(100)
l1000 = range(1000)

%timeit isdistinct(l1)
Old: 1000000 loops, best of 3: 968 ns per loop
New: 1000000 loops, best of 3: 1.02 µs per loop

%timeit isdistinct(l10)
Old: 100000 loops, best of 3: 1.86 µs per loop
New: 100000 loops, best of 3: 3.77 µs per loop

%timeit isdistinct(l100)
Old: 100000 loops, best of 3: 8.49 µs per loop
New: 10000 loops, best of 3: 30.1 µs per loop

%timeit isdistinct(l1000)
Old: 10000 loops, best of 3: 65.9 µs per loop
New: 1000 loops, best of 3: 283 µs per loop

Thoughts? Can you think of a better way? Should we allow the current method to be chosen (such as if seq is expected to be distinct nearly all the time)?

@mrocklin
Copy link
Member

Probably not a big deal in either way. I'd suggest either going with what you have now or using the old method if isinstance(seq, [list, tuple, set]). I'm ambivalent between the two.

@mrocklin
Copy link
Member

#94 was merged. Closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants