using key in list for performance needs more qualification #70

Open
tacaswell opened this issue Jul 30, 2015 · 6 comments

@tacaswell

http://docs.quantifiedcode.com/python-anti-patterns/performance/using_key_in_list_to_check_if_key_is_contained_in_a_list.html

def f_a():
    my_list = list(range(500))

    bool(3 in my_list)  # single membership test against the freshly built list

def f_b():
    my_set = set(range(500))

    bool(3 in my_set)  # single membership test against the freshly built set

gives

In [30]: %timeit f_a()
100000 loops, best of 3: 7.7 µs per loop

In [31]: %timeit f_b()
100000 loops, best of 3: 17.5 µs per loop

It is only worth paying the cost of creating a set (O(n log n)) to get the O(log n) lookup speed if you are going to be doing it many times.
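
For illustration, a rough sketch of the many-lookups case (the function names and loop count here are mine, not from the article):

def f_a_many():
    my_list = list(range(500))
    for i in range(500):
        bool(i in my_list)  # each test scans the list: O(n)

def f_b_many():
    my_set = set(range(500))
    for i in range(500):
        bool(i in my_set)  # each test is an average O(1) hash lookup

# %timeit f_a_many()
# %timeit f_b_many()

With enough lookups the one-time conversion cost is amortized away and the set version should win comfortably.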

@vogelsgesang
Contributor

Yes, you are right, we need an additional discussion of the pros and cons of lists vs. sets. I will write it within the next week. It would be perfect if you could be available as a reviewer :)

By the way: I just tried to reproduce your numbers with a slightly different test case:

def f_a():
    my_list = list()
    for i in range(500):
        my_list.append(i)  # build the list one element at a time

    bool(3 in my_list)

def f_b():
    my_set = set()
    for i in range(500):
        my_set.add(i)  # build the set one element at a time

    bool(3 in my_set)

%timeit f_a()
# 10000 loops, best of 3: 27.6 µs per loop

%timeit f_b()
# 10000 loops, best of 3: 33 µs per loop

This time the difference in execution speed is not that big...

@tacaswell
Author

That makes sense though: creating a set by adding elements is still O(n log n), whereas creating a list by appending is (worst case) O(n**2) (Python over-allocates to avoid doing a full copy every time a new element is appended, so it isn't as bad as it could be).

Yes, ping me in the PR.

[edit because I am wrong]

@coady

coady commented Jul 31, 2015

Python sets are implemented with hash tables, so their creation is still O(n). Although there's more constant-time overhead for sets vs. lists - as well as for adding/appending vs. pre-allocating - it's all asymptotically O(n).

And the timing examples are combining the creation time with just 1 membership test. Naturally if you already have a list, and are only testing membership once, then there's no point in converting to a set.

But I agree there's a general anti-pattern here; it just needs more context, such as repeated membership testing in a loop. In mentoring Pythoneers, I have definitely seen the pattern of over list-ifying, instead of using the right data structure. It seems to stem from not trusting duck-typing and iteration as a protocol.
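
For example, a minimal sketch of the pattern I mean (the names are made up):

# Anti-pattern: repeated membership tests against a list inside a loop.
valid_ids = list(range(10000))
events = range(50000)
matched = [e for e in events if e in valid_ids]  # O(len(events) * len(valid_ids))

# Better: convert once, then each test is an average O(1) hash lookup.
valid_id_set = set(valid_ids)
matched = [e for e in events if e in valid_id_set]  # O(len(events) + len(valid_ids))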

@tacaswell
Author

Yeah, I am wrong. Should not post comments before coffee.

A bit more context was all I was asking for.

@vogelsgesang
Contributor

Ok, we will provide more context in the article.

It would be great to have some links concerning the time complexities. I only know https://wiki.python.org/moin/TimeComplexity, but this page does not contain any information about the time needed to create a list/set. Do you know some good links to include in the article?

@rdkr

rdkr commented Apr 21, 2016

I found this issue having just read and tested this anti-pattern, so I agree some further clarification would be good. Perhaps simply changing the title to 'repeatedly using key in list', with an explanation of the complexities and practicalities below (O(n) set creation + O(1) checks vs O(n) checks)?
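
As a rough probe of where that trade-off tips (my own sketch; the break-even point will vary by machine and Python version):

import timeit

for k in (1, 5, 25, 125):
    # k worst-case membership tests against the list: ~k full scans
    list_t = timeit.timeit(
        "sum(x in data for x in range(500, 500 + %d))" % k,
        setup="data = list(range(500))", number=2000)
    # one O(n) conversion plus k average O(1) lookups
    set_t = timeit.timeit(
        "s = set(data); sum(x in s for x in range(500, 500 + %d))" % k,
        setup="data = list(range(500))", number=2000)
    print("k=%3d: list %.4fs  set %.4fs" % (k, list_t, set_t))

For k = 1 the list should win (matching the timings earlier in this thread); as k grows, the set version pulls ahead.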
