using key in list for performance needs more qualification #70

Open
tacaswell opened this issue Jul 30, 2015 · 6 comments

@tacaswell

http://docs.quantifiedcode.com/python-anti-patterns/performance/using_key_in_list_to_check_if_key_is_contained_in_a_list.html

def f_a():
    my_list = list(range(500))

    bool(3 in my_list)  # single membership test against the freshly built list

def f_b():
    my_set = set(range(500))

    bool(3 in my_set)  # single membership test against the freshly built set

gives

In [30]: %timeit f_a()
100000 loops, best of 3: 7.7 µs per loop

In [31]: %timeit f_b()
100000 loops, best of 3: 17.5 µs per loop

It is only worth paying the cost of creating a set (O(n log n)) to get the O(log n) lookup speed if you are going to be doing it many times.
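
For illustration, a rough sketch of the many-lookups case (the function names and loop count here are mine, not from the article):

def f_a_many():
    my_list = list(range(500))
    for i in range(500):
        bool(i in my_list)  # each test scans the list: O(n)

def f_b_many():
    my_set = set(range(500))
    for i in range(500):
        bool(i in my_set)  # each test is an average O(1) hash lookup

# %timeit f_a_many()
# %timeit f_b_many()

With enough lookups the one-time conversion cost is amortized away and the set version should win comfortably.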

@vogelsgesang
Contributor

Yes, you are right, we need an additional discussion of the pros and cons of lists vs. sets. I will write it within the next week. It would be perfect if you could be available as a reviewer :)

By the way: I just tried to reproduce your numbers with a slightly different test case:

def f_a():
    my_list = list()
    for i in range(500):
        my_list.append(i)  # build the list one element at a time

    bool(3 in my_list)

def f_b():
    my_set = set()
    for i in range(500):
        my_set.add(i)  # build the set one element at a time

    bool(3 in my_set)

%timeit f_a()
# 10000 loops, best of 3: 27.6 µs per loop

%timeit f_b()
# 10000 loops, best of 3: 33 µs per loop

This time the difference in execution speed is not that big...

@tacaswell
Author

That makes sense though: creating a set by adding elements is still O(n log n), whereas creating a list by appending is (worst case) O(n**2) (Python over-allocates to avoid doing a full copy every time a new element is appended, so it isn't as bad as it could be).

Yes, ping me in the PR.

[edit because I am wrong]

@coady

coady commented Jul 31, 2015

Python sets are implemented with hash tables, so their creation is still O(n). Although there's more constant-time overhead for sets vs. lists - as well as for adding/appending vs. pre-allocating - it's all asymptotically O(n).

And the timing examples are combining the creation time with just 1 membership test. Naturally if you already have a list, and are only testing membership once, then there's no point in converting to a set.

But I agree there's a general anti-pattern here; it just needs more context, such as repeated membership testing in a loop. In mentoring Pythoneers, I have definitely seen the pattern of over list-ifying, instead of using the right data structure. It seems to stem from not trusting duck-typing and iteration as a protocol.
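
For example, a minimal sketch of the pattern I mean (the names are made up):

# Anti-pattern: repeated membership tests against a list inside a loop.
valid_ids = list(range(10000))
events = range(50000)
matched = [e for e in events if e in valid_ids]  # O(len(events) * len(valid_ids))

# Better: convert once, then each test is an average O(1) hash lookup.
valid_id_set = set(valid_ids)
matched = [e for e in events if e in valid_id_set]  # O(len(events) + len(valid_ids))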

@tacaswell
Author

Yeah, I am wrong. Should not post comments before coffee.

A bit more context was all I was asking for.

@vogelsgesang
Contributor

Ok, we will provide more context in the article.

It would be great to have some links concerning the time complexities. I only know https://wiki.python.org/moin/TimeComplexity, but this page does not contain any information about the time needed to create a list/set. Do you know some good links to include in the article?

@rdkr

rdkr commented Apr 21, 2016

I found this issue having just read and tested this anti-pattern, so I agree some further clarification would be good. Perhaps simply changing the title to 'repeatedly using key in list', with an explanation of the complexities and practicalities below (O(n) set creation + O(1) checks vs O(n) checks)?
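
As a rough probe of where that trade-off tips (my own sketch; the break-even point will vary by machine and Python version):

import timeit

for k in (1, 5, 25, 125):
    # k worst-case membership tests against the list: ~k full scans
    list_t = timeit.timeit(
        "sum(x in data for x in range(500, 500 + %d))" % k,
        setup="data = list(range(500))", number=2000)
    # one O(n) conversion plus k average O(1) lookups
    set_t = timeit.timeit(
        "s = set(data); sum(x in s for x in range(500, 500 + %d))" % k,
        setup="data = list(range(500))", number=2000)
    print("k=%3d: list %.4fs  set %.4fs" % (k, list_t, set_t))

For k = 1 the list should win (matching the timings earlier in this thread); as k grows, the set version pulls ahead.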
