# 2.5. Performance of Python Data Structures
- Big-O performance of the operations on Python lists and dictionaries.
- Timing experiments that illustrate the costs and benefits of using certain operations on each data structure.
- Why the efficiency of these Python data structures matters: they are the building blocks we will use as we implement other data structures in the remainder of the book.

# 2.6. Lists
> Two common operations are indexing and assigning to an index position. Both of these operations take the same amount of time no matter how large the list becomes. An operation like this, whose cost is independent of the size of the list, is $O(1)$.
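
As a rough illustration of the point above, the sketch below reads and writes by index on a small list and a large one, then times the reads. The names `small`, `large`, `t_small`, and `t_large` are made up for this example, and the absolute timings will vary from machine to machine; the expectation is only that the two numbers are of similar size.

```python
import timeit

small = list(range(1_000))
large = list(range(1_000_000))

# Read and write by index; neither operation scans the list.
large[500_000] = -1
middle = large[500_000]

# Time 100,000 index reads on each list. The totals (in seconds)
# should be close to each other despite the 1000x size difference.
t_small = timeit.timeit(lambda: small[500], number=100_000)
t_large = timeit.timeit(lambda: large[500_000], number=100_000)
print("small: %.5f s, large: %.5f s" % (t_small, t_large))
```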

> Another very common programming task is to grow a list. There are two ways to create a longer list: you can use the append method or the concatenation operator. The append method is $O(1)$. However, the concatenation operator is $O(k)$, where $k$ is the size of the list that is being concatenated. This is important to know because it can help you make your own programs more efficient by choosing the right tool for the job.
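
One way to see why the two approaches differ is to check object identity. The minimal sketch below (the variable names are illustrative only) shows that `append` changes the existing list in place, while `+` builds a brand-new list and leaves the old one untouched. Building that new list is the extra work that concatenation has to pay for.

```python
a = [1, 2, 3]
alias = a          # a second name for the same list object

a.append(4)        # in place: alias sees the change
same_after_append = alias is a

a = a + [5]        # builds a new list; alias still names the old one
same_after_concat = alias is a

print(same_after_append, same_after_concat)
print(alias, a)
```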



In [None]:
import timeit
 
 
def test1():
    l = []
    for i in range(1000):
        l = l + [i]
 
def test2():
    l = []
    for i in range(1000):
        l.append(i)
 
def test3():
    l = [i for i in range(1000)]
 
def test4():
    l = list(range(1000)) 
 
# timeit() returns the total time in seconds for `number` runs of the statement.
t1 = timeit.Timer("test1()", "from __main__ import test1")
print("concat ", t1.timeit(number=1000), "seconds")
t2 = timeit.Timer("test2()", "from __main__ import test2")
print("append ", t2.timeit(number=1000), "seconds")
t3 = timeit.Timer("test3()", "from __main__ import test3")
print("comprehension ", t3.timeit(number=1000), "seconds")
t4 = timeit.Timer("test4()", "from __main__ import test4")
print("list range ", t4.timeit(number=1000), "seconds")

reference: https://docs.python.org/3.4/library/timeit.html#

The timings show that the fourth method (`list(range(1000))`) is the fastest, followed by the third (the list comprehension).
`append` is much more efficient than concatenation in a loop. The former is $O(1)$, while each `l = l + [i]` builds a new list and is $O(n)$ in the current length of the list.

In [None]:
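# x is imported from __main__ when timeit runs, so it can be defined after the Timers.
popzero = timeit.Timer("x.pop(0)",
                       "from __main__ import x")
popend = timeit.Timer("x.pop()",
                      "from __main__ import x")

x = list(range(2000000))
print(popzero.timeit(number=1000))  # remove from the front

x = list(range(2000000))
print(popend.timeit(number=1000))   # remove from the end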

> Python's list implementation uses a dynamically resized C array under the hood. Removing an element usually requires moving the elements that follow it up one position to prevent gaps.
>
> - `list.pop()` with no arguments removes the last element. Accessing that element can be done in constant time, and no elements follow it, so nothing needs to be shifted.
> - `list.pop(0)` removes the first element. All remaining elements have to be shifted up one step, so it takes $O(n)$ (linear) time.
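
The shifting described above can be sketched in pure Python. The helper `pop_front_with_shift` is hypothetical and exists only to count the moves; the real `list.pop(0)` does the equivalent work inside the interpreter rather than in a Python loop.

```python
def pop_front_with_shift(items):
    """Remove and return items[0], shifting by hand; returns (value, moves)."""
    first = items[0]
    moves = 0
    for i in range(1, len(items)):
        items[i - 1] = items[i]   # move each later element one slot left
        moves += 1
    items.pop()                   # drop the now-duplicated last slot: O(1)
    return first, moves


data = list(range(10))
value, moves = pop_front_with_shift(data)
print(value, moves, data)
```

For a list of length $n$ the helper performs $n - 1$ moves, which is why removing from the front is linear while removing from the end is constant.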

reference : https://stackoverflow.com/questions/34633178/why-is-the-big-o-of-pop-different-from-pop0-in-python

### List Operations and Their Big-O Efficiency
|Operation|Big-O Efficiency|
|-----|-----|
|index []|$O(1)$|
|index assignment|$O(1)$|
|append|$O(1)$|
|pop()|$O(1)$|
|pop(i)|$O(n)$|
|insert(i,item)|$O(n)$|
|del operator|$O(n)$|
|iteration|$O(n)$|
|contains (in)|$O(n)$|
|get slice [x:y]|$O(k)$|
|del slice|$O(n)$|
|set slice|$O(n+k)$|
|reverse|$O(n)$|
|concatenate|$O(k)$|
|sort|$O(n\log{}n)$|
|multiply|$O(nk)$|

more information : http://pythontutor.com/visualize.html#mode=display

In [None]:
popzero = timeit.Timer("x.pop(0)",
                "from __main__ import x")
popend = timeit.Timer("x.pop()",
               "from __main__ import x")
print("pop(0)   pop()")
pt_list=[]
pz_list=[]
for i in range(1000,100001,1000):
    x = list(range(i))
    pt = popend.timeit(number=10)
    pt_list.append(pt)
    x = list(range(i))
    pz = popzero.timeit(number=10)
    pz_list.append(pz)
    print("%15.5f, %15.5f" %(pz,pt))

In [None]:
%matplotlib inline

from matplotlib import pyplot as pl
import numpy as np

fig = pl.figure()
t = range(1000,100001,1000)
pl.plot(t, pz_list, 'bo', label=u'pop(0)')
pl.plot(t, pt_list, 'r-', label=u'pop( )')

pl.xlabel('list size')
pl.ylabel('time(sec)')

pl.legend(loc='upper left')

# 2.7. Dictionaries
- **As you probably recall, dictionaries differ from lists in that you can access items in a dictionary by a key rather than a position.**
- The thing that is most important to notice right now is that the get item and set item operations on a dictionary are $O(1)$.
- Another important dictionary operation is the contains operation: checking whether a key is in the dictionary is also $O(1)$.
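
To make the difference between list and dictionary membership concrete without relying on timings, the sketch below counts equality comparisons. `CountingKey` is a hypothetical helper class written for this illustration; it is not part of Python or of the original text.

```python
class CountingKey:
    """Hypothetical key type that counts how often it is compared."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


keys = [CountingKey(i) for i in range(1000)]
target = CountingKey(999)          # equal to the last key, but a new object

# List membership: compares against elements one by one until a match.
CountingKey.comparisons = 0
found_in_list = target in keys
list_comparisons = CountingKey.comparisons

# Dictionary membership: hashes the key and jumps to the right slot.
lookup = {k: None for k in keys}
CountingKey.comparisons = 0
found_in_dict = target in lookup
dict_comparisons = CountingKey.comparisons

print(list_comparisons, dict_comparisons)
```

The list search compares against every element before it finds the match at the end, while the dictionary needs only a handful of comparisons regardless of how many keys it holds.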

|Operation|Big-O Efficiency|
|-----|-----|
|copy|$O(n)$|
|get item|$O(1)$|
|set item|$O(1)$|
|delete item|$O(1)$|
|contains (in)|$O(1)$|
|iteration|$O(n)$|

In [None]:
import random


lst_list=[]
dic_list=[]

for i in range(1000,100001,2000):
    t = timeit.Timer("random.randrange(%d) in x"%i,
                     "from __main__ import random,x")
    x = list(range(i))
    lst_time = t.timeit(number=10)
    lst_list.append(lst_time)
    x = {j:None for j in range(i)}
    dic_time = t.timeit(number=10)
    dic_list.append(dic_time)
    print("%d,%10.3f,%10.3f" % (i, lst_time, dic_time))

In [None]:
fig = pl.figure()
t = range(1000,100001,2000)
pl.plot(t, lst_list, 'bo', label=u'list')
pl.plot(t, dic_list, 'r-', label=u'dictionary')


pl.xlabel('size')
pl.ylabel('time(sec)')

pl.legend(loc='upper left')

# 2.8. Summary
- Algorithm analysis is an implementation-independent way of measuring an algorithm.
- Big-O notation allows algorithms to be classified by their dominant process with respect to the size of the problem.

Reference : 
- http://interactivepython.org/runestone/static/pythonds/AlgorithmAnalysis/toctree.html
- https://github.com/physhik/Python-algorithm-study/blob/master/algorithm%20chapter%203%20(%20analysis%20and%20list%20).ipynb