# Week 3, Day 2, Afternoon Session
## Merge Sort

In [3]:
import time
import functools
%run boaz_utils.ipynb

**SelectionSort** takes $O(n^2)$ time. Can we sort faster?

In [10]:
# A and B are each already sorted; returns a new sorted
# list containing all the elements of both
def merge(A, B):
    C = []
    A_idx = 0
    B_idx = 0
    # "while neither list is empty"
    # (we are trying to merge A[A_idx:] and B[B_idx:])
    while A_idx < len(A) and B_idx < len(B):
        if A[A_idx] < B[B_idx]:
            C.append(A[A_idx])
            A_idx += 1
        else:
            C.append(B[B_idx])
            B_idx += 1
    return C + A[A_idx:] + B[B_idx:]
            
def merge_sort(L):
    if len(L) == 0:
        return []
    elif len(L) == 1:
        return [L[0]]
    A = merge_sort(L[:len(L)//2])
    B = merge_sort(L[len(L)//2:])
    return merge(A, B)

In [11]:
merge_sort([5,4,3,2,1])

[1, 2, 3, 4, 5]
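
As a sanity check on what a merge step should produce, Python's standard library has `heapq.merge`, which also merges already-sorted inputs. It returns an iterator, so we wrap it in `list`. The two input lists below are made-up examples, not from the lecture.

```python
import heapq

# two small, already-sorted example lists (hypothetical data)
A = [1, 4, 6]
B = [2, 3, 7]

# heapq.merge yields the elements of both inputs in sorted order
merged = list(heapq.merge(A, B))
print(merged)  # [1, 2, 3, 4, 6, 7]
```

Our own `merge(A, B)` should return the same list for these inputs.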

In [18]:
# can also use a recursive implementation of merge
def recursive_merge(A, B):
    if len(A) == 0:
        return B[:]
    elif len(B) == 0:
        return A[:]
    else:
        if A[0] < B[0]:
            return [A[0]] + recursive_merge(A[1:], B)
        else:
            return [B[0]] + recursive_merge(A, B[1:])
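
One caveat with the recursive version: it makes one recursive call per element merged, so long inputs can exceed Python's recursion limit. The sketch below repeats `recursive_merge` so that it runs on its own, and picks the list length from `sys.getrecursionlimit()` (the variable names `n` and `overflowed` are just for illustration).

```python
import sys

def recursive_merge(A, B):
    if len(A) == 0:
        return B[:]
    elif len(B) == 0:
        return A[:]
    elif A[0] < B[0]:
        return [A[0]] + recursive_merge(A[1:], B)
    else:
        return [B[0]] + recursive_merge(A, B[1:])

# small inputs work fine
print(recursive_merge([1, 3], [2, 4]))  # [1, 2, 3, 4]

# every element of A is smaller than every element of B, so the
# recursion goes about n levels deep before A becomes empty
n = sys.getrecursionlimit()
A = list(range(n))
B = list(range(n, 2 * n))
try:
    recursive_merge(A, B)
    overflowed = False
except RecursionError:
    overflowed = True
print(overflowed)  # True
```

The iterative `merge` uses a loop instead of recursion, so it does not have this problem.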

In [20]:
merge_sort([5,4,3,2,1])

[1, 2, 3, 4, 5]

In [21]:
print(list(range(10,0,-1)))

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]


In [22]:
big_list = list(range(20000,0,-1))

In [30]:
merge_sort(big_list)
'finished'

'finished'

In [31]:
# remember this slow O(n^2) implementation...
def iterative_selection_sort(L):
    A = L[:]
    for i in range(len(A)):
        # try to find index idx of the min element in A[i:],
        # then move it to A[i]
        idx = i
        for j in range(i+1, len(A)):
            if A[j] < A[idx]:
                idx = j
        
        # swap contents of A[i] and A[idx]
        tmp = A[i]
        A[i] = A[idx]
        A[idx] = tmp
    return A

In [32]:
iterative_selection_sort(big_list)
'finished'

'finished'
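
To see the difference in numbers rather than by feel, we can time a call with the `time` module. This is a minimal sketch: `time_sort` is a helper name made up here, and the built-in `sorted` is used only as a stand-in so the cell runs on its own. In the notebook you would pass `merge_sort` and `iterative_selection_sort` instead.

```python
import time

def time_sort(sort_fn, data):
    """Run sort_fn(data) once; return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = sort_fn(data)
    elapsed = time.perf_counter() - start
    return result, elapsed

# `sorted` is a stand-in; swap in merge_sort or iterative_selection_sort
data = list(range(2000, 0, -1))
result, seconds = time_sort(sorted, data)
print(f"sorted {len(data)} items in {seconds:.4f} seconds")
```

A single run is noisy, so for a fair comparison it helps to repeat each measurement a few times.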

## Running time of MergeSort

Merging two sorted lists, each of size $k$, takes about $2k$ steps. Starting from lists of size $1$ and working upward:

* We merge $n/2$ pairs of lists each of size $1$, taking $2\times \frac n2 = n$ steps
* We merge $n/4$ pairs of lists each of size $2$, taking $4\times \frac n4 = n$ steps
* We merge $n/8$ pairs of lists each of size $4$, taking $8\times \frac n8 = n$ steps

and so forth. So the total number of steps is $n \times ($the number of list sizes we process$)$. Assume for simplicity that $n$ is a power of $2$. The last list size we process is the one where each list is half the original list, so that list size is $2^k = n/2$. This value of $k$ is then $\log_2(n/2) = \log_2(n) - 1$. The list sizes we process are $2^0, 2^1, \ldots, 2^k$, so there are $k+1 = \log_2 n$ of them, and the total number of steps is proportional to $n(k+1) = n\log_2 n$.
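
We can check this count with a small experiment. The sketch below is an instrumented copy of merge sort (the names `merge_count` and `merge_sort_count` are made up here) that charges each merge one step per element written to the output, matching the "about $2k$ steps for two lists of size $k$" accounting. For $n$ a power of $2$ the total comes out to exactly $n\log_2 n$.

```python
import math

def merge_count(A, B):
    """Merge sorted lists A and B; return (merged list, elements written)."""
    C = []
    i = j = 0
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            C.append(A[i])
            i += 1
        else:
            C.append(B[j])
            j += 1
    C += A[i:] + B[j:]
    return C, len(C)

def merge_sort_count(L):
    """Return (sorted copy of L, total elements written across all merges)."""
    if len(L) <= 1:
        return L[:], 0
    mid = len(L) // 2
    A, steps_a = merge_sort_count(L[:mid])
    B, steps_b = merge_sort_count(L[mid:])
    C, steps_c = merge_count(A, B)
    return C, steps_a + steps_b + steps_c

for n in [8, 64, 1024]:
    _, steps = merge_sort_count(list(range(n, 0, -1)))
    print(n, steps, n * int(math.log2(n)))
# prints:
# 8 24 24
# 64 384 384
# 1024 10240 10240
```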

Central to computer science is the idea of _how to represent information_, and then how to create algorithms or programs that process those representations to compute useful things.

For example, I may be a shopkeeper and want to keep track of the items I've sold throughout the day, and how much I was paid for them.

`L = [ ['soap', 250], ['yam', 500] , ['yam', 500], ... ]`

Then, given that representation, I may want to compute various things, such as "what were my gross sales yesterday?" or "how many times did I sell a yam?" I can then write programs to process this list `L` to find these answers.
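
For instance, both questions can be answered with one pass over the list. The three-entry list below is a made-up stand-in for a full day of sales, and the variable names are only for illustration.

```python
# hypothetical day of sales: each entry is [item, amount paid]
sales = [['soap', 250], ['yam', 500], ['yam', 500]]

# gross sales: add up the amount paid in every entry
gross_sales = sum(amount for item, amount in sales)

# number of yams sold: count the entries whose item is 'yam'
yams_sold = sum(1 for item, amount in sales if item == 'yam')

print(gross_sales)  # 1250
print(yams_sold)    # 2
```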

A graph is yet another way to represent information. With a graph, I want to represent information about _connections_ between pairs of objects. For example, the "objects" could be people, and a "connection" could be that one person follows another on Instagram; or the "objects" could be road intersections, and the "connections" could be the road segments connecting one intersection to another.

When we talk about graphs, we call these "objects" **vertices**, and we call the "connections" **edges**. We will see more about graphs, and how to run algorithms on graphs, in the afternoon.
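
As a preview, here is one possible way to store a small graph in Python: a dictionary mapping each vertex to the list of vertices it has an edge to. This is only a sketch under assumptions; the names and "follows" pairs are invented, and other representations work too.

```python
# hypothetical "follows" relationships: each pair (u, v) is an edge u -> v
edges = [('amina', 'boaz'), ('boaz', 'chidi'), ('amina', 'chidi')]

# map each vertex to the list of vertices it points to
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, [])  # make sure every vertex appears as a key

print(graph)
# {'amina': ['boaz', 'chidi'], 'boaz': ['chidi'], 'chidi': []}
```

With this layout, `graph['amina']` lists everyone that `'amina'` follows.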