# Sorting
We have already seen an implementation of the _selection sort_ and a demonstration of the _merge sort_. Today we shall look at how to think about the _merge sort_ and consequently how to implement it.

We'll also look at the time complexities of the sorts covered.



In [10]:
%config InteractiveShell.ast_node_interactivity="none"

In [None]:
%run boaz_utils.ipynb


## Merge Sort

Recall the procedure that some of you participated in to illustrate how the _merge sort_ works.

- Repeatedly divide the list into 2 halves until each piece has only 1 or 2 elements
- If a piece has 2 elements that are out of order, swap them
- Repeatedly merge the resulting sorted pieces until we have the whole original list in sorted order
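
The three steps above can be sketched directly in code. This is only an illustration of the classroom procedure, not the implementation developed later in these notes: the name `classroom_merge_sort` is made up for this example, and the merge step is borrowed from the standard library's `heapq.merge`, on the assumption that we do not yet have our own merge function.

```python
import heapq


def classroom_merge_sort(items):
    """Return a new sorted list, splitting down to pieces of 1 or 2 elements."""
    n = len(items)
    if n <= 1:
        # A piece with 0 or 1 elements is already sorted
        return list(items)
    if n == 2:
        # A piece with 2 elements: swap them if they are out of order
        a, b = items
        return [a, b] if a <= b else [b, a]
    # Divide into 2 halves, sort each half, then merge the sorted halves
    mid = n // 2
    left = classroom_merge_sort(items[:mid])
    right = classroom_merge_sort(items[mid:])
    return list(heapq.merge(left, right))


print(classroom_merge_sort([12, 19, 7, 3, 21, 42, 71, 31, 53]))
```

The 2-element case is not strictly needed, since a piece of 2 could be split once more, but it mirrors what was done in the demonstration.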

## The _merge_ process

In [11]:
def rec_merge(lst1, lst2):
    """
    Return a new list containing the elements of both in sorted order
    lst1 and lst2 must already be in sorted order.
    """
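    # Let us try a purely recursive approach
    # Base case(s): if one list is empty, the answer is a copy of the other
    # (a copy, so that we really do return a new list as promised)
    if len(lst1) == 0:
        return list(lst2)
    elif len(lst2) == 0:
        return list(lst1)
    else:
        # Recursive case(s): take the smaller front element, merge the rest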
        e1 = lst1[0]
        e2 = lst2[0]
        if e1 <= e2:
            return [e1] + rec_merge(lst1[1:], lst2)
        else:
            return [e2] + rec_merge(lst1, lst2[1:])


In [12]:
test_lst1 = [10, 19, 32, 50, 71]
test_lst2 = [8, 12, 15, 21, 33, 80]
print(rec_merge(test_lst1, test_lst2))

[8, 10, 12, 15, 19, 21, 32, 33, 50, 71, 80]


In [13]:
def iter_merge(lst1, lst2):
    """
    Return a new list containing the elements of both in sorted order
    lst1 and lst2 must already be in sorted order.
    """
    # Let us try this using for / while loops
    # What do we need to keep track of? (ie the _state_ of the computation)
    # How does it change throughout the loop?
    # How does it affect the termination of the loop?
    n1 = len(lst1)
    n2 = len(lst2)
    result = []
    i = 0  # index into lst1
    j = 0  # index into lst2
    while i < n1 and j < n2:
        e1 = lst1[i]
        e2 = lst2[j]
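        if e1 <= e2:
            # Take from lst1 and advance its index
            result.append(e1)
            i += 1
        else:
            # Take from lst2 and advance its index
            result.append(e2)
            j += 1
    # One list is used up; whatever remains of the other is already sorted
    if i >= n1:
        result.extend(lst2[j:])
    else:
        result.extend(lst1[i:])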
    return result


In [None]:
print(iter_merge(test_lst1, test_lst2))

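Note that our recursive solution depends on Python list slices. They are convenient to write, but each slice makes a copy of the segment of the list that it denotes, so a slice of length k costs time proportional to k. If we care about performance, that might be more expensive than we can afford: `rec_merge` takes a slice at every step, so the total amount of copying can grow quadratically with the length of the inputs.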
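
To get a feel for that cost, here is a small sketch that counts the copying. The function `merge_counting_copies` is a hypothetical, instrumented variant of the recursive merge written only for this illustration; it counts the elements copied by slices and ignores the extra copying done by the list concatenation.

```python
def merge_counting_copies(lst1, lst2):
    """Return (merged, copied), where copied is the number of elements
    copied by slices while merging the sorted lists lst1 and lst2."""
    if len(lst1) == 0:
        return list(lst2), 0
    if len(lst2) == 0:
        return list(lst1), 0
    if lst1[0] <= lst2[0]:
        rest = lst1[1:]  # this slice copies len(lst1) - 1 elements
        merged, copied = merge_counting_copies(rest, lst2)
        return [lst1[0]] + merged, copied + len(rest)
    else:
        rest = lst2[1:]  # this slice copies len(lst2) - 1 elements
        merged, copied = merge_counting_copies(lst1, rest)
        return [lst2[0]] + merged, copied + len(rest)


# A slice is a separate list: changing it leaves the original alone
original = [10, 19, 32]
tail = original[1:]
tail[0] = 99
print(original)  # [10, 19, 32]

merged, copied = merge_counting_copies([10, 19, 32, 50, 71],
                                       [8, 12, 15, 21, 33, 80])
print(merged, copied)
```

Merging these 11 elements copies 25 elements through slices alone, which is already more than twice the size of the output.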

We can still leverage the power of recursion though, by avoiding the slicing, and thinking in terms of the indices we are working with, in order to design the "wishful thinking" step. Some people refer to such a recursive function as a _helper_ function.  If you look closely enough, you will see similarities with the iterative solution. Here is an example:

In [None]:
def rec2_merge(lst1, lst2):
    """
    Return a new list containing the elements of both in sorted order
    lst1 and lst2 must already be in sorted order.
    """
    # A recursive implementation that does not use slices
    # What are the components of the _state_ of the computation?
    # What can we say is true about the state?
    n1 = len(lst1)
    n2 = len(lst2)
    def helper(i, j, result):
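        # result holds the merge of lst1[:i] and lst2[:j], in sorted order
        if i >= n1:
            # lst1 is used up: copy the rest of lst2, element by element
            for k in range(j, n2):
                result.append(lst2[k])
            return result
        elif j >= n2:
            # lst2 is used up: copy the rest of lst1, element by element
            for k in range(i, n1):
                result.append(lst1[k])
            return result
        else:
            e1 = lst1[i]
            e2 = lst2[j]
            if e1 <= e2:
                result.append(e1)
                return helper(i + 1, j, result)
            else:
                result.append(e2)
                return helper(i, j + 1, result)
    # Start at the front of both lists with an empty result
    return helper(0, 0, [])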

In [None]:
print(rec2_merge(test_lst1, test_lst2))

[8, 10, 12, 15, 19, 21, 32, 33, 50, 71, 80]


Let us settle on one of the implementations and call it _merge_.

In [None]:
def merge(lst1, lst2):
    return rec_merge(lst1, lst2)

## Returning to _Merge Sort_

Now that we have figured out how to merge two lists, we can return to the overall process of the merge sort to think about how to implement it.

If we try to follow the details, as the computer would do, it looks very complicated.  Questions that naturally arise include:
- How do we write a loop to subdivide the right number of times?
- How are we keeping track of where we are in the process?
- Where are we storing the interim sorted sub-lists?

There is an easier way: think recursively. Use wishful thinking to imagine that the sub-problem has already been solved, and ask ourselves how we would combine the answers if we had them.

In [None]:
def merge_sort(lst):
    """Return a list containing the elements of lst in sorted order"""
    # What are good base cases?
    # What do we do in the general case?
    n = len(lst)
    if n <= 1:
        return lst
    else:
        m = n//2
        left = lst[:m]
        right = lst[m:]
        return merge(merge_sort(left),
                     merge_sort(right))

In [None]:
test_list = [12, 19, 7, 3, 21, 42, 71, 31, 53]
print(merge_sort(test_list))

[3, 7, 12, 19, 21, 31, 42, 53, 71]
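
To see how the recursive calls unfold, and where the interim sorted sub-lists live, it can help to record each split and each merge. The sketch below is an illustration only: `traced_merge_sort` and its `depth` and `log` parameters are made up for this example, and the merge step uses the standard library's `heapq.merge` so that the block runs on its own.

```python
import heapq


def traced_merge_sort(lst, depth=0, log=None):
    """Return (sorted_list, log), where log records each split and merge."""
    if log is None:
        log = []
    indent = "  " * depth  # deeper recursive calls are indented further
    if len(lst) <= 1:
        return list(lst), log
    m = len(lst) // 2
    log.append(f"{indent}split {lst[:m]} | {lst[m:]}")
    left, _ = traced_merge_sort(lst[:m], depth + 1, log)
    right, _ = traced_merge_sort(lst[m:], depth + 1, log)
    merged = list(heapq.merge(left, right))
    log.append(f"{indent}merge -> {merged}")
    return merged, log


result, log = traced_merge_sort([12, 19, 7, 3])
print("\n".join(log))
# split [12, 19] | [7, 3]
#   split [12] | [19]
#   merge -> [12, 19]
#   split [7] | [3]
#   merge -> [3, 7]
# merge -> [3, 7, 12, 19]
```

Each interim sorted sub-list is simply the return value of a recursive call, so no explicit bookkeeping is needed to keep track of it.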
