# [CptS 215 Data Analytics Systems and Algorithms](https://github.com/gsprint23/cpts215)
[Washington State University](https://wsu.edu)

[Gina Sprint](http://eecs.wsu.edu/~gsprint/)
# Merge Sort

Learner objectives for this lesson:
* Implement the merge sort algorithm
* Perform algorithm analysis of merge sort

## Acknowledgments
Content used in this lesson is based upon information in the following sources:
* [Dr. Ananth Kalyanaraman](http://www.eecs.wsu.edu/~ananth/)'s CptS 223 notes

## Merge Sort
Merge sort is a divide-and-conquer algorithm for sorting a sequence. The divide-and-conquer strategy recursively breaks a problem into two or more subproblems until the subproblems become simple enough to be solved directly. The solutions to the subproblems are then combined to give a solution to the original problem. Merge sort is naturally implemented recursively, though it can also be implemented iteratively.
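To illustrate the iterative option mentioned above, here is a minimal bottom-up sketch. It is one possible approach, not the implementation used in this lesson, and the names `merge_sorted` and `merge_sort_bottom_up` are made up for this example. It assumes plain Python lists and repeatedly merges neighboring sorted runs of width 1, 2, 4, and so on.

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into one new sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # <= takes from the left list on ties, which keeps the sort stable
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # at most one of these two slices is non-empty
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort_bottom_up(items):
    """Return a new sorted list without using recursion."""
    result = list(items)
    n = len(result)
    width = 1  # length of the runs that are already sorted
    while width < n:
        # merge each pair of neighboring runs of length `width`
        for start in range(0, n, 2 * width):
            mid = min(start + width, n)
            end = min(start + 2 * width, n)
            result[start:end] = merge_sorted(result[start:mid], result[mid:end])
        width *= 2
    return result


print(merge_sort_bottom_up([5, 2, 4, 7, 1, 3, 2, 6]))  # prints [1, 2, 2, 3, 4, 5, 6, 7]
```

The outer loop doubles the run width each pass, so it runs about $\log_2 n$ times, and each pass touches every item once.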

Big picture: Keep dividing the sequence into two subsequences until the subsequences are singletons (single-item sequences). Then successively merge pairs of subsequences in sorted order until a single sorted sequence remains.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Merge_sort_algorithm_diagram.svg/1064px-Merge_sort_algorithm_diagram.svg.png" width="500">
(image from [https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Merge_sort_algorithm_diagram.svg/1064px-Merge_sort_algorithm_diagram.svg.png](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Merge_sort_algorithm_diagram.svg/1064px-Merge_sort_algorithm_diagram.svg.png))
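The divide and merge phases can also be watched in code. The sketch below is a hypothetical helper (`merge_sort_trace` is not part of this lesson's implementation) that assumes a plain Python list, prints each split and each merge indented by recursion depth, and returns a sorted copy.

```python
def merge_sort_trace(items, depth=0):
    """Return a sorted copy of items, printing each split and merge."""
    indent = "  " * depth
    if len(items) <= 1:
        # a singleton (or empty) sequence is already sorted
        return list(items)
    mid = len(items) // 2
    print(f"{indent}split {items[:mid]} | {items[mid:]}")
    left = merge_sort_trace(items[:mid], depth + 1)
    right = merge_sort_trace(items[mid:], depth + 1)
    # merge the two sorted halves into a new list
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    print(f"{indent}merge -> {merged}")
    return merged


merge_sort_trace([6, 2, 8, 4])
```

Reading the printed lines from top to bottom shows the splits going down to singletons before any merge happens at that level.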

Algorithm $MergeSort(A)$:
1. Let $A$ be an array and $n$ be the number of elements in $A$
1. If $n\leq 1$, return $A$
1. $MergeSort(A[0:\lfloor\frac{n}{2}\rfloor])$
1. $MergeSort(A[\lfloor\frac{n}{2}\rfloor:n])$
1. Merge the two sorted halves $A[0:\lfloor\frac{n}{2}\rfloor]$ and $A[\lfloor\frac{n}{2}\rfloor:n]$
    
Python implementation using NumPy arrays (the array is sorted in place):

import numpy.random as rand

def merge_sort(array):
    '''
    Sort array (a NumPy ndarray) in place using merge sort.
    '''
    if len(array) <= 1:
        return
    mid = len(array) // 2
    # slicing an ndarray returns a view, not a copy, so copy each half;
    # otherwise merge() would overwrite values it still needs to read
    left = array[:mid].copy()
    right = array[mid:].copy()
    merge_sort(left)
    merge_sort(right)
    merge(array, left, right)
    
def merge(array, lefthalf, righthalf):
    '''
    Merge the sorted sequences lefthalf and righthalf into array,
    overwriting the contents of array.
    '''
    i = j = k = 0
    while i < len(lefthalf) and j < len(righthalf):
        # <= takes from the left half on ties, which keeps the sort stable
        if lefthalf[i] <= righthalf[j]:
            array[k]=lefthalf[i]
            i=i+1
        else:
            array[k]=righthalf[j]
            j=j+1
        k=k+1

    while i < len(lefthalf):
        array[k]=lefthalf[i]
        i=i+1
        k=k+1

    while j < len(righthalf):
        array[k]=righthalf[j]
        j=j+1
        k=k+1
   
data = rand.randn(20)
print(data)
merge_sort(data)
print(data)

[ 0.36131257  0.33463664  1.069625   -0.85734736  1.35750981 -2.55610712
  0.86628731 -0.13997897 -1.63650906 -1.28540182 -0.77923464 -0.65887848
 -1.16948865 -0.98777702 -1.09992097 -0.13780627  0.00390041  0.11927134
 -1.58032382 -1.91544264]
[-2.55610712 -1.91544264 -1.63650906 -1.58032382 -1.28540182 -1.16948865
 -1.09992097 -0.98777702 -0.85734736 -0.77923464 -0.65887848 -0.13997897
 -0.13780627  0.00390041  0.11927134  0.33463664  0.36131257  0.86628731
  1.069625    1.35750981]


### Merge Sort Time Complexity
Let $T(n)$ be the time to sort an array of $n$ numbers using merge sort. Each call makes two recursive calls on halves of size $\frac{n}{2}$ and then *merges* the two sorted halves. The merge step costs $\mathcal{O}(n)$ time in the worst case. Therefore, the recurrence for sorting $n$ numbers is as follows:

\begin{eqnarray*}
 T(n) &=&  2\times T(\frac{n}{2}) + \mathcal{O}(n) \\
      &=&  2 (2\times T(\frac{n}{2^2}) + \mathcal{O}(\frac{n}{2})) + \mathcal{O}(n) \\
      &=&  2^2\times T(\frac{n}{2^2}) + \mathcal{O}(n) + \mathcal{O}(n) \\
      & &  \ldots \textrm{ (after } k \textrm{ steps)} \\
      &=&  2^k\times T(\frac{n}{2^k}) + \sum_{i=1}^{k} \mathcal{O}(n)
\end{eqnarray*}

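The recursion terminates when $\frac{n}{2^k}=1$ (i.e., when the problem size shrinks to 1), which implies $k=\log_2 n$. Substituting this value of $k$ gives $T(n) = n\times T(1) + \mathcal{O}(n)\times \log_2 n$. Since $T(1)$ is a constant, $T(n)=\mathcal{O}(n\times \log n)$.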
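As a quick numerical check of this result, the sketch below evaluates the recurrence directly. It makes two simplifying assumptions that are not part of the derivation above: the $\mathcal{O}(n)$ merge cost is taken to be exactly $n$, and $T(1)=0$. Under those assumptions, $T(n)$ equals $n\log_2 n$ exactly when $n$ is a power of two. The function name `recurrence` is made up for this example.

```python
def recurrence(n):
    """Evaluate T(n) = 2*T(n/2) + n with T(1) = 0, for n a power of two."""
    if n == 1:
        return 0
    return 2 * recurrence(n // 2) + n


for k in range(1, 11):
    n = 2 ** k
    # since n = 2**k, the closed form n * log2(n) is simply n * k
    print(f"n = {n:5d}   T(n) = {recurrence(n):6d}   n * log2(n) = {n * k:6d}")
```

The last two columns agree on every row, which matches the closed form derived above.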

In summary:
* Average case: $\mathcal{O}(n \log n)$
* Worst case: $\mathcal{O}(n \log n)$
* Best case: $\Omega(n \log n)$
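One way to see why the best, average, and worst cases share the same growth rate is to count element comparisons on different inputs. The sketch below is an illustration under assumptions: `count_comparisons` is a hypothetical helper, it works on plain Python lists, and it counts only comparisons between elements, not the copying work (which is the same for every input).

```python
import math
import random


def count_comparisons(items):
    """Merge sort a copy of items; return (sorted_list, number_of_comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = count_comparisons(items[:mid])
    right, right_count = count_comparisons(items[mid:])
    comparisons = left_count + right_count
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one element comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


n = 1024
random.seed(0)  # fixed seed so the random input is repeatable
inputs = {
    "sorted": list(range(n)),
    "reversed": list(range(n, 0, -1)),
    "random": random.sample(range(n), n),
}
for name, data in inputs.items():
    result, comparisons = count_comparisons(data)
    assert result == sorted(data)
    print(f"{name:>8}: {comparisons} comparisons, n * log2(n) = {int(n * math.log2(n))}")
```

For every input the count stays within a constant factor of $n \log_2 n$, since the array is always halved down to singletons regardless of its initial order.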

## Practice Problems

### 1
For the following list, 34 50 25 16 60 82 76 5 25, walk through the merge sort algorithm and show the state of the list at each pass.

### 2
What determines whether you should use a quadratic ($\mathcal{O}(n^2)$) sort or a linearithmic ($\mathcal{O}(n \log n)$) sort?