# _CLRS: Algorithms & Data Structures_
Nathan Sharp | 2019-2020
<br/><br/>
***
### ___$I$ Foundations___
# Chapter 2 Getting Started

In [3]:
import math
import random

In [4]:
def rand_arr(n):
    """Initialise a randomly sampled array of size n"""
    return [random.randrange(1, n) for _ in range(n)]
    
A = rand_arr(10)

# testing
print(A)

[7, 2, 4, 5, 7, 4, 8, 1, 8, 7]


## 2.1 Insertion Sort
***
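Insertion sort iterates over the array from left to right. Each element (the key) is compared with the already-sorted elements to its left, working from right to left; larger elements are shifted one position to the right until the key's place is found, and the key is inserted there.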

In [67]:
def insertSort(A):
    for j in range(1, len(A)):
        key = A[j] 
        # insert A[j] into the sorted sequence A[0..j-1]
        i = j-1
        while (i >= 0) and (A[i] > key):
            A[i+1] = A[i]
            i -= 1
        A[i+1] = key
    return A

# testing
insertSort(A)

[3, 3, 4, 4, 5, 5, 6, 7, 8, 8]

## 2.2 Analysing Algorithms
***
When analysing algorithms we assume the random-access machine (RAM) model of computation. The RAM model contains instructions commonly found in real computers: 

#### RAM Model of Computation
- arithmetic (add, subtract, multiply, divide, remainder, floor, ceiling)
- data movement (load, store, copy)
- control (conditional and unconditional branch, subroutine call, return)

All the above instructions take a constant amount of time in the RAM model.\
Computing $2^k$ can also be counted as constant time when $k$ is a small enough integer, since it can be done by shifting the bits of an integer $k$ positions to the left.
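
As a small illustration of the bit-shifting remark, the sketch below checks that a left shift by `k` matches $2^k$. It assumes small `k`; Python integers are arbitrary precision, so this only mirrors the idea of a single machine-word shift rather than demonstrating constant time.

```python
# Shifting 1 left by k positions gives 2**k.
# Assumption: k is small enough that the result would fit in one machine word,
# which is the case the RAM model treats as constant time.
for k in range(8):
    assert (1 << k) == 2 ** k

powers_of_two = [1 << k for k in range(8)]
print(powers_of_two)  # [1, 2, 4, 8, 16, 32, 64, 128]
```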

### Running Time of Insertion Sort
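The running time of an algorithm is the sum of the running times of each statement executed.\
In the following example, $c_1$ to $c_7$ represent the cost of each statement, and each cost is multiplied by the number of times that statement is executed (for an input of $n$ values). Here $t_j$ denotes the number of times the `while` loop test is executed for that value of $j$; following CLRS, $j$ is indexed from 2 to $n$, although the Python code is 0-indexed.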

| insertSort()                                    | cost   | times                   |
|-------------------------------------------------|--------|-------------------------|
| `1      for j in range(1, len(A)):`             | $c_1$  | $n$                     |
| `2          key = A[j]`                         | $c_2$  | $n-1$                   | 
| `3          # insert A[j] into...`              |        |                         |
| `4          i = j-1`                            | $c_3$  | $n-1$                   |
| `5          while (i >= 0) and (A[i] > key):`   | $c_4$  | $\sum_{j=2}^{n}t_j$     |
| `6              A[i+1] = A[i]`                  | $c_5$  | $\sum_{j=2}^{n}(t_j-1)$ |
| `7              i -= 1`                         | $c_6$  | $\sum_{j=2}^{n}(t_j-1)$ |
| `8          A[i+1] = key`                       | $c_7$  | $n-1$                   |

Giving total running time:

$$T(n)=c_1n+c_2(n-1)+c_3(n-1)+c_4\sum_{j=2}^{n}t_j+c_5\sum_{j=2}^{n}(t_j-1)+c_6\sum_{j=2}^{n}(t_j-1)+c_7(n-1)$$

#### Best case running time of Insertion Sort
The best case running time occurs when the array is already sorted. The `while` test then fails the first time it is evaluated, so $t_{j} = 1$ for all $j$, giving:

$$\begin{align} 
T(n)     & = c_1n+c_2(n-1)+c_3(n-1)+c_4(n-1)+c_7(n-1) \\ 
         & = (c_1+c_2+c_3+c_4+c_7)n - (c_2+c_3+c_4+c_7) \\
         & = an + b
\end{align}
$$

We can express this running time as $an + b$ for constants $a$ and $b$ that depend on the statement costs, so the best case running time of insertion sort is a _linear function_ of $n$. Hence the running time of insertion sort is bounded below: $T(n) = \Omega(n)$.

#### Worst case running time of Insertion Sort

Even for inputs of a given size, an algorithm's running time may depend on the order of the input. We normally find only the worst case running time because (1) it gives an upper bound on the running time for any input and (2) the average case (which requires probabilistic analysis to calculate) is often roughly as bad as the worst case.

The worst case running time of insertion sort occurs when the array is in reverse sorted order. We must compare each element $A[j]$ with every element of the sorted subarray $A[1..j-1]$, so $t_{j} = j$ for $j = 2,3,...,n.$\
Using the standard summations: 

$$ \sum_{j=2}^{n}j = \frac{n(n+1)}{2}-1 \quad \text{and} \quad \sum_{j=2}^{n}(j-1) = \frac{n(n-1)}{2}$$  

Substituting these summation formulas into our equation, the worst case running time for insertion sort is: 

$$
\begin{align}
T(n) & = c_1n + c_2(n-1) + c_3(n-1) + c_4\left(\frac{n(n+1)}{2}-1\right) + c_5\left(\frac{n(n-1)}{2}\right) + c_6\left(\frac{n(n-1)}{2}\right)+c_7(n-1) \\
 & = \left(\frac{c_4}{2} + \frac{c_5}{2} + \frac{c_6}{2}\right)n^2 + \left(c_1 + c_2 + c_3 + \frac{c_4}{2} - \frac{c_5}{2} - \frac{c_6}{2} + c_7\right)n - (c_2 + c_3 + c_4 + c_7) \\
 & = an^{2} + bn + c
\end{align} $$

Since this worst case running time can be expressed as $an^{2} + bn + c$, it is a _quadratic function_ of $n$. Hence the running time of insertion sort is bounded above: $T(n) = O (n^2)$.
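
The best and worst cases can be checked empirically by counting how often the `while` test runs, i.e. the sum of $t_j$ from the table above. The helper `count_while_tests` below is a hypothetical instrumented copy of `insertSort`, written for this check only. It is a sketch, not a timing benchmark.

```python
def count_while_tests(values):
    """Insertion-sort a copy of `values` and return the total number of
    while-loop tests executed (the sum of t_j)."""
    A = list(values)          # work on a copy so the input is untouched
    tests = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while True:
            tests += 1        # one evaluation of the loop condition
            if not (i >= 0 and A[i] > key):
                break
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return tests

n = 10
best = count_while_tests(range(1, n + 1))    # already sorted input
worst = count_while_tests(range(n, 0, -1))   # reverse sorted input
print(best, worst)  # 9 54
```

For $n = 10$ the sorted input needs $n - 1 = 9$ tests, while the reversed input needs $\frac{n(n+1)}{2} - 1 = 54$, matching the two summations used above.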

### Order of growth of Insertion Sort 
Generally, in time complexity analysis we drop all but the highest-order term, as it comes to dominate the other terms as $n$ grows (which is generally where running time matters most). We also drop this term's constant coefficient. Hence we can say insertion sort has a worst case running time of $\Theta (n^{2})$, pronounced "theta of n-squared". (NS: is the lower bound satisfied?)
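
To see why the lower-order terms can be dropped, the sketch below evaluates a quadratic $an^2 + bn + c$ and divides by $n^2$. The constants `a`, `b` and `c` are made up for illustration and are not measured statement costs.

```python
# Hypothetical constants, chosen so the lower-order terms start out large.
a, b, c = 2, 100, 1000

def T(n):
    """Quadratic running-time model a*n^2 + b*n + c."""
    return a * n**2 + b * n + c

# As n grows, T(n) / n^2 approaches the leading coefficient a.
for n in (10, 100, 1000, 10**6):
    print(n, T(n) / n**2)
```

The ratio falls towards `a` as `n` increases, so for large inputs only the $n^2$ term matters, and the constant `a` itself is ignored in $\Theta$-notation.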

## 2.3 Designing Algorithms
***

### 2.3.1 Divide and conquer
Many useful algorithms are recursive: they divide a problem into smaller sub-problems, solve (conquer) these recursively, and combine the sub-solutions into a solution to the original problem.


### Merge Sort
_Input:_ A list `A` of natural numbers and indices `p`, `r` with $1 \leq$ `p` $\leq$ `r` $\leq$ `n`.\
`MergeSort` splits `A[p..r]` at the midpoint `q`, 'assumes' <sup>[1](#myfootnote1)</sup> that `A[p..q]` and `A[q+1..r]` are in sorted order, and then calls `Merge`.\
`Merge` compares the front elements of the two subarrays and repeatedly takes the smaller one:

<a name="myfootnote1">1</a> The assumption is guaranteed by recursing on the subarrays down to singletons, which are trivially sorted.

In [5]:
def mergeSort(A,p=0,r=-1):
    """ Sorts a list of integers by repeated recursive calls and a call to `merge` """
    # if initial call
    if r == -1:
        r = len(A) - 1
        
    if p < r:
        q = (p + r) // 2
        mergeSort(A,p,q)
        mergeSort(A,q+1,r)
        merge(A,p,q,r)
    return A


def merge(A,p,q,r):
    """ Merges the sorted sub arrays A[p..q] and A[q+1..r] by repeatedly taking
        the lower of their two front elements
    """
    # copy the two sub arrays, each with an infinity sentinel appended
    left_arr  = A[p:q+1] + [math.inf]
    right_arr = A[q+1:r+1] + [math.inf]
    # i and j index the front (smallest remaining) element of each copy;
    # advancing an index avoids the linear cost of pop(0)
    i = j = 0
    for k in range(p, r+1):
        if left_arr[i] <= right_arr[j]:
            A[k] = left_arr[i]
            i += 1
        else:
            A[k] = right_arr[j]
            j += 1
            
# testing
A = rand_arr(100)
B = sorted(A)  # sorted copy; `B = A` would alias A and make the check trivially true
mergeSort(A) == B

True

### 2.3.2 Analysing divide-and-conquer algorithms 

When an algorithm contains a recursive call to itself, we can often describe its running time by a _recurrence equation_, which describes the overall running time on a problem of size $n$ in terms of the running time on smaller inputs. We can then use mathematical tools to solve the recurrence and provide bounds on the performance of the algorithm.
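
To make the idea of a recurrence concrete, the sketch below evaluates one numerically and compares it with a closed form. The recurrence $T(1) = 1$, $T(n) = 2T(n/2) + n$ is an assumed example with a halve-and-combine shape; it is not claimed to be the exact cost of `mergeSort` above.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Assumed example recurrence: T(1) = 1, T(n) = 2*T(n/2) + n.
    Only meaningful here for n an exact power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

for k in range(1, 6):
    n = 2 ** k
    # For n = 2^k the recurrence solves to n*log2(n) + n = n*k + n.
    assert T(n) == n * k + n
    print(n, T(n))
```

Unrolling the recurrence by hand for small powers of two gives the same values, which is the kind of pattern the mathematical tools mentioned above turn into a general bound.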

### Running time of Merge Sort