# 2.1: Multiplication
## Complex numbers
$(a+bi)(c+di) = ac - bd + (ad + bc)i$   
$ad + bc = (a+b)(c+d) - ac - bd$  
Gauss's trick: 4 -> 3 multiplies ($ac$, $bd$, and $(a+b)(c+d)$), at the cost of a few extra adds.

$n$-bit add is $O(n)$, grade-school multiply is $O(n^2)$
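A minimal sketch of the three-multiply identity above, assuming integer real and imaginary parts. The function name `complex_mul_3` is made up for illustration.

```python
def complex_mul_3(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Return (real, imag) of (a+bi)(c+di) using 3 real multiplications."""
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d)  # equals ac + ad + bc + bd
    # Real part: ac - bd. Imaginary part: ad + bc = cross - ac - bd.
    return ac - bd, cross - ac - bd


# (1+2i)(3+4i) = 3 + 4i + 6i + 8i^2 = -5 + 10i
print(complex_mul_3(1, 2, 3, 4))  # (-5, 10)
```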

## General case
$x = x_L x_R = 2^{\frac{n}{2}} x_L + x_R$  
$y = y_L y_R = 2^{\frac{n}{2}} y_L + y_R$  

Without Gauss:  
$xy = (2^{\frac{n}{2}} x_L + x_R)(2^{\frac{n}{2}} y_L + y_R) = 2^n x_L y_L + x_R y_R + 2^{\frac{n}{2}} (x_L y_R + x_R y_L)$  
$T(n) = 4 T(\frac{n}{2}) + O(n)$

With Gauss:  
$xy = 2^n x_L y_L + x_R y_R + 2^{\frac{n}{2}} ((x_L + x_R)(y_L + y_R) - x_L y_L - x_R y_R)$  
$T(n) = 3 T(\frac{n}{2}) + O(n)$
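The three-product recurrence can be sketched on Python integers as follows. This is an illustration under assumptions, not a tuned implementation: the name `karatsuba`, the cutoff of 16, and splitting at half the bit length of the larger operand (so `half` plays the role of $\frac{n}{2}$) are all choices made here.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative integers using 3 recursive half-size products."""
    # Base case (assumed cutoff): small operands use the built-in multiply.
    if x < 16 or y < 16:
        return x * y

    # Split so that x = 2^half * x_l + x_r and y = 2^half * y_l + y_r.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_l, x_r = x >> half, x & mask
    y_l, y_r = y >> half, y & mask

    p_ll = karatsuba(x_l, y_l)
    p_rr = karatsuba(x_r, y_r)
    # Gauss's trick: x_l*y_r + x_r*y_l from one extra product.
    p_mid = karatsuba(x_l + x_r, y_l + y_r) - p_ll - p_rr

    # Shifts stand in for multiplying by 2^n and 2^(n/2).
    return (p_ll << (2 * half)) + (p_mid << half) + p_rr


x, y = 1234567890123456789, 987654321987654321
print(karatsuba(x, y) == x * y)  # True
```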

## Runtimes
Subproblems are halved in size with every recursive call. At level $\log_2(n)$, subproblems are size 1 and the recursion ends, so the height of the tree is $\log_2(n)$.

Without Gauss
* Level $k$, $4^k$ subproblems, each of size $O(\frac{n}{2^k})$, work is $4^k * O(\frac{n}{2^k}) = 2^k O(n)$.
* $\text{Work} = \sum_{k=0}^{\log_2(n)} 2^k O(n)$. Since $2>1$, the work is dominated by the work at the very bottom.
* At the bottom: $\text{Work} = 2^{\log_2(n)} O(n) = O(n^2)$

With Gauss
* Level $k$, $3^k$ subproblems, each of size $O(\frac{n}{2^k})$, work is $3^k * O(\frac{n}{2^k}) = (\frac{3}{2})^k O(n)$.
* $\text{Work} = \sum_{k=0}^{\log_2(n)} (\frac{3}{2})^k O(n)$. Since $\frac{3}{2}>1$, the work is dominated by the work at the very bottom.
* At the bottom: $\text{Work} = (\frac{3}{2})^{\log_2(n)} O(n) = 2^{\log_2(\frac{3}{2}) \log_2(n)} O(n) = n^{\log_2(\frac{3}{2})} O(n) = O(n^{\log_2(3)}) \approx O(n^{1.58})$
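As a quick sanity check on the two bottom-level counts, the sketch below counts the size-1 subproblems for a branching factor of 4 and of 3. It assumes $n$ is a power of 2, and `leaf_count` is a hypothetical helper name.

```python
import math


def leaf_count(n: int, branches: int) -> int:
    """Count size-1 subproblems when each call spawns `branches` calls on n/2."""
    if n == 1:
        return 1
    return branches * leaf_count(n // 2, branches)


n = 1024
print(leaf_count(n, 4))  # 1048576 = n^2
print(leaf_count(n, 3))  # 59049 = 3^10
print(n ** math.log2(3))  # approximately 59049, up to floating-point error
```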

Implementation: don't recurse down to 1 bit since 16 or 32 bit multiplication is a single operation on CPUs.

# 2.2: Master Theorem
Divide and conquer: solve problem of size $n$ by recursively solving $a$ subproblems of size $\frac{n}{b}$ and combining in $O(n^d)$ time.  
$T(n) = a T(\frac{n}{b}) + O(n^d)$

\# Levels: $\frac{n}{b^k} = 1 \to n = b^k \to k = \log_b(n)$  
\# Subproblems for level $k$: $a^k$  
\# Size of subproblem for level $k$: $\frac{n}{b^k}$, so the combine work per subproblem is $O((\frac{n}{b^k})^d)$

$T(n) = \sum_{k=0}^{\log_b(n)} a^k \times O((\frac{n}{b^k})^d) = \sum_{k=0}^{\log_b(n)} \frac{a^k}{b^{dk}} \times O(n^d) = \sum_{k=0}^{\log_b(n)} (\frac{a}{b^d})^k \times O(n^d)$
* $\frac{a}{b^d} > 1$: bottom heavy, $(\frac{a}{b^d})^{\log_b(n)} \times O(n^d) = b^{\log_b(\frac{a}{b^d}) \times \log_b(n)} \times O(n^d) = n^{\log_b(\frac{a}{b^d})} \times O(n^d) = n^{\log_b(a) - d} \times O(n^d) = O(n^{\log_b(a)})$
* $\frac{a}{b^d} = 1$: same work at every level, $\log_b(n) \times O(n^d) = O(n^d \log_b(n))$
* $\frac{a}{b^d} < 1$: top heavy, $O(n^d)$
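The three cases can be sketched as a small classifier. The function name `master_theorem` and its string output format are assumptions made for illustration. It compares $\frac{a}{b^d}$ against 1 with a floating-point tolerance.

```python
import math


def master_theorem(a: int, b: int, d: float) -> str:
    """Classify T(n) = a*T(n/b) + O(n^d) by comparing a / b^d with 1."""
    ratio = a / b ** d
    if math.isclose(ratio, 1.0):
        return f"O(n^{d:g} log n)"           # same work at every level
    if ratio > 1:
        return f"O(n^{math.log(a, b):.2f})"  # bottom heavy: n^(log_b a)
    return f"O(n^{d:g})"                     # top heavy


print(master_theorem(2, 2, 1))  # mergesort: O(n^1 log n)
print(master_theorem(3, 2, 1))  # multiplication with Gauss: O(n^1.58)
print(master_theorem(1, 2, 1))  # one half-size subproblem: O(n^1)
```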

# 2.3: Mergesort
Given an unsorted array, divide the array in half, recursively sort each half, then merge both halves together.

Merging 2 sorted arrays of total length $n$ takes $O(n)$ time.

$T(n) = 2T(\frac{n}{2}) + O(n)$  
$T(n) = \sum_{k=0}^{\log_2(n)} 2^k O(\frac{n}{2^k}) = \sum_{k=0}^{\log_2(n)} O(n) = O(n \log_2(n))$

In [1]:
def mergesort(L: list[int]) -> None:
    """Sort L in place by recursively sorting each half and merging."""
    if len(L) <= 1:
        return
    
    mid = len(L) // 2
    left, right = L[:mid], L[mid:]
    
    mergesort(left)
    mergesort(right)

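    # Merge: repeatedly copy the smaller front element back into L.
    i, j = 0, 0
    while (i < len(left)) and (j < len(right)):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stable).
            L[i+j] = left[i]
            i += 1
        else:
            L[i+j] = right[j]
            j += 1

    # One half is exhausted; the rest of the other is already sorted.
    L[i+j:] = left[i:] + right[j:]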
    
L = [4, 2, 1, 6, 32, 3, 9, 7]
mergesort(L)
print(L)

[1, 2, 3, 4, 6, 7, 9, 32]


Comparison-based sorting can be represented as a binary decision tree: each internal node is a comparison, and each leaf is a permutation of the array (one of them is the sorted order).

The tree sorts an array of $n$ elements. The leaves must cover every permutation of $\{1, 2, \dots, n\}$: $n!$ permutations, therefore at least $n!$ leaves.  
A binary tree with $m$ leaves has a depth of at least $\log_2(m)$. The tree must then have a depth (and worst-case runtime) of at least $\log_2(n!) = \Omega(n \log_2(n))$.
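To see numerically that $\log_2(n!)$ grows like $n \log_2(n)$, the sketch below prints their ratio for a few values of $n$. It relies on $n! = \Gamma(n+1)$ and the standard library's `math.lgamma`. The helper name `log2_factorial` is made up here.

```python
import math


def log2_factorial(n: int) -> float:
    """Return log2(n!) without computing n!, using n! = gamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(2)


# The ratio stays below 1 and creeps toward it as n grows.
for n in (10, 1_000, 1_000_000):
    ratio = log2_factorial(n) / (n * math.log2(n))
    print(n, round(ratio, 3))
```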

This lower bound only applies to comparison-based sorting (not Radix and Counting Sort, which only work for integers).

# 2.4: Medians
Naive: Sort $O(n \log(n))$, then pick middle element.

Selection: get the $k$-th smallest element of $S$. Pick a number $v$ and split all elements in $S$ into $S_L$ (elements < $v$), $S_v$ (elements = $v$), and $S_R$ (elements > $v$).

$S = \{ 2, 36, 5, 21, 8, 13, 11, 20, 5, 4, 1 \}$  
$v = 5$  
$S_L = \{ 2, 4, 1 \}$, $S_v = \{ 5, 5 \}$, $S_R = \{ 36, 21, 8, 13, 11, 20 \}$

If we want the 8th smallest element of $S$, we immediately know that we need the 3rd smallest element of $S_R$, since $8 - |S_L| - |S_v| = 8 - 3 - 2 = 3$.

$
\text{selection}(S, k) = \begin{cases} 
    \text{selection}(S_L, k) & \text{if } k \le |S_L| \\
    v & \text{if } |S_L| < k \le |S_L| + |S_v| \\
    \text{selection}(S_R, k - |S_L| - |S_v|) & \text{if } k > |S_L| + |S_v| \\
\end{cases}
$

Computing $S_L, S_v, S_R$ is linear time. Selection reduces elements from $|S|$ to $\max(|S_L|, |S_R|)$.  
Ideally we would pick $v$ so that $|S_L| = |S_R| = \frac{1}{2} |S|$, cutting the search space in half each time: $T(n) = T(\frac{n}{2}) + O(n) = O(n)$.  
However, $v$ would have to be the median for that to be true, so instead we pick randomly.

Worst case for randomly picking $v$ is picking the min/max every time, shrinking the search space by only 1 per split. For the median: $n + (n-1) + \dots + \frac{n}{2} = \Theta(n^2)$.  
The best case, picking exactly the median every time, is equally unlikely and gives $O(n)$.

$v$ is good if it lies between the 25th and 75th percentile of $S$ (the remaining search space will be at most $\frac{3}{4}$ the size).  
$v$ has a $50\%$ chance of being good, so $v$ will be good on average once every $2$ picks.  
After 2 splits on average, the search space will shrink to at most $\frac{3}{4}$ its size: $T(n) \le T(\frac{3}{4} n) + O(n)$ in expectation.  
$T(n) = O(n)$ in expectation, since $n + \frac{3}{4} n + (\frac{3}{4})^2 n + \dots \le 4n$.
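A minimal sketch of randomized selection, with the middle case taken to be $|S_L| < k \le |S_L| + |S_v|$. It assumes $1 \le k \le |S|$ and builds new lists for clarity rather than partitioning in place.

```python
import random


def selection(s: list[int], k: int) -> int:
    """Return the k-th smallest element of s (1-indexed, 1 <= k <= len(s))."""
    v = random.choice(s)  # random split value
    s_l = [x for x in s if x < v]
    s_v = [x for x in s if x == v]
    s_r = [x for x in s if x > v]

    if k <= len(s_l):
        return selection(s_l, k)
    if k <= len(s_l) + len(s_v):
        return v
    return selection(s_r, k - len(s_l) - len(s_v))


S = [2, 36, 5, 21, 8, 13, 11, 20, 5, 4, 1]
print(selection(S, 8))  # 13
```

The median is then `selection(S, (len(S) + 1) // 2)` for an odd-length `S`.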

Quicksort: pick a random split value $v$, partition the array around it, and recursively sort each partition. Also $O(n \log(n))$ in expectation ($O(n^2)$ worst case), but empirically faster than merge sort.
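The same three-way split gives a short quicksort sketch. This version returns a new list for readability. Practical implementations usually partition in place, which is assumed to account for much of the speed advantage.

```python
import random


def quicksort(s: list[int]) -> list[int]:
    """Return a sorted copy of s using a random split value."""
    if len(s) <= 1:
        return list(s)
    v = random.choice(s)
    s_l = [x for x in s if x < v]
    s_v = [x for x in s if x == v]
    s_r = [x for x in s if x > v]
    # Elements equal to v are already in their final position.
    return quicksort(s_l) + s_v + quicksort(s_r)


print(quicksort([4, 2, 1, 6, 32, 3, 9, 7]))  # [1, 2, 3, 4, 6, 7, 9, 32]
```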

# 2.5: Matrix Multiplication
$XY = Z$, where $X, Y, Z$ are all $(n, n)$ matrices. Naively takes $O(n^3)$ operations: fill in all $n^2$ elements of $Z$, where each element is an $O(n)$ dot product.

Naive divide and conquer:  
$X = \begin{bmatrix} A & B \\ C & D \\ \end{bmatrix}, Y = \begin{bmatrix} E & F \\ G & H \\ \end{bmatrix}$  
$XY = \begin{bmatrix} AE + BG & AF  + BH \\ CE + DG & CF + DH \\ \end{bmatrix}$

$T(n) = 8T(\frac{n}{2}) + O(n^2)$  
$T(n) = \sum_{k=0}^{\log_2(n)} 8^k * O(\frac{n}{2^k})^2 = \sum_{k=0}^{\log_2(n)} 2^k * O(n^2) = O(n^3)$

Strassen:  
$XY$ only requires computing 7 $(\frac{n}{2}, \frac{n}{2})$ subproblems, not 8. (Quite a mess of algebra).

$T(n) = 7T(\frac{n}{2}) + O(n^2)$  
$T(n) = \sum_{k=0}^{\log_2(n)} 7^k * O(\frac{n}{2^k})^2 = \sum_{k=0}^{\log_2(n)} (\frac{7}{4})^k * O(n^2) = O(n^{\log_2(\frac{7}{4})}) * O(n^2) = O(n^{\log_2(7)}) \approx O(n^{2.81})$
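For reference, one standard choice of the seven products, using the blocks $A$ through $H$ defined above, can be sketched as below. This assumes square matrices stored as lists of lists with $n$ a power of 2. The helper names `add`, `sub`, and `split` are made up for illustration, and recursing down to $1 \times 1$ blocks is for clarity only.

```python
Matrix = list[list[int]]


def add(p: Matrix, q: Matrix) -> Matrix:
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(p, q)]


def sub(p: Matrix, q: Matrix) -> Matrix:
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(p, q)]


def split(m: Matrix) -> tuple[Matrix, Matrix, Matrix, Matrix]:
    """Return the top-left, top-right, bottom-left, bottom-right quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def strassen(x: Matrix, y: Matrix) -> Matrix:
    """Multiply two (n, n) matrices, n a power of 2, with 7 recursive products."""
    if len(x) == 1:
        return [[x[0][0] * y[0][0]]]
    a, b, c, d = split(x)
    e, f, g, h = split(y)

    p1 = strassen(a, sub(f, h))
    p2 = strassen(add(a, b), h)
    p3 = strassen(add(c, d), e)
    p4 = strassen(d, sub(g, e))
    p5 = strassen(add(a, d), add(e, h))
    p6 = strassen(sub(b, d), add(g, h))
    p7 = strassen(sub(a, c), add(e, f))

    top_left = add(sub(add(p5, p4), p2), p6)      # AE + BG
    top_right = add(p1, p2)                       # AF + BH
    bottom_left = add(p3, p4)                     # CE + DG
    bottom_right = sub(sub(add(p1, p5), p3), p7)  # CF + DH

    # Stitch the quadrants back into one matrix.
    top = [l + r for l, r in zip(top_left, top_right)]
    bottom = [l + r for l, r in zip(bottom_left, bottom_right)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```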

# 2.6: FFT