# Divide and Conquer, Sorting and Searching, and Randomized Algorithms

## Grade-school multiplication algorithm
- $O(n^2)$ single-digit operations to multiply two $n$-digit numbers

## Karatsuba multiplication:
- Given two numbers $5678, 1234$
- Define $a=56, b=78, c=12, d=34$
- compute 
$a*c \tag{1}$ 
$b*d \tag{2}$
$(a+b)*(c+d) \tag{3}$
$(3)-(2)-(1) = ad+bc \tag{4}$ 
- result (where $n$ is the number of digits, here $n = 4$)
$10^{n}*(1) + 10^{n/2}*(4) + (2) \tag{5}$
$= 10^{n}ac + 10^{n/2}(ad+bc) + bd \tag{6}$
- recursively compute (only 3 multiplications instead of 4, Gauss's trick)
$ac, bd, (a+b)(c+d) \tag{7}$
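
The steps above can be checked numerically for the example $5678 \times 1234$. This is only a sketch of one level of the recursion (the three products are computed with Python's built-in `*` rather than recursively), and the names `step1` to `step4` are labels chosen here to mirror the tags (1) to (4).

```python
# One level of Karatsuba for 5678 * 1234, with n = 4 digits
a, b, c, d = 56, 78, 12, 34
n = 4

step1 = a * c                    # (1)
step2 = b * d                    # (2)
step3 = (a + b) * (c + d)        # (3)
step4 = step3 - step2 - step1    # (4), equals ad + bc

# (5): shift the products back into place
result = 10 ** n * step1 + 10 ** (n // 2) * step4 + step2

print(step1, step2, step3, step4, result)
```

Only three multiplications of half-size numbers are used; `step4` is obtained by subtraction alone.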

## Mergesort

Pseudocode
```
recursively sort 1st half of array
recursively sort 2nd half of array
C = output[length=n]
A = 1st sorted array[n/2]
B = 2nd sorted array[n/2]
i = 1
j = 1

for k=1 to n
    if A(i) < B(j)
        C(k) = A(i)
        i++
    else # B(j) <= A(i)
        C(k) = B(j)
        j++
# end cases omitted: once A or B is exhausted, copy the rest of the other array
```
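
A minimal runnable sketch of the pseudocode above, assuming the input is a Python list of comparable values. The function name `merge_sort` is chosen here for illustration; unlike the pseudocode, it also handles the end cases where one half runs out first.

```python
def merge_sort(values):
    """Return a new sorted list; the input list is left unchanged."""
    if len(values) <= 1:
        return list(values)

    mid = len(values) // 2
    left = merge_sort(values[:mid])    # recursively sort 1st half
    right = merge_sort(values[mid:])   # recursively sort 2nd half

    merged = []
    i = j = 0
    # repeatedly copy the smaller front element
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1

    # one half is exhausted; copy the rest of the other
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 3, 8, 1, 9, 2]))
```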

Running time
- there are approximately $\log_{2}n$ levels in the recursion tree
- at each level $j$, there are $2^{j}$ sub-problems, each of size $\dfrac{n}{2^{j}}$
- work per level is proportional to $(2^{j})(\dfrac{n}{2^{j}}) = n$
- thus, total work is approximately $n\log_{2}n$, i.e. $O(n\log{n})$

## Big-Oh

- $T(n) = O(f(n))$ iff there exist constants $c \gt 0, n_{0}$ such that $T(n) \le cf(n)$ for all $n \ge n_{0}$
- $T(n) = \Omega(f(n))$ iff there exist constants $c \gt 0, n_{0}$ such that $T(n) \ge cf(n)$ for all $n \ge n_{0}$
- $T(n) = \Theta(f(n))$ iff $T(n) = O(f(n))$ and $T(n) = \Omega(f(n))$
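
As a small illustration of the definition, take $T(n) = 3n^{2} + 10n$ and $f(n) = n^{2}$ (an example chosen here, not from the notes). The constants $c = 4, n_{0} = 10$ work because $10n \le n^{2}$ exactly when $n \ge 10$. The helper `bound_holds` below is a hypothetical name, and it only spot-checks a finite range, so it illustrates the definition rather than proving anything.

```python
def bound_holds(T, f, c, n0, n_max=10_000):
    """Check T(n) <= c * f(n) for every integer n in [n0, n_max] (finite spot check)."""
    return all(T(n) <= c * f(n) for n in range(n0, n_max + 1))


def T(n):
    return 3 * n ** 2 + 10 * n


def f(n):
    return n ** 2


print(bound_holds(T, f, c=4, n0=10))  # the witnesses c = 4, n0 = 10
print(bound_holds(T, f, c=4, n0=9))   # n0 = 9 is too small for c = 4
```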

In [None]:
import math

In [None]:
def pad_number_with_zeros(number):
    """
    Pads a number with zeros until its length becomes a power of 2
    
    Args:
    number (string) -- number before zero-padding is applied
    
    Returns
    number (string) -- number after zero-padding is applied
    """
        
    number_of_zeros_to_pad = 2 ** math.ceil(math.log(len(number), 2)) - len(number)
    number = number.zfill(len(number) + number_of_zeros_to_pad)
    
    return number

In [None]:
def karatsuba(operand1, operand2):
    """
    Performs Karatsuba multiplication using Gauss's trick

    Args:
    operand1 (string) -- first operand of multiplication
    operand2 (string) -- second operand of multiplication

    Returns:
    result (integer) -- result of multiplication
    """
    
    # base case
    if int(operand1) < 10 or int(operand2) < 10:
        return int(operand1) * int(operand2)
    
    # if necessary, pad zeros to operand1 and/or operand2 (their length must equal) 
    if len(operand1) > len(operand2):
        operand1 = pad_number_with_zeros(operand1)
        operand2 = operand2.zfill(len(operand1))
    else:
        operand2 = pad_number_with_zeros(operand2)
        operand1 = operand1.zfill(len(operand2))
    
    # split both operands by half (after padding, both have the same length)
    half = len(operand1) // 2
    firsthalf_operand1 = operand1[:half] # a
    secondhalf_operand1 = operand1[half:] # b
    firsthalf_operand2 = operand2[:half] # c
    secondhalf_operand2 = operand2[half:] # d

    step1 = karatsuba(firsthalf_operand1, firsthalf_operand2) # ac
    step2 = karatsuba(secondhalf_operand1, secondhalf_operand2) # bd
    input1 = int(firsthalf_operand1) + int(secondhalf_operand1) # a+b
    input2 = int(firsthalf_operand2) + int(secondhalf_operand2) # c+d
    step3 = karatsuba(str(input1), str(input2)) # (a+b)(c+d)
    step4 = step3 - step2 - step1 # ad+bc (Gauss's trick)
    result = step1 * (10 ** len(operand1)) + step4 * (10 ** half) + step2
    # naive alternative with 4 recursive multiplications (kept for reference)
#     step3 = karatsuba(firsthalf_operand1, secondhalf_operand2)
#     step4 = karatsuba(secondhalf_operand1, firsthalf_operand2)
#     result = step1 * (10 ** len(operand1)) + (step3 + step4) * (10 ** int(len(operand1)/2)) + step2 
    
    return result

In [None]:
assert(pad_number_with_zeros("47068") == "00047068")
assert(pad_number_with_zeros("5678") == "5678")

assert(karatsuba("12", "34") == 408)
assert(karatsuba("12", "345") == 4140)
assert(karatsuba("5678", "1234") == 7006652)
assert(karatsuba("12345678", "12345678") == 152415765279684)
assert(karatsuba("74639573", "94756283") == 7072568502187159)
assert(karatsuba("8475637284756461", "7483726374837363") == 63429350291486860416277938452343)
assert(karatsuba("3141592653589793238462643383279502884197169399375105820974944592", "2718281828459045235360287471352662497757247093699959574966967627") == 8539734222673567065463550869546574495034888535765114961879601127067743044893204848617875072216249073013374895871952806582723184)

## Inversion

Input 
- array $A$ containing the numbers $1 \dots n$ in arbitrary order

Output
- number of inversions (number of pairs of indices $(i, j)$ with $i \lt j$ and $A[i] \gt A[j]$)

Example
- $(1,3,5,2,4,6)$ => $(3,2)$, $(5,2)$, $(5,4)$
- let $B = (1,3,5)$ and $C = (2,4,6)$ be the sorted halves, and $D = (1,2,3,4,5,6)$ the merged output
- when $2$ is copied to $D$, inversions are $(3,2)$ and $(5,2)$
- when $4$ is copied to $D$, inversions are $(5,4)$
- thus, when an element of $C$ is copied to $D$, it forms one split inversion with each element still remaining in $B$  

Note
- largest possible number of inversions: $\binom{n}{2} = \dfrac{n(n-1)}{2}$

Pseudocode
```
sort_and_count(array A, length n)
if n=1, return (A, 0)
else
    (B, X) = sort_and_count(1st half of A, n/2)
    (C, Y) = sort_and_count(2nd half of A, n/2)
    (D, Z) = count_split_inv(B, C) # merges B and C into D while counting
return (D, X+Y+Z)
```
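
For cross-checking a divide-and-conquer implementation on small inputs, a brute-force counter that applies the definition directly is handy. This is a sketch with a name chosen here (`count_inversions_brute_force`); it runs in $O(n^{2})$ and is only meant for testing.

```python
def count_inversions_brute_force(values):
    """Count pairs of indices i < j with values[i] > values[j]."""
    n = len(values)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if values[i] > values[j]
    )


print(count_inversions_brute_force([1, 3, 5, 2, 4, 6]))  # the example above
print(count_inversions_brute_force([6, 5, 4, 3, 2, 1]))  # reversed array: n(n-1)/2
```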

## Strassen's algorithm

- divide and conquer on matrix multiplication

$X = \begin{pmatrix}
A & B\\
C & D
\end{pmatrix}, Y = \begin{pmatrix}
E & F\\
G & H
\end{pmatrix}$
 
- then

$XY = \begin{pmatrix}
AE+BG & AF+BH\\
CE+DG & CF+DH
\end{pmatrix}$

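- computed directly, this takes 8 recursive multiplications of $\dfrac{n}{2} \times \dfrac{n}{2}$ blocks, which gives $O(n^{3})$

$P_{1} = A(F-H), P_{2} = (A+B)H, P_{3} = (C+D)E, P_{4} = D(G-E), P_{5} = (A+D)(E+H), P_{6} = (B-D)(G+H), P_{7} = (A-C)(E+F)$

- with these seven products,

$XY = \begin{pmatrix}
P_{5}+P_{4}-P_{2}+P_{6} & P_{1}+P_{2}\\
P_{3}+P_{4} & P_{1}+P_{5}-P_{3}-P_{7}
\end{pmatrix}$

- so only 7 recursive multiplications are needed, which gives $O(n^{\log_{2}7}) \approx O(n^{2.81})$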
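
The seven products can be sanity-checked on a $2 \times 2$ matrix of plain numbers, where each block $A \dots H$ is a scalar. The sketch below uses the standard recombination of the seven products; the function name `strassen_2x2` is chosen here, and a full implementation would apply the same formulas recursively to matrix blocks.

```python
def strassen_2x2(X, Y):
    """Multiply two 2x2 matrices (nested lists) using 7 multiplications."""
    (A, B), (C, D) = X
    (E, F), (G, H) = Y

    P1 = A * (F - H)
    P2 = (A + B) * H
    P3 = (C + D) * E
    P4 = D * (G - E)
    P5 = (A + D) * (E + H)
    P6 = (B - D) * (G + H)
    P7 = (A - C) * (E + F)

    # recombine with additions and subtractions only
    return [
        [P5 + P4 - P2 + P6, P1 + P2],
        [P3 + P4, P1 + P5 - P3 - P7],
    ]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```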


## Closest pair

Input
- let $P = \{p_{1} \dots p_{n}\}$ be a set of points in ${\rm I\!R^{2}}$
- let $d(p_{i}, p_{j})$ be the Euclidean distance between $p_{i}$ and $p_{j}$

Output
- a pair of distinct points $p, q \in P$ such that $d(p,q)$ is the minimum

Naive approach
- compare all pairs of points: $O(n^{2})$

1-D version (points on a line)
- sort points: $O(n\log{n})$
- return the closest pair of adjacent points: $O(n)$
- in total: $O(n\log{n})$

Divide-and-conquer

1. let $Q$ = left half of $P$ and $R$ = right half of $P$ (split by x-coordinate)
2. form $Q_{x}, Q_{y}, R_{x}, R_{y}$ (each half sorted by x-coordinate and by y-coordinate)
3. $(p_{1}, q_{1})$ = closest_pair$(Q_{x}, Q_{y})$
4. $(p_{2}, q_{2})$ = closest_pair$(R_{x}, R_{y})$
5. let $\delta = \min\left[d(p_{1}, q_{1}), d(p_{2}, q_{2})\right]$
6. $(p_{3}, q_{3})$ = closest_split_pair$(P_{x}, P_{y}, \delta)$
7. return the best of $(p_{1}, q_{1})$, $(p_{2}, q_{2})$, $(p_{3}, q_{3})$

closest_split_pair$(P_{x}, P_{y}, \delta)$
- let $\bar{x}$ = the largest x-coordinate in the left half of $P$
- let $S_{y}$ = points of $P$ with x-coordinate in $[\bar{x} - \delta, \bar{x} + \delta]$, sorted by y-coordinate
- initialize best = $\delta$, best_pair = None
- for $i = 1$ to $|S_{y}| - 1$
    - for $j = 1$ to $\min(7, |S_{y}| - i)$
        - let $p$ = $i$th point of $S_{y}$
        - let $q$ = $(i+j)$th point of $S_{y}$
        - if $d(p,q)$ < best
            - best_pair = $(p,q)$
            - best = $d(p,q)$
- return best_pair

Correctness

- let $p \in Q, q \in R$ be a split pair with $d(p,q) \lt \delta$
- (A) $p$ and $q$ are members of $S_{y}$
- (B) $p$ and $q$ are at most 7 positions apart in $S_{y}$
- corollary: if the closest pair of $P$ is a split pair, then closest_split_pair finds it

Proof of (A)
- let $p = (x_{1}, y_{1}) \in Q$
- let $q = (x_{2}, y_{2}) \in R$
- let $d(p,q) \lt \delta$
- since $d(p,q) \lt \delta$, $|x_{1} - x_{2}| \lt \delta$ and $|y_{1} - y_{2}| \lt \delta$ 
- $p \in Q$ => $x_{1} \le \bar{x}$, $q \in R$ => $x_{2} \ge \bar{x}$
- thus, $x_{1}, x_{2} \in [\bar{x} - \delta, \bar{x} + \delta]$, so both $p$ and $q$ are in $S_{y}$

Proof of (B)
- consider 8 boxes of size $\delta / 2$ by $\delta / 2$, arranged in 2 rows of 4, centered horizontally on $\bar{x}$ with bottom edge at $\min\{y_{1}, y_{2}\}$ 
- lemma 1: all points of $S_{y}$ with y-coordinate between those of $p$ and $q$ (inclusive) lie in one of these 8 boxes
    - proof: y-coordinates of $p,q$ differ by less than $\delta$ and x-coordinates of points of $S_{y}$ are between $\bar{x} - \delta$ and $\bar{x} + \delta$
- lemma 2: at most one point of $P$ in each box
    - proof: suppose $a,b$ lie in the same box. Then, $a,b$ are either both in $Q$ or both in $R$. Then, $d(a,b) \le \dfrac{\delta}{2}\sqrt{2} \lt \delta$. But this contradicts the definition of $\delta$ as the smallest distance between two points on the same side 
- thus, at most 8 points of $S_{y}$ (including $p$ and $q$) lie in the boxes, so $p$ and $q$ are at most 7 positions apart


## Master method

$T(n) \le aT\left(\dfrac{n}{b}\right) + O(n^{d})$

$a$ = number of recursive calls (subproblems), $a \ge 1$

$b$ = factor by which the input size shrinks, $b \gt 1$

$d$ = exponent in the running time of the "combine step", $d \ge 0$

$T(n) = O(n^{d}\log{n})$ if $a = b^{d}$

$T(n) = O(n^{d})$ if $a \lt b^{d}$

$T(n) = O(n^{\log_{b}{a}})$ if $a \gt b^{d}$
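
The three cases can be wrapped in a small helper for quick checks. The function name `master_method` and its string output format are choices made here for illustration; it assumes numeric $a, b, d$ for which the comparison $a = b^{d}$ is exact (e.g. integers).

```python
import math


def master_method(a, b, d):
    """Return the bound for T(n) <= a*T(n/b) + O(n^d) as a string."""
    if a == b ** d:
        return f"O(n^{d} log n)"
    if a < b ** d:
        return f"O(n^{d})"
    return f"O(n^{math.log(a, b):.3f})"


print("mergesort (a=2, b=2, d=1):", master_method(2, 2, 1))
print("karatsuba (a=3, b=2, d=1):", master_method(3, 2, 1))
print("strassen  (a=7, b=2, d=2):", master_method(7, 2, 2))
```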

In [None]:
def open_file(file_path):
    """
    Reads contents of a file
    
    Args:
    file_path (string) -- location of file where there are lots of numbers
    
    Returns:
    array (list) -- array of integers
    """
    
    with open(file_path, 'r') as file:
        # split() with no argument ignores blank lines and a trailing newline
        array = [int(token) for token in file.read().split()]
        
    return array

In [None]:
def mergesort_and_inversion(integer_array):
    """
    Implements merge sort and computes the number of inversions

    Args:
    integer_array (list) -- array of integers
    
    Returns:
    sorted_array_and_inversion (tuple) -- sorted array and number of inversion
    """

    # base case (zero or one element in an array)
    if len(integer_array) <= 1:
        num_inversion = 0
        return (integer_array, num_inversion)

    # base case (only two elements in an array)
    if len(integer_array) == 2:
        num_inversion = 0
        if int(integer_array[0]) > int(integer_array[1]):
            temp = integer_array[0]
            integer_array[0] = integer_array[1]
            integer_array[1] = temp
            num_inversion = 1
        return (integer_array, num_inversion)

    # split integer_array by half
    first_half = integer_array[:int(len(integer_array)/2)]
    second_half = integer_array[int(len(integer_array)/2):len(integer_array)]

    # recurse
    result_from_first_half = mergesort_and_inversion(first_half)
    result_from_second_half = mergesort_and_inversion(second_half)

    sorted_first_half = result_from_first_half[0]
    sorted_second_half = result_from_second_half[0]
    num_inversion_first_half = result_from_first_half[1]
    num_inversion_second_half = result_from_second_half[1]

    # combine step: merge the two sorted halves and count split inversions
    i = 0
    j = 0
    sorted_integer_array = []
    num_inversion = num_inversion_first_half + num_inversion_second_half

    while i < len(sorted_first_half) and j < len(sorted_second_half):
        # <= so that equal elements are not counted as an inversion
        if int(sorted_first_half[i]) <= int(sorted_second_half[j]):
            sorted_integer_array.append(sorted_first_half[i])
            i += 1
        else:
            sorted_integer_array.append(sorted_second_half[j])
            # every element still left in the first half is larger than
            # sorted_second_half[j], so each one forms a split inversion
            num_inversion += len(sorted_first_half) - i
            j += 1

    # one half is exhausted; just push the remaining elements of the other
    sorted_integer_array.extend(sorted_first_half[i:])
    sorted_integer_array.extend(sorted_second_half[j:])

    sorted_array_and_inversion = (sorted_integer_array, num_inversion)
    return sorted_array_and_inversion

In [None]:
assert(mergesort_and_inversion(open_file("data/mergesort-and_inversion.txt"))[1] == 2407905288)

## Quicksort

- $O(n\log{n})$ on average
- sorts in place: no auxiliary array is needed
- if the pivot is chosen in the worst possible way (e.g. always the smallest element), the algorithm runs in $\Theta(n^2)$  
- choosing pivot "randomly" is a good idea

Partitioning
- rearrange array so that
    - everything left of the pivot is less than the pivot
    - everything right of the pivot is greater than the pivot
- this also puts pivot in its rightful position

```
Partition(A,l,r) # input = A[l ... r]
P = A[l] # for example, pick first element as pivot
i = l+1
for j = l+1 to r
    if A[j] < P
        swap A[j] and A[i]
        i++
swap A[l] and A[i-1]
```

```
quicksort(array A, length n)
if n<=1
    return
p = choosepivot(A, n)
partition A around p
recursively sort 1st part
recursively sort 2nd part
```

Correctness
- claim: quicksort correctly sorts every array of length $n \ge 1$
- base case $n = 1$: every array of length 1 is already sorted
- for $n \ge 2$, need to show that if the claim holds for all $k \lt n$, then it holds for $n$ as well
    - let $k_{1}, k_{2}$ = lengths of the 1st and 2nd parts of the array after partitioning around the pivot
    - the pivot ends up in its correct position, and $k_{1}, k_{2} \lt n$
    - so, by the inductive hypothesis, the 1st and 2nd parts are sorted correctly by the recursive calls
    - thus, the entire array is correctly sorted
    
Analysis
- let $\Omega$ = all possible outcomes of random choices in quicksort (i.e. pivot sequences)
- for $\sigma \in \Omega$, let $C(\sigma)$ = number of comparisons between two input elements made by quicksort
- lemma: running time of quicksort is dominated by the number of comparisons
- let $z_{i}$ = $i$th smallest element of the input array
    - for $\sigma \in \Omega$, indices $i \lt j$: $X_{ij}(\sigma)$ = number of times $z_{i}, z_{j}$ get compared with pivot sequence $\sigma$
    - $X_{ij}(\sigma)$ is either 0 or 1: two elements are compared only when one of them is the pivot, and the pivot is excluded from both recursive calls
    - thus, $\forall \sigma, C(\sigma) = \displaystyle\sum_{i=1}^{n-1}\displaystyle\sum_{j=i+1}^{n}X_{ij}(\sigma)$ 
    - then, by linearity of expectation, $E[C] = \displaystyle\sum_{i=1}^{n-1}\displaystyle\sum_{j=i+1}^{n}E[X_{ij}]$
    - also, $E[X_{ij}] = 0 * Pr[X_{ij} = 0] + 1 * Pr[X_{ij} = 1] = Pr[X_{ij} = 1]$
    - thus, $E[C] = \displaystyle\sum_{i=1}^{n-1}\displaystyle\sum_{j=i+1}^{n}Pr[z_{i},z_{j}$ compared$]$
- claim: $\forall i \lt j$, $Pr[z_{i},z_{j}$ compared$] = \dfrac{2}{j-i+1}$ 
    - consider the set $z_{i}, z_{i+1}, \dots, z_{j}$; as long as none of them is chosen as a pivot, they are all passed to the same recursive call
    - two cases for the first pivot chosen among them:
        - $z_{i}$ or $z_{j}$ gets chosen first, then $z_{i}, z_{j}$ are compared
        - otherwise, $z_{i}, z_{j}$ are split into different recursive calls and never compared
    - each of the $j-i+1$ elements is equally likely to be chosen first, thus $Pr[z_{i},z_{j}$ compared$]$ = choices for case 1 / total number of choices = $\dfrac{2}{j-i+1}$
- finally, $E[C] = 2\displaystyle\sum_{i=1}^{n-1}\displaystyle\sum_{j=i+1}^{n}\dfrac{1}{j-i+1} \le 2 * n * \displaystyle\sum_{k=2}^{n}\dfrac{1}{k} \le 2 * n * \ln(n) = O(n\log{n})$
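
The bound can be checked numerically. The sketch below assumes distinct input values; `expected_comparisons` evaluates the double sum exactly (grouping pairs by $k = j-i+1$, of which there are $n-k+1$), and `quicksort_count` is a simple out-of-place randomized quicksort that counts comparisons with the pivot. Both names are chosen here for illustration.

```python
import math
import random


def expected_comparisons(n):
    """Exact value of the sum over i < j of 2 / (j - i + 1)."""
    return sum((n - k + 1) * 2 / k for k in range(2, n + 1))


def quicksort_count(values, rng):
    """Return (sorted copy, number of comparisons) using random pivots."""
    if len(values) <= 1:
        return list(values), 0
    pivot = rng.choice(values)
    smaller = [v for v in values if v < pivot]
    larger = [v for v in values if v > pivot]
    sorted_smaller, count_smaller = quicksort_count(smaller, rng)
    sorted_larger, count_larger = quicksort_count(larger, rng)
    # the pivot is compared once with every other element
    total = (len(values) - 1) + count_smaller + count_larger
    return sorted_smaller + [pivot] + sorted_larger, total


n = 200
rng = random.Random(0)
trials = [quicksort_count(rng.sample(range(n), n), rng)[1] for _ in range(100)]
print("average over 100 runs:", sum(trials) / len(trials))
print("exact expectation:    ", expected_comparisons(n))
print("bound 2 n ln(n):      ", 2 * n * math.log(n))
```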

In [None]:
def openfile(file_path):
    """
    Reads in a file and store the content into an array
    
    Args:
    file_path (string) -- path of file to be read
    
    Returns:
    array (list) -- array of integers
    """
    
    with open(file_path, 'r') as line:
        array = line.read().split("\n")
    return array

In [None]:
def list_of_string_to_integer(array):
    """
    Converts the contents of array from type string to type integer
    
    Args:
    array (list) -- array of integers (in type string)
    
    Returns
    None
    """
    
    for i in range(0, len(array)):
        array[i] = int(array[i])

In [None]:
def choose_median_item_as_pivot(integer_array, start_index, end_index):
    """
    Compares the first, middle, and last elements of an array and returns the median element
    
    Args:
    integer_array (list) -- array of integers
    start_index (integer) -- beginning index of array
    end_index (integer) -- ending index of array
    
    Returns:
    median (integer) -- median element of the first, middle, and last elements of array
    """
    
    # offset of the middle element; for an even-length range this is the
    # first of the two middle elements
    middle_index = (end_index - start_index) // 2
       
    num1 = integer_array[start_index]
    num2 = integer_array[end_index]
    num3 = integer_array[start_index + middle_index]

    # the median of three values is the middle one after sorting
    median = sorted([num1, num2, num3])[1]
            
    return median

In [None]:
def partition(integer_array, start_index, end_index, pivot, comparison):
    """
    Performs partition around a specific item
    
    Args:
    integer_array (list) -- array of integers to be partitioned
    start_index (integer) --  index of array to apply partitioning from
    end_index (integer) -- index of array to apply partitioning to
    pivot (integer) -- pivot element 
    comparison (list) -- array to store the number of comparisons in all subroutines
    
    Returns:
    None
    """

    # locate the pivot inside the subarray and move it to the front
    target_index = integer_array.index(pivot, start_index, end_index + 1)
    integer_array[start_index], integer_array[target_index] = integer_array[target_index], integer_array[start_index]

    # i marks the boundary: elements in [start_index + 1, i) are less than the pivot
    i = start_index + 1
    for j in range(start_index + 1, end_index + 1):
        if integer_array[j] < pivot:
            integer_array[i], integer_array[j] = integer_array[j], integer_array[i]
            i += 1

    # place the pivot between the two parts
    integer_array[start_index], integer_array[i-1] = integer_array[i-1], integer_array[start_index]

    # the pivot is compared once with every other element of the subarray
    comparison.append(end_index - start_index)

In [None]:
def quicksort(integer_array, start_index, end_index, comparison, pivot_strategy):
    """
    Implements quicksort and computes the number of comparisons in partition subroutine
    
    Args:
    integer_array (list) -- array of integers in random order
    start_index (integer) -- index of array to apply sorting from
    end_index (integer) -- index of array to apply sorting to
    comparison (list) -- array to store the number of comparisons in all partition subroutines
    pivot_strategy (string) -- a flag to specify how to pick the pivot element
    
    Returns:
    total_comparison (integer) -- total number of comparisons in all partition subroutines
    """

    # base case: there is at most 1 element in the array
    if end_index <= start_index:
        return sum(comparison)

    # pick the pivot according to the requested strategy
    if pivot_strategy == "first_item":
        pivot = integer_array[start_index]
    elif pivot_strategy == "last_item":
        pivot = integer_array[end_index]
    elif pivot_strategy == "median":
        pivot = choose_median_item_as_pivot(integer_array, start_index, end_index)
    else:
        raise ValueError("unknown pivot_strategy: " + str(pivot_strategy))

    partition(integer_array, start_index, end_index, pivot, comparison)
    # after partitioning, the pivot sits at its final sorted position
    partition_index = integer_array.index(pivot, start_index, end_index + 1)

    quicksort(integer_array, start_index, partition_index-1, comparison, pivot_strategy)
    quicksort(integer_array, partition_index+1, end_index, comparison, pivot_strategy)

    total_comparison = sum(comparison)
    return total_comparison

In [None]:
array = openfile("data/quicksort1.txt")
list_of_string_to_integer(array)
assert(quicksort(array, 0, len(array)-1, [], "first_item") == 69)

array = openfile("data/quicksort1.txt")
list_of_string_to_integer(array)
assert(quicksort(array, 0, len(array)-1, [], "last_item") == 65)

array = openfile("data/quicksort1.txt")
list_of_string_to_integer(array)
assert(quicksort(array, 0, len(array)-1, [], "median") == 56)

array = openfile("data/quicksort.txt")
list_of_string_to_integer(array)
assert(quicksort(array, 0, len(array)-1, [], "first_item") == 162085)

array = openfile("data/quicksort.txt")
list_of_string_to_integer(array)
assert(quicksort(array, 0, len(array)-1, [], "last_item") == 164123)

array = openfile("data/quicksort.txt")
list_of_string_to_integer(array)
assert(quicksort(array, 0, len(array)-1, [], "median") == 138382)

## Randomized Selection

- Input: Array $A$ with $n$ distinct numbers, and a number $i \in \{1,2 \dots n\}$
- Output: $i$th order statistic ($i$th smallest number of $A$)

Solutions
1. Do mergesort and return $i$th element of sorted array: $O(n\log{n})$
2. Randomized Selection: $O(n)$ on average

Pseudocode
```
RSelect(array A, length n, order statistic i)
    if n = 1
        return A[1]
    choose pivot p from A uniformly at random
    partition A around p
    let j = new index of p
    if j = i
        return p
    if j > i
        return RSelect(1st part of A, j-1, i)
    if j < i
        return RSelect(2nd part of A, n-j, i-j)
```
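
A short Python sketch of the pseudocode above, assuming distinct values and a 1-indexed order statistic `i`. For readability it builds new lists instead of partitioning in place, so it does not reproduce the in-place behaviour of RSelect; the name `rselect` and the optional `rng` parameter are choices made here.

```python
import random


def rselect(values, i, rng=random):
    """Return the i-th smallest element (1-indexed) of distinct values."""
    if len(values) == 1:
        return values[0]

    pivot = rng.choice(values)               # pivot chosen uniformly at random
    smaller = [v for v in values if v < pivot]
    larger = [v for v in values if v > pivot]
    j = len(smaller) + 1                     # 1-indexed rank of the pivot

    if j == i:
        return pivot
    if j > i:
        return rselect(smaller, i, rng)
    return rselect(larger, i - j, rng)


print(rselect([7, 2, 9, 4, 1], 3))
```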

Analysis
- RSelect uses $\le cn$ operations outside of recursive call, for some constant $c$ (from partitioning)
- RSelect is in phase $j$ if current array size is between $(\dfrac{3}{4})^{j+1}n$ and $(\dfrac{3}{4})^{j}n$
- let $X_{j}$ = number of recursive calls during phase $j$
- thus, running time of RSelect $\le \displaystyle\sum_{j}X_{j}*c*(\dfrac{3}{4})^{j}*n$
    - if RSelect chooses a pivot giving 25-75 split (or better), then current phase ends!
    - probability of 25-75 split or better is 50% (the pivot must be among the middle half of the elements)
    - thus, $E[X_{j}] \le$ expected number of times you need to flip a fair coin to get one "heads" $= 2$
- finally, expected running time of RSelect $\le E\left[cn\displaystyle\sum_{j}(\dfrac{3}{4})^{j}X_{j}\right] = cn\displaystyle\sum_{j}(\dfrac{3}{4})^{j}E\left[X_{j}\right] \le 2cn\displaystyle\sum_{j}(\dfrac{3}{4})^{j} \le 8cn = O(n)$

Solutions
3. Deterministic Selection: $O(n)$ (but needs more space than RSelect)
    - the best pivot is median (50-50 split)
    - use median of medians !

Pseudocode
```
DSelect(array A, length n, order statistic i)
    if n = 1
        return A[1]
    break A into groups of 5, sort each group
    let C = the n/5 "middle elements" (one median per group)
    p = DSelect(C, n/5, n/10) # recursively compute median of C
    partition A around p
    let j = new index of p
    if j = i
        return p
    if j > i
        return DSelect(1st part of A, j-1, i)
    if j < i
        return DSelect(2nd part of A, n-j, i-j)
```

Analysis (from the pseudocode)
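- break $A$ into groups of 5 and sort each group: $\Theta(n)$
- collect the $n/5$ middle elements: $\Theta(n)$
- recursively compute the median of the middle elements: $T(n/5)$
- partition $A$ around $p$: $\Theta(n)$
- compare the pivot's index with $i$: $\Theta(1)$
- recurse on one side of the partition: at most roughly $T(7n/10)$, since the median of medians guarantees a 30-70 split or better
- in total: $T(n) \le cn + T(n/5) + T(7n/10)$ => $T(n) = O(n)$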
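
A compact sketch of DSelect in Python, assuming distinct values and a 1-indexed `i`. As with the pseudocode, groups of 5 are sorted and the median of their medians is used as the pivot. The name `dselect` is chosen here, and the sketch builds new lists rather than partitioning in place.

```python
def dselect(values, i):
    """Return the i-th smallest element (1-indexed) of distinct values."""
    if len(values) <= 5:
        return sorted(values)[i - 1]

    # break into groups of 5, sort each group, take the middle elements
    groups = [sorted(values[k:k + 5]) for k in range(0, len(values), 5)]
    medians = [group[(len(group) - 1) // 2] for group in groups]

    # recursively compute the median of the medians and use it as the pivot
    pivot = dselect(medians, (len(medians) + 1) // 2)

    smaller = [v for v in values if v < pivot]
    larger = [v for v in values if v > pivot]
    j = len(smaller) + 1                     # 1-indexed rank of the pivot

    if j == i:
        return pivot
    if j > i:
        return dselect(smaller, i)
    return dselect(larger, i - j)


print(dselect(list(range(100, 0, -1)), 37))
```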

## Sorting lower bound
- every "comparison-based" sorting algorithm has worst-case running time $\Omega(n\log{n})$
- consider input arrays containing $\{1,2,3 \dots n\}$ in some order ($n!$ such inputs)
- suppose algorithm always makes $\le k$ comparisons to correctly sort these $n!$ inputs
- each comparison has 2 possible outcomes, so across all $n!$ possible inputs, algorithm exhibits $\le 2^{k}$ distinct executions
- if $2^{k} \lt n!$, then by the pigeonhole principle the algorithm executes identically on two distinct inputs, which means it must get one of them incorrect
- thus, $2^{k} \ge n! \ge (\dfrac{n}{2})^{\dfrac{n}{2}}$
- thus, $k \ge \dfrac{n}{2}\log_{2}\dfrac{n}{2} = \Omega(n\log{n})$
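
The inequality $2^{k} \ge n!$ can be evaluated exactly for small $n$. In this sketch, `min_comparisons_lower_bound` (a name chosen here) returns the smallest $k$ with $2^{k} \ge n!$, using integer arithmetic to avoid floating-point rounding, and compares it with the weaker bound $\dfrac{n}{2}\log_{2}\dfrac{n}{2}$ derived above.

```python
import math


def min_comparisons_lower_bound(n):
    """Smallest k with 2**k >= n!, i.e. ceil(log2(n!)), computed exactly."""
    # for an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x))
    return (math.factorial(n) - 1).bit_length()


for n in (4, 16, 64, 256):
    weaker = (n / 2) * math.log2(n / 2)
    print(n, min_comparisons_lower_bound(n), round(weaker, 1))
```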

## Graph

- vertices $(V)$, edges $(E)$, number of vertices $(n)$, number of edges $(m)$
- cut of graph $G = (V,E)$ is a partition of $V$ into two non-empty sets $A$ and $B$
- a crossing edge of a cut $(A,B)$ has one endpoint in $A$ and the other in $B$ 

Min-cut problem

- input: undirected $G = (V,E)$
- output: cut with fewest crossing edges

Graph representation
- adjacency matrix: $A_{ij} = 1$ if $G$ has an $i$-$j$ edge ($\Theta(n^{2})$ space required)
- adjacency lists: each vertex points to edges incident to it ($\Theta(n+m)$ space required)


Random contraction algorithm

Pseudocode
```
while there are more than 2 vertices
- pick a remaining edge (u,v) uniformly at random
- merge u and v into a single vertex
- remove self-loops (but parallel edges are allowed)

return cut represented by the final 2 vertices
```
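
One way to sketch the pseudocode above compactly is with an edge list plus union-find "leader" pointers, which is a different representation from adjacency lists. The assumptions here: vertices are labelled `0 .. num_vertices-1`, the graph is connected, and the names `contract_once` and `min_cut` are chosen for illustration.

```python
import random


def contract_once(edges, num_vertices, rng):
    """Run one trial of random contraction; return the number of crossing edges."""
    leader = list(range(num_vertices))    # leader[v] leads to v's merged super-vertex

    def find(v):
        while leader[v] != v:
            leader[v] = leader[leader[v]]  # path halving keeps chains short
            v = leader[v]
        return v

    live_edges = [(u, v) for (u, v) in edges if u != v]
    remaining = num_vertices
    while remaining > 2:
        # pick a remaining edge uniformly at random (parallel edges count separately)
        u, v = rng.choice(live_edges)
        leader[find(v)] = find(u)          # merge the two endpoints
        remaining -= 1
        # remove self-loops, keep parallel edges
        live_edges = [(x, y) for (x, y) in live_edges if find(x) != find(y)]
    return len(live_edges)


def min_cut(edges, num_vertices, num_trials, seed=None):
    """Smallest cut found over num_trials independent trials."""
    rng = random.Random(seed)
    return min(contract_once(edges, num_vertices, rng) for _ in range(num_trials))


# two triangles joined by a single edge
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(min_cut(example_edges, 6, num_trials=200, seed=0))
```

A single trial can return a cut that is larger than the minimum, which is why the result is taken over many trials.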

Analysis
- let $k$ = number of edges crossing min-cut$(A,B)$ and call these edges $F$
    - then, $Pr[$output is $(A,B)] = Pr[$F is never contracted$]$ 
- let $S_{i}$ = event that an edge of $F$ is contracted in iteration $i$
    - then, we want to compute $Pr[\neg S_{1} \land \neg S_{2} \land \neg S_{3} \land \dots \land \neg S_{n-2}]$
- 1st iteration
    - $Pr[S_{1}] = \dfrac{k}{m}$ (number of crossing edges / total number of edges)
    - note that the degree (number of incident edges) of each vertex is at least $k$, because each vertex $v$ defines a cut $(\{v\}, V - \{v\})$
    - since $\displaystyle\sum_{v}degree(v) = 2m$ and each degree is at least $k$
        - $2m \ge kn$, i.e. $m \ge \dfrac{kn}{2}$
    - since $Pr[S_{1}] = \dfrac{k}{m}$
        - $Pr[S_{1}] \le \dfrac{2}{n}$
- 2nd iteration
    - note that $Pr[\neg S_{1} \land \neg S_{2}] = Pr[\neg S_{2} | \neg S_{1}]Pr[\neg S_{1}]$
    - also, $Pr[\neg S_{1}] \ge 1 - \dfrac{2}{n}$ and $Pr[\neg S_{2} | \neg S_{1}] = 1 - (k$ / number of remaining edges$)$
    - every vertex of the contracted graph still has degree at least $k$, so the number of remaining edges is at least $\dfrac{k(n-1)}{2}$
    - thus, $Pr[\neg S_{2} | \neg S_{1}] \ge 1 - \dfrac{2}{n-1}$
- All iterations
    - probability of success = $Pr[\neg S_{1} \land \neg S_{2} \land \neg S_{3} \land \dots \land \neg S_{n-2}] = Pr[\neg S_{1}]Pr[\neg S_{2} | \neg S_{1}]Pr[\neg S_{3} | \neg S_{2} \land \neg S_{1}] \dots Pr[\neg S_{n-2} | \neg S_{1} \land \dots \land \neg S_{n-3}]$
    - $\ge (1 - \dfrac{2}{n})(1 - \dfrac{2}{n-1})(1 - \dfrac{2}{n-2}) \dots (1 - \dfrac{2}{4})(1 - \dfrac{2}{3})$
    - $= \dfrac{n-2}{n} \cdot \dfrac{n-3}{n-1} \cdot \dfrac{n-4}{n-2} \dots \dfrac{2}{4} \cdot \dfrac{1}{3} = \dfrac{2}{n(n-1)} \ge \dfrac{1}{n^{2}}$
- How many trials needed?
    - let $T_{i}$ = event that cut$(A,B)$ is found on $i$th try
    - $Pr[$all $N$ trials fail$] = Pr[\neg T_{1} \land \neg T_{2} \land \dots \land \neg T_{N}]$ 
    - because $T_{i}$'s are independent, $Pr[\neg T_{1} \land \neg T_{2} \land \dots \land \neg T_{N}] = \displaystyle\prod_{i=1}^{N}Pr[\neg T_{i}] \le (1 - \dfrac{1}{n^{2}})^{N}$
    - since $1 + x \le e^{x}$ for all real $x$, $1 - \dfrac{1}{n^{2}} \le e^{-\dfrac{1}{n^{2}}}$
    - take $N = n^{2}$, then $Pr[$all fail$] \le (e^{-\dfrac{1}{n^{2}}})^{n^{2}} = \dfrac{1}{e}$
    - take $N = n^{2}\ln(n)$, then $Pr[$all fail$] \le (\dfrac{1}{e})^{\ln(n)} = \dfrac{1}{n}$
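
The two choices of $N$ can be checked numerically for a concrete $n$. The helper `failure_bound` is a name chosen here; it evaluates the upper bound on the probability that all trials fail, assuming each trial succeeds with probability at least $\dfrac{1}{n^{2}}$.

```python
import math


def failure_bound(n, num_trials):
    """Upper bound on Pr[all trials fail] when each succeeds w.p. >= 1/n^2."""
    return (1 - 1 / n ** 2) ** num_trials


n = 10
print(failure_bound(n, n ** 2), "vs 1/e =", 1 / math.e)
print(failure_bound(n, math.ceil(n ** 2 * math.log(n))), "vs 1/n =", 1 / n)
```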
    
Counting min-cuts
- claim: a graph $G = (V,E)$ with $n$ vertices has at most $\binom{n}{2}$ min-cuts, and this bound is tight
- proof: 
    - let $t$ = largest possible number of min-cuts in a graph with $n$ vertices
    - lower bound: 
        - consider the $n$-cycle: removing any pair of its $n$ edges defines a distinct min-cut (with 2 crossing edges)
        - thus, $t \ge \binom{n}{2}$
    - upper bound: 
        - let $(A_{1}, B_{1}), (A_{2}, B_{2}) \dots (A_{t}, B_{t})$ be the min-cuts of a graph with $n$ vertices
        - let $S_{i}$ = event that random contraction outputs $(A_{i}, B_{i})$; then $Pr[S_{i}] \ge \dfrac{2}{n(n-1)} = \dfrac{1}{\binom{n}{2}}$ $\forall{i = 1,2 \dots t}$
        - because $S_{i}$'s are disjoint events (only one can happen) their probabilities sum to at most 1
            - $\dfrac{t}{\binom{n}{2}} \le 1$ => $t \le \binom{n}{2}$
    - therefore, $t = \binom{n}{2}$
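
A standard example that attains $\binom{n}{2}$ min-cuts is the cycle on $n$ vertices. For small $n$ this can be confirmed by brute force over all cuts. The sketch below is exponential in $n$ and is only for illustration; `count_min_cuts` and `cycle` are names chosen here, and `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb


def count_min_cuts(num_vertices, edges):
    """Brute force over all cuts; return (min cut size, number of min-cuts)."""
    best, count = None, 0
    others = range(1, num_vertices)
    # vertex 0 is fixed on side A so that each cut {A, B} is enumerated once;
    # at most num_vertices - 2 other vertices join A, so B is never empty
    for size in range(num_vertices - 1):
        for subset in combinations(others, size):
            side_a = {0, *subset}
            crossing = sum((u in side_a) != (v in side_a) for u, v in edges)
            if best is None or crossing < best:
                best, count = crossing, 1
            elif crossing == best:
                count += 1
    return best, count


def cycle(n):
    """Edge list of the n-cycle on vertices 0 .. n-1."""
    return [(v, (v + 1) % n) for v in range(n)]


for n in (4, 5, 6, 7):
    print(n, count_min_cuts(n, cycle(n)), comb(n, 2))
```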

In [None]:
import random

In [None]:
def openfile(file_path, split_index):
    """
    Read in a file and stores data into an array
    
    Args:
    file_path (string) -- path of file to be read
    split_index (string) -- delimiter to perform the "split"
    
    Returns:
    array (list of lists) -- array to hold input data 
    """
    
    with open(file_path, 'r') as line:
        array = line.read().split("\n")
        for i in range(0, len(array)): # last subarray is an empty array
            subarray = array[i].split(split_index)
            subarray = subarray[:-1] # remove last empty element
            array[i] = subarray
    return array

In [None]:
def convert_to_pair_representation(array):
    """
    Converts graph from adjacency list representation to pairs representation
    
    Args:
    array (list of lists) -- adjacency list representation of graph
    
    Returns:
    edges (list of tuples) -- pair representation of graph
    """
    
    edges = []
    for i in range(0, len(array)): 
        for j in range(1, len(array[i])):
            edges.append((array[i][0], array[i][j]))
    return edges

In [None]:
def get_new_adjacent_vertices(array, vertex1, vertex2):
    """
    Gets vertices that will be connected to vertex1 after vertex2 is merged into vertex1
    
    Args: 
    array (list of lists) -- adjacency list representation of graph
    vertex1 (string) -- first vertex of an edge to apply contraction
    vertex2 (string) -- second vertex of an edge to apply contraction
    
    Returns:
    new_adjacent_vertices (list) -- new sets of vertices that vertex1 will be connected to
    """
    
    new_adjacent_vertices = []
    for i in range(0, len(array)): 
        if array[i][0] == vertex1 or array[i][0] == vertex2:
            for vertex in array[i]:
                if vertex != vertex1 and vertex != vertex2: # don't include self-loop, remove edge (vertex1, vertex2)
                    new_adjacent_vertices.append(vertex)
    return new_adjacent_vertices

In [None]:
def remove_vertex2(array, vertex2):
    """
    Remove vertex2 from the graph (where vertex2 is the first vertex of edges)
    
    Args:
    array (list of lists) -- adjacency list representation of graph
    vertex2 (string) -- second vertex of an edge to apply contraction
    
    Returns:
    None
    """
    
    for i in range(0, len(array)):
        if array[i][0] == vertex2: # remove vertex2 information
            array.remove(array[i])
            return

In [None]:
def update_vertex1(array, vertex1, new_adjacent_vertices):
    """
    Updates vertex1 in the graph (where vertex1 is the first vertex of edges)
    
    Args:
    array (list of lists) -- adjacency list representation of graph
    vertex1 (string) -- first vertex of an edge to apply contraction
    new_adjacent_vertices (list) -- new sets of vertices that vertex1 will be connected to
    
    Returns:
    None
    """
    
    new_array = []
    for i in range(0, len(array)): 
        if array[i][0] == vertex1: # update vertex1 information
            new_array.append(array[i][0])
            new_array = new_array + new_adjacent_vertices
            array.remove(array[i])
            array.append(new_array)
            return

In [None]:
def replace_vertex2_with_vertex1(array, vertex1, vertex2):
    """
    Replaces vertex2 with vertex1 in the graph (where vertex1 and vertex2 are the second vertices of edges)
    
    Args:
    array (list of lists) -- adjacency list representation of graph
    vertex1 (string) -- first vertex of an edge to apply contraction
    vertex2 (string) -- second vertex of an edge to apply contraction
    
    Returns:
    None
    """
    
    for i in range(0, len(array)): 
        while vertex2 in array[i]:  # replace vertex2 with vertex1. There could be more than 1 vertex2
            array[i].remove(vertex2)
            array[i].append(vertex1) 

In [None]:
def mincut(adjacency_representation_array):
    """
    Runs one trial of the random contraction algorithm, contracting the graph in place until only 2 vertices remain

    Args:
    adjacency_representation_array (list of lists) -- adjacency list representation of graph

    Returns:
    None
    """
    
    # the loop alone performs all contractions, so no recursive call is needed
    while len(adjacency_representation_array) > 2:
        # pick one of the remaining edges uniformly at random
        pair_representation_array = convert_to_pair_representation(adjacency_representation_array)
        pick = random.choice(pair_representation_array) 
        # merge pick[1] into pick[0], dropping self-loops but keeping parallel edges
        new_adjacent_vertices = get_new_adjacent_vertices(adjacency_representation_array, pick[0], pick[1])        
        remove_vertex2(adjacency_representation_array, pick[1])
        update_vertex1(adjacency_representation_array, pick[0], new_adjacent_vertices)
        replace_vertex2_with_vertex1(adjacency_representation_array, pick[0], pick[1])

In [None]:
def do_trial(file_path, split_index, num_trial): 
    """
    Executes mincut many times and keeps the smallest cut found
    
    Args:
    file_path (string) -- path of file to be read
    split_index (string) -- delimiter to perform the "split"
    num_trial (integer) -- how many times to try
    
    Returns:
    mincut_num (integer) -- smallest number of crossing edges found over all trials
    """
    
    mincut_num = float("inf")
    for _ in range(num_trial):
        # re-read the graph each time because mincut contracts it in place
        adjacency_representation_array = openfile(file_path, split_index)
        mincut(adjacency_representation_array)
        # each remaining row is [vertex, neighbor, neighbor, ...], so the
        # number of crossing edges is the row length minus 1
        num_crossing_edges = len(adjacency_representation_array[0]) - 1
        mincut_num = min(mincut_num, num_crossing_edges)
        print("cut found: " + str(num_crossing_edges) + ", current mincut is: " + str(mincut_num))
    return mincut_num

In [None]:
print(do_trial("data/mincut.txt", "\t", 50))
print(do_trial("data/mincut1.txt", " ", 20))
print(do_trial("data/mincut2.txt", " ", 20))
print(do_trial("data/mincut3.txt", " ", 20))
print(do_trial("data/mincut4.txt", " ", 20))
print(do_trial("data/mincut5.txt", " ", 20))
print(do_trial("data/mincut6.txt", " ", 20))