# Applied Probability and Randomized Algorithms: Quicksort algorithm and Karger's randomized algorithm

Alviona Mancho | p3200098@aueb.gr

## Preliminaries

In [70]:
import random
import math

## Quicksort

Quicksort is an efficient, general-purpose sorting algorithm. It was developed by British computer scientist Tony Hoare in 1959 and published in 1961.

Quicksort is a divide-and-conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are then sorted recursively. This can be done in-place, requiring small additional amounts of memory to perform the sorting.

Mathematical analysis of quicksort shows that, on average, the algorithm takes $\mathcal{O}(n\log {n})$ comparisons to sort $n$ items. In the worst case, it makes $\mathcal{O}(n^2)$ comparisons.
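
The average-case bound can be sketched with indicator variables. This is a standard argument, not something derived in this notebook, and it assumes distinct keys and that every element of a sub-array is equally likely to become its pivot (which is the case for a uniformly random input permutation). The notation is introduced here for illustration: $z_i$ is the $i$-th smallest key, $X_{ij}$ indicates whether $z_i$ and $z_j$ are ever compared, $X$ is the total number of comparisons and $H_n$ is the $n$-th harmonic number. Two keys $z_i$ and $z_j$ are compared exactly when the first pivot chosen among $z_i, \dots, z_j$ is one of the two.

$$
\begin{align*}
\Pr[X_{ij} = 1] &= \frac{2}{j-i+1}, \qquad 1 \le i < j \le n \\
\mathbb{E}[X] &= \sum_{1 \le i < j \le n} \frac{2}{j-i+1} = \sum_{k=2}^{n} \frac{2(n-k+1)}{k} = 2(n+1)H_n - 4n \\
&\le 2n\ln n \approx 1.39\, n\log_2 n
\end{align*}
$$

The last step uses $H_n \le \ln n + 1$, so under these assumptions the expected number of comparisons is $\mathcal{O}(n\log n)$.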

[Read more](https://en.wikipedia.org/wiki/Quicksort)

### Implementation

#### Auxiliary swap function

In [71]:
def swap(array, i, j):
    # Exchange the elements at positions i and j in place
    array[i], array[j] = array[j], array[i]

#### Partition

Partition is the key function in Quicksort.

- First we have to choose a pivot element. 
- Then, we place the pivot element in its final position by moving all elements that are $\leq$ the pivot to its left and all elements that are $>$ the pivot to its right.

In this implementation the first element serves as 'pivot'.

Here is an example:


<div align="center">
    <img alt="Partition" src="https://media.geeksforgeeks.org/wp-content/uploads/20221115132720/recursiontreedrawio.png" height="500">
</div>

In [72]:
def partition(array, p, r, comparisons_hist, i_hist):
    pivot = array[p] # using the first element as pivot
    i = p
    for j in range(p+1, r+1):
        comparisons_hist[i_hist] += 1
        if array[j] <= pivot:
            i = i+1
            swap(array, i, j)
    swap(array, i, p)
    q = i
    return q
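
To make the partition step concrete, here is a minimal self-contained sketch that applies the same first-element-pivot scheme to a small list. The function name `partition_demo` and the sample list `data` are made up for this illustration, and the comparison counter is left out to keep the example short.

```python
def partition_demo(array, p, r):
    """Partition array[p..r] around array[p]; return the pivot's final index."""
    pivot = array[p]
    i = p  # last index of the "<= pivot" region
    for j in range(p + 1, r + 1):
        if array[j] <= pivot:
            i += 1
            array[i], array[j] = array[j], array[i]
    # Move the pivot between the two regions
    array[i], array[p] = array[p], array[i]
    return i


data = [4, 7, 2, 6, 1, 5]
q = partition_demo(data, 0, len(data) - 1)
print(q, data)  # 2 [1, 2, 4, 6, 7, 5]
```

After the call, the pivot `4` sits at index 2, everything to its left is smaller, and everything to its right is larger. Neither side is sorted yet; that is left to the recursive calls.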

#### Quicksort

In [73]:
def quicksort(array, p, r, comparisons_hist, i_hist):
    if p < r:
        # Place the pivot in its final position q, then sort both sides recursively
        q = partition(array, p, r, comparisons_hist, i_hist)
        quicksort(array, p, q-1, comparisons_hist, i_hist)
        quicksort(array, q+1, r, comparisons_hist, i_hist)
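
The quadratic worst case can be observed directly. With the first element as pivot, an already sorted input makes every partition maximally unbalanced, which leads to $\frac{n(n-1)}{2}$ comparisons. The sketch below is a self-contained variant written for this illustration (the name `count_comparisons` is hypothetical); it uses an explicit stack instead of recursion so that sorted inputs do not hit Python's recursion limit for larger $n$.

```python
def count_comparisons(array):
    """Sort a copy with first-element-pivot quicksort; return the number of comparisons."""
    a = list(array)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # sub-arrays still to be sorted
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        pivot, i = a[p], p
        for j in range(p + 1, r + 1):
            comparisons += 1
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i], a[p] = a[p], a[i]
        stack.append((p, i - 1))
        stack.append((i + 1, r))
    assert a == sorted(array)  # sanity check: the copy is sorted
    return comparisons


n = 50
worst = count_comparisons(range(1, n + 1))  # already sorted input
print(worst, n * (n - 1) // 2)  # 1225 1225
```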

### Simulation

We are going to execute the algorithm a large number of times and estimate the average number of comparisons required to sort a random permutation. Then we are going to compare this with the theoretical upper bound for the average case complexity of Quicksort.

More specifically, we are going to execute the algorithm 100 times, each time providing as input a random permutation of the numbers $1, 2, \dots, 50$, where every permutation is equally likely.

The following simple function takes a sequence (in our case $1, 2, \dots, 50$) and returns a random permutation of it, with all permutations equiprobable.

In each step it chooses a position in the remaining sequence uniformly at random, removes the element at that position, and appends it to a new sequence (empty at first). The first element is therefore chosen with probability $\frac{1}{n}$, the second with probability $\frac{1}{n-1}$, and so on, so each permutation occurs with probability $\frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{1} = \frac{1}{n!}$.

In [74]:
def rand_permutation(sequence):
    sequence_cpy = sequence.copy()
    iters = len(sequence_cpy)
    permutation = []

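    for _ in range(iters):
        # Pick a uniformly random position among the remaining elements
        j = random.randint(0, len(sequence_cpy)-1)
        # Move the chosen element from the old sequence to the permutation
        permutation.append(sequence_cpy.pop(j))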
        
    return permutation
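
As a point of comparison, the same uniform distribution can be obtained with an in-place Fisher-Yates shuffle, which avoids the repeated `pop` from the middle of a list. This is an alternative sketch, not the function used in the simulation; the name `fisher_yates` and the empirical check on a 3-element sequence are assumptions made for this example.

```python
import random
from collections import Counter


def fisher_yates(sequence):
    """Return a uniformly random permutation of sequence (the input is not modified)."""
    perm = list(sequence)
    for i in range(len(perm) - 1, 0, -1):
        j = random.randint(0, i)  # randint is inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm


# Empirical check: each of the 3! = 6 permutations should appear with frequency close to 1/6
random.seed(0)
trials = 60_000
counts = Counter(tuple(fisher_yates([1, 2, 3])) for _ in range(trials))
for perm, count in sorted(counts.items()):
    print(perm, round(count / trials, 3))
```

The standard library's `random.shuffle` performs this kind of in-place shuffle as well, so in practice it could replace a hand-written version.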

Now we can estimate the average number of comparisons required to sort a random permutation of $1, 2, \dots, 50$. As stated above, we are going to execute the algorithm 100 times.

In [143]:
n = 50
iters = 100

# Define the sequence of numbers we are going to use
sequence = list(range(1, n+1))

# Initialize a list holding the number of comparisons made in each run of the algorithm
comparisons_hist = [0]*iters

for i in range(iters):
    # Construct a random permutation of the initial sequence of numbers
    array = rand_permutation(sequence)

    # Use Quicksort to sort this permutation of numbers
    quicksort(array, 0, n-1, comparisons_hist, i)

# Calculate the average number of comparisons per run
average = sum(comparisons_hist)/len(comparisons_hist)
print("Average number of comparisons per run: {0:.2f}".format(average))

# Compare with the theoretical upper bound for average case complexity of Quicksort
print("Theoretical upper bound for average case complexity of Quicksort = O(nlogn), where nlogn = {0:.2f}".format(n*math.log(n,2)))

# Compare with the theoretical upper bound for worst case complexity of Quicksort
print("Theoretical upper bound for worst case complexity of Quicksort = O(n^2), where n^2 =  {0:.2f}".format(n**2))

Average number of comparisons per run: 254.09
Theoretical upper bound for average case complexity of Quicksort = O(nlogn), where nlogn = 282.19
Theoretical upper bound for worst case complexity of Quicksort = O(n^2), where n^2 =  2500.00


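Indeed, the average number of comparisons per run is consistent with the theory: it lies below $n\log_2 n \approx 282.19$ and far below $n^2 = 2500$. Note that $\mathcal{O}(n\log n)$ hides a constant factor, so $n\log_2 n$ is a reference value and not a strict bound. A sharper reference is the exact expected number of comparisons for a uniformly random permutation of distinct keys, $2(n+1)H_n - 4n \approx 258.9$ for $n = 50$ (where $H_n$ is the $n$-th harmonic number), which is close to the observed average of 254.09.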
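
As a cross-check on the simulated average, the expected number of comparisons can also be computed exactly. The sketch below assumes distinct keys and a uniformly random input, so that the pivot is equally likely to have any rank in its sub-array; under that assumption the expectation satisfies $C(m) = (m-1) + \frac{2}{m}\sum_{k=0}^{m-1} C(k)$ with $C(0) = C(1) = 0$, whose solution is $2(m+1)H_m - 4m$. The function names are hypothetical.

```python
import math


def expected_comparisons(n):
    """Expected comparisons via the recurrence C(m) = (m-1) + (2/m) * sum_{k<m} C(k)."""
    c = [0.0] * (n + 1)
    running = 0.0  # sum of c[0..m-1]
    for m in range(1, n + 1):
        c[m] = (m - 1) + 2.0 * running / m
        running += c[m]
    return c[n]


def closed_form(n):
    """Closed form 2(n+1)H_n - 4n, where H_n is the n-th harmonic number."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


n = 50
print("Recurrence:  {0:.2f}".format(expected_comparisons(n)))  # 258.92
print("Closed form: {0:.2f}".format(closed_form(n)))           # 258.92
print("n*log2(n):   {0:.2f}".format(n * math.log(n, 2)))       # 282.19
```

An average of 100 runs is expected to fluctuate around the exact value, so a simulated figure a few comparisons away from it is not surprising.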

## Karger's Algorithm

Karger's algorithm is a randomized algorithm to compute a minimum cut of a connected graph. It was invented by David Karger and first published in 1993.

The idea of the algorithm is based on the concept of contraction of an edge $(u,v)$ in an undirected graph $G=(V,E)$. Informally speaking, the contraction of an edge merges the nodes $u$ and $v$ into one, reducing the total number of nodes of the graph by one. All other edges connecting either $u$ or $v$ are "reattached" to the merged node, effectively producing a multigraph. Karger's basic algorithm iteratively contracts randomly chosen edges until only two nodes remain; those nodes represent a cut in the original graph. By repeating this basic algorithm a sufficient number of times and keeping the smallest cut found, a minimum cut can be found with high probability (a technique known as <i>probability amplification</i>).

---

$$
\begin{align*}
&\textbf{Procedure Contract}(G = (V, E)): \\
&\quad \text{While } |V| > 2: \\
&\quad \quad \text{Choose } e \in E \text{ uniformly at random.} \\
&\quad \quad G \leftarrow G / e \\
&\text{Return the only cut in } G.
\end{align*}
$$

---

[Read more](https://en.wikipedia.org/wiki/Karger%27s_algorithm)


### Implementation

This implementation uses two lists, V and E, for the vertices/nodes and the edges respectively. 

- Each vertex/node of the graph is a string: e.g. ```'a', 'b', ...```

- Each edge is a list itself: e.g. ```['a', 'b']```

In [65]:
def karger_min_cut(V,E):
    n = len(V)

    # The goal is to be left with 2 vertices, so n-2 contractions are required
    # (each contraction reduces the number of vertices by 1)
    for i in range(n-2):
        m = len(E)

        # Pick a random edge
        edge_index = random.randint(0, m-1)
        edge = E[edge_index]
        u = edge[0]
        v = edge[1]

        # Contraction:
        # Create a super vertex consisting of the vertices u, v of the edge selected above (i.e. merge u and v)
        super_vertex = u + v
        print("\tContraction {0}: {1}".format(i+1, super_vertex))

        edges_to_remove = []
        edges_to_append = []
        
        # All other edges connecting either u or v are "reattached" to the super vertex
        for e in E:
            if u in e and v in e:
                edges_to_remove.append(e)
            elif u in e:
                new_edge = e.copy()
                new_edge.remove(u)
                new_edge.append(super_vertex)
                edges_to_remove.append(e)
                edges_to_append.append(new_edge)
            elif v in e:
                new_edge = e.copy()
                new_edge.remove(v)
                new_edge.append(super_vertex)
                edges_to_remove.append(e)
                edges_to_append.append(new_edge)

        # Update the edges (E) of the graph after contraction
        for edge in edges_to_remove:
            E.remove(edge)
        
        for edge in edges_to_append:
            E.append(edge)

        # Update the vertices (V) of the graph after contraction 
        V.append(super_vertex)
        V.remove(u)
        V.remove(v)

    # Return the final graph 
    return V, E
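
The implementation above rewrites the edge list on every contraction. As a hedged alternative sketch (not the method used in this notebook), the merged vertices can instead be tracked with a union-find structure while the original edge list stays untouched. Picking a uniformly random original edge and rejecting it when both endpoints already belong to the same super vertex is equivalent to picking a uniformly random edge of the contracted multigraph. The function name `karger_union_find` and the small test graph are made up for this example, and the graph is assumed to be connected (otherwise the loop would not terminate).

```python
import random


def karger_union_find(vertices, edges):
    """One run of the contraction procedure; returns the original edges crossing the cut."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the representative, halving the path on the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    remaining = len(vertices)
    while remaining > 2:
        u, v = random.choice(edges)
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # self-loop in the contracted multigraph: resample
        parent[root_u] = root_v  # contract the edge
        remaining -= 1

    return [(u, v) for u, v in edges if find(u) != find(v)]


# Two triangles joined by the single edge (3, 4); the min-cut is that edge
vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

random.seed(1)
best = min((karger_union_find(vertices, edges) for _ in range(50)), key=len)
print(len(best), best)
```

Because the cut is reported in terms of the original edges, the result can be read off directly instead of being decoded from concatenated super vertex names. Edges given as two-element lists, as in this notebook, work as well.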

### Simulation

We are going to run the algorithm 20 times and, from the 20 cuts obtained, keep the smallest one.

- As we know, the probability that a single run fails to find the min-cut is $\leq 1 - \frac{2}{n(n-1)}$

- So with 20 independent executions, the probability that all of them fail is $\leq \left(1 - \frac{2}{n(n-1)}\right)^{20}$

The graph we are going to use is the following:

<div align="center">
    <img src="images/graph_for_karger_alg.png" alt="Graph" style="height:200px;">
</div>

This graph has $n = 8$ vertices and $m = 14$ edges. Hence, the probability of the algorithm failing to find the min-cut within 20 executions is $\leq \left(1 - \frac{2}{n(n-1)}\right)^{20} = \left(1 - \frac{2}{8 \cdot 7}\right)^{20} = \left(\frac{27}{28}\right)^{20} \approx 0.48$
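
The arithmetic behind this bound, and the number of runs needed to push it below a chosen threshold, can be sketched as follows. The function names and the target value $\delta = 0.05$ are assumptions made for this illustration; the bound itself is the one stated above.

```python
import math


def failure_bound(n, runs):
    """Upper bound on the probability that all `runs` independent executions miss the min-cut."""
    p_success = 2 / (n * (n - 1))  # lower bound on the success probability of one run
    return (1 - p_success) ** runs


def runs_needed(n, delta):
    """Smallest number of runs for which the failure bound drops to at most delta."""
    p_success = 2 / (n * (n - 1))
    return math.ceil(math.log(delta) / math.log(1 - p_success))


print(round(failure_bound(8, 20), 2))  # 0.48
print(runs_needed(8, 0.05))            # 83
```

These figures are worst-case guarantees: the actual success probability on a particular graph can be considerably higher than the bound suggests.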

Note that this graph has a unique min-cut: $\{(B,E), (D,G)\}$ of size 2 (these two edges appear as `['b', 'e']` and `['d', 'g']` in the code below).
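
Since the graph is small, the claim about the unique min-cut can be verified by brute force. The sketch below rebuilds the edge list from the description of the graph used in this notebook (two $K_4$ cliques joined by the edges b-e and d-g) and enumerates every way of splitting the vertices into two non-empty sides. Vertex `'a'` is fixed on one side so that each cut is counted once. The names `best_size` and `best_sides` are hypothetical.

```python
from itertools import combinations

V = list("abcdefgh")
E = []
for clique in (["a", "b", "c", "d"], ["e", "f", "g", "h"]):
    E.extend([u, v] for u, v in combinations(clique, 2))
E.append(["b", "e"])
E.append(["d", "g"])

best_size = len(E) + 1
best_sides = []
others = V[1:]
for k in range(len(others)):  # at most 6 extra vertices, so the other side is never empty
    for extra in combinations(others, k):
        side = {"a", *extra}
        # An edge crosses the cut when exactly one endpoint lies in `side`
        size = sum((u in side) != (v in side) for u, v in E)
        if size < best_size:
            best_size, best_sides = size, [side]
        elif size == best_size:
            best_sides.append(side)

print(best_size, [sorted(s) for s in best_sides])  # 2 [['a', 'b', 'c', 'd']]
```

Only one split reaches size 2, namely the one separating the two cliques. This kind of enumeration grows as $2^{n-1}$, so it only serves as a check on small graphs.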

In [66]:
# Construct the graph (i.e. define V and E)
V = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']    # Vertices
E = []                                          # Edges

# Automate the construction by observing that the graph consists of two cliques (K_4)
cliques = []
cliques.append(['a', 'b', 'c', 'd'])
cliques.append(['e', 'f', 'g', 'h'])
for clique in cliques:
    for i in range(len(clique)):
        for j in range(i+1, len(clique)):
            E.append([clique[i], clique[j]])
E.append(['b', 'e'])
E.append(['d', 'g'])



# Perform 20 iterations and keep the minimum cut as the final result
iters = 20
min_cut = E
min_cut_size = len(E)

for i in range(iters):
    print("\nIteration {0}:".format(i+1))

    V_final, E_final = karger_min_cut(V.copy(), E.copy())
    
    print("\n\tV = {0}".format(V_final))
    print("\tE = {0}".format(E_final))

    if len(E_final) < len(min_cut):
        min_cut = E_final
        min_cut_size = len(E_final)

print("\n---------------------------  Karger's randomized algorithm  ---------------------------")
print("> Min-cut found: {0}".format(min_cut))
print("> Min-cut size: {0}".format(min_cut_size))


Iteration 1:
	Contraction 1: ad
	Contraction 2: bad
	Contraction 3: fg
	Contraction 4: cbad
	Contraction 5: efg
	Contraction 6: cbadefg

	V = ['h', 'cbadefg']
	E = [['h', 'cbadefg'], ['h', 'cbadefg'], ['h', 'cbadefg']]

Iteration 2:
	Contraction 1: ef
	Contraction 2: gh
	Contraction 3: ac
	Contraction 4: bef
	Contraction 5: dac
	Contraction 6: ghbef

	V = ['dac', 'ghbef']
	E = [['dac', 'ghbef'], ['dac', 'ghbef'], ['dac', 'ghbef'], ['dac', 'ghbef']]

Iteration 3:
	Contraction 1: eh
	Contraction 2: bd
	Contraction 3: cbd
	Contraction 4: ehcbd
	Contraction 5: gehcbd
	Contraction 6: fgehcbd

	V = ['a', 'fgehcbd']
	E = [['a', 'fgehcbd'], ['a', 'fgehcbd'], ['a', 'fgehcbd']]

Iteration 4:
	Contraction 1: ac
	Contraction 2: fh
	Contraction 3: efh
	Contraction 4: bd
	Contraction 5: acbd
	Contraction 6: gacbd

	V = ['efh', 'gacbd']
	E = [['efh', 'gacbd'], ['efh', 'gacbd'], ['efh', 'gacbd'], ['efh', 'gacbd']]

Iteration 5:
	Contraction 1: fg
	Contraction 2: ab
	Contraction 3: efg
	Contraction 4:

Indeed, the algorithm managed to find the min-cut within 20 executions. 

Interestingly, it found the min-cut not just once but 4 times within these specific 20 executions (iterations 10, 15, 19 and 20).

Of course, this varies from one batch of 20 executions to another, but in every case the overall probability of success (i.e. the probability that at least one of the 20 executions finds the min-cut) is at least $1-0.48 = 0.52 > \frac{1}{2}$.