## Advanced Sorting Algorithms: QuickSort and Merge Sort

In [1]:
from metakernel import register_ipython_magics
register_ipython_magics()

In [2]:
## Define some functions useful for testing
import random
import timeit

## generate an array of n random integers between 0 and 100
def get_random_array(n):
    return [random.randint(0, 100) for _ in range(n)]

def test_sorting_algorithm(algorithm):
    for _ in range(100):
        A = get_random_array(random.randint(0, 1000))
        A_sorted = algorithm(A)
        assert A_sorted == sorted(A), "FAIL!"
        
# sanity check: the testing function must accept Python's built-in sorted
test_sorting_algorithm(sorted)

## QuickSort 

Quicksort is a divide-and-conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are then sorted recursively. This can be done **in-place**, requiring small additional amounts of memory to perform the sorting.

Thus, the most important part of QuickSort is its partition algorithm. 
Given a pivot element, the partition algorithm splits a subarray into three parts.

- Elements that are smaller than or equal to the pivot
- The pivot
- Elements that are greater than or equal to the pivot

The algorithm works in-place, i.e., it performs the partition within the subarray itself without any extra space.
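
The three-part split described above can be stated as a property that is easy to check. The following is a minimal sketch, not part of the original exercise: the helper name `is_partitioned` and the example list are illustrative assumptions. It only verifies that a given index `p` separates a subarray correctly; it does not perform the partition itself.

```python
def is_partitioned(A, low, high, p):
    """Return True if A[low..high] is correctly split around index p."""
    pivot = A[p]
    # Everything to the left of p must be smaller than or equal to the pivot
    left_ok = all(A[k] <= pivot for k in range(low, p))
    # Everything to the right of p must be greater than or equal to the pivot
    right_ok = all(A[k] >= pivot for k in range(p + 1, high + 1))
    return left_ok and right_ok

# Hand-made example: the value 5 at index 3 separates the two parts
example = [2, 5, 1, 5, 9, 7, 5]
print(is_partitioned(example, 0, len(example) - 1, 3))  # True
print(is_partitioned(example, 0, len(example) - 1, 1))  # False: the 1 at index 2 is smaller than 5
```

A check like this can be used as an assertion right after a partition step while debugging.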

![alt text](partition_algorithm.png "Partition: pseudocode")

![alt text](partition_figure.png "Figure")

![alt text](partition_running_example.png "Title")

### Exercise: binary vector
You are given a binary vector, i.e., each element is either 0 or 1. Implement an easy variant of partition to sort the vector.

In [3]:
import random 

In [4]:
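def partition(A, low, high): 
    pivot = 0   # in a binary vector 0 is the smaller value, so it serves as the pivot
    i = low-1   # A[low..i] holds the zeros found so far
  
    for j in range(low, high): 
        if A[j] <= pivot:   # found a 0: move it into the region of zeros
            i = i+1 
            A[i], A[j] = A[j], A[i] 
    A[i+1], A[high] = A[high], A[i+1]   # place the last element right after the zeros
    return A   # return the vector itself, which is now sorted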

In [5]:
binary = [random.randint(0,1) for _ in range(20)]
print("Original: ", binary)

print("\nSorted: ", partition(binary, 0, len(binary)-1)) 

Original:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

Sorted:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]


### Exercise: QuickSort
Below is an implementation of QuickSort. 

In this exercise you have to:
- Write detailed comments to describe crucial parts of the code below (to prove you have understood it)
- Implement a random selection of the pivot element

In [6]:
from random import randint

In [7]:
def randomPivot(A,low,high):
    randompivot = randint(low, high) # pick a random index in [low, high]
    A[high], A[randompivot] = A[randompivot], A[high] # move the chosen pivot to the end, where partition expects it
    return partition(A, low, high)

def partition(A, low, high): 
    pivot = A[high] # choose the last value as the pivot
    i = low-1 # A[low..i] holds the elements found so far that are <= pivot
    for j in range(low, high): 
        if A[j] <= pivot: # if the current element is less than or equal to the pivot
            i = i+1 
            A[i], A[j] = A[j], A[i] # swap the elements at positions i and j, moving elements not greater than the pivot to the left
    A[i+1], A[high] = A[high], A[i+1] # place the pivot in its final position, between the two parts
    return i+1 # return the position of the pivot

def quickSort_rec(A, low, high):
    if low < high: # subarrays with fewer than two elements are already sorted
        pi = randomPivot(A, low, high) 
        quickSort_rec(A, low, pi-1) # recurse on the left part of the array
        quickSort_rec(A, pi+1, high) # recurse on the right part of the array
        
def quickSort(B):
    A = B[:] # copy the input so the caller's list is not modified
    quickSort_rec(A, 0, len(A)-1)
    return A

In [8]:
a = get_random_array(10)
quickSort(a)

[3, 12, 25, 33, 42, 52, 62, 88, 98, 98]

In [9]:
test_sorting_algorithm(quickSort)

## Let's do some experiments

Is QuickSort faster than InsertionSort and SelectionSort in practice?

In [10]:
def insertionSort(coll):
    A = list(coll)
    for i in range(1, len(A)):
        curr = A[i]
        j = i-1
        while j >= 0 and curr < A[j]:
            A[j+1] = A[j]
            j -= 1
        A[j+1] = curr
    return A

In [11]:
def selectionSort(coll):
    A = list(coll)
    for i in range(len(A)): 
        # Find the minimum element in remaining unsorted array 
        min_idx = i 
        for j in range(i+1, len(A)): 
            if A[min_idx] > A[j]: 
                min_idx = j 

        # Swap the found minimum element with  
        # the first element         
        A[i], A[min_idx] = A[min_idx], A[i]
    return A

In [12]:
A = get_random_array(1000)

In [13]:
%timeit quickSort(A)

5.34 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [14]:
%timeit insertionSort(A)

70 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [15]:
%timeit selectionSort(A)

55.7 ms ± 6.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


### More than 10x faster with arrays of length 1000. 

Try to run the experiments with an array of length 10000. **Run insertion sort and selection sort just before you go to sleep.** 
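
As a rough sanity check on what to expect, the timings reported above for n = 1000 can be extrapolated to n = 10000. The sketch below assumes idealized growth rates (quadratic for insertion sort and selection sort on random input, n log n for QuickSort). The helper names `scale_quadratic` and `scale_linearithmic` are illustrative and not part of the notebook, and real measurements may deviate from these estimates, especially on inputs with many repeated values.

```python
import math

def scale_quadratic(t, n_old, n_new):
    """Extrapolate a running time t, assuming time grows like n**2."""
    return t * (n_new / n_old) ** 2

def scale_linearithmic(t, n_old, n_new):
    """Extrapolate a running time t, assuming time grows like n * log(n)."""
    return t * (n_new * math.log2(n_new)) / (n_old * math.log2(n_old))

# Mean timings for n = 1000 reported by %timeit above, in seconds
measured = {"quickSort": 5.34e-3, "insertionSort": 70e-3, "selectionSort": 55.7e-3}

# Assumed growth model for each algorithm (an idealization, see the text above)
estimates = {
    "quickSort": scale_linearithmic(measured["quickSort"], 1000, 10000),
    "insertionSort": scale_quadratic(measured["insertionSort"], 1000, 10000),
    "selectionSort": scale_quadratic(measured["selectionSort"], 1000, 10000),
}
for name, seconds in estimates.items():
    print(f"{name}: about {seconds:.2f} s per call (estimate)")
```

Under these assumptions a single call to insertion sort or selection sort would take several seconds, and `%timeit` repeats each call several times.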

### Let's see how time changes by increasing the length of the array

In [16]:
qs_t = []
is_t = []
ss_t = []

lens = [2**i for i in range(1, 11)]

for n in lens:
    A = get_random_array(n)
    result = %timeit -o quickSort(A)
    qs_t.append( result.best )
    result = %timeit -o insertionSort(A)
    is_t.append( result.best )
    result = %timeit -o selectionSort(A)
    ss_t.append( result.best )

4.09 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
973 ns ± 182 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.96 µs ± 444 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
10.5 µs ± 1.71 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.23 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
4.41 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
19.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.31 µs ± 813 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11.5 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
61.8 µs ± 22.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.8 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
26.6 µs ± 7.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
96.7 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
72.8 µs ± 1.63 µs per loop (mean ±

In [17]:
import matplotlib.pyplot as plt # standard way to import

# any cell within the notebook that creates a plot will embed a PNG image of the resulting graphic
%matplotlib inline
plt.rcParams["figure.figsize"] = (20,10)
plt.style.use('ggplot')


In [None]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

ax.plot(lens, qs_t, "g--", label="QuickSort")
ax.plot(lens, is_t, "r--", label="InsertionSort")
ax.plot(lens, ss_t, "b--", label="SelectionSort")

_ = ax.legend(loc="best")

## Merge Sort
Merge Sort is an efficient, general-purpose, comparison-based sorting algorithm. Most implementations produce a **stable sort**, which means that the order of equal elements is the same in the input and output. 
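
To make the notion of stability concrete, here is a small sketch using Python's built-in `sorted`, which is documented to be stable. The records and the field layout `(name, score)` are hypothetical and chosen only for illustration.

```python
# Hypothetical records: (name, score)
records = [("dana", 2), ("alex", 1), ("cleo", 2), ("bert", 1)]

# Sort by score only; names are not compared
by_score = sorted(records, key=lambda rec: rec[1])

print(by_score)
# [('alex', 1), ('bert', 1), ('dana', 2), ('cleo', 2)]
```

Records with the same score keep their input order: `alex` stays before `bert`, and `dana` stays before `cleo`. In a merge-based sort this behavior typically comes from taking the element of the left run whenever two keys compare equal.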

Merge Sort is a divide-and-conquer algorithm that was invented by John von Neumann in 1945.

Below is a running example.

![alt text](mergesort_figure.png "Title")

### Exercise: Merge Sort
Complete the implementation of Merge Sort by implementing the function ```merge()```.

In [34]:
def merge(A, l, m, r): 
    n1 = m - l + 1
    n2 = r - m 

    # create temp arrays 
    L = [0] * (n1) 
    R = [0] * (n2) 
    for i in range(0,n1): 
        L[i] = A[l + i] 
    for j in range(0,n2): 
        R[j] = A[m + 1 + j] 

    i = 0     # Initial index of first subarray 
    j = 0     # Initial index of second subarray 
    k = l     # Initial index of merged subarray 

    while i < n1 and j < n2 : 
        if L[i] <= R[j]: 
            A[k] = L[i] 
            i += 1
        else: 
            A[k] = R[j] 
            j += 1
        k += 1

    while i < n1: 
        A[k] = L[i] 
        i += 1
        k += 1

    while j < n2: 
        A[k] = R[j] 
        j += 1
        k += 1

In [35]:
def mergeSort_rec(A, l, r): 
    if l < r:       
        m = (l + r) // 2  # midpoint of the subarray; Python integers have arbitrary precision, so l + r cannot overflow
    
        # Sort first and second halves 
        mergeSort_rec(A, l, m) 
        mergeSort_rec(A, m+1, r) 
        merge(A, l, m, r)

In [36]:
def mergeSort(B):
    A = B[:] # Copy the array just because we decided to return a sorted copy of the original array 
    mergeSort_rec(A, 0, len(A)-1)
    return A

In [37]:
arr = get_random_array(10)

print("Given array: ", arr) 
print("Sorted array: ", mergeSort(arr))

Given array:  [40, 31, 55, 5, 19, 50, 42, 9, 38, 14]
Sorted array:  [5, 9, 14, 19, 31, 38, 40, 42, 50, 55]


In [26]:
test_sorting_algorithm(mergeSort)

In [27]:
A = get_random_array(10000)

In [28]:
%timeit quickSort(A)

198 ms ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [29]:
%timeit mergeSort(A)

87.2 ms ± 3.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
