# Sorting algorithm

## Summary
There are many kinds of sorting algorithms, which mostly fall into two categories: *comparison-based sorting* and *non-comparison-based sorting*. Comparison-based sorting cannot do better than the theoretical O(n log n) lower bound on worst-case time. Non-comparison-based sorting usually runs in linear time, but the algorithms only apply to certain kinds of data.

* Comparison-based sorting: bubble sort, selection sort, insertion sort, shell sort, merge sort, quick sort, heap sort

* Non-comparison-based sorting: counting sort, bucket sort, radix sort
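
To make the restriction on the data concrete, here is a minimal sketch of counting sort, one of the non-comparison sorts listed above. It assumes the input holds only non-negative integers up to a known bound; the names `counting_sort` and `max_value` are illustrative and not part of the original notebook.

```python
def counting_sort(arr, max_value):
    """Sort non-negative integers in the range 0..max_value (inclusive)."""
    # counts[v] holds how many times the value v occurs in arr
    counts = [0] * (max_value + 1)
    for value in arr:
        counts[value] += 1

    # Emit each value as many times as it was seen, in increasing order
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result


data = [3, 0, 2, 3, 1, 0]
print(counting_sort(data, max_value=3))  # [0, 0, 1, 2, 3, 3]
```

No two elements are ever compared, and the running time is O(n + k) for n items and k possible values, which is why the method only pays off when the value range is small.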

In [26]:
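# Timer decorator that times the sorting function
from functools import wraps
import time
import random

def timer(func):
    @wraps(func)
    def wrapper(*arg,**kw):
        # perf_counter is a monotonic clock meant for measuring intervals
        start = time.perf_counter()
        result = func(*arg,**kw)
        end = time.perf_counter()
        print(end-start)  # elapsed time in seconds
        return result     # pass the wrapped function's result through
    return wrapper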

### Bubble sort
Bubble sort has time complexity O(n^2) and space complexity O(1) when sorting in place. Bubble sort is a stable sort. (Stable sort: if a = b and a is in front of b before the sort, a must still be in front of b after the sort.)

1. Compare each element in the array with its next element, one by one. If the element is larger than its next element, swap the two elements. **After the first pass the max element will be the last element in the array.**
2. Repeat step 1 for the whole array except the last element.
3. Repeat, leaving out one more element at the end each time, until the array is sorted.

In [27]:
# bubble sort

@timer
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(n-i-1):
            if arr[j]>arr[j+1]:
                arr[j],arr[j+1] = arr[j+1],arr[j]
    return print(arr[:20])

random.seed(1)
sample = random.sample(range(100000),10000)
bubble_sort(sample)

[29, 31, 33, 44, 64, 77, 86, 88, 90, 103, 110, 112, 115, 118, 134, 137, 189, 191, 194, 196]
10.948261260986328
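
Step 3 of the list above can be taken literally: if a full pass makes no swap, the array is already sorted and the loop can stop. The sketch below adds that check. The name `bubble_sort_early_exit`, the `key` parameter and the `swapped` flag are assumptions for illustration, not part of the cell above. It sorts `(key, label)` pairs by key only, which also shows the stability property.

```python
def bubble_sort_early_exit(arr, key=lambda x: x):
    """In-place bubble sort that stops once a pass makes no swap."""
    n = len(arr)
    for i in range(n):
        swapped = False
        for j in range(n - i - 1):
            # Strict '>' never swaps equal keys, which keeps the sort stable
            if key(arr[j]) > key(arr[j + 1]):
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break  # no swap in this pass: the array is already sorted
    return arr


records = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
print(bubble_sort_early_exit(records, key=lambda r: r[0]))
# [(1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]
```

Records with equal keys keep their original relative order, and on an already sorted input the function does a single pass, so the best case becomes O(n).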


### Selection sort
Selection sort is very similar to bubble sort, and the two have the same time and space complexity. However, selection sort is usually faster than bubble sort because it performs fewer swaps: at most one per pass.

1. Find the index of the min/max element in the array and swap it with the first element.
2. Find the index of the min/max element in the rest of the array (everything except the first element) and swap it with the second element.
3. Repeat the process until the array is sorted.

In [28]:
# Selection sort

@timer
def selection_sort(arr):
    n = len(arr)

    for i in range(n):
        # Index of the smallest element in the unsorted part arr[i:]
        min_idx = i
        for j in range(i+1,n):
            if arr[j]<arr[min_idx]:
                min_idx = j
        arr[i],arr[min_idx] = arr[min_idx],arr[i]
    
    return print(arr[:20])

random.seed(1)
sample = random.sample(range(100000),10000)
selection_sort(sample)

[29, 31, 33, 44, 64, 77, 86, 88, 90, 103, 110, 112, 115, 118, 134, 137, 189, 191, 194, 196]
4.762071847915649
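
One way to check the claim about swaps is to count them. The sketch below re-implements both sorts with a swap counter. The function names are hypothetical, both functions work on a copy of the input, and the exact counts depend on the data.

```python
import random


def bubble_swaps(arr):
    """Bubble sort a copy of arr and return the number of swaps."""
    arr = list(arr)
    swaps = 0
    n = len(arr)
    for i in range(n):
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swaps += 1
    return swaps


def selection_swaps(arr):
    """Selection sort a copy of arr and return the number of swaps."""
    arr = list(arr)
    swaps = 0
    n = len(arr)
    for i in range(n):
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        if min_idx != i:  # skip the no-op swap of an element with itself
            arr[i], arr[min_idx] = arr[min_idx], arr[i]
            swaps += 1
    return swaps


random.seed(1)
small_sample = random.sample(range(100000), 1000)
print("bubble swaps:   ", bubble_swaps(small_sample))
print("selection swaps:", selection_swaps(small_sample))
```

Selection sort makes at most n - 1 swaps, while bubble sort makes one swap per inversion, which can be as many as n(n-1)/2. Both still perform O(n^2) comparisons.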


### Insertion sort

The idea of insertion sort is to keep a sorted subarray and insert the next element at its correct position within that subarray. The time complexity of insertion sort is O(n^2) and the space complexity is O(1) for the in-place version. For in-place insertion sort:

1. Start from the second element of the array and compare it with the elements before it, moving backwards until you reach one that is smaller or equal.
2. Insert the element right after that position, which extends the sorted subarray by one.
3. Repeat for every remaining element of the array.

In [29]:
# Insertion sort

@timer
def insertion_sort(arr):
    n = len(arr)
    
    for i in range(1,n):
        current = arr[i]
        idx = i-1
        while idx >= 0 and arr[idx] > current:  # check the bound before indexing
            arr[idx+1] = arr[idx]
            idx -= 1
        arr[idx+1] = current
                
    return print(arr[:20])

random.seed(1)
sample = random.sample(range(100000),10000)
insertion_sort(sample)

[29, 31, 33, 44, 64, 77, 86, 88, 90, 103, 110, 112, 115, 118, 134, 137, 189, 191, 194, 196]
5.272757053375244
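
Because the left part of the array is always sorted, the insertion position can also be found by binary search. The sketch below uses the standard-library `bisect` module for that. It builds a new list instead of sorting in place, and the name `binary_insertion_sort` is an assumption for illustration.

```python
import bisect


def binary_insertion_sort(arr):
    """Return a sorted copy of arr built by repeated binary insertion."""
    result = []
    for value in arr:
        # insort finds the position by binary search, then inserts there;
        # equal values go to the right of existing ones, so order is stable
        bisect.insort(result, value)
    return result


print(binary_insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```

Binary search cuts the number of comparisons to O(n log n), but inserting into a list still shifts elements, so the overall time stays O(n^2).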


### Shell sort

The essence of insertion sort is moving each number to its correct position within a growing sorted subarray. In the worst case, however, this is expensive: a number that is far away from its correct spot needs many one-step shifts to get there. Shell sort starts with subarrays taken at a large gap, which moves far-away numbers toward their spots quickly, and once the gap has been reduced to 1, shell sort is equivalent to insertion sort. The efficiency of shell sort depends on the choice of gap sizes.

1. Insertion sort the subarrays formed by elements a certain gap apart.
2. Move to the next (smaller) gap and insertion sort the array again.
3. Repeat until the gap is 1.

In [33]:
# shell sort
import math

@timer
def shell_sort(arr):
    n = len(arr)
    k = int(math.log(n,2))
    steps = [2**i for i in range(k,-1,-1)]
    
    for step in steps:
        for i in range(step,n):
            current = arr[i]
            idx = i-step
            while idx>=0 and arr[idx] > current:  # check the bound before indexing
                arr[idx+step] = arr[idx]
                idx -= step
            arr[idx+step] = current
    
    return print(arr[:20])

random.seed(1)
sample = random.sample(range(100000),10000)
shell_sort(sample)

[29, 31, 33, 44, 64, 77, 86, 88, 90, 103, 110, 112, 115, 118, 134, 137, 189, 191, 194, 196]
0.13191723823547363
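
Since the running time depends on the gap sequence, it can help to pass the gaps in as a parameter. The sketch below is one way to do that, with two well-known sequences as examples: repeated halving (Shell's original choice) and Knuth's 1, 4, 13, 40, ... sequence. The helper names `halving_gaps`, `knuth_gaps` and `shell_sort_with_gaps` are hypothetical.

```python
def halving_gaps(n):
    """Gaps n//2, n//4, ..., 1."""
    gaps = []
    gap = n // 2
    while gap >= 1:
        gaps.append(gap)
        gap //= 2
    return gaps


def knuth_gaps(n):
    """Knuth's gaps h = 3*h + 1 (1, 4, 13, 40, ...), largest first."""
    gaps = []
    gap = 1
    while gap < n:
        gaps.append(gap)
        gap = 3 * gap + 1
    return gaps[::-1]


def shell_sort_with_gaps(arr, gaps):
    """In-place shell sort; gaps must be decreasing and end with 1."""
    n = len(arr)
    for gap in gaps:
        # Gapped insertion sort: each arr[i] is inserted among the
        # elements gap, 2*gap, ... positions before it
        for i in range(gap, n):
            current = arr[i]
            idx = i - gap
            while idx >= 0 and arr[idx] > current:
                arr[idx + gap] = arr[idx]
                idx -= gap
            arr[idx + gap] = current
    return arr


data = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(halving_gaps(len(data)))  # [5, 2, 1]
print(knuth_gaps(len(data)))    # [4, 1]
print(shell_sort_with_gaps(data, knuth_gaps(len(data))))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Any sequence that ends with a gap of 1 sorts correctly, because the last pass is a plain insertion sort. Only the amount of work done before that pass changes.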


### Merge sort

Merge sort uses the 'divide and conquer' idea: divide the array into small arrays, sort them, and then merge the sorted small arrays into one large sorted array.

1. Divide the array of size n into two subarrays of size n/2.
2. Merge sort each of the two subarrays.
3. Merge the two sorted subarrays.

In [42]:
# Merge sort

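def merge(a,b):
    # Merge two sorted lists; index pointers avoid the O(n) cost of pop(0)
    results = []
    i = j = 0
    while i<len(a) and j<len(b):
        if a[i]<=b[j]:  # take from a on ties, which keeps the sort stable
            results.append(a[i])
            i += 1
        else:
            results.append(b[j])
            j += 1
    # One of the two slices is empty; the other holds the leftovers
    return results+a[i:]+b[j:]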
        
@timer
def merge_timer(arr):
    def merge_sort(arr):
        n = len(arr)
        if n<=1:
            return arr
        mid = n//2
        left = merge_sort(arr[:mid])
        right = merge_sort(arr[mid:])
        return merge(left,right)
    return print(merge_sort(arr)[:20])

random.seed(1)
sample = random.sample(range(100000),10000)
merge_timer(sample)

[29, 31, 33, 44, 64, 77, 86, 88, 90, 103, 110, 112, 115, 118, 134, 137, 189, 191, 194, 196]
0.07495379447937012
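
Step 3 is where the real work happens. As a cross-check of the hand-written `merge`, the standard library's `heapq.merge` merges already sorted inputs lazily. The sketch below plugs it into the same recursion. The name `merge_sort_heapq` is hypothetical, and this is an alternative for illustration rather than a faster replacement.

```python
import heapq


def merge_sort_heapq(arr):
    """Return a sorted copy of arr using heapq.merge for the merge step."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort_heapq(arr[:mid])
    right = merge_sort_heapq(arr[mid:])
    # heapq.merge yields the items of its sorted inputs in sorted order
    return list(heapq.merge(left, right))


print(list(heapq.merge([1, 4, 9], [2, 3, 10])))  # [1, 2, 3, 4, 9, 10]
print(merge_sort_heapq([5, 2, 4, 6, 1, 3]))      # [1, 2, 3, 4, 5, 6]
```

Each level of the recursion merges n elements in total and there are about log2(n) levels, which gives O(n log n) time.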


### Quick sort

Quick sort is similar to merge sort in that both use the 'divide and conquer' idea. Quick sort is usually implemented in place to save space.

1. Choose a pivot from the array, then place the numbers smaller than or equal to the pivot in one subarray and the numbers larger than the pivot in another.
2. Quick sort the subarrays.
3. Combine the smaller subarray, the pivot and the larger subarray.

In [45]:
# quick sort

@timer
def quick_sort(arr):
    
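    def divide(arr,start,end):
        # Partition arr[start..end] around the last element as pivot
        pivot = arr[end]
        idx = start
        # Scan only the current range, not the whole array
        for i in range(start,end):
            if arr[i]<=pivot:
                arr[i],arr[idx] = arr[idx],arr[i]
                idx += 1
        arr[idx],arr[end] = arr[end],arr[idx]
        return idx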

    def sort(arr,start,end):
        if start>=end:
            return arr
        
        idx_pivot = divide(arr,start,end)
        sort(arr,start,idx_pivot-1)
        sort(arr,idx_pivot+1,end)
        return arr
    n = len(arr)
    sort(arr,0,n-1)
    return print(arr[:20])

random.seed(1)
sample = random.sample(range(100000),10000)
quick_sort(sample)

[29, 31, 33, 44, 64, 77, 86, 88, 90, 103, 110, 112, 115, 118, 134, 137, 189, 191, 194, 196]
0.03997659683227539
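
The three steps listed above translate almost word for word into a version that builds new lists instead of sorting in place. This is a sketch for illustration only: the name `quick_sort_lists` is hypothetical, and picking the last element as the pivot simply mirrors the cell above.

```python
def quick_sort_lists(arr):
    """Return a sorted copy of arr (not in place)."""
    if len(arr) <= 1:
        return list(arr)
    # Step 1: pick a pivot and split the rest into two subarrays
    pivot = arr[-1]
    rest = arr[:-1]
    smaller_or_equal = [x for x in rest if x <= pivot]
    larger = [x for x in rest if x > pivot]
    # Steps 2 and 3: sort each subarray, then combine around the pivot
    return quick_sort_lists(smaller_or_equal) + [pivot] + quick_sort_lists(larger)


print(quick_sort_lists([3, 6, 1, 8, 2, 9, 2]))  # [1, 2, 2, 3, 6, 8, 9]
```

This version allocates new lists at every level of recursion, which is the extra space the in-place version avoids.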


### Heap sort

Heap sort uses the max-heap property of a binary tree: every parent node is at least as large as its child nodes.

1. Build a max heap from the array so that it follows the heap rule.
2. Swap the first element (the max) with the last leaf.
3. Restore the heap on the remaining elements, excluding the last leaf.
4. Repeat until only one node is left in the heap.
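
The four steps can be sketched as a self-contained function. This is a minimal illustration rather than the notebook's own implementation: the names `sift_down` and `heap_sort_sketch` are assumptions, and the heap size is passed as an argument instead of being kept in a global variable.

```python
def sift_down(arr, i, heap_size):
    """Move arr[i] down until the max-heap rule holds below it."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and arr[left] > arr[largest]:
            largest = left
        if right < heap_size and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            return
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest


def heap_sort_sketch(arr):
    """In-place heap sort; returns arr for convenience."""
    n = len(arr)
    # Step 1: build a max heap, starting from the last parent node
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, i, n)
    # Steps 2-4: move the max to the end, shrink the heap, restore it
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(arr, 0, end)
    return arr


print(heap_sort_sketch([4, 10, 3, 5, 1]))  # [1, 3, 4, 5, 10]
```

In the array layout used here, the children of the node at index `i` sit at indices `2*i+1` and `2*i+2`, so no explicit tree structure is needed.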

In [None]:
# heap sort

@timer
def heap_sort(arr):
    global heap_size
    heap_size = len(arr)
    
    def maintain_maxheap(arr,i):
        # Sift arr[i] down until neither child inside the heap is larger
        while True:
            left = 2*i+1
            right = 2*i+2
            largest = i
            # Only look at children that fall inside the current heap
            if left<heap_size and arr[left]>arr[largest]:
                largest = left
            if right<heap_size and arr[right]>arr[largest]:
                largest = right
            if largest==i:
                break
            arr[i],arr[largest] = arr[largest],arr[i]
            i = largest
        
    
    def build_maxheap(arr):
        n = len(arr)
        # Leaves are already heaps, so start from the last parent node
        for i in range(n//2-1,-1,-1):
            maintain_maxheap(arr,i)
    
    def sort(arr):
        build_maxheap(arr)
        arr[0]