# Sorting
Sorting is a classic problem in computer science. Algorithms that solve it by comparing elements of the input collection `arr` to each other are called **comparison sorts**.

## Problem statement
- **Input**: A collection of `n` integers called `arr`.
- **Output**: The same collection of `n` integers, rearranged so that `arr[i] <= arr[i+1]` holds for every `i` such that `0 <= i < n-1`.
- **Constraints**: 
    - `arr` contains only integers, which may be non-unique
    - `-(2^31) <= arr[i] <= (2^31)-1`
    - `0 <= n <= 10^7`
    - The input collection may be in any order (including already sorted!)
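
The output condition can be checked directly. The following is a minimal sketch; `is_sorted` is a hypothetical helper name that does not appear elsewhere in this notebook. It compares with `<=` rather than `<` because the constraints allow duplicate values.

```python
def is_sorted(arr):
    """Return True if arr is in non-decreasing order."""
    # Compare each adjacent pair (arr[i], arr[i+1]) for 0 <= i < n-1.
    return all(arr[i] <= arr[i + 1] for i in range(len(arr) - 1))

print(is_sorted([]))            # True: there are no adjacent pairs to check
print(is_sorted([1, 2, 2, 3]))  # True: duplicates are allowed
print(is_sorted([3, 1, 2]))     # False: 3 > 1
```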

## Analysis
For an array of $n$ elements, any comparison-based sort needs $\Omega(n \cdot \log(n))$ comparisons in the worst case. CLRS gives a proof of this lower bound in Section 8.1. Also:
- We cannot be certain any array is completely sorted unless we look at every element at least once. So we will have to look at all $n$ elements.
- As we look at each element `arr[i]`, we are going to need to compare it to some other element in the array to determine where `arr[i]` belongs in the sorted order. Some algorithms compare `arr[i]` to _every_ other element in `arr` (doing $n$ comparisons for each of $n$ elements, thus an $O(n^2)$ runtime). Others are cleverer and do less work.
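
The lower bound comes from a decision-tree argument, sketched briefly here. A comparison sort must be able to distinguish all $n!$ orderings of its input, and a binary decision tree of height $h$ has at most $2^h$ leaves, so:

$$
2^h \ge n! \implies h \ge \log_2(n!) = \sum_{k=1}^{n} \log_2 k \ge \sum_{k=\lceil n/2 \rceil}^{n} \log_2 k \ge \frac{n}{2} \log_2 \frac{n}{2} = \Omega(n \cdot \log(n))
$$

Here $h$ is the number of comparisons on the longest path through the tree, i.e. the worst case.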


## Test harness

In [14]:
import random

def sorting_test(sorting_fn, test_count=100, min_arr_size=0, max_arr_size=100):
    def test(arr, case_name):
        expected = sorted(arr)
        actual = sorting_fn(arr)
        assert actual == expected, f"sorting failure: {case_name}!\n\nArr: {arr}\n\nExpected: {expected}\n\nActual: {actual}"
    
    # Edge cases
    test([], 'empty case')
    test([1], 'singleton case')
    test([1, 2], 'two-element case')
    test([i for i in range(100)], 'pre-sorted case')
    
    # Average cases
    for _ in range(test_count):
        arr = [random.randint(-1000, 1000) for _ in range(random.randint(min_arr_size, max_arr_size))]
        test(arr, 'average case (randomly generated)')


## $O(n^2)$ average time comparison sorts

Most of these algorithms are usually implemented in place; however, the implementations below return a sorted copy of the given list.

### bubblesort

Bubblesort is a straightforward algorithm that makes repeated passes over the array, swapping adjacent elements whenever they are out of order. Each pass "bubbles" the largest remaining element up to its final position at the end of the array.

In [19]:
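def bubble_sort(arr):
    arr = list(arr)  # sort a copy; the caller's list is left untouched
    
    # After the pass for a given `end`, arr[end-1:] is in its final position.
    for end in range(len(arr), 0, -1):
        for j in range(1, end):
            i = j - 1
            if arr[i] > arr[j]:
                arr[i], arr[j] = arr[j], arr[i]
    return arr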

sorting_test(bubble_sort)
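
A common variation stops as soon as a full pass makes no swaps, which brings the best case (already sorted input) down to a single pass. The sketch below is one way to write it; the name `bubble_sort_early_exit` and the `swapped` flag are illustrative and not part of the original implementation.

```python
def bubble_sort_early_exit(arr):
    arr = list(arr)  # work on a copy, as the other implementations here do
    for end in range(len(arr), 1, -1):
        swapped = False
        for j in range(1, end):
            if arr[j - 1] > arr[j]:
                arr[j - 1], arr[j] = arr[j], arr[j - 1]
                swapped = True
        if not swapped:
            break  # a full pass with no swaps means the list is sorted
    return arr

print(bubble_sort_early_exit([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```

The worst case is still $O(n^2)$; only inputs that are sorted or nearly sorted benefit.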

### insertion sort

Insertion sort partitions the array into a sorted prefix on the left and an unsorted remainder on the right. It repeatedly takes the next element from the right and inserts it into its correct position in the sorted prefix, until the entire array is sorted.

In [34]:
def insertion_sort(arr):
    if not arr:
        return []
    arr = arr.copy()
    
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Check the bound first so arr[j] is never read with j == -1.
        while j >= 0 and key < arr[j]:
            arr[j+1] = arr[j]
            j-=1
        arr[j+1] = key
    return arr
    
sorting_test(insertion_sort)
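
To see the sorted prefix grow, it can help to record the list after each insertion. The generator below is a sketch for illustration only; `insertion_sort_trace` is a hypothetical name, and the loop body mirrors the implementation above.

```python
def insertion_sort_trace(arr):
    """Yield a snapshot of the list after each insertion."""
    arr = list(arr)
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements of the sorted prefix arr[0:i] one slot right.
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
        yield list(arr)

for snapshot in insertion_sort_trace([3, 1, 2]):
    print(snapshot)
# [1, 3, 2]
# [1, 2, 3]
```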

## $O(n \cdot \log(n))$ average time comparison sorts

### mergesort

Mergesort is a divide-and-conquer algorithm that splits the given array into smaller and smaller subarrays until each subarray holds at most one element. It then merges the subarrays back together, pair by pair, in sorted order.

In [41]:
def merge(arr1, arr2):
    merged = []
    i = j = 0
    while i < len(arr1) and j < len(arr2):
        if arr1[i] <= arr2[j]:
            merged.append(arr1[i])
            i+=1
        else:
            merged.append(arr2[j])
            j+=1
    merged.extend(arr1[i:])
    merged.extend(arr2[j:])
    return merged

def mergesort(arr):
    if len(arr) <= 1:
        return arr
    left = mergesort(arr[:len(arr)//2])
    right = mergesort(arr[len(arr)//2:])
    return merge(left, right)
    
sorting_test(mergesort)
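
The merge step does most of the work, so a small worked example may help. The sketch below restates the two-way merge under the hypothetical name `merge_sorted` and cross-checks it against the standard library's `heapq.merge`, which also merges already-sorted inputs.

```python
from heapq import merge as heap_merge

def merge_sorted(arr1, arr2):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(arr1) and j < len(arr2):
        # Using <= takes ties from arr1 first, which keeps the sort stable.
        if arr1[i] <= arr2[j]:
            merged.append(arr1[i])
            i += 1
        else:
            merged.append(arr2[j])
            j += 1
    merged.extend(arr1[i:])  # at most one of these two tails is non-empty
    merged.extend(arr2[j:])
    return merged

left, right = [1, 4, 7], [2, 4, 9]
print(merge_sorted(left, right))      # [1, 2, 4, 4, 7, 9]
print(list(heap_merge(left, right)))  # [1, 2, 4, 4, 7, 9]
```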

### tree sort

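A tree sort takes every element of an array and inserts it into a binary search tree, then walks the tree in order to get the sorted ordering. The tree below is not self-balancing, so the $O(n \cdot \log(n))$ runtime holds only on average: already sorted input produces a tree of height $n$ and an $O(n^2)$ runtime.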

In [48]:
class BSTNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.right = right
        self.left = left 
    
    def insert(self, val):
        if val <= self.val:
            if self.left:
                self.left.insert(val)
            else:
                self.left = BSTNode(val)
        else:
            if self.right:
                self.right.insert(val)
            else:
                self.right = BSTNode(val)
    
    def inorder_walk(self, traversal_arr):            
        if self.left:
            self.left.inorder_walk(traversal_arr)
        
        traversal_arr.append(self.val)
        
        if self.right:
            self.right.inorder_walk(traversal_arr)
                
    

def tree_sort(arr):
    if not arr:
        return arr
    root = BSTNode(arr[0])
    for val in arr[1:]:
        root.insert(val)
    traversal = []
    root.inorder_walk(traversal)
    return traversal

sorting_test(tree_sort)
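
How much the insertion order matters for an unbalanced tree can be measured directly. The sketch below is an illustration only: `bst_height` is a hypothetical helper that builds the tree iteratively, with nodes stored as plain lists, and uses the same `<=` tie rule as `BSTNode.insert`.

```python
import random

def bst_height(values):
    """Insert values into an unbalanced BST and return its height in nodes."""
    root = None  # each node is a list: [val, left, right]
    height = 0
    for val in values:
        if root is None:
            root = [val, None, None]
            height = 1
            continue
        node, depth = root, 1
        while True:
            side = 1 if val <= node[0] else 2  # 1 = left child, 2 = right child
            depth += 1
            if node[side] is None:
                node[side] = [val, None, None]
                break
            node = node[side]
        height = max(height, depth)
    return height

n = 500
print(bst_height(range(n)))  # 500: sorted input degenerates into a linked list
shuffled = random.sample(range(n), n)
print(bst_height(shuffled))  # much smaller; the exact value varies per shuffle
```

A tree of height $n$ also means the recursive `insert` and `inorder_walk` recurse $n$ levels deep, so large sorted inputs can exceed Python's default recursion limit.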

### heapsort
A heap sort inserts every element into a min-heap and extracts them all one-by-one to get the sorted ordering. 

In [43]:
from heapq import heappush, heappop

def heapsort(arr):
    heap = []
    for val in arr:
        heappush(heap, val)
    return [heappop(heap) for _ in range(len(heap))]

sorting_test(heapsort)
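
Pushing elements one at a time costs $O(n \cdot \log(n))$ just to build the heap. As an alternative sketch, `heapq.heapify` builds the heap from a list in linear time; the function name `heapsort_heapify` is illustrative. The overall runtime is still $O(n \cdot \log(n))$ because of the $n$ pops.

```python
from heapq import heapify, heappop

def heapsort_heapify(arr):
    heap = list(arr)  # copy so the caller's list is left untouched
    heapify(heap)     # rearranges the copy into a min-heap in O(n)
    return [heappop(heap) for _ in range(len(heap))]

print(heapsort_heapify([5, 3, 8, 1, 3]))  # [1, 3, 3, 5, 8]
```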