<b>Sorting in Linear Time:</b><br>

Any comparison sort must make $\Omega(n \lg n)$ comparisons in the worst case.<br>
To prove this, it suffices to determine the height of a decision tree in which each permutation appears as a reachable leaf. Consider a decision tree of height $h$ with $l$ reachable leaves corresponding to a comparison sort on $n$ elements. Because each of the $n!$ permutations of the input appears as some leaf, we have $n! \leq l$. A binary tree of height $h$ has no more than $2^h$ leaves, so <br>
$n! \leq l \leq 2^h$ <br>
Taking logarithms gives $h \geq \lg(n!) = \Omega(n \lg n)$ <br>
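The last step uses a standard lower bound on $\lg(n!)$. The $\lfloor n/2 \rfloor + 1$ largest factors of $n!$ (from $\lceil n/2 \rceil$ up to $n$) are each at least $n/2$, and the remaining factors are at least 1, so for $n \geq 4$:

$$\lg(n!) \;\geq\; \lg\left(\left(\frac{n}{2}\right)^{n/2}\right) \;=\; \frac{n}{2}\left(\lg n - 1\right) \;\geq\; \frac{n}{4}\lg n \;=\; \Omega(n \lg n).$$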

Corollary: Heapsort and merge sort are asymptotically optimal comparison sorts, since their $O(n \lg n)$ upper bounds on running time match the $\Omega(n \lg n)$ worst-case lower bound.

Next we explore three sorting algorithms that are not based on comparisons, so the $\Omega(n \lg n)$ lower bound does not apply to them. Under suitable assumptions about their input, they run in linear time.<br>
1. Counting Sort
2. Radix Sort
3. Bucket Sort

<b>Counting Sort:</b><br> 
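Counting sort assumes that each of the $n$ input elements is an integer in the range $0$ to $k$. <br>
For each input element $x$, it determines the number of elements less than $x$ and uses that count to place $x$ directly into its position in the output array. For example, if 18 elements are less than $x$, then $x$ belongs in output position 19. <br>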

We require the input array A[], an array B[] that holds the sorted output, and a temporary working array C[].<br>
The first for loop (initializing C) takes $\Theta(k)$ time, the second (counting each key) takes $\Theta(n)$, the third (prefix sums over C) takes $\Theta(k)$, and the last (placing elements into B) takes $\Theta(n)$, for a total of $\Theta(k + n)$.<br>
In practice, we usually use counting sort when $k = O(n)$, in which case the running time is $\Theta(n)$.

- Counting sort is stable.
- This means that elements with equal values appear in the output in the same relative order as they appear in the input: ties are broken first in, first out.
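A minimal sketch of the four-loop structure described above, written to sort records by an integer key in $0$ to $k$ so that stability is visible. The function name `counting_sort_by_key` and its `key` parameter are hypothetical, introduced only for this illustration; the notebook's own `countSort` below sorts plain integers instead.

```python
def counting_sort_by_key(A, k, key):
    """Stable counting sort of records A whose integer keys key(a) lie in 0..k."""
    n = len(A)
    B = [None] * n
    # Loop 1, Theta(k): initialize C (a list allocation here)
    C = [0] * (k + 1)
    # Loop 2, Theta(n): C[j] = number of records with key j
    for a in A:
        C[key(a)] += 1
    # Loop 3, Theta(k): C[j] = number of records with key <= j
    for j in range(1, k + 1):
        C[j] += C[j - 1]
    # Loop 4, Theta(n): scan right to left so records with equal keys keep input order
    for a in reversed(A):
        C[key(a)] -= 1
        B[C[key(a)]] = a
    return B


records = [(3, "a"), (1, "b"), (3, "c"), (0, "d"), (1, "e")]
print(counting_sort_by_key(records, k=3, key=lambda r: r[0]))
# [(0, 'd'), (1, 'b'), (1, 'e'), (3, 'a'), (3, 'c')]
```

Records `'b'` and `'e'` share key 1 and stay in input order, as do `'a'` and `'c'` with key 3.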

In [29]:
def countSort(arr):
    # Stable counting sort for integers. Negative values are handled by
    # offsetting every value by the minimum, so C needs (max - min + 1) slots.
    if not arr:
        return arr
    max_element = int(max(arr))
    min_element = int(min(arr))
    element_span = max_element - min_element + 1

    # Output array B
    B = [0] * len(arr)
    # Count array C: C[v - min_element] counts occurrences of value v
    C = [0] * element_span

    # Count each element (index = current value - minimum value)
    for x in arr:
        C[x - min_element] += 1

    # Prefix sums: C[j] now holds the number of elements <= j + min_element,
    # i.e. one past the last output position for that value
    for j in range(1, len(C)):
        C[j] += C[j - 1]

    # Scan the input right to left, placing each element at its position and
    # decrementing the count; this keeps equal values in input order (stable)
    for i in range(len(arr) - 1, -1, -1):
        B[C[arr[i] - min_element] - 1] = arr[i]
        C[arr[i] - min_element] -= 1

    # Copy the sorted output back into arr
    arr[:] = B
    return arr
        

In [30]:
if __name__ =='__main__':
    arr = [-5, -10, 0, -3, 8, 5, -1, 10] 
    ans = countSort(arr) 
    print("Sorted array is " + str(ans)) 

Sorted array is [-10, -5, -3, -1, 0, 5, 8, 10]


In [12]:
#To visualize the counting process
arr = [1,2,3,4,5,1,2,3,4,2,4,5,4]
C = [0 for i in range(0,5)]
for i in range(0,len(arr)):
    C[arr[i]-1]+=1
    print(C)
    
for i in range(1,len(C)):
    C[i]+=C[i-1]
    print(C)

[1, 0, 0, 0, 0]
[1, 1, 0, 0, 0]
[1, 1, 1, 0, 0]
[1, 1, 1, 1, 0]
[1, 1, 1, 1, 1]
[2, 1, 1, 1, 1]
[2, 2, 1, 1, 1]
[2, 2, 2, 1, 1]
[2, 2, 2, 2, 1]
[2, 3, 2, 2, 1]
[2, 3, 2, 3, 1]
[2, 3, 2, 3, 2]
[2, 3, 2, 4, 2]
[2, 5, 2, 4, 2]
[2, 5, 7, 4, 2]
[2, 5, 7, 11, 2]
[2, 5, 7, 11, 13]
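The cell above stops after the prefix-sum phase. A minimal sketch of the remaining placement phase, using the same input (keys 1 to 5, so value `v` maps to `C[v-1]`), prints how `C` and the output array `B` change as the input is scanned from right to left.

```python
arr = [1, 2, 3, 4, 5, 1, 2, 3, 4, 2, 4, 5, 4]
C = [0] * 5
for x in arr:                 # counting phase, as in the cell above
    C[x - 1] += 1
for i in range(1, len(C)):    # prefix-sum phase: C ends as [2, 5, 7, 11, 13]
    C[i] += C[i - 1]

B = [None] * len(arr)
for x in reversed(arr):       # placement phase, scanning right to left
    C[x - 1] -= 1             # C[x-1] is now the rightmost free slot for value x
    B[C[x - 1]] = x
    print(C, B)
```

When the loop finishes, `B` is `[1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5]` and `C` is `[0, 2, 5, 7, 11]`: each `C[v-1]` has been decremented down to the index where value `v` first appears in `B`.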


<b>Radix Sort:</b><br>
Radix sort sorts on the least significant digit first, then on the second-least significant digit, and so on until every digit has been sorted. For this to work correctly, each per-digit sort must be stable.<br>

If the keys range from 1 to $n^2$, counting sort on its own takes $\Theta(n + n^2) = \Theta(n^2)$ time in the worst case. Radix sort instead uses counting sort as a subroutine, applying it one digit at a time starting from the least significant digit.

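- In practice, quicksort often uses hardware caches more effectively than radix sort.
- When counting sort is used as the stable intermediate sort, radix sort does not sort in place: it needs extra memory for the output and count arrays.
- A sorting algorithm is stable if two objects with equal keys appear in the same order in the sorted output as they appear in the unsorted input. Some sorting algorithms, such as insertion sort, merge sort, and bubble sort, are stable by nature; quicksort and heapsort, in their usual forms, are not.
- Given $n$ $d$-digit numbers in which each digit takes one of $b$ possible values, radix sort takes $\Theta(d(n+b))$ time when the stable digit sort takes $\Theta(n+b)$ time.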
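The $\Theta(d(n+b))$ bound explains how radix sort beats plain counting sort on large key ranges: if the keys lie in $0$ to $n^2 - 1$ (a slight shift of the $1$ to $n^2$ range mentioned above) and are written in base $b = n$, each key has at most $d = 2$ digits, giving $\Theta(2(n + n)) = \Theta(n)$. A minimal sketch of a base-parameterized LSD radix sort, assuming non-negative integer keys and a base of at least 2; the names `counting_sort_on_digit` and `radix_sort_base` are hypothetical.

```python
def counting_sort_on_digit(arr, base, place):
    """Stable counting sort of non-negative ints on the digit (x // place) % base."""
    count = [0] * base
    for x in arr:
        count[(x // place) % base] += 1
    for d in range(1, base):
        count[d] += count[d - 1]
    out = [0] * len(arr)
    for x in reversed(arr):  # right-to-left scan keeps equal digits in input order
        d = (x // place) % base
        count[d] -= 1
        out[count[d]] = x
    return out


def radix_sort_base(arr, base):
    """LSD radix sort of non-negative ints in the given base (base >= 2).

    Returns (sorted_list, number_of_digit_passes)."""
    result = list(arr)
    if not result:
        return result, 0
    largest = max(result)
    place, passes = 1, 0
    while largest // place > 0:  # one pass per digit of the largest key
        result = counting_sort_on_digit(result, base, place)
        place *= base
        passes += 1
    return result, passes


keys = [63, 7, 40, 8, 0, 35, 17, 56]  # n = 8 keys, all in 0..n^2 - 1
print(radix_sort_base(keys, base=len(keys)))
# ([0, 7, 8, 17, 35, 40, 56, 63], 2)
```

With $b = n = 8$, two passes suffice, and each pass is a $\Theta(n + b) = \Theta(n)$ counting sort.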

<br>
Qn: Why does radix sort need to be stable?<br>
Ans: <I> Integers that are equal in some digit aren't always equal (e.g., 12 and 13). Let's sort [13, 12] using an LSD-first radix sort (base 10), but with an unstable sort for each digit.<br>

First we sort by the 1's place, so the array becomes [12, 13].
<br>
Now we sort by the 10's places, but 12 and 13 have the same digit. Since the sort is unstable, the resulting array could be [12, 13] or [13, 12]. We don't know.
<br>
If the sort were stable, 12 and 13 would keep the order produced by the 1's pass, so we would be guaranteed a correctly sorted array.</I>
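The argument above can be reproduced in code. A minimal sketch, assuming non-negative integer keys: `digit_sort` is a hypothetical helper whose `stable=False` mode scans the input left to right during placement, which deterministically reverses elements that share a digit, one concrete example of an unstable digit sort.

```python
def digit_sort(arr, place, stable=True):
    """Counting sort of non-negative ints on the base-10 digit at `place`.

    stable=True scans right to left during placement (stable).
    stable=False scans left to right, reversing elements with equal digits.
    """
    count = [0] * 10
    for x in arr:
        count[(x // place) % 10] += 1
    for d in range(1, 10):
        count[d] += count[d - 1]
    out = [0] * len(arr)
    order = reversed(arr) if stable else arr
    for x in order:
        d = (x // place) % 10
        count[d] -= 1
        out[count[d]] = x
    return out


def lsd_radix(arr, stable=True):
    """LSD radix sort (base 10) built on digit_sort."""
    if not arr:
        return arr
    place = 1
    while max(arr) // place > 0:
        arr = digit_sort(arr, place, stable)
        place *= 10
    return arr


print(lsd_radix([13, 12], stable=True))   # [12, 13]
print(lsd_radix([13, 12], stable=False))  # [13, 12]
```

Both versions produce [12, 13] after the 1's pass; only the unstable one then swaps the tie on the 10's digit.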

In [41]:
def countSort1(arr, digit):
    # Stable counting sort of arr on a single base-10 digit.
    # digit is the place value (1, 10, 100, ...). Keys are assumed non-negative,
    # so no offset by the minimum is needed and C only needs 10 slots (0-9).

    # Output array B
    B = [0] * len(arr)

    # Count array C: C[d] counts elements whose current digit is d
    C = [0] * 10

    # Count digit occurrences; integer division (//) keeps this exact
    for x in arr:
        C[(x // digit) % 10] += 1

    # Prefix sums: C[d] = number of elements whose digit is <= d
    for d in range(1, 10):
        C[d] += C[d - 1]

    # Right-to-left scan keeps elements with equal digits in input order (stable)
    for i in range(len(arr) - 1, -1, -1):
        d = (arr[i] // digit) % 10
        B[C[d] - 1] = arr[i]
        C[d] -= 1

    # Copy the result back into arr
    arr[:] = B
    return arr

In [39]:
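def radixSort(arr):
    # LSD radix sort for non-negative integers
    if not arr:
        return arr
    max_element = max(arr)
    digit = 1
    # Integer division: stop once every digit of the largest key has been processed
    while max_element // digit > 0:
        countSort1(arr, digit)
        digit *= 10
    return arr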


In [46]:
if __name__=='__main__':
    arr = [170, 45, 75, 90, 802, 24, 2, 66] 
  
    # Function Call 
    radixSort(arr) 
      
    for i in range(len(arr)): 
        print(arr[i], end=" ") 

2 24 45 66 75 90 170 802 

<b>Bucket Sort</b><br>
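Bucket sort assumes the input is drawn from a uniform distribution over the interval $[0, 1)$ and has an average-case running time of $O(n)$. <br>
- Counting sort assumes the input consists of integers in a small range.
- Bucket sort assumes the input is generated by a random process that distributes elements uniformly and independently over $[0, 1)$.
- It divides $[0, 1)$ into $n$ equal-sized subintervals, or buckets, and distributes the $n$ input numbers into them.
- Put each element into its bucket.
- Sort each bucket (here with insertion sort).
- Go through the buckets in order, concatenating their contents.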
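A minimal sketch of the $n$-bucket version described in the list above; unlike the notebook's `bucketSort` below, which always uses 10 buckets, it sizes the bucket count to the input. It assumes every value lies in $[0, 1)$ and uses Python's built-in `sorted` on each bucket in place of insertion sort. The name `bucket_sort_n` is hypothetical.

```python
def bucket_sort_n(values):
    """Bucket sort with n buckets for values assumed to lie in [0, 1)."""
    n = len(values)
    buckets = [[] for _ in range(n)]
    # Bucket i covers the subinterval [i/n, (i+1)/n)
    for x in values:
        buckets[int(n * x)].append(x)
    # Under the uniform assumption each bucket holds O(1) items on average,
    # so sorting every bucket and concatenating is O(n) expected time
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result


print(bucket_sort_n([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```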


In [53]:
def insertionSort(arr):
    for i in range(1, len(arr)):
        # Element to insert into the already-sorted prefix arr[0..i-1]
        key = arr[i]
        # Shift elements of arr[0..i-1] that are greater than key
        # one position to the right to make room for key
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

def bucketSort(arr):
    # Assumes every element lies in [0, 1)
    slot_num = 10  # 10 buckets, each covering an interval of width 0.1
    out = [[] for _ in range(slot_num)]

    # Place each element into its bucket: bucket i holds values in [i/10, (i+1)/10)
    for x in arr:
        index_b = int(slot_num * x)
        out[index_b].append(x)

    # Sort each bucket individually with insertion sort
    for i in range(slot_num):
        out[i] = insertionSort(out[i])

    # Concatenate the buckets, in order, back into arr
    k = 0
    for bucket in out:
        for x in bucket:
            arr[k] = x
            k += 1
    return arr
        
    

In [54]:
x = [0.897, 0.565, 0.656, 
     0.1234, 0.665, 0.3434]  
print("Sorted Array is") 
print(bucketSort(x)) 

Sorted Array is
[0.1234, 0.3434, 0.565, 0.656, 0.665, 0.897]
