## Sorting

### What is sorting?

1) Arranging the elements of a list or array in order according to some property. The elements must be mutually comparable (in practice, usually of the same data type) for sorting to be possible
2) What are the advantages of sorting?
    1) Presentation : For example, a website might let you sort hotels by increasing price or by decreasing rating
    2) Compute time : In an unsorted list, finding an element is O(n). In a sorted list, binary search finds it in O(log n)
    3) Sorting can be done on integers/floats, strings (in lexicographical order, like a dictionary), and complex data types such as the hotels in the example above
    
3) For illustration, we typically take a list of integers and arrange it in ascending order
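To make the compute-time point concrete, here is a minimal sketch comparing a linear scan with binary search. It relies on Python's standard `bisect` module; the helper names `linear_search` and `binary_search` and the sample prices are made up for this illustration.

```python
from bisect import bisect_left

def linear_search(items, target):
    """Scan every element: O(n) comparisons in the worst case."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1

def binary_search(sorted_items, target):
    """Halve the search range at each step: O(log n), but the input must be sorted."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1

prices = [120, 45, 300, 80, 210]          # hypothetical hotel prices
sorted_prices = sorted(prices)            # [45, 80, 120, 210, 300]
print(linear_search(prices, 80))          # position in the unsorted list
print(binary_search(sorted_prices, 80))   # position in the sorted copy
```

The binary search only pays off if the list is already sorted, or if it is sorted once and then searched many times.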

### Types of sorting algorithms

Many types : bubble sort, insertion sort, merge sort, quick sort, selection sort, heap sort, radix sort, counting sort

Sorting algorithms are classified based on 
1) Time complexity : Is it O(n**2), O(nlogn), etc. as a function of array size n?
2) Space complexity : Is it O(1) (in-place sorting / constant memory) or is it a function of n?
3) Stability : Let's say you have the cards 2 of diamonds, 6 of clubs, 6 of hearts and 10 of spades, and want to sort by rank/number. Either 6 of hearts or 6 of clubs could appear first. A stable sorting algorithm preserves the original input order of tied elements in the output. For example, if 6 of clubs is before 6 of hearts in the original list, a stable sorting algorithm will also put 6 of clubs before 6 of hearts after sorting
4) Internal or external sort
    Internal sort : all records are in memory/RAM
    External sort : records can be on disk (better if you have a ton of data that does not fit in memory)
5) Recursive vs non-recursive
    Quick and merge sort are recursive
    Insertion and selection sort are non-recursive
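The stability idea can be seen directly with the card example. This is a small sketch using Python's built-in `sorted`, which is documented to be stable; the tuple layout `(rank, suit)` is an assumption made for the illustration.

```python
# Cards as (rank, suit); 6 of clubs comes before 6 of hearts in the input.
cards = [(6, "clubs"), (10, "spades"), (6, "hearts"), (2, "diamonds")]

# Sort by rank only. Because sorted() is stable, the two sixes keep
# their input order: clubs first, then hearts.
by_rank = sorted(cards, key=lambda card: card[0])
print(by_rank)
```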

![sorting_algos_complexity_comparison](./pictures/sorting_comparison_geeks_for_geeks.PNG "sorting comparison")  


## Selection sort

1) Simplest, most intuitive sorting algorithm
2) Pseudocode in words : at every step i of the loop (i from 0 to n-2), scan all elements from i to n-1, find the minimum, and swap the element at position i with it. Keep repeating. This ensures that the left side of the list keeps growing in sorted increasing order while the right side holds the unsorted elements, until sorting ends
    i) Loop: i from 0 to n-2 (n-2 since once the first n-1 elements are in place, the last element is automatically in place)
            {
                imin = i
                for j in i+1 to n-1:
                    if A[j]<A[imin]:
                        imin = j
                temp = A[i]
                A[i] = A[imin]
                A[imin] = temp
              }
        
    ii) Alternative that is not in place: take the minimum of the list a and move it to a new array b
        replace the minimum in the original list with some very large integer to prevent counting it again
        repeat until all positions in the new array are filled
    iii) at the end, copy b back to a

3) Classification :
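    1) Slow - Time complexity : O(n**2). In fact, even in the best case it is O(n**2), because the inner scan always runs in full
    2) Space complexity : O(1) ## in place, no extra memory
    3) unstable - can change order of ties (the swap can move an element past an equal one)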
    4) non-recursive
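The instability can be reproduced on a tiny input. The sketch below is a selection sort that compares records by a key only; the function name `selection_sort_by_key` and the `(number, label)` records are hypothetical and exist only for this demonstration.

```python
def selection_sort_by_key(items, key):
    """Selection sort on a copy of items, comparing only key(item)."""
    items = list(items)
    n = len(items)
    for i in range(n - 1):
        index_min = i
        for j in range(i + 1, n):
            if key(items[j]) < key(items[index_min]):
                index_min = j
        # This long-distance swap is what can reorder equal keys.
        items[i], items[index_min] = items[index_min], items[i]
    return items

records = [(2, "a"), (2, "b"), (1, "c")]
result = selection_sort_by_key(records, key=lambda r: r[0])
print(result)   # (2, "b") now comes before (2, "a")
```

Swapping `(2, "a")` with the minimum `(1, "c")` jumps it over `(2, "b")`, so the two tied records come out in the opposite order from the input.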

    

Example
1) Input : step 1 : [2,7,4,1,5,3]
2) Step 2 : [1,7,4,2,5,3]  (swap 2 in first position in prev step with minimum value in elements after it)
3) step 3 : [1,2,4,7,5,3]  (swap 7 in  second position in prev step with minimum value after it)...
4) step 4 : [1,2,3,7,5,4]
5) step 5 : [1,2,3,4,5,7]
6) step 6 : [1,2,3,4,5,7]

Done

At every step, note that the first part of the array keeps getting the lower elements in sorted order, and second part of array is unsorted

In [16]:
def selection_sort(nums):
    n = len(nums)
    print("step initial")
    print(nums)
    for i in range(0,n-1):
        print("step {0}".format(i))
        print(nums)

        index_min = i
        value = nums[i]
        for j in range(i+1,n):
            if nums[j]<value:
                index_min = j
                value = nums[j]
        temp = nums[i]
        nums[i] = value
        nums[index_min] = temp
    return nums
            

In [17]:
selection_sort([2,7,4,1,5,3])

step initial
[2, 7, 4, 1, 5, 3]
step 0
[2, 7, 4, 1, 5, 3]
step 1
[1, 7, 4, 2, 5, 3]
step 2
[1, 2, 4, 7, 5, 3]
step 3
[1, 2, 3, 7, 5, 4]
step 4
[1, 2, 3, 4, 5, 7]


[1, 2, 3, 4, 5, 7]

## Bubble sort

1) Elements "bubble" from left to right; after every pass, the highest remaining element ends up at the right
2) Pseudocode in words : scan the list from the left multiple times. While scanning, compare the elements at adjacent positions; if the element at the current position is > the element at the next position, swap the two. Rinse and repeat
3) 
    i) Loop: i from 0 to n-2 
            {
                flag = 0
                for j in 0 to (n-i-2):  ## the last i positions are already sorted
                    {
                        if A[j]>A[j+1]:
                            swap(A[j],A[j+1])
                            flag = 1
                     }
                 if flag==0:
                     break (break out of outer loop if array is already sorted)
              }
        
    
4) Classification :
    1) Slow - Time complexity : worst case : O(n**2), average case : O(n**2), best case : O(n) (already sorted list, thanks to the early exit when no swaps happen)
    2) Space complexity : O(1) ## in place, no extra memory
    3) stable - does not change order of ties
    4) non-recursive


Example
1) Input : step 1 : [2,7,4,1,5,3]
2) Step 2 : [2,4,1,5,3,7]  (bubble 7 up)
3) step 3 : [2,1,4,3,5,7]  (bubble 4 up partly , then bubble 5 up)...
4) step 4 : [1,2,3,4,5,7]


Done


Note that at every step, the last part of the array is sorted, and this sorted part grows from right to left

In [14]:
def bubble_sort(nums):
    n = len(nums)
    print("step initial")
    print(nums)
    for i in range(0,n-1):  ## up to n-1 passes can be needed (n-2 would leave [2,1] unsorted)
        print("step {0}".format(i))
        flag = 0
        for j in range(0, n-i-1):
            if nums[j]>nums[j+1]:
                temp = nums[j]
                nums[j] = nums[j+1]
                nums[j+1] = temp
                flag=1
        print(nums)
        if flag==0: ## already sorted no swaps
            break
            

    return nums
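As a usage sketch, here is a quiet variant (no tracing prints) that returns how many passes were made, which makes the O(n) best case visible. The name `bubble_sort_passes` is made up for this illustration and is not part of the notes above.

```python
def bubble_sort_passes(nums):
    """Sort nums in place and return the number of passes over the list."""
    n = len(nums)
    passes = 0
    for i in range(n - 1):
        passes += 1
        swapped = False
        for j in range(n - i - 1):      # the last i positions are already sorted
            if nums[j] > nums[j + 1]:
                nums[j], nums[j + 1] = nums[j + 1], nums[j]
                swapped = True
        if not swapped:                 # no swaps means the list is sorted
            break
    return passes

already_sorted = [1, 2, 3, 4, 5, 7]
print(bubble_sort_passes(already_sorted))      # one pass, then early exit

unsorted = [2, 7, 4, 1, 5, 3]
print(bubble_sort_passes(unsorted), unsorted)  # several passes, list sorted in place
```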

## Insertion sort

1) Think of the list as having two parts: the first part sorted, the second part unsorted. At every step, you pick the first element of the unsorted part and insert it at the right position in the sorted part

2) Pseudocode in words : Start with the first element (index zero) as the sorted part and all other elements unsorted. One at a time, pick the first element of the unsorted part and keep swapping it with elements of the sorted part until the right position is achieved. (You don't need to swap every single time: find the right position, shift every larger value one position to the right, and insert the picked value there)
3) 
    i) Loop: i from 1 to n-1 (since the zeroth element is already "sorted" initially)
            {
                value = A[i]
                hole_position = i
                while (hole_position >= 1) and (A[hole_position-1]>value):  ## check the index first; if A[i] is already in place, this while loop is not entered
                    A[hole_position] = A[hole_position-1]
                    hole_position = hole_position-1
                A[hole_position] = value
            }
    Start with a "hole" position, which is the current index in the loop, and keep moving that hole position to the left till the right position is reached
4) Classification :
    1) Slow - Time complexity : worst case (reverse sorted): O(n**2), average case : O(n**2), best case : O(n) (already sorted list)
    2) Space complexity : O(1) ## in place, no extra memory
    3) stable - does not change order of ties
    4) non-recursive


Even though this is also O(n**2) theoretically, in practice it is generally faster than bubble and selection sort, as it usually does fewer swaps and comparisons
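One way to see the practical difference is to count comparisons. The sketch below counts them for selection sort and insertion sort on the same inputs; the counting helpers are hypothetical additions for illustration, and the numbers only describe these particular inputs.

```python
def selection_comparisons(nums):
    """Number of element comparisons selection sort makes on a copy of nums."""
    nums = list(nums)
    n = len(nums)
    count = 0
    for i in range(n - 1):
        index_min = i
        for j in range(i + 1, n):
            count += 1
            if nums[j] < nums[index_min]:
                index_min = j
        nums[i], nums[index_min] = nums[index_min], nums[i]
    return count

def insertion_comparisons(nums):
    """Number of element comparisons insertion sort makes on a copy of nums."""
    nums = list(nums)
    count = 0
    for i in range(1, len(nums)):
        value = nums[i]
        hole = i
        while hole > 0:
            count += 1
            if nums[hole - 1] <= value:   # found the insert position
                break
            nums[hole] = nums[hole - 1]   # shift the larger value right
            hole -= 1
        nums[hole] = value
    return count

data = [2, 7, 4, 1, 5, 3]
print(selection_comparisons(data), insertion_comparisons(data))
print(insertion_comparisons([1, 2, 3, 4, 5, 7]))   # already sorted: n-1 comparisons
```

Selection sort always makes n(n-1)/2 comparisons, while insertion sort stops scanning as soon as it finds the insert position.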

Example
1) Input : step 1 : [2,7,4,1,5,3]
2) Step 2 : [2,7,4,1,5,3]  (hole in index 1, first two elements are already sorted, won't enter while loop, so no swaps)
3) step 3 : [2,4,7,1,5,3]  (hole at index 2, 4 inserted at right position)...
4) step 4 : [1,2,4,7,5,3] (hole at index 3, 1 moved till inserted at right position)
5) step 5 : [1,2,4,5,7,3]
6) step 6 : [1,2,3,4,5,7]


Done




At every step, note that, similar to selection sort, the first part of the array is sorted and the second part of the array is unsorted. However the approach is different: in selection sort we first find the minimum of the unsorted part and swap it into place, whereas in insertion sort we take the first element of the unsorted part (the hole) and move it to its right position within the sorted part

In [5]:
def insertion_sort(nums):
    n = len(nums)
    print("step initial")
    print(nums)
    for i in range(1,n):
        print("step {0}".format(i-1))
        value = nums[i]
        hole = i
        while ((hole>0) and (nums[hole-1]>value)):
            nums[hole] = nums[hole-1]
            hole = hole-1
        nums[hole] = value
        print(nums)
    return nums

In [6]:
insertion_sort([2,7,4,1,5,3])

step initial
[2, 7, 4, 1, 5, 3]
step 0
[2, 7, 4, 1, 5, 3]
step 1
[2, 4, 7, 1, 5, 3]
step 2
[1, 2, 4, 7, 5, 3]
step 3
[1, 2, 4, 5, 7, 3]
step 4
[1, 2, 3, 4, 5, 7]


[1, 2, 3, 4, 5, 7]

## Merge sort

1) Very different technique from the earlier three we looked at (selection, bubble and insertion sort), which rely on swapping in different ways. This technique is a recursive divide and conquer algorithm
2) Has best case, average case and worst case time complexity of O(nlogn), so it is much faster. Space complexity is O(n), so it is not in place
3) Intuition : recursive algorithm. The basic merge sort algorithm divides the array into two equal halves (if the length is odd, one half will have one more element), merge sorts the left half, merge sorts the right half, and then merges the sorted left half and sorted right half. Note that if you have two sorted halves, merging can be done by pairwise comparison and is O(n)
4) Pseudocode in words : 
Divide the array into two roughly equal halves, left and right. Call merge sort on left, merge sort on right, and then merge the sorted left and right halves. For the last step (the merge), start with 3 counters: i for the left array, j for the right array, and k for the position in the main array. Compare left[i] with right[j] and write whichever is smaller to main_array[k]. Increment k, and increment i or j, whichever was used. If one array runs out before the other, just copy all remaining elements of the other array into main_array
5) Pseudocode :
    Mergesort(A):
        n = len(A)
        if n<2:
            pass
        else:
            left = A[0:int(n/2)]   ## O(n) operation
            right = A[int(n/2):]
            left = Mergesort(left) ## recursive call
            right = Mergesort(right) ## recursive call
            merge(left, right,A) ## O(n)
        return A
6) Time complexity : Let merge sort have time complexity T(n)
Therefore T(n) = c*n (for the left and right assignments and merge ) + 2*T(n/2) (recursion of right and left respectively)

T(n) = 2*T(n/2) + c*n
     = 2*(2T(n/4) + c*n/2) + c*n
     = 4*T(n/4) + 2*c*n
     = 8*T(n/8) + 3*c*n...
     = 2^k * T(n/(2**k)) + k*c*n
     
T(1) = c (as we immediately exit). The expansion stops when n/(2**k) = 1, i.e. k = log2(n)
T(n) = 2^(log2(n)) * T(1) + c*n*log2(n) = c*n + c*n*log2(n) = theta(nlogn) and also O(nlogn)

Therefore in the best, average and worst case, it is of order nlog2(n)
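The recurrence T(n) = 2*T(n/2) + c*n can be checked numerically. The sketch below evaluates it directly for powers of two and compares against c*n*log2(n) + c*n, assuming c = 1 and T(1) = c purely for illustration.

```python
import math

def T(n, c=1):
    """Evaluate T(n) = 2*T(n/2) + c*n with T(1) = c, for n a power of two."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n

for n in [2, 8, 64, 1024]:
    closed_form = n * int(math.log2(n)) + n   # c*n*log2(n) + c*n with c = 1
    print(n, T(n), closed_form)               # the two values agree
```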

7) Space complexity : the left and right arrays are extra memory
Every level of recursion needs extra arrays of total size n. If the arrays of a call are freed once that call returns, the extra memory stays O(n).
Otherwise, if nothing is ever freed, memory grows to n*log(n), as there are log n levels

If we free after every call, the extra memory alive at any one time is at most n + (n/2) + (n/4)...
There are log n terms, but even if we sum to infinity, the total is at most 2n, so the extra memory is O(n)

8) Classification :
    1) Fast - Time complexity : worst case O(nlogn), average case : O(nlogn), best case : O(nlogn) (even for an already sorted list)
    2) Space complexity : O(n). Not an in-place sorting algorithm, unlike the earlier sorting algorithms, since new left and right arrays are created at every call
    3) stable - does not change order of ties
    4) recursive



Example
1) Input : step 1 : [2,7,4,1,5,3]
2) Step 2 :
    left array = [2,7,4]  
        call merge sort:
        left array = [2] - Done
        right array = [7,4]
            left array = [7]
            right array = [4]
            merge = [4,7]
        merge = [2,4,7]
    
    
    right array = [1,5,3]
        call merge sort:
        left array = [1] : Done
        right array = [5,3]
            left array = [5]
            right array = [3]
            merge = [3,5]
         merge = [1,3,5]
    merge = [1,2,3,4,5,7]


Done




Note recursion at every step

In [10]:
def merge(left, right, nums):
    i = 0
    j = 0
    k = 0
    while( (i <= len(left)-1) and (j <= len(right)-1)):
        if left[i]<=right[j]:
            nums[k] = left[i]
            i = i+1
        else:
            nums[k] = right[j]
            j = j+1
        k = k + 1
    while(i <= len(left)-1):  ## this is entered if right list gets finished before left
        nums[k] = left[i]
        i = i+1
        k = k+1
    while(j <= len(right)-1):  ## this is entered if left list gets finished before right
        nums[k] = right[j]
        j = j+1
        k = k+1    
    return nums
    

def merge_sort(nums):
    n = len(nums)
    print("step initial")
    print(nums)
    if n<=1:
        return nums
    else:
        left = nums[0:int(n/2)]
        right = nums[int(n/2):]
        left = merge_sort(left)
        right = merge_sort(right)
        nums = merge(left, right, nums)
        return nums
    


In [12]:
merge_sort([2,7,4,1,5,3,1])

step initial
[2, 7, 4, 1, 5, 3, 1]
step initial
[2, 7, 4]
step initial
[2]
step initial
[7, 4]
step initial
[7]
step initial
[4]
step initial
[1, 5, 3, 1]
step initial
[1, 5]
step initial
[1]
step initial
[5]
step initial
[3, 1]
step initial
[3]
step initial
[1]


[1, 1, 2, 3, 4, 5, 7]

## Quick sort

1) Recursive divide and conquer algorithm, similar to merge sort. 
2) Average case time complexity is O(nlogn), similar to merge sort; worst case time complexity is O(n^2), worse than merge sort. However, it partitions in place, so the only extra space is the recursion stack, which is O(log n) on average
3) Using a randomized version of quick sort, the worst case can almost always be avoided
4) Many standard library sort routines are built on quick sort or a variant of it (Note that python is an exception: it uses Timsort, a hybrid of merge and insertion sort)
5) Pseudocode in words - at every iteration, pick a pivot (can be the value at the last index) and partition in place such that all elements less than the pivot are on the left and all elements greater than the pivot are on the right. Take the elements on the left, pick a pivot, repeat. Similarly, take the elements on the right, pick a pivot, repeat, until the sub-array has length 1 or less. Since all element moves in partition are done in place, partitioning itself needs only O(1) extra space
6) Pseudocode : 
    quicksort(A, start_index, end_index):
        if start_index >= end_index:
            return A ## terminating condition
        else:
            pivot_position = partition(A, start_index, end_index)
            quicksort(A, start_index, pivot_position-1) ## quicksort of left
            quicksort(A, pivot_position+1, end_index)
        return A



7) Time complexity : Let quick sort have time complexity T(n)
In the best case, when the pivot is the median of the array in every partition, T(n) = c*n (partition function) + 2*T(n/2) if n>1, and T(1) = c. This recurrence relation is the same as the one we got for merge sort, so it is O(nlogn)



T(n) = 2*T(n/2) + c*n
     = 2*(2T(n/4) + c*n/2) + c*n
     = 4*T(n/4) + 2*c*n
     = 8*T(n/8) + 3*c*n...
     = 2^k * T(n/(2**k)) + k*c*n
     
T(1) = c (as we immediately exit). The expansion stops when n/(2**k) = 1, i.e. k = log2(n)
T(n) = 2^(log2(n)) * T(1) + c*n*log2(n) = c*n + c*n*log2(n) = theta(nlogn) and also O(nlogn)

Therefore in the best case, it is of order nlog2(n)

For the worst case:

At every stage, the pivot is picked as the largest or smallest element
T(n) = T(n-1) + c*n (n>1)
T(1) = c1

T(n) = T(n-2) + c*(n + (n-1))
     = T(n-3) + c*(n + (n-1) + (n-2))....
     = T(1) + c*(n + (n-1) + ... + 2)
      ~O(n**2)
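The worst-case recurrence T(n) = T(n-1) + c*n can be observed by counting comparisons. The sketch below is a last-element-pivot quicksort with an explicit stack (so that recursion depth is not an issue); the function name `quick_sort_comparisons` is made up for this illustration.

```python
def quick_sort_comparisons(nums):
    """Count element-vs-pivot comparisons of a last-element-pivot
    quicksort run on a copy of nums."""
    nums = list(nums)
    comparisons = 0
    stack = [(0, len(nums) - 1)]        # explicit stack instead of recursion
    while stack:
        start, end = stack.pop()
        if start >= end:
            continue
        pivot = nums[end]
        boundary = start                # elements left of boundary are <= pivot
        for i in range(start, end):
            comparisons += 1
            if nums[i] <= pivot:
                nums[boundary], nums[i] = nums[i], nums[boundary]
                boundary += 1
        nums[boundary], nums[end] = nums[end], nums[boundary]
        stack.append((start, boundary - 1))
        stack.append((boundary + 1, end))
    return comparisons

n = 100
print(quick_sort_comparisons(range(n)))   # already sorted input
print(n * (n - 1) // 2)                   # (n-1) + (n-2) + ... + 1
```

On already sorted input the pivot is always the largest element, so each partition only peels off one element and the count matches n(n-1)/2 exactly.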


For the average case : the pivot can end up at any position i (1 to n) with equal probability, so the left part will have i-1 elements and the right part n-i
Take an expectation over i
T(1) = c1 (and T(0) = 0)
T(n) = c*n (for partitions) + (1/n)*\sum_{i=1}^{n}(T(i-1) + T(n-i))

If we solve this, we get O(nlogn)


Space Complexity : O(logn) in the average case, O(n) in the worst case. Why? Each recursive call in quick sort uses O(1) (constant) extra space. On average the recursion is about logn levels deep, so the stack uses log(n)*c = O(logn); in the worst case it is n levels deep

8) Classification :
    1) Fast - Time complexity : worst case O(n**2), average case : O(nlogn), best case : O(nlogn). With the last element as pivot, an already sorted list is actually a worst case. The worst case can almost always be avoided by using a randomized version of quicksort
    2) Space complexity : partitions in place, but the recursion stack takes O(logn) on average and O(n) in the worst case
    3) unstable - can change order of ties
    4) recursive
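The randomized version mentioned above can be sketched as follows. This is one possible variant, not the only one: it picks a uniformly random pivot with `random.randint`, swaps it to the end, and then uses the same last-element partition idea. The function name `randomized_quick_sort` is an assumption for illustration.

```python
import random

def randomized_quick_sort(nums, start=0, end=None):
    """In-place quicksort with a uniformly random pivot."""
    if end is None:
        end = len(nums) - 1
    if start >= end:
        return
    # Move a randomly chosen element into the pivot slot at the end.
    pivot_index = random.randint(start, end)
    nums[pivot_index], nums[end] = nums[end], nums[pivot_index]
    pivot = nums[end]
    boundary = start                    # elements left of boundary are <= pivot
    for i in range(start, end):
        if nums[i] <= pivot:
            nums[boundary], nums[i] = nums[i], nums[boundary]
            boundary += 1
    nums[boundary], nums[end] = nums[end], nums[boundary]
    randomized_quick_sort(nums, start, boundary - 1)
    randomized_quick_sort(nums, boundary + 1, end)

data = list(range(500))                 # sorted input, bad for a fixed last-element pivot
randomized_quick_sort(data)
print(data[:5], data[-5:])
```

With a random pivot, no particular input (such as an already sorted list) reliably triggers the O(n**2) behaviour; the worst case is still possible but becomes very unlikely.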

In [12]:
def partition_in_place(nums, start, end):
    pivot = nums[end]
    ## print("pivot", pivot)
    j = start
    for i in range(start, end):
        print(i, nums)
        print("j", j)
        if nums[i]<=pivot:
            
            temp = nums[j]
            nums[j] = nums[i]
            nums[i] = temp
            ## print("inside")
            ## print(nums)
            j = j + 1
        else:
            pass
        ## print(i, j)
    temp = nums[j]
    nums[j] = pivot
    nums[end] = temp
    return j
        
            
        

In [13]:
nums = [9,4,2,7]
aa = partition_in_place(nums,0,len(nums)-1 )
print(nums)

0 [9, 4, 2, 7]
j 0
1 [9, 4, 2, 7]
j 0
2 [4, 9, 2, 7]
j 1
[4, 2, 7, 9]


In [32]:
print(aa)

2


In [5]:
def quick_sort(nums, start_index, end_index):
    if start_index >= end_index:
        pass
    else:
        pivot_position = partition_in_place(nums, start_index, end_index) ## returns pivot position , and also partitions nums in place
        quick_sort(nums, start_index, pivot_position-1)
        quick_sort(nums, pivot_position+1, end_index)
        


In [38]:
nums = [2,7,4,1,5,3,1]
quick_sort(nums, 0, len(nums)-1)

In [39]:
nums

[1, 1, 2, 3, 4, 5, 7]

## Heap sort

1) Average, best and worst case time complexity is O(nlogn), similar to merge sort and (on average) quicksort. Building a heap is O(n) if heapify is applied bottom-up; if elements are inserted one at a time, it is O(nlogn). Once the heap is built, deleting all elements is O(nlogn). So overall O(nlogn). Space complexity is O(1): everything is done in place, no additional memory blocks are created.
2) Thus space complexity wise it is better than merge sort, and even quick sort, and time complexity wise, the worst case is O(nlogn), which is comparable to merge sort and better than quick sort. However, in practice quick sort is usually preferred

3) Pseudocode in words - Two steps. Step 1 is building a max heap using heapify. Step 2 is repeatedly deleting the maximum from the max heap and placing it at the end of the array. At the end you get a sorted array. 
4) Pseudocode : 

    heapify(A, n, i):  ## sift A[i] down within the first n elements of A (0 indexing)
        max_index = i
        left_child_index = 2*i + 1
        right_child_index = 2*i + 2
        
        if (left_child_index < n) and (A[left_child_index] > A[max_index]):
            max_index = left_child_index
            
        if (right_child_index < n) and (A[right_child_index] > A[max_index]):  ## note - here max_index can be i or left_child_index depending on prev step
            max_index = right_child_index

        if max_index != i:
            swap(A[i], A[max_index])
            heapify(A, n, max_index)  ## recurse only if a swap happened
            
    heapsort(A):
        n = len(A)
        
        ## the build max heap part
        for i in range(n/2 - 1, -1, -1): ## starting one level above the leaf nodes (since leaf nodes don't need heapify to be applied)
            heapify(A,n,i)
            
        ## the delete part
        for i in range(n-1, 0, -1):
            swap(A[i],A[0])  ## move the current maximum to its final position
            heapify(A,i,0) ## heapify at the root, but only over the first i elements of the array
        return A





5) Classification :
    1) Fast - Time complexity : worst case O(nlogn), average case : O(nlogn), best case : O(nlogn) 
    2) Space complexity : in-place sorting algorithm, so O(1)
    3) unstable - why? During heapify, when comparing a parent with its children, if both children have the same value, either can be chosen. Elements are also moved over long distances (for example when the last element is swapped with the root), so ties can end up in a different order
    4) not really recursive but uses heapify which is recursive
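For comparison, the same build-then-delete idea can be sketched with Python's standard `heapq` module. Note the differences from the description above: `heapq` is a min-heap rather than a max-heap, and this version returns a new list instead of sorting in place. The function name `heap_sort_with_heapq` is made up for this illustration.

```python
import heapq

def heap_sort_with_heapq(nums):
    """Return a new ascending list using a binary min-heap (not in place)."""
    heap = list(nums)        # copy so the caller's list is left untouched
    heapq.heapify(heap)      # build the heap in O(n)
    # Pop the smallest element n times; each pop is O(log n).
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort_with_heapq([2, 7, 4, 1, 5, 3, 1]))
```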

In [42]:
## Implementation - 0 indexing

def leftChild(i):
    return 2*i + 1
def rightChild(i):
    return 2*i + 2
def returnParent(i):
    if i%2 == 0: ##even
        return ( (i//2) - 1)
    else:
        return i//2

def returnIsLeaf(A, n, i):
    ## within a heap made of the first n elements, node i is a leaf
    ## when even its left child index falls outside the heap
    return leftChild(i) >= n
    
def swap(A, pos1, pos2):
    A[pos1], A[pos2] = A[pos2], A[pos1]
def maxHeapify(A, n, i):  ## sift A[i] down within the first n elements of A (0 indexing)
    if returnIsLeaf(A, n, i)==False: ## confirm not a leaf node
        maximum = i
        if A[leftChild(i)]>A[maximum]:
            maximum = leftChild(i)
        ## the right child may not exist even when the left one does
        if rightChild(i)<n and A[rightChild(i)]>A[maximum]:
            maximum = rightChild(i)     
        if maximum != i:
            swap(A, i, maximum)
            maxHeapify(A, n, maximum)
            
    

def heapSort(A):## sorts A in place, 0 indexing

    n = len(A)
    
    ##build max Heap
    for i in range(n//2 -1,-1,-1): ## n//2 - 1 is the last non-leaf node
        maxHeapify(A, n, i)
    print(A)
    ## delete 1 by 1 from beginning
    for i in range(n-1,-1,-1):
        print(i)
        swap(A, i, 0)  ## move the current maximum to its final position
        print(A)
        maxHeapify(A, i, 0)  ## the heap now consists of the first i elements
        print(A)
        
        

    
    
        
    

In [43]:
nums = [2,7,4,1,5,3,1]
heapSort(nums)

[7, 5, 4, 1, 2, 3, 1]
6
[1, 5, 4, 1, 2, 3, 7]
[5, 2, 4, 1, 1, 3, 7]
5
[3, 2, 4, 1, 1, 5, 7]
[4, 2, 3, 1, 1, 5, 7]
4
[1, 2, 3, 1, 4, 5, 7]
[3, 2, 1, 1, 4, 5, 7]
3
[1, 2, 1, 3, 4, 5, 7]
[2, 1, 1, 3, 4, 5, 7]
2
[1, 1, 2, 3, 4, 5, 7]
[1, 1, 2, 3, 4, 5, 7]
1
[1, 1, 2, 3, 4, 5, 7]
[1, 1, 2, 3, 4, 5, 7]
0
[1, 1, 2, 3, 4, 5, 7]
[1, 1, 2, 3, 4, 5, 7]


## Count sort

1) Cannot be applied to all sorting problems. If the values in the array are non-negative integers within a tight range (for example, all values in the list are between 10 and 50, with duplicates allowed), then count sort is an O(n+k) time complexity, O(n+k) space complexity solution, where k is the number of possible values in the range (in the example above, this is 41)
2) Makes sense in cases where k is comparable to or smaller than n. If k >> n, let's say k is n**2 or higher, this is obviously an inefficient method, and the previous methods make more sense
3) This method doesn't require any swapping or comparisons between elements, unlike all the methods seen above
4) Because it uses temporary arrays, it is not an in-place algorithm

5) Pseudocode in words - Three steps. 
Step 1) In O(n) time, get the count of each of the k possible values the list can take. (The count is stored in an array of length k, where position j is the count of the jth value - very similar to an ordered hash map)
Step 2) Do some arithmetic to get the final array. Since the count vector above is already in ascending order of keys, we just need to place the value at position j of the count array count(j) times in order to get the final array.
This is done by taking a cumsum of the count vector. The cumsum value for a specific value is the end index + 1 of that value in the sorted array.
Step 3) Create a new array to store the sorted values, of the same size n as the original array. Go back to the original array and start from the last element (this is to maintain a stable sort). For the element, look up its entry in the cumsum vector, subtract 1 from it, and put the element in the new array at that index. Move to the previous element and keep traversing till you reach the start

Because you traverse arrays linearly (both the original array of length n and the count array of length k), time complexity is O(n+k). Because you create a temp array of length k and a new array of length n to store the final sorted values, space complexity is also O(n+k)
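For a range that does not start at zero, such as the 10 to 50 example, the counts can be indexed by `value - lo`. The following is a minimal sketch under that assumption; the function name `count_sort_with_offset` and the offset trick are additions for illustration, not part of the method as written above.

```python
def count_sort_with_offset(values):
    """Counting sort for integers in a tight range [lo, hi]; returns a new list."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    k = hi - lo + 1                      # number of possible values
    counts = [0] * k
    for v in values:                     # step 1: histogram, shifted by lo
        counts[v - lo] += 1
    for j in range(1, k):                # step 2: cumulative sum
        counts[j] += counts[j - 1]
    output = [0] * len(values)
    for v in reversed(values):           # step 3: place, last element first (stable)
        counts[v - lo] -= 1
        output[counts[v - lo]] = v
    return output

print(count_sort_with_offset([42, 10, 50, 10, 33]))
```

The count array here has length hi - lo + 1 rather than max + 1, so values clustered far from zero do not waste space.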

6) Pseudocode : 
    countsort(A, k):
        n = len(A)
        ## step 1 - create count array
        count_array = [0]*k
        for i in range(n):
            count_array[A[i]] = count_array[A[i]] + 1
            
        ## step 2 - cum sum: (which gives end_index + 1 for every element in final sorted list)
        temp = 0
        for j in range(k):
            count_array[j] = count_array[j] + temp
            temp = count_array[j]
             
        ## step 3 - create new array of size n to contain sorted array and populate it
        b = [0]*n
          
        for i in range(n-1, -1, -1):  ## traverse A from the last element to keep the sort stable
            count_array[A[i]] = count_array[A[i]]-1
            b[count_array[A[i]]] = A[i]
              
        ## step 4 - you can do a final step where you copy all elements from sorted array b back to original array A
              
              
             
         
            

   



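7) Classification :
    1) Fast if k is not much larger than n - Time complexity : worst case O(n+k), average case : O(n+k), best case : O(n+k) 
    2) Space complexity :  O(n+k)
    3) stable - ties keep their input order because the original array is traversed from the end
    4) non recursive, non swapping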

Example
1) Input : [2,7,4,1,5,3,1]
2) step 1 : histogram : [0,2,1,1,1,1,0,1]  ## value at index i is the count of i in the original array.
3) step 2 : cumsum [0,2,3,4,5,6,6,7]  
4) step 3 : create sorted array - first create empty array of same size as input [0,0,0,0,0,0,0]
                Start with the last element of the original array, which is 1
                Look at the value at index 1 in cumsum, which is 2. Subtract 1 to get cumsum [0,1,3,4,5,6,6,7]
                At index 1 in the new array, populate 1 => [0,1,0,0,0,0,0]
                
                Go to the second last element of the original array, which is 3.
                Look at the value at index 3 in cumsum, which is 4. Subtract 1 from cumsum at that position to get [0,1,3,3,5,6,6,7]
                At index 3 of the new array, populate 3 => [0,1,0,3,0,0,0] and so on
            
            
5) final : [1,1,2,3,4,5,7]

https://www.geeksforgeeks.org/counting-sort/

In [1]:
def count_sort(A):
    k = max(A)+1
    n = len(A)
    
    ## step 1 - creating histogram of unique values
    count_array = [0]*k
    for i in range(n):
        count_array[A[i]] = count_array[A[i]] + 1
    
    ## step 2 - cumsum
    temp = 0
    for i in range(k):
        count_array[i] = count_array[i] +  temp
        temp = count_array[i]
        
    ## step 3 - using cumsum to get indices in original array
    sorted_array = [0]*n
    for i in range(n-1,-1,-1):
        count_array[A[i]] = count_array[A[i]]-1
        sorted_array[count_array[A[i]]] = A[i]
    return sorted_array
        
        
        

In [2]:
count_sort([2,7,4,1,5,3,1])

[1, 1, 2, 3, 4, 5, 7]


## Radix sort

1) Variation/modification of count sort. Multi-pass count sort, where each pass takes care of one digit position
2) Therefore, time complexity is O(d*(n+k)), where d is the number of digits of the largest element in the list and k is the number of possible digit values (10 for decimal digits)
3) Space complexity - O(n+k)
4) Disadvantage - assumes the elements are made up of a fixed alphabet of constituents: it can be applied only where the constituents of the elements are digits or letters. This is why comparison-based methods like quicksort are often preferred, as they are more generic
5) This method doesn't require any swapping, and it is a stable sort because every count sort pass is stable
6) Because it uses temporary arrays, it is not an in-place algorithm

7) Pseudocode in words - Two steps. 
Step 1) Start with the least significant digit. Do count sort only on that position and sort all elements. (This requires initializing a count list of length 10 - digits 0-9)
Step 2) Repeat for each more significant digit until the most significant digit of the largest element has been used
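To see how the list evolves pass by pass, here is a small tracing sketch. It uses one bucket list per digit instead of the cumulative-count array; both approaches are stable, so the passes produce the same orderings. The generator name `radix_passes` and the sample numbers are assumptions for illustration, and only non-negative integers are handled.

```python
def radix_passes(nums):
    """Yield the list after each least-significant-digit pass."""
    nums = list(nums)
    place = 1                                    # 1 = ones, 10 = tens, ...
    while nums and max(nums) // place > 0:
        buckets = [[] for _ in range(10)]        # one bucket per digit 0-9
        for value in nums:                       # appending keeps ties in order
            buckets[(value // place) % 10].append(value)
        nums = [value for bucket in buckets for value in bucket]
        yield nums
        place *= 10

for step in radix_passes([170, 45, 75, 90, 2, 24, 802, 66]):
    print(step)
```

After the ones pass the list is ordered by last digit only; each later pass reorders by a more significant digit while stability preserves the work of the earlier passes.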

8) Code : 

    countSort(A, k, pos):
        
        ## create histogram
        count_array = [0]*k
        n = len(A)
        for i in range(n):
            digit = (A[i]//10**pos)%10  ## if pos = 0 you want the right most digit, so it is just A[i] remainder 10. If pos = 1, you want the second digit from the right, so you first chop the last digit by taking the quotient A[i]//10, and then take the remainder
            
            count_array[digit] = count_array[digit] + 1
            
        ## create cumsum which gives end index + 1 of each digit in sorted array
        temp = 0
        for i in range(k):
            count_array[i] = count_array[i] + temp
            temp = count_array[i]
             
        ## create new array of length n, and populate while iterating over the original array in reverse
        sorted_array = [0]*n
        for i in range(n-1,-1,-1):
            digit =  (A[i]//10**pos)%10 
            count_array[digit] = count_array[digit]-1
            sorted_array[count_array[digit]] = A[i]
          
        ## copy finally back to original array and return
        for i in range(n):
            A[i] = sorted_array[i]
        return A

    radixsort(A, k):
        max_significant_digit = number of digits in max(A)
        for i in range(0,max_significant_digit):
            countSort(A, k, i )  ## at each step, sort numbers in A with countsort on that position's digit, going from the right most (least significant) digit to the left most

   
              
             
         
            
   



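9) Classification :
    1) Fast if the number of digits d is small - Time complexity : worst case O(d*(n+k)), average case : O(d*(n+k)), best case : O(d*(n+k)), where k is the number of possible digit values
    2) Space complexity :  O(n+k)
    3) stable - each count sort pass preserves the order of ties
    4) non recursive, non swapping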

In [21]:
def count_sort_radix(A, k, pos):
    n = len(A)
    
    ## step 1 - creating histogram of unique values
    count_array = [0]*k
    for i in range(n):
        digit = (A[i]//(10**pos))%10
        count_array[digit] = count_array[digit] + 1
    
    ## step 2 - cumsum
    temp = 0
    for i in range(k):
        count_array[i] = count_array[i] +  temp
        temp = count_array[i]
    ## step 3 - using cumsum to get indices in original array
    sorted_array = [0]*n
    for i in range(n-1,-1,-1):
        digit = (A[i]//(10**pos))%10
        count_array[digit] = count_array[digit]-1
        sorted_array[count_array[digit]] = A[i]
    return sorted_array


def radix_sort(A, k):
    max_pos = max([len(str(x)) for x in A])
    for i in range(0,max_pos):
        A = count_sort_radix(A, k, i)
    return A

In [22]:
A = [2,7,4,1,27,5,3,111,1]
radix_sort(A, 10)

[1, 1, 2, 3, 4, 5, 7, 27, 111]

In [20]:
A

[2, 7, 4, 1, 27, 5, 3, 111, 1]

### References

1) https://www.youtube.com/watch?v=pkkFqlG0Hds&list=PL2_aWCzGMAwKedT2KfDMB9YA5DgASZb3U&index=1
2) https://www.geeksforgeeks.org/analysis-of-different-sorting-techniques/
3) https://fullyunderstood.com/pseudocodes/heap-sort/#:~:text=Steps%20to%20perform%20heap%20sort%3A&text=Once%20the%20heap%20is%20ready,in%20the%20array%20are%20sorted.

In [25]:
def check(A):
    for i in range(len(A)):
        A[i] = 2

In [26]:
A = [1,2,3]

In [27]:
check(A)

In [28]:
A

[2, 2, 2]