The quick sort algorithm for sorting arrays proceeds recursively: it selects an element (the "pivot") and reorders the array so that all the elements less than or equal to the pivot appear first, followed by all the elements greater than the pivot. The two subarrays are then sorted recursively.

Implemented naively, quick sort has large run times and deep function call stacks on arrays with many duplicates, because the subarrays may differ greatly in size. One solution is to reorder the array so that all elements less than the pivot appear first, followed by all elements equal to the pivot, followed by all elements greater than the pivot. This is known as Dutch flag partitioning, because the Dutch national flag consists of three horizontal bands, each in a different color.

**Problem:** Write a program that takes an array A and an index i into A, and rearranges the elements such that all elements less than A[i] (the pivot) appear first, followed by elements equal to the pivot, followed by elements greater than the pivot.
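
The requirement can be written down as a small predicate, which may be handy for checking the outputs of the implementations in this notebook. This is only a sketch; the name `is_three_way_partitioned` is not part of the original notebook and is introduced here for illustration.

```python
def is_three_way_partitioned(arr, pivot):
    """Return True if arr is laid out as [< pivot][== pivot][> pivot]."""
    # Map each element to its group: 0 (smaller), 1 (equal), 2 (larger)
    groups = [0 if x < pivot else 1 if x == pivot else 2 for x in arr]
    # The layout is valid exactly when the group labels never decrease
    return all(a <= b for a, b in zip(groups, groups[1:]))


print(is_three_way_partitioned([3, 4, 21, 20, 6, 5, 9], 4))  # True
print(is_three_way_partitioned([6, 21, 3, 4, 5, 9, 20], 4))  # False
```

The predicate takes the pivot value rather than its index, because the pivot usually moves during partitioning.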

The problem is trivial to solve with O(n) additional space: store the elements smaller than the pivot, those equal to the pivot, and those greater in three separate lists, then write them back. We can avoid the O(n) additional space at the cost of O(n^2) time complexity.
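
As a rough sketch of the O(n)-space approach mentioned above (the function name `dutch_flag_partition_extra_space` is made up for this example and does not appear elsewhere in the notebook):

```python
def dutch_flag_partition_extra_space(pivot_index, arr):
    """Three-way partition of a list using O(n) additional space."""
    pivot = arr[pivot_index]
    # Collect the three groups in separate lists
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    # Write the groups back so the caller's list is rearranged in place
    arr[:] = smaller + equal + larger


arr = [6, 21, 3, 4, 5, 9, 20]
dutch_flag_partition_extra_space(3, arr)
print(arr)  # [3, 4, 6, 21, 5, 9, 20]
```

Unlike the in-place versions, this one keeps the relative order of elements within each group.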

Later we will improve the time complexity with two linear passes: the first moves all the elements less than the pivot to the beginning, and the second moves the larger elements to the end. Each pass can be performed in a single iteration, moving out-of-place elements as soon as they are discovered.

First, let's implement the naive solution:

In [29]:
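def swap(i,j,arr):
    #exchange the elements at positions i and j in place
    arr[i],arr[j]=arr[j],arr[i]

def dutchFlagPartition(pivot_index,arr):
    pivot=arr[pivot_index]
    
    #first pass: move all the smaller elements to the start of the array
    for i in range(len(arr)):
        #look for a smaller element to the right of i and bring it to position i
        for j in range(i+1,len(arr)):
            if arr[j]<pivot:
                swap(i,j,arr)
                print(arr)
                break
    
    #second pass: move all the larger elements to the end of the array
    for i in range(len(arr)-1,-1,-1):
        #look for a larger element to the left of i and bring it to position i
        for j in range(i-1,-1,-1):
            if arr[j]>pivot:
                swap(i,j,arr)
                print(arr)
                break
        #once arr[i] is smaller than the pivot we have reached the block of smaller elements
        if arr[i]<pivot:
            break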
        

In [30]:
arr=[6,21,3,4,5,9,20]
dutchFlagPartition(3,arr)

[3, 21, 6, 4, 5, 9, 20]
[3, 21, 6, 4, 5, 20, 9]
[3, 21, 6, 4, 20, 5, 9]
[3, 21, 20, 4, 6, 5, 9]
[3, 21, 4, 20, 6, 5, 9]
[3, 4, 21, 20, 6, 5, 9]


That works, and the additional space complexity is now O(1), but the time complexity is O(n^2).

Intuitively, this approach performs poorly because in the first pass, each time we search for another element smaller than the pivot, we start from the beginning. However, there is no reason to start from so far back; we can begin from the last location we advanced to. To improve the time complexity, we make a single pass that moves all the elements less than the pivot to the beginning, and in a second pass we move the larger elements to the end. It is easy to perform each pass in a single iteration, moving out-of-place elements as soon as they are discovered.

In [45]:
def swap(i,j,arr):
    item1=arr[i]
    item2=arr[j]
    arr[j],arr[i]=item1,item2

def dutchFlagPartitionDoublePass(pivot_index,arr):
    pivot=arr[pivot_index]
    smaller=0
    
    #move all the smaller elements to the beginning of the array
    for i in range(len(arr)):
        if(arr[i]<pivot):
            swap(i,smaller,arr)
            smaller+=1
            
    #move all the larger elements to the end of the array
    larger=len(arr)-1
    for j in range(len(arr)-1,-1,-1):
        if(arr[j]>pivot):
            swap(j,larger,arr)
            larger-=1
        

In [46]:
arr=[6,21,3,4,5,9,20]
dutchFlagPartitionDoublePass(3,arr)
arr

[3, 4, 21, 6, 5, 9, 20]

The algorithm we just presented has O(n) time complexity and O(1) space complexity.

The third algorithm, which we present now, has the same time complexity but performs a single pass. We do this by maintaining four subarrays: bottom (elements less than the pivot), middle (elements equal to the pivot), unclassified, and top (elements greater than the pivot). Initially, all elements are unclassified. We iterate through the unclassified elements and move each one into the bottom, middle, or top group according to the relative order between the unclassified element and the pivot.

Suppose the array is <-3,0,-1,1,1,?,?,?,4,2>, where the pivot is 1 and ? denotes an unclassified element. There are three possible cases for the first unclassified element, A[5].

It could be less than the pivot, for example A[5]=-5. In this case we swap it with the first element that is equal to the pivot, so the array becomes <-3,0,-1,-5,1,1,?,?,4,2>.

It could be equal to the pivot, i.e. A[5]=1. In this case no elements move and the middle group simply grows by one: <-3,0,-1,1,1,1,?,?,4,2>.

It could be greater than the pivot, for example A[5]=6. In this case we swap it with the rightmost unclassified element, so the array becomes <-3,0,-1,1,1,?,?,6,4,2>.

Notice that in every case the number of unclassified elements decreases by one.

In [3]:
def swap(i,j,arr):
    item1=arr[i]
    item2=arr[j]
    arr[j],arr[i]=item1,item2


def dutchFlagPartitionSinglePass(pivot_index,arr):
    pivot=arr[pivot_index]
    smaller=0
    equal=0
    larger=len(arr)
    
    #The following invariants hold during partitioning:
    #    bottom group: arr[0:smaller]        (elements < pivot)
    #    middle group: arr[smaller:equal]    (elements == pivot)
    #    unclassified: arr[equal:larger]
    #    top group:    arr[larger:len(arr)]  (elements > pivot)
    
    #keep going while there is an unclassified element
    while equal<larger:
        if arr[equal]<pivot:
            swap(equal,smaller,arr)
            smaller+=1
            equal+=1
        elif arr[equal]==pivot:
            equal+=1
        else:
            #arr[equal]>pivot: move it to just below the top group
            swap(larger-1,equal,arr)
            larger-=1
            


In [72]:
arr=[6,21,3,4,5,9,20]
dutchFlagPartitionSinglePass(5,arr)
arr

[6, 3, 4, 5, 9, 20, 21]

In [73]:
arr=[2,3,2,2,3,2,3,3]
dutchFlagPartitionSinglePass(1,arr)
arr

[2, 2, 2, 2, 3, 3, 3, 3]
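
To connect this back to the motivation, here is a minimal sketch of quick sort built on the same single-pass three-way partition. The function name `quicksort_three_way` and the choice of the middle element as the pivot are assumptions made for this example, not something the notebook prescribes.

```python
def quicksort_three_way(arr, lo=0, hi=None):
    """Sort arr[lo:hi] in place using a three-way (Dutch flag) partition."""
    if hi is None:
        hi = len(arr)
    if hi - lo < 2:
        return
    pivot = arr[lo + (hi - lo) // 2]
    smaller, equal, larger = lo, lo, hi
    # Invariants: arr[lo:smaller] < pivot, arr[smaller:equal] == pivot,
    # arr[equal:larger] is unclassified, arr[larger:hi] > pivot
    while equal < larger:
        if arr[equal] < pivot:
            arr[smaller], arr[equal] = arr[equal], arr[smaller]
            smaller += 1
            equal += 1
        elif arr[equal] == pivot:
            equal += 1
        else:
            larger -= 1
            arr[equal], arr[larger] = arr[larger], arr[equal]
    # Elements equal to the pivot are already in their final positions,
    # so only the strictly smaller and strictly larger parts are recursed on
    quicksort_three_way(arr, lo, smaller)
    quicksort_three_way(arr, larger, hi)


data = [2, 3, 2, 2, 3, 2, 3, 3]
quicksort_three_way(data)
print(data)  # [2, 2, 2, 2, 3, 3, 3, 3]
```

Because the middle group is excluded from both recursive calls, an array consisting of a single repeated value is finished after one partitioning step.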

Variant: Assuming that keys take one of three values, reorder the array so that all objects with the same key appear together. The order of the subarrays is not important. 

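I think the solution is to use the middle of the three keys as the pivot. If only two keys appear, choose either, and use the single-pass partition. The partition function expects the index of an element holding that key, not the key itself.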

In [4]:



import numpy as np

def select_pivot(arr):
    #record the first index at which each distinct key appears
    first_index={}
    for i in range(len(arr)):
        if arr[i] not in first_index:
            first_index[arr[i]]=i
        if len(first_index)==3:
            break
    keys=sorted(first_index)
    #index of an element holding the middle key (with fewer than three keys, any key works)
    return first_index[keys[len(keys)//2]]

arr1=np.random.randint(1,4,(100,))
print("before:",arr1)
dutchFlagPartitionSinglePass(select_pivot(arr1),arr1)
print("after: ", arr1)

('before:', array([2, 1, 3, 3, 3, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 3, 2,
       1, 2, 2, 3, 1, 2, 3, 3, 2, 3, 1, 2, 1, 3, 2, 3, 1, 3, 2, 2, 3, 1,
       1, 3, 2, 1, 1, 3, 2, 1, 3, 1, 3, 2, 3, 3, 2, 1, 2, 1, 1, 3, 3, 1,
       3, 2, 3, 1, 1, 1, 2, 2, 3, 2, 2, 2, 2, 3, 2, 2, 3, 2, 2, 1, 1, 2,
       2, 3, 3, 2, 3, 1, 2, 3, 1, 2, 2, 1]))
('after: ', array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]))


In [97]:
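# Sort an array whose keys are 0, 1 and 2 in a single pass
def sort012(a, arr_size):
    lo = 0               # a[0:lo] holds the 0s
    hi = arr_size - 1    # a[hi+1:] holds the 2s
    mid = 0              # a[lo:mid] holds the 1s, a[mid:hi+1] is unclassified
    while mid <= hi:
        if a[mid] == 0:
            a[lo], a[mid] = a[mid], a[lo]
            lo = lo + 1
            mid = mid + 1
        elif a[mid] == 1:
            mid = mid + 1
        else:
            a[mid], a[hi] = a[hi], a[mid]
            hi = hi - 1
    return a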

arr1=np.random.randint(0,3,(100,))
print("before:",arr1)
sort012(arr1,len(arr1))
print("after: ", arr1)

('before:', array([2, 1, 1, 2, 2, 2, 1, 2, 0, 1, 1, 0, 1, 0, 2, 1, 2, 2, 0, 1, 2, 0,
       0, 2, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 2, 1, 1, 1, 1, 2,
       0, 2, 0, 1, 0, 2, 2, 1, 1, 0, 2, 1, 1, 0, 0, 0, 2, 0, 1, 2, 2, 2,
       2, 1, 0, 1, 1, 0, 2, 2, 2, 2, 1, 0, 1, 1, 0, 0, 2, 2, 2, 1, 1, 2,
       1, 1, 1, 0, 0, 2, 2, 0, 0, 1, 1, 2]))
('after: ', array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))


In [None]:
#Dutch partition an array with four distinct keys

In [9]:
arr1=np.random.randint(0,4,(10,))
arr1

array([0, 3, 1, 2, 1, 3, 1, 1, 3, 0])

In [10]:
dutchFlagPartitionSinglePass(8,arr1)
arr1

array([0, 1, 2, 1, 1, 1, 0, 3, 3, 3])

Variant: Given an array of n objects with keys that take one of four values, reorder the array so that all objects with the same key appear together. Use O(1) additional space and O(n) time.

In [16]:
import numpy as np
def swap(i,j,arr):
    if(i!=j):
        item1=arr[i]
        item2=arr[j]
        arr[j],arr[i]=item1,item2

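def BruteForce(num_keys,arr):
    #assumes the keys are the integers 0..num_keys-1
    smallest=0  #next free position for the key being collected
    for i in range(num_keys):
        #sweep the ungrouped tail and pull every element with key i forward
        for j in range(smallest,len(arr)):
            if arr[j]==i:
                swap(j,smallest,arr)
                smallest+=1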

In [1]:
import numpy as np
arr3=np.random.randint(0,4,(10,))
print(arr3)
BruteForce(4,arr3)
arr3

[2 1 1 1 2 3 1 3 1 0]



This has a worst-case running time of O(k*n), where k is the number of distinct keys, and uses O(1) additional space.
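
For exactly four keys, one possible alternative that does not require the keys to be 0..3 is to split the array twice: first separate two of the keys from the other two, then split each half. The sketch below assumes the four possible key values are known in advance; the function name `group_four_keys` and its `keys` parameter are hypothetical and introduced only for this example.

```python
def group_four_keys(arr, keys):
    """Group arr in place so that equal keys are adjacent.

    keys is a sequence of the four possible key values (an assumption of this sketch).
    """
    def partition(lo, hi, wanted):
        # Move the elements of arr[lo:hi] whose key is in `wanted` to the front
        boundary = lo
        for i in range(lo, hi):
            if arr[i] in wanted:
                arr[boundary], arr[i] = arr[i], arr[boundary]
                boundary += 1
        return boundary

    a, b, c, d = keys
    mid = partition(0, len(arr), (a, b))  # keys a and b before keys c and d
    partition(0, mid, (a,))               # a before b
    partition(mid, len(arr), (c,))        # c before d


arr3 = [2, 1, 1, 1, 2, 3, 1, 3, 1, 0]
group_four_keys(arr3, (0, 1, 2, 3))
print(arr3)  # [0, 1, 1, 1, 1, 1, 2, 2, 3, 3]
```

Each element is examined exactly twice, so the running time is O(n), and only a few index variables are kept, so the additional space is O(1).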

Variant: Given an array of n objects with Boolean-valued keys, reorder the array so that objects that have the key false appear first. The relative ordering of objects with key true should not change.

In [27]:
arr_len=10
arr4=[]
for i in range(arr_len):
    ar1=[np.random.randint(0,100),np.random.randint(0,2)]
    arr4.append(ar1)
arr4=np.array(arr4)
arr4

array([[12,  0],
       [58,  1],
       [77,  1],
       [90,  1],
       [49,  0],
       [19,  1],
       [21,  0],
       [71,  1],
       [76,  1],
       [44,  1]])

In [26]:
def swap_two(i,j,arr):
    if(i!=j):
        item1=np.copy(arr[i])
        item2=np.copy(arr[j])
        arr[i,:]=item2
        arr[j,:]=item1

def order_boolean(arr):
    largest=len(arr)-1
    for i in range(len(arr)-1,-1,-1):
        pair=arr[i]
        pair_bool=pair[1]
        if(pair_bool==1):
            swap_two(i,largest,arr)
            largest-=1
            
        

In [20]:

print(arr4)
print('-----')
swap_two(0,4,arr4)
arr4

[[62  1]
 [ 0  1]
 [60  0]
 [ 2  1]
 [65  0]
 [87  0]
 [95  0]
 [29  1]
 [83  0]
 [22  1]]
-----


array([[65,  0],
       [ 0,  1],
       [60,  0],
       [ 2,  1],
       [62,  1],
       [87,  0],
       [95,  0],
       [29,  1],
       [83,  0],
       [22,  1]])

In [28]:
print(arr4)
print('-----')
order_boolean(arr4)
print(arr4)

[[12  0]
 [58  1]
 [77  1]
 [90  1]
 [49  0]
 [19  1]
 [21  0]
 [71  1]
 [76  1]
 [44  1]]
-----
[[12  0]
 [21  0]
 [49  0]
 [58  1]
 [77  1]
 [90  1]
 [19  1]
 [71  1]
 [76  1]
 [44  1]]
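
The same idea can be sketched without NumPy, on a plain list of `(value, flag)` tuples. The function name `order_boolean_list` is made up for this example; the input reuses the pairs shown above.

```python
def order_boolean_list(items):
    """Move (value, flag) pairs with a false flag to the front, in place.

    Pairs with a true flag keep their relative order.
    """
    write = len(items) - 1
    # Scan from the right so each true pair lands just left of the previous one
    for read in range(len(items) - 1, -1, -1):
        if items[read][1]:
            items[read], items[write] = items[write], items[read]
            write -= 1


pairs = [(12, 0), (58, 1), (77, 1), (90, 1), (49, 0),
         (19, 1), (21, 0), (71, 1), (76, 1), (44, 1)]
order_boolean_list(pairs)
print(pairs)
# [(12, 0), (21, 0), (49, 0), (58, 1), (77, 1), (90, 1), (19, 1), (71, 1), (76, 1), (44, 1)]
```

Swapping tuples in a list exchanges references, so no explicit copy is needed here, unlike the row swap on a NumPy array. Note that the false pairs may be reordered (49 and 21 trade places), which the problem statement allows.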
