Bucket Sort

Bucket sort is less widely used than the algorithms we have covered so far. It works well when the values to be sorted fall within a known, limited range. The simple version described here, with one bucket per possible value, is also commonly known as counting sort.

Concept
Imagine we have an array of size 6 that contains values in the inclusive range 0-2. The idea behind bucket sort is to create a "bucket" for each possible value and map every element to its bucket.

There will be a bucket for 0, 1 and 2. Each bucket, which is just a position in a separate array, will contain the frequency of its value. In this example the range has only three possible values, so we will have three buckets.

Once each bucket holds the frequency of its value, we overwrite the original array: each value is written as many times as it occurs, in increasing order, so that the array ends up sorted.
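
To make the two phases concrete, here is a minimal sketch that traces them on a sample input. The array `[2, 1, 2, 0, 0, 2]` is an assumption chosen for illustration (six values in the range 0-2).

```python
# Hypothetical sample input: six values, each in the inclusive range 0-2.
arr = [2, 1, 2, 0, 0, 2]

# Phase 1: one bucket per possible value; counts[v] is how often v occurs.
counts = [0, 0, 0]
for value in arr:
    counts[value] += 1
print(counts)  # [2, 1, 3]

# Phase 2: overwrite the array, writing each value as often as it was counted.
i = 0
for value, frequency in enumerate(counts):
    for _ in range(frequency):
        arr[i] = value
        i += 1
print(arr)  # [0, 0, 1, 2, 2, 2]
```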


Time Complexity

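You may look at the nested for loop in the implementation and immediately think that it is O(n^2). That is not quite right. Let's do some analysis. The first for loop, which counts frequencies, performs n steps, where n is the length of the input array, since it goes through every element once.

The outer loop of the nested pair runs k times, where k is the length of the counts array (3 in our example). The inner loop, however, runs only as many times as the count stored in the current bucket, which is different every time. In our example it is 2, then 1 and then 3. These counts always add up to n, so the inner body executes n times in total across the whole nested loop, not n times per outer iteration. The total work is therefore O(n + k), and since k is a fixed constant here, our algorithm belongs to O(n).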
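
The counting argument can also be checked empirically. The sketch below uses a hypothetical helper, `count_steps`, which runs the same loops but only tallies how often each one executes. The sample input is an assumption chosen so that the bucket counts are 2, 1 and 3.

```python
def count_steps(arr, num_buckets):
    """Tally loop iterations: (counting loop, outer loop, inner loop)."""
    counts = [0] * num_buckets

    counting_steps = 0
    for value in arr:
        counts[value] += 1
        counting_steps += 1

    outer_steps = 0
    inner_steps = 0
    for bucket in range(num_buckets):
        outer_steps += 1
        for _ in range(counts[bucket]):
            inner_steps += 1

    return counting_steps, outer_steps, inner_steps


# n = 6 elements, k = 3 buckets
print(count_steps([2, 1, 2, 0, 0, 2], 3))  # (6, 3, 6)
```

The inner loop total matches the length of the input, so the nested pair does n + k steps of work rather than n * k.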

Stability

Since we overwrite the original array with freshly written values instead of moving the original elements, there is no way to preserve the relative order of equal values. No swapping takes place either. Hence, this version of the algorithm is not stable.
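
Stability only becomes visible when the values carry extra data. For contrast, here is a sketch of a different variant, not the counting version used in this section, that keeps the original items in a list per bucket and therefore preserves the input order of ties. The function name `bucket_sort_records` and the `(key, label)` record layout are assumptions made for illustration.

```python
def bucket_sort_records(records, num_buckets):
    """Sort (key, label) pairs by key; ties keep their input order.

    Assumes every key is an integer in range(num_buckets).
    """
    buckets = [[] for _ in range(num_buckets)]
    for record in records:
        key = record[0]
        buckets[key].append(record)  # appending preserves arrival order

    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result


records = [(2, "a"), (0, "b"), (2, "c"), (0, "d"), (1, "e")]
print(bucket_sort_records(records, 3))
# [(0, 'b'), (0, 'd'), (1, 'e'), (2, 'a'), (2, 'c')]
```

The counting version cannot do this, because it stores only how many times each key occurred and has nothing but the key itself to write back.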

def bucketSort(arr):
    # Assuming arr only contains 0, 1 or 2
    counts = [0, 0, 0]

    # Count the quantity of each val in arr
    for n in arr:
        counts[n] += 1
    
    # Fill each bucket in the original array
    i = 0
    for n in range(len(counts)):
        for j in range(counts[n]):
            arr[i] = n
            i += 1
    return arr

So while this bucket sort runs in O(n) time, we must remember that it only works when every value in the dataset falls within a known, specified range.
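
As a rough illustration of that restriction, the sketch below generalizes the same idea to any inclusive integer range and rejects values outside it. The function name `bucket_sort_range` and its `low` and `high` parameters are assumptions, and it handles integers only.

```python
def bucket_sort_range(arr, low, high):
    """Sort a list of integers in place, assuming low <= value <= high."""
    if low > high:
        raise ValueError("low must not exceed high")

    # One bucket per possible value; value v maps to index v - low.
    counts = [0] * (high - low + 1)
    for value in arr:
        if not low <= value <= high:
            raise ValueError(f"{value} is outside the range {low}..{high}")
        counts[value - low] += 1

    # Overwrite arr with each value repeated as often as it was counted.
    i = 0
    for offset, frequency in enumerate(counts):
        for _ in range(frequency):
            arr[i] = low + offset
            i += 1
    return arr


print(bucket_sort_range([5, 3, 4, 3, 5], 3, 5))  # [3, 3, 4, 5, 5]
```

The counts array has high - low + 1 entries, so a very wide range costs memory and time even for a short input. That is part of why the approach suits small, known ranges.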

Generally, with algorithmic problems, the safest bet is to use merge sort or quick sort.

Time Complexity: 

Algorithm | Big-O Time   | Notes
Insertion | O(n^2)*      | O(n) if the input is fully or nearly sorted
Merge     | O(n log n)   |
Quick     | O(n log n)*  | O(n^2) in the worst case
Bucket    | O(n)*        | Assumes all values in the input are within a specified range