Description:
Reference implementation for IndexSort algorithm and compare the runtime with quick sort and the linear time counting sort and bin sort.
QuickSort: https://en.wikipedia.org/wiki/Quicksort
CountingSort: https://opendatastructures.org/ods-java/11_2_Counting_Sort_Radix_So.html
BinSort: https://opendsa-server.cs.vt.edu/ODSA/Books/CS3/html/BinSort.html
Index sort is an sorting algorithm what works on set of distinct natural numbers and can deliver better runtime performance than QuickSort, CountingSort and BinSort in many cases. This is not a generic sorting algorithm but works very effectively with positive unique integer values what is used in many business scenario. It is based on the bin (bucket) sorting algorithm but utilize the unique numbers in the sortable array.
How to run reference implementation:
Download the /bin/IndexSorter.jar
Call: java -jar IndexSorter.jar param1 param2 param3 param4
Parameters:
param1: number of samples to be generated (like 5)
param2: size of the array what will be sorted (like 100)
param3: range of the natural numeric numbers (like 1000)
param4: print statistic only and skip the array printouts. (like true)
Algorithm with pseudocode:
array a; // sortable array
array b; // helper array
n = length(a) // length of the sortable array
max = 0 // store the maximum value in the sortable array
min = infinite // store the minimum value in the sortable array
Loop i = 1 .. n { if a[i] > max then max = a[i]; if a[i] < min then min = a[i]; } // look for maximum and minimum
b = new array[max] // helper array creation
Loop i = 1 .. n { b[a[i]] = true } // set the true by value as index
pos = 0
Loop i = min .. max {if (b[i]) then a[pos++] = i} // rewrite the sortable array with position pointer
Algorithm execution by a sample
- Sample to sort [8, 5, 3, 7]
- Create helper boolean array where length of the array is the maximum number in the sample (8):
[false, false, false, false, false, false, false, false] - Use the number from input array as index and set true by this index in the helper array.
[false, false, true, false, true, false, true, true] - Loop over the helper array and write back to the index value to the original array left to right if the helper array contains true in the index position.
[3, 5, 7, 8]
Formal analysis:
Sort an array of n distinct elements, quicksort takes O( n log n ) time in best case and O ( n2 ) in worst case.
Index sorting needs three loops over the the array.
- Selecting the maximum and minimum element: n step.
- Setting the helper array value: n step.
- Writing back the index values to original: (max - min) step, where max and min are the maximum and minimum values in the array.
The time is always linear and O ( 2n + max ) in every case.
n = 1000 then QuickSort needs 3000 steps in best case. IndexSort needs 2000 + (maximum - minimum value) steps in every case.
CountingSort has O (n + max) what is comparable with IndexSort but IndexSort on random datasets shows better performance.
Statistics:
Index sort shows the best execution time in case of big array sizes. There is no big exection time difference in the small arrays.
Index sort delivers always better results than BinSort in any sample size
Number of Sample | Size of array | Number intervall 1..n | QuickSort Wins | IndexSort Wins | CountingSort Wins |
---|---|---|---|---|---|
100 | 10 | 100 | 81 | 16 | 3 |
100 | 10 | 100 | 92 | 6 | 2 |
100 | 10 | 100 | 89 | 8 | 3 |
100 | 100 | 100 | 38 | 55 | 7 |
100 | 100 | 100 | 50 | 45 | 5 |
100 | 100 | 100 | 46 | 50 | 4 |
100 | 100 | 1000 | 51 | 43 | 6 |
100 | 100 | 1000 | 46 | 44 | 10 |
100 | 100 | 1000 | 48 | 46 | 6 |
100 | 1000 | 1000 | 3 | 65 | 32 |
100 | 1000 | 1000 | 3 | 64 | 33 |
100 | 1000 | 1000 | 2 | 76 | 22 |
100 | 1000 | 10000 | 7 | 81 | 12 |
100 | 1000 | 10000 | 8 | 77 | 15 |
100 | 1000 | 10000 | 6 | 81 | 13 |
100 | 10000 | 10000 | 0 | 96 | 4 |
100 | 10000 | 10000 | 0 | 92 | 8 |
100 | 10000 | 10000 | 0 | 99 | 1 |
100 | 10000 | 100000 | 0 | 100 | 0 |
100 | 10000 | 100000 | 0 | 98 | 2 |
100 | 10000 | 100000 | 0 | 100 | 0 |
10 | 100000 | 1000000 | 0 | 10 | 0 |
Disadvantages and improvement oportunities
- It works only with distinct values in the array.
- Runtime depends on the distance between maximum and minimum value in the array.
- It can work with negative numbers with transforming the values to positive by adding the minimum value to each element.
- It can work with real numbers with transforming the values to integer whole number with multiplication by 10s.
- Better representation of the helper array to avoid big size array (memory usage and loop size). Current represenation is good for natural business numbers becasue the values in the helper array are 1 bits in memory only.