# visualgorithm: a sorting machine problem
---
*submitted by Goldy Mariz Lunesa || [@gmlunesa](https://github.com/gmlunesa)*


## $O(n^{2})$ algorithms

Count the number of comparisons made by selection sort, bubble sort, and insertion sort in three cases:

- Best case (presorted sequences)

- Worst case (reverse-sorted sequences)

- Average case (random numbers)
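
The input files themselves are not shown in this notebook. As a rough sketch, comparable inputs could be built in memory as below, assuming 300 distinct integers (the size implied by the counts reported in the cells that follow). The name `make_cases` and the fixed seed are illustrative assumptions, not part of the original setup.

```python
import random

def make_cases(n=300, seed=0):
    """Return (presorted, reverse_sorted, shuffled) lists of n distinct integers."""
    presorted = list(range(1, n + 1))        # best case: already ascending
    reverse_sorted = presorted[::-1]         # worst case: strictly descending
    shuffled = presorted[:]                  # average case: a random permutation
    random.Random(seed).shuffle(shuffled)    # seeded so that runs are repeatable
    return presorted, reverse_sorted, shuffled

best, worst, average = make_cases()
print(len(best), best[:3], worst[:3])  # 300 [1, 2, 3] [300, 299, 298]
```

Using one set of values in three different orders keeps the comparison between cases about ordering alone.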

### Selection Sort

Selection sort works by finding the number in the first position of the sorted sequence $A'$, $a'_{0}$, which is equivalent to the smallest number in $A$, min($A$). It then finds the number to be placed in $a'_{1}$, the smallest number in $A$ excluding $a'_{0}$, which is min($A$ − [$a'_{0}$]). The next number, $a'_{2}$, is equivalent to min($A$ − [$a'_{0}$, $a'_{1}$]). Selection sort repeats this until the value for the last position, $a'_{n-1}$, is found.
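
The min($A$ − [...]) description can be read almost literally as code. The sketch below builds the output by repeatedly removing the minimum of what remains. It is an illustration of the definition only: it works on a copy rather than in place, and the name `selection_by_min` is hypothetical.

```python
def selection_by_min(a):
    """Build the sorted sequence A' by repeatedly taking the min of what is left of A."""
    remaining = list(a)              # copy so the caller's list is untouched
    result = []
    while remaining:
        smallest = min(remaining)    # min(A - [a'_0, ..., a'_{k-1}])
        remaining.remove(smallest)   # drop one occurrence of that value
        result.append(smallest)      # it becomes a'_k
    return result

print(selection_by_min([5, 2, 9, 1]))  # [1, 2, 5, 9]
```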


In [328]:
def readArray(path):
    # Each line of the file holds one integer
    with open(path) as file:
        return [int(line) for line in file]

def initArrays():
    # Load the presorted, reverse sorted, and random input sequences
    bestCaseArray = readArray('on2bestcase.txt')
    worstCaseArray = readArray('on2worstcase.txt')
    avgCaseArray = readArray('on2avgcase.txt')

    return bestCaseArray, worstCaseArray, avgCaseArray

In [329]:
dataArrays = initArrays()

def selection_sort(data):
    count = 0
    for index in range(len(data)):
        min_index = index
        for scan in range(index + 1, len(data)):
            # One comparison per scanned element
            count += 1
            if data[scan] < data[min_index]:
                min_index = scan
        if min_index != index:
            data[index], data[min_index] = data[min_index], data[index]
    return count, data

print("Best Case: %s" % selection_sort(dataArrays[0])[0])
print("Worst Case:  %s" % selection_sort(dataArrays[1])[0])
print("Average Case: %s" % selection_sort(dataArrays[2])[0])

Best Case: 44850
Worst Case:  44850
Average Case: 44850


---
### Insertion Sort

Insertion sort works by sorting a growing prefix of the sequence. It
starts with the first 2 elements, inserting $a'_{1}$ into
the first position if $a'_{1}$ < $a'_{0}$. It then sorts the first 3 elements of the
sequence by inserting $a'_{2}$ into its correct spot among the 2 sorted elements. It
then sorts the first 4 elements by inserting $a'_{3}$ into its correct spot.
The algorithm does this $n$ − 1 times.
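
To make the growing sorted prefix visible, the sketch below records the list after each of the $n$ − 1 insertions. It is an illustrative variant that shifts elements to the right instead of swapping pairs, and the name `insertion_passes` is hypothetical.

```python
def insertion_passes(a):
    """Return the state of the list after each of the n - 1 insertions."""
    data = list(a)
    states = []
    for i in range(1, len(data)):
        value = data[i]
        j = i
        # Shift larger elements of the sorted prefix one slot to the right.
        while j > 0 and data[j - 1] > value:
            data[j] = data[j - 1]
            j -= 1
        data[j] = value              # drop the value into the gap that opened up
        states.append(list(data))
    return states

for state in insertion_passes([4, 3, 1, 2]):
    print(state)
# [3, 4, 1, 2]
# [1, 3, 4, 2]
# [1, 2, 3, 4]
```

After the k-th insertion the first k + 1 elements are in order, while the rest of the list is still untouched.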



In [330]:
dataArrays = initArrays()

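# Iterative implementation of insertion sort.
# count goes up once per swap, that is, once per comparison that finds an
# out-of-order pair; the comparison that ends each inner loop is not counted.
def insertion_sort(data):
    count = 0
    for index in range(1, len(data)):
        while 0 < index and data[index] < data[index - 1]:
            count += 1
            data[index], data[index - 1] = data[index - 1], data[index]
            index -= 1

    return count, data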



print("Best Case: %s" % insertion_sort(dataArrays[0])[0])
print("Worst Case:  %s" % insertion_sort(dataArrays[1])[0]) 
print("Average Case: %s" % insertion_sort(dataArrays[2])[0])

Best Case: 0
Worst Case:  44850
Average Case: 20943


---
### Bubble Sort

Bubble sort works by comparing every pair of adjacent elements, $a_{i}$ and
$a_{i+1}$. If the elements are in the wrong order (i.e. $a_{i}$ > $a_{i+1}$), they
are swapped. The algorithm repeats these passes until a full pass makes no
swap, at which point all elements are in their right position.
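
With this stopping rule, a presorted input of $n$ elements needs a single pass of $n$ − 1 comparisons, while a reverse-sorted input needs $n$ passes, for $n(n − 1)$ comparisons. For $n$ = 300 these formulas give 299 and 89,700. The small sketch below checks the pattern for $n$ = 5; the name `bubble_pass_count` is hypothetical.

```python
def bubble_pass_count(a):
    """Return (passes, comparisons) for bubble sort that stops after a swap-free pass."""
    data = list(a)
    passes = comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(1, len(data)):
            comparisons += 1                     # one comparison per adjacent pair
            if data[i - 1] > data[i]:
                data[i - 1], data[i] = data[i], data[i - 1]
                swapped = True
    return passes, comparisons

print(bubble_pass_count([1, 2, 3, 4, 5]))  # (1, 4)
print(bubble_pass_count([5, 4, 3, 2, 1]))  # (5, 20)
```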

In [331]:
dataArrays = initArrays()

def bubble_sort(data):
    count = 0
    while True:
        swapped = False
        for index in range(1, len(data)):
            count += 1
            if data[index-1] > data[index]:
                data[index-1], data[index] = data[index], data[index-1]
                swapped = True
        if not swapped:
            break
    return count, data

print("Best Case: %s" % bubble_sort(dataArrays[0])[0])
print("Worst Case:  %s" % bubble_sort(dataArrays[1])[0]) 
print("Average Case: %s" % bubble_sort(dataArrays[2])[0])


Best Case: 299
Worst Case:  89700
Average Case: 83720


## Shell Sort optimization


### Shell Sort

Shell sort is a generalization of either the insertion sort algorithm or
the bubble sort algorithm; insertion sort is almost always the
chosen subroutine, as discussed in the previous section. This algorithm
works by dividing the input sequence into $g$ interleaved subsequences,
where the elements of each subsequence are separated by some gap $g$. The
algorithm individually sorts these subsequences using either insertion
sort or bubble sort. After sorting the subsequences, the procedure
repeats with a reduced value for $g$, decreasing the number of
subsequences. The algorithm stops after the sequence is sorted with
$g$ = 1 (normal insertion or bubble sort).
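
The interleaved subsequences correspond to Python's stepped slices `values[i::g]`. The following is a minimal sketch of gapped insertion sort under that view, not the instrumented version used for the counts in this notebook. The names `gapped_insertion_sort` and `shell_sort` are hypothetical, and the loop guard `pos >= gap` is chosen so that `pos - gap` never becomes negative.

```python
def gapped_insertion_sort(data, gap):
    """Sort each of the `gap` interleaved subsequences of data in place."""
    for start in range(gap, len(data)):
        pos = start
        # pos >= gap keeps pos - gap non-negative, so the comparison never
        # wraps around to the end of the list.
        while pos >= gap and data[pos] < data[pos - gap]:
            data[pos], data[pos - gap] = data[pos - gap], data[pos]
            pos -= gap

def shell_sort(a):
    """Return a sorted copy of a, halving the gap until it reaches 1."""
    data = list(a)
    gap = len(data) // 2
    while gap > 0:
        gapped_insertion_sort(data, gap)
        gap //= 2
    return data

values = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print([values[i::4] for i in range(4)])  # [[9, 5, 1], [8, 4, 0], [7, 3], [6, 2]]
print(shell_sort(values))                # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The final pass with a gap of 1 is an ordinary insertion sort, which is what guarantees a sorted result regardless of the earlier gaps.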

#### Shell Sort with Insertion Sort subroutine

In [332]:
dataArrays = initArrays()

# In this Shell sort with insertion sort subroutine,
# the gaps are determined by repeatedly halving the length of the array
def shell_sort_is(data):
    comparisonSumSS = 0
    swapSumSS = 0
    gap = len(data) // 2

    while gap > 0:
        # Subroutine call
        insertionSortResult = insertion_subroutine(data, gap)
        comparisonSumSS += insertionSortResult[1]
        swapSumSS += insertionSortResult[2]

        gap //= 2

    return data, comparisonSumSS, swapSumSS


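def insertion_subroutine(data, gap):
    comparisonSumIS = 0
    swapSumIS = 0

    for index in range(gap, len(data)):
        comparisons = 0
        swaps = 0

        # NOTE: once index drops below gap, index - gap is negative and
        # Python indexes from the end of the list, so the reported counts
        # include those wrapped comparisons; a guard of gap <= index
        # would avoid them.
        while 0 < index and data[index] < data[index - gap]:
            comparisons += 1
            # Do the swap!
            data[index], data[index - gap] = data[index - gap], data[index]
            swaps += 1
            index -= gap

        # Count the final check that ended the while loop
        comparisons += 1
        comparisonSumIS += comparisons

        swapSumIS += swaps

    return data, comparisonSumIS, swapSumIS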

presortedResultsSSIS = shell_sort_is(dataArrays[0])
reverseResultsSSIS = shell_sort_is(dataArrays[1])
randomResultsSSIS = shell_sort_is(dataArrays[2])

print("Presorted; Comparisons: %s ; Swaps: %s ;" %(presortedResultsSSIS[1], presortedResultsSSIS[2]))
print("Reverse sorted; Comparisons: %s ; Swaps: %s ;" %(reverseResultsSSIS[1], reverseResultsSSIS[2]))
print("Random Order; Comparisons: %s ; Swaps: %s ;" %(randomResultsSSIS[1], randomResultsSSIS[2]))


Presorted; Comparisons: 2104 ; Swaps: 0 ;
Reverse sorted; Comparisons: 8926 ; Swaps: 6822 ;
Random Order; Comparisons: 8795 ; Swaps: 6691 ;


#### Shell Sort with Shell's Gap Sequence

Donald Shell proposed the gap sequence $\lfloor N / 2^{k} \rfloor$ in his 1959 paper *A High-Speed Sorting Procedure*. In our test case ($N$ = 300), the generated sequence is [150, 75, 37, 18, 9, 4, 2, 1], the same gaps that repeated halving produces in the previous cell.
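
Rather than typing the sequence by hand, it can be generated for any length. The sketch below relies on the fact that repeated integer halving gives the same values as flooring $N / 2^{k}$; the name `shell_gaps` is hypothetical.

```python
def shell_gaps(n):
    """Return floor(n / 2**k) for k = 1, 2, ... while the gap is at least 1."""
    gaps = []
    gap = n // 2
    while gap > 0:
        gaps.append(gap)
        gap //= 2        # floor(floor(n / 2**k) / 2) equals floor(n / 2**(k + 1))
    return gaps

print(shell_gaps(300))  # [150, 75, 37, 18, 9, 4, 2, 1]
```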

In [333]:
dataArrays = initArrays()

gapSequence = [150, 75, 37, 18, 9, 4, 2, 1]

def shell_sort_sgs(data):
    comparisonSumSS = 0
    swapSumSS = 0

    for gap in gapSequence:
        # Subroutine call
        insertionSortResult = insertion_subroutine(data, gap)
        comparisonSumSS += insertionSortResult[1]
        swapSumSS += insertionSortResult[2]

    return data, comparisonSumSS, swapSumSS


presortedResultsSSShellGap = shell_sort_sgs(dataArrays[0])
reverseResultsSSShellGap = shell_sort_sgs(dataArrays[1])
randomResultsSSShellGap = shell_sort_sgs(dataArrays[2])

print("Presorted; Comparisons: %s ; Swaps: %s ;" %(presortedResultsSSShellGap[1], presortedResultsSSShellGap[2]))
print("Reverse sorted; Comparisons: %s ; Swaps: %s ;" %(reverseResultsSSShellGap[1], reverseResultsSSShellGap[2]))
print("Random Order; Comparisons: %s ; Swaps: %s ;" %(randomResultsSSShellGap[1], randomResultsSSShellGap[2]))



Presorted; Comparisons: 2104 ; Swaps: 0 ;
Reverse sorted; Comparisons: 8926 ; Swaps: 6822 ;
Random Order; Comparisons: 8795 ; Swaps: 6691 ;
