# [CptS 215 Data Analytics Systems and Algorithms](https://github.com/gsprint23/cpts215)
[Washington State University](https://wsu.edu)

[Gina Sprint](http://eecs.wsu.edu/~gsprint/)
# Analysis of Sorting

Learner objectives for this lesson:
* Review sorting routines
* Perform algorithm analysis of sorting routines


## Acknowledgments
Content used in this lesson is based upon information in the following sources:
* No sources to report

## Review of Sorting Routines
Sorting is the process of organizing items in a collection into an ordered sequence. The order depends on the data type and the problem. For example, suppose we want to sort U.S. city populations from largest to smallest. Here the data type is integer, and the problem dictates that the integers are sorted in *descending* order. As another example, suppose we want to sort the words in a document alphabetically. Here the data type is string, and the problem dictates that the strings are sorted in *ascending* alphabetical order. We could also sort the words by their frequency in the document, from the most frequently occurring words to the least frequently occurring words.
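The three orderings described above can be sketched with Python's built-in `sorted()`, which accepts `reverse` and `key` parameters. The populations and the sentence below are made-up sample data, used only for illustration.

```python
from collections import Counter

# made-up sample data, for illustration only
populations = [120000, 950000, 43000, 310000]
words = "the cat saw the dog and the dog saw the cat".split()

# integers in descending order (largest population first)
by_population = sorted(populations, reverse=True)

# strings in ascending alphabetical order
alphabetical = sorted(words)

# distinct words from most to least frequent (ties keep first-seen order)
counts = Counter(words)
by_frequency = sorted(counts, key=counts.get, reverse=True)

print(by_population)  # [950000, 310000, 120000, 43000]
print(alphabetical)   # ['and', 'cat', 'cat', 'dog', 'dog', 'saw', 'saw', 'the', 'the', 'the', 'the']
print(by_frequency)   # ['the', 'cat', 'saw', 'dog', 'and']
```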

While there are many sorting routines, we are going to first review the most common sorting algorithms:
* Selection sort
* Bubble sort
* Insertion sort
* Shell sort
* Merge sort
* Quick sort
* Heap sort (to be learned when we cover heaps)

Note: [Here](https://visualgo.net/sorting) is a great site that visualizes the sorting algorithms.

For each algorithm, we will implement it in Python and analyze its time complexity for the average, best, and worst case scenarios. For analysis of sorting routines, we are going to measure and compare two main operations:
1. Number of comparisons
1. Number of data swaps

### Selection Sort
Big picture: *Select* the smallest item in the unsorted portion of the sequence and add it to the sorted portion of the sequence.

Algorithm:
1. Walk through each location `i` in the sequence
    1. Walk through each location `j` in the sub-sequence starting at the `i`th location and find the smallest value
        * Save the location of the smallest value
    1. Swap the data at the `i`th location in the sequence with the location of the smallest value
    
Python implementation using lists:

```python
import numpy.random as rand

def selection_sort(array):
    '''
    Sort array in place in ascending order using selection sort.
    On pass i, find the smallest item in the unsorted portion
    array[i:] and swap it into location i.
    '''
    for i in range(len(array)):
        # assume the first unsorted item is the smallest
        smallest_index = i
        for j in range(i + 1, len(array)):
            if array[j] < array[smallest_index]:
                smallest_index = j
        # swap the smallest item into the ith location
        temp = array[i]
        array[i] = array[smallest_index]
        array[smallest_index] = temp

data = rand.randn(20)
print(data)
selection_sort(data)
print(data)
```

```
[ 1.8146135  -0.36891753  0.09588123 -0.70163335 -0.74257802  1.06622815
 -1.91328379  0.17385816 -0.2903398   1.4942705  -1.9693279   1.39247707
 -1.81163403 -2.54225156  1.17809481 -1.63965553 -0.14151793 -0.50778633
  0.24872875  0.49631872]
[-2.54225156 -1.9693279  -1.91328379 -1.81163403 -1.63965553 -0.74257802
 -0.70163335 -0.50778633 -0.36891753 -0.2903398  -0.14151793  0.09588123
  0.17385816  0.24872875  0.49631872  1.06622815  1.17809481  1.39247707
  1.4942705   1.8146135 ]
```


#### Selection Sort Time Complexity
* Average case: $\mathcal{O}(n^{2})$
* Worst case: $\mathcal{O}(n^{2})$
* Best case: $\Omega(n^{2})$ (there is no early exit: the inner loop scans the entire unsorted portion even when the sequence is already sorted)
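To connect these bounds to the two operations we measure, here is a sketch of an instrumented copy of `selection_sort()` that counts data comparisons and swaps. The name `selection_sort_counts` is hypothetical, it works on plain Python lists, and it counts only swaps that actually move data. Because the inner loop always scans the whole unsorted portion, the comparison count is $n(n-1)/2$ whether or not the input is already sorted, while the number of swaps is at most $n - 1$.

```python
def selection_sort_counts(array):
    '''
    Selection sort that sorts array in place and returns
    (comparisons, swaps). Only swaps that move data
    (smallest_index != i) are counted.
    '''
    comparisons = 0
    swaps = 0
    for i in range(len(array)):
        smallest_index = i
        for j in range(i + 1, len(array)):
            comparisons += 1
            if array[j] < array[smallest_index]:
                smallest_index = j
        if smallest_index != i:
            array[i], array[smallest_index] = array[smallest_index], array[i]
            swaps += 1
    return comparisons, swaps

n = 10
print(selection_sort_counts(list(range(n))))         # already sorted: (45, 0)
print(selection_sort_counts(list(range(n, 0, -1))))  # reverse sorted: (45, 5)
```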

#### Practice Problem #1
Adapt the above algorithm to sort a linked list. First, define a `Node` class and a `LinkedList` class. Then, create a `LinkedList` with random data. Then, pass the `LinkedList` object into `selection_sort()` and sort the linked list.

### Bubble Sort
Big picture: Perform pairwise comparisons, swapping data such that the largest items in the unsorted portion of the sequence *bubble* to the end and form the sorted portion of the sequence.

Algorithm:
1. Walk through each pass `i`, from 0 to n - 2
    1. Walk through each location `j` in the unsorted portion of the sequence (locations 0 through n - i - 2)
        * If the data at the `j`th position is larger than the data at the `j+1`th position
            * Swap the data
    
Python implementation using lists:

```python
import numpy.random as rand

def bubble_sort(array):
    '''
    Sort array in place in ascending order using bubble sort.
    After pass i, the i + 1 largest items have bubbled to the end
    of the array and are in their final positions.
    '''
    n = len(array)
    for i in range(n - 1):
        # the last i items are already sorted, so stop before them
        for j in range(n - 1 - i):
            if array[j + 1] < array[j]:
                # swap the out-of-order adjacent pair
                temp = array[j]
                array[j] = array[j + 1]
                array[j + 1] = temp

data = rand.randn(20)
print(data)
bubble_sort(data)
print(data)
```

```
[-1.60831716 -0.49560813 -0.60907064 -0.8482408  -0.38105834 -0.14798523
  0.51624905 -0.20985352 -1.47816246  0.69571234  0.18621439 -0.26189422
 -0.73897571  0.31214107 -0.10836942 -0.93460077 -1.16597431  2.00540992
 -0.69423641 -1.79954617]
[-1.79954617 -1.60831716 -1.47816246 -1.16597431 -0.93460077 -0.8482408
 -0.73897571 -0.69423641 -0.60907064 -0.49560813 -0.38105834 -0.26189422
 -0.20985352 -0.14798523 -0.10836942  0.18621439  0.31214107  0.51624905
  0.69571234  2.00540992]
```


#### Bubble Sort Time Complexity
* Average case: $\mathcal{O}(n^{2})$
* Worst case: $\mathcal{O}(n^{2})$
* Best case (with early exit, see the practice problem): $\Omega(n)$
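An instrumented sketch (hypothetical name `bubble_sort_counts`, plain lists, no early exit) shows where the bounds come from. Without an early exit, the comparisons are always $n(n-1)/2$. Each swap fixes exactly one out-of-order pair of items (an *inversion*), so the number of swaps equals the number of inversions in the input, which the brute-force helper `count_inversions` (also hypothetical) computes directly.

```python
def bubble_sort_counts(array):
    '''Bubble sort (no early exit) that sorts in place and returns (comparisons, swaps).'''
    comparisons = 0
    swaps = 0
    n = len(array)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1
            if array[j + 1] < array[j]:
                array[j], array[j + 1] = array[j + 1], array[j]
                swaps += 1
    return comparisons, swaps

def count_inversions(array):
    '''Count pairs of locations a < b with array[a] > array[b], by brute force.'''
    return sum(1 for a in range(len(array))
                 for b in range(a + 1, len(array))
                 if array[a] > array[b])

data = [5, 1, 4, 2, 8]
print(count_inversions(data))          # 4
print(bubble_sort_counts(list(data)))  # (10, 4)
```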

#### Practice Problem #2 
Improve the above algorithm so sorting stops once the array is sorted. 

Hint: Using the inner loop, how can you determine if the list is sorted?

### Insertion Sort
Big picture: Build the sorted portion of the sequence by *inserting* items into their sorted position in the sorted portion of the sequence.

Algorithm:
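1. Assume the first location in the sequence is sorted
1. Walk through each location `i` in the sequence, starting at the second location
    1. Save the data at the `i`th location as the item to insert
    1. Walk backward through each location `j` in the sorted portion, from `i - 1` down to 0
        * If the data at the `j`th position is larger than the item to insert
            * Shift the data at the `j`th position one location to the right
        * Otherwise, stop walking (early exit)
    1. Place the item to insert in the vacated location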
    
<img src="https://upload.wikimedia.org/wikipedia/commons/b/b1/Insertion-sort.svg" width="600">
(image from [https://upload.wikimedia.org/wikipedia/commons/b/b1/Insertion-sort.svg](https://upload.wikimedia.org/wikipedia/commons/b/b1/Insertion-sort.svg))
    
Python implementation using lists:

```python
import numpy.random as rand

def insertion_sort(array):
    '''
    Sort array in place in ascending order using insertion sort.
    Each item is inserted into its sorted position within the
    already-sorted portion array[:i].
    '''
    for i in range(1, len(array)):
        temp = array[i]  # item to insert
        j = i - 1
        # shift larger items in the sorted portion one location right;
        # stop as soon as an item <= temp is found (early exit)
        while j >= 0 and temp < array[j]:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = temp

data = rand.randn(20)
print(data)
insertion_sort(data)
print(data)
```

```
[-0.16600798 -1.18790374  0.87574693 -1.86813251  0.82862806 -0.79820175
 -0.95699579  0.46773705  0.5026483   0.32423749 -0.08715077 -0.28315698
  0.47360401  0.41098634 -1.41405215  1.68399034 -0.95187858 -0.7847223
 -0.77511054  1.02798421]
[-1.86813251 -1.41405215 -1.18790374 -0.95699579 -0.95187858 -0.79820175
 -0.7847223  -0.77511054 -0.28315698 -0.16600798 -0.08715077  0.32423749
  0.41098634  0.46773705  0.47360401  0.5026483   0.82862806  0.87574693
  1.02798421  1.68399034]
```


#### Insertion Sort Time Complexity
* Average case: $\mathcal{O}(n^{2})$
* Worst case: $\mathcal{O}(n^{2})$
* Best case (with early exit): $\Omega(n)$
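The gap between the best and worst cases can be seen by counting data comparisons in a sketch with an explicit early exit (hypothetical name `insertion_sort_comparisons`, plain lists). On already-sorted input, each item needs one comparison, giving $n - 1$ in total; on reverse-sorted input, item `i` is compared with all `i` items before it, giving $n(n-1)/2$.

```python
def insertion_sort_comparisons(array):
    '''Insertion sort with early exit; sorts in place and returns the number of data comparisons.'''
    comparisons = 0
    for i in range(1, len(array)):
        temp = array[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if not temp < array[j]:
                break  # early exit: temp belongs right after array[j]
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = temp
    return comparisons

n = 10
print(insertion_sort_comparisons(list(range(n))))         # best case: 9
print(insertion_sort_comparisons(list(range(n, 0, -1))))  # worst case: 45
```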

### Shell Sort
Big picture: Perform repeated insertion sorts on gapped sequences, i.e., sequences of items that are `gap` locations apart (e.g., the items at locations 0, 1 * gap, 2 * gap, ...; the items at locations 1, 1 + 1 * gap, 1 + 2 * gap, ...; and so on), shrinking the gap each time until it is 1 (an insertion sort over the entire sequence).

Note: The choice of gaps is important and highly researched (read more [here](https://en.wikipedia.org/wiki/Shellsort)). For our introductory implementation of shell sort, we will use Shell's original gap sequence, with gaps of the form `n // 2`, `n // 4`, `n // 8`, ...: we start with the largest gap (initialize `gap` to `n // 2`) and repeatedly halve `gap` (integer division by 2) until `gap` is 1.
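A tiny helper (hypothetical name `halving_gaps`) lists the gaps produced by this halving rule. It mirrors the `gap` updates in the `shell_sort()` implementation in this section. Note that the gaps are powers of two only when `n` itself is a power of two.

```python
def halving_gaps(n):
    '''Return the gaps n // 2, n // 4, ..., 1 visited by halving.'''
    gaps = []
    gap = n // 2
    while gap > 0:
        gaps.append(gap)
        gap //= 2
    return gaps

print(halving_gaps(12))  # [6, 3, 1]
print(halving_gaps(13))  # [6, 3, 1]
print(halving_gaps(16))  # [8, 4, 2, 1]
```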

Algorithm:
1. Initialize `gap` to `n // 2`
1. While `gap` is at least 1
    1. Assume the first items in the sequence slice [0:gap] are sorted in gapped order (the first item in each gapped sequence is assumed sorted)
    1. For each gapped sequence
        1. Apply insertion sort
    1. Halve the gap: `gap = gap // 2`
        
<img src="https://upload.wikimedia.org/wikipedia/commons/2/26/Shellsort.svg" width="600">
(image from [https://upload.wikimedia.org/wikipedia/commons/2/26/Shellsort.svg](https://upload.wikimedia.org/wikipedia/commons/2/26/Shellsort.svg))
        
For example, suppose the sequence to sort is the following:

|index|0|1|2|3|4|5|6|7|8|9|10|11|
|-|-|-|-|-|-|-|-|-|-|-|-|-|
|value|61|2|78|54|17|1|34|9|26|90|11|50|

The first pass, `gap = n // 2 = 12 // 2 = 6`. The items in the slice [0:6] are assumed sorted in gapped order (each is the first item of its gapped sequence). The gapped sequences are as follows:
1. Indices: 0, 6
    * (61, 34) sorted: (34, 61)
1. Indices: 1, 7
    * (2, 9) sorted: (2, 9)
1. Indices: 2, 8
    * (78, 26) sorted: (26, 78)
1. Indices: 3, 9
    * (54, 90) sorted: (54, 90)
1. Indices: 4, 10
    * (17, 11) sorted: (11, 17)
1. Indices: 5, 11
    * (1, 50) sorted: (1, 50)
    
|index|0|1|2|3|4|5|6|7|8|9|10|11|
|-|-|-|-|-|-|-|-|-|-|-|-|-|
|value|34|2|26|54|11|1|61|9|78|90|17|50|

The second pass, `gap = gap // 2 = 6 // 2 = 3`. The gapped sequences are as follows:
1. Indices: 0, 3, 6, 9
    * (34, 54, 61, 90) sorted: (34, 54, 61, 90)
1. Indices: 1, 4, 7, 10
    * (2, 11, 9, 17) sorted: (2, 9, 11, 17)
1. Indices: 2, 5, 8, 11
    * (26, 1, 78, 50) sorted: (1, 26, 50, 78)

|index|0|1|2|3|4|5|6|7|8|9|10|11|
|-|-|-|-|-|-|-|-|-|-|-|-|-|
|value|34|2|1|54|9|26|61|11|50|90|17|78|


The third pass, `gap = gap // 2 = 3 // 2 = 1`. The gapped sequence is the entire sequence:
1. Indices: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

|index|0|1|2|3|4|5|6|7|8|9|10|11|
|-|-|-|-|-|-|-|-|-|-|-|-|-|
|value|1|2|9|11|17|26|34|50|54|61|78|90|

Python implementation using lists:

```python
import numpy.random as rand

def shell_sort(array):
    '''
    Sort array in place in ascending order using Shell sort with
    gaps n // 2, n // 4, ..., 1. Each pass performs an insertion
    sort on every gapped sequence.
    '''
    gap = len(array) // 2
    while gap > 0:
        # gapped insertion sort: insert array[i] into its gapped sequence
        for i in range(gap, len(array)):
            temp = array[i]
            j = i
            # shift larger items in the same gapped sequence right by gap
            while j >= gap and temp < array[j - gap]:
                array[j] = array[j - gap]
                j -= gap
            array[j] = temp
        gap //= 2

data = rand.randn(20)  # or try the example list: [61, 2, 78, 54, 17, 1, 34, 9, 26, 90, 11, 50]
print(data)
shell_sort(data)
print(data)
```

```
[-0.0876752   0.28055368  0.36811016 -0.52886688 -0.0296252   0.73190635
  1.66354357  0.20394456 -1.19194637 -0.01126536 -1.52877352  0.58463557
  0.32968584 -1.62408656  1.26625469  1.11043323  0.73527598  0.33374931
 -0.33161215 -0.29119189]
[-1.62408656 -1.52877352 -1.19194637 -0.52886688 -0.33161215 -0.29119189
 -0.0876752  -0.0296252  -0.01126536  0.20394456  0.28055368  0.32968584
  0.33374931  0.36811016  0.58463557  0.73190635  0.73527598  1.11043323
  1.26625469  1.66354357]
```
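To check the worked example against the code, the sketch below (hypothetical name `shell_sort_trace`) runs the same gapped insertion sort on the example list of 12 values and records a snapshot of the list after each gap pass.

```python
def shell_sort_trace(array):
    '''
    Shell sort with halving gaps. Sorts array in place and returns a
    list of (gap, snapshot) pairs recording the array after each pass.
    '''
    passes = []
    gap = len(array) // 2
    while gap > 0:
        for i in range(gap, len(array)):
            temp = array[i]
            j = i
            while j >= gap and temp < array[j - gap]:
                array[j] = array[j - gap]
                j -= gap
            array[j] = temp
        passes.append((gap, list(array)))  # copy the current state
        gap //= 2
    return passes

example = [61, 2, 78, 54, 17, 1, 34, 9, 26, 90, 11, 50]
for gap, snapshot in shell_sort_trace(example):
    print('gap =', gap, snapshot)
# gap = 6 [34, 2, 26, 54, 11, 1, 61, 9, 78, 90, 17, 50]
# gap = 3 [34, 2, 1, 54, 9, 26, 61, 11, 50, 90, 17, 78]
# gap = 1 [1, 2, 9, 11, 17, 26, 34, 50, 54, 61, 78, 90]
```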


#### Shell Sort Time Complexity
Note: repeatedly halving a number $n$ until it reaches 1 takes about $\log_{2} n$ steps. Because of `gap //= 2`, the outer `while` loop therefore iterates about $\log_{2} n$ times.

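* Average case: depends on the gap sequence; in practice, shell sort is usually much faster than insertion sort on large random inputs
* Worst case: $\mathcal{O}(n^{2})$ for the halving gaps used above; other gap sequences do better, e.g., Pratt's sequence (numbers of the form $2^{p}3^{q}$) gives $\mathcal{O}(n \log_{2}^{2} n)$
* Best case: $\Omega(n \log_{2} n)$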
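The pass count and the best-case bound can be checked with the sketch below (hypothetical name `shell_sort_stats`), which counts gap passes and data comparisons. On an already-sorted list, each pass with gap `g` makes exactly `n - g` comparisons (one per item, each failing immediately), so with about $\log_{2} n$ passes the total grows proportionally to $n \log_{2} n$.

```python
def shell_sort_stats(array):
    '''Shell sort with halving gaps; sorts in place and returns (passes, comparisons).'''
    passes = 0
    comparisons = 0
    gap = len(array) // 2
    while gap > 0:
        passes += 1
        for i in range(gap, len(array)):
            temp = array[i]
            j = i
            while j >= gap:
                comparisons += 1
                if not temp < array[j - gap]:
                    break  # temp is already in gapped order
                array[j] = array[j - gap]
                j -= gap
            array[j] = temp
        gap //= 2
    return passes, comparisons

# already sorted, n = 16: gaps 8, 4, 2, 1 -> 8 + 12 + 14 + 15 comparisons
print(shell_sort_stats(list(range(16))))  # (4, 49)
```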

## Practice Problems

### 1
For the list 34 50 25 16 60 82 76 5 25, walk through the following algorithms and show the state of the list after each pass:
* Selection sort
* Bubble sort
* Insertion sort

The following questions refer to the code above for Shell sort:

### 2
Show what the array below will look like after the first two passes through the outer loop of shell sort (this means show the array after it is sorted with the initial gap and then show the array after it is sorted with the second gap).

Assume an initial call is as follows: `shell_sort(a_list)`

|Initial list|2|90|50|80|10|5|75|0|21|60|78|30|40|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|After initial gap (gap is: )||||||||||||||
|After second gap (gap is: )||||||||||||||
|...||||||||||||||
|Final (after gap of 1)||||||||||||||

Answer: 

|Initial list|2|90|50|80|10|5|75|0|21|60|78|30|40|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|After initial gap (gap is: 6)|2|0|21|60|10|5|40|90|50|80|78|30|75|
|After second gap (gap is: 3)|2|0|5|40|10|21|60|78|30|75|90|50|80|
|Final (after gap of 1)|0|2|5|10|21|30|40|50|60|75|78|80|90|

### 3
In `shell_sort()` above, do the following:
1. Circle all loop control assignments
1. Double circle all loop control comparisons
1. Place a square around all data assignments
1. Place a double square around all data comparisons

### 4
Implement insertion sort using recursion.

### 5
Which quadratic sort's performance is least affected by the ordering of the array elements? Which is most affected?

### Stable vs. Unstable Sorting
Stable sorting algorithms retain the relative order of items with equal values from the input (unsorted) to the output (sorted). Unstable sorting algorithms do not guarantee the final relative order of equal items. We will return to this topic when we visit priority queues (heap sort).
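As a small illustration, assume records are `(key, label)` tuples compared by key only; the two functions below are hypothetical key-comparing variants of `selection_sort()` and `insertion_sort()`. On this input, selection sort's long-distance swap moves `(2, 'a')` past the equal key `(2, 'b')`, while insertion sort shifts only strictly larger items (the `<` test), so equal keys keep their input order. One example only shows that an algorithm *can* be unstable; it does not by itself prove that another algorithm is stable.

```python
def selection_sort_by_key(records):
    '''Selection sort on (key, label) pairs, comparing keys only.'''
    for i in range(len(records)):
        smallest_index = i
        for j in range(i + 1, len(records)):
            if records[j][0] < records[smallest_index][0]:
                smallest_index = j
        records[i], records[smallest_index] = records[smallest_index], records[i]

def insertion_sort_by_key(records):
    '''Insertion sort on (key, label) pairs, comparing keys only.'''
    for i in range(1, len(records)):
        temp = records[i]
        j = i - 1
        while j >= 0 and temp[0] < records[j][0]:
            records[j + 1] = records[j]
            j -= 1
        records[j + 1] = temp

data = [(2, 'a'), (2, 'b'), (1, 'c')]
by_selection = list(data)
selection_sort_by_key(by_selection)
by_insertion = list(data)
insertion_sort_by_key(by_insertion)
print(by_selection)  # [(1, 'c'), (2, 'b'), (2, 'a')]: 'a' and 'b' reordered
print(by_insertion)  # [(1, 'c'), (2, 'a'), (2, 'b')]: input order kept
```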