# Sorts

In this notebook, we will talk about several sorting algorithms, including merge sort and quick sort.

### Sorting a Sequence 💎

Sorting is crucial. Once a sequence is sorted, it is much easier to search and index.

## Heap Sort - `using heaps!`

Here is an example with **heap sort** using the Standard Library:

In [1]:
import heapq

def heap_sort(my_list):
	"""Sort with heap sort; returns a new list and empties my_list."""
	heapq.heapify(my_list)  # the list is now a heap - O(n) time
	result = []
	while my_list:
		result.append(heapq.heappop(my_list))  # n pops, each O(log n)
	return result

col = [60, 20, 40, 70, 30, 10]
print("Input Array: ", col) # Input Array:  [60, 20, 40, 70, 30, 10]
print("Sorted Array: ", heap_sort(col)) # Sorted Array:  [10, 20, 30, 40, 60, 70]

Input Array:  [60, 20, 40, 70, 30, 10]
Sorted Array:  [10, 20, 30, 40, 60, 70]
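
The `heap_sort` function above empties its input and collects the result in a second list. As a rough sketch of how the same idea can work inside a single list, the version below builds a max-heap and repeatedly swaps the maximum to the end. The names `heap_sort_in_place` and `sift_down` are made up for this example and do not come from `heapq`.

```python
def heap_sort_in_place(a):
    """Sort list a in place in ascending order using a max-heap (illustrative sketch)."""
    n = len(a)

    def sift_down(root, end):
        # Move a[root] down until neither child inside a[:end] is larger than it.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            # pick the larger of the two children
            if child + 1 < end and a[child] < a[child + 1]:
                child += 1
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    # Build a max-heap bottom-up - O(n) time
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)

    # Move the current maximum to the end, then repair the shrunken heap
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)

col = [60, 20, 40, 70, 30, 10]
heap_sort_in_place(col)
print(col)  # [10, 20, 30, 40, 60, 70]
```

Only a few index variables are used besides the list itself, which is what makes this variant in-place.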


## Insertion Sort `start from 1 - slap city` 🎩

The textbook covers several sorting algorithms, most of them in Chapter 12.

As a warm-up, in this section we describe a nice, simple sorting algorithm known as insertion-sort.

- We start with the first element in the array. One element by itself is already sorted. 

- Then we consider the next element in the array. If it is smaller than the first, we swap them. 

- Next we consider the third element in the array. We swap it leftward until it is in its proper order with the first two elements. 

- We then consider the fourth element, and swap it leftward until it is in the proper order with the first three. 

- We continue in this manner with the fifth element, the sixth, and so on, until the whole array is sorted.

**Insertion Sort:** Compare as you insert. Time complexity is $O(n^2)$ in the worst case and $O(n)$ when the input is already sorted.

In [2]:
def insertionSort(array):
	"""Sort the elements of a list in place in non-decreasing order"""
	for step in range(1, len(array)):
		key = array[step]
		# index on left
		j = step - 1
		
		# Compare key with each element to its left
		# until an element smaller than or equal to it is found.
		# For descending order, change key < array[j] to key > array[j].
		while j >= 0 and key < array[j]:
			# slap the jth value onto its right neighbour
			array[j + 1] = array[j]
			# decrease j
			j = j - 1
			# Keep going until you reach the start of the list
			# or find an element that is not bigger than key.
			# While the jth value is bigger, just keep slapping.
		# Place key right after the element that is not bigger than it.
		array[j + 1] = key

data = [9, 5, 1, 4, 3]
insertionSort(data)
print(f'Sorted data: {data}') # Sorted data: [1, 3, 4, 5, 9]

Sorted data: [1, 3, 4, 5, 9]


In [3]:
# another implementation:
def insertion_sort(A):
    n = len(A)
    
    for i in range(1, n):
        current_element = A[i]
        j = i - 1

        # Compare the current element with previous elements in the sorted portion
        while j >= 0 and current_element < A[j]:
            A[j + 1] = A[j]  # Shift the previous element to the right
            j -= 1

        # Place the current element at the correct position in the sorted portion
        A[j + 1] = current_element

# Example usage:
arr = [64, 25, 12, 22, 11]
insertion_sort(arr)
print(arr)  # Output: [11, 12, 22, 25, 64]


[11, 12, 22, 25, 64]
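
To make the $O(n^2)$ claim concrete, here is a small sketch that counts comparisons. The function `insertion_sort_count` is a hypothetical, instrumented copy of the implementation above and works on a copy of its input.

```python
def insertion_sort_count(A):
    """Sort a copy of A and return (sorted_list, number_of_comparisons)."""
    A = list(A)
    comparisons = 0
    for i in range(1, len(A)):
        current = A[i]
        j = i - 1
        while j >= 0:
            comparisons += 1  # one comparison of current against A[j]
            if current < A[j]:
                A[j + 1] = A[j]  # shift right
                j -= 1
            else:
                break
        A[j + 1] = current
    return A, comparisons

n = 100
_, best = insertion_sort_count(range(n))          # already sorted
_, worst = insertion_sort_count(range(n, 0, -1))  # reverse sorted
print(best)   # 99   -> n - 1 comparisons
print(worst)  # 4950 -> n * (n - 1) / 2 comparisons
```

Already sorted input needs one comparison per element, while reverse sorted input compares each element with everything before it.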


## Selection Sort ? - $O(n^2)$ not really used

Selection sort is a simple sorting algorithm that works by repeatedly finding the minimum (or maximum, depending on sorting order) element from the unsorted part of the array and swapping it with the first unsorted element. 

The process is repeated until the entire array is sorted. 

It takes $O(n^2)$ time in the best, average, and worst cases, making it inefficient for large lists, but it is easy to implement and understand.

In [2]:
"""Important Idea"""

# C-9.47 
# Describe an in-place version of the selection-sort algorithm for an array
# that uses only O(1) space for instance variables in addition to the array.

"""Solution"""

# The array version of selection sort is already in-place. Extra space is only
# needed when the algorithm is written with a separate container: if every
# element is first moved into an auxiliary unsorted priority queue and the
# minimum is then removed repeatedly, that container costs O(n) extra space.

# To stay within O(1) additional space, keep the sorted prefix and the unsorted
# suffix in the same array. During each pass, track only the index of the
# minimum (or maximum) element of the unsorted portion, then swap it with the
# first element of that portion at the end of the pass. This needs at most
# one swap per pass and only a few index variables.

def in_place_selection_sort(arr):
    n = len(arr)

    # n - 1 passes suffice: the last remaining element is already in place
    for i in range(n - 1):
        min_index = i

        # Find the index of the minimum element in the unsorted portion
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:
                min_index = j

        # Swap the minimum element with the first element of the unsorted portion
        arr[i], arr[min_index] = arr[min_index], arr[i]

# Example usage:
arr = [64, 25, 12, 22, 11]
in_place_selection_sort(arr)
print(arr)  # Output: [11, 12, 22, 25, 64]

[11, 12, 22, 25, 64]
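
Swap-based selection sort is not stable, because a long-distance swap can jump one record over another record with an equal key. The sketch below shows this on a tiny input; `selection_sort_by_key` is a hypothetical variant of the function above that compares `key(record)` values instead of the records themselves.

```python
def selection_sort_by_key(records, key):
    """In-place selection sort that orders records by key(record)."""
    n = len(records)
    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):
            if key(records[j]) < key(records[min_index]):
                min_index = j
        records[i], records[min_index] = records[min_index], records[i]

records = [(2, "a"), (2, "b"), (1, "c")]
selection_sort_by_key(records, key=lambda r: r[0])
print(records)  # [(1, 'c'), (2, 'b'), (2, 'a')]  -> 'b' now comes before 'a'

# The built-in stable sort keeps 'a' before 'b'
print(sorted([(2, "a"), (2, "b"), (1, "c")], key=lambda r: r[0]))
# [(1, 'c'), (2, 'a'), (2, 'b')]
```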


## Bubble Sort

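Bubble sort repeatedly compares adjacent elements and swaps them when they are out of order, so the largest remaining value bubbles to the end on each pass. The implementation here compares each position with every later element instead, a close variant often called exchange sort.

It is $O(n^2)$ and not really used.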

In [6]:
def bubble_sort(my_list: list):
    """Sort my_list in place and return it."""
    for i in range(len(my_list) - 1):
        # compare position i with every later position
        for j in range(i + 1, len(my_list)):
            if my_list[i] > my_list[j]:
                my_list[i], my_list[j] = \
                    my_list[j], my_list[i]
    return my_list

print(bubble_sort([1,6,9,3,7,2]))
print(bubble_sort([1,34,3,2,56,8,9,55,10])) # [1, 2, 3, 8, 9, 10, 34, 55, 56]

[1, 2, 3, 6, 7, 9]
[1, 2, 3, 8, 9, 10, 34, 55, 56]
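
For comparison, here is a sketch of the textbook form of bubble sort, which only ever swaps neighbours and stops early when a full pass makes no swap. The name `bubble_sort_adjacent` is chosen for this example.

```python
def bubble_sort_adjacent(my_list):
    """Sort my_list in place by swapping adjacent out-of-order pairs."""
    n = len(my_list)
    for end in range(n - 1, 0, -1):
        swapped = False
        # after this pass, the largest value of my_list[:end + 1] sits at index end
        for j in range(end):
            if my_list[j] > my_list[j + 1]:
                my_list[j], my_list[j + 1] = my_list[j + 1], my_list[j]
                swapped = True
        if not swapped:
            break  # no swaps means the list is already sorted
    return my_list

print(bubble_sort_adjacent([1, 6, 9, 3, 7, 2]))  # [1, 2, 3, 6, 7, 9]
```

The early exit is what gives bubble sort its $O(n)$ best case on already sorted input.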


## Merge Sort

Merge sort uses an algorithmic pattern called **Divide and Conquer**. This [wonderful source](https://www.youtube.com/watch?v=ib4BHvr5-Ao) explains how to approach divide and conquer.

If a problem can be split into smaller, independent subproblems of the same kind, you can use divide and conquer. (Overlapping subproblems are the signal for dynamic programming instead.)

###  **Divide:** 
If the sequence has 0 or 1 elements, return it as is. If it has at least 2 elements, divide it into two halves.

### **Conquer:** 

Recursively sort sequences that you have as a result of dividing.

### **Combine:** 

Put the elements back together by merging the two sorted halves into one sorted sequence.


In [9]:
def merge_sort(seq):
    
    if len(seq) <= 1:
        return seq
    mid = len(seq) // 2
    # divide and conquer
    left = merge_sort(seq[:mid])
    right = merge_sort(seq[mid:])
    # combine
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
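        if left[i] <= right[j]:  # <= takes equal elements from the left first (stable)
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One of the halves may still hold leftover elements. They are already
    # sorted and not smaller than anything in result, so append them as is.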
    result.extend(left[i:])
    result.extend(right[j:])
    return result

array = [38, 27, 43, 3, 9, 82]
print("Sorted Array:", merge_sort(array)) # Sorted Array: [3, 9, 27, 38, 43, 82]

Sorted Array: [3, 9, 27, 38, 43, 82]


### The running time of MergeSort - `O(n log(n))`

**Merge method** : $O(n_1+ n_2)$ - The merge function iterates through both left and right arrays exactly once, which takes $O(n)$ time in total, where $n$ is the combined length of left and right.

**Merge Sort Method:** Each call halves the sequence, so the recursion tree has height $O(\log n)$. At every level of the tree, the merges together touch all $n$ elements, which costs $O(n)$ per level. Multiplying the two gives $O(n \log n)$ in total.
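
The same bound can be read off a recurrence. As a sketch, assume $n$ is a power of two and let $c$ be a constant (introduced here only for illustration) that covers the per-element cost of merging:

$$
T(n) =
\begin{cases}
c & \text{if } n \le 1 \\
2\,T(n/2) + c\,n & \text{if } n > 1
\end{cases}
$$

Unrolling $k$ times gives $T(n) = 2^k\,T(n/2^k) + k\,c\,n$. With $k = \log_2 n$ the subproblems have size 1, so

$$
T(n) = c\,n + c\,n \log_2 n = O(n \log n).
$$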

## Quick Sort 🧠 - the pivot

Quick sort is also based on the **Divide and Conquer** pattern.

### **Divide:** 

If S has at least two elements (nothing needs to be done if S has zero or one element), select a specific element x from S, which is called the ==pivot==. As is common practice, choose the pivot x to be the last element in S.  

Remove all the elements from S and put them into three sequences:

- L, storing the elements in S less than x
- E, storing the elements in S equal to x
- G, storing the elements in S greater than x

Of course, if the elements of S are distinct, then E holds just one element - the pivot itself.  

### **Conquer:** 

Recursively sort sequences L and G.  

### **Combine:**

Put back the elements into S in order by first inserting the elements of L, then those of E, and finally those of G.
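
The three steps above can be sketched directly with list comprehensions. This is a minimal illustration, not an optimized implementation: the function name `quick_sort` is chosen here, and it returns a new list instead of putting the elements back into S.

```python
def quick_sort(S):
    """Return a new sorted list using the L / E / G partition around the last element."""
    if len(S) < 2:
        return list(S)  # zero or one element: nothing to do
    pivot = S[-1]  # divide: the last element is the pivot
    L = [x for x in S if x < pivot]
    E = [x for x in S if x == pivot]
    G = [x for x in S if x > pivot]
    # conquer L and G recursively, then combine as L, E, G
    return quick_sort(L) + E + quick_sort(G)

print(quick_sort([38, 27, 43, 3, 9, 82, 27]))  # [3, 9, 27, 27, 38, 43, 82]
```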

### Running time of Quick Sort

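Worst case $O(n^2)$. In the worst case the height of the quick sort recursion tree is $n - 1$, NOT $O(\log n)$ like it was for merge sort.

This is because splitting around the pivot is not guaranteed to be half-half. For example, with the last element as pivot, an already sorted input puts every other element into L on each call. On average the splits are balanced enough that the expected running time is $O(n \log n)$.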
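
A quick way to see the uneven splits is to measure the height of the recursion tree. The helper `quick_sort_depth` below is a hypothetical function written for this note; it only follows the partitioning with the last element as pivot and does not sort anything.

```python
import random

def quick_sort_depth(S):
    """Height of the quick-sort recursion tree when the pivot is the last element."""
    if len(S) < 2:
        return 0
    pivot = S[-1]
    L = [x for x in S if x < pivot]
    G = [x for x in S if x > pivot]
    return 1 + max(quick_sort_depth(L), quick_sort_depth(G))

n = 100
sorted_input = list(range(n))
shuffled_input = sorted_input[:]
random.Random(0).shuffle(shuffled_input)  # fixed seed, chosen arbitrarily

print(quick_sort_depth(sorted_input))    # 99, which is n - 1
print(quick_sort_depth(shuffled_input))  # depends on the shuffle, never more than n - 1
```

On shuffled input the height is typically much closer to $\log n$ than to $n$.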

## In Place Quick Sort

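An algorithm is in-place if it uses only a small amount of memory in addition to that needed for the original input. The array-based heap-sort from Section 9.4.2 of the textbook is an example of such an in-place sorting algorithm. (The `heapq` version at the top of this notebook builds a separate result list, so it is not in-place.)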

Quick-sort of an array-based sequence can be adapted to be in-place, and such an optimization is used in most deployed implementations.

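Here is in-place quick sort:

The main idea is that we select a pivot and keep two pointers, one scanning rightward from the start of the range and one scanning leftward from just before the pivot. Each pointer skips values that are already on the correct side, and we swap when both pointers are stuck on values that belong on the other side. Once the pointers cross, we swap the pivot into the position marked by the left pointer, which is its final place.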

In [10]:
def inplace_quick_sort(S, a, b):
    """Sort the list from S[a] to S[b] inclusive using the quick-sort algorithm."""
    if a >= b: return # range is trivially sorted
    pivot = S[b] # last element of range is pivot
    left = a # will scan rightward
    right = b-1 # will scan leftward
    while left <= right: 
        # scan until reaching value equal or larger than pivot (or right marker)
        while left <= right and S[left] < pivot:
            left += 1
        # scan until reaching value equal or smaller than pivot (or left marker)
        while left <= right and pivot < S[right]:
            right -= 1 
        if left <= right: # scans did not strictly cross
            S[left], S[right] = S[right], S[left] # swap values 
            left, right = left + 1, right - 1 # shrink range
    # put pivot into its final place (currently marked by left index)
    S[left], S[b] = S[b], S[left]
    # make recursive calls
    inplace_quick_sort(S, a, left - 1)
    inplace_quick_sort(S, left + 1, b)


seq = [2,4,5,621,324,123,45324]
inplace_quick_sort(seq, 0, len(seq)-1)
print(seq) # [2, 4, 5, 123, 324, 621, 45324]

[2, 4, 5, 123, 324, 621, 45324]


## Python's Built-in Sorting

Both `list.sort` and `sorted` accept a `key` function. It is called once per element, and the elements are then ordered by those key values. This is the same idea as the **decorate-sort-undecorate** pattern.


In [11]:
seq = [5,7,4,23]
seq.sort() # no new object, the seq is sorted

sorted(seq) # makes a new list object

[4, 5, 7, 23]

In [12]:
colors = ["cyan", "white", "black", "magenta", "red"]
print(sorted(colors, key = len)) 
# ['red', 'cyan', 'white', 'black', 'magenta']

my_tuples_in_town = [(9,2), (6,3), (3,4), (0,5)]
print("Sorted based on first element:", sorted(my_tuples_in_town, key = lambda x : x[0]))
# Sorted based on first element: [(0, 5), (3, 4), (6, 3), (9, 2)]
  
my_dict = { 4: "asdf", 2: "asdz", 0: "seq"}
print(sorted(my_dict, key = my_dict.get)) # [4, 2, 0]

# For example, here’s a case-insensitive string comparison:
sorted("This is a test string from Andrew".split(), key=str.lower)
# ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

['red', 'cyan', 'white', 'black', 'magenta']
Sorted based on first element: [(0, 5), (3, 4), (6, 3), (9, 2)]
[4, 2, 0]


['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
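
To show what decorate-sort-undecorate means, here is a sketch that does by hand roughly what `key=len` does for the colors example. The variable names `decorated` and `result` are made up for this illustration.

```python
colors = ["cyan", "white", "black", "magenta", "red"]

# Decorate: pair each item with its key. The index breaks ties, so items with
# equal keys keep their original order and the items themselves are never compared.
decorated = [(len(color), index, color) for index, color in enumerate(colors)]

# Sort: tuples compare by key first, then by index
decorated.sort()

# Undecorate: strip the key and index again
result = [color for _, _, color in decorated]

print(result)  # ['red', 'cyan', 'white', 'black', 'magenta']
print(result == sorted(colors, key=len))  # True
```

With the `key` parameter available, writing the pattern out like this is rarely needed, but it shows why the key function only has to run once per element.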

## Comparing Sorting Algorithms - Efficiency - Memory Usage and Stability

Wisdom: When selecting a sorting algorithm, there are trade-offs involving efficiency, memory usage, and stability.

### TRADE OFFS - EFFICIENCY - MEMORY USAGE - STABILITY


Timsort is the algorithm behind Python's built-in `list.sort` and `sorted`. Since Java 7, it has also been the default algorithm for sorting arrays of objects in Java.

| Name                                                                            | Best      | Average   | Worst     | Memory    | Stable | Method              | Other notes                                                                                                                                                                     |
| ------------------------------------------------------------------------------- | --------- | --------- | --------- | --------- | ------ | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Heapsort](https://en.wikipedia.org/wiki/Heapsort "Heapsort")                   | n log (n) | n log (n) | n log (n) | Selection | No     |                     |                                                                                                                                                                                 |
| [Merge sort](https://en.wikipedia.org/wiki/Merge_sort "Merge sort")             | n log (n) | n log (n) | n log (n) | n         | Yes    | Merging             | [Highly parallelizable](https://en.wikipedia.org/wiki/Merge_sort#Parallel_merge_sort "Merge sort") (up to _O_(log _n_) using the Three Hungarians' Algorithm).                  |
| [Timsort](https://en.wikipedia.org/wiki/Timsort "Timsort")                      | n         | n log (n) | n log (n) | n         | Yes    | Insertion & Merging | Makes _n-1_ comparisons when the data is already sorted or reverse sorted.                                                                                                      |
| [Quicksort](https://en.wikipedia.org/wiki/Quicksort "Quicksort")                | n log (n) | n log (n) | n ^ 2     | log (n)   | No     | Partitioning        | Quicksort is usually done in-place with _O_(log _n_) stack space.                                                                                                               |
| [Insertion sort](https://en.wikipedia.org/wiki/Insertion_sort "Insertion sort") | n         | n ^ 2     | n ^ 2     | 1         | Yes    | Insertion           | _O_(_n_ + _d_), in the worst case over sequences that have _d_ [inversions](https://en.wikipedia.org/wiki/Inversion_(discrete_mathematics) "Inversion (discrete mathematics)"). |
| [Bubble sort](https://en.wikipedia.org/wiki/Bubble_sort "Bubble sort")          | n         | n ^ 2     | n ^ 2     | 1         | Yes    | Exchanging          | Tiny code size.                                                                                                                                                                 |
| [Selection sort](https://en.wikipedia.org/wiki/Selection_sort "Selection sort") | n ^ 2     | n ^ 2     | n ^ 2     | 1         | No     | Selection           | Stable with O(n) extra space, when using linked lists, or when made as a variant of Insertion Sort instead of swapping the two items.                                           |
| [Cycle sort](https://en.wikipedia.org/wiki/Cycle_sort "Cycle sort")             | n ^ 2     | n ^ 2     | n ^ 2     | 1         | No     | Selection           | In-place with theoretically optimal number of writes.                                                                                                                           |


### Why is stable sorting important?

`sorted` is stable; in fact, it uses exactly the same algorithm as the `list.sort` method.

A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).

Imagine we have a list of people with their ages:

```python
people = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
    {"name": "Charlie", "age": 25},
    {"name": "David", "age": 20},
    {"name": "Eve", "age": 30},
]
```
Now, if we want to sort this list by age using a stable sort, the original order of people with the same age should be preserved. In this case, both Alice and Charlie are 25 years old, and Bob and Eve are 30 years old. If we use a stable sort, the sorted list would maintain the order of people with the same age as they appear in the original list.

Here's how you can achieve this in Python using the sorted function with a custom key:

```python
# Define the list of people
people = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
    {"name": "Charlie", "age": 25},
    {"name": "David", "age": 20},
    {"name": "Eve", "age": 30},
]

# Sort the list of people by age using a stable sort
sorted_people = sorted(people, key=lambda x: x["age"])

# Print the sorted list
for person in sorted_people:
    print(person["name"], person["age"])
```

The output of this code will be:

```text
David 20
Alice 25
Charlie 25
Bob 30
Eve 30
```

As you can see, Alice and Charlie, who are both 25 years old, appear in the sorted list in the same order as they appeared in the original list. This demonstrates the concept of stable sorting, where the relative order of equal elements is preserved.
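
Stability is also what makes sorting in multiple passes work: sort by the secondary key first, then by the primary key, and the second pass keeps the order established by the first within each group. The sketch below uses made-up `employees` data with hypothetical `department` and `grade` fields.

```python
# Hypothetical data, invented for this example
employees = [
    {"name": "Ann", "department": "Sales", "grade": 2},
    {"name": "Ben", "department": "IT", "grade": 3},
    {"name": "Cara", "department": "Sales", "grade": 1},
    {"name": "Dan", "department": "IT", "grade": 1},
    {"name": "Eli", "department": "IT", "grade": 3},
]

# Pass 1: secondary key (grade)
by_grade = sorted(employees, key=lambda e: e["grade"])
# Pass 2: primary key (department); ties keep the grade order from pass 1
by_department = sorted(by_grade, key=lambda e: e["department"])

print([e["name"] for e in by_department])
# ['Dan', 'Ben', 'Eli', 'Cara', 'Ann']
```

The result matches a single sort with the tuple key `(e["department"], e["grade"])`, which is only guaranteed because both passes are stable.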