#### Sorting Algorithms

##### Bubble Sort
**O($n^2$) time** | **O(1) space**
- Start at the beginning of the array and swap the first two elements if the first is greater than the second
- Proceed to the next pair, and so on, until you reach the end of the array; the largest element is now in its final position
- Sweep again from the start, stopping one element earlier each time: after sweep i the last i elements are already sorted, so the inner loop only needs to reach index n-1-i
- Repeat until the entire array has been swept (at most n-1 sweeps)

In [3]:
from typing import List

def bubble_sort(arr: List[float]) -> List[float]:
  for i in range(len(arr) - 1):  # range(len(arr)) works but will repeat one time more than needed
    for j in range(len(arr) - 1 - i):
      if arr[j] > arr[j + 1]:
        arr[j], arr[j + 1] = arr[j + 1], arr[j]

  return arr

arr = [8, 1, 12, 9, 4, 22, 5]
bubble_sort(arr)
print(arr)

[1, 4, 5, 8, 9, 12, 22]
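
A common variation, sketched here as an optional extra rather than part of the notes above, is to stop as soon as a full sweep makes no swaps. The function name `bubble_sort_early_exit` and the `swapped` flag are illustrative choices. With this check, an already sorted input takes a single sweep.

```python
from typing import List

def bubble_sort_early_exit(arr: List[float]) -> List[float]:
  for i in range(len(arr) - 1):
    swapped = False
    for j in range(len(arr) - 1 - i):
      if arr[j] > arr[j + 1]:
        arr[j], arr[j + 1] = arr[j + 1], arr[j]
        swapped = True
    # No swaps in a full sweep means every adjacent pair is in order
    if not swapped:
      break

  return arr

print(bubble_sort_early_exit([8, 1, 12, 9, 4, 22, 5]))  # [1, 4, 5, 8, 9, 12, 22]
```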


##### Selection Sort
**O($n^2$) time** | **O(1) space**
- Find the smallest element using a linear scan and move it to the front (swapping it with the front element)
- Find the second smallest element and so on by again doing a linear scan
- Continue until all elements are in place

In [1]:
from typing import List

def selection_sort(arr: List[float]) -> List[float]:
  for i in range(len(arr) - 1):
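    min_idx = i  # index of the smallest element seen so far (avoids shadowing the built-in min)
    # Find the minimum of the unsorted part
    for j in range(i + 1, len(arr)):
      if arr[j] < arr[min_idx]:
        min_idx = j

    # Put the minimum at the correct position
    arr[i], arr[min_idx] = arr[min_idx], arr[i]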
  
  return arr

arr = [8, 1, 12, 9, 4, 22, 5]
selection_sort(arr)
print(arr)

[1, 4, 5, 8, 9, 12, 22]


##### Insertion Sort
**O($n^2$) time (worst case)** | **O(n) time (best case)** | **O(1) space**
- Examine each element and compare it to its left neighbour
- If it's not in the correct position (i.e. is smaller than its left neighbour), insert it in the correct position

Notes:
- Best case occurs when the array is already sorted
- Similar to sorting a hand of cards

In [12]:
from typing import List

def insertion_sort(arr: List[float]) -> List[float]:
  for i in range(1, len(arr)):
    # Compare to left neighbour
    j = i
    while j > 0 and arr[j] < arr[j - 1]:
      arr[j], arr[j - 1] = arr[j - 1], arr[j]
      j -= 1
  
  return arr


arr = [8, 1, 12, 9, 4, 22, 5]
insertion_sort(arr)
print(arr)

[1, 4, 5, 8, 9, 12, 22]
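
To see the best and worst cases from the notes above, the sketch below counts swaps instead of just sorting. The helper name `insertion_sort_swaps` is made up for this illustration. A sorted input needs no swaps, while a reversed input of n elements needs n(n-1)/2.

```python
from typing import List

def insertion_sort_swaps(arr: List[float]) -> int:
  """Sorts arr in place and returns the number of swaps performed."""
  swaps = 0
  for i in range(1, len(arr)):
    j = i
    while j > 0 and arr[j] < arr[j - 1]:
      arr[j], arr[j - 1] = arr[j - 1], arr[j]
      swaps += 1
      j -= 1
  return swaps

n = 6
print(insertion_sort_swaps(list(range(n))))         # already sorted: 0
print(insertion_sort_swaps(list(range(n, 0, -1))))  # reversed: 15
```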


##### Merge Sort
**O(n log n) time** | **O(n) space**
- Divide and conquer algorithm
- **Stable** sort algorithm
- Two functions involved:
  - **merge_sort** to divide the array until the size becomes one
  - **merge** for merging two halves
- Keep dividing the array in half until each piece holds a single element, so the first merges combine two single-element arrays
- Loop through the left and right arrays and compare values, copying the lower one back into the original array
- Copy over any elements remaining in the left or right array once the other is exhausted

See [here](https://www.khanacademy.org/computing/computer-science/algorithms/merge-sort/a/analysis-of-merge-sort) for an explanation of merge sort time complexity.

In [25]:
from typing import List

def merge_sort(arr: List[float]) -> List[float]:
  if len(arr) > 1:
    mid = len(arr) // 2
    L = arr[:mid]
    R = arr[mid:]

    merge_sort(L)
    merge_sort(R)
  
    merge(arr, L, R)
  
  return arr

def merge(arr: List[float], L: List[float], R: List[float]):
  i = j = k = 0  # left, right and current

  # Compare each element of left and right lists and merge
  while i < len(L) and j < len(R):
    if L[i] <= R[j]:
      arr[k] = L[i]
      i += 1
    else:
      arr[k] = R[j]
      j += 1
    
    k += 1  # always increment current

  # Copy any remaining elements over
  while i < len(L):
    arr[k] = L[i]
    k += 1
    i += 1
  
  while j < len(R):
    arr[k] = R[j]
    k += 1
    j += 1

arr = [8, 1, 12, 9, 4, 22, 5]
merge_sort(arr)
print(arr)
  

[1, 4, 5, 8, 9, 12, 22]
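
The stability claim can be checked with records that share a key. The sketch below is a separate, simplified variant that returns a new list instead of sorting in place; the name `merge_sort_by_key` and the `(key, label)` tuples are assumptions made for the demonstration. Taking from the left half on ties (`<=`) is what keeps equal keys in their original order.

```python
from typing import List, Tuple

def merge_sort_by_key(items: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
  if len(items) <= 1:
    return items

  mid = len(items) // 2
  left = merge_sort_by_key(items[:mid])
  right = merge_sort_by_key(items[mid:])

  merged = []
  i = j = 0
  while i < len(left) and j < len(right):
    # Compare keys only; <= prefers the left half on ties, preserving input order
    if left[i][0] <= right[j][0]:
      merged.append(left[i])
      i += 1
    else:
      merged.append(right[j])
      j += 1

  # Append whatever is left in either half
  merged.extend(left[i:])
  merged.extend(right[j:])
  return merged

records = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
print(merge_sort_by_key(records))  # [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]
```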


##### Quick Sort
**O(n log n) average time** | **O($n^2$) worst-case time** | **O(log n) average space**
- Divide and conquer algorithm
- Pick a pivot element (random, last, middle, etc.; the implementation below uses the middle element) and partition the array so that all elements less than the pivot come before all elements greater than it
- Recursively partition each side until the whole array is sorted
- Worst case is O($n^2$) because the pivot is not guaranteed to be near the median: if every pivot is the smallest or largest element, each partition shrinks the subarray by only one

In [20]:
from typing import List

def quick_sort(arr: List[float], l: int, r: int):
  if l < r:
    p = partition(arr, l, r)
    quick_sort(arr, l, p - 1)
    quick_sort(arr, p, r)

def partition(arr, l, r):
  pivot = arr[(r - l) // 2 + l]  # Pick middle element between l and r for pivot
  
  while l <= r:
    while arr[l] < pivot: l += 1  # Look for an element larger than pivot on the left
    while arr[r] > pivot: r -= 1  # Look for an element smaller than pivot on the right
    
    # Swap the elements
    if l <= r:
      arr[l], arr[r] = arr[r], arr[l]
      l += 1
      r -= 1

  # l is now the first index of the right partition (everything before it is <= pivot)
  return l

arr = [8, 1, 12, 9, 4, 22, 5]
quick_sort(arr, 0, len(arr) - 1)
print(arr)
  

[1, 4, 5, 8, 9, 12, 22]
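
The notes mention a random pivot, while the cell above uses the middle element. As a hedged alternative, here is a minimal sketch with a random pivot and the Lomuto partition scheme, which is a different scheme from the one above. The names `quick_sort_random` and `partition_random` are illustrative. Here the partition function does return the pivot's final index, so the recursion can skip it.

```python
import random
from typing import List

def quick_sort_random(arr: List[float], l: int, r: int) -> None:
  if l < r:
    p = partition_random(arr, l, r)
    quick_sort_random(arr, l, p - 1)
    quick_sort_random(arr, p + 1, r)  # pivot at p is already in place

def partition_random(arr: List[float], l: int, r: int) -> int:
  # Move a randomly chosen pivot to the end of the range
  k = random.randint(l, r)  # inclusive on both ends
  arr[k], arr[r] = arr[r], arr[k]
  pivot = arr[r]

  i = l  # next slot for an element smaller than the pivot
  for j in range(l, r):
    if arr[j] < pivot:
      arr[i], arr[j] = arr[j], arr[i]
      i += 1

  # Put the pivot between the smaller and the larger elements
  arr[i], arr[r] = arr[r], arr[i]
  return i

arr = [8, 1, 12, 9, 4, 22, 5]
quick_sort_random(arr, 0, len(arr) - 1)
print(arr)  # [1, 4, 5, 8, 9, 12, 22]
```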


##### Counting Sort
**O(n + k) time** | **O(n + k) space** where k is the range of values
- To understand Radix sort, we must first cover counting sort:
1. Create an array with a length equal to the total range (max_val + 1). This becomes the counts array.
2. Count the number of occurrences of each number in the array and store the result in the counts array.
3. Take the cumulative sum at each position in the counts array.
4. Shift the counts array to the right (the last value will drop off).
   - This now becomes the array of starting indexes for each value (referred to as position array below).
5. Initialize a result array with the same size as the input array.
6. Iterate through the input array and place each number into the result array at the index given by the position array, then increment that position by 1 so the next equal number lands in the following slot.

- Counting sort works best when the range of values (k) is small; otherwise both space and time grow with k (e.g. a single value near the largest possible integer requires a counts array of that size)
- Counting sort is **stable** and **non-comparative**
- Radix sort and counting sort can only be used for **integers**, or for keys such as **strings** that can be mapped to integers (basically like hashing)

In [3]:
from typing import List

def counting_sort(arr: List[int]) -> List[int]:
  # Initialize the counts array with a length equal to the total range (max val + 1)
  counts = [0] * (max(arr) + 1)
  
  # Count the number of occurrences of each number in the input array
  for num in arr:
    counts[num] += 1
  
  # Calculate the cumulative sum
  for i in range(1, len(counts)):
    counts[i] += counts[i - 1]

  # Right shift to get the position array
  pos = [0] + counts[:-1]

  # Initialize sorted result array
  res = [0] * len(arr)

  # Store numbers in the result corresponding to their start index in the position array
  for num in arr:
    res[pos[num]] = num
    pos[num] += 1

  # Copy res back into the original array so the caller sees the sorted result
  for i, num in enumerate(res):
    arr[i] = num

  return arr  # matches the declared return type

arr = [8, 1, 12, 9, 4, 22, 5]
counting_sort(arr)
print(arr)

[1, 4, 5, 8, 9, 12, 22]
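
To make the intermediate arrays from steps 2 to 6 concrete, the trace below prints them for a tiny input. It is only an illustration: the variable names `cumulative` and `next_slot` are not part of the implementation above, and `itertools.accumulate` stands in for the explicit cumulative-sum loop.

```python
from itertools import accumulate

arr = [2, 0, 2, 1]

# Step 2: occurrences of each value 0..max
counts = [0] * (max(arr) + 1)
for num in arr:
  counts[num] += 1

# Step 3: cumulative sum, then step 4: shift right to get starting indexes
cumulative = list(accumulate(counts))
pos = [0] + cumulative[:-1]

# Steps 5 and 6: place each number at its next free slot
res = [0] * len(arr)
next_slot = pos[:]  # copy so pos can still be printed unchanged
for num in arr:
  res[next_slot[num]] = num
  next_slot[num] += 1

print(counts)      # [1, 1, 2]
print(cumulative)  # [1, 2, 4]
print(pos)         # [0, 1, 2]
print(res)         # [0, 1, 2, 2]
```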


In [2]:
# Algorithm for working with negative values (subtract the min value to get the range and positions):
from typing import List

def counting_sort(arr: List[int]) -> List[int]:
  min_ = min(arr)
  max_ = max(arr)
  counts = [0] * (max_ - min_ + 1)
  
  for num in arr:
    counts[num - min_] += 1
  
  for i in range(1, len(counts)):
    counts[i] += counts[i - 1]

  pos = [0] + counts[:-1]

  res = [0] * len(arr)

  for num in arr:
    res[pos[num - min_]] = num
    pos[num - min_] += 1

  # Copy res back into the original array so the caller sees the sorted result
  for i, num in enumerate(res):
    arr[i] = num

  return arr  # matches the declared return type

arr = [8, 1, -5, 12, 9, -8, 4, 22, 5]
counting_sort(arr)
print(arr)

[-8, -5, 1, 4, 5, 8, 9, 12, 22]


##### Radix Sort
**O(d(n + b)) time** where d is the **number of passes** of the sorting algorithm and b is the base (e.g. base 10) | **O(n + b) space** 
- Radix sort applies counting sort one digit at a time, so each pass works over a small, fixed range (the base b) regardless of how large the numbers themselves are

1. Sort the array with counting sort, keyed on the least significant digit (ones place) of each number
2. Repeat for every remaining digit position, moving towards the most significant digit (treating the digit as 0 if a number has no digit in that position)

- Each pass must be stable so that the ordering established by earlier passes is preserved
- Choosing a base is a trade-off between time and space: a larger b means a smaller d (fewer passes) but larger counts arrays during each counting sort step.

In [17]:
import math
from typing import List

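def get_digit(number: int, n: int):
  """Returns the nth base-10 digit of a number (n = 0 is the ones place), carrying the number's sign."""
  digit = abs(number) // 10**n % 10
  return digit if number >= 0 else -digit  # negative digits place negative numbers before positive ones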

def count_sort(arr: List[int], digit: int):
  digit_arr = [get_digit(num, digit) for num in arr]
  max_ = max(digit_arr)
  min_ = min(digit_arr)
  counts = [0] * (max_ - min_ + 1)

  for num in digit_arr:
    counts[num - min_] += 1
  
  for i in range(1, len(counts)):
    counts[i] += counts[i - 1]
  
  pos = [0] + counts[:-1]

  res = [0] * len(arr)
  for i, num in enumerate(arr):
    # Get position corresponding to the digit array value
    res[pos[digit_arr[i] - min_]] = num
    pos[digit_arr[i] - min_] += 1
  
  # Copy res to original array
  for i, num in enumerate(res):
    arr[i] = num

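def radix_sort(arr: List[int]):
  # Largest absolute value (max(arr, key=abs) returns the signed element, and math.log10 fails on negatives)
  abs_max = max(abs(num) for num in arr)
  # Number of digits in the largest absolute value; math.log10(0) is undefined, so use one pass when abs_max is 0
  d = int(math.log10(abs_max)) + 1 if abs_max > 0 else 1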
  
  for i in range(d):
    count_sort(arr, i)
  
  return arr
  
arr = [8, 1, 12, -25, -1, 1, 9, 4420, 243, 56]
radix_sort(arr)
print(arr)


[-25, -1, 1, 1, 8, 9, 12, 56, 243, 4420]
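
The base trade-off can be made concrete by counting how many passes (d) a given maximum value needs in different bases. This is a small illustrative helper, not part of the implementation above; the name `num_passes` is an assumption.

```python
def num_passes(max_abs_value: int, base: int) -> int:
  """Number of digits needed to write max_abs_value in the given base."""
  passes = 1
  while max_abs_value >= base:
    max_abs_value //= base
    passes += 1
  return passes

# 4420 is the largest absolute value in the example array above
for base in (2, 10, 256):
  print(base, num_passes(4420, base))
# 2 13
# 10 4
# 256 2
```

Base 256 needs only two passes here, but each pass would use a counts array of up to 256 entries instead of 10.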


##### Binary Search
Look for an element x in a **sorted** array by first comparing x to the value at the midpoint of the array. If x is less than that value, search the left half; if x is greater, search the right half. Repeat until we find x or the subarray has a size of 0. Each step halves the search space, so the search takes O(log n) time.

In [1]:
from typing import List

# Iterative
def binary_search(arr: List[int], x: int):
  low = 0
  high = len(arr) - 1

  while low <= high:  # <= so that a range of one element is still checked (e.g. the last element)
    mid = low + (high - low) // 2
    if arr[mid] < x:
      low = mid + 1
    elif arr[mid] > x:
      high = mid - 1
    else:
      return mid
  
  return -1

# Recursive
def rec_binary_search(arr: List[int], x: int, low: int, high: int):
  if low > high:
    return -1

  mid = low + (high - low) // 2

  if arr[mid] < x:
    return rec_binary_search(arr, x, mid + 1, high)
  elif arr[mid] > x:
    return rec_binary_search(arr, x, low, mid - 1)
  else:
    return mid
  

arr = [12, 19, 23, 32, 45, 52]
print(binary_search(arr, 12))
print(rec_binary_search(arr, 12, 0, len(arr) - 1))
print(binary_search(arr, 45))
print(rec_binary_search(arr, 45, 0, len(arr) - 1))

0
0
4
4
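
For comparison, Python's standard library already provides binary search through the `bisect` module. The wrapper below is a minimal sketch; the name `bisect_search` is illustrative. `bisect_left` returns the leftmost insertion point, so the result still has to be checked against the target.

```python
from bisect import bisect_left
from typing import List

def bisect_search(arr: List[int], x: int) -> int:
  i = bisect_left(arr, x)  # leftmost index at which x could be inserted
  return i if i < len(arr) and arr[i] == x else -1

arr = [12, 19, 23, 32, 45, 52]
print(bisect_search(arr, 12))  # 0
print(bisect_search(arr, 52))  # 5
print(bisect_search(arr, 20))  # -1
```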


Below is a generalized binary search template for algorithm problems. Only three things need to be changed for each problem:
- The left and right boundaries, which must include every possible answer
- The return value, `left` or `left - 1`. After the loop exits, `left` is the minimum index for which the condition is true
- The condition function, which must be monotonic: false for every index below some point and true from that point on

The search space will eventually shrink so that the minimum (left) is at the target. See this [question](https://leetcode.com/problems/first-bad-version/discuss/769685/Python-Clear-explanation-Powerful-Ultimate-Binary-Search-Template.-Solved-many-problems.) as an introductory example.

In [7]:
# Generalized template:
def binary_search(array) -> int:
    def condition(value) -> bool:
        pass

    left, right = 0, len(array)
    while left < right:
        mid = left + (right - left) // 2
        if condition(mid):
            right = mid
        else:
            left = mid + 1
    return left

# Example
def ex_binary_search(array, target) -> int:
    left, right = 0, len(array)
    while left < right:
        mid = left + (right - left) // 2
        if array[mid] >= target:
            right = mid
        else:
            left = mid + 1
    # Watch out for index out of range error
    return left if left < len(array) and array[left] == target else -1

print(ex_binary_search([1, 4, 6, 18, 20, 23, 35], 45))
print(ex_binary_search([1, 4, 6, 18, 20, 23, 35], 23))
print(ex_binary_search([1, 4, 6, 18, 20, 23, 35], 1))

-1
5
0
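
As a second illustration of the template, here is a hedged sketch of an integer square root, chosen because it needs the `left - 1` return value. The function name `integer_sqrt` and the choice of problem are assumptions, not taken from the linked discussion. The condition `k * k > x` first becomes true one past the answer.

```python
def integer_sqrt(x: int) -> int:
    """Largest k with k * k <= x, for x >= 0."""
    def condition(k: int) -> bool:
        return k * k > x

    # Boundaries include every candidate; x + 1 always satisfies the condition
    left, right = 0, x + 1
    while left < right:
        mid = left + (right - left) // 2
        if condition(mid):
            right = mid
        else:
            left = mid + 1
    # left is the smallest k with k * k > x, so the answer is one below it
    return left - 1

print(integer_sqrt(8))   # 2
print(integer_sqrt(16))  # 4
print(integer_sqrt(0))   # 0
```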


#### Interview Questions

**10.1 Sorted Merge**
- Similar to merge sort but we merge backwards starting from the back of A
- O(A + B) time and O(1) space

In [49]:
from typing import List

def sorted_merge(a: List[int], b: List[int]):
  i = sum(x is not None for x in a) - 1  # last real element of a (filter(None, a) would also drop zeros)
  j = len(b) - 1
  k = -1

  # While we're not at the end of b
  while j >= 0:
    # Ensure we haven't reached the end of A
    if i >= 0 and a[i] >= b[j]:
      a[k] = a[i]
      i -= 1
    # If we've reached the end of A, we will populate only from B
    else:
      a[k] = b[j]
      j -= 1
    k -= 1

a = [1, 4, 8, 11, None, None, None]
b = [0, 2, 12]
sorted_merge(a, b)
print(a)


[0, 1, 2, 4, 8, 11, 12]


**10.2 Group Anagrams**
- Sort each word and group anagrams by using a dictionary
- Note: question does not specify to sort anagrams, only to group them
- O(n · s log s) time (where s is the length of the longest string) and O(ns) space for the dictionary keys

**Alternative**:
Use a tuple of character counts as the dictionary keys - **O(ns)** time and O(n) space.

In [63]:
from typing import List
from collections import defaultdict

def group_anagrams(words: List[str]):
  d = defaultdict(list)  # Dictionary for storing groups of anagrams
  for w in words:
    key = "".join(sorted(w))
    d[key].append(w)

  i = 0
  for key in d:
    for anagram in d[key]:
      words[i] = anagram
      i += 1

a = ['abc', 'xyz', 'bac', 'abcd', 'yxz']
group_anagrams(a)
print(a)

['abc', 'bac', 'xyz', 'yxz', 'abcd']


In [57]:
from typing import List
from collections import defaultdict

# Alternative solution with character counts
def group_anagrams(words: List[str]):
  d = defaultdict(list)  # Dictionary for storing groups of anagrams
  for w in words:
    counts = [0] * 26
    for c in w:
      counts[ord(c) - ord('a')] += 1
    d[tuple(counts)].append(w)  # Tuple is hashable

  return list(d.values())

a = ['abc', 'xyz', 'bac', 'abcd', 'yxz']
a = group_anagrams(a)
print(a)

[['abc', 'bac'], ['xyz', 'yxz'], ['abcd']]


**10.3 Search in Rotated Array**
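- Find the normally ordered (sorted) half of the array and check whether the target lies in its range; if not, search the other half.
- Note: be careful with the = signs in the comparisons. `nums[left] <= nums[mid]` needs the equality so that a one-element left half counts as ordered, and each range check is inclusive only at the end that is not `mid`
- O(log n) time and O(1) space (assuming distinct values)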

In [45]:
from typing import List

def rotated_search(nums: List[int], target: int):
  left, right = 0, len(nums) - 1

  while left <= right:
    mid = left + (right - left) // 2
    if nums[mid] == target:
      return mid
    
    # Left is normally ordered
    if nums[left] <= nums[mid]:
        if nums[left] <= target < nums[mid]:
            right = mid - 1
        else:
            left = mid + 1

    # Otherwise the right side is normally ordered (else keeps the two cases mutually exclusive)
    else:
        if nums[mid] < target <= nums[right]:
            left = mid + 1
        else:
            right = mid - 1

  return -1  # target not found

print(rotated_search([15, 16, 19, 20, 25, 1, 3, 4, 5, 7, 10, 14], 5))
print(rotated_search([70, 75, 17, 18, 30, 31, 35, 60], 30))  


8
4


**10.4 Sorted Search, No Size**
- Find the approximate size of listy by probing exponentially growing indexes (1, 2, 4, ..., $2^n$) until we run past the end or past the target
- Perform a regular binary search for the target between $2^{n-1}$ and $2^n$
- O(log n) time and O(1) space

In [53]:
# Imitation of a class with no size method 
class Listy:
  def __init__(self, size: int):
    self.size = size
  
  def element_at(self, index: int):
    if index >= self.size:  # valid indexes are 0..size-1
      return -1
    return index  # Return the same number as the index to keep it simple (maintains sorted order)

def search_listy(listy: Listy, target: int):
  # Find the approximate size of listy, or the point where we pass the target, by doubling the index (log n steps)
  n = 0
  while listy.element_at(2**n) != -1 and listy.element_at(2**n) < target:
    n += 1

  # The target, if present, lies between index 2^(n-1) and 2^n; start from 0 when the loop never advanced
  l, r = (2**(n - 1) if n > 0 else 0), 2**n
  while l <= r:
    mid = l + (r - l) // 2
    value = listy.element_at(mid)
    # -1 means mid is past the end of listy, so treat it as too large and search left
    if value == -1 or target < value:
      r = mid - 1
    elif target > value:
      l = mid + 1
    else:
      return mid
  
  return -1

print(search_listy(Listy(5000), 496))



496
