# Divide and Conquer: Quick Select

### Selection

* Find the $k^{th}$ smallest value in a sequence of length $n$
* Sort in ascending order and look at position $k$ - $O(n \log n)$
* Can we do better than this?
  - $k = 1$ - minimum, $O(n)$
  - $k = n$ - maximum, $O(n)$
* For any fixed $k$, $k$ passes, $O(kn)$
* Median - $k = n / 2$
  - If we can find the median in $O(n)$, quicksort becomes $O(n \log n)$ in the worst case
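
The two baselines above can be sketched as follows. This is a minimal illustration, not part of the original notes: the function names are hypothetical, and `k` counts from the smallest value, which is the convention the partition-based code in these notes follows.

```python
def select_by_sorting(values, k):
  """Return the k-th smallest value (k = 1 is the minimum) by sorting: O(n log n)."""
  if k < 1 or k > len(values):
    return None
  return sorted(values)[k - 1]


def select_by_passes(values, k):
  """Return the k-th smallest value using k linear passes: O(kn)."""
  if k < 1 or k > len(values):
    return None
  remaining = list(values)            # work on a copy
  for _ in range(k - 1):
    remaining.remove(min(remaining))  # one pass: drop the current minimum
  return min(remaining)
```

For example, both `select_by_sorting([7, 2, 9, 4, 1], 2)` and `select_by_passes([7, 2, 9, 4, 1], 2)` return `2`.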

### Divide and Conquer

* Recall partitioning for quicksort
  - Pivot partitions sequence as `lower` and `upper`
* Let `m = len(lower)`. 3 cases:
  - `k <= m` - answer lies in `lower`
  - `k == m + 1` - answer is `pivot`
  - `k > m + 1` - answer lies in `upper`
* Recursive strategy
  - Case 1: `select(lower, k)`
  - Case 2: `return(pivot)`
  - Case 3: `select(upper, k - (m + 1))`
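
The three cases translate almost directly into code. The sketch below is a simplified version that builds new lists instead of partitioning in place; it assumes the first element is the pivot and that `lower` holds the values no larger than the pivot.

```python
def select(seq, k):
  """Return the k-th smallest value of seq (k = 1 is the minimum), or None."""
  if k < 1 or k > len(seq):
    return None

  pivot = seq[0]
  lower = [x for x in seq[1:] if x <= pivot]
  upper = [x for x in seq[1:] if x > pivot]
  m = len(lower)

  if k <= m:                          # Case 1: answer lies in lower
    return select(lower, k)
  elif k == m + 1:                    # Case 2: answer is the pivot
    return pivot
  else:                               # Case 3: answer lies in upper
    return select(upper, k - (m + 1))
```

For example, `select([7, 2, 9, 4, 1], 3)` returns `4`.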

In [None]:
# Find the k-th smallest element (k = 1 is the minimum) in L[l:r]
def quick_select(L, l, r, k):
  if (k < 1) or (k > r - l):   # L[l:r] has r - l elements
    return None
  
  pivot, lower, upper = L[l], l + 1, l + 1

  for i in range(l + 1, r):
    if L[i] > pivot:    # Extend the upper segment
      upper += 1
    else:               # Exchange L[i] with the start of upper segment
      L[i], L[lower] = L[lower], L[i]
      lower += 1
      upper += 1
  
  # Move the pivot between the lower and upper segments
  L[l], L[lower - 1] = L[lower - 1], L[l]
  lower = lower - 1   # lower now indexes the pivot

  # Recursive calls
  lower_len = lower - l

  if k <= lower_len:
    return quick_select(L, l, lower, k)
  elif k == lower_len + 1:
    return L[lower]
  else:
    return quick_select(L, lower + 1, r, k - (lower_len + 1))

### Analysis

* Recurrence is similar to quick sort
* $T(1) = 1$
* $T(n) = \max(T(m), T(n - (m + 1))) + n$, where $m = len(lower)$
* Worst case: $m$ is always $0$ or $n - 1$
  - $T(n) = T(n - 1) + n$
  - $T(n)$ is $O(n^2)$
* Recall: if the pivot is within a fixed fraction, quick sort is $O(n \ log \ n)$
  - E.g. pivot in middle third of values
  - $T(n) = T(n / 3) + T(2n / 3) + n$
* Can we find a good pivot quickly?
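
The worst case is easy to observe by counting comparisons. The helper below is a hypothetical instrumented variant, not the in-place code from these notes: it always uses the first element as pivot, assumes `1 <= k <= len(values)`, and charges one comparison per non-pivot element in each partitioning step.

```python
def count_comparisons(values, k):
  """Quick select with first-element pivot; returns (answer, comparisons)."""
  seq = list(values)
  if not 1 <= k <= len(seq):
    raise ValueError("k must be between 1 and len(values)")

  comparisons = 0
  while True:
    pivot, rest = seq[0], seq[1:]
    comparisons += len(rest)          # each remaining element is compared with the pivot
    lower = [x for x in rest if x <= pivot]
    upper = [x for x in rest if x > pivot]
    m = len(lower)

    if k <= m:
      seq = lower
    elif k == m + 1:
      return pivot, comparisons
    else:
      seq, k = upper, k - (m + 1)
```

On an already sorted input, `lower` is empty at every step, so `count_comparisons(range(1, 101), 100)` returns `(100, 4950)`: that is $99 + 98 + \cdots + 0 = n(n - 1)/2$ comparisons for $n = 100$.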

### Median of medians

* Divide $L$ into blocks of $5$
* Find the median of each block (brute force)
* Let $M$ be the list of block medians
* Recursively apply the process to $M$
* What can we guarantee about `MoM(L)`?

In [None]:
def MoM(L):   # Median of medians
  if len(L) <= 5:
    return sorted(L)[len(L)//2]   # sorted() leaves the caller's list unchanged
  
  # Construct list of block medians
  M = []

  for i in range(0, len(L), 5):
    X = L[i : i + 5]
    X.sort()
    M.append(X[len(X)//2])
  
  return MoM(M)

* We can visualize the blocks as follows
* Each block of $5$ is arranged in ascending order, top to bottom
* Block medians are the middle row

![image](https://firebasestorage.googleapis.com/v0/b/fb-sandbox-25.appspot.com/o/W8L5_1.png?alt=media&token=f2f237c3-712d-4d0e-8aa6-01c9bba40496)

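* The median of block medians is at least as large as about $3 \cdot len(L)/10$ elements and no larger than about $3 \cdot len(L)/10$ others, so its position in sorted order lies between $3 \cdot len(L)/10$ and $7 \cdot len(L)/10$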

![image](https://firebasestorage.googleapis.com/v0/b/fb-sandbox-25.appspot.com/o/W8L5_2.png?alt=media&token=49e9df06-ecc9-4389-9286-f8feff204f49)

### Analysis

* Use median of block medians to locate the pivot for `quick_select`
* `MoM` is $O(n)$
  - $T(1) = 1$
  - $T(n) = T(n/5) + n$
* Recurrence for `fast_select` is now
  - $T(1) = 1$
  - $T(n) = \max(T(3n/10), T(7n/10)) + n$, since the recursive call is on at most $7n/10$ elements
* $T(n)$ is $O(n)$
* Can also use `MoM` to make quick sort $O(n \ log \ n)$
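
One way to see why both recurrences are linear is to unroll them into geometric series. The derivation below is a sketch that ignores rounding and assumes $T$ is non-decreasing, so the maximum is attained by the $7n/10$ branch.

$$
\begin{aligned}
\text{pivot search:}\quad T(n) &= n + T(n/5) = n + \frac{n}{5} + \frac{n}{25} + \cdots \le n \sum_{i \ge 0} \left(\tfrac{1}{5}\right)^i = \frac{5n}{4} \\
\text{selection:}\quad T(n) &\le n + T(7n/10) = n + \frac{7n}{10} + \left(\tfrac{7}{10}\right)^2 n + \cdots \le n \sum_{i \ge 0} \left(\tfrac{7}{10}\right)^i = \frac{10n}{3}
\end{aligned}
$$

Both sums are bounded by a constant times $n$, so both are $O(n)$. Replacing the $+ n$ term by $cn$, to account for the cost of `MoM` plus partitioning, changes only the constant factor.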

In [None]:
# Find the k-th smallest element (k = 1 is the minimum) in L[l:r]
def fast_select(L, l, r, k):
  if (k < 1) or (k > r - l):   # L[l:r] has r - l elements
    return None
  
  # Find MoM pivot and move to L[l]
  pivot = MoM(L[l:r])
  pivot_pos = min([i for i in range(l, r) if L[i] == pivot])
  L[l], L[pivot_pos] = L[pivot_pos], L[l]

  # Partition as before
  pivot, lower, upper = L[l], l + 1, l + 1
  for i in range(l + 1, r):
    ...
  
  # Recursive calls
  lower_len = lower - l

  if k <= lower_len:
    return fast_select(L, l, lower, k)
  elif k == lower_len + 1:
    return L[lower]
  else:
    return fast_select(L, lower + 1, r, k - (lower_len + 1))

### Summary

* Median of block medians helps find a good pivot in $O(n)$
* Selection becomes $O(n)$, quicksort becomes $O(n \log n)$
* Notice that `fast_select` with `k = len(L)/2` finds median in time $O(n)$

**Historical note**
* C.A.R. Hoare described `quick_select` in the same paper that introduced `quick_sort`, 1962
* The median of medians algorithm is due to Manuel Blum, Robert Floyd, Vaughan Pratt, Ron Rivest and Robert Tarjan, 1973

**Acknowledgement**

Illustrations from [Algorithms by Jeff Erickson](https://jeffe.cs.illinois.edu/teaching/algorithms/)