# Selection Algorithms

- selection algorithms find the k-th smallest (or largest) item in a data structure.
- such an item is called the k-th order statistic.
- special cases are the minimum item, the maximum item and the median.
- the aim is to achieve **O(N)** linear running time complexity.
- **ALGORITHMS**: quickselect or the median of medians method.

-------------------------------------------------------------------------------------------------
Complexity.

- the first intuition may be to use sorting.
- if the data structure (array) is sorted, then the k-th order statistic is the item at index k-1
- it is an inefficient approach if we want to find just a single item (max, min, or the median)
- comparison-based sorting has **O(N*Log(N))** running time in the average and worst case.
- it is an efficient solution if we want to find several k-th order statistics, because we sort only once.
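
The sorting approach described above can be sketched as follows. This is a minimal illustration, assuming a 1-based `k`; the function name `kth_smallest_by_sorting` and the sample list are made up for this example.

```python
from typing import List


def kth_smallest_by_sorting(nums: List[int], k: int) -> int:
    """Return the k-th smallest item (k is 1-based) by sorting a copy."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    return sorted(nums)[k - 1]  # O(N log N) for a single lookup


data = [1, 2, -5, 10, 100, -7, 3, 4]

# sorting once pays off when several order statistics are needed
ordered = sorted(data)
minimum = ordered[0]             # 1st order statistic
maximum = ordered[-1]            # N-th order statistic
third_smallest = ordered[3 - 1]  # k-th order statistic sits at index k - 1
```

After the single sort, each further lookup is a constant-time index access.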

- online algorithms are methods that process the input serially, item by item, without having the entire input available from the start.
- so here we do not know the whole input.
- for example we keep downloading data and we want to find the k-th order statistics (minimum, maximum, or median) on the fly.
- PROBLEM: we do not know the values in advance.
- a well-known problem of this kind is the so-called secretary problem.
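
One common way to track the median of a stream on the fly is to keep two heaps, one for the smaller half of the items seen so far and one for the larger half. The sketch below is only an illustration of the online setting, not part of quickselect; the class name `RunningMedian` and its methods are hypothetical.

```python
import heapq
from typing import List


class RunningMedian:
    """Tracks the median of a stream, one item at a time."""

    def __init__(self) -> None:
        self.lower: List[int] = []  # max-heap (values negated): smaller half
        self.upper: List[int] = []  # min-heap: larger half

    def add(self, value: int) -> None:
        # push onto the smaller half, then move its largest item to the larger half
        heapq.heappush(self.lower, -value)
        heapq.heappush(self.upper, -heapq.heappop(self.lower))
        # rebalance so that lower holds as many items as upper, or one more
        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def median(self) -> float:
        if not self.lower:
            raise ValueError("no items seen yet")
        if len(self.lower) > len(self.upper):
            return float(-self.lower[0])
        return (-self.lower[0] + self.upper[0]) / 2


stream = RunningMedian()
medians = []
for item in [5, 1, 9, 3]:
    stream.add(item)  # each insertion costs O(log N)
    medians.append(stream.median())
```

The minimum and the maximum are simpler: a single variable updated on every new item is enough.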

### Quickselect Algorithm (Hoare's Algorithm)

- it is a selection algorithm designed by Tony Hoare
- quickselect is able to find the k-th smallest (largest) item in an unordered array.
- it has **O(N)** linear running time in the best case.
- in the worst case it has **O(N^2)** quadratic running time.
- Hoare's algorithm is an in-place approach: it does not need an additional array (huge advantage).

- the concept is very similar to that of quicksort
- instead of recursing into both sides of the array, quickselect continues on just one side
- this is how we end up with O(N) on average instead of O(NlogN)
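
A rough way to see the difference, assuming as an idealization that every partition splits the current range exactly in half: quickselect keeps only one half, so its total work is bounded by a geometric series,

$$
N + \frac{N}{2} + \frac{N}{4} + \dots \le N \sum_{i=0}^{\infty} \frac{1}{2^{i}} = 2N = O(N)
$$

while quicksort has to process both halves at every level:

$$
T(N) = 2\,T\!\left(\frac{N}{2}\right) + O(N) = O(N \log N)
$$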

The Algorithm

1.) choose a so-called pivot item at random

2.) partition the array (based on the value of the pivot)

3.) instead of recursion into both sides, we take just one side 

#### 1. THE PARTITION PHASE 
The partition method is just for partitioning the array according to the pivot

    -> choose a pivot value at random: we generate a random number
        in the range [first_index, last_index]

    -> re-arrange the array in a way that all elements less than pivot are on left side
        of pivot and others on right. 

            ~ partition returns with the final position (index) 
                 of the pivot element
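
A minimal sketch of the partition step, shown here as a standalone function. It assumes the pivot index is passed in as a parameter (instead of being drawn at random) so that the result is reproducible; the names `partition`, `store` and the sample list are chosen for this illustration.

```python
from typing import List


def partition(nums: List[int], first: int, last: int, pivot_index: int) -> int:
    """Partition nums[first..last] around nums[pivot_index], in place."""
    # park the pivot at the end of the range
    nums[pivot_index], nums[last] = nums[last], nums[pivot_index]
    pivot = nums[last]
    store = first  # next slot for an item smaller than the pivot
    for i in range(first, last):
        if nums[i] < pivot:
            nums[i], nums[store] = nums[store], nums[i]
            store += 1
    # move the pivot to its final position and report that index
    nums[store], nums[last] = nums[last], nums[store]
    return store


data = [1, 2, -5, 10, 100, -7, 3, 4]
p = partition(data, 0, len(data) - 1, pivot_index=6)  # pivot value is 3
```

Four items are smaller than 3, so the pivot ends up at index 4, which is also its index in the fully sorted array.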


#### 2. THE SELECT PHASE (for example k=2 means we are after the second smallest (largest) item)

    After partitioning there may be 3 cases (here pivot denotes the position
    of the pivot item counted from 1, so it is index k-1 in zero-based code):

        1.) k == pivot

            It means we have found the k-th smallest (largest) item we are after because
            this is how partitioning works: there are exactly k-1 items that are
            smaller than the pivot (in this case pivot == k)

        2.) k < pivot
        
            The k-th smallest item is on the left side of the pivot, that's why we can
            discard the other subarray (unlike quicksort)

        3.) k > pivot       

            The k-th smallest item is on the right side of the pivot
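
The three cases can be sketched as a loop that keeps narrowing the range. This is a minimal illustration, assuming a 1-based `k` and working on a copy of the input; the function name `quickselect` and the local names are chosen for this example.

```python
import random
from typing import List


def quickselect(nums: List[int], k: int) -> int:
    """Return the k-th smallest item (k is 1-based); the input is not modified."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    items = list(nums)
    target = k - 1  # zero-based index of the answer in sorted order
    first, last = 0, len(items) - 1
    while True:
        # partition items[first..last] around a random pivot
        pivot_index = random.randint(first, last)
        items[pivot_index], items[last] = items[last], items[pivot_index]
        store = first
        for i in range(first, last):
            if items[i] < items[last]:
                items[i], items[store] = items[store], items[i]
                store += 1
        items[store], items[last] = items[last], items[store]

        if store == target:    # case 1: the pivot is the item we are after
            return items[store]
        if target < store:     # case 2: continue on the left side only
            last = store - 1
        else:                  # case 3: continue on the right side only
            first = store + 1


data = [1, 2, -5, 10, 100, -7, 3, 4]
second_smallest = quickselect(data, 2)
```

The target index always stays inside `[first, last]`, so the loop ends at the latest when the range shrinks to a single item.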
            

#### 3. PERFORMANCE

- best-case performance: **O(N)**
- worst-case performance: **O(N^2)**
- average-case performance: **O(N)**

- the worst-case running time is **O(N^2)**, so quadratic. For example: we want to find the maximum in a sorted array and we always choose the first element (the smallest one) to be the pivot, so every partition discards only a single item.
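
The worst case can be made visible by counting comparisons. The sketch below is an illustration only: it deliberately uses the first item of the range as the pivot (a choice made here to trigger the bad behaviour), and the function name `select_first_pivot` is hypothetical.

```python
from typing import List, Tuple


def select_first_pivot(nums: List[int], k: int) -> Tuple[int, int]:
    """Quickselect that always picks the first item of the range as pivot.

    Returns the k-th smallest item (k is 1-based) and the number of comparisons.
    """
    items = list(nums)
    target = k - 1
    first, last = 0, len(items) - 1
    comparisons = 0
    while True:
        # the first item is the pivot: park it at the end of the range
        items[first], items[last] = items[last], items[first]
        store = first
        for i in range(first, last):
            comparisons += 1
            if items[i] < items[last]:
                items[i], items[store] = items[store], items[i]
                store += 1
        items[store], items[last] = items[last], items[store]
        if store == target:
            return items[store], comparisons
        if target < store:
            last = store - 1
        else:
            first = store + 1


n = 100
value, comparisons = select_first_pivot(list(range(n)), n)  # maximum of a sorted array
```

On this input every partition removes just the pivot, so the comparisons add up to (N-1) + (N-2) + ... + 1 = N(N-1)/2, which is 4950 for N = 100.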


```python
from typing import List
import random


class QuickSelect:
    def __init__(self, nums: List[int]) -> None:
        self.nums = nums
        self.first_index = 0
        self.last_index = len(nums) - 1
        self.descending = False  # False: k-th smallest, True: k-th largest

    def __swap(self, i: int, j: int) -> None:
        self.nums[i], self.nums[j] = self.nums[j], self.nums[i]

    def partition(self, first_index: int, last_index: int) -> int:
        # choose a random pivot and park it at the end of the range
        pivot_index = random.randint(first_index, last_index)
        self.__swap(pivot_index, last_index)

        # first_index marks the next slot for an item that belongs before the pivot
        for i in range(first_index, last_index):
            if self.descending:
                if self.nums[i] > self.nums[last_index]:
                    self.__swap(i, first_index)
                    first_index = first_index + 1
            else:
                if self.nums[i] < self.nums[last_index]:
                    self.__swap(i, first_index)
                    first_index = first_index + 1

        # move the pivot to its final position and return that index
        self.__swap(first_index, last_index)
        return first_index

    def select(self, first_index: int, last_index: int, k: int) -> int:
        # k is the zero-based target index
        pivot_index = self.partition(first_index, last_index)

        if pivot_index < k:
            # the item we are after is on the right side of the pivot
            return self.select(pivot_index + 1, last_index, k)
        elif pivot_index > k:
            # the item we are after is on the left side of the pivot
            return self.select(first_index, pivot_index - 1, k)

        return self.nums[pivot_index]

    def run(self, k: int) -> int:
        # k is 1-based here, select() works with zero-based indexes
        return self.select(self.first_index, self.last_index, k - 1)

    def sort(self, descending=False) -> List[int]:
        # demonstration only: runs a full selection for every position
        sorted_list = []
        self.descending = descending

        for k in range(self.last_index + 1):
            value = self.run(k + 1)
            sorted_list.append(value)
        return sorted_list


x = [1, 2, -5, 10, 100, -7, 3, 4]
select = QuickSelect(x)
select.sort(descending=True)
```

    [100, 10, 4, 3, 2, 1, -5, -7]

### Advanced Selection Algorithms 
- the quickselect algorithm is extremely sensitive to the choice of the pivot item
- each partition phase takes O(N) linear running time, where N gets smaller and smaller in every recursive call
- if we are not able to discard many items, the O(N) linear running time may degrade to O(N^2) running time
- the pivot selection approach is crucial (!!!)

- let's assume we are looking for the smallest value
- the worst-case scenario happens when we pick the largest remaining item in every iteration to be the pivot
- the partition phase takes O(N) time and we make N iterations, which gives O(N^2) in total

#### How do we address the pivot problem and secure **O(N)**

- how to make sure we select the right pivot?
- if we always select the median as the pivot then the algorithm will have O(N) linear running time complexity for sure
- there will be approximately the same number of items in the left and right subarrays
- the median of medians algorithm uses the quickselect algorithm but it selects an approximate median (the median of medians) as the pivot
- of course we have to store some additional items in memory: O(logN) memory complexity in the worst case

- sorting a small array of constant size takes constant time (for example with insertion sort), so sorting all the chunks takes O(N) time in total
- split the original array into chunks of 5 items and sort each chunk
- 5 because the chunks have to be small, but not so small that we end up with too many recursive calls
- pick the middle item of each sorted chunk: the middle item in the sorted order is the median of that chunk
- and then calculate the median (middle item) of these medians and use it as the pivot
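
The steps above can be sketched as follows. This is a simplified illustration, not an in-place implementation: it assumes a 1-based `k`, builds new lists for the partitions (so it uses O(N) extra memory), and the names `median_of_medians_select` and `_select` are made up for this example.

```python
from typing import List


def _select(nums: List[int], k: int) -> int:
    """Return the item at zero-based index k of the sorted order of nums."""
    if len(nums) <= 5:
        return sorted(nums)[k]

    # 1. split into chunks of 5 items, sort each chunk and take its middle item
    chunks = [sorted(nums[i:i + 5]) for i in range(0, len(nums), 5)]
    medians = [chunk[len(chunk) // 2] for chunk in chunks]

    # 2. the pivot is the median of these medians, found recursively
    pivot = _select(medians, len(medians) // 2)

    # 3. partition around the pivot and continue on one side only
    smaller = [x for x in nums if x < pivot]
    equal = [x for x in nums if x == pivot]
    larger = [x for x in nums if x > pivot]
    if k < len(smaller):
        return _select(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    return _select(larger, k - len(smaller) - len(equal))


def median_of_medians_select(nums: List[int], k: int) -> int:
    """Return the k-th smallest item (k is 1-based)."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    return _select(nums, k - 1)


data = [1, 2, -5, 10, 100, -7, 3, 4, 3, 8, 15, -2]
median = median_of_medians_select(data, (len(data) + 1) // 2)
```

Because the pivot is itself an item of the array, both `smaller` and `larger` are strictly shorter than the input, so the recursion always terminates.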

