# Quicksort
1. [Overview](#intro)
2. [Partitioning Around a Pivot](#part)
3. [Choosing a Good Pivot](#piv)
4. [Correctness of Quicksort](#proof)

## Overview <a class="anchor" id="intro"></a>

* Key Characteristics
    * Runs in __O(n log n)__ time on average (depends on the pivot)
    * Works in place (i.e., minimal extra memory needed)
* The Sorting Problem (Again)
    * __Input__: array of n numbers, unsorted
        * [3, 8, 2, 5, 1, 4, 7, 6]
    * __Output__: Same numbers, sorted in increasing order
    * __Assume__: All array entries are distinct
* For an exercise, consider extending QuickSort to handle duplicate entries

#### Partitioning Around a Pivot
* Partition array around a _pivot element_
    * items to the left < pivot element < items to the right
* Pick an element of the array (we will just pick the first for this example, more in depth discussion later)
    * [3, 8, 2, 5, 1, 4, 7, 6] --> pivot element = 3
    * This is the item around which we will partition the array initially.
* Rearrange items to fit the pivot rules    
    * [2, 1, 3, 6, 7, 4, 5, 8]
    * Puts the pivot in its "rightful position"
    * The elements on each side of the pivot don't need to be in sorted order yet; they only need to be on the correct side
* Why partition?
    * Can be done in linear time (O(n)) and no extra memory required
    * Reduces problem size, because now we just have to recursively sort each side

#### High-Level Description
* Tony Hoare, c. 1961
* QuickSort(array A, length n)
    * If n <= 1 return (one side of a partition can be empty, so n = 0 must also stop the recursion)
    * p = choosePivot(A, n)
    * Partition A around p
    * Recursively sort items to the left (first part)
    * Recursively sort items to the right (second part)
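
The recursive structure above can be sketched in a few lines of Python. This is only an illustration of the high-level description, not the in-place algorithm developed later: it builds new lists instead of partitioning in place, and the name `quicksort_sketch` and the first-element pivot choice are assumptions made for the example.

```python
def quicksort_sketch(a):
    """Return a sorted copy of a (NOT in place; shows only the recursive structure)."""
    if len(a) <= 1:               # base case: nothing left to sort
        return a
    p = a[0]                      # assumed pivot rule for this sketch: first element
    rest = a[1:]
    left = [x for x in rest if x < p]    # "first part": items smaller than the pivot
    # >= rather than > so no value is dropped if the distinct-entries assumption is violated
    right = [x for x in rest if x >= p]  # "second part": the remaining items
    return quicksort_sketch(left) + [p] + quicksort_sketch(right)


print(quicksort_sketch([3, 8, 2, 5, 1, 4, 7, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The two list comprehensions play the role of "Partition A around p"; the in-place version replaces them with a single scan and a few swaps.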

## Partitioning Around a Pivot <a class="anchor" id="part"></a>

* If we didn't care about the "in place" aspect, it would be very easy to just do it in linear time 
    * Using O(n) extra memory, it's easy to partition around pivot in O(n) time
    * Basically, we could pre-allocate a second array of size n and then scan the input, writing each element to the front of the new array if it is < the pivot or to the back if it is > the pivot, and then put the pivot element in the one remaining hole
* __Assume__: pivot is the first element in the array
    * If not, just include a preprocessing step that swaps it with the first element
* __High Level Idea__: Single linear scan that keeps track of the part we have looked at and the part we _haven't_ looked at yet. Within the group we've seen, we'll split further according to elements </> pivot element
<img src ="resources/quick_pivot.PNG">
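
For contrast with the in-place scan, the "easy" version that spends O(n) extra memory might look like the sketch below. The function name `partition_with_copy` and the choice to return the pivot's final index are assumptions for illustration; the input is assumed to be non-empty with the pivot in the first position.

```python
def partition_with_copy(a):
    """Partition a around a[0] using a second array of the same length.

    Returns (new_array, pivot_index). Assumes a is non-empty.
    """
    pivot = a[0]
    out = [None] * len(a)            # the O(n) extra memory
    left, right = 0, len(a) - 1      # next free slot at the front / back of out
    for k in range(1, len(a)):
        if a[k] < pivot:
            out[left] = a[k]         # smaller elements fill in from the front
            left += 1
        else:
            out[right] = a[k]        # larger elements fill in from the back
            right -= 1
    out[left] = pivot                # left == right here: the one remaining hole
    return out, left


print(partition_with_copy([3, 8, 2, 5, 1, 4, 7, 6]))
# ([2, 1, 3, 6, 7, 4, 5, 8], 2)
```

On the running example this happens to produce the same arrangement shown in the overview, [2, 1, 3, 6, 7, 4, 5, 8].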

#### Example
* j = boundary between what we've looked at and what we haven't --> incremented each iteration
* i = boundary between < p and > p
* ??? = "unpartitioned" ie: not looked at
<img src="resources/part_template.PNG">
* Starting array:
<img src = "resources/part_example1.PNG"> 
* First Iteration: i remains in the same place, because nothing less than the pivot exists in the 'looked at' group. Because nothing needs to be swapped, it's already a "partitioned subarray"
<img src="resources/part_example2.PNG">
* Second iteration: The 2 is less than the pivot, so it swaps places with the 8 (the element at i), and i increments to stay on the boundary separating values less than the pivot from values greater than it
<img src="resources/part_example3.PNG">
* Third iteration: The 5 is already on the correct side of i, so no swaps are necessary
<img src="resources/part_example4.PNG">
* 4th Iteration: The 1 swaps with the 8, the leftmost array entry larger than the pivot
<img src="resources/part_example5.PNG">
* Fast forward to the end: j runs off the end of the array, and i sits just after the rightmost item that is less than the pivot
<img src ="resources/part_example6.PNG">
* Now we need to put the pivot into its place (i.e., at i - 1, by swapping it with the rightmost item less than the pivot), and the array is now partitioned
<img src="resources/part_example8.PNG">

#### Pseudocode
* There will be an input array A and the subroutine will be passed two array indices representing the left and right boundary of the subarray. 
    * i and j are initialized to the same value (l + 1, the first index after the pivot)
    * The for loop advances j from there to the rightmost index r, growing the 'seen' group by one element each iteration
        * If the item is greater than p, no swaps need to happen
        * if the item is less than p, swap the item with the element at A[i], the leftmost 'greater than' element, and increment i
    * Once it's fully partitioned, put the pivot where it belongs
* Don't forget the assumptions and key ideas:
    1. The pivot is the first element in the array
    2. i separates the < p and > p elements
    3. j separates the 'seen' from the 'unseen'
* Partition(A, l, r):
    * p= A[l]
    * i = l + 1
    * For j = l + 1 to r:
        * if A[j] > p, do nothing
        * if A[j] < p:
            * swap A[j] and A[i]
            * i += 1
    * Swap A[l] and A[i - 1]
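
A direct Python translation of the pseudocode, as a minimal sketch. Indices are 0-based and `r` is inclusive, matching the pseudocode; returning the pivot's final index is an addition not present in the pseudocode above, included because the recursive calls need it.

```python
def partition(a, l, r):
    """Partition a[l..r] in place around the pivot a[l]; return the pivot's final index."""
    p = a[l]
    i = l + 1                          # a[l+1..i-1] holds the seen elements < p
    for j in range(l + 1, r + 1):      # a[i..j-1] holds the seen elements > p
        if a[j] < p:
            a[i], a[j] = a[j], a[i]    # swap with the leftmost 'greater than' element
            i += 1
        # if a[j] > p there is nothing to do
    a[l], a[i - 1] = a[i - 1], a[l]    # put the pivot between the two groups
    return i - 1


data = [3, 8, 2, 5, 1, 4, 7, 6]
pivot_index = partition(data, 0, len(data) - 1)
print(data, pivot_index)  # [1, 2, 3, 5, 8, 4, 7, 6] 2
```

Everything left of index 2 is smaller than the pivot 3 and everything right of it is larger, even though neither side is sorted yet.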

#### Running Time of this Subroutine
* O(n) where n = r - l + 1 is the length of the input subarray
* O(1) work per array entry, because it's just a few comparisons and a swap
* In place - we don't allocate some second copy of an array to populate

#### Correctness of the Subroutine
* __Claim__: the for loop maintains the invariants:
    * A[l+1], ..., A[i-1] < pivot
    * A[i], ..., A[j-1] > pivot
* __Consequence__: at the end of the for loop every element has been seen, so everything after the pivot is split into a < pivot block followed by a > pivot block. All that's left is to swap the pivot into its final position (A[i-1])
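
One way to convince yourself of the claim is to run Partition with the invariants written as assertions. The sketch below is for illustration only: the name `partition_checked` is hypothetical, entries are assumed distinct, and the checks themselves cost O(n) per iteration, so this version is no longer linear time.

```python
def partition_checked(a, l, r):
    """Partition a[l..r] around a[l], asserting the loop invariants along the way."""
    p = a[l]
    i = l + 1
    for j in range(l + 1, r + 1):
        # Invariants before examining a[j] (a[l+1..j-1] has been seen):
        assert all(x < p for x in a[l + 1:i])   # A[l+1], ..., A[i-1] < pivot
        assert all(x > p for x in a[i:j])       # A[i], ..., A[j-1] > pivot
        if a[j] < p:
            a[i], a[j] = a[j], a[i]
            i += 1
    # After the loop the invariants cover the whole subarray a[l+1..r].
    assert all(x < p for x in a[l + 1:i])
    assert all(x > p for x in a[i:r + 1])
    a[l], a[i - 1] = a[i - 1], a[l]             # pivot lands at index i - 1
    return i - 1


data = [3, 8, 2, 5, 1, 4, 7, 6]
print(partition_checked(data, 0, len(data) - 1))  # 2
```

If any assertion failed, the swap-and-increment step would have broken one of the two invariants; on distinct inputs it never does.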

## Choosing a Good Pivot <a class="anchor" id="piv"></a>

* The running time of QuickSort depends heavily on the _quality_ of the pivot
    * "Good quality" - splits the array into two roughly equal-sized subproblems
    * "Bad quality" - very unbalanced subproblems
* ___Quiz Question___: Suppose we implement QuickSort so that the ChoosePivot always selects the first element of the array. What is the running time of this algorithm on an input array that is already sorted? __Theta(n<sup>2</sup>)__
    * Recall the general QuickSort Algorithm:
        * Partition Subroutine
        * 2 recursive calls to the < p and > p subarrays
    * Thus, if the array is already sorted and the pivot is the first element, then one of these calls is just vacuous (the < p side is empty), and the second recursive call happens on a problem that has a size of only 1 less than the original array
        * This will happen over and over again, on subarrays that are one element shorter each time, until we hit the base case of n = 1
    * The partition subroutine looks at each element of its subarray at least once and runs at every level of recursion on those shrinking subarrays --> T >= n + (n-1) + (n-2) + (n-3) + ... + 1 = n(n+1)/2 = Theta(n<sup>2</sup>)
* ___Quiz Question___: Suppose we run QuickSort on some input and every recursive call chooses the median element of its subarray as its pivot. What's the running time in this case? __Theta(n log n)__
    * This almost exactly matches the Merge Sort recurrence
        * Recursive calls: If the median is chosen every time, each call hands two subarrays of size at most n/2 to its children, so there are about log<sub>2</sub>(n) levels of recursion
        * Work outside the recursive calls: The Partition subroutine runs in linear time (explained above)
    * Total: T(n) <= 2T(n/2) + Theta(n) --> T(n) = Theta(n log n)
* Given these upper and lower bounds, the question becomes _how do we choose pivots that keep us at or close to n log n time?_
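
The gap between the two quiz answers can be observed by counting the comparisons Partition makes under each pivot rule. This is a rough sketch: the helper names are hypothetical, an explicit stack replaces recursion so the bad case does not hit Python's recursion limit, and `median_element` finds the median by sorting, which is only acceptable because the point here is to count Partition's comparisons, not to be fast.

```python
def count_comparisons(values, choose_pivot):
    """Quicksort a copy of values; return how many comparisons Partition makes."""
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]          # subarrays (l, r) still to be sorted
    while stack:
        l, r = stack.pop()
        if l >= r:
            continue
        k = choose_pivot(a, l, r)
        a[l], a[k] = a[k], a[l]        # preprocessing step: move pivot to the front
        p, i = a[l], l + 1
        for j in range(l + 1, r + 1):
            comparisons += 1
            if a[j] < p:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[l], a[i - 1] = a[i - 1], a[l]
        stack.append((l, i - 2))       # elements < pivot
        stack.append((i, r))           # elements > pivot
    return comparisons


def first_element(a, l, r):
    return l


def median_element(a, l, r):
    # Index of the median value of a[l..r], found by sorting (illustration only).
    return sorted(range(l, r + 1), key=lambda k: a[k])[(r - l) // 2]


n = 512
already_sorted = list(range(n))
print(count_comparisons(already_sorted, first_element))   # 130816, i.e. n(n-1)/2
print(count_comparisons(already_sorted, median_element))  # well under n * log2(n) = 4608
```

Counting comparisons gives n(n-1)/2 rather than n(n+1)/2 because a subarray of length m needs only m - 1 comparisons against the pivot; either way the growth is quadratic.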

#### Random Pivots
* Every time QuickSort is called recursively on some subarray of length k, choose the pivot uniformly at random: each of the k elements is picked with equal probability (1/k)
* Hope: a random pivot will be "pretty good" or "good enough"
* __Intuition__: 
    1. If we always get a 25-75 split, good enough for O(n log n) running time
        * This can be proven via a recursion tree, but it's tougher than our general Master Method cases because it's unbalanced
    2. Half of the elements will give us this "good enough" split
        * Ex: In an array of the integers 1 - 100, the pivots that give us a 25-75 split or better are the numbers 26 through 75 (inclusive), which is a full half of the items in the array
* If we are getting "good enough" half of the time, an average running time of n log n might be reachable
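
Both halves of the intuition are easy to write down. The sketch below uses hypothetical helper names; `random.randint` includes both endpoints, so each of the k = r - l + 1 indices is chosen with probability 1/k. A pivot counts as "good enough" here when the smaller side of the split holds at least 25% of the n items.

```python
import random


def choose_random_pivot_index(l, r, rng=random):
    """Pick an index in l..r (inclusive) uniformly at random."""
    return rng.randint(l, r)


def gives_25_75_split_or_better(rank, n):
    """rank is the pivot's 1-based position in sorted order among n distinct items."""
    smaller, larger = rank - 1, n - rank
    return 4 * min(smaller, larger) >= n   # smaller side is at least n / 4


n = 100
good = [rank for rank in range(1, n + 1) if gives_25_75_split_or_better(rank, n)]
print(good[0], good[-1], len(good) / n)  # 26 75 0.5
```

So a uniformly random pivot lands in the good range with probability 1/2, like a fair coin flip at every recursive call.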

#### Average Running Time of QuickSort
* __QuickSort Theorem__: for every input array of length n, the average running time of QuickSort implemented with random pivots is O(n log n)
    * _Note: No assumptions about data, it holds for every input_
    * "Average" comes from the random choices made by the algorithm
>For _every_ possible input array, the running time fluctuates with the random pivot choices, between roughly n log n at best and n<sup>2</sup> at worst, but the good case dominates: averaged over the algorithm's own coin flips, you will see n log n behavior
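
A small experiment can make the theorem feel plausible (it is not a proof). The sketch below counts comparisons for QuickSort with uniformly random pivots on an already sorted input, the same input that was quadratic for the first-element rule. The function name, the seed, and the trial count are arbitrary choices for illustration; for brevity it builds sublists rather than partitioning in place, charging len - 1 comparisons per partition as the in-place scan would, and it assumes distinct entries.

```python
import math
import random


def comparisons_random_pivot(values, rng):
    """Count comparisons made by quicksort with uniformly random pivots (distinct entries)."""
    if len(values) <= 1:
        return 0
    p = values[rng.randrange(len(values))]   # each element chosen with probability 1/k
    left = [x for x in values if x < p]
    right = [x for x in values if x > p]
    # A partition of k elements compares the other k - 1 elements against the pivot.
    return (len(values) - 1
            + comparisons_random_pivot(left, rng)
            + comparisons_random_pivot(right, rng))


n, trials = 1000, 50
rng = random.Random(0)                       # fixed seed so runs are repeatable
sorted_input = list(range(n))
average = sum(comparisons_random_pivot(sorted_input, rng) for _ in range(trials)) / trials

print("average comparisons:", average)
print("n * log2(n):        ", round(n * math.log2(n)))
print("n * (n - 1) / 2:    ", n * (n - 1) // 2)
```

The average should land within a small constant factor of n log<sub>2</sub>(n) and far below n(n-1)/2, even though the input is the worst case for the first-element pivot rule.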

## Correctness of QuickSort <a class="anchor" id="proof"></a>

Optional - return to this