# The Master Method
* [Motivation](#mot)
* [Formal Statement](#form)
* [Examples](#ex)
* [Proof Part 1](#p1)
* [Interpretation of the Three Cases](#int)
* [Proof Part 2](#p2)

## Motivation <a class="anchor" id="mot"></a>

* Potentially useful algorithmic ideas often need mathematical analysis to evaluate
    * e.g., the grade-school multiplication algorithm uses O(n<sup>2</sup>) operations to multiply two n-digit numbers, whereas the recursive divide-and-conquer approach breaks those n-digit numbers into smaller n/2-digit numbers
* The way we reason about the running time of recursive algorithms uses a concept called __recurrence__
* __Recurrence__ - a way to express the running time T(n) in terms of T applied to smaller inputs, i.e., in terms of the running time of the recursive calls
    * 2 ingredients:
        1. Base Case - Once you get down to a small input, then the running time is just constant
        2. General Case - Running time is comprised of the work done by the recursive calls and the work done right now, outside the recursive calls
<br><br>
* Integer Multiplication Example
    > (Statement * ): x * y = 10<sup>n</sup>ac + 10<sup>n/2</sup>(ad + bc) + bd, where x = 10<sup>n/2</sup>a + b and y = 10<sup>n/2</sup>c + d

* ___Previously, we focused on the work being done OUTSIDE recursive calls for our examples and proofs (merge sort, inversion counts) but this analysis focuses on the work done INSIDE the recursive calls___
* Naive Recursive Version:
    * Recursively compute ac, ad, bc, bd and complete Statement * in the natural way, adding zeros as necessary and summing the individual parts.
    * T(n) = maximum number of operations this algorithm needs to multiply two n-digit numbers
    * Base Case - multiplying two 1-digit numbers takes constant time --> T(1) <= a constant
    * General Case - 4 recursive calls (4T(n/2)) plus additions etc. outside the recursive calls (O(n)) --> T(n) <= 4T(n/2) + O(n)
* Gauss Version (Optimized Karatsuba):
    * Recursively compute ac, bd, and (a+b)(c+d); the middle term is then recovered as ad + bc = (a+b)(c+d) - ac - bd, and the product is assembled as in Statement *
    * Base Case - multiplying two 1-digit numbers --> T(1) <= a constant
    * General Case - __3__ recursive calls, plus additions and subtractions outside the recursion --> T(n) <= 3T(n/2) + O(n)
* In both of these examples, we are ignoring two things:
    1. The sums (a + b) and (c + d) may have n/2 + 1 digits rather than n/2, BUT that extra digit makes no difference to the asymptotic analysis
    2. The exact constant factor in the linear work done outside the recursive calls. This is actually a little larger in Gauss's version, but because it's a constant factor, would be suppressed in the Big Oh notation anyway
* Consider __Merge Sort__ - instead of 3 recursive calls, it makes 2 (one on the left half and one on the right half), and outside of those calls it also does linear work.
    * We know that its running time is O(n log n), so the bound for Gauss's algorithm is going to be worse --> we just don't know by how much
    * Likewise, Gauss's recurrence can be no worse than the naive one with 4 recursive calls. So we don't yet know the running time of Gauss's algorithm, only the bounds that it sits between
<br>
<br>
* ___Motivation: The running time/recurrence of Gauss's recursive algorithm for integer multiplication can be explicitly solved via the Master Method___
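
As a rough illustration of the Gauss version described above, here is a minimal Python sketch. The function name `karatsuba` and the choice to split on decimal digits with `divmod` are assumptions made for this example; the notes do not prescribe an implementation.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers using three recursive calls."""
    # Base case: a single-digit operand is multiplied directly (constant work).
    if x < 10 or y < 10:
        return x * y

    # Split both numbers around half the digit count of the longer one.
    half = max(len(str(x)), len(str(y))) // 2
    a, b = divmod(x, 10 ** half)  # x = 10^half * a + b
    c, d = divmod(y, 10 ** half)  # y = 10^half * c + d

    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # Gauss's trick: (a + b)(c + d) - ac - bd = ad + bc,
    # so the middle term needs only one more recursive call.
    cross = karatsuba(a + b, c + d) - ac - bd

    # Assemble the product; the shifts and additions are the linear work
    # done outside the recursive calls.
    return ac * 10 ** (2 * half) + cross * 10 ** half + bd
```

Counting the calls in the body gives the recurrence from the notes: three recursive calls on roughly half-length inputs, plus linear work.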

## Formal Statement <a class="anchor" id="form"></a>

* The __Master Method__ is essentially a black box for solving recurrences
    * Input: a recurrence in a particular format
    * Output: Solution, an upper bound of the running time of the recursive algorithm 
* Assumptions for our purposes:
    1. All subproblems have equal size
* Necessary Recurrence Format
    1. Base Case: T(n) <= a constant for all sufficiently small n
    2. General case: The running time on an input of length n is bounded above by the work of _a_ recursive calls, each on a subproblem of exactly the same size n/b (a 1/b fraction of the original input), plus the work done outside the recursive calls, which we bound by O(n<sup>d</sup>)
        * T(n) <= aT(n/b) + O(n<sup>d</sup>)
        * where constants:
            * a = number of recursive calls (>= 1)
            * b = the factor by which the input size shrinks before each recursive call (> 1)
            * d = exponent of the running time of the "combine" step (>= 0)
    * There _is_ a constant hidden in the big-oh term O(n<sup>d</sup>), but it doesn't affect the final bound. It will be accounted for in the actual proof.
<br><br>
___The Master Method Formal Statement___  
* 3 cases:
    * Case 1: a = b<sup>d</sup>
        * T(n) = O(n<sup>d</sup> log n)
    * Case 2: a < b<sup>d</sup>
        * T(n) = O(n<sup>d</sup>)
    * Case 3: a > b<sup>d</sup>
        * T(n) = O(n<sup>log<sub>b</sub>a</sup>)
* Observations
    * In case 1, we don't specify the base of the log _because it doesn't matter_. Logs with respect to any two bases differ by a constant factor independent of n, which is suppressed in big-oh notation. In case 3, by contrast, the log sits in the exponent, where the base _does_ matter: changing it changes the exponent itself, which can be the difference between, say, linear and quadratic time
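
The three cases can be applied mechanically. The helper below is only a sketch: the name `master_method` and the string output format are made up for illustration, and it assumes the recurrence is already in the required form T(n) <= aT(n/b) + O(n<sup>d</sup>).

```python
import math


def master_method(a: int, b: int, d: float) -> str:
    """Return the big-oh bound for T(n) <= a*T(n/b) + O(n^d) as a string."""
    if a < 1 or b <= 1 or d < 0:
        raise ValueError("need a >= 1, b > 1, d >= 0")

    threshold = b ** d
    if math.isclose(a, threshold):
        # Case 1: same work at every level, times the number of levels.
        return "O(log n)" if d == 0 else f"O(n^{d:g} log n)"
    if a < threshold:
        # Case 2: the root dominates.
        return f"O(n^{d:g})"
    # Case 3: the leaves dominate; the exponent is log_b(a).
    return f"O(n^{math.log(a, b):.3f})"
```

For instance, `master_method(2, 2, 1)` falls into case 1 and `master_method(2, 2, 2)` into case 2.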

## Examples <a class="anchor" id="ex"></a>

#### Example 1 - Merge Sort

* Identify the values a, b, and d
    * a = 2 (number of recursive calls)
    * b = 2 (each sub problem is half the size of the original input)
    * d = 1 (work done outside of recursive calls is in linear time)
* Identify the case
    * a = 2
    * b<sup>d</sup> = 2<sup>1</sup> = 2
    * a = b<sup>d</sup> --> Case 1
* In case 1, T(n) = O(n<sup>1</sup> log n) = O(n log n)
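
To make the three parameters concrete, here is one possible merge sort in Python with a, b, and d marked in the comments. This is an illustrative sketch, not necessarily the exact version analyzed earlier in the course.

```python
def merge_sort(items):
    """Return a new sorted list containing the elements of items."""
    # Base case: a list of length 0 or 1 is already sorted (constant work).
    if len(items) <= 1:
        return list(items)

    mid = len(items) // 2             # b = 2: each subproblem is half the input
    left = merge_sort(items[:mid])    # a = 2: first recursive call
    right = merge_sort(items[mid:])   #        second recursive call

    # Merge step: one pass over both halves, so linear work (d = 1).
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```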

#### Example 2 - Binary Search Algorithm

* In binary search, you don't have to check both sides of the array. Just look at the middle element, compare it to the target and then recurse on the appropriate side of the array
    * a = 1
* The subproblem is again half the size of the input
    * b = 2
* Outside of recursive call is 1 comparison, so it's constant 
    * d = 0
* a = 1 = 2<sup>0</sup> = b<sup>d</sup> --> Case 1 --> T(n) = O(n<sup>d</sup> log n) = O(n<sup>0</sup> log n) = O(log n)
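
A recursive binary search along the lines described above might look like the following sketch. The signature with `lo` and `hi` bounds is an assumption chosen for this example.

```python
def binary_search(sorted_items, target, lo=0, hi=None):
    """Return an index of target in sorted_items, or -1 if it is absent."""
    if hi is None:
        hi = len(sorted_items)
    if lo >= hi:                      # empty range: target is not present
        return -1

    mid = (lo + hi) // 2              # b = 2: the search range is halved
    if sorted_items[mid] == target:   # constant work outside the call (d = 0)
        return mid
    if sorted_items[mid] < target:
        # a = 1: exactly one of the two recursive calls below is made.
        return binary_search(sorted_items, target, mid + 1, hi)
    return binary_search(sorted_items, target, lo, mid)
```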

#### Example 3 - Naive Int. Multiplication

* 4 Recursive calls
    * a = 4
* When we take products, each smaller number has n/2 digits
    * b = 2
* Outside of the recursive calls we have a number of addition operations, which can be completed in linear time
    * d = 1
* 4 > 2<sup>1</sup> --> Case 3
* Case 3 --> T(n) = O(n<sup>log<sub>b</sub>a</sup>)
    * T(n) = O(n<sup>log<sub>2</sub>4</sup>) --> log<sub>2</sub>4 = 2 --> O(n<sup>2</sup>)

#### Example 4 - Gauss's Int. Multiplication

* a = 3 (recursive calls)
* b = 2 (each number has n/2 digits)
* d = 1 (linear work outside of calls for addition etc)
* 3 > 2<sup>1</sup> --> Case 3
* Case 3 --> T(n) = O(n<sup>log<sub>b</sub>a</sup>)
    * T(n) = O(n<sup>log<sub>2</sub>3</sup>) = O(n<sup>1.59</sup>)
* Despite still being Case 3, it's better than the quadratic time of Naive Integer Multiplication
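
To see the gap between Examples 3 and 4 numerically, the two recurrences can be evaluated directly. This sketch assumes the hidden constant is 1 and T(1) = 1, and that n is a power of b; the function name `recurrence` is hypothetical.

```python
def recurrence(a: int, b: int, d: int, n: int) -> int:
    """Evaluate T(n) = a*T(n/b) + n^d with T(1) = 1, for n a power of b."""
    if n == 1:
        return 1
    return a * recurrence(a, b, d, n // b) + n ** d


for n in (2, 16, 1024):
    naive = recurrence(4, 2, 1, n)  # four recursive calls (Example 3)
    gauss = recurrence(3, 2, 1, n)  # three recursive calls (Example 4)
    print(n, naive, gauss)
```

Under these assumptions the naive count grows like n<sup>2</sup> while the Gauss count grows like 3<sup>log<sub>2</sub>n</sup> = n<sup>log<sub>2</sub>3</sup>, so the ratio between them widens as n grows.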

#### Example 5 - Strassen's Subcubic Matrix Multiplication

* Remember that the straightforward divide-and-conquer matrix multiplication makes 8 recursive calls, but if you're clever you can reduce that to 7 recursive calls
* a = 7
* b = 2 (half original)
* d = 2 (the work outside the recursive calls is linear in the size of the matrix; since an n * n matrix has n<sup>2</sup> entries, that is quadratic in n)
* 7 > 2<sup>2</sup> = 4 --> Case 3
    * T(n) = O(n<sup>log<sub>b</sub>a</sup>) --> O(n<sup>log<sub>2</sub>7</sup>) --> O(n<sup>2.81</sup>), which again beats the original which was cubic time

#### Example 6 - Fictitious recurrence for Case 2

* Case 2 does occur; we just haven't covered an algorithm that falls into it yet
* Fictitious recurrence:
    * T(n) <= 2T(n/2) + O(n<sup>2</sup>)
* a = 2
* b = 2
* d = 2
* Case 2: a < b<sup>d</sup> --> 2 < 4
    * T(n) = O(n<sup>d</sup>) --> T(n) = O(n<sup>2</sup>)
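
As a quick sanity check, this recurrence can be unrolled level by level. Here c is an assumed name for the constant hidden in the O(n<sup>2</sup>) term, and n is taken to be a power of 2. Level j has 2<sup>j</sup> subproblems of size n/2<sup>j</sup>, so:

```latex
\begin{align*}
T(n) &\le \sum_{j=0}^{\log_2 n} 2^j \cdot c\left(\frac{n}{2^j}\right)^2
      = c\,n^2 \sum_{j=0}^{\log_2 n} \left(\frac{1}{2}\right)^j \\
     &\le c\,n^2 \cdot 2 = O(n^2)
\end{align*}
```

The work halves at each level, so the root's n<sup>2</sup> term dominates, which agrees with the Case 2 bound.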

## Proof Part 1 <a class="anchor" id="p1"></a>

* The computations are not the important part here, but they are worth seeing once. What _is_ important is ___the conceptual meaning of the three cases of the master method___.
    * The proof will follow a recursion tree approach, just like we used in the merge sort analysis, with each case corresponding to a certain type of recursion tree
    * Remembering the recursion trees/cases will allow you to 'reverse engineer' the running times from the conceptual understanding of what the three cases mean and how they correspond to different trees
* __Assumptions:__
    1. Recurrence is in the following form:
        * (i) T(1) <= c
        * (ii) T(n) <= aT(n/b) + cn<sup>d</sup>
        * Where c is some constant suppressed in big-oh notation, a is the number of recursive calls, b is the factor by which the subproblems shrink, and d is the exponent of the work done outside the recursive calls
    2. n = a power of b (Just to make our lives easier, it doesn't really affect the proof)
* __General Idea:__
    * Generalize merge sort analysis (ie: recursion tree)  
<img src="resources/rec_tree.PNG">
* __Quiz Question__: At a given level j (= 0, 1, 2,....,log<sub>b</sub>n), how many distinct subproblems are there? What is the input size of each of those subproblems?
    * a<sup>j</sup> subproblems - Starting at level 0 (j = 0), each subproblem makes _a_ recursive calls, so the number of subproblems grows by a factor of a at each level
    * Subproblems have size n/b<sup>j</sup> - the input size shrinks by a factor of b at each level, so after j levels it has shrunk by a factor of b<sup>j</sup>
    * These two rates are not always the same - the number of subproblems grows by a factor of a per level, while the input size shrinks by a factor of b, and a need not equal b
        * Ex: Binary search makes one recursive call on half of the array (a = 1, b = 2), while the naive integer multiplication makes four recursive calls on numbers with half as many digits (a = 4, b = 2)
* ___Generalized Recursion Tree___
<img src="resources/master_tree.PNG">
1. Zoom into a single level j and calculate the total work done _not including_ that which would be done later by recursive calls
    * Look at the number of problems at level j and multiply that by a bound on the work done per sub-problem
        1. a<sup>j</sup> problems per level as we saw above
        2. Subproblem size = n/b<sup>j</sup>.
            * Size matters only inasmuch as it determines the amount of work (number of operations performed) per level j subproblem. This relationship is found in the recurrence
            * _Recurrence_ = how much work is done in a given subproblem. Includes both the recursive call work and that done outside of calls. 
            * We said above to ignore the parts done in the recursive calls and just count the work done at level j --> <= c * input size raised to the d power 
            * (n/b<sup>j</sup>)<sup>d</sup> * c
    * Now that we have the two parts, combine: 
        * <= a<sup>j</sup> * c * (n/b<sup>j</sup>)<sup>d</sup>
    * Simplify by separating the terms that depend on the level j from those that don't
    * The a and b terms are both raised to a power of j, while c and n<sup>d</sup> are independent of j
        * cn<sup>d</sup> * (a/b<sup>d</sup>)<sup>j</sup>
    * The ratio a/b<sup>d</sup> now captures the relative magnitude of a, b, and d, the key players in the master theorem
2. Sum over all the levels j=0, 1, 2,..., log<sub>b</sub>n
    * Because cn<sup>d</sup> is not dependent on j, we can pull it out ahead of the summation expression
<img src="resources/proof1_sum.PNG">
<blockquote>This somewhat messy formula is <i>crucial</i> to the rest of the proof and the 3 cases</blockquote>  

#### What did we just do?
* We used a recursion tree approach, which gave us an upper bound on the running time of any algorithm governed by a recurrence of the specified form.
* This very ugly formula now needs to be interpreted into the 3 cases of the Master Theorem
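
For reference, the two steps above can be written out as follows. This is a reconstruction from the text of steps 1 and 2 (the per-level bound, then the sum over all levels), with n assumed to be a power of b:

```latex
\begin{align*}
\text{work at level } j &\le a^j \cdot c\left(\frac{n}{b^j}\right)^d
                          = c\,n^d\left(\frac{a}{b^d}\right)^j \\
\text{total work} &\le c\,n^d \sum_{j=0}^{\log_b n} \left(\frac{a}{b^d}\right)^j
\end{align*}
```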

## Interpretation of the Three Cases <a class="anchor" id="int"></a>

* The entirety of the Master Theorem revolves around the relationship between a and b<sup>d</sup>
* a = number of recursive calls made by the algorithm = number of children each problem has = __rate at which subproblems proliferate as we proceed deeper into the tree__ = the factor by which there are more subproblems at the next level than the previous one --> ___RSP___
    * Could be considered "forces of evil" because this is what is likely to make your function run more slowly (as the recursion generates more and more subproblems)
* b<sup>d</sup> = __Rate of work shrinkage per subproblem__ = ___RWS___
    * "forces of good" - what will allow your function to move more quickly, because the amount of work being completed is reduced
#### Tug of war between forces of good and evil has 3 possible outcomes
1. Forces match --> a = b<sup>d</sup> --> The amount of work is the same at every recursion level
2. Good wins --> a < b<sup>d</sup> --> The amount of work decreases at every recursion level
3. Evil wins --> a > b<sup>d</sup> --> The amount of work increases at every recursion level
#### We know that the upper bound on the total work, summed over all levels j, is:
<img src="resources/master_sum.PNG">
* Given this, we can intuitively derive the formulas for our 3 cases
* Case 1: Where RSP = RWS (a = b<sup>d</sup>), we know that the work is the same at every level, so we just need to multiply by the number of levels, log<sub>b</sub>n + 1 --> T(n) = O(n<sup>d</sup> log n)
    * If these two expressions are equal, notice in the formula above that the ratio a/b<sup>d</sup> equals 1, so (a/b<sup>d</sup>)<sup>j</sup> = 1<sup>j</sup> = 1 for every j, and the sum is just 1 added to itself log<sub>b</sub>n + 1 times. This then gets multiplied by the cn<sup>d</sup> term, which is independent of j. For big-oh, we suppress the constant _c_, the log base (because all logarithms differ by a constant factor), and the final + 1
* Case 2: Where RSP < RWS (a < b<sup>d</sup>), we know that there is less work at each level, so the most work is being done at the root --> T(n) = O(n<sup>d</sup>)
* Case 3: Where RSP > RWS (a > b<sup>d</sup>), we know that there is _more_ work being done at each level, so the most work is happening at the leaves --> O(# leaves) --> T(n) = O(n<sup>log<sub>b</sub>a</sup>)
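
The tug of war can be observed by listing the work at each level of the recursion tree. The sketch below assumes c = 1 and n a power of b; `work_per_level` is a hypothetical helper name.

```python
def work_per_level(a: int, b: int, d: int, n: int) -> list[int]:
    """Return a^j * (n / b^j)^d for each level j = 0, 1, ..., log_b(n)."""
    levels = []
    size, count = n, 1          # level 0: one subproblem of size n
    while size >= 1:
        levels.append(count * size ** d)
        size //= b              # subproblems shrink by a factor of b
        count *= a              # and multiply by a factor of a
    return levels


print(work_per_level(2, 2, 1, 8))  # forces match: same work at every level
print(work_per_level(2, 2, 2, 8))  # good wins: work shrinks, root dominates
print(work_per_level(4, 2, 1, 8))  # evil wins: work grows, leaves dominate
```

In the third call the last entry equals the number of leaves, a<sup>log<sub>b</sub>n</sup> = 4<sup>3</sup> = 64, which matches the Case 3 intuition.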

## Proof Part 2 <a class="anchor" id="p2"></a>

#### Short detour
* r = a/b<sup>d</sup>
* Sum up the powers of r stopping at the _k_ th power:
    * 1 + r + r<sup>2</sup> + r<sup>3</sup> + ...r<sup>k</sup> 
    * For r not equal to 1, this sum has a closed-form formula
    <br>
<img src = "resources/master_p2.PNG">
* For r < 1, the sum is <= 1/(1 - r)
    * Consider r = 1/2 --> the sum of the powers (1 + 1/2 + 1/4 + 1/8 + ...) converges to 2 as k grows large --> 1/(1 - 1/2) = 2
    * ___Note that for r < 1, this bound is a constant (that is, independent of k)___
    * Another way to think about this is that for r < 1, the first term (1) dominates, and regardless of k, the sum never exceeds the constant bound above
* For r > 1, the sum is <= r<sup>k</sup> * (1 + 1/(r - 1))
    * Consider r = 2 --> the sum of the powers (1 + 2 + 4 + 8 + ...) will never be greater than _twice_ the largest term
    * ___The second factor of this expression, (1 + 1/(r - 1)), is also independent of k___
    * Another way to think about this is that for r > 1, the _largest_ term (r<sup>k</sup>) dominates
* __To summarize, if r > 1, the largest power of r dominates the sum, whereas if r < 1, the sum is bounded by a constant__
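
Both bounds are easy to spot-check numerically. The snippet below is a small sketch that uses r = 1/2 and r = 2 as sample values; `geometric_sum` is a hypothetical helper name.

```python
def geometric_sum(r: float, k: int) -> float:
    """Compute 1 + r + r^2 + ... + r^k term by term."""
    return sum(r ** i for i in range(k + 1))


for k in (1, 5, 20):
    # r < 1: the sum stays below the constant 1 / (1 - r), whatever k is.
    assert geometric_sum(0.5, k) <= 1 / (1 - 0.5)
    # r > 1: the sum is at most a constant factor times the largest term r^k.
    assert geometric_sum(2, k) <= 2 ** k * (1 + 1 / (2 - 1))
```
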
#### Back to the Master Method
* Case 2: a < b<sup>d</sup> (that is, r < 1 and work is decreasing with each level)
    * While r does depend on a, b, and d, it is a constant in that it doesn't depend on n.
    * Since r < 1, the sum is bounded by a constant independent of the number of terms (i.e., independent of n, since the number of terms is log<sub>b</sub>n + 1)
    * Our general formula simplifies to c * n<sup>d</sup> * a constant (at most 1/(1 - r))
    * Suppress constants for big-oh --> __O(n<sup>d</sup>)__
    * Thus, this type of recursion tree is dominated by the root
* Case 3: a > b<sup>d</sup> (that is, r > 1 and work is increasing with each level)
    * Our general formula simplifies to c * n<sup>d</sup> * a constant factor * the dominating largest term, r<sup>j</sup> for the biggest value of j we will ever see (j = log<sub>b</sub>n)
    * Big-oh --> O(n<sup>d</sup> * r<sup>log<sub>b</sub>n</sup>) --> O(n<sup>d</sup> * (a/b<sup>d</sup>)<sup>log<sub>b</sub>n</sup>)
    * Simplifying the second term
        * Consider only the denominator of the r ratio: (1/b<sup>d</sup>)<sup>log<sub>b</sub>n</sup>
        * b<sup>(-d)log<sub>b</sub>n</sup> = (b<sup>log<sub>b</sub>n</sup>)<sup>-d</sup>
        * b and log<sub>b</sub> cancel --> n<sup>-d</sup>
    * Substitute it back in:
        * O(n<sup>d</sup> * a<sup>log<sub>b</sub>n</sup>/n<sup>d</sup>) --> ___O(a<sup>log<sub>b</sub>n</sup>)___
    * log<sub>b</sub>n = the highest level j --> a<sup>j</sup> we already know is the number of subproblems at level j --> the last expression is really just the number of leaves
    * __Why is this different from the given Case 3 formula of n<sup>log<sub>b</sub>a</sup>?__
        * It isn't: a<sup>log<sub>b</sub>n</sup> and n<sup>log<sub>b</sub>a</sup> are equal
        * While a<sup>log<sub>b</sub>n</sup> is more intuitive, n<sup>log<sub>b</sub>a</sup> is easier to apply
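
One way to verify that the two forms agree is to rewrite a as a power of b and swap the order of the exponents:

```latex
a^{\log_b n}
  = \left(b^{\log_b a}\right)^{\log_b n}
  = b^{(\log_b a)(\log_b n)}
  = \left(b^{\log_b n}\right)^{\log_b a}
  = n^{\log_b a}
```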