# Week 3 Review: 
---

A quick analysis of an algorithm can be broken into pieces. Each resource the algorithm uses has a cost. 
Beyond a cost analysis, the analysis also has to cover time and space complexity (memory, bandwidth, hardware, etc.). 

Consider the size of the arguments and the limit on the size of each word of data. When working with inputs of size *n*, assume that integers are represented by *c* lg *n* bits for some constant $c \geq 1$. We require $c \geq 1$ so that each word can hold the value of *n*, which enables indexing the individual input elements, and *c* is restricted to be a constant so that the word size does not grow arbitrarily. (If the word size could grow arbitrarily, huge amounts of data could be stored in one word and processed in constant time, which is unrealistic.) 

### Methods of the Analysis: 

Consider factors like the time required by each procedure. The right notion of input size depends on the problem being studied. The running time is measured as the number of primitive steps executed, which indicates the complexity and growth; each statement is assumed to take a constant amount of time per execution.

#### Quick Insertion Sort Analysis: 

First, present the procedure with the time **"cost"** of each statement and the number of times each statement is executed on an input of size **n**. 

Second, the running time is the sum of the running times of every statement executed. A statement that takes *c* steps to execute and executes *n* times contributes *cn* to the total running time. Writing $t_{j}$ for the number of times the inner loop test runs for a given *j*, the general form is:

- $  T(n) = c_{1}n + c_{2}(n-1) + c_{3}(n-1) + c_{4}(n-1) + c_{5}\Sigma_{j=2}^n t_{j} + c_{6}\Sigma_{j=2}^n (t_{j}-1) + c_{7}\Sigma_{j=2}^n (t_{j}-1) + c_{8}(n-1) $

In the best case (the input is already sorted), $t_{j} = 1$ for every *j*, so the insertion sort has a time complexity of:
- $  T(n) = c_{1}n + c_{2}(n-1) + c_{3}(n-1) + c_{4}(n-1) + c_{5}(n-1) + c_{8}(n-1) $

This running time can be expressed as *an + b* for constants *a* and *b* that depend on the statement costs $c_{i}$. Thus it is a ***linear function*** of *n*.

The worst case (the input is in reverse sorted order, so $t_{j} = j$) is written as follows: 
- $ \scriptsize
T(n) = c_{1}n + c_{2}(n-1) + c_{3}(n-1) + c_{4}(n-1) + c_{5}(\frac{n(n+1)}{2} - 1) + c_{6}(\frac{n(n-1)}{2}) + c_{7}(\frac{n(n-1)}{2}) + c_{8}(n-1)\\ 
  \scriptsize 
= (\frac{c_{5}}{2} + \frac{c_{6}}{2} + \frac{c_{7}}{2})n^{2} + ( c_{1} + c_{2} + c_{3} + c_{4} + \frac{c_{5}}{2} - \frac{c_{6}}{2} - \frac{c_{7}}{2} + c_{8})n - (c_{2} + c_{3} + c_{4} + c_{5} + c_{8}) $

Each statement's cost is multiplied by the number of times it executes, and the products add up to the total running time. From the case above, it can be shown that the worst-case running time can be expressed as $an^{2}+bn+c$ for constants *a*, *b*, and *c* that again depend on the statement costs $c_{i}$; it is thus a ***quadratic function*** of *n*.
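To see the linear and quadratic cases concretely, the sketch below counts how many times the inner loop test runs (the sum of the $t_{j}$ terms) for a sorted and a reverse-sorted input. The function name `insertion_sort_with_count` and the counter are assumptions made for illustration; they are not part of the notes.

```python
def insertion_sort_with_count(values):
    """Sort a copy of values; return (sorted_list, loop_test_count).

    loop_test_count is the sum of t_j: how many times the inner
    loop condition is evaluated over the whole run.
    """
    a = list(values)
    tests = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while True:
            tests += 1  # one evaluation of the inner loop condition
            if i >= 0 and a[i] > key:
                a[i + 1] = a[i]  # shift the larger element to the right
                i -= 1
            else:
                break
        a[i + 1] = key
    return a, tests


n = 6
_, best = insertion_sort_with_count(range(1, n + 1))   # already sorted
_, worst = insertion_sort_with_count(range(n, 0, -1))  # reverse sorted
print(best, worst)  # n - 1 = 5 and n(n+1)/2 - 1 = 20
```

The sorted input gives a count that grows linearly in *n*, while the reverse-sorted input gives the $\frac{n(n+1)}{2} - 1$ term from the worst-case formula.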

###### To show:
This provided a good example of what is needed to understand the algorithm in terms of operations, cost, time, and growth. It can help represent the shape of the operation in place, and can ease the understanding of how complex it truly is. 

### Understanding Order of Growth:

A simplified method of breaking this analysis down will include:
- *Understanding* that each constant *c* is actually a *cost* value
- *Observing* the highest order of *n* in the equation, since lower-order terms and constant coefficients matter less for large *n*
- *Stating* the upper and lower bounds of the time complexity, usually in terms of the worst case.
    
### Designing Algorithms: 

There is a wide range of ways to design "algorithms". Some may use an *incremental* approach, and others might use ***divide-and-conquer***. Divide-and-conquer can produce algorithms, such as merge sort, whose worst-case time complexity is *n* lg *n*. 

These algorithms utilize *recursion*: they call themselves to solve closely related subproblems, and then combine those answers into a solution for the larger problem. 

The Divide and Conquer paradigm involves three steps at each level of recursion: 
- **Divide** the problem into a number of subproblems that are smaller instances of the same problem.
- **Conquer** the subproblems by solving them recursively. 
- **Combine** the solutions to the subproblems into the solution for the original problem. 

The MERGE-SORT is a good example of just how well this type of implementation will work. 
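The three steps can be sketched in Python as follows. This is a minimal illustration rather than the textbook MERGE-SORT procedure: it returns a new list instead of sorting in place, and the name `merge_sort` is an assumption.

```python
def merge_sort(values):
    """Return a new sorted list using divide-and-conquer."""
    if len(values) <= 1:              # base case: nothing left to divide
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])   # divide, then conquer the left half
    right = merge_sort(values[mid:])  # divide, then conquer the right half

    # combine: merge the two sorted halves in linear time
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```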

---

#### Smaller Week 2 Review: 

   

# Week 4 - Design and Analysis of Algorithms
---
### Monday 2/06 - Standard Notation and Common Functions

This section reviews some standard mathematical functions and notations and explores the relationships among them. It also illustrates the use of the *asymptotic notations*.

#### Monotonicity:

Vocab: 
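- ***monotonically increasing***: a function *f(n)* is monotonically increasing if $m \leq n$ implies $f(m) \leq f(n)$. 
- ***monotonically decreasing***: a function *f(n)* is monotonically decreasing if $m \leq n$ implies $f(m) \geq f(n)$. (With strict inequalities, where $m < n$ implies $f(m) > f(n)$, the function is *strictly decreasing*.)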

#### Floors and Ceilings:

Vocab:
- ***floor***: the greatest integer less than or equal to *x*, denoted $\lfloor x \rfloor$
- ***ceiling***: the least integer greater than or equal to *x*, denoted $\lceil x \rceil$

For any real number *x*, we denote the greatest integer less than or equal to *x* by $\lfloor x \rfloor$ and the least integer greater than or equal to *x* by $\lceil x \rceil$. For all real *x*,

- $ x - 1 < \lfloor x \rfloor \leq x \leq \lceil x \rceil < x + 1 $

For any integer *n*,

- $ \lceil \frac{n}{2} \rceil + \lfloor \frac{n}{2} \rfloor = n$,

and for any real number $x \geq 0$ and integers *a*, *b* > 0,

- $ \lceil \frac{\lceil x/a \rceil}{b} \rceil $ = $\lceil \frac{x}{ab} \rceil $

- $ \lfloor \frac{\lfloor x/a \rfloor}{b} \rfloor $ = $\lfloor \frac{x}{ab} \rfloor $

- $ \lceil \frac{a}{b} \rceil \leq \frac{a + (b - 1)}{b} $
- $ \lfloor \frac{a}{b} \rfloor \geq \frac{a - (b - 1)}{b} $

The floor and ceiling functions are both *monotonically increasing*.
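The halving identity and the two nested identities can be spot-checked numerically. The sketch below uses `fractions.Fraction` so that the divisions are exact; the helper names `halves_sum_to_n` and `check_nested` are assumptions for illustration.

```python
import math
from fractions import Fraction


def halves_sum_to_n(n):
    """Check ceil(n/2) + floor(n/2) == n for an integer n."""
    half = Fraction(n, 2)
    return math.ceil(half) + math.floor(half) == n


def check_nested(x, a, b):
    """Check the nested ceiling and floor identities for x >= 0, integers a, b > 0."""
    x = Fraction(x)
    nested_ceil = math.ceil(Fraction(math.ceil(x / a), b))
    nested_floor = math.floor(Fraction(math.floor(x / a), b))
    return (nested_ceil == math.ceil(x / (a * b))
            and nested_floor == math.floor(x / (a * b)))


print(all(halves_sum_to_n(n) for n in range(-10, 11)))
print(check_nested(Fraction(37, 4), 3, 2))
```

A passing check on a few values is only a sanity test, not a proof of the identities.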

#### Modular Arithmetic:

For any integer *a* and any positive integer *n*, the value *a* mod *n* is the ***remainder*** (or the *residue*) of the quotient *a/n*:

- $ a \bmod n = a - n \lfloor a/n \rfloor $

It follows that 
- $ 0 \leq a \bmod n < n$
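As a quick check of the definition, the sketch below computes *a* mod *n* from the floor formula and compares it with Python's built-in `%` operator, which follows the same floor-based convention for a positive *n*. The function name `mod` is an assumption for illustration.

```python
def mod(a, n):
    """a mod n from the definition a - n * floor(a / n), for n > 0.

    Integer floor division (//) is used so the result is exact.
    """
    return a - n * (a // n)


print(mod(7, 3))   # 1
print(mod(-7, 3))  # 2, since floor(-7 / 3) = -3 and -7 - 3 * (-3) = 2
```

Note that the result stays in the range 0 to *n* - 1 even when *a* is negative.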

Vocab: 
- ***remainder** (or **residue**)* 
- ***equivalent***
 
#### Polynomials: *

Vocab:
- ***polynomial** in **n of degree d*** 
- ***coefficient***
- ***polynomially bounded*** 

Given a nonnegative integer *d* a ***polynomial in n of degree d*** is a function *p(n)* of the form

- *p(n)* = $\Sigma_{i = 0}^{d} a_{i}n^{i}$

where the constants $a_{0}$, $a_{1}$, ..., $a_{d}$ are the ***coefficients*** of the polynomial and $a_{d} \neq 0$. A polynomial is asymptotically positive if and only if $a_{d} > 0$. For an asymptotically positive polynomial *p(n)* of degree *d*, we have $p(n) = \Theta(n^{d})$. For any real constant $a \geq 0$, the function $n^{a}$ is monotonically increasing, and for any real constant $a \leq 0$, the function $n^{a}$ is monotonically decreasing. We say that a function *f(n)* is ***polynomially bounded*** if $f(n) = O(n^{k})$ for some constant *k*.

#### Exponentials:  *

For all *n* and $a \geq 1$, the function $a^{n}$ is monotonically increasing in *n*. When convenient, assume $0^{0} = 1$.

The rate of growth of polynomials and exponentials by ...

Thus, any exponential function with a base strictly greater than 1 grows 
faster than any polynomial function.

(In this equation, the asymptotic notation is used to describe the limiting behavior as $x \rightarrow 0$ rather than as $x \rightarrow \infty$) We have for all *x*,


#### Logarithms: *

- ***binary logarithm***
- ***natural logarithm***
- ***exponentiation***
- ***composition***

#### Factorials:

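The notation *n*! (read "*n* factorial") is defined for integers $n \geq 0$ as $n! = 1$ if $n = 0$ and $n! = n \cdot (n-1)!$ if $n > 0$.

A weak upper bound on the factorial function is *n*! $\leq n^{n}$, since each of the *n* terms in the factorial product is at most *n*. ***Stirling's approximation***,

- $ n! = \sqrt{2 \pi n} \left( \frac{n}{e} \right)^{n} \left( 1 + \Theta\left( \frac{1}{n} \right) \right) $

where *e* is the base of the natural logarithm, gives a tighter upper bound, and a lower bound as well.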

#### Functional Iteration:

The notation $f^{(i)}(n)$ denotes the function *f(n)* iteratively applied *i* times to an initial value of *n*. 

For a function *f(n)* over the reals, it is defined recursively for nonnegative integers *i*: $f^{(0)}(n) = n$, and $f^{(i)}(n) = f(f^{(i-1)}(n))$ for $i > 0$.

#### The Iterated Logarithm Functions:

The notation lg* *n* (read "log star of *n*") denotes the iterated logarithm. Let $\lg^{(i)} n$ be as defined above, with *f(n)* = lg *n*. Because the logarithm of a nonpositive number is undefined, $\lg^{(i)} n$ is defined only if $\lg^{(i-1)} n > 0$. Be sure to distinguish $\lg^{(i)} n$ (the logarithm function applied *i* times in succession, starting with argument *n*) from $\lg^{i} n$ (the logarithm of *n* raised to the *i*th power). 

- lg* *n* = min{*i* $\geq 0$ : $\lg^{(i)} n \leq 1$}

The iterated logarithm grows very slowly: lg* $2^{65536}$ = 5. The number of atoms in the observable universe is estimated to be about $10^{80}$, which is much less than $2^{65536}$, so an input size *n* such that lg* *n* > 5 is rarely encountered. 
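The definition of lg* *n* translates directly into a loop: keep applying lg until the value is at most 1 and count the applications. A minimal sketch, where the name `lg_star` is an assumption for illustration:

```python
import math


def lg_star(n):
    """Iterated logarithm: number of times lg is applied until the value is <= 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


for n in (2, 4, 16, 65536, 2 ** 65536):
    print(lg_star(n))  # prints 1, 2, 3, 4, 5 on successive lines
```

Each input in the loop is 2 raised to the previous input, and the result grows by only 1 each time.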

#### Fibonacci numbers:

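The ***Fibonacci numbers*** are defined by a recurrence in which each Fibonacci number is the sum of the two previous ones, starting from $F_{0} = 0$ and $F_{1} = 1$. These numbers are related to the ***golden ratio*** $\phi$ and to its conjugate $\hat{\phi}$, which are the two roots of the equation:
- $ x^{2} = x + 1 $

and are given by the following:
- $ \phi = \frac{1 + \sqrt{5}}{2} \approx 1.61803 $
- $ \hat{\phi} = \frac{1 - \sqrt{5}}{2} \approx -0.61803 $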



---


In [4]:
arr = [13,-3,-25,20,-3,-16,-23,18,2,-7,12,-5,-22,15,-4,7]

# O(n^3) time
def naiveMaxSubarray(arr:list) -> list:
    # two nested loops to compare all possible pairs (i, j), both inclusive
    
    # assume the arr is not empty. otherwise there is no max subarray
    currentMax = arr[0]
    i_max, j_max = 0,0
    n = len(arr)
    
    for i in range(n):
        # TODO: keep a running sum to get rid of the sum function below
        for j in range(i,n):
            # sum takes linear time; the slice end j+1 makes index j inclusive
            currentSum = sum(arr[i:j+1])
            if currentSum > currentMax:
                currentMax = currentSum
                i_max, j_max = i,j
    return [i_max, j_max]
   
    
    
print("Max Subarray Summation: ",naiveMaxSubarray(arr))

[7, 10]


### Perform a Design Analysis on a Naive Max Subarray search
---
#### Inductive, and Proportional Reasoning: 
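The naive version above checks every pair of indices. The same problem can also be solved with a divide-and-conquer paradigm, which can be shown by taking the following steps into consideration:
- **Divide** the array into 2 smaller subarrays at the midpoint
- **Conquer** each side of the divided array recursively until the largest sum on that side is found
- **Combine** by also checking the maximum subarray that crosses the midpoint. A subarray that starts in the left half and ends in the right half is not found by either recursive call, so it has to be checked separately before taking the best of the three.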

#### Checking for Loop Invariant conditions:

##### Initialization: 
what happens first before the loop: 

- currentMax is assigned the first element of the array.
- index variables are set up to track the progression of i and j.
- the size of the array is measured and assigned to n.

##### Maintenance:
what happens when the loop executes (what changes)

##### Termination:
what happens after everything has changed (what it might use to break loop) what happens when there's no loop, etc...



---

### Chapter 4 - Divide-and-Conquer (but closer)

Recall that a divide-and-conquer algorithm solves a problem recursively, applying three steps at each level of the recursion:

- **Divide**: the problem into a number of *subproblems* that are smaller instances of the same problem.
- **Conquer**: the subproblems by solving them recursively; if the subproblem sizes are small enough they can be solved in a simple manner.
- **Combine**: the solutions to the subproblems into the solution for the original problem.


#### HW Hints
---

You can use divide and conquer paradigm to improve the worst case running time

Notice: that the max subarray was in the middle of the first array, and also that it is challenging to report accurate data when the last index of j is unbounded...

Question: How can you check the end values between both arrays?

Solution: 
1. Completely in left range
2. Completely in right range
3. The maximum subarray is crossing mid point



Note: the subarray is the most important part, in order to improve the time complexity.

<img src='./screenshots/find-max-subarray.png' width='300'>
<!-- <img src='./screenshots/find-max-crossing-subarray.png' width='250'> -->

---

Write your own version of the code here:

In [6]:
import math

arr = [13,-3,-25,20,-3,-16,-23,18,20,-7,12,-5,-22,15,-4,7]

# divide and conquer: O(n lg n)
def findMaxSubarray(arr, low, high):
#     invalid range: low is greater than high
    if (low > high):
        return -(math.inf)
    
#     base case: a single element
    if (low == high):
        return arr[low]
    mid = (low + high) // 2
    
#     best of: entirely left, entirely right, crossing the midpoint
    return max(findMaxSubarray(arr, low, mid),
              findMaxSubarray(arr, mid+1, high), 
              findMaxCrossingSubarray(arr, low, mid, high))
        

def findMaxCrossingSubarray(arr, low, mid, high):
#   sentinel values so that any real sum replaces them
    left_sum  = -(math.inf)
    right_sum = -(math.inf)
    
#   best sum of arr[i..mid], scanning from mid down to low
    max_sum = 0
    for i in range(mid, low-1, -1):
        max_sum = max_sum + arr[i]
        if(max_sum > left_sum):
            left_sum = max_sum
    
#   best sum of arr[mid+1..j], scanning from mid+1 up to high
    max_sum = 0
    for j in range(mid+1, high+1):
        max_sum = max_sum + arr[j]
        if(max_sum > right_sum):
            right_sum = max_sum
    # returns; 
    #    - the best sum that takes at least one element from each side
    return left_sum + right_sum

n = len(arr)
max_sum = findMaxSubarray(arr, 0, n-1)
print("Maximum contiguous sum is ", max_sum)

# help from GeeksforGeeks

Maximum contiguous sum is  43


In [7]:
import math
# Brute Force Method for finding Maximum Subarray

def bruteForceSubarray(arr):
    n = len(arr)
    max_sum = -(math.inf)
    low, high = 0, 0
    for i in range(n):
        sm = 0  # running sum of arr[i..j]
        for j in range(i, n):
            sm = sm + arr[j]
            if sm > max_sum:
                max_sum = sm
                low = i
                high = j
    return (low, high, max_sum)

#### A brute-force solution:

A brute-force algorithm can easily be devised to solve this problem: try every possible pair of start and end positions. A period of *n* elements has $\binom{n}{2}$ such pairs. Since $\binom{n}{2}$ is $\Theta(n^{2})$, and the best we can hope for is to evaluate each possible pair in constant time, this approach would take $ \Omega(n^{2})$ time.

#### A transformation:

In order to design an algorithm with an *o*$(n^{2})$ running time, look at the input in a slightly different way. The goal is to find the sequence of elements over which the net change from the first to the last is maximum. Instead of looking at the elements themselves (the daily prices), consider the daily change in price, where the change at index *i* is the difference between the values at index *i* - 1 and index *i*. 

<img src='./screenshots/volatile-chart.png'>

The table shows these changes in the bottom row. If this row is treated as an array *A*, the task becomes finding the ***maximum subarray***. The maximum subarray of *A*[1 .. 16] is *A*[8 .. 11], with the sum 43. 

At first glance, this transformation might not help. It is still necessary to check $\binom{n-1}{2} = \Theta(n^{2})$ subarrays for a period of *n* elements. Although computing one subarray sum might take time proportional to the length of the subarray, the computation can be organized so that each subarray sum takes $O(1)$ time, given the values of previously computed subarray sums, so that the brute-force solution takes $\Theta(n^{2})$ time.
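One way to get each subarray sum in $O(1)$ time from previously computed sums is a prefix-sum table. The notes do not name a specific method, so the prefix-sum approach and the names `prefix` and `subarray_sum` below are assumptions for illustration. Indices are 0-based, so *A*[8 .. 11] corresponds to `changes[7..10]`.

```python
from itertools import accumulate

changes = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]

# prefix[k] holds the sum of the first k changes, so prefix[0] = 0
prefix = [0] + list(accumulate(changes))


def subarray_sum(i, j):
    """Sum of changes[i..j] (inclusive, 0-indexed) in O(1) time."""
    return prefix[j + 1] - prefix[i]


print(subarray_sum(7, 10))  # 18 + 20 - 7 + 12 = 43
```

Building `prefix` costs linear time once; after that, checking all pairs of indices costs constant time per pair.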

<img src='./screenshots/max-subarray.png'>

There is a better way. 

### A solution using divide-and-conquer:

To solve this problem, use the *divide-and-conquer* technique. Suppose the goal is to find the maximum subarray of the subarray *A*[*low* .. *high*]. Divide-and-conquer suggests that the subarray is divided into two subarrays of as equal size as possible. That is, find the midpoint *mid* of the subarray and consider the subarrays *A*[*low* .. *mid*] and *A*[*mid* + 1 .. *high*]. Any contiguous subarray *A*[*i* .. *j*]  of *A*[*low* .. *high*] must lie in exactly one of the following places:

- entirely in the subarray *A*[*low* .. *mid*], so that $low \leq i \leq j \leq mid$
- entirely in the subarray *A*[*mid* + 1 .. *high*], so that $mid < i \leq j \leq high$, 
- crossing the midpoint, so that $low \leq i \leq mid < j \leq high$

Consequently, a maximum subarray of *A*[*low* .. *high*] must lie in exactly one of these places. In fact, a maximum subarray of *A*[*low* .. *high*] must have the greatest sum over all subarrays entirely in *A*[*low* .. *mid*], entirely in *A*[*mid* + 1 .. *high*], or crossing the midpoint. The maximum subarrays of *A*[*low* .. *mid*] and *A*[*mid* + 1 .. *high*] can be found recursively, because these two subproblems are smaller instances of the problem of finding a maximum subarray. Thus, all that is left to do is find a maximum subarray that crosses the midpoint, and take a subarray with the largest sum of the three.

<img src='./screenshots/midpoint-subarray.png'>

A maximum subarray crossing the midpoint can easily be found in time linear in the size of the subarray *A*[*low* .. *high*]. This problem is *not* a smaller instance of our original problem, because it has the added restriction that the subarray it chooses must cross the midpoint. Any subarray crossing the midpoint is itself made of two subarrays *A*[*i* .. *mid*] and *A*[*mid* + 1 .. *j*], where $low \leq i \leq mid$ and $mid < j \leq high$. Therefore, it is enough to find maximum subarrays of the form *A*[*i* .. *mid*] and *A*[*mid* + 1 .. *j*] and then combine them. The procedure FIND-MAX-CROSSING-SUBARRAY takes as input the array *A* and the indices *low*, *mid*, and *high*, and it returns a tuple containing the indices demarcating a maximum subarray that crosses the midpoint, along with the sum of the values in a maximum subarray. 

<img src='./screenshots/find-max-crossing-subarray.png' width='250'>

The procedure works as follows. Lines 1-7 find a maximum subarray of the left half, *A*[*low* .. *mid*]. Since the subarray must contain *A*[*mid*], the **for** loop of lines 3-7 starts the index *i* at *mid* and works down to *low*, so that every subarray it considers is of the form *A*[*i* .. *mid*]. Whenever a subarray *A*[*i* .. *mid*] with a sum of values greater than *left-sum* is found, *left-sum* is updated to this subarray's sum in line 6, and line 7 updates the variable *max-left* to record this index *i*. Lines 8-14 work analogously for the right half, *A*[*mid* + 1 .. *high*]. Here, the **for** loop of lines 10-14 starts the index *j* at *mid* + 1 and works up to *high*, so that every subarray it considers is of the form *A*[*mid* + 1 .. *j*]. Finally, line 15 returns the indices *max-left* and *max-right* that demarcate a maximum subarray crossing the midpoint, along with the sum *left-sum* + *right-sum* of the values in the subarray *A*[*max-left* .. *max-right*].

If the subarray *A*[*low* .. *high*] contains *n* entries (so that *n* = *high* - *low* + 1), then the claim can be made that FIND-MAX-CROSSING-SUBARRAY(*A, low, mid, high*) takes $\Theta(n)$ time. Since each iteration of each of the two **for** loops takes $\Theta(1)$ time, it is enough to count how many iterations there are altogether. The **for** loop of lines 3-7 makes *mid* - *low* + 1 iterations, and the **for** loop of lines 10-14 makes *high* - *mid* iterations, and so the total number of iterations is 

- (mid - low + 1) + (high - mid) = *high* - *low* + 1 = *n*

With a linear-time FIND-MAX-CROSSING-SUBARRAY procedure in hand, the pseudocode for a divide-and-conquer algorithm to solve the maximum-subarray problem can be written:

<img src='./screenshots/find-max-subarray.png' width = '300'>

The initial call FIND-MAXIMUM-SUBARRAY(*A*, 1, *A.length*) will find a maximum subarray of *A*[1 .. *n*].

Similar to FIND-MAX-CROSSING-SUBARRAY, the recursive procedure FIND-MAXIMUM-SUBARRAY returns a tuple containing the indices that demarcate a maximum subarray, along with the sum of the values in a maximum subarray. Line 1 tests for the base case, where the subarray has just one element. A subarray with one element has only one subarray, which is itself, and so line 2 returns a tuple with the starting and ending indices of just the one element, along with its value. Lines 3-11 handle the recursive case. Line 3 does the divide part, computing the index *mid* of the midpoint. Let's refer to the subarray *A*[*low* .. *mid*] as the ***left subarray*** and to *A*[*mid* + 1 .. *high*] as the ***right subarray***. Because the subarray *A*[*low* .. *high*] contains at least two elements, each of the left and right subarrays must have at least one element. Lines 4 and 5 conquer by recursively finding maximum subarrays within the left and right subarrays. Lines 6-11 form the combine part. Line 6 finds a maximum subarray that crosses the midpoint. (Recall that because line 6 solves a subproblem that is not a smaller instance of the original problem, it is considered to be in the combine part.) Line 7 tests whether the left subarray contains a subarray with the maximum sum, and line 8 returns that maximum subarray. Otherwise, line 9 tests whether the right subarray contains a subarray with the maximum sum, and line 10 returns that maximum subarray. If neither the left nor right subarrays contain a subarray achieving the maximum sum, then a maximum subarray must cross the midpoint, and line 11 returns it. 
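The two procedures described above can be sketched in Python so that they return `(low, high, sum)` tuples. This is a hedged illustration rather than a transcription of the pseudocode in the screenshots: it uses 0-based indices, and the snake_case function names are assumptions.

```python
import math


def find_max_crossing_subarray(a, low, mid, high):
    """Best subarray a[i..j] with low <= i <= mid < j <= high."""
    left_sum, max_left = -math.inf, mid
    total = 0
    for i in range(mid, low - 1, -1):   # scan from mid down to low
        total += a[i]
        if total > left_sum:
            left_sum, max_left = total, i
    right_sum, max_right = -math.inf, mid + 1
    total = 0
    for j in range(mid + 1, high + 1):  # scan from mid + 1 up to high
        total += a[j]
        if total > right_sum:
            right_sum, max_right = total, j
    return max_left, max_right, left_sum + right_sum


def find_maximum_subarray(a, low, high):
    """Return (start, end, sum) of a maximum subarray of a[low..high]."""
    if low == high:                     # base case: one element
        return low, high, a[low]
    mid = (low + high) // 2
    left = find_maximum_subarray(a, low, mid)
    right = find_maximum_subarray(a, mid + 1, high)
    cross = find_max_crossing_subarray(a, low, mid, high)
    if left[2] >= right[2] and left[2] >= cross[2]:
        return left
    if right[2] >= left[2] and right[2] >= cross[2]:
        return right
    return cross


a = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(find_maximum_subarray(a, 0, len(a) - 1))  # (7, 10, 43)
```

With 0-based indexing, the result `(7, 10, 43)` is the same subarray as *A*[8 .. 11] in the 1-based notation used in the text.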

### Analyzing the Divide-and-Conquer Algorithm:

The next step is to set up a recurrence that describes the running time of the recursive FIND-MAXIMUM-SUBARRAY procedure. To simplify, assume that the original problem size is a power of 2, so that all subproblem sizes are integers. 

Denote the running time of FIND-MAXIMUM-SUBARRAY as T(n) on a subarray of *n* elements. Line 1 takes constant time. The base case, when *n* = 1, is easy: line 2 takes constant time, 
- $T(1) = \Theta(1)$

The recursive case occurs when *n* > 1. Lines 1 and 3 take constant time. Each of the subproblems solved in lines 4 and 5 is on a subarray of $n/2$ elements (the assumption that the original problem size is a power of 2 ensures that $n/2$ is an integer), so $T(n/2)$ time is spent solving each of them. There are two subproblems to solve, one for the left subarray and one for the right subarray, so the contribution to the running time from lines 4 and 5 comes to $2T(n/2)$. The call to FIND-MAX-CROSSING-SUBARRAY in line 6 takes $\Theta(n)$ time. Lines 7-11 take only $\Theta(1)$ time. Adding these up for the recursive case and recalling the base case gives:

- $T(n) = \Theta(1)$ if $n = 1$
- $T(n) = \Theta(1) + 2T(n/2) + \Theta(n) + \Theta(1) = 2T(n/2)+\Theta(n)$ if $n > 1$

Combining the two equations gives a recurrence for the running time *T(n)* of FIND-MAXIMUM-SUBARRAY:

\begin{equation*}
T(n)=\begin{cases}
          \Theta(1) \quad &\text{if} \, n = 1 \\
          2T(n/2)+\Theta(n) \quad &\text{if} \, n > 1 \\
     \end{cases}
\end{equation*}
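To get a feel for how this recurrence grows, the sketch below evaluates it with concrete costs substituted for the $\Theta$ terms. Choosing a cost of exactly 1 for the base case and exactly *n* for the linear term is an assumption made only for illustration; with that choice the values match $n \lg n + n$ for powers of 2, which suggests the $\Theta(n \lg n)$ growth associated with merge sort.

```python
def T(n):
    """Evaluate T(n) = 2*T(n/2) + n with T(1) = 1, for n a power of 2."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n


for k in range(1, 6):
    n = 2 ** k
    # columns: n, recurrence value, closed form n*lg(n) + n (here lg(n) = k)
    print(n, T(n), n * k + n)
```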



---

### Wednesday - 2/08 

#### Class Intros: 


**Quiz Topic**: 
   - insertion sort
   - loop invariant (i.s., sum of array, linear search, bubble)
   - Merge sort: DNC, etc.
   - Big Theta, Big Theta: definitions, finding C1 & C2, N0
   - maxSubArray: DNC,
    
*Note:* 4.1-3, 4.1-5 **(Sliding Window Technique)**

- 10 T/F
- 3-7 Pseudocode Analysis and Design


**Problem**: find a subarray whose sum is maximum within a given array (Python)
 - closed range, both sides

**DNC**: it is based on the idea that the problem can be divided into subarrays, mainly:
- Maximum Subarray could be purely in the left half
- Maximum Subarray could be purely in the right half
- Maximum Subarray could cross the midpoint.

[10, 20, -1, -5, 4, -2, 2] 



The Maximum Subarray falls into one and only one of these cases.


---


## Week 4 - Closing references.


