## Gradient Descent

### 1. Problem Understanding

-   What is the problem you are trying to solve?

    Finding and optimizing w and b using gradient descent. The closer w and b are to their optimal values, the better the model fits the data.

-   What kind of data are you working with?

    Numerical data, assumed to be Gaussian-distributed, with a roughly linear relationship between x and y.

-   What are the goals and objectives of the project?

    To find the values of w and b that yield the lowest cost.
    
-   What is the expected output of the machine learning algorithm?

    Two numbers: the optimized values of w and b.

-   What are the constraints and limitations of the problem?

    None?

### 2. Equation


In lecture, *gradient descent* was described as:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\;  w &= w -  \alpha \frac{\partial J(w,b)}{\partial w} \tag{3}  \; \newline 
 b &= b -  \alpha \frac{\partial J(w,b)}{\partial b}  \newline \rbrace
\end{align*}$$
where the parameters $w$ and $b$ are updated simultaneously.  
The gradient is defined as:
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
  \frac{\partial J(w,b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}\\
\end{align}
$$
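
As a brief aside, equations (4) and (5) can be recovered from the cost function. The sketch below assumes the squared-error cost that `compute_cost` implements in the code section, together with the linear model $f_{w,b}(x) = wx + b$, and applies the chain rule:

$$
\begin{align*}
J(w,b) &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2, \qquad f_{w,b}(x^{(i)}) = w x^{(i)} + b \\
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} 2\,(f_{w,b}(x^{(i)}) - y^{(i)}) \, \frac{\partial f_{w,b}(x^{(i)})}{\partial w} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})\,x^{(i)} \\
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} 2\,(f_{w,b}(x^{(i)}) - y^{(i)}) \, \frac{\partial f_{w,b}(x^{(i)})}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})
\end{align*}
$$

The factor $\frac{1}{2}$ in the cost cancels the $2$ produced by differentiating the square, which is why the gradients carry $\frac{1}{m}$ rather than $\frac{1}{2m}$.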

A *simultaneous* update means that you calculate the partial derivatives for all the parameters before updating any of the parameters.
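
To make the difference concrete, here is a minimal sketch of a single step done both ways, using the same four data points and learning rate as the code section. The helper `gradient` and the names `w_sim`, `b_sim`, `w_seq`, `b_seq` are introduced only for this illustration.

```python
import numpy as np

x = np.array([2, 5, 8, 10])
y = np.array([100, 300, 600, 900])
alpha = 1.0e-2

def gradient(x, y, w, b):
    # Illustrative helper: partial derivatives from equations (4) and (5)
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)

w, b = 0.0, 0.0

# Simultaneous update: both derivatives are evaluated at the old (w, b)
dj_dw, dj_db = gradient(x, y, w, b)
w_sim = w - alpha * dj_dw
b_sim = b - alpha * dj_db

# Sequential update (not what equation (3) describes):
# the derivative for b is evaluated after w has already changed
w_seq = w - alpha * gradient(x, y, w, b)[0]
b_seq = b - alpha * gradient(x, y, w_seq, b)[1]

print(f"simultaneous: w = {w_sim:.6f}, b = {b_sim:.6f}")
print(f"sequential:   w = {w_seq:.6f}, b = {b_seq:.6f}")
```

After one step both versions agree on w, but b differs (4.75 for the simultaneous update versus about 2.33 for the sequential one), because the sequential version computes the derivative for b with the already-updated w.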

### 3. Code Implementation

In [44]:
import numpy as np

# Training data: four (x, y) pairs
x = np.array([2, 5, 8, 10])
y = np.array([100, 300, 600, 900])
m = len(x)  # number of training examples

# Initial parameter values
w = 0
b = 0

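def compute_cost(x, y, w, b):
    # Squared-error cost J(w,b) = 1/(2m) * sum((f_wb - y)^2)
    m = len(x)
    f_wb = w * x + b
    cost = (f_wb - y) ** 2
    total_cost = (1 / (2 * m)) * np.sum(cost)
    return total_cost

def compute_gradient(x, y, w, b):
    # Partial derivatives of J with respect to w and b, equations (4) and (5)
    m = len(x)  # use the size of the data passed in rather than the global m
    f_wb = w * x + b
    dj_dw = (1 / m) * np.sum((f_wb - y) * x)
    dj_db = (1 / m) * np.sum(f_wb - y)
    return dj_dw, dj_db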

compute_gradient(x,y,w,b)


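def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    # Run num_iters steps of gradient descent starting from (w_in, b_in)

    J_history = []  # cost after each update
    p_history = []  # [w, b] after each update
    b = b_in
    w = w_in

    for i in range(num_iters):
        # Compute both partial derivatives before changing either parameter
        dj_dw, dj_db = gradient_function(x, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # Cap the history length to limit memory use on long runs
        if i < 100000:
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])

    return w, b, J_history, p_history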


w_init = 0
b_init = 0
iterations = 10000
tmp_alpha = 1.0e-2

w_final, b_final, J_hist, p_hist = gradient_descent(x,y,w_init, b_init, tmp_alpha, iterations, compute_cost,compute_gradient)

print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")


(w,b) found by gradient descent: ( 98.6395,-141.4966)
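
The equation in section 2 says "repeat until convergence", while the run above uses a fixed number of iterations. One possible way to express a convergence test, sketched here under the assumption that "converged" means the gradient norm has dropped below a small tolerance, is shown below. The function `gradient_descent_until_converged` and the parameters `tol` and `max_iters` are hypothetical names chosen for this example.

```python
import numpy as np

def compute_gradient(x, y, w, b):
    # Same gradient as equations (4) and (5)
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)

def gradient_descent_until_converged(x, y, w, b, alpha, tol=1e-6, max_iters=100000):
    # Hypothetical helper: stop once the gradient norm falls below tol,
    # or give up after max_iters steps
    for i in range(max_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        if np.hypot(dj_dw, dj_db) < tol:
            return w, b, i
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b, max_iters

x = np.array([2, 5, 8, 10])
y = np.array([100, 300, 600, 900])

w_c, b_c, n_steps = gradient_descent_until_converged(x, y, 0.0, 0.0, 1.0e-2)
print(f"(w,b) after {n_steps} steps: ({w_c:8.4f},{b_c:8.4f})")
```

The tolerance is a judgment call: a looser `tol` stops earlier with less precise parameters, and a stricter one takes more steps.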


In [45]:
compute_cost(x,y,w_final, b_final)

1241.4965986394564
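
The final cost is not zero because the four points do not lie exactly on one line. As a sanity check that gradient descent reached the minimum and not just a nearby point, the result can be compared with a closed-form least-squares fit. This sketch uses `np.polyfit` with degree 1, which returns the coefficients with the highest power first (slope, then intercept). The names `w_ls` and `b_ls` are introduced for this example.

```python
import numpy as np

x = np.array([2, 5, 8, 10])
y = np.array([100, 300, 600, 900])

# Degree-1 least-squares fit: returns [slope, intercept]
w_ls, b_ls = np.polyfit(x, y, 1)

# Cost at the least-squares solution, using the same formula as compute_cost
cost_ls = np.sum((w_ls * x + b_ls - y) ** 2) / (2 * len(x))

print(f"(w,b) from least squares: ({w_ls:8.4f},{b_ls:8.4f})")
print(f"cost at least-squares solution: {cost_ls:.4f}")
```

The least-squares parameters agree with the gradient descent result to the four decimal places printed above, which suggests that 10000 iterations at this learning rate were enough for this data.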