## **Optimization Algorithms**

In everyday life we often want to optimize our tasks: for example, we want to complete as many tasks as possible in the shortest time, or a company may want to increase sales while spending less on advertising.

In ML and DL we want the model with the lowest possible error. For this we can use an optimization algorithm such as **Gradient Descent**.

### **Gradient Descent**

This algorithm seeks to minimize a cost function; for example, in regression we can minimize the mean squared error (MSE). For gradient descent to reliably find the global minimum, the function should meet two requirements:
* Differentiable -> it has a derivative
* Convex
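
To illustrate why convexity matters, here is a minimal sketch in plain Python. The two functions and the helper `gradient_descent` are hypothetical examples chosen for illustration, not part of the original notebook. On the convex function both starting points end at the same minimum, while on the non-convex one the result depends on where the search starts.

```python
def gradient_descent(grad, x_start, learning_rate=0.01, steps=1000):
    """Repeatedly step against the gradient and return the final x."""
    x = x_start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Convex example (assumed for illustration): f(x) = (x - 3)**2, so f'(x) = 2*(x - 3)
def convex_grad(x):
    return 2 * (x - 3)

# Non-convex example (assumed for illustration): f(x) = x**4 - 3*x**2 + x,
# so f'(x) = 4*x**3 - 6*x + 1
def non_convex_grad(x):
    return 4 * x**3 - 6 * x + 1

convex_left = gradient_descent(convex_grad, x_start=-2.0)
convex_right = gradient_descent(convex_grad, x_start=2.0)

non_convex_left = gradient_descent(non_convex_grad, x_start=-2.0)
non_convex_right = gradient_descent(non_convex_grad, x_start=2.0)

print("convex:", convex_left, convex_right)
print("non-convex:", non_convex_left, non_convex_right)
```

In the non-convex case the two runs settle in different valleys, so only one of them can be the global minimum.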


#### **Gradient Descent in Linear Regression**

The equation of linear regression is:

$\hat{y} = b + wx$

The cost function:

* $ J(w,b) = \frac{1}{N}\sum_{i=1}^{N}{(y^{i}-\hat{y}^{i})^2} $
   * $ J(w,b) = \frac{1}{N}\sum_{i=1}^{N}{(y^{i}-(wx^{i}+b))^2} $
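
As a small sketch of how this cost could be evaluated, the function below computes $J(w,b)$ in plain Python. The name `mse_cost` and the toy data are assumptions made for this example.

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error J(w, b) for the line y_hat = w*x + b."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n

# Toy data lying exactly on the line y = 2x (assumed for illustration)
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect = mse_cost(2.0, 0.0, xs, ys)  # the true line, so every error is zero
poor = mse_cost(0.0, 0.0, xs, ys)     # predicts 0 everywhere: (4 + 16 + 36) / 3

print(perfect, poor)
```

The closer `w` and `b` are to the line that generated the data, the smaller the cost.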

To run this algorithm we need initial values for w and b, and a learning rate $\alpha$. The learning rate is a small value, for example 0.01.
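
The sketch below gives a rough idea of why the learning rate is kept small. It minimizes the hypothetical one-dimensional function $f(w) = w^2$ (gradient $2w$) with three different rates. The function, the helper name `run_gradient_descent`, and the chosen rates are assumptions for this example only.

```python
def run_gradient_descent(learning_rate, steps=50, w_start=1.0):
    """Minimise f(w) = w**2 (gradient 2*w) and return the final w."""
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w

small = run_gradient_descent(0.01)     # moves towards 0 slowly
medium = run_gradient_descent(0.1)     # moves towards 0 faster
too_large = run_gradient_descent(1.1)  # overshoots on every step and diverges

print(small, medium, too_large)
```

A rate that is too small wastes iterations, while a rate that is too large makes each step jump past the minimum by more than the previous one.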

* $ w = w - \alpha \frac{\partial J(w,b)}{\partial w} $
   * $ \frac{\partial J(w,b)}{\partial w} = \frac{1}{N}\sum_{i=1}^{N}{-2x^{i}(y^{i}-(wx^{i}+b))} $

* $ b = b - \alpha \frac{\partial J(w,b)}{\partial b} $
   * $ \frac{\partial J(w,b)}{\partial b} = \frac{1}{N}\sum_{i=1}^{N}{-2(y^{i}-(wx^{i}+b))} $
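
One way to gain confidence in the partial derivatives is to compare them with a numerical approximation. The sketch below does this with central finite differences. The helper names `cost` and `gradients`, the sample data, and the step `eps` are assumptions for this example.

```python
def cost(w, b, xs, ys):
    """Mean squared error of the line w*x + b on the data."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n

def gradients(w, b, xs, ys):
    """Analytic partial derivatives of the cost, averaged over all samples."""
    n = len(xs)
    dj_dw = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    dj_db = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    return dj_dw, dj_db

# Arbitrary sample data and parameter values (assumed for illustration)
xs = [0.5, -1.0, 2.0, 3.0]
ys = [3.0, -4.5, 10.5, 15.0]
w, b, eps = 0.3, -0.2, 1e-6

dj_dw, dj_db = gradients(w, b, xs, ys)

# Central differences: (J(p + eps) - J(p - eps)) / (2 * eps)
num_dw = (cost(w + eps, b, xs, ys) - cost(w - eps, b, xs, ys)) / (2 * eps)
num_db = (cost(w, b + eps, xs, ys) - cost(w, b - eps, xs, ys)) / (2 * eps)

print(dj_dw, num_dw)
print(dj_db, num_db)
```

If the analytic and numerical values agree to several decimal places, the derivative formulas are very likely correct.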

In [2]:
# Gradient Descent for Linear Regression
# yhat = wx + b 
# loss = sum((y - yhat)**2) / N
import numpy as np
# Generate synthetic data: y = 5x plus one random constant offset
x = np.random.randn(10,1)
y = 5*x + np.random.rand()
# Parameters
w = 0.0 
b = 0.0 
# Hyperparameter 
learning_rate = 0.01

# Create gradient descent function
def descend(x, y, w, b, learning_rate): 
    djdw = 0.0 
    djdb = 0.0 
    N = x.shape[0]
    # Accumulate the gradient of (y - (w*x + b))**2 over all samples
    for xi, yi in zip(x,y):
        djdw += -2*xi*(yi-(w*xi+b))
        djdb += -2*(yi-(w*xi+b))
    
    # Update w and b using the averaged gradients
    w = w - learning_rate*(1/N)*djdw
    b = b - learning_rate*(1/N)*djdb
    return w, b 

# Iteratively make updates
for epoch in range(800): 
    w,b = descend(x,y,w,b,learning_rate)
    yhat = w*x + b
    loss = np.mean((y - yhat)**2, axis=0)
    print(f'{epoch} loss is {loss}, paramters w:{w}, b:{b}')
print(x,y)

0 loss is [32.75208502], paramters w:[0.14519672], b:[-0.04138703]
1 loss is [30.6680473], paramters w:[0.28553151], b:[-0.08037741]
2 loss is [28.72809926], paramters w:[0.42117808], b:[-0.11707159]
3 loss is [26.92193287], paramters w:[0.5523038], b:[-0.15156617]
4 loss is [25.2399871], paramters w:[0.67906989], b:[-0.18395399]
5 loss is [23.67339354], paramters w:[0.80163166], b:[-0.21432428]
6 loss is [22.21392598], paramters w:[0.92013876], b:[-0.24276283]
7 loss is [20.85395363], paramters w:[1.03473534], b:[-0.26935209]
8 loss is [19.58639785], paramters w:[1.14556028], b:[-0.2941713]
9 loss is [18.40469189], paramters w:[1.25274738], b:[-0.3172966]
10 loss is [17.30274371], paramters w:[1.35642556], b:[-0.3388012]
11 loss is [16.27490142], paramters w:[1.45671902], b:[-0.35875541]
12 loss is [15.31592127], paramters w:[1.55374743], b:[-0.37722683]
13 loss is [14.42093797], paramters w:[1.64762607], b:[-0.39428038]
14 loss is [13.58543717], paramters w:[1.73846605], b:[-0.409978