***
## Gradient Descent Algorithm with MSE
***
1.&emsp;Mean Squared Error (MSE) formula:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left ( y_i - \hat{y}_i \right )^2$$

&emsp;&emsp; where $\hat{y}_i = m x_i + b$ is the prediction for sample $i$ and $n$ is the number of samples.

2.&emsp;Gradient of $MSE$ with respect to $m$ and $b$:

$$\frac{\partial MSE}{\partial m} = \frac{2}{n}\sum_{i=1}^{n}-x_i\left ( y_i - \hat{y}_i \right )$$

$$\frac{\partial MSE}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}-\left ( y_i - \hat{y}_i \right )$$
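
Both gradients follow from the chain rule. As a short worked step, write the residual as $r_i = y_i - \left ( mx_i + b \right )$ (the symbol $r_i$ is introduced here only for illustration), so that $MSE = \frac{1}{n}\sum_{i=1}^{n} r_i^2$ and

$$\frac{\partial r_i}{\partial m} = -x_i, \qquad \frac{\partial r_i}{\partial b} = -1$$

$$\frac{\partial MSE}{\partial m} = \frac{1}{n}\sum_{i=1}^{n} 2\,r_i\,\frac{\partial r_i}{\partial m} = \frac{2}{n}\sum_{i=1}^{n} -x_i\,r_i, \qquad \frac{\partial MSE}{\partial b} = \frac{1}{n}\sum_{i=1}^{n} 2\,r_i\,\frac{\partial r_i}{\partial b} = \frac{2}{n}\sum_{i=1}^{n} -r_i$$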
***
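
One way to sanity-check the gradient formulas above is to compare them with a central finite-difference approximation. The following is a minimal sketch, not part of the original notebook: the helper names `mse_loss`, `analytic_gradient`, and `numeric_gradient`, the step size `eps`, and the test point `(0.5, 0.1)` are all chosen here for illustration.

```python
import numpy as np


def mse_loss(m, b, x, y):
    """MSE of the line y = m*x + b on the data (x, y)."""
    return np.mean((y - (m * x + b)) ** 2)


def analytic_gradient(m, b, x, y):
    """Gradient of the MSE computed from the closed-form formulas."""
    residual = y - (m * x + b)
    dm = (2 / len(x)) * np.sum(-x * residual)
    db = (2 / len(x)) * np.sum(-residual)
    return dm, db


def numeric_gradient(m, b, x, y, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    dm = (mse_loss(m + eps, b, x, y) - mse_loss(m - eps, b, x, y)) / (2 * eps)
    db = (mse_loss(m, b + eps, x, y) - mse_loss(m, b - eps, x, y)) / (2 * eps)
    return dm, db


# Same six points that this notebook fits.
x = np.array([0, 1, 1.9, 3, 4, 5])
y = np.array([0, 1, 2, 3.2, 4.1, 5.11])

grad_analytic = analytic_gradient(0.5, 0.1, x, y)
grad_numeric = numeric_gradient(0.5, 0.1, x, y)
print("analytic:", grad_analytic)
print("numeric: ", grad_numeric)
```

If the formulas are right, the two printed pairs should agree to several decimal places.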

In [9]:
import numpy as np


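def mse(y_ref, y_pred):
    """Mean squared error between reference and predicted values (NumPy arrays)."""
    n = len(y_ref)
    return (1/n) * sum([v**2 for v in (y_ref - y_pred)])


def gradient_descent_with_mse(points):
    """Fit y = m*x + b to a list of [x, y] points with batch gradient descent."""
    m, b = 0, 0      # initial slope and intercept
    n_iter = 10      # number of update steps
    n = len(points)
    L = 0.1          # learning rate

    y_ref = np.asarray([p[1] for p in points])
    x = np.asarray([p[0] for p in points])

    for i in range(n_iter):
        # Cost of the current line, measured before the update
        y_pred = m * x + b
        cost = mse(y_ref, y_pred)

        # Partial derivatives of the MSE with respect to m and b
        md = (2/n) * sum(-x * (y_ref - y_pred))
        bd = (2/n) * sum(-(y_ref - y_pred))

        # Step against the gradient
        m = m - (L * md)
        b = b - (L * bd)

        print(f"Iter {i} : cost {cost}, m {m}, b {b}")

    return m, b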
        

In [10]:
points = [[0,0], [1,1], [1.9,2], [3,3.2], [4,4.1], [5,5.11]]
m, b = gradient_descent_with_mse(points)

Iter 0 : cost 9.693683333333333, m 1.8783333333333332, b 0.5136666666666667
Iter 1 : cost 8.941584930555553, m 0.08235277777777794, b -0.008305555555555455
Iter 2 : cost 8.247885703065183, m 1.814901697222222, b 0.4661203425925926
Iter 3 : cost 7.608047518109974, m 0.15800253755771632, b -0.014838235546296208
Iter 4 : cost 7.0178853060435245, m 1.7560882420114807, b 0.42332148457596397
Iter 5 : cost 6.4735395447758055, m 0.22750594146385295, b -0.01986663920493098
Iter 6 : cost 5.971450894116246, m 1.701569723490935, b 0.3847787377090082
Iter 7 : cost 5.508336811394608, m 0.2913721971007954, b -0.02362330583329114
Iter 8 : cost 5.081169993042365, m 1.6510439162088488, b 0.3500531641066387
Iter 9 : cost 4.687158499057824, m 0.3500672358970438, b -0.026309280431750626
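
In the run above, `m` swings between roughly 1.9 and 0.1 on alternate iterations while the cost falls only slowly, which suggests that a learning rate of 0.1 overshoots the minimum on every step. A minimal sketch of the same loop with the rate and iteration count exposed as parameters follows; the function name `gradient_descent`, its `learning_rate` and `n_iter` parameters, and the values 0.05 and 1000 are assumptions made here for illustration. The closed-form least-squares line from `np.polyfit` serves as a reference.

```python
import numpy as np


def gradient_descent(points, learning_rate=0.05, n_iter=1000):
    """Batch gradient descent on MSE; returns m, b and the cost history."""
    x = np.asarray([p[0] for p in points], dtype=float)
    y_ref = np.asarray([p[1] for p in points], dtype=float)
    n = len(points)
    m, b = 0.0, 0.0
    costs = []
    for _ in range(n_iter):
        residual = y_ref - (m * x + b)
        costs.append(np.mean(residual ** 2))   # cost before this update
        md = (2 / n) * np.sum(-x * residual)
        bd = (2 / n) * np.sum(-residual)
        m -= learning_rate * md
        b -= learning_rate * bd
    return m, b, costs


points = [[0, 0], [1, 1], [1.9, 2], [3, 3.2], [4, 4.1], [5, 5.11]]
m_gd, b_gd, costs = gradient_descent(points)

# Closed-form least-squares line for comparison.
m_ls, b_ls = np.polyfit([p[0] for p in points], [p[1] for p in points], 1)
print(f"gradient descent: m {m_gd:.4f}, b {b_gd:.4f}")
print(f"least squares:    m {m_ls:.4f}, b {b_ls:.4f}")
```

With these settings the cost should fall steadily instead of alternating, and the fitted values should end up close to the least-squares line.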


***
## Stochastic Gradient Descent with MSE
***
Instead of the full dataset, each iteration uses a single randomly chosen sample to estimate the gradient and update the parameters.
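
For one sample $(x_j, y_j)$ the sum in the gradient collapses to a single term with $n = 1$. Writing the learning rate as $L$, as in the code, and using the index $j$ only for illustration, the update would be:

$$m \leftarrow m - L \cdot 2\left ( -x_j \right )\left ( y_j - \hat{y}_j \right ), \qquad b \leftarrow b - L \cdot 2\left ( -1 \right )\left ( y_j - \hat{y}_j \right )$$

As a worked check, starting from $m = b = 0$ with $L = 0.01$ and the sample $(3, 3.2)$, the prediction is $\hat{y}_j = 0$, so $m = 0.01 \cdot 2 \cdot 3 \cdot 3.2 = 0.192$ and $b = 0.01 \cdot 2 \cdot 3.2 = 0.064$.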
***

In [30]:
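def stochastic_gradient_descent_with_mse(points):
    """Fit y = m*x + b with stochastic gradient descent, one random sample per step."""
    m, b = 0, 0
    n_iter = 10
    n = len(points)
    L = 0.01           # learning rate
    sample_size = 1    # samples drawn per update

    y_ref = np.asarray([p[1] for p in points])
    x = np.asarray([p[0] for p in points])

    for i in range(n_iter):
        # Draw sample_size random indexes (with replacement) and select those points
        indexes = np.random.randint(0, n, sample_size)
        xs = np.take(x, indexes)
        y_refs = np.take(y_ref, indexes)

        # Predictions for the sampled points, used for the gradient
        y_preds = m * xs + b
        # Cost over the full dataset, used only for reporting
        y_pred = m * x + b
        cost = mse(y_ref, y_pred)

        # Gradient estimated from the sampled points only
        md = (2/sample_size) * sum(-xs * (y_refs - y_preds))
        bd = (2/sample_size) * sum(-(y_refs - y_preds))

        m = m - (L * md)
        b = b - (L * bd)

        print(f"Iter {i} : cost {cost}, m {m}, b {b}")

    return m, b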

In [31]:
m, b = stochastic_gradient_descent_with_mse(points)

Iter 0 : cost 9.693683333333333, m 0.19200000000000003, b 0.064
Iter 1 : cost 6.159186906666666, m 0.20688000000000004, b 0.07888
Iter 2 : cost 5.879423911850666, m 0.26494582400000005, b 0.10944096
Iter 3 : cost 4.949849523857871, m 0.31765797902720005, b 0.137184199488
Iter 4 : cost 4.176011478032049, m 0.5330326897794561, b 0.191027877176064
Iter 5 : cost 1.8285247035498506, m 0.5632886702446889, b 0.2069520774209234
Iter 6 : cost 1.5599181827384139, m 0.6414795849553895, b 0.23301571565782359
Iter 7 : cost 0.9896223093769781, m 0.6623101617266131, b 0.24397917711636233
Iter 8 : cost 0.8546293588614731, m 0.7588525758047878, b 0.26811478063590605
Iter 9 : cost 0.3863482747672564, m 0.7698750581675177, b 0.273916087142606
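
Because the sample index comes from `np.random.randint` without a fixed seed, the numbers above change from run to run. A minimal sketch of a reproducible variant follows, assuming NumPy's `np.random.default_rng` generator; the function name `sgd_seeded` and its `seed`, `learning_rate`, and `n_iter` parameters are illustrative and not part of the original notebook.

```python
import numpy as np


def sgd_seeded(points, seed=0, learning_rate=0.01, n_iter=10):
    """Single-sample SGD on MSE driven by a seeded random generator."""
    rng = np.random.default_rng(seed)
    x = np.asarray([p[0] for p in points], dtype=float)
    y_ref = np.asarray([p[1] for p in points], dtype=float)
    m, b = 0.0, 0.0
    for _ in range(n_iter):
        j = rng.integers(0, len(points))     # index of the sampled point
        error = y_ref[j] - (m * x[j] + b)    # residual of that single sample
        m -= learning_rate * 2 * (-x[j]) * error
        b -= learning_rate * 2 * (-error)
    return m, b


points = [[0, 0], [1, 1], [1.9, 2], [3, 3.2], [4, 4.1], [5, 5.11]]
first = sgd_seeded(points, seed=42)
second = sgd_seeded(points, seed=42)
print(first)
print(second)
```

Two calls with the same seed draw the same sequence of samples, so they should print identical parameters.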


## References
- [Gradient Descent From Scratch](https://towardsdatascience.com/gradient-descent-from-scratch-e8b75fa986cc)
- [Machine Learning Tutorial Python - 4: Gradient Descent and Cost Function](https://www.youtube.com/watch?v=vsWrXfO3wWw&ab_channel=codebasics)