# Mean Squared Error (MSE)

$$
\mathrm{MSE} \;=\; \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat y_i)^{2},
\quad
\hat y_i \;=\; m\,x_i + b
$$
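
As a quick numeric illustration of the definition, the sketch below evaluates the MSE of a line on three made-up points. The function name `mse` and the sample values are assumptions chosen for illustration; they are not part of the notebook.

```python
import numpy as np


def mse(x, y, m, b):
    """Mean squared error of the line y_hat = m*x + b on the points (x, y)."""
    y_hat = m * x + b
    return np.mean((y - y_hat) ** 2)


# Illustrative data (hypothetical): the points lie on y = 2x.
x_demo = np.array([1.0, 2.0, 3.0])
y_demo = np.array([2.0, 4.0, 6.0])

# With m=1, b=0 the residuals are 1, 2, 3, so MSE = (1 + 4 + 9) / 3.
mse_bad_fit = mse(x_demo, y_demo, m=1.0, b=0.0)
# With m=2, b=0 the line passes through every point, so MSE = 0.
mse_exact_fit = mse(x_demo, y_demo, m=2.0, b=0.0)
print(mse_bad_fit, mse_exact_fit)
```

Because every residual is squared, large misses dominate the average, which is why a poor slope is penalized so heavily.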

---

## Gradients

**Gradient w.r.t. m**

$$
\frac{\partial\,\mathrm{MSE}}{\partial m}
=
-\;\frac{2}{n}\sum_{i=1}^{n}x_i\,(y_i - \hat y_i)
=
\frac{2}{n}\sum_{i=1}^{n}(\hat y_i - y_i)\,x_i
$$

**Gradient w.r.t. b**

$$
\frac{\partial\,\mathrm{MSE}}{\partial b}
=
-\;\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat y_i)
=
\frac{2}{n}\sum_{i=1}^{n}(\hat y_i - y_i)
$$
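
One way to sanity-check the two formulas above is to compare them with central finite differences of the MSE. The following is a minimal sketch; the helper names (`analytic_gradients`, `numeric_gradients`), the step `eps`, and the test point are all assumptions made for illustration.

```python
import numpy as np


def mse(x, y, m, b):
    """Mean squared error of the line y_hat = m*x + b."""
    return np.mean((y - (m * x + b)) ** 2)


def analytic_gradients(x, y, m, b):
    """Gradients from the closed-form expressions, written with (y_hat - y)."""
    residual = (m * x + b) - y
    dm = 2.0 * np.mean(residual * x)
    db = 2.0 * np.mean(residual)
    return dm, db


def numeric_gradients(x, y, m, b, eps=1e-6):
    """Central finite differences of the MSE in m and in b."""
    dm = (mse(x, y, m + eps, b) - mse(x, y, m - eps, b)) / (2 * eps)
    db = (mse(x, y, m, b + eps) - mse(x, y, m, b - eps)) / (2 * eps)
    return dm, db


# Arbitrary (hypothetical) data and parameter values for the check.
x_chk = np.array([0.5, 1.5, 2.0, 4.0])
y_chk = np.array([1.0, 2.5, 2.0, 7.0])
dm_a, db_a = analytic_gradients(x_chk, y_chk, m=0.3, b=-1.0)
dm_n, db_n = numeric_gradients(x_chk, y_chk, m=0.3, b=-1.0)
print(dm_a, dm_n)
print(db_a, db_n)
```

The two pairs should agree up to floating-point rounding, since the MSE is quadratic in `m` and `b` and central differences are exact for quadratics.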


In [14]:
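import numpy as np


def gradient_descent(x, y, lr=0.1, epochs=100):
    """Fit y = m*x + b by batch gradient descent on the MSE, logging each epoch."""
    m, b = 0.0, 0.0  # start from the zero line

    for epoch in range(epochs):
        y_pred = m * x + b
        error = y - y_pred
        cost = np.mean(error ** 2)  # MSE before this epoch's update

        # Gradients of the MSE with respect to m and b
        dm = -2 * np.mean(error * x)
        db = -2 * np.mean(error)

        # Step against the gradient, scaled by the learning rate
        b -= db * lr
        m -= dm * lr
        print(f"m={m} , b={b} ,Epoch {epoch}: Cost= {cost}")
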

In [15]:
# Sample data lying exactly on the line y = 2x + 3
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

gradient_descent(x, y)

m=6.2 , b=1.8 ,Epoch 0: Cost= 89.0
m=-2.320000000000001 , b=-0.4800000000000002 ,Epoch 1: Cost= 165.24
m=9.272000000000002 , b=2.8080000000000007 ,Epoch 2: Cost= 307.59840000000014
m=-6.611200000000004 , b=-1.5168000000000008 ,Epoch 3: Cost= 573.3613440000004
m=15.043520000000008 , b=4.553280000000003 ,Epoch 4: Cost= 1069.452311040001
m=-14.584192000000009 , b=-3.583488000000001 ,Epoch 5: Cost= 1995.4416651264023
m=25.851123200000018 , b=7.683724800000004 ,Epoch 6: Cost= 3723.822955597828
m=-29.43158272000003 , b=-7.563694080000008 ,Epoch 7: Cost= 6949.846718357678
m=46.05611571200004 , b=13.407994368000011 ,Epoch 8: Cost= 12971.181693567845
m=-57.112135475200056 , b=-15.10727393280002 ,Epoch 9: Cost= 24209.895819923546
m=83.79892692992007 , b=23.98146213888002 ,Epoch 10: Cost= 45186.72131291331
m=-108.74758959923213 , b=-29.29418644684803 ,Epoch 11: Cost= 84339.49719952248
m=154.27361938718735 , b=43.61320460206085 ,Epoch 12: Cost= 157417.24894304285
m=-205.09626602586133 , b=-55.8736
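
The cost in the log above grows by a roughly constant factor every epoch, and `m` and `b` flip sign at each step. That pattern is typical of a learning rate that is too large for the data. For a quadratic cost such as the MSE, plain gradient descent is stable only when `lr` is below 2 divided by the largest eigenvalue of the Hessian. The sketch below estimates that bound for the same data and reruns the loop with a smaller step. The helper `fit_line` and the settings `lr=0.05` and `epochs=2000` are assumptions chosen for illustration, not values taken from the notebook.

```python
import numpy as np


def fit_line(x, y, lr, epochs):
    """Same update rule as gradient_descent, but silent; returns (m, b, cost)."""
    m, b = 0.0, 0.0
    for _ in range(epochs):
        error = y - (m * x + b)
        dm = -2 * np.mean(error * x)
        db = -2 * np.mean(error)
        m -= lr * dm
        b -= lr * db
    cost = np.mean((y - (m * x + b)) ** 2)
    return m, b, cost


x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 7, 9, 11, 13], dtype=float)

# Hessian of the MSE with respect to (m, b); it is constant for a linear model.
hessian = 2 * np.array([[np.mean(x ** 2), np.mean(x)],
                        [np.mean(x), 1.0]])
lr_limit = 2 / np.linalg.eigvalsh(hessian).max()
print(f"stability limit for lr is about {lr_limit:.4f}")

# Assumed settings: a step below the limit and enough epochs to settle.
m_fit, b_fit, final_cost = fit_line(x, y, lr=0.05, epochs=2000)
print(f"m={m_fit:.4f}, b={b_fit:.4f}, cost={final_cost:.2e}")
```

On this data the limit comes out a little below 0.1, which is consistent with the default `lr=0.1` diverging. With the smaller step the parameters should approach `m = 2` and `b = 3`, the line the sample points lie on. Standardizing `x` before fitting is another common way to make larger learning rates usable.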