### The math
$y = Xb$, where $y$ is the vector of predictions, $X$ is the feature matrix, and $b$ is the vector of trained model parameters.

For the MSE loss there is a closed-form solution (here $\hat{y}$ denotes the labels):
$$
b = (X^TX)^{-1}X^T\hat{y}
$$
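
One way to see where this formula comes from, sketched in matrix form: write the MSE as $L(b) = \frac{1}{n}\lVert \hat{y} - Xb \rVert^2$, set its gradient to zero, and solve for $b$. The last step assumes $X^TX$ is invertible, that is, the columns of $X$ are linearly independent.

$$
\begin{aligned}
\nabla_b L = -\frac{2}{n}X^T(\hat{y} - Xb) = 0
&\;\Rightarrow\; X^TXb = X^T\hat{y} \\
&\;\Rightarrow\; b = (X^TX)^{-1}X^T\hat{y}
\end{aligned}
$$

Because $L$ is convex in $b$, this stationary point is a global minimum.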

In [20]:
import numpy as np
from numpy.typing import NDArray

n = 10  # number of data points
k = 5  # number of features

features = np.random.randn(n, k)
labels = np.random.randn(n, 1)

X = features
y_hat = labels

**Linear regression with the closed-form solution**

In [21]:
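# Closed-form solution: b = (X^T X)^{-1} X^T y_hat
b_closed_form = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y_hat)
print(b_closed_form)

def MSE(y: NDArray, y_pred: NDArray) -> NDArray:
    # Mean squared error over the rows; for (n, 1) inputs the result has shape (1,).
    return np.mean((y - y_pred) ** 2, axis=0)

print(MSE(y_hat, X.dot(b_closed_form)))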

[[ 0.56149848]
 [-1.09514572]
 [ 0.30861099]
 [-0.85188994]
 [ 0.30270015]]
[0.20643985]
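
Explicitly inverting $X^TX$ mirrors the formula, but it can be numerically fragile when features are nearly collinear. A minimal sketch of two common alternatives follows: `np.linalg.solve` applied to the normal equations, and `np.linalg.lstsq` applied to $X$ directly. The seeded generator and the names `rng`, `X_demo`, `y_demo`, `b_inv`, `b_solve`, and `b_lstsq` are assumptions made for this illustration and are not part of the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n_demo, k_demo = 10, 5
X_demo = rng.standard_normal((n_demo, k_demo))
y_demo = rng.standard_normal((n_demo, 1))

# 1) Explicit inverse, as in the closed-form formula.
b_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# 2) Solve the normal equations X^T X b = X^T y without forming the inverse.
b_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# 3) Least-squares solver that works on X directly.
b_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

# All three should agree on well-conditioned data like this.
print(np.allclose(b_inv, b_solve), np.allclose(b_inv, b_lstsq))
```

On small, well-conditioned data the three routes should agree to floating-point precision; the difference tends to matter only when $X^TX$ is close to singular.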


**Linear regression through gradient descent**

The math:
$$
\begin{aligned}
L &= \frac{1}{n}\sum_{i=1}^n\Big(\hat{y}_i-\sum_{j=1}^k x_{ij}b_j\Big)^2 \\
\Rightarrow \frac{\partial L}{\partial b_m} &= -\frac{2}{n}\sum_{i=1}^n x_{im}\Big(\hat{y}_i-\sum_{j=1}^k x_{ij}b_j\Big) \\
\Rightarrow \frac{\partial L}{\partial b} &= -\frac{2}{n} X^T(\hat{y} - Xb)
\end{aligned}
$$
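
A quick way to gain confidence in a hand-derived gradient is to compare it with central finite differences. The sketch below does that for the matrix-form gradient; the helper names `mse`, `analytic_grad`, and `numeric_grad`, the seed, and the `*_chk` arrays are hypothetical and exist only for this check.

```python
import numpy as np

def mse(b, X, y):
    """Mean squared error of the linear model X @ b against labels y."""
    return float(np.mean((y - X @ b) ** 2))

def analytic_grad(b, X, y):
    """Matrix-form gradient: -(2/n) X^T (y - X b)."""
    return -2.0 / X.shape[0] * X.T @ (y - X @ b)

def numeric_grad(b, X, y, eps=1e-6):
    """Central finite differences, one coordinate of b at a time."""
    grad = np.zeros_like(b)
    for m in range(b.shape[0]):
        step = np.zeros_like(b)
        step[m, 0] = eps
        grad[m, 0] = (mse(b + step, X, y) - mse(b - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
X_chk = rng.standard_normal((10, 5))
y_chk = rng.standard_normal((10, 1))
b_chk = rng.standard_normal((5, 1))

# Largest absolute disagreement between the two gradients.
max_diff = np.max(np.abs(analytic_grad(b_chk, X_chk, y_chk)
                         - numeric_grad(b_chk, X_chk, y_chk)))
print(max_diff)
```

Because the loss is quadratic in $b$, the central difference carries no truncation error, so any remaining gap should come from floating-point rounding alone.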

In [35]:
max_epochs = 100
learning_rate = 0.1

# Start from random parameters and record the loss at every epoch.
b_gd = np.random.randn(k, 1)
loss = []
for i in range(1, max_epochs + 1):
    pred = X.dot(b_gd)
    loss.append(MSE(y_hat, pred))
    # Gradient of the MSE with respect to b: -(2/n) X^T (y_hat - X b)
    gradient = -2 / n * X.T.dot(y_hat - pred)
    b_gd -= learning_rate * gradient

    if i % 10 == 0:
        print(f'At epoch {i}, loss {loss[-1]}')

print(b_gd)
print(MSE(y_hat, X.dot(b_gd)))

At epoch 10, loss [0.42182709]
At epoch 20, loss [0.22779113]
At epoch 30, loss [0.20916916]
At epoch 40, loss [0.20684663]
At epoch 50, loss [0.2065084]
At epoch 60, loss [0.20645322]
At epoch 70, loss [0.20644295]
At epoch 80, loss [0.20644069]
At epoch 90, loss [0.2064401]
At epoch 100, loss [0.20643993]
[[ 0.27057739]
 [-0.38373311]
 [ 0.46913375]
 [-0.53526969]
 [-0.06046746]]
[0.20643992]
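
Both methods minimize the same convex loss, so gradient descent should approach the least-squares solution as training continues. The sketch below compares the two under assumptions that are not in the cells above: seeded data, zero initialization, 2000 iterations, and a step size of one over the largest eigenvalue of the Hessian $\frac{2}{n}X^TX$. All names (`X_demo`, `y_demo`, `b_ref`, `step_size`, `losses`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n_demo, k_demo = 10, 5
X_demo = rng.standard_normal((n_demo, k_demo))
y_demo = rng.standard_normal((n_demo, 1))

def mse(b):
    """Mean squared error of X_demo @ b against y_demo."""
    return float(np.mean((y_demo - X_demo @ b) ** 2))

# Reference solution from a least-squares solver.
b_ref, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

# The Hessian of the MSE is (2/n) X^T X; stepping with 1 / (largest eigenvalue)
# keeps the loss from increasing on this quadratic objective.
hessian = 2.0 / n_demo * X_demo.T @ X_demo
step_size = 1.0 / np.linalg.eigvalsh(hessian).max()

b_gd = np.zeros((k_demo, 1))
losses = [mse(b_gd)]
for _ in range(2000):
    gradient = -2.0 / n_demo * X_demo.T @ (y_demo - X_demo @ b_gd)
    b_gd -= step_size * gradient
    losses.append(mse(b_gd))

print("least-squares loss:", mse(b_ref))
print("gradient-descent loss:", losses[-1])
print("largest parameter gap:", np.max(np.abs(b_gd - b_ref)))
```

With a fixed rate such as `learning_rate = 0.1`, convergence depends on the particular random draw of `X`: the iteration is stable only when the rate is below $2/\lambda_{\max}$ of that Hessian.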
