# Linear Regression with Normal Equation and Gradient Descent

Let $X \in \mathfrak{M}_{m\times n}(\mathbb{R})$, $\mathbf{y} \in \mathbb{R}^m$, and $\theta \in \mathbb{R}^n$, and define the squared-error loss
\begin{equation}
\mathcal{L}(\theta) = \| X\theta - \mathbf{y} \|^2
\end{equation}

We want to find the $\theta$ that minimizes the error function $\mathcal{L}(\theta)$.

In [0]:
import numpy as np
from numpy.linalg import inv

In [0]:
num_features = 5
X = np.random.randn(100, num_features)
Y = np.random.randn(100)
theta = np.random.randn(num_features)


In [3]:
error = np.sum((Y - np.dot(X, theta))**2, axis=0)
error

441.1478116816043

## Normal Equation

The minimization problem has a closed-form solution, known as the normal equation, provided $X^tX$ is invertible:
\begin{align*}
\text{argmin}_{\theta} \mathcal{L}(\theta) =  (X^tX)^{-1}X^t\mathbf{y}
\end{align*}

In [4]:
theta_hat = np.dot(inv(X.T @ X) @ X.T, Y)
final_loss = np.sum((Y - np.dot(X, theta_hat))**2, axis=0)
final_loss 

88.15259780536041
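
The explicit inverse in the cell above mirrors the formula directly. As a hedged sketch of two common alternatives, the block below solves the same system $(X^tX)\theta = X^t\mathbf{y}$ with `np.linalg.solve` and with `np.linalg.lstsq`, and checks that all three agree. The names `X_demo`, `y_demo`, and the fixed seed are assumptions made only for this illustration; they are not part of the notebook's data.

```python
import numpy as np

# Illustrative data only (hypothetical names, fixed seed for repeatability).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 5))
y_demo = rng.standard_normal(100)

# Normal equation with an explicit inverse, as in the cell above.
theta_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Same linear system (X^t X) theta = X^t y, solved without forming the inverse.
theta_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Least-squares solver that works on X directly.
theta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

# At the minimizer the gradient 2 X^t (X theta - y) should vanish.
grad_at_min = 2 * X_demo.T @ (X_demo @ theta_solve - y_demo)

print(np.allclose(theta_inv, theta_solve), np.allclose(theta_inv, theta_lstsq))
print(np.max(np.abs(grad_at_min)))
```

Avoiding the explicit inverse is usually preferred when $X^tX$ is poorly conditioned, although for well-conditioned data such as this the results should coincide up to rounding.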

## Gradient Descent

Since $\mathcal{L}(\theta)$ is a differentiable function, we can also find the parameter $\theta$ by gradient descent as follows, where $\eta$ is the learning rate:
\begin{align*}
\nabla_\theta \mathcal{L}(\theta) := \left(\frac{\partial \mathcal{L}(\theta)}{\partial \theta_1}, \ldots, \frac{\partial \mathcal{L}(\theta)}{\partial \theta_n}\right)
\end{align*}
\begin{equation}
\theta^{(t+1)} = \theta^{(t)} - \eta \nabla_{\theta}\mathcal{L}(\theta^{(t)})
\end{equation}

Let $\mathbf{x}_i$ be the $i$th row vector of $X$.
\begin{align*}
\frac{\partial \mathcal{L}(\theta)}{\partial \theta_j} &= \frac{\partial}{\partial \theta_j} \sum\limits_{i=1}^m (\mathbf{x}_i\cdot \theta - y_i)^2 \\
&= \sum\limits_{i=1}^m \frac{\partial (\mathbf{x}_i \cdot \theta - y_i)^2}{\partial \theta_j} \\
&= \sum\limits_{i=1}^m 2(\mathbf{x}_i \cdot \theta - y_i) \frac{\partial (\mathbf{x}_i \cdot \theta - y_i)}{\partial \theta_j} \\
&=  \sum\limits_{i=1}^m 2(\mathbf{x}_i \cdot \theta - y_i) X_{ij} \:\: \text{ where } X_{ij} \text{ is } (i,j) \text{ entry of } X \\
&= 2 [X]^j \cdot (X\theta - \mathbf{y}) \:\: \text{ where } [X]^j \text{ is the jth column vector of } X
\end{align*}


\begin{equation}\therefore \nabla_{\theta}\mathcal{L}(\theta) = 2X^t(X\theta - \mathbf{y})
\end{equation}
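
One way to sanity-check the closed-form gradient $2X^t(X\theta - \mathbf{y})$ is to compare it with a central finite-difference approximation of $\mathcal{L}$. The sketch below does this on small random data; the helper names (`loss`, `analytic_grad`, `numeric_grad`), the step size `eps`, and the data shapes are assumptions chosen for illustration.

```python
import numpy as np

def loss(X, y, theta):
    # Squared-error loss ||X theta - y||^2.
    r = X @ theta - y
    return r @ r

def analytic_grad(X, y, theta):
    # Closed-form gradient derived above: 2 X^t (X theta - y).
    return 2 * X.T @ (X @ theta - y)

def numeric_grad(X, y, theta, eps=1e-6):
    # Central differences, one coordinate of theta at a time.
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (loss(X, y, theta + step) - loss(X, y, theta - step)) / (2 * eps)
    return grad

# Illustrative data only (hypothetical names, fixed seed).
rng = np.random.default_rng(0)
X_chk = rng.standard_normal((20, 3))
y_chk = rng.standard_normal(20)
theta_chk = rng.standard_normal(3)

max_diff = np.max(np.abs(analytic_grad(X_chk, y_chk, theta_chk)
                         - numeric_grad(X_chk, y_chk, theta_chk)))
print(max_diff)  # expected to be tiny, limited only by floating-point rounding
```

Because $\mathcal{L}$ is quadratic in $\theta$, the central difference has no truncation error, so any remaining discrepancy comes from rounding alone.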

In [0]:
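class Regression(object):
    def __init__(self, num_features, lr=0.001):
        # Random initial weights; lr is the learning rate (eta).
        self.weight = np.random.randn(num_features)
        self.lr = lr

    def forward(self, x):
        # Predictions x @ weight, cached for the following backward() call.
        self.logits = np.dot(x, self.weight)
        return self.logits

    def backward(self, x, y):
        # Gradient 2 X^t (X theta - y); assumes forward(x) was just called.
        grad = 2 * x.T @ (self.logits - y)
        # One gradient descent step.
        self.weight = self.weight - self.lr * grad
        return grad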


In [6]:
net = Regression(num_features, lr=0.001)

for _ in range(100):
    logits = net.forward(X)
    loss = np.sum((logits - Y) ** 2, 0)
    grad = net.backward(X, Y)

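# Final loss after training. Each update above uses the full data set,
# so this is plain (batch) gradient descent despite the name sgd_loss.
logits = net.forward(X)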
sgd_loss = np.sum((logits - Y)**2, axis=0)
print(sgd_loss)

88.15259780539971


In [7]:
abs(final_loss - sgd_loss)

3.929301328753354e-11
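
The close agreement between the two losses depends on the learning rate. For this loss the error obeys $\theta^{(t+1)} - \hat\theta = (I - 2\eta X^tX)(\theta^{(t)} - \hat\theta)$, so when $X^tX$ is invertible the iteration converges exactly when $\eta < 1/\lambda_{\max}(X^tX)$. The sketch below illustrates this threshold on freshly generated data; `X_demo`, `y_demo`, `run_gd`, the zero initialization, and the two step sizes are assumptions made for this example and are not the notebook's own variables.

```python
import numpy as np

# Illustrative data only (hypothetical names, fixed seed).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 5))
y_demo = rng.standard_normal(100)

# Largest eigenvalue of X^t X gives the step-size threshold 1 / lambda_max.
lam_max = np.linalg.eigvalsh(X_demo.T @ X_demo).max()
eta_limit = 1.0 / lam_max

def run_gd(eta, steps=100):
    # Batch gradient descent from theta = 0; returns the final loss.
    theta = np.zeros(X_demo.shape[1])
    for _ in range(steps):
        theta = theta - eta * 2 * X_demo.T @ (X_demo @ theta - y_demo)
    return np.sum((X_demo @ theta - y_demo) ** 2)

# Reference: the least-squares minimum of the loss.
theta_best = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]
best_loss = np.sum((X_demo @ theta_best - y_demo) ** 2)

loss_small = run_gd(0.5 * eta_limit)  # below the threshold: converges
loss_large = run_gd(1.5 * eta_limit)  # above the threshold: diverges

print(eta_limit, best_loss, loss_small, loss_large)
```

For a standard normal design of this shape, `eta_limit` typically comes out several times larger than 0.001, which is consistent with the convergence observed above.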