# Ordinary Least Squares

The explanation of this algorithm is taken from chapter 3 of the book *The Elements of Statistical Learning*.

## Importing packages

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from os.path import join
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin

## Explaining the Algorithm
In *ordinary least squares* (OLS) we look for the coefficients $\beta = (\beta_0, \beta_1, ..., \beta_p)^T$ that minimize the residual sum of squares (RSS), given by the following equation:

$$ RSS(\beta) = (\bold{y} - \bold{X} \beta)^T (\bold{y} - \bold{X}  \beta) $$

Here $\bold{X}$ denotes the $N \times (p + 1)$ matrix in which each row is an input vector with a 1 in the first position, so that the intercept is incorporated into the coefficients $\beta$ as the element $\beta_0$. Similarly, $\bold{y}$ is the $N$-vector of outputs in the training set.

One way to minimize this function is to set its derivative with respect to $\beta$ to zero:

$$ \frac{\partial RSS}{\partial\beta}  = -2\bold{X}^T (\bold{y} - \bold{X}  \beta)$$
$$ \bold{X}^T (\bold{y} - \bold{X}  \beta) = 0 $$

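Assuming $\bold{X}$ has full column rank, so that $\bold{X}^T \bold{X}$ is invertible, solving for $\beta$ gives the unique solution:

$$ \hat{\beta} =  (\bold{X}^T \bold{X})^{-1} \bold{X}^T \bold{y}$$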
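
As a short supplementary sketch (an addition to the derivation above, using the same symbols), expanding the product makes the differentiation step explicit, and the second derivative shows why the stationary point is a minimum rather than a maximum or a saddle point:

$$ RSS(\beta) = \bold{y}^T \bold{y} - 2 \beta^T \bold{X}^T \bold{y} + \beta^T \bold{X}^T \bold{X} \beta $$

$$ \frac{\partial^2 RSS}{\partial\beta \, \partial\beta^T} = 2 \bold{X}^T \bold{X} $$

When the columns of $\bold{X}$ are linearly independent, $\bold{X}^T \bold{X}$ is positive definite, so $RSS(\beta)$ is strictly convex and $\hat{\beta}$ is its only minimizer.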

In [None]:
X = np.array([[1], [2], [3], [4]])
y = np.array([1, 3, 2, 5])

# Adding a new column with ones to represent the intercept
_X = np.hstack((np.ones([X.shape[0],1], X.dtype), X))


In [14]:
_X

array([[1, 1],
       [1, 2],
       [1, 3],
       [1, 4]])

In [16]:
weights = np.linalg.inv(_X.T @ _X) @ _X.T @ y
weights

array([0. , 1.1])
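
Forming the inverse explicitly mirrors the formula, but it is usually not the most numerically robust way to get the coefficients. The sketch below shows two alternatives on the same toy data, assuming only NumPy: `np.linalg.solve` applied to the normal equations, and `np.linalg.lstsq` applied to the design matrix directly. The names `X1`, `beta_solve`, `beta_lstsq` and `gradient` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# Same toy data as in the cells above
X = np.array([[1], [2], [3], [4]])
y = np.array([1, 3, 2, 5])

# Design matrix with a leading column of ones for the intercept
X1 = np.hstack((np.ones((X.shape[0], 1)), X))

# Solve (X^T X) beta = X^T y without forming the inverse
beta_solve = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Least-squares solver working on X1 itself; it also copes with rank-deficient X1
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X1, y, rcond=None)

# Sanity check of the normal equations: X^T (y - X beta) should be (numerically) zero
gradient = X1.T @ (y - X1 @ beta_solve)
```

Both approaches should agree with the `weights` computed above up to floating-point error. `rank` reports the numerical rank of the design matrix, which is a quick way to spot collinear columns.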

## Custom Model

In [4]:
class OLSRegressor(BaseEstimator, RegressorMixin):
    def __init__(self):
        ...

    def fit(self, X, y):
        # Prepend a column of ones for the intercept, then solve the normal equations
        self.X = np.hstack((np.ones([X.shape[0],1], X.dtype), X))
        self.y = y
        self.N, self.p = X.shape
        
        self.weights = np.linalg.inv(self.X.T @ self.X) @ self.X.T @ y
        
        return self

    def predict(self, X):
        _X = np.hstack((np.ones([X.shape[0],1], X.dtype), X))
        return _X @ self.weights
    
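    def get_variance(self):
        # Unbiased estimate of the noise variance: RSS / (N - p - 1).
        # self.X already contains the intercept column, so multiply directly;
        # calling predict() here would prepend a second column of ones and fail.
        y_hat = self.X @ self.weights
        return 1/(self.N - self.p - 1) * np.sum((y_hat - self.y)**2)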
    
    def get_params_covariance(self) -> np.ndarray:
        """The variance–covariance matrix of the least squares parameter estimates

        Returns:
            np.ndarray: The variance–covariance matrix
        """
        return np.linalg.inv(self.X.T @ self.X) * self.get_variance()

In [5]:
ols = OLSRegressor().fit(X, y)
y_hat = ols.predict(X)
y_hat

array([1.1, 2.2, 3.3, 4.4])

In [6]:
ols.get_params_covariance()

array([[ 2.025, -0.675],
       [-0.675,  0.27 ]])
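
The diagonal of the variance–covariance matrix holds the estimated variances of the individual coefficients, so standard errors follow directly, and with them the Z-scores $z_j = \hat{\beta}_j / (\hat{\sigma} \sqrt{v_j})$ from the same chapter, where $v_j$ is the $j$-th diagonal element of $(\bold{X}^T \bold{X})^{-1}$. The following is a minimal sketch that recomputes everything with plain NumPy rather than through `OLSRegressor`; the names `X1`, `sigma2`, `std_err` and `z_scores` are illustrative only.

```python
import numpy as np

# Same toy data as above
X = np.array([[1], [2], [3], [4]])
y = np.array([1, 3, 2, 5])
N, p = X.shape

# Design matrix with the intercept column first, so index 0 is the intercept
X1 = np.hstack((np.ones((N, 1)), X))

# Least squares coefficients from the normal equations
beta = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Unbiased noise variance estimate: RSS / (N - p - 1)
residuals = y - X1 @ beta
sigma2 = residuals @ residuals / (N - p - 1)

# Variance-covariance matrix of the coefficient estimates
cov = sigma2 * np.linalg.inv(X1.T @ X1)

# Standard errors are the square roots of the diagonal; Z-scores divide by them
std_err = np.sqrt(np.diag(cov))
z_scores = beta / std_err

print(std_err)
print(z_scores)
```

A Z-score that is large in absolute value suggests the corresponding coefficient differs from zero. With only four observations, as here, such a test has very little power and is shown purely for illustration.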