# Multiple Regression

## Key concepts
- To go beyond a single-predictor linear model, we can add more variables (features) to the model.

$$
y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \alpha + \varepsilon_i
$$

Where:
- $y_i$ = response (dependent) variable
- $x_{i1}, x_{i2}, x_{i3}$ = predictor variables
- $\beta_1$, $\beta_2$, $\beta_3$ = coefficients
- $\alpha$ = intercept (written as $\beta_0$ in the general form below)
- $\varepsilon_i$ = error term (noise)

> The $\varepsilon_i$ term can be understood in more detail as follows:
> - It is an error term representing the other factors that this simple model does not account for.
> - In 3Blue1Brown (3b1b) terms, the model is trying to do _book-keeping_. [Reference](https://youtu.be/9-Jl0dxWQs8?si=35nObUMC-FQVgPPh)

-----------

## Multiple linear regression intuition

- Model a response variable $y_i$ using multiple input features.

### Model:
$$
y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i
$$

#### In vector notation:

$$
y_i = \mathbf{x}_i \cdot \boldsymbol{\beta} + \varepsilon_i
$$

Where:
- $\mathbf{x}_i = [1, x_{i1}, x_{i2}, x_{i3}, \dots, x_{ik}]$ → input vector with a leading 1 for the intercept
- $\boldsymbol{\beta} = [\beta_0, \beta_1, \beta_2, \beta_3, \dots, \beta_k]$ → parameter vector (includes the intercept)

------------

In [3]:
import os
os.chdir("..")

In [9]:
from support.linear_algebra import dot, Vector

def predict(x, beta):
    """Predicts y given input vector x and coefficients beta.
    Assumes x[0] is 1 for the intercept term.
    """

    return dot(x, beta)

In [7]:
x_i = [1, 49, 4, 0] # [intercept, var-1, var-2, categorical variable 3]
beta = [1.0, 1.5, -2.0, 3.0] # example weights

predict(x_i, beta)

66.5
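
The result can be checked by hand by expanding the dot product term by term, using the example values of `x_i` and `beta` from the cell above:

$$
\mathbf{x}_i \cdot \boldsymbol{\beta} = (1)(1.0) + (49)(1.5) + (4)(-2.0) + (0)(3.0) = 1 + 73.5 - 8 + 0 = 66.5
$$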

# Assumptions Behind Multiple Linear Regression (Least Squares)

For least squares to give reliable coefficient estimates, two key assumptions must hold:

1. No Perfect Multicollinearity

> No predictor (e.g., `var-2`) should be an exact linear combination of the other predictors (e.g., `var-1`, `var-3`).

**Why is this important?**

> If one variable can be formed exactly by combining others, the model can’t determine how much weight to assign to each.
> It becomes impossible to uniquely estimate the coefficients.
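
A small sketch can make the non-uniqueness concrete. The data below is hypothetical (not from the notebook), and `dot` is redefined locally as a stand-in for `support.linear_algebra.dot` so the block runs on its own. The third predictor is built as the exact sum of the first two, and two different coefficient vectors then produce identical predictions:

```python
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    """Sum of element-wise products (local stand-in for the notebook's dot)."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

# Hypothetical rows of the form [1, var_1, var_2, var_3],
# where var_3 = var_1 + var_2 exactly (perfect multicollinearity).
rows = [
    [1, 2, 1, 3],
    [1, 4, 3, 7],
    [1, 5, 0, 5],
    [1, 1, 6, 7],
]

# Two different coefficient vectors.
beta_a = [1.0, 2.0, 3.0, 0.0]  # all weight on var_1 and var_2
beta_b = [1.0, 0.0, 1.0, 2.0]  # weight shifted onto var_3

predictions_a = [dot(x, beta_a) for x in rows]
predictions_b = [dot(x, beta_b) for x in rows]

# Both vectors fit the data equally well, so the data cannot pick one.
print(predictions_a == predictions_b)
```

Because `1 + 2*var_1 + 3*var_2` and `1 + var_2 + 2*(var_1 + var_2)` are the same expression, no amount of data of this form can distinguish `beta_a` from `beta_b`.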

------------------

2. No Correlation Between Predictors and Errors

The model assumes that the input variables are ***not*** correlated with the error terms.

**Why does this matter?**
> If this assumption fails, the least squares estimates of the coefficients are systematically biased.
> For example, if a relevant variable is left out of the model and is correlated with an included predictor, its effect ends up in the error term, and the included predictor's coefficient absorbs part of that effect.
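
The bias can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data is invented, the "true" relationship is taken to be `y = 2*x + 3*z`, and the helper `least_squares_slope` uses the standard single-predictor formula (covariance over variance) rather than anything from the notebook. Leaving `z` out pushes its effect into the error term, which is then correlated with `x`:

```python
from typing import List

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

def least_squares_slope(xs: List[float], ys: List[float]) -> float:
    """Least squares slope for y = alpha + beta * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    return covariance / variance

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
zs = [2.0, 1.0, 4.0, 3.0, 6.0]  # omitted variable, correlated with xs

# Assumed true relationship: the coefficient on x is 2.
ys = [2 * x + 3 * z for x, z in zip(xs, zs)]

# Regressing y on x alone: the slope absorbs part of z's effect.
slope = least_squares_slope(xs, ys)
print(slope)  # noticeably larger than the true coefficient of 2
```

In this toy data the slope of `z` on `x` is 1, so the estimate comes out near `2 + 3 * 1 = 5` instead of 2.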

------------------

#### Summary
- No predictor should be an exact linear combination of the others.
- Predictors should be uncorrelated with the errors.

If these assumptions are not met, the model may still fit and produce predictions, but the coefficients can be misleading: they will not reflect the true underlying relationships.

------------------

## Fitting the model

> We choose `beta` to minimise the sum of squared errors.
> *Gradient descent* will be used to find it.
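
Written out with the symbols from the model above, the quantity being minimised is as follows. Here $n$ is introduced as the number of observations, and the name SSE is a label chosen for this sketch:

$$
\text{SSE}(\boldsymbol{\beta}) = \sum_{i=1}^{n} (y_i - \mathbf{x}_i \cdot \boldsymbol{\beta})^2
$$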

In [10]:
from typing import List

def error(x: Vector, y: float, beta: Vector) -> float:
    return predict(x, beta) - y 

def squared_error(x: Vector, y: float, beta: Vector) -> float:
    return error(x, y, beta) ** 2

In [11]:
x = [1,2,3]
y = 30
beta = [4,4,4] # so prediction = 4+8+12=24

In [12]:
assert error(x, y, beta) == -6
assert squared_error(x, y, beta) == 36

### Formula Behind `sqerror_gradient`

> The gradient of the squared error loss for a single data point, with respect to the parameter vector $\boldsymbol{\beta}$, is:

$$
\nabla_{\beta} \text{SE} = \frac{\partial}{\partial \beta} (y - \hat{y})^2 = -2 (y - \hat{y}) \cdot \mathbf{x}
$$

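Equivalently, moving the minus sign inside the parentheses (with $\hat{y} = \mathbf{x} \cdot \boldsymbol{\beta}$, so that $\partial \hat{y} / \partial \boldsymbol{\beta} = \mathbf{x}$):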

$$
\nabla_{\beta} \text{SE} = 2 \cdot (\hat{y} - y) \cdot \mathbf{x}
$$
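
One way to sanity-check the formula above is to compare it with a finite-difference estimate. The sketch below is self-contained and reuses the example values `x = [1, 2, 3]`, `y = 30`, `beta = [4, 4, 4]`. The helper `estimate_gradient` and the step size `h` are assumptions made for this illustration, and `dot` is a local stand-in for the notebook's import:

```python
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def squared_error(x: Vector, y: float, beta: Vector) -> float:
    return (dot(x, beta) - y) ** 2

def estimate_gradient(x: Vector, y: float, beta: Vector,
                      h: float = 1e-6) -> Vector:
    """Central-difference approximation of the gradient w.r.t. beta."""
    gradient = []
    for j in range(len(beta)):
        # Nudge only the j-th coefficient up and down by h.
        beta_plus = [b + (h if k == j else 0.0) for k, b in enumerate(beta)]
        beta_minus = [b - (h if k == j else 0.0) for k, b in enumerate(beta)]
        slope = (squared_error(x, y, beta_plus)
                 - squared_error(x, y, beta_minus)) / (2 * h)
        gradient.append(slope)
    return gradient

x = [1.0, 2.0, 3.0]
y = 30.0
beta = [4.0, 4.0, 4.0]

# Analytic gradient from the formula: 2 * (y_hat - y) * x
analytic = [2 * (dot(x, beta) - y) * x_j for x_j in x]
numeric = estimate_gradient(x, y, beta)

# The two should agree up to floating-point error.
print(analytic, numeric)
```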

In [13]:
# using calculus

def sqerror_gradient(x: Vector, y: float, beta: Vector) -> Vector:
    err = error(x, y, beta)
    return [2 * err * x_i for x_i in x]

In [14]:
assert sqerror_gradient(x, y, beta) == [-12, -24, -36]
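
With the gradient in hand, the fitting step described earlier can be sketched as a plain batch gradient descent loop. This is a minimal illustration under stated assumptions, not the notebook's own fitting routine: the function name `fit`, the learning rate, the step count, the zero starting point, and the toy data (generated from `y = 1 + 2*x` with no noise) are all hypothetical choices.

```python
from typing import List

Vector = List[float]

def dot(v: Vector, w: Vector) -> float:
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def sqerror_gradient(x: Vector, y: float, beta: Vector) -> Vector:
    err = dot(x, beta) - y
    return [2 * err * x_j for x_j in x]

def fit(xs: List[Vector], ys: List[float],
        learning_rate: float = 0.05, num_steps: int = 5000) -> Vector:
    """Batch gradient descent on the mean squared error (a sketch)."""
    beta = [0.0] * len(xs[0])  # assumed starting point
    n = len(xs)
    for _ in range(num_steps):
        # Average the per-point gradients over the whole data set.
        grad = [0.0] * len(beta)
        for x, y in zip(xs, ys):
            point_grad = sqerror_gradient(x, y, beta)
            grad = [g + pg / n for g, pg in zip(grad, point_grad)]
        # Step against the gradient.
        beta = [b - learning_rate * g for b, g in zip(beta, grad)]
    return beta

# Toy data: leading 1 for the intercept, one predictor, no noise.
xs = [[1.0, float(v)] for v in range(5)]
ys = [1.0 + 2.0 * x[1] for x in xs]

beta_hat = fit(xs, ys)
print(beta_hat)  # should be close to [1.0, 2.0]
```

The learning rate has to be small enough for the updates to settle; on data with larger feature values the value used here would likely need to be reduced, or the features rescaled first.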
