# Multivariate-Linear-Regression-From-Scratch

**Model:**
$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \epsilon_i
$$

## General Assumptions

- Features (independent variables) are linearly independent of one another, and each has a linear relationship with the dependent variable

- Features are all uncorrelated with epsilon (the error term of the model)

- For the sake of implementation, we assume the value of epsilon is zero on average, so its effect is ignored in the code
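
Stated a little more formally (a sketch only; the matrix $X$, with one row $(1, x_{i1}, \dots, x_{ik})$ per observation, is notation assumed here rather than taken from the text), the assumptions amount to roughly:

$$
\operatorname{rank}(X) = k + 1, \qquad \operatorname{Cov}(x_{ij}, \epsilon_i) = 0 \ \text{ for } j = 1, \dots, k, \qquad \mathbb{E}[\epsilon_i] = 0
$$

The rank condition expresses linear independence of the features (together with the constant column), and it requires at least $k + 1$ observations.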


**Implemented Model**

```python
# y_i = beta_0 + beta_1 * x_i_1 + beta_2 * x_i_2 + ... + beta_k * x_i_k + epsilon
#
# Beta = [beta_0, beta_1, beta_2, ..., beta_k]
#
# X_i  = [1, x_i_1, x_i_2, ..., x_i_k]   # the leading 1 multiplies the intercept beta_0
#
# y_i = Beta . X_i + epsilon             # dot product of Beta and X_i
```
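
As a concrete illustration, the dot-product form can be sketched with NumPy. This assumes the common convention of prepending a constant 1 to each feature vector so the intercept is part of the dot product; the coefficient and feature values below are made up for the example.

```python
import numpy as np

# Hypothetical coefficients [beta_0, beta_1, beta_2], chosen only for illustration
beta = np.array([1.0, 2.0, 3.0])

# Raw features of one observation, without the intercept entry
raw_x = np.array([10.0, 100.0])

# Prepend a constant 1 so that beta_0 is picked up by the dot product
x = np.concatenate(([1.0], raw_x))

y_hat = x.dot(beta)  # 1*1 + 2*10 + 3*100
print(y_hat)
```

With this layout no separate `+ beta_0` term is needed, since the intercept is just the coefficient of the constant feature.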

## Imports

```python
import math
import random
import numpy as np
import pandas as pd
from collections import Counter
from typing import List
import tqdm
```

Type definitions:

```python
Vector = np.ndarray
```

## Gradient Descent

Gradient descent is used to find the minimum of the loss function.

```python
def descent_one_step(v: Vector, gradient: Vector, learning_rate: float) -> Vector:
    """Starts from v and moves one step, scaled by learning_rate, in the opposite direction of the gradient"""
    assert len(v) == len(gradient), "vector and its gradient are of different lengths"

    # np.asarray lets plain lists be passed in as well as arrays
    step = (-learning_rate) * np.asarray(gradient)
    return np.asarray(v) + step
```
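
To see the update rule at work, here is a minimal sketch that applies it repeatedly to a toy function, f(v) = v_1^2 + ... + v_n^2, whose gradient is 2v and whose minimum is at the origin. The toy function, starting point, and step count are assumptions made for the example; `descent_one_step` is restated so the snippet runs on its own.

```python
import numpy as np


def descent_one_step(v, gradient, learning_rate):
    """One gradient descent update (restated so this snippet is self-contained)"""
    return v + (-learning_rate) * gradient


def sum_of_squares_gradient(v):
    """Gradient of the toy function f(v) = sum(v_i ** 2)"""
    return 2 * v


v = np.array([3.0, -4.0, 5.0])  # arbitrary starting point
for _ in range(1000):
    v = descent_one_step(v, sum_of_squares_gradient(v), learning_rate=0.01)

distance = np.linalg.norm(v)  # distance from the minimum at the origin
print(distance)
```

Each step here shrinks `v` by a factor of 0.98, so the distance to the minimum decays geometrically. A learning rate that is too large would instead make the iterates overshoot and diverge.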

Model Prediction

```python
def predict(x: Vector, beta: Vector) -> float:
    """Returns the predicted 'y' (target) value given 'x' (features) and coefficients"""
    return x.dot(beta)
```
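
A short usage sketch, with made-up numbers: the feature matrix gets a leading column of ones (assumed here so that the first coefficient acts as the intercept), after which one row can be passed to `predict`, or the whole matrix can be multiplied at once. `predict` is restated so the snippet is self-contained.

```python
import numpy as np


def predict(x, beta):
    """Dot product of features and coefficients (restated from above)"""
    return x.dot(beta)


# Three observations with two raw features each (illustrative values)
raw_features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Add the constant column that multiplies the intercept
X = np.column_stack([np.ones(len(raw_features)), raw_features])

beta = np.array([0.5, 2.0, -1.0])  # hypothetical [beta_0, beta_1, beta_2]

single = predict(X[0], beta)  # prediction for the first observation
all_preds = X.dot(beta)       # predictions for every row at once
print(single, all_preds)
```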

Loss function (error function)

```python
def error(x: Vector, y: float, beta: Vector) -> float:
    """Returns the error of the predicted value relative to the actual (true) value"""
    return predict(x, beta) - y


def squared_error(x: Vector, y: float, beta: Vector) -> float:
    """Returns the squared value of error"""
    return error(x, y, beta) ** 2


def squarred_error_gradient(x: Vector, y: float, beta: Vector) -> Vector:
    """Returns the gradient of the squared error with respect to beta"""
    err = error(x, y, beta)
    # d/d(beta_j) of (x . beta - y)^2 is 2 * err * x_j; with x[0] == 1 the first entry is 2 * err
    return 2 * err * np.asarray(x)
```
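
One way to gain confidence in a hand-derived gradient is to compare it with a finite-difference approximation. The sketch below does that for the squared error, using the analytic form 2 * err * x. The helper `numerical_gradient`, the step size `h`, and the sample values are assumptions introduced for this check.

```python
import numpy as np


def squared_error(x, y, beta):
    """Squared difference between the prediction x . beta and the target y"""
    return (x.dot(beta) - y) ** 2


def analytic_gradient(x, y, beta):
    """Gradient of squared_error with respect to beta: 2 * err * x"""
    return 2 * (x.dot(beta) - y) * x


def numerical_gradient(x, y, beta, h=1e-6):
    """Central-difference approximation of the same gradient (hypothetical helper)"""
    grad = np.zeros_like(beta)
    for j in range(len(beta)):
        shift = np.zeros_like(beta)
        shift[j] = h
        grad[j] = (squared_error(x, y, beta + shift) - squared_error(x, y, beta - shift)) / (2 * h)
    return grad


x = np.array([1.0, 2.0, 3.0])    # leading 1 is the intercept feature
y = 30.0
beta = np.array([4.0, 4.0, 4.0])

analytic = analytic_gradient(x, y, beta)
numeric = numerical_gradient(x, y, beta)
print(analytic, numeric)
```

The prediction here is 24, so the error is -6 and the analytic gradient is -12 times `x`. The two results should agree to several decimal places.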

## Fit Data

```python
# Mini-batch gradient descent: each step uses one randomly positioned, contiguous slice of the data

def least_squares_fit(
    xs: np.ndarray,  # shape (sample_size, features_num); first column is expected to be all ones
    ys: Vector,
    learning_rate: float = 0.001,
    num_steps: int = 1000,
    batch_pct: float = 1.0,
) -> Vector:
    """Computes the parameters (Beta) that minimize the sum of squared errors using gradient descent"""

    random.seed(0)
    sample_size = len(xs)
    features_num = len(xs[0])
    # batch_pct is the fraction of the data used per step; always keep at least one sample
    batch_size = max(1, int(sample_size * batch_pct))

    # start with a random starting point for the parameters of the model (Beta)
    Beta = np.array([random.random() for _ in range(features_num)])

    with tqdm.trange(num_steps, desc="Ordinary Least Squares Fit") as t:
        for epoch in t:
            # randint is inclusive on both ends, so the last valid start is sample_size - batch_size
            index = random.randint(0, sample_size - batch_size)
            batch_xs = xs[index : index + batch_size]
            batch_ys = ys[index : index + batch_size]

            err_func_grads = [
                squarred_error_gradient(x, y, Beta) for x, y in zip(batch_xs, batch_ys)
            ]
            gradient = np.mean(err_func_grads, axis=0)

            Beta = descent_one_step(Beta, gradient, learning_rate)

    return Beta
```
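
As a sanity check on the overall approach, the following sketch fits synthetic data with a compact, vectorized full-batch gradient descent and compares the result with NumPy's closed-form least-squares solver. The function `fit_full_batch` is a simplified stand-in written for this example, not the `least_squares_fit` above, and the data, true coefficients, learning rate, and step count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: intercept column plus two standard-normal features
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -3.0])  # hypothetical ground truth
y = X.dot(true_beta) + rng.normal(scale=0.1, size=n)


def fit_full_batch(X, y, learning_rate=0.05, num_steps=2000):
    """Full-batch gradient descent on the mean squared error (simplified stand-in)"""
    beta = np.zeros(X.shape[1])
    for _ in range(num_steps):
        errors = X.dot(beta) - y
        # Mean over samples of the per-sample gradients 2 * err * x
        gradient = 2 * X.T.dot(errors) / len(y)
        beta = beta - learning_rate * gradient
    return beta


beta_gd = fit_full_batch(X, y)
beta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_gd)
print(beta_exact)
```

If gradient descent has converged, the two coefficient vectors should agree closely, and both should land near `true_beta` up to the noise in the data.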