### **Multiple linear regression**

---

With $n$ features, the linear regression model is the equation of a hyperplane in $(n+1)$-dimensional space ($n$ feature axes plus the target $y$):

### $$ y = \beta_0 + \sum_{i=1}^n \beta_i X_i $$
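The sum can be written as a single dot product by prepending a constant 1 to every sample, so that $\beta_0$ multiplies that constant. Below is a small NumPy check of that equivalence. The coefficient and feature values are made up purely for illustration.

```python
import numpy as np

# Illustrative (made-up) coefficients for n = 3 features
beta_0 = 2.0
beta = np.array([0.5, -1.0, 3.0])          # beta_1 .. beta_n

# Two samples (rows), n = 3 feature columns
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Sum form: y = beta_0 + sum_i beta_i * X_i, evaluated for each sample
y_sum = beta_0 + (X * beta).sum(axis=1)

# Matrix form: prepend a column of ones and stack [beta_0, beta_1, ..., beta_n]
X_aug = np.column_stack([np.ones(len(X)), X])
beta_full = np.concatenate([[beta_0], beta])
y_mat = X_aug @ beta_full

print(y_sum.tolist(), y_mat.tolist())  # [9.5, 17.0] [9.5, 17.0]
```

This ones-column trick is the same one `CustomLinearRegression.fit` uses with `np.insert`.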

#### Closed-form solution

### $$ \beta = (X^T X) ^ {-1} X^T y $$

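where $\beta = [\beta_0, \beta_1, ..., \beta_n]^T$ and $X$ is the design matrix: one row per training sample, with a leading column of ones so that $\beta_0$ acts as the intercept. The formula assumes $X^T X$ is invertible, i.e. no feature is an exact linear combination of the others.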
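The closed form comes from minimising the sum of squared residuals. Setting the gradient to zero gives the normal equations. Here is a short derivation, assuming $X^T X$ is invertible:

$$
\begin{aligned}
J(\beta) &= \lVert y - X\beta \rVert^2 = (y - X\beta)^T (y - X\beta) \\
\nabla_\beta J &= -2 X^T (y - X\beta) = 0 \\
X^T X \beta &= X^T y \\
\beta &= (X^T X)^{-1} X^T y
\end{aligned}
$$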

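Inverting $X^T X$, an $(n+1) \times (n+1)$ matrix, takes $O(n^3)$ time in the number of features $n$. Forming $X^T X$ from $m$ samples adds $O(m n^2)$. Gradient descent costs $O(mn)$ per iteration instead, so it is usually preferred when the number of features is large.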
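Below is a minimal sketch of batch gradient descent on the mean squared error. It runs on synthetic data and is checked against the closed-form solution. The function name `fit_gradient_descent`, the learning rate and the iteration count are illustrative assumptions, not tuned values. The features are drawn roughly unit-scale, which is what lets a fixed step size of 0.1 converge here.

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Batch gradient descent on mean((X_aug @ beta - y)^2).

    Each iteration costs O(m * n) for m samples and n features,
    instead of the O(n^3) inversion of X^T X.
    """
    m = X.shape[0]
    X_aug = np.column_stack([np.ones(m), X])   # leading ones column -> beta_0
    beta = np.zeros(X_aug.shape[1])
    for _ in range(n_iters):
        residual = X_aug @ beta - y
        grad = (2.0 / m) * X_aug.T @ residual  # gradient of the MSE w.r.t. beta
        beta -= lr * grad
    return beta

# Synthetic data with known coefficients [beta_0, beta_1, beta_2, beta_3]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([4.0, 1.5, -2.0, 0.5])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=200)

beta_gd = fit_gradient_descent(X, y)

# Closed-form reference via the normal equations (solve instead of an explicit inverse)
X_aug = np.column_stack([np.ones(len(X)), X])
beta_closed = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

print(beta_gd)
print(np.allclose(beta_gd, beta_closed, atol=1e-6))
```

On badly scaled features, the same step size can diverge. Standardising the columns first, or shrinking `lr`, is the usual remedy.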

---

#### **Custom Linear Regression**

In [16]:
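```python
import numpy as np

class CustomLinearRegression:
    def __init__(self):
        self.bias = None
        self.weights = None

    def fit(self, X, y):
        # A column of ones is prepended so that beta_0 (the intercept) comes out of the dot product
        X_mod = np.insert(arr=np.asarray(X, dtype=float), obj=0, axis=1, values=1)

        # Normal equation: beta = (X^T X)^-1 X^T y
        # (the inverse applies only to X^T X, not to the whole product)
        betas = np.linalg.inv(X_mod.T @ X_mod) @ X_mod.T @ np.asarray(y, dtype=float)
        self.bias = betas[0]
        self.weights = betas[1:]
        return self

    def predict(self, X):
        # y_hat = beta_0 + X @ [beta_1, ..., beta_n]
        return np.asarray(X, dtype=float) @ self.weights + self.bias
```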
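Forming $(X^T X)^{-1}$ explicitly is numerically fragile when features are nearly collinear. NumPy's `np.linalg.lstsq` solves the same least-squares problem directly with an SVD-based routine. The sketch below swaps in that technique. It is not the notebook's method. The helper name `fit_lstsq` is hypothetical, and it returns `(bias, weights)` to mirror the attributes of `CustomLinearRegression`.

```python
import numpy as np

def fit_lstsq(X, y):
    """Least-squares fit without forming (X^T X)^-1 explicitly.

    np.linalg.lstsq minimises ||X_aug @ beta - y||^2 using an SVD-based
    solver, which behaves better than inv() when X^T X is ill-conditioned.
    """
    X = np.asarray(X, dtype=float)
    X_aug = np.column_stack([np.ones(len(X)), X])   # ones column -> beta_0
    betas, *_ = np.linalg.lstsq(X_aug, np.asarray(y, dtype=float), rcond=None)
    return betas[0], betas[1:]

# Noise-free synthetic data, so the fit should recover the coefficients exactly
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 3.0 + X @ np.array([2.0, -1.0])

bias, weights = fit_lstsq(X, y)
print(bias, weights)
```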

---

In [1]:
```python
from sklearn.datasets import load_diabetes
```

In [5]:
```python
df = load_diabetes(as_frame=True).frame
```

In [6]:
```python
df.sample(5)
```

|     | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target |
|----:|----:|----:|----:|---:|---:|---:|---:|---:|---:|---:|-------:|
| 238 | 0.034443 | 0.05068 | -0.009439 | 0.059744 | -0.035968 | -0.007577 | -0.076536 | 0.07121 | 0.011011 | -0.021788 | 257.0 |
| 22 | -0.08543 | -0.044642 | -0.00405 | -0.009113 | -0.002945 | 0.007767 | 0.022869 | -0.039493 | -0.061176 | -0.013504 | 68.0 |
| 90 | 0.012648 | -0.044642 | -0.025607 | -0.040099 | -0.030464 | -0.045155 | 0.078093 | -0.076395 | -0.072133 | 0.011349 | 98.0 |
| 163 | 0.016281 | 0.05068 | 0.072474 | 0.076958 | -0.008449 | 0.005575 | -0.006584 | -0.002592 | -0.023647 | 0.061054 | 131.0 |
| 278 | 0.067136 | 0.05068 | -0.036385 | -0.084856 | -0.007073 | 0.019667 | -0.054446 | 0.034309 | 0.001148 | 0.032059 | 102.0 |


In [7]:
```python
X = df.drop("target", axis=1)
y = df["target"]
```

In [9]:
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
```

In [10]:
```python
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
```

In [11]:
```python
lr.fit(X_train, y_train)
```

In [13]:
```python
y_pred = lr.predict(X_test)
```

In [14]:
```python
from sklearn.metrics import r2_score

r2_score(y_test, y_pred)
```

0.4601536278036501
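The closed-form solution should reproduce scikit-learn's coefficients, since both minimise the same squared error. The sketch below checks that on the diabetes data and computes $R^2 = 1 - SS_{res}/SS_{tot}$ by hand. It uses the normal equations inline rather than the `CustomLinearRegression` class so that it runs on its own. `random_state=0` is an added assumption to make the split reproducible, so the score will not match the 0.46 above exactly.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
# random_state is an assumption (not in the original run) for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Closed-form fit: solve (X^T X) beta = X^T y with a leading ones column for beta_0
X_aug = np.column_stack([np.ones(len(X_train)), X_train])
betas = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y_train)
bias, weights = betas[0], betas[1:]

lr = LinearRegression().fit(X_train, y_train)
print(np.allclose(bias, lr.intercept_), np.allclose(weights, lr.coef_))

# R^2 by hand on the test set: 1 - SS_res / SS_tot
y_pred = X_test @ weights + bias
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_test, y_pred))
```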

---