
## Linear regression

- Implement a Python function that estimates the coefficients β of a multiple linear regression (MLR) model.
- Use the implemented function to estimate the parameters (β) for the following training dataset:

|row|X1|X2|y|
|-----|-------|------|----|
|1|0.1|2.2|5.4|
|2|0.9|3.9|4.1|
|3|2.1|6.1|2.9|
|4|2.7|7.7|1.1|
|5|4.2|10.3|0.3|

- Write a short Python routine that computes the performance metrics covered in the lecture (RSS, RSE, R², MSE, and RMSE) for the multiple linear regression model fitted above.
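
For reference, the quantities requested above are usually defined as shown below. This is a sketch that assumes the lecture follows the standard textbook conventions. The symbols are assumptions rather than notation taken from the lecture: $n$ is the number of observations, $p$ the number of predictors, and $X$ the design matrix with a leading column of ones.

$$
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad \hat{y} = X \hat{\beta}
$$

$$
\begin{aligned}
\mathrm{RSS} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, &
\mathrm{RSE} &= \sqrt{\frac{\mathrm{RSS}}{n - p - 1}}, \\
R^2 &= 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, &
\mathrm{MSE} &= \frac{\mathrm{RSS}}{n}, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}.
\end{aligned}
$$

With a single predictor ($p = 1$) the RSE denominator reduces to $n - 2$.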


In [1]:
import numpy as np

In [2]:
# Part a) Estimate the MLR coefficients (betas) via the normal equations:
# beta = (X^T X)^-1 X^T y
from numpy.linalg import inv  # matrix inverse

def mlr_coeff(X, y):
    X = np.c_[np.ones(len(X)), X]  # prepend a column of 1s for the intercept (beta0)
    return inv(X.T @ X) @ X.T @ y
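
The explicit inverse in `mlr_coeff` is fine for a problem this small. A common alternative is to solve the normal equations as a linear system instead, which avoids forming the inverse. The sketch below illustrates that idea. The name `mlr_coeff_solve` and the synthetic data are hypothetical and are not part of the assignment.

```python
import numpy as np

def mlr_coeff_solve(X, y):
    """Hypothetical variant of mlr_coeff: solve (A^T A) beta = A^T y directly."""
    A = np.column_stack([np.ones(len(X)), X])  # intercept column first
    return np.linalg.solve(A.T @ A, A.T @ y)

# Synthetic data (not the assignment's dataset): y = 1 + 2*x1 + 3*x2 exactly,
# so the recovered coefficients should be close to [1, 2, 3].
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = 1.0 + 2.0 * X_demo[:, [0]] + 3.0 * X_demo[:, [1]]  # column vector, shape (5, 1)

beta_demo = mlr_coeff_solve(X_demo, y_demo)
print(beta_demo.ravel())
```

Because the synthetic response is an exact linear function of the predictors, any deviation from `[1, 2, 3]` beyond round-off would indicate a mistake in the design matrix or the solve step.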

In [3]:
Xtrain = np.array([[0.1, 2.2],
                   [0.9, 3.9],
                   [2.1, 6.1],
                   [2.7, 7.7],
                   [4.2, 10.3]])
Xtrain

array([[ 0.1,  2.2],
       [ 0.9,  3.9],
       [ 2.1,  6.1],
       [ 2.7,  7.7],
       [ 4.2, 10.3]])

In [4]:
ytrain = np.array([[5.4, 4.1, 2.9, 1.1, 0.3]]).T
ytrain

array([[5.4],
       [4.1],
       [2.9],
       [1.1],
       [0.3]])

In [5]:
Beta = mlr_coeff(Xtrain,ytrain)
Beta

array([[10.92601381],
       [ 4.10026583],
       [-2.70969296]])
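
Written out, the estimated coefficients correspond to the fitted plane below (coefficients rounded to four decimals for display only):

$$
\hat{y} = 10.9260 + 4.1003\,X_1 - 2.7097\,X_2
$$

As a quick check by hand, plugging in row 1 of the training table ($X_1 = 0.1$, $X_2 = 2.2$) gives

$$
\hat{y}_1 = 10.9260 + 4.1003(0.1) - 2.7097(2.2) \approx 5.375,
$$

which is close to the observed value $y_1 = 5.4$.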

In [6]:
def mlr_predict(BETA, Xnew):
    Xnew = np.c_[np.ones(len(Xnew)), Xnew]  # prepend the intercept column, as in mlr_coeff
    return Xnew @ BETA  # predictions, shape (n, 1)

In [7]:
y_pred = mlr_predict(BETA=Beta, Xnew=Xtrain)
y_pred

array([[5.37471588],
       [4.04845052],
       [3.00744501],
       [1.13209577],
       [0.23729282]])
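
One way to sanity-check a least-squares fit is to confirm that the residuals are orthogonal to every column of the design matrix, including the intercept column. The sketch below runs that check on the training table. All names ending in `_chk` are hypothetical, and `np.linalg.lstsq` is used only to keep the example self-contained. It is not the notebook's own `mlr_coeff`.

```python
import numpy as np

# Training data copied from the table in this notebook.
X_chk = np.array([[0.1, 2.2], [0.9, 3.9], [2.1, 6.1], [2.7, 7.7], [4.2, 10.3]])
y_chk = np.array([5.4, 4.1, 2.9, 1.1, 0.3])

# Design matrix with a leading column of ones for the intercept.
A_chk = np.column_stack([np.ones(len(X_chk)), X_chk])

# Least-squares fit and residuals.
beta_chk, *_ = np.linalg.lstsq(A_chk, y_chk, rcond=None)
resid_chk = y_chk - A_chk @ beta_chk

# For an ordinary least-squares fit, A^T r should vanish up to round-off.
ortho_chk = A_chk.T @ resid_chk
print(np.allclose(ortho_chk, 0.0, atol=1e-8))
```

The first entry of `ortho_chk` is the sum of the residuals, so a model with an intercept should also have residuals that sum to approximately zero.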

In [8]:
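def mlr_scores(y_true, y_pred):
    err = y_true - y_pred                  # residuals, shape (n, 1)
    RSS = (err.T @ err).item(0)            # 1x1 array -> scalar via `item`
    n = len(y_true)
    # n - 2 is the simple-regression denominator; the general form is n - p - 1
    # for p predictors (n - 3 for this two-predictor model).
    RSE = np.sqrt(RSS / (n - 2))
    ybar = np.mean(y_true)
    TSS = ((y_true - ybar).T @ (y_true - ybar)).item(0)  # total sum of squares
    R2 = 1 - RSS / TSS
    MSE = RSS / n
    RMSE = np.sqrt(MSE)
    return RSS, RSE, R2, MSE, RMSE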
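
`mlr_scores` hard-codes `n - 2` in the RSE, which matches a model with one predictor. A possible generalisation, assuming the usual `n - p - 1` degrees of freedom for `p` predictors, is sketched below. The function name `mlr_scores_general` and the parameter `n_predictors` are hypothetical, and the fitted values are copied from the prediction output shown earlier in this notebook.

```python
import numpy as np

def mlr_scores_general(y_true, y_pred, n_predictors):
    """Hypothetical variant of mlr_scores with an explicit predictor count."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    n = y_true.size
    resid = y_true - y_pred
    rss = float(resid @ resid)                               # residual sum of squares
    tss = float(np.sum((y_true - y_true.mean()) ** 2))       # total sum of squares
    rse = float(np.sqrt(rss / (n - n_predictors - 1)))       # assumes n > p + 1
    mse = rss / n
    return {"RSS": rss, "RSE": rse, "R2": 1.0 - rss / tss,
            "MSE": mse, "RMSE": float(np.sqrt(mse))}

# Observed responses and the fitted values printed by the notebook (8 decimals).
y_obs = np.array([5.4, 4.1, 2.9, 1.1, 0.3])
y_fit = np.array([5.37471588, 4.04845052, 3.00744501, 1.13209577, 0.23729282])

scores_p1 = mlr_scores_general(y_obs, y_fit, n_predictors=1)  # same denominator as mlr_scores
scores_p2 = mlr_scores_general(y_obs, y_fit, n_predictors=2)  # two predictors, X1 and X2
print(scores_p1["RSE"], scores_p2["RSE"])
```

Only the RSE depends on `n_predictors`; RSS, R2, MSE, and RMSE are identical in both calls.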

In [9]:
RSS, RSE, R2, MSE, RMSE = mlr_scores(y_true=ytrain, y_pred=y_pred)
print("RSS = ", RSS)
print("RSE = ", RSE)
print("R2 = ", R2)
print("RMSE = ", RMSE)

RSS =  0.019803393338210504
RSE =  0.08124734526577981
R2 =  0.9988742955128348
RMSE =  0.06293392302758585
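
The printout above does not include the MSE, although `mlr_scores` returns it. It can be recovered by hand from the reported RSS with $n = 5$, and the same arithmetic reproduces the printed RMSE and RSE (values rounded for display):

$$
\begin{aligned}
\mathrm{MSE} &= \frac{\mathrm{RSS}}{n} = \frac{0.0198034}{5} \approx 0.003961, \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \approx 0.06293, \\
\mathrm{RSE} &= \sqrt{\frac{\mathrm{RSS}}{n - 2}} = \sqrt{\frac{0.0198034}{3}} \approx 0.08125.
\end{aligned}
$$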
