# Multiple ways to implement Multiple Linear Regression (MLR) Modelling in Python.

This notebook illustrates different ways to implement Multiple Linear Regression in Python. To do so, I use a revised dataset created to predict end-semester examination (ESE) marks.

Here, I have added one more feature to make the prediction task a little more interesting: HRS, the number of hours spent studying the course on the day before the examination. The HRS values are randomly generated in the range 0 to 24; they are not real observations.

The notebook illustrates three different ways to implement MLR. These are:
1. Ordinary Least Squares (OLS) through matrix operations.
2. Ordinary Least Squares from StatsModels.
3. `LinearRegression` from Scikit-Learn (which also solves the least-squares problem directly rather than by gradient descent).

My YouTube videos on MLR using [OLS](https://www.youtube.com/watch?v=0qv9Ck24q1s) and [Gradient Descent algorithm](https://www.youtube.com/watch?v=Y1J22hk_Vf0) explain the theory behind these methods.

> The following code segment imports the required modules and loads the dataset as a pandas DataFrame.

In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
import math


The dataset has four columns: Attendance, MSE, HRS and ESE. 'ESE' is the output (dependent) variable; the other three columns are the inputs (independent variables).
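Written out, the model that all the methods in this notebook fit is the following (the coefficient symbols and the error term are notation assumed here for illustration):

$$\mathrm{ESE} = \beta_0 + \beta_1\,\mathrm{MSE} + \beta_2\,\mathrm{Attendance} + \beta_3\,\mathrm{HRS} + \varepsilon$$

Here $\beta_0$ is the intercept, $\beta_1$ to $\beta_3$ are the slopes for the three inputs, and $\varepsilon$ is the error term.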

In [None]:
df = pd.read_csv('../input/predictingese/amsPrediction - Sheet1.csv')
df.head()

In [None]:
df.describe()

The correlation analysis shows no strong correlation between the output vector 'ESE' and the input vectors 'Attendance' (-0.10) and 'HRS' (0.25).

In [None]:
corr = df.corr()
corr.style.background_gradient(cmap='coolwarm')
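By default `df.corr()` reports the Pearson correlation coefficient for each pair of columns. As a rough illustration of what a single entry in that table means, here is a minimal sketch that computes the coefficient by hand and compares it with NumPy. The two arrays are made-up data, not columns from the notebook's dataset.

```python
import numpy as np

# Synthetic, weakly related variables (assumed data for illustration only)
rng = np.random.default_rng(7)
a = rng.normal(size=100)
b = 0.3 * a + rng.normal(size=100)

# Pearson r = sum of centred cross-products / sqrt(product of centred sums of squares)
a_c = a - a.mean()
b_c = b - b.mean()
r_manual = np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2))

# The same quantity from NumPy's correlation matrix
r_numpy = np.corrcoef(a, b)[0, 1]
print(r_manual, r_numpy)
```

Values near 0 indicate little linear association, while values near -1 or 1 indicate a strong one.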

The following code segment separates the input and output vectors. It also adds a constant column of ones to the inputs so that the model can estimate the intercept (*beta0*).

In [None]:
endog = df['ESE']
exog = sm.add_constant(df[['MSE', 'Attendance', 'HRS']])
print(exog)
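To see why the column of ones matters, here is a minimal sketch on made-up data with a single input and a clearly non-zero intercept. The data and the variable names are assumptions for illustration. Without the constant column, the fitted line is forced through the origin and the residual sum of squares is larger.

```python
import numpy as np

# Synthetic data: y = 30 + 2x + noise (assumed values, not the notebook's dataset)
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 10, size=(n, 1))
y = 30.0 + 2.0 * x[:, 0] + rng.normal(0, 1.0, size=n)

X_no_const = x                               # inputs only
X_const = np.column_stack([np.ones(n), x])   # column of ones prepended by hand

# Least-squares fit with and without the constant column
beta_no_const, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)
beta_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)

rss_no_const = np.sum((y - X_no_const @ beta_no_const) ** 2)
rss_const = np.sum((y - X_const @ beta_const) ** 2)
print("RSS without constant:", rss_no_const)
print("RSS with constant:   ", rss_const)
```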

The following code converts the input and output vectors to NumPy arrays in order to implement the OLS formula step by step:

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$$

The first step transposes the input matrix.

In [None]:
# Step 1: convert to NumPy arrays and transpose the input matrix
X = exog.to_numpy()
Y = endog.to_numpy()
s1_xt = np.transpose(X)
print(s1_xt)


In [None]:
# Step 2: multiply the transposed matrix by the input matrix
s2_mul1 = np.matmul(s1_xt, X)
print(s2_mul1)

In [None]:
# Step 3: invert the product from step 2
s3_inv = np.linalg.inv(s2_mul1)
print(s3_inv)

In [None]:
# Step 4: multiply the inverse by the transposed input matrix
s4_mul = np.matmul(s3_inv, s1_xt)
print(s4_mul)

In [None]:
# Step 5: multiply by the output vector to get the coefficients (intercept first)
s5_res = np.matmul(s4_mul, Y)
print(s5_res)
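The five steps above compute the inverse of X-transpose times X, multiplied by X-transpose and then by Y. As a sanity check, here is a minimal sketch that runs the same closed form on made-up data and compares it with two NumPy routines that avoid forming the inverse explicitly. The synthetic data and the name `beta_true` are assumptions for illustration; they are not the notebook's dataset.

```python
import numpy as np

# Synthetic stand-ins for three input columns (assumed data)
rng = np.random.default_rng(0)
n = 50
features = rng.uniform(0, 1, size=(n, 3))
X = np.column_stack([np.ones(n), features])   # constant column first
beta_true = np.array([2.0, 1.5, -0.5, 0.25])  # hypothetical coefficients
Y = X @ beta_true + rng.normal(0, 0.1, size=n)

# Closed form with an explicit inverse, mirroring the stepwise cells
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ Y

# Same normal equations solved as a linear system
beta_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# Least-squares solver that works on X directly
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_inv)
print(beta_solve)
print(beta_lstsq)
```

All three give the same coefficients on well-conditioned data. When the input columns are nearly collinear, `np.linalg.solve` or `np.linalg.lstsq` is generally the numerically safer choice than an explicit inverse.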

The following code segment builds the MLR model using the OLS method from StatsModels.

In [None]:
mod = sm.OLS(endog, exog)
results = mod.fit()
print(results.summary())

In [None]:
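def RSE(y_true, y_predicted, n_predictors=3):
    """Residual standard error with n - n_predictors - 1 degrees of freedom."""
    y_true = np.array(y_true)
    y_predicted = np.array(y_predicted)
    RSS = np.sum(np.square(y_true - y_predicted))

    # The model estimates an intercept plus n_predictors slopes
    # (three here: MSE, Attendance, HRS), which uses up that many degrees of freedom.
    dof = len(y_true) - n_predictors - 1
    rse = math.sqrt(RSS / dof)
    return rse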

In [None]:
# Residuals (error terms): observed ESE minus fitted values
yp = results.predict()
ypa = np.array(yp)
yta = df['ESE']
eterms = yta - ypa

# Histogram of the residuals
df1 = pd.DataFrame(eterms)
df1['ESE'].hist(bins=10)

In [None]:
rse = RSE(df['ESE'], results.predict())
print(rse)
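For a model with an intercept and p predictors fitted to n observations, the residual standard error is usually computed as the square root of RSS / (n - p - 1). Here is a small worked sketch with made-up numbers; the function name `residual_standard_error` and the data are hypothetical, chosen so the arithmetic can be checked by hand.

```python
import math
import numpy as np

def residual_standard_error(y_true, y_pred, n_predictors):
    """RSE = sqrt(RSS / (n - p - 1)) for a model with an intercept and p predictors."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    rss = float(np.sum(residuals ** 2))
    dof = len(residuals) - n_predictors - 1
    return math.sqrt(rss / dof)

# Made-up observations and residuals (not taken from the dataset)
y_true = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
residuals = np.array([1.0, -1.0, 2.0, -2.0, 1.0, -1.0])
y_pred = y_true - residuals

# RSS = 1 + 1 + 4 + 4 + 1 + 1 = 12 and dof = 6 - 3 - 1 = 2
rse = residual_standard_error(y_true, y_pred, n_predictors=3)
print(rse)  # sqrt(12 / 2) = sqrt(6)
```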

The following code segment builds the MLR model using the `LinearRegression` class from Scikit-Learn.

In [None]:
from sklearn import linear_model
X = df[['MSE', 'Attendance', 'HRS']]
y = df['ESE']

lm = linear_model.LinearRegression()
model = lm.fit(X, y)
lm.coef_

In [None]:
lm.intercept_
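Scikit-Learn's `LinearRegression` solves the least-squares problem directly. For comparison, here is a minimal sketch of batch gradient descent on the mean squared error, written in plain NumPy. The synthetic data, the learning rate, and the iteration count are all assumptions chosen for illustration; the inputs are standardised first so that a single learning rate works for every coefficient.

```python
import numpy as np

# Synthetic inputs in the range 0 to 24 (assumed data, not the notebook's dataset)
rng = np.random.default_rng(42)
n = 200
raw = rng.uniform(0, 24, size=(n, 3))

# Standardise each column, then prepend the constant column
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
X = np.column_stack([np.ones(n), Z])
y = 5.0 + Z @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 1.0, size=n)

# Batch gradient descent on MSE = mean((X @ beta - y) ** 2)
beta = np.zeros(X.shape[1])
learning_rate = 0.1
for _ in range(2000):
    gradient = (2.0 / n) * X.T @ (X @ beta - y)
    beta -= learning_rate * gradient

# Reference solution from a direct least-squares solver
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("gradient descent:", beta)
print("least squares:   ", beta_ols)
```

With enough iterations the two sets of coefficients agree. Note that they are expressed on the standardised scale of the inputs, so they are not directly comparable to coefficients fitted on raw columns.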