# `Multiple Linear Regression`

> - [Normal Equation](https://www.geeksforgeeks.org/ml-normal-equation-in-linear-regression/)
> - [Gradient Descent](https://www.geeksforgeeks.org/gradient-descent-in-linear-regression/)

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_excel("Folds5x2_pp.xlsx")

## 1. `Multiple Linear Regression - sklearn implementation`

In [3]:
data.corr()

| | AT | V | AP | RH | PE |
|---|---|---|---|---|---|
| AT | 1.0 | 0.844107 | -0.507549 | -0.542535 | -0.948128 |
| V | 0.844107 | 1.0 | -0.413502 | -0.312187 | -0.86978 |
| AP | -0.507549 | -0.413502 | 1.0 | 0.099574 | 0.518429 |
| RH | -0.542535 | -0.312187 | 0.099574 | 1.0 | 0.389794 |
| PE | -0.948128 | -0.86978 | 0.518429 | 0.389794 | 1.0 |


### AT and V are strongly collinear (r = 0.84), so we keep only AT, which has the stronger correlation with PE (-0.95 vs -0.87)
### RH correlates only weakly with PE (r = 0.39), so it is dropped
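
The correlation table flags the AT/V overlap by eye. A variance inflation factor (VIF) puts a number on the same idea. This is a different check from the one the notebook uses, shown here only as a rough sketch. The data is synthetic (the `at`, `v` and `ap` arrays are made up, not read from `Folds5x2_pp.xlsx`), and `vif` is a hypothetical helper written for this illustration.

```python
import numpy as np

def vif(features: np.ndarray) -> np.ndarray:
    """Variance inflation factor per column: 1 / (1 - R^2_j),
    where R^2_j comes from regressing column j on all other columns."""
    n_rows, n_cols = features.shape
    factors = np.empty(n_cols)
    for j in range(n_cols):
        target = features[:, j]
        others = np.delete(features, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r_squared = 1.0 - residual.var() / target.var()
        factors[j] = 1.0 / (1.0 - r_squared)
    return factors

# Synthetic stand-ins: v is built from at, ap is independent of both.
rng = np.random.default_rng(0)
at = rng.normal(20.0, 7.0, size=500)
v = 2.0 * at + rng.normal(0.0, 5.0, size=500)
ap = rng.normal(1013.0, 6.0, size=500)

vif_values = vif(np.column_stack([at, v, ap]))
print(dict(zip(["at", "v", "ap"], vif_values.round(2))))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign that a predictor is largely explained by the others. Here the two linked columns land in that range, while the independent one stays near 1.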

In [4]:
X = data[["AT","AP"]]
y = data["PE"]

In [5]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X,y)

LinearRegression()

In [6]:
theta1,theta2 = model.coef_   # theta1 and theta2
print(f'theta1 = {theta1} and theta2 = {theta2}')

theta1 = -2.1130659428774066 and theta2 = 0.14402918985400698


In [7]:
theta0 = model.intercept_   # theta0
print(f'theta0 = {theta0}')

theta0 = 349.95047262174114


In [8]:
y_pred = model.predict(X) # prediction for all values

In [9]:
y_pred

array([465.83497857, 443.659007  , 484.93329046, ..., 429.65929428,
       444.24805277, 450.81906105])

In [16]:
import math
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error,mean_absolute_percentage_error
print(f"R2_scores {r2_score(y,y_pred)}")
print(f"mean_squared_error {mean_squared_error(y,y_pred)}")
print(f"root_mean_squared_error {math.sqrt(mean_squared_error(y,y_pred))}")
print(f"mean_absolute_error {mean_absolute_error(y,y_pred)}")
print(f"mean_absolute_percentage_error {mean_absolute_percentage_error(y,y_pred)}")

R2_scores 0.9008123274074772
mean_squared_error 28.888595613229647
root_mean_squared_error 5.374811216520041
mean_absolute_error 4.2781877319852555
mean_absolute_percentage_error 0.009454646988296442
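
To make the five scores above less of a black box, here is a minimal sketch that computes them from their textbook definitions with NumPy. `regression_metrics` is a hypothetical helper and the three-point input is made up. It assumes no true value is zero, since MAPE divides by `y_true`.

```python
import numpy as np

def regression_metrics(y_true, y_hat):
    """R^2, MSE, RMSE, MAE and MAPE computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    errors = y_true - y_hat
    ss_res = np.sum(errors ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
    mse = float(np.mean(errors ** 2))
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(errors))),
        "mape": float(np.mean(np.abs(errors / y_true))),  # a fraction, not a percentage
    }

metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
for name, value in metrics.items():
    print(f"{name} {value}")
```

MAPE is returned as a fraction, which is also how the value printed above should be read: 0.0095 corresponds to an average error of about 0.95 %.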


---

## `Predictor ----------------------------`

---

In [43]:
x1 = float(input('Enter the value of AT: '))
x2 = float(input('Enter the value of AP: '))
y_pred = theta0 + theta1 * x1 + theta2 * x2
print(f'The predicted value y_pred={y_pred}')

Enter the value of AT: 14.96
Enter the value of AP: 1024.07
The predicted value y_pred=465.83497857181


## 2. `Multiple Linear Regression - Normal equation`

In [None]:
import numpy as np

In [None]:
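X = data[["AT", "AP"]].copy()   # work on a copy so the cell can be re-run safely
X.insert(0, 'x0', 1)            # x0 = 1 for every row, so theta0 acts as the intercept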

In [None]:
X

In [None]:
X_matrix = X.values
y_matrix = y.values

In [None]:
X_matrix

In [None]:
# normal equation
a = np.linalg.inv(np.matmul(X_matrix.T,X_matrix))
b = np.matmul(X_matrix.T,y_matrix)
theta0, theta1, theta2 = np.matmul(a,b)
print(f'theta0 = {theta0}')
print(f'theta1 = {theta1}')
print(f'theta2 = {theta2}')
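
The cell above solves the normal equation by inverting XᵀX explicitly. Two alternatives that are usually better behaved numerically are solving the linear system directly and calling a least-squares routine. The sketch below compares them on synthetic data: the `at` and `ap` arrays and `true_theta` are made up for illustration and are not the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000

# Synthetic predictors on roughly the same scale as AT and AP.
at = rng.uniform(2.0, 37.0, n_samples)
ap = rng.uniform(990.0, 1035.0, n_samples)

true_theta = np.array([350.0, -2.0, 0.15])               # assumed "ground truth"
design = np.column_stack([np.ones(n_samples), at, ap])   # x0 = 1, then AT, AP
target = design @ true_theta + rng.normal(0.0, 1.0, n_samples)

# Normal equation as a linear system: (X^T X) theta = X^T y, no explicit inverse.
theta_solve = np.linalg.solve(design.T @ design, design.T @ target)

# Least-squares solver working on X itself, without forming X^T X.
theta_lstsq, *_ = np.linalg.lstsq(design, target, rcond=None)

print("solve:", theta_solve)
print("lstsq:", theta_lstsq)
```

Both routes should agree to many decimal places here. The difference starts to matter when the columns are nearly collinear or on very different scales, because forming XᵀX squares the condition number of the problem.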

In [None]:
theta = np.array([theta0, theta1, theta2])
y_pred = np.matmul(X_matrix, theta)   # one prediction per row of X

In [60]:
import math
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
print(f"R2_scores {r2_score(y,y_pred)}")
print(f"mean_squared_error {mean_squared_error(y,y_pred)}")
print(f"root_mean_squared_error {math.sqrt(mean_squared_error(y,y_pred))}")
print(f"mean_absolute_error {mean_absolute_error(y,y_pred)}")

R2_scores 0.9008123274074772
mean_squared_error 28.88859561322964
root_mean_squared_error 5.37481121652004


---

## `Predictor ----------------------------`

---

In [44]:
x1 = float(input('Enter the value of AT: '))
x2 = float(input('Enter the value of AP: '))
y_pred = theta0 * 1 + theta1 * x1 + theta2 * x2
print(f'The predicted value y_pred={y_pred}')

Enter the value of AT: 14.96
Enter the value of AP: 1024.07
The predicted value y_pred=465.83497857181


---

### `correction -------------------------`
#### Without an x0 column of ones in X, the fit cannot return theta0 alongside theta1 and theta2; section 2 adds it with `X.insert` and section 3 with `sm.add_constant`

---

## 3. `Multiple Linear Regression - OLS (ordinary least squares) using statsmodels`

In [31]:
import statsmodels.api as sm

In [32]:
X = data[['AT','AP']]
X

| | AT | AP |
|---|---|---|
| 0 | 14.96 | 1024.07 |
| 1 | 25.18 | 1020.04 |
| 2 | 5.11 | 1012.16 |
| 3 | 20.86 | 1010.24 |
| 4 | 10.82 | 1009.23 |
| ... | ... | ... |
| 9563 | 16.65 | 1014.01 |
| 9564 | 13.19 | 1023.67 |
| 9565 | 31.32 | 1012.92 |
| 9566 | 24.48 | 1013.86 |


In [34]:
X = sm.add_constant(X)
X.head()

| | const | AT | AP |
|---|---|---|---|
| 0 | 1.0 | 14.96 | 1024.07 |
| 1 | 1.0 | 25.18 | 1020.04 |
| 2 | 1.0 | 5.11 | 1012.16 |
| 3 | 1.0 | 20.86 | 1010.24 |
| 4 | 1.0 | 10.82 | 1009.23 |


In [35]:
X

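| | const | AT | AP |
|---|---|---|---|
| 0 | 1.0 | 14.96 | 1024.07 |
| 1 | 1.0 | 25.18 | 1020.04 |
| 2 | 1.0 | 5.11 | 1012.16 |
| 3 | 1.0 | 20.86 | 1010.24 |
| 4 | 1.0 | 10.82 | 1009.23 |
| ... | ... | ... | ... |
| 9563 | 1.0 | 16.65 | 1014.01 |
| 9564 | 1.0 | 13.19 | 1023.67 |
| 9565 | 1.0 | 31.32 | 1012.92 |
| 9566 | 1.0 | 24.48 | 1013.86 |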


In [36]:
y

0       463.26
1       444.37
2       488.56
3       446.48
4       473.90
         ...  
9563    460.03
9564    469.62
9565    429.57
9566    435.74
9567    453.28
Name: PE, Length: 9568, dtype: float64

In [37]:
model = sm.OLS(y, X)   # OLS is accessed through the sm namespace imported above
results = model.fit()

In [38]:
results.params

const    349.950473
AT        -2.113066
AP         0.144029
dtype: float64

In [39]:
theta0 = 349.9504726231098
theta1 = -2.1130659428763465
theta2 = 0.14402918985433644

In [41]:
y_pred = results.predict(X)
y_pred

0       465.834979
1       443.659007
2       484.933290
3       451.375966
4       472.445678
           ...    
9563    460.814963
9564    469.517494
9565    429.659294
9566    444.248053
9567    450.819061
Length: 9568, dtype: float64

### model summary

In [40]:
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | PE | R-squared: | 0.901 |
| Model: | OLS | Adj. R-squared: | 0.901 |
| Method: | Least Squares | F-statistic: | 43430.0 |
| Date: | Wed, 18 Aug 2021 | Prob (F-statistic): | 0.0 |
| Time: | 19:47:16 | Log-Likelihood: | -29667.0 |
| No. Observations: | 9568 | AIC: | 59340.0 |
| Df Residuals: | 9565 | BIC: | 59360.0 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 349.9505 | 10.969 | 31.902 | 0.000 | 328.448 | 371.453 |
| AT | -2.1131 | 0.009 | -246.881 | 0.000 | -2.130 | -2.096 |
| AP | 0.1440 | 0.011 | 13.410 | 0.000 | 0.123 | 0.165 |

| | | | |
|---|---|---|---|
| Omnibus: | 402.309 | Durbin-Watson: | 2.034 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 1056.99 |
| Skew: | -0.204 | Prob(JB): | 3e-230 |
| Kurtosis: | 4.577 | Cond. No. | 202000.0 |
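
Two headline numbers in the summary, Adj. R-squared and the F-statistic, follow directly from R², the number of observations and the number of predictors. The sketch below recomputes them with the standard formulas for a model with an intercept. The inputs are copied from this notebook's outputs, and the variable names are illustrative only.

```python
# Values taken from the outputs above.
r2 = 0.9008123274074772      # R2_scores printed in section 1
n_obs = 9568                 # No. Observations
n_predictors = 2             # Df Model (AT and AP)

df_resid = n_obs - n_predictors - 1   # Df Residuals = 9565

# Adjusted R^2 penalises R^2 for the number of predictors used.
adj_r2 = 1.0 - (1.0 - r2) * (n_obs - 1) / df_resid

# F-statistic: explained variance per predictor over unexplained variance per residual df.
f_stat = (r2 / n_predictors) / ((1.0 - r2) / df_resid)

print(f"Adj. R-squared: {adj_r2:.3f}")
print(f"F-statistic:    {f_stat:.0f}")
```

With 9568 rows and only two predictors the adjustment is tiny, which is why R-squared and Adj. R-squared both show as 0.901 in the summary.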


---

## `Predictor ----------------------------`

---

In [45]:
x1 = float(input('Enter the value of AT: '))
x2 = float(input('Enter the value of AP: '))
y_pred = theta0 * 1 + theta1 * x1 + theta2 * x2
print(f'The predicted value y_pred={y_pred}')

Enter the value of AT: 14.96
Enter the value of AP: 1024.07
The predicted value y_pred=465.83497857181


## END `------------------------`