<a href="https://colab.research.google.com/github/krishnanands17/DataScienceLab/blob/main/CO2PG2-Multiple_Linear_Regression-03-01-22.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Multiple Linear Regression

**Program to implement the multiple linear regression technique using a standard dataset available in the public domain and to evaluate its performance.**



The table below describes the columns that hold the air pollutant, temperature, relative humidity and absolute humidity data.


|Columns|Description|
|-|-|
|PT08.S1(CO)|PT08.S1 (tin oxide) hourly averaged sensor response (nominally $\text{CO}$ targeted)|
|C6H6(GT)|True hourly averaged Benzene concentration in $\frac{\mu g}{m^3}$|
|PT08.S2(NMHC)|PT08.S2 (titania) hourly averaged sensor response (nominally $\text{NMHC}$ targeted)|
|PT08.S3(NOx)|PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally $\text{NO}_x$ targeted)|
|PT08.S4(NO2)|PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally $\text{NO}_2$ targeted)|
|PT08.S5(O3)|PT08.S5 (indium oxide) hourly averaged sensor response (nominally $\text{O}_3$ targeted)|
|T|Temperature in °C|
|RH|Relative Humidity (%)|
|AH|Absolute Humidity|

---

#### Multiple Linear Regression Model Using `sklearn` Module
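
For reference, the model fitted in this section has the standard multiple linear regression form. It is written here with RH as the target and $x_1, \dots, x_p$ standing for the remaining numeric columns. The symbols $\beta_j$, $\varepsilon$, $n$ and $p$ are notation introduced for this note and do not appear in the notebook.

$$
\text{RH} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon
$$

`LinearRegression` estimates the intercept $\beta_0$ and the coefficients $\beta_j$ by ordinary least squares, that is, by minimising the sum of squared residuals over the $n$ training rows:

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
$$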


In [None]:
# Load the dataset and display the first 5 rows. The GitHub link is as follows:
# https://raw.githubusercontent.com/jiss-sngce/air/main/airquality.csv.csv
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/jiss-sngce/air/main/airquality.csv.csv')
df.head()

| |DateTime|PT08.S1(CO)|C6H6(GT)|PT08.S2(NMHC)|PT08.S3(NOx)|PT08.S4(NO2)|PT08.S5(O3)|T|RH|AH|Year|Month|Day|Day Name|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|2004-03-10 18:00:00|1360.0|11.9|1046.0|1056.0|1692.0|1268.0|13.6|48.9|0.7578|2004|3|10|Wednesday|
|1|2004-03-10 19:00:00|1292.0|9.4|955.0|1174.0|1559.0|972.0|13.3|47.7|0.7255|2004|3|10|Wednesday|
|2|2004-03-10 20:00:00|1402.0|9.0|939.0|1140.0|1555.0|1074.0|11.9|54.0|0.7502|2004|3|10|Wednesday|
|3|2004-03-10 21:00:00|1376.0|9.2|948.0|1092.0|1584.0|1203.0|11.0|60.0|0.7867|2004|3|10|Wednesday|
|4|2004-03-10 22:00:00|1272.0|6.5|836.0|1205.0|1490.0|1110.0|11.2|59.6|0.7888|2004|3|10|Wednesday|


In [None]:
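# Display the DataFrame. Note: `df.info` is written without parentheses, so the
# method is not called and the notebook shows the bound method's repr (the full
# frame printed below); `df.info()` would print the per-column summary instead.
df.info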

<bound method DataFrame.info of                  DateTime  PT08.S1(CO)  C6H6(GT)  ...  Month  Day   Day Name
0     2004-03-10 18:00:00       1360.0      11.9  ...      3   10  Wednesday
1     2004-03-10 19:00:00       1292.0       9.4  ...      3   10  Wednesday
2     2004-03-10 20:00:00       1402.0       9.0  ...      3   10  Wednesday
3     2004-03-10 21:00:00       1376.0       9.2  ...      3   10  Wednesday
4     2004-03-10 22:00:00       1272.0       6.5  ...      3   10  Wednesday
...                   ...          ...       ...  ...    ...  ...        ...
9352  2005-04-04 10:00:00       1314.0      13.5  ...      4    4     Monday
9353  2005-04-04 11:00:00       1163.0      11.4  ...      4    4     Monday
9354  2005-04-04 12:00:00       1142.0      12.4  ...      4    4     Monday
9355  2005-04-04 13:00:00       1003.0       9.5  ...      4    4     Monday
9356  2005-04-04 14:00:00       1071.0      11.9  ...      4    4     Monday

[9357 rows x 14 columns]>
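
The output above is the repr of the bound method rather than a column summary. As a minimal sketch of the two usual ways to inspect columns, the snippet below uses a tiny stand-in frame named `sample` (a hypothetical name; its values are copied from the first two rows shown earlier) so that it runs without downloading the dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in frame; values copied from the first two rows of the dataset.
sample = pd.DataFrame({
    "DateTime": ["2004-03-10 18:00:00", "2004-03-10 19:00:00"],
    "T": [13.6, 13.3],
    "RH": [48.9, 47.7],
})

# The column labels alone.
column_names = list(sample.columns)
print(column_names)

# Calling info() with parentheses writes a per-column summary
# (non-null counts and dtypes). It is captured in a buffer here
# so the text can be reused, then printed.
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```

On the full frame, `df.columns` and `df.info()` would be the corresponding calls.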

In [None]:
# Build a linear regression model using the sklearn module by including all the features except DateTime, Day Name & RH.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Slice off the first column (DateTime) and the last column (Day Name), then drop the target RH.
features = list(df.columns.values[1:-1])
features.remove('RH')
X = df[features]
y = df['RH']


# Split the DataFrame into the train and test sets.
# The test set will have 33% of the rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Reshape the targets into column vectors; this is why intercept_ and coef_ are indexed with [0] below.
y_train_reshaped = y_train.values.reshape(-1, 1)
y_test_reshaped = y_test.values.reshape(-1, 1)



# Build a linear regression model using the 'sklearn.linear_model' module.
sklearn_lin_reg = LinearRegression()
sklearn_lin_reg.fit(X_train, y_train_reshaped)


# Print the value of the intercept.
print('Intercept :', sklearn_lin_reg.intercept_[0])

# Print the names of the features along with the values of their corresponding coefficients.
print("coefficient : ", sklearn_lin_reg.coef_)
for name, coef in zip(X.columns.values, sklearn_lin_reg.coef_[0]):
    print(name, coef)

Intercept : -15028.451823247718
coefficient :  [[ 1.48327948e-02 -9.03464156e-01 -5.88095941e-03  1.50325488e-03
   2.64965020e-02 -1.06574176e-03 -2.35491907e+00  2.95517421e+01
   7.50515310e+00  1.16786097e+00  3.52321248e-02]]
PT08.S1(CO) 0.014832794792690625
C6H6(GT) -0.9034641560183382
PT08.S2(NMHC) -0.005880959405385411
PT08.S3(NOx) 0.0015032548783276978
PT08.S4(NO2) 0.026496502045666503
PT08.S5(O3) -0.001065741763271788
T -2.354919067592639
AH 29.551742104329783
Year 7.505153097892558
Month 1.1678609682998067
Day 0.03523212478929974
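
To make the role of the intercept and coefficients concrete, the sketch below rebuilds a fitted model's predictions by hand as intercept plus the dot product of the features with the coefficients. It runs on synthetic data, and `X_demo`, `y_demo` and `demo_model` are hypothetical names that are not part of the notebook. Unlike the cell above, the target is left one-dimensional here, so `intercept_` is a scalar and `coef_` is a flat array with no `[0]` indexing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): 50 rows, 3 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
# Target built from known weights plus a little noise.
y_demo = 2.0 + X_demo @ np.array([1.5, -0.5, 3.0]) + rng.normal(scale=0.1, size=50)

demo_model = LinearRegression().fit(X_demo, y_demo)

# Prediction by hand: intercept + sum over features of (coefficient * value).
manual_pred = demo_model.intercept_ + X_demo @ demo_model.coef_
sklearn_pred = demo_model.predict(X_demo)

# Both routes should agree up to floating-point error.
print(np.allclose(manual_pred, sklearn_pred))
```

The same arithmetic helps when reading the output above: the `Year` coefficient of about 7.505 multiplied by 2004 is roughly 15040, which suggests that the large negative intercept mostly offsets the `Year` term rather than indicating a problem with the fit.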


In [None]:
# Evaluate the linear regression model using the 'r2_score', 'mean_squared_error' & 'mean_absolute_error' functions of the 'sklearn' module.
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
import numpy as np

# Predict on both splits so train and test scores can be compared.
y_train_pred = sklearn_lin_reg.predict(X_train)
y_test_pred = sklearn_lin_reg.predict(X_test)

print('Train Set')
print('')
print('R-squared : ', r2_score(y_train_reshaped, y_train_pred))
print('\nmean squared error : ', mean_squared_error(y_train_reshaped, y_train_pred))
print('\nroot mean squared error : ', np.sqrt(mean_squared_error(y_train_reshaped, y_train_pred)))
print('\nmean absolute error : ', mean_absolute_error(y_train_reshaped, y_train_pred))
print('-------------------------------------------------------------')
print('Test set')
print('')
print('R-squared : ', r2_score(y_test_reshaped, y_test_pred))
print('\nmean squared error : ', mean_squared_error(y_test_reshaped, y_test_pred))
print('\nroot mean squared error : ', np.sqrt(mean_squared_error(y_test_reshaped, y_test_pred)))
print('\nmean absolute error : ', mean_absolute_error(y_test_reshaped, y_test_pred))


Train Set

R-squared :  0.8785638240066055

mean squared error :  35.11591834141915

root mean squared error :  5.925868572742662

mean absolute error :  4.571994849644625
-------------------------------------------------------------
Test set

R-squared :  0.8787020691681189

mean squared error :  34.702124455429534

root mean squared error :  5.8908509109830245

mean absolute error :  4.5644604329243466
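
As a rough illustration of what the four reported metrics measure, the sketch below computes them directly from residuals with NumPy and compares each with the corresponding `sklearn.metrics` function. The `y_true` values are the first five RH readings from the dataset; the `y_pred` values are made up for the example and are not model output.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([48.9, 47.7, 54.0, 60.0, 59.6])  # RH values from the first five rows
y_pred = np.array([50.0, 46.0, 55.0, 58.0, 61.0])  # made-up predictions (assumption)

residuals = y_true - y_pred

mse = np.mean(residuals ** 2)        # mean squared error
rmse = np.sqrt(mse)                  # root mean squared error, in the units of RH
mae = np.mean(np.abs(residuals))     # mean absolute error
# R-squared: 1 - (residual sum of squares / total sum of squares around the mean).
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Each hand-computed value should match the sklearn function.
print(np.isclose(mse, mean_squared_error(y_true, y_pred)))
print(np.isclose(mae, mean_absolute_error(y_true, y_pred)))
print(np.isclose(r2, r2_score(y_true, y_pred)))
```

Because RMSE and MAE are in the same units as the target, the test-set values reported above (about 5.89 and 4.56) can be read as typical errors in percentage points of relative humidity.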
