

# Multiple Linear Regression

**Program that implements multiple linear regression on a standard public-domain dataset and evaluates its performance.**



The table below describes the dataset columns that hold air-pollutant, temperature, relative-humidity and absolute-humidity data.


|Columns|Description|
|-|-|
|PT08.S1(CO)|PT08.S1 (tin oxide) hourly averaged sensor response (nominally $\text{CO}$ targeted)|
|C6H6(GT)|True hourly averaged Benzene concentration in $\frac{\mu g}{m^3}$|
|PT08.S2(NMHC)|PT08.S2 (titania) hourly averaged sensor response (nominally $\text{NMHC}$ targeted)|
|PT08.S3(NOx)|PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally $\text{NO}_x$ targeted)|
|PT08.S4(NO2)|PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally $\text{NO}_2$ targeted)|
|PT08.S5(O3) |PT08.S5 (indium oxide) hourly averaged sensor response (nominally $\text{O}_3$ targeted)|
|T|Temperature in °C|
|RH|Relative Humidity (%)|
|AH|Absolute Humidity|

---

#### Multiple Linear Regression Model Using `sklearn` Module
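
As a reminder of what is being fitted, multiple linear regression models the target as an intercept plus a weighted sum of the features, with the weights chosen by least squares. In this notebook the target $y$ is `RH` and the $p$ predictors are the remaining numeric columns. The symbols $\beta_j$, $x_{ij}$, $n$ and $p$ are notation introduced here for illustration and do not appear in the original notebook:

$$
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
(\hat{\beta}_0, \dots, \hat{\beta}_p) = \arg\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
$$

In `sklearn` terms, $\hat{\beta}_0$ is exposed as `intercept_` and $\hat{\beta}_1, \dots, \hat{\beta}_p$ as `coef_` on a fitted `LinearRegression` object.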


In [None]:
# Load the dataset and display the first five rows. The CSV is hosted on GitHub at:
# https://raw.githubusercontent.com/jiss-sngce/air/main/airquality.csv.csv
import pandas as pd
import numpy as np

df = pd.read_csv('https://raw.githubusercontent.com/jiss-sngce/air/main/airquality.csv.csv')
df.head()

| |DateTime|PT08.S1(CO)|C6H6(GT)|PT08.S2(NMHC)|PT08.S3(NOx)|PT08.S4(NO2)|PT08.S5(O3)|T|RH|AH|Year|Month|Day|Day Name|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|0|2004-03-10 18:00:00|1360.0|11.9|1046.0|1056.0|1692.0|1268.0|13.6|48.9|0.7578|2004|3|10|Wednesday|
|1|2004-03-10 19:00:00|1292.0|9.4|955.0|1174.0|1559.0|972.0|13.3|47.7|0.7255|2004|3|10|Wednesday|
|2|2004-03-10 20:00:00|1402.0|9.0|939.0|1140.0|1555.0|1074.0|11.9|54.0|0.7502|2004|3|10|Wednesday|
|3|2004-03-10 21:00:00|1376.0|9.2|948.0|1092.0|1584.0|1203.0|11.0|60.0|0.7867|2004|3|10|Wednesday|
|4|2004-03-10 22:00:00|1272.0|6.5|836.0|1205.0|1490.0|1110.0|11.2|59.6|0.7888|2004|3|10|Wednesday|


In [None]:
# Display the column names of the DataFrame
df.columns

Index(['DateTime', 'PT08.S1(CO)', 'C6H6(GT)', 'PT08.S2(NMHC)', 'PT08.S3(NOx)',
       'PT08.S4(NO2)', 'PT08.S5(O3)', 'T', 'RH', 'AH', 'Year', 'Month', 'Day',
       'Day Name'],
      dtype='object')
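
Given these column names, the predictors can also be picked by name instead of by position. The sketch below is only an illustration: `demo_df` is a one-row stand-in built from the first row shown by `df.head()`, and the names `excluded`, `features_by_name` and `features_by_position` are hypothetical helpers, not part of the original notebook. It checks that excluding `DateTime`, `Day Name` and the target `RH` by name gives the same list as slicing off the first and last columns.

```python
import pandas as pd

# One-row stand-in for the real DataFrame (values copied from the first row of df.head()).
columns = ['DateTime', 'PT08.S1(CO)', 'C6H6(GT)', 'PT08.S2(NMHC)', 'PT08.S3(NOx)',
           'PT08.S4(NO2)', 'PT08.S5(O3)', 'T', 'RH', 'AH', 'Year', 'Month', 'Day',
           'Day Name']
row = ['2004-03-10 18:00:00', 1360.0, 11.9, 1046.0, 1056.0, 1692.0, 1268.0,
       13.6, 48.9, 0.7578, 2004, 3, 10, 'Wednesday']
demo_df = pd.DataFrame([row], columns=columns)

# Name-based selection: explicitly exclude the non-numeric columns and the target.
target = 'RH'
excluded = ['DateTime', 'Day Name', target]
features_by_name = [c for c in demo_df.columns if c not in excluded]

# Position-based selection: drop the first and last columns, then remove the target.
features_by_position = list(demo_df.columns.values[1:-1])
features_by_position.remove(target)

print(features_by_name)
print(features_by_name == features_by_position)
```

Selecting by name keeps working if the column order changes, whereas the positional slice silently depends on `DateTime` being first and `Day Name` being last.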

In [None]:
# Build a linear regression model with sklearn that predicts RH (the target)
# from every column except DateTime and Day Name.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Drop the first (DateTime) and last (Day Name) columns, then remove the target.
features = list(df.columns.values[1:-1])
features.remove('RH')

x = df[features]
y = df['RH']

# Split the DataFrame into train and test sets; the test set gets 33% of the rows.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)
# Reshape the targets to column vectors, so intercept_ and coef_ carry a leading axis of size 1.
y_train_reshaped = y_train.values.reshape(-1, 1)
y_test_reshaped = y_test.values.reshape(-1, 1)

# Fit the linear regression model on the training set.
sklearn_lin_reg = LinearRegression()
sklearn_lin_reg.fit(x_train, y_train_reshaped)

# Print the value of the intercept (the leading "\n" emits a blank line first).
print("\nConstant".ljust(15, " "), f"{sklearn_lin_reg.intercept_[0]:.6f}")

# Print each feature name alongside its coefficient.
for name, coef in zip(x.columns.values, sklearn_lin_reg.coef_[0]):
    print(f"{name}".ljust(15, " "), f"{coef:.6f}")


Constant       -15028.451823
PT08.S1(CO)     0.014833
C6H6(GT)        -0.903464
PT08.S2(NMHC)   -0.005881
PT08.S3(NOx)    0.001503
PT08.S4(NO2)    0.026497
PT08.S5(O3)     -0.001066
T               -2.354919
AH              29.551742
Year            7.505153
Month           1.167861
Day             0.035232
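
The constant and coefficients printed above define the prediction: the intercept plus the sum of each coefficient times its feature value. The sketch below checks that relationship on small synthetic data. `X_demo`, `y_demo` and `true_coef` are made-up values for illustration only and are not the air-quality data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only (not the air-quality dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = 4.0 + X_demo @ true_coef + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X_demo, y_demo)

# Manual prediction: intercept + sum over features of coefficient * value.
manual_pred = model.intercept_ + X_demo @ model.coef_
sklearn_pred = model.predict(X_demo)

# The two agree up to floating-point rounding.
print(np.allclose(manual_pred, sklearn_pred))
```

Here the target is one-dimensional, so `coef_` has shape `(3,)` and `intercept_` is a scalar. When the target is reshaped to a column vector, as in the notebook, `coef_` has shape `(1, n_features)` and `intercept_` has shape `(1,)`, which is why the notebook indexes both with `[0]`. The same arithmetic also suggests one reading of the large negative constant: for rows with `Year` equal to 2004, the `Year` term contributes about 7.505153 × 2004 ≈ 15040, which roughly offsets the intercept.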


In [None]:
# Evaluate the model on both splits with sklearn's r2_score, mean_squared_error and mean_absolute_error.
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_train_pred = sklearn_lin_reg.predict(x_train)
y_test_pred = sklearn_lin_reg.predict(x_test)

# Metrics are printed with six decimal places (.6f).
print(f"Train Set\n{'-'*50}")
print(f"R-Squared:{r2_score(y_train_reshaped, y_train_pred):.6f}")
print(f"Mean Squared Error:{mean_squared_error(y_train_reshaped, y_train_pred):.6f}")
print(f"Root Mean Squared Error:{np.sqrt(mean_squared_error(y_train_reshaped, y_train_pred)):.6f}")
print(f"Mean Absolute Error:{mean_absolute_error(y_train_reshaped, y_train_pred):.6f}")

print(f"Test Set\n{'-'*50}")
print(f"R-Squared:{r2_score(y_test_reshaped, y_test_pred):.6f}")
print(f"Mean Squared Error:{mean_squared_error(y_test_reshaped, y_test_pred):.6f}")
print(f"Root Mean Squared Error:{np.sqrt(mean_squared_error(y_test_reshaped, y_test_pred)):.6f}")
print(f"Mean Absolute Error:{mean_absolute_error(y_test_reshaped, y_test_pred):.6f}")

Train Set
--------------------------------------------------
R-Squared:0.878564
Mean Squared Error:35.115918
Root Mean Squared Error:5.925869
Mean Absolute Error:4.571995
Test Set
--------------------------------------------------
R-Squared:0.878702
Mean Squared Error:34.702124
Root Mean Squared Error:5.890851
Mean Absolute Error:4.564460
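
To make the four reported metrics concrete, the sketch below computes them directly from their definitions with NumPy. The arrays `y_true` and `y_pred` are toy values chosen for illustration and are not taken from the notebook's data.

```python
import numpy as np

# Toy values for illustration only (not the notebook's predictions).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

residuals = y_true - y_pred

# Mean squared error: average of the squared residuals.
mse = np.mean(residuals ** 2)
# Root mean squared error: square root of the MSE, in the units of the target.
rmse = np.sqrt(mse)
# Mean absolute error: average of the absolute residuals.
mae = np.mean(np.abs(residuals))
# R-squared: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"R-Squared:{r2:.6f}")
print(f"Mean Squared Error:{mse:.6f}")
print(f"Root Mean Squared Error:{rmse:.6f}")
print(f"Mean Absolute Error:{mae:.6f}")
```

Because RMSE and MAE are in the units of the target, the test-set values above correspond to a typical error of roughly 5.9 and 4.6 percentage points of relative humidity, respectively. The train and test scores are nearly identical, which suggests the model is not overfitting to the training split.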
