### introduction

In the era of smart homes, the ability to predict energy consumption not only saves money for 
users but can also earn them money by feeding excess energy back to the grid (when solar 
panels are installed). Here, regression analysis is used to predict appliance energy usage from 
data collected by various sensors.

Energy prediction is a supervised machine learning task: the aim is to predict appliance energy 
consumption for a house from factors such as temperature, humidity, and pressure. Several techniques, 
including the gradient descent algorithm and linear regression (built-in function), have been applied to 
predict the energy consumption.
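
As a rough illustration of the gradient descent approach mentioned above, the following minimal sketch fits a linear model by batch gradient descent on mean squared error. The data is synthetic, and the function name `gradient_descent`, the learning rate, and the iteration count are assumptions chosen for this example; none of them come from the notebook.

```python
import numpy as np

def gradient_descent(X, y, lr=0.3, n_iters=2000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y
        # Gradients of (1/n) * sum(residual**2) with respect to w and b
        grad_w = (2.0 / n_samples) * (X.T @ residual)
        grad_b = 2.0 * residual.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic, noise-free data (hypothetical): two features in [0, 1]
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1

w, b = gradient_descent(X, y)
```

Because the features lie in [0, 1], as they do after min-max scaling, a fixed learning rate of this size stays stable here; unscaled features would likely need a much smaller step.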

### dataset

The dataset for the remainder of this quiz is the Appliances Energy Prediction data. 
The data was recorded at 10-minute intervals for about 4.5 months. The house temperature and humidity conditions were monitored 
with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions about every 3.3 minutes. 
The wireless data was then averaged over 10-minute periods. The energy data was logged every 10 minutes 
with m-bus energy meters. Weather data from the nearest airport weather station (Chievres Airport, Belgium) was 
downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data sets 
using the date and time column. Two random variables have been included in the data set for testing 
the regression models and for filtering out non-predictive attributes (parameters). The attribute information can be seen below.
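
The averaging and merging steps described above could look roughly like this in pandas. This is only a sketch on made-up readings: the values, the 200-second spacing (about 3.3 minutes), and the frame names `sensor` and `weather` are assumptions for illustration, not the procedure actually used to build the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw sensor readings every 200 seconds (about 3.3 minutes)
raw_index = pd.date_range("2016-01-11 17:00:00", periods=9,
                          freq=pd.Timedelta(seconds=200))
sensor = pd.DataFrame({"T1": np.linspace(19.0, 19.8, 9)}, index=raw_index)

# Average the readings over 10-minute periods
sensor_10min = sensor.resample("10min").mean()
sensor_10min.index.name = "date"
sensor_10min = sensor_10min.reset_index()

# Hypothetical weather observations already on a 10-minute grid
weather = pd.DataFrame({
    "date": pd.date_range("2016-01-11 17:00:00", periods=3, freq="10min"),
    "T_out": [6.6, 6.5, 6.4],
})

# Join the two sources on the shared date and time column
merged = pd.merge(sensor_10min, weather, on="date", how="inner")
```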

In [312]:
import numpy as np
import pandas as pd

In [313]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
d = pd.read_csv(r"D:\hamoye\energydata_complete.csv")

In [314]:
d.head()

,date,Appliances,lights,T1,RH_1,T2,RH_2,T3,RH_3,T4,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,2016-01-11 17:00:00,60,30,19.89,47.596667,19.2,44.79,19.79,44.73,19.0,...,17.033333,45.53,6.6,733.5,92.0,7.0,63.0,5.3,13.275433,13.275433
1,2016-01-11 17:10:00,60,30,19.89,46.693333,19.2,44.7225,19.79,44.79,19.0,...,17.066667,45.56,6.483333,733.6,92.0,6.666667,59.166667,5.2,18.606195,18.606195
2,2016-01-11 17:20:00,50,30,19.89,46.3,19.2,44.626667,19.79,44.933333,18.926667,...,17.0,45.5,6.366667,733.7,92.0,6.333333,55.333333,5.1,28.642668,28.642668
3,2016-01-11 17:30:00,50,40,19.89,46.066667,19.2,44.59,19.79,45.0,18.89,...,17.0,45.4,6.25,733.8,92.0,6.0,51.5,5.0,45.410389,45.410389
4,2016-01-11 17:40:00,60,40,19.89,46.333333,19.2,44.53,19.79,45.0,18.89,...,17.0,45.4,6.133333,733.9,92.0,5.666667,47.666667,4.9,10.084097,10.084097


In [315]:
d.shape

(19735, 29)

In [316]:
d.isnull().sum()

date           0
Appliances     0
lights         0
T1             0
RH_1           0
T2             0
RH_2           0
T3             0
RH_3           0
T4             0
RH_4           0
T5             0
RH_5           0
T6             0
RH_6           0
T7             0
RH_7           0
T8             0
RH_8           0
T9             0
RH_9           0
T_out          0
Press_mm_hg    0
RH_out         0
Windspeed      0
Visibility     0
Tdewpoint      0
rv1            0
rv2            0
dtype: int64

In [317]:
d.columns

Index(['date', 'Appliances', 'lights', 'T1', 'RH_1', 'T2', 'RH_2', 'T3',
       'RH_3', 'T4', 'RH_4', 'T5', 'RH_5', 'T6', 'RH_6', 'T7', 'RH_7', 'T8',
       'RH_8', 'T9', 'RH_9', 'T_out', 'Press_mm_hg', 'RH_out', 'Windspeed',
       'Visibility', 'Tdewpoint', 'rv1', 'rv2'],
      dtype='object')

In [318]:
d.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19735 entries, 0 to 19734
Data columns (total 29 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         19735 non-null  object 
 1   Appliances   19735 non-null  int64  
 2   lights       19735 non-null  int64  
 3   T1           19735 non-null  float64
 4   RH_1         19735 non-null  float64
 5   T2           19735 non-null  float64
 6   RH_2         19735 non-null  float64
 7   T3           19735 non-null  float64
 8   RH_3         19735 non-null  float64
 9   T4           19735 non-null  float64
 10  RH_4         19735 non-null  float64
 11  T5           19735 non-null  float64
 12  RH_5         19735 non-null  float64
 13  T6           19735 non-null  float64
 14  RH_6         19735 non-null  float64
 15  T7           19735 non-null  float64
 16  RH_7         19735 non-null  float64
 17  T8           19735 non-null  float64
 18  RH_8         19735 non-null  float64
 19  T9  

In [319]:
d.describe()

,Appliances,lights,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
count,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,...,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0
mean,97.694958,3.801875,21.686571,40.259739,20.341219,40.42042,22.267611,39.2425,20.855335,39.026904,...,19.485828,41.552401,7.411665,755.522602,79.750418,4.039752,38.330834,3.760707,24.988033,24.988033
std,102.524891,7.935988,1.606066,3.979299,2.192974,4.069813,2.006111,3.254576,2.042884,4.341321,...,2.014712,4.151497,5.317409,7.399441,14.901088,2.451221,11.794719,4.194648,14.496634,14.496634
min,10.0,0.0,16.79,27.023333,16.1,20.463333,17.2,28.766667,15.1,27.66,...,14.89,29.166667,-5.0,729.3,24.0,0.0,1.0,-6.6,0.005322,0.005322
25%,50.0,0.0,20.76,37.333333,18.79,37.9,20.79,36.9,19.53,35.53,...,18.0,38.5,3.666667,750.933333,70.333333,2.0,29.0,0.9,12.497889,12.497889
50%,60.0,0.0,21.6,39.656667,20.0,40.5,22.1,38.53,20.666667,38.4,...,19.39,40.9,6.916667,756.1,83.666667,3.666667,40.0,3.433333,24.897653,24.897653
75%,100.0,0.0,22.6,43.066667,21.5,43.26,23.29,41.76,22.1,42.156667,...,20.6,44.338095,10.408333,760.933333,91.666667,5.5,40.0,6.566667,37.583769,37.583769
max,1080.0,70.0,26.26,63.36,29.856667,56.026667,29.236,50.163333,26.2,51.09,...,24.5,53.326667,26.1,772.3,100.0,14.0,66.0,15.5,49.99653,49.99653


In [320]:
d.drop(columns=['date'],inplace=True)

In [321]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

In [322]:
s = MinMaxScaler()

In [323]:
normalised_df = pd.DataFrame(s.fit_transform(d), columns = d.columns)

In [324]:
features_df = normalised_df.drop(['Appliances'],axis=1)
target_df = normalised_df['Appliances']

In [325]:
x_train,x_test,y_train,y_test = train_test_split(features_df,target_df,random_state = 1, test_size=0.3)
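
The cells above fit `MinMaxScaler` on the full data set before splitting. A common alternative is to split first and fit the scaler on the training rows only, so that the test rows do not influence the scaling. The sketch below shows that variant on a small made-up frame; the frame `demo` and its values are hypothetical, and applying this to the real data would likely change the scores reported in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the sensor data
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "T1": rng.normal(21.0, 1.5, 100),
    "RH_1": rng.normal(40.0, 4.0, 100),
    "Appliances": rng.integers(10, 200, 100),
})

X = demo.drop(columns=["Appliances"])
y = demo["Appliances"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Learn the min and max from the training rows only
scaler = MinMaxScaler()
X_train_scaled = pd.DataFrame(
    scaler.fit_transform(X_train), columns=X.columns, index=X_train.index
)
# Reuse the training min and max for the test rows
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test), columns=X.columns, index=X_test.index
)
```

With this variant, scaled test values can fall slightly outside [0, 1], which is expected.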

In [326]:
x_train.shape

(13814, 27)

In [327]:
y_train.shape

(13814,)

In [328]:
from sklearn.linear_model import LinearRegression

In [329]:
m=LinearRegression()

In [330]:
m.fit(x_train,y_train)

LinearRegression()

In [331]:
pred = m.predict(x_test)

In [332]:
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test,pred)     
round(mae,3)

0.05

In [333]:
# Root mean squared error (RMSE)
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test,pred))
round(rmse,3)

0.087

In [334]:
# R-Squared
from sklearn.metrics import r2_score
# Store the score under a new name so the imported r2_score function is not overwritten
r2 = r2_score(y_test, pred)
round(r2, 3)

0.171
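
To make the three metrics above concrete, this sketch computes MAE, RMSE, and R-squared directly from their definitions with NumPy. The arrays `y_true` and `y_hat` are made-up values, not results from the model above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical normalised targets and predictions
y_true = np.array([0.05, 0.04, 0.10, 0.30, 0.06])
y_hat = np.array([0.07, 0.05, 0.08, 0.15, 0.09])

# MAE: average absolute difference
mae_manual = np.mean(np.abs(y_true - y_hat))

# RMSE: square root of the average squared difference
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot
```

These errors are on the min-max scaled target. Since the scaler was fitted on the whole `Appliances` column (minimum 10, maximum 1080 in the `describe()` output), multiplying an MAE or RMSE by the range of 1070 converts it back to the original units, so an MAE of about 0.05 corresponds to roughly 53 to 54 in those units.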

In [335]:
from sklearn.linear_model import Ridge
r = Ridge(alpha=0.5)
r.fit(x_train, y_train)

Ridge(alpha=0.5)

In [336]:
from sklearn.linear_model import Lasso
l = Lasso(alpha=0.001)
l.fit(x_train, y_train)
     

Lasso(alpha=0.001)

In [337]:
# Compare the effects of regularization on the fitted weights
def fun(model, feature, col):
    """Return the model's coefficients as a DataFrame sorted by weight."""
    weights = pd.Series(model.coef_, feature.columns).sort_values()
    weights_df = pd.DataFrame(weights).reset_index()
    weights_df.columns = ['Features', col]
    # Weights are returned unrounded
    return weights_df

In [338]:
linearModel_weights = fun(m, x_train, 'Linear_Model_Weight')
ridge_weights =  fun(r, x_train, 'Ridge_Weight')
lasso_weights = fun(l, x_train, 'Lasso_weight')

In [339]:
final_weights = pd.merge(linearModel_weights, ridge_weights, on='Features')
final_weights = pd.merge(final_weights, lasso_weights, on='Features')

In [340]:
final_weights

,Features,Linear_Model_Weight,Ridge_Weight,Lasso_weight
0,RH_2,-0.423771,-0.372912,-0.0
1,T_out,-0.320352,-0.248813,0.0
2,T2,-0.214165,-0.175412,0.0
3,T9,-0.145026,-0.143838,-0.0
4,RH_8,-0.120021,-0.119414,-0.0
5,RH_out,-0.089059,-0.061193,-0.048364
6,T4,-0.042154,-0.043456,-0.0
7,RH_7,-0.039308,-0.040724,-0.0
8,RH_9,-0.022781,-0.024555,-0.0
9,RH_4,-0.022686,-0.023828,0.0
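
The table shows the typical pattern: Ridge shrinks the weights slightly, while Lasso sets most of them to exactly zero. The sketch below reproduces that behaviour on synthetic data, reusing the same `alpha` values as the cells above. The data and the choice of which features carry signal are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(42)
X = rng.random((300, 5))
# Hypothetical setup: only the first two features carry signal
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.01, 300)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)    # L2 penalty: shrinks weights toward zero
lasso = Lasso(alpha=0.001).fit(X, y)  # L1 penalty: can set weights to exactly zero

# Count the coefficients that are exactly zero under each penalty
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))

# Overall size of the weight vectors, for comparing shrinkage
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
```

Comparing `n_zero_lasso` with `n_zero_ridge`, and `ridge_norm` with `ols_norm`, shows the difference between the two penalties: the L1 penalty performs a form of feature selection, while the L2 penalty only reduces the magnitude of the weights.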


In [341]:
y_pred_lg = m.predict(x_test)
y_pred_r = r.predict(x_test)
y_pred_l = l.predict(x_test)

In [342]:
mse_lg = mean_squared_error(y_test,y_pred_lg)     
round(mse_lg,3)

0.008

In [343]:
mse_r = mean_squared_error(y_test,y_pred_r)     
round(mse_r,3)

0.008

In [344]:
mse_l = mean_squared_error(y_test,y_pred_l)     
round(mse_l,3)

0.009

In [345]:
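# Use a separate model so the multi-feature model m is not overwritten
simple_model = LinearRegression()
simple_model.fit(x_train[['T2']], x_train['T6'])
T6_pred = simple_model.predict(x_test[['T2']])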

In [347]:
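# Import r2_score in this cell so the name refers to the scikit-learn function
from sklearn.metrics import r2_score
r2_t = r2_score(x_test['T6'], T6_pred)
round(r2_t, 2)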