
# **Prediction of the energy use of appliances**

The data set records measurements at 10-minute intervals over about 4.5 months. The temperature and humidity inside the house were monitored with a ZigBee wireless sensor network. Each wireless node transmitted temperature and humidity readings roughly every 3.3 minutes, and the readings were then averaged over 10-minute periods.
The energy data was logged every 10 minutes with m-bus energy meters.
Weather data from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data using the date and time column. Two random variables are included in the data set for testing the regression models and for filtering out non-predictive attributes (parameters).
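
The averaging and merging steps described above can be sketched with pandas. This is only an illustration on made-up values: the 200-second spacing, the `sensor` and `energy` frames, and the numbers in them are assumptions, not the procedure or data actually used to build the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical raw sensor readings arriving every 200 s (about 3.3 minutes).
raw_index = pd.date_range(
    "2016-01-11 17:00:00", periods=9, freq=pd.Timedelta(seconds=200)
)
sensor = pd.DataFrame({"T1": np.linspace(19.8, 20.6, 9)}, index=raw_index)

# Average the raw readings over 10-minute periods.
sensor_10min = sensor.resample("10min").mean()

# Hypothetical energy log that is already at 10-minute resolution.
energy = pd.DataFrame(
    {
        "date": pd.date_range("2016-01-11 17:00:00", periods=3, freq="10min"),
        "Appliances": [60, 60, 50],
    }
)

# Merge both sources on the shared timestamp column.
merged = energy.merge(
    sensor_10min.rename_axis("date").reset_index(), on="date", how="inner"
)
print(merged)
```

Each 10-minute bin here holds three raw readings, so every merged row carries the mean of three sensor values next to the energy reading for the same timestamp.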

## ***Feature details of the dataset***

**date**: Timestamp in the format year-month-day hour:minute:second.

**Appliances**: Energy use in Wh (dependent variable).

**lights**: Energy use of light fixtures in the house, in Wh (this column will be dropped).

**T1**: Temperature in kitchen area, in Celsius.

**RH_1**: Humidity in kitchen area, in %.

**T2**: Temperature in living room area, in Celsius.

**RH_2**: Humidity in living room area, in %.

**T3**: Temperature in laundry room area, in Celsius.

**RH_3**: Humidity in laundry room area, in %.

**T4**: Temperature in office room, in Celsius.

**RH_4**: Humidity in office room, in %.

**T5**: Temperature in bathroom, in Celsius.

**RH_5**: Humidity in bathroom, in %.

**T6**: Temperature outside the building (north side), in Celsius.

**RH_6**: Humidity outside the building (north side), in %.

**T7**: Temperature in ironing room, in Celsius.

**RH_7**: Humidity in ironing room, in %.

**T8**: Temperature in teenager room 2, in Celsius.

**RH_8**: Humidity in teenager room 2, in %.

**T9**: Temperature in parents room, in Celsius.

**RH_9**: Humidity in parents room, in %.

**T_out**: Temperature outside (from Chievres weather station), in Celsius.

**Press_mm_hg**: Atmospheric pressure (from Chievres weather station), in mm Hg.

**RH_out**: Humidity outside (from Chievres weather station), in %.

**Windspeed**: Wind speed (from Chievres weather station), in m/s.

**Visibility**: Visibility (from Chievres weather station), in km.

**Tdewpoint**: Dew point temperature (from Chievres weather station), in °C.

**rv1**: Random variable 1, nondimensional.

**rv2**: Random variable 2, nondimensional.
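
One possible way to use a random variable such as rv1 as a noise baseline is to compare each feature's correlation with the target against the random variable's correlation. The sketch below runs on synthetic data; the column names `T_informative` and `RH_uninformative` and the coefficients are made up for illustration, and correlation only captures linear relationships, so this is a rough screening step rather than a definitive filter.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in data: one informative feature, one uninformative
# feature, and a random variable playing the role of rv1.
demo = pd.DataFrame(
    {
        "T_informative": rng.normal(20, 2, n),
        "RH_uninformative": rng.normal(40, 4, n),
        "rv1": rng.uniform(0, 50, n),
    }
)
demo["Appliances"] = 10 * demo["T_informative"] + rng.normal(0, 5, n)

# Absolute Pearson correlation of every feature with the target.
corr = demo.corr()["Appliances"].drop("Appliances").abs()

# Features that do not beat the random variable are candidates for removal.
noise_level = corr["rv1"]
candidates_to_drop = corr[corr <= noise_level].index.drop("rv1").tolist()

print(corr.sort_values(ascending=False))
print("Candidates to drop:", candidates_to_drop)
```

A feature whose score is no better than that of a purely random column is unlikely to carry predictive signal, which is the idea behind including rv1 and rv2 in the data set.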

# **Problem Statement**
We need to forecast the energy consumption of appliances from temperature, humidity, and weather conditions. To solve this problem we will build an energy prediction engine using a supervised machine learning algorithm. Because the target is a continuous value, this is a regression problem, so we will use regression algorithms.
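
As a rough illustration of what a supervised regression workflow looks like, the sketch below fits a linear model on synthetic data and scores it on a held-out split. The data, the choice of `LinearRegression`, and the 80/20 split are assumptions made for the example; they are not the models or settings this notebook settles on.

```python
import numpy as np
from sklearn import metrics, model_selection
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic stand-in for the real features and target.
X = rng.normal(size=(500, 3))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=500)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows and predict on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Common regression metrics: R^2 and root mean squared error.
r2 = metrics.r2_score(y_test, y_pred)
rmse = np.sqrt(metrics.mean_squared_error(y_test, y_pred))
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```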

In [2]:
# import basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing, model_selection, metrics

In [5]:
# mount Google Drive
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [6]:
# load the dataset
df = pd.read_csv('/content/drive/MyDrive/almabetter/capstone project_production/regression supervised/Copy of data_application_energy.csv')

# Data overview

In [33]:
# drop the 'lights' column, which is not needed for the model
df.drop("lights", axis=1, inplace=True)

In [34]:

df.head()

Unnamed: 0,date,Appliances,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,2016-01-11 17:00:00,60,19.89,47.596667,19.2,44.79,19.79,44.73,19.0,45.566667,...,17.033333,45.53,6.6,733.5,92.0,7.0,63.0,5.3,13.275433,13.275433
1,2016-01-11 17:10:00,60,19.89,46.693333,19.2,44.7225,19.79,44.79,19.0,45.9925,...,17.066667,45.56,6.483333,733.6,92.0,6.666667,59.166667,5.2,18.606195,18.606195
2,2016-01-11 17:20:00,50,19.89,46.3,19.2,44.626667,19.79,44.933333,18.926667,45.89,...,17.0,45.5,6.366667,733.7,92.0,6.333333,55.333333,5.1,28.642668,28.642668
3,2016-01-11 17:30:00,50,19.89,46.066667,19.2,44.59,19.79,45.0,18.89,45.723333,...,17.0,45.4,6.25,733.8,92.0,6.0,51.5,5.0,45.410389,45.410389
4,2016-01-11 17:40:00,60,19.89,46.333333,19.2,44.53,19.79,45.0,18.89,45.53,...,17.0,45.4,6.133333,733.9,92.0,5.666667,47.666667,4.9,10.084097,10.084097


In [35]:
df.tail()

Unnamed: 0,date,Appliances,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
19730,2016-05-27 17:20:00,100,25.566667,46.56,25.89,42.025714,27.2,41.163333,24.7,45.59,...,23.2,46.79,22.733333,755.2,55.666667,3.333333,23.666667,13.333333,43.096812,43.096812
19731,2016-05-27 17:30:00,90,25.5,46.5,25.754,42.08,27.133333,41.223333,24.7,45.59,...,23.2,46.79,22.6,755.2,56.0,3.5,24.5,13.3,49.28294,49.28294
19732,2016-05-27 17:40:00,270,25.5,46.596667,25.628571,42.768571,27.05,41.69,24.7,45.73,...,23.2,46.79,22.466667,755.2,56.333333,3.666667,25.333333,13.266667,29.199117,29.199117
19733,2016-05-27 17:50:00,420,25.5,46.99,25.414,43.036,26.89,41.29,24.7,45.79,...,23.2,46.8175,22.333333,755.2,56.666667,3.833333,26.166667,13.233333,6.322784,6.322784
19734,2016-05-27 18:00:00,430,25.5,46.6,25.264286,42.971429,26.823333,41.156667,24.7,45.963333,...,23.2,46.845,22.2,755.2,57.0,4.0,27.0,13.2,34.118851,34.118851


In [36]:
df.sample(5)

Unnamed: 0,date,Appliances,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
10638,2016-03-25 14:00:00,250,21.7,41.73,19.533333,45.26,23.533333,40.06,19.79,39.7,...,19.5,41.626667,10.2,755.7,71.0,6.0,40.0,5.2,45.870858,45.870858
15822,2016-04-30 14:00:00,100,21.79,36.126667,21.025,35.6175,25.426667,37.53,20.0,37.626667,...,19.463333,38.663333,10.5,759.4,63.0,3.0,22.0,3.7,46.931675,46.931675
10205,2016-03-22 13:50:00,70,22.26,37.2,22.39,35.4,22.1,36.433333,22.0,35.4,...,19.79,39.9,11.983333,756.966667,56.666667,3.0,40.0,3.516667,39.288637,39.288637
15484,2016-04-28 05:40:00,270,20.7,35.863333,17.1,39.933333,21.29,34.863333,19.89,34.0,...,18.89,38.2,-0.3,755.4,95.666667,1.333333,49.666667,-0.933333,4.894289,4.894289
5853,2016-02-21 08:30:00,50,20.7,43.626667,19.39,44.626667,21.79,42.2,19.6,44.7,...,18.26,47.06,10.65,756.35,88.5,9.0,40.0,8.75,24.961764,24.961764


In [37]:
df.shape

(19735, 28)

In [38]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19735 entries, 0 to 19734
Data columns (total 28 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         19735 non-null  object 
 1   Appliances   19735 non-null  int64  
 2   T1           19735 non-null  float64
 3   RH_1         19735 non-null  float64
 4   T2           19735 non-null  float64
 5   RH_2         19735 non-null  float64
 6   T3           19735 non-null  float64
 7   RH_3         19735 non-null  float64
 8   T4           19735 non-null  float64
 9   RH_4         19735 non-null  float64
 10  T5           19735 non-null  float64
 11  RH_5         19735 non-null  float64
 12  T6           19735 non-null  float64
 13  RH_6         19735 non-null  float64
 14  T7           19735 non-null  float64
 15  RH_7         19735 non-null  float64
 16  T8           19735 non-null  float64
 17  RH_8         19735 non-null  float64
 18  T9           19735 non-null  float64
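 19  RH_9         19735 non-null  float64
 20  T_out        19735 non-null  float64
 21  Press_mm_hg  19735 non-null  float64
 22  RH_out       19735 non-null  float64
 23  Windspeed    19735 non-null  float64
 24  Visibility   19735 non-null  float64
 25  Tdewpoint    19735 non-null  float64
 26  rv1          19735 non-null  float64
 27  rv2          19735 non-null  float64
dtypes: float64(26), int64(1), object(1)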

In [40]:
df.describe()

Unnamed: 0,Appliances,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,T5,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
count,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,...,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0
mean,97.694958,21.686571,40.259739,20.341219,40.42042,22.267611,39.2425,20.855335,39.026904,19.592106,...,19.485828,41.552401,7.411665,755.522602,79.750418,4.039752,38.330834,3.760707,24.988033,24.988033
std,102.524891,1.606066,3.979299,2.192974,4.069813,2.006111,3.254576,2.042884,4.341321,1.844623,...,2.014712,4.151497,5.317409,7.399441,14.901088,2.451221,11.794719,4.194648,14.496634,14.496634
min,10.0,16.79,27.023333,16.1,20.463333,17.2,28.766667,15.1,27.66,15.33,...,14.89,29.166667,-5.0,729.3,24.0,0.0,1.0,-6.6,0.005322,0.005322
25%,50.0,20.76,37.333333,18.79,37.9,20.79,36.9,19.53,35.53,18.2775,...,18.0,38.5,3.666667,750.933333,70.333333,2.0,29.0,0.9,12.497889,12.497889
50%,60.0,21.6,39.656667,20.0,40.5,22.1,38.53,20.666667,38.4,19.39,...,19.39,40.9,6.916667,756.1,83.666667,3.666667,40.0,3.433333,24.897653,24.897653
75%,100.0,22.6,43.066667,21.5,43.26,23.29,41.76,22.1,42.156667,20.619643,...,20.6,44.338095,10.408333,760.933333,91.666667,5.5,40.0,6.566667,37.583769,37.583769
max,1080.0,26.26,63.36,29.856667,56.026667,29.236,50.163333,26.2,51.09,25.795,...,24.5,53.326667,26.1,772.3,100.0,14.0,66.0,15.5,49.99653,49.99653


In [43]:
# check for null values
df.isnull().sum()

date           0
Appliances     0
T1             0
RH_1           0
T2             0
RH_2           0
T3             0
RH_3           0
T4             0
RH_4           0
T5             0
RH_5           0
T6             0
RH_6           0
T7             0
RH_7           0
T8             0
RH_8           0
T9             0
RH_9           0
T_out          0
Press_mm_hg    0
RH_out         0
Windspeed      0
Visibility     0
Tdewpoint      0
rv1            0
rv2            0
dtype: int64

We observed that:

a) The date column has the object data type, Appliances is int64, and all remaining features are float64.

b) The date column can be converted from object to a datetime data type.

c) There are no null values in the dataset.
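
The conversion mentioned in observation b) can be sketched as follows. The small `demo` frame is a stand-in that reuses the timestamp format shown in the output above; the derived `hour` and `weekday` columns are illustrative extras, not features this notebook necessarily creates.

```python
import pandas as pd

# Small stand-in frame using timestamps in the same format as the dataset.
demo = pd.DataFrame(
    {
        "date": [
            "2016-01-11 17:00:00",
            "2016-01-11 17:10:00",
            "2016-01-11 17:20:00",
        ],
        "Appliances": [60, 60, 50],
    }
)

# Convert the object (string) column to a datetime column.
demo["date"] = pd.to_datetime(demo["date"], format="%Y-%m-%d %H:%M:%S")

# Datetime columns expose calendar parts through the .dt accessor.
demo["hour"] = demo["date"].dt.hour
demo["weekday"] = demo["date"].dt.dayofweek  # Monday = 0

print(demo.dtypes)
```

On the real data the same call would presumably be applied to `df["date"]`, after which time-based features become available through `.dt`.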