<a href="https://colab.research.google.com/github/nivalf/Predict_Energy_Usage/blob/main/predict_energy_usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Predict Energy Usage



The dataset describes the weather in an area and the energy use of a residential building.
It comprises the timestamp of each reading, the energy used by various home appliances and rooms,
the energy produced by an installed solar panel, and weather conditions such as cloud cover,
wind speed, and precipitation. The dataset has a total of 30 features. The goal of this assignment
is to build multiple machine learning models that predict how much energy will be consumed and to
assess the effectiveness of each model using various metrics.
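
The notebook does not say at this point which evaluation measures are used. As a minimal sketch, assuming the common regression measures MAE, RMSE and R², they could be computed as follows (the function name `regression_metrics` and the sample values are hypothetical; `sklearn.metrics` provides equivalent functions):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))            # mean absolute error
    rmse = np.sqrt(np.mean(errors ** 2))     # root mean squared error
    ss_res = np.sum(errors ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot               # coefficient of determination
    return {"mae": float(mae), "rmse": float(rmse), "r2": float(r2)}

# Hypothetical energy readings (kW) and model predictions, for illustration only
actual = [0.9, 1.2, 0.7, 1.5]
predicted = [1.0, 1.1, 0.8, 1.3]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

MAE and RMSE are in the same unit as the target (kW here), while R² is unitless, so reporting them together gives both an absolute and a relative view of model quality.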

### Domain Analysis

Household electricity consumption is highly correlated with air temperature and moderately correlated with precipitation and duration of sunshine [1].

When the daily temperature rises, daily electricity demand falls [2]. The decline could be attributed to several factors, including more outdoor activity and less need for heating [1]. Interestingly, this pattern does not necessarily hold from late night to early morning, because people are resting during this time. Electricity usage is highly sensitive to temperature during the daytime and early night [1].

Given that individuals are less inclined to go outside when it rains more heavily, more rainfall may be related to higher electricity demand. As in the case of temperature, rain doesn’t have much effect on the energy consumption during late nights and early mornings for the same reasons [1].

People tend to engage in more outdoor activities on sunny days, which reduces household energy consumption. However, the pattern differs before and after 15:00: more indoor activities may take place from the morning until around 15:00, after which people move to outdoor activities [1] [3] [4].

Relative humidity and wind speed increase the demand for power from daytime until late at night, a pattern similar to that of temperature. However, the effects of humidity and wind speed are relatively insignificant [1].

Energy consumption during weekends follows a different pattern than on weekdays. This could be because people spend more time at home rather than at the office during weekends, and weekend activities differ considerably from weekday activities [1].
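
The time-of-day and weekend effects described above suggest deriving calendar features from the timestamp. The following is a sketch only: the timestamps and readings are made up, and the column names `hour`, `is_weekend` and `after_15` are hypothetical. It also assumes the `time` column has already been parsed to datetimes.

```python
import pandas as pd

# Hypothetical readings; only the "time" and "use [kW]" column names come from the dataset
readings = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-01-01 08:00",  # Friday
        "2016-01-02 10:00",  # Saturday
        "2016-01-03 23:00",  # Sunday
        "2016-01-04 16:00",  # Monday
    ]),
    "use [kW]": [0.9, 1.4, 0.6, 1.1],
})

# Hour of day captures the day/night differences in sensitivity
readings["hour"] = readings["time"].dt.hour
# dayofweek is 0 for Monday ... 6 for Sunday, so 5 and 6 mark the weekend
readings["is_weekend"] = readings["time"].dt.dayofweek >= 5
# Flag for the change in behaviour reported around 15:00
readings["after_15"] = readings["hour"] >= 15

print(readings)
```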

Thus, temperature, precipitation and duration of sunshine are the major factors in energy consumption, whereas humidity and wind speed have very little influence [1]. However, humidity and wind speed can be combined with temperature to obtain the apparent temperature: the perceived temperature in degrees Fahrenheit for a given hour, determined either by temperature and wind (Wind Chill) or by temperature and humidity (Heat Index). The wind chill is used as a grid point’s apparent temperature when the outside temperature at that location is 50 F or less. The heat index is used whenever the grid point’s temperature exceeds 80 F. Between 51 and 80 F, the apparent temperature is simply the ambient air temperature [5].
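
The piecewise rule above can be sketched in code. The text does not give the wind chill or heat index formulas, so this sketch assumes the standard US National Weather Service wind chill formula and the Rothfusz heat index regression (without its low- and high-humidity adjustments). The function names are hypothetical, and humidity is taken as a percentage, which may differ from the dataset's own scale.

```python
def wind_chill_f(temp_f, wind_mph):
    """NWS wind chill in °F (assumed formula); only defined for wind above 3 mph."""
    if wind_mph <= 3:
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


def heat_index_f(temp_f, rel_humidity_pct):
    """Rothfusz heat index regression in °F (assumed formula, no adjustments)."""
    t, rh = temp_f, rel_humidity_pct
    return (-42.379 + 2.04901523 * t + 10.14333127 * rh
            - 0.22475541 * t * rh - 0.00683783 * t * t
            - 0.05481717 * rh * rh + 0.00122874 * t * t * rh
            + 0.00085282 * t * rh * rh - 0.00000199 * t * t * rh * rh)


def apparent_temperature_f(temp_f, wind_mph, rel_humidity_pct):
    """Apply the piecewise rule: wind chill at or below 50 F, heat index above 80 F."""
    if temp_f <= 50:
        return wind_chill_f(temp_f, wind_mph)
    if temp_f > 80:
        return heat_index_f(temp_f, rel_humidity_pct)
    return temp_f  # in between, the ambient air temperature is used


cold = apparent_temperature_f(30, wind_mph=10, rel_humidity_pct=40)
mild = apparent_temperature_f(65, wind_mph=10, rel_humidity_pct=40)
hot = apparent_temperature_f(90, wind_mph=5, rel_humidity_pct=50)
print(round(cold, 1), mild, round(hot, 1))
```

The dataset already includes an `apparentTemperature` column, so a function like this is only needed to understand how that feature relates to temperature, humidity and wind speed.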

A solar panel attached to a house can generate a fair amount of energy under suitable environmental conditions. Although solar energy can still be captured on cloudy and rainy days, the panel’s generation efficiency is reduced, because solar panels need sunlight to collect energy efficiently. As a result, a few days of cloudy, rainy weather can significantly affect the energy grid. Furthermore, a higher temperature does not necessarily mean higher energy generation: solar panels have an ambient working temperature that depends on how they are manufactured [6].


#### References

[1] J. Kang and D. M. Reiner, “What is the effect of weather on household electricity consumption? Empirical evidence from Ireland,” Energy Economics, vol. 111, p. 106023, 2022.

[2] L. Blázquez, N. Boogen, and M. Filippini, “Residential electricity demand in Spain: New empirical evidence using aggregate data,” Energy Economics, vol. 36, pp. 648–657, 2013.

[3] J. Harold, S. Lyons, and J. Cullinan, “The determinants of residential gas demand in Ireland,” Energy Economics, vol. 51, pp. 475–483, 2015.

[4] V. D. Cosmo and D. O’Hora, “Nudging electricity consumption using TOU pricing and feedback: Evidence from Irish households,” Journal of Economic Psychology, vol. 61, pp. 1–14, 2017.

[5] “What is apparent temperature?” [Online]. Available: https://meteor.geol.iastate.edu/~ckarsten/bufkit/apparent_temperature.html

[6] “How does the weather affect your solar panels?” Feb. 2020. [Online]. Available: https://www.penrithsolar.com.au/blog/how-does-the-weather-affect-your-solar-panels



# 1. Import Libraries & Load Data

### Setup

First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later and Scikit-Learn ≥0.20 are installed.

In [2]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

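# Scikit-Learn ≥0.20 is required
import sklearn
# Compare numeric (major, minor) parts; a plain string comparison orders versions lexicographically
assert tuple(int(part) for part in sklearn.__version__.split(".")[:2]) >= (0, 20)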

# Common imports
import numpy as np
import os

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "predict_energy_usage"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

# Ignore useless warnings (see SciPy issue #5998)
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

### Load the Data

In [3]:
import pandas as pd

def load_energy_usage_data():
    return pd.read_csv('https://raw.githubusercontent.com/nivalf/Predict_Energy_Usage/main/data/dataset.csv')

In [4]:
energy_usage = load_energy_usage_data()


In [14]:
energy_usage.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50392 entries, 0 to 50391
Data columns (total 30 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   time                 50392 non-null  object 
 1   use [kW]             50391 non-null  float64
 2   gen [kW]             50391 non-null  float64
 3   House overall [kW]   50391 non-null  float64
 4   Dishwasher [kW]      50391 non-null  float64
 5   Furnace 1 [kW]       50391 non-null  float64
 6   Furnace 2 [kW]       50391 non-null  float64
 7   Home office [kW]     50391 non-null  float64
 8   Fridge [kW]          50391 non-null  float64
 9   Wine cellar [kW]     50391 non-null  float64
 10  Garage door [kW]     50391 non-null  float64
 11  Kitchen 12 [kW]      50391 non-null  float64
 12  Kitchen 14 [kW]      50391 non-null  float64
 13  Kitchen 38 [kW]      50391 non-null  float64
 14  Barn [kW]            50391 non-null  float64
 15  Well [kW]            50391 non-null 

# 2. Pre-Processing & Feature Engineering

### Clean the Data

The warning emitted when loading the data indicates that column 0 contains mixed value types.
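
One way to locate the entries that cause mixed types is to coerce the column to numbers and look at what fails to convert. This is a sketch on a made-up three-row CSV. It assumes the `time` values are numeric timestamps, which the output above does not confirm.

```python
import io
import pandas as pd

# Hypothetical CSV: two numeric timestamps, then a row holding a stray backslash
csv_text = "time,use [kW]\n1451624400,0.93\n1451624401,0.94\n\\,\n"
sample = pd.read_csv(io.StringIO(csv_text))

# Values that cannot be parsed as numbers become NaN under errors="coerce"
as_number = pd.to_numeric(sample["time"], errors="coerce")
non_numeric = sample[as_number.isna()]

print(non_numeric)
```

On the full dataset, the same filter applied to `energy_usage["time"]` would list the rows whose timestamp is not a number.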

Checking for NaN values in the dataframe:

In [13]:
energy_usage[energy_usage.isna().any(axis=1)]

Unnamed: 0,time,use [kW],gen [kW],House overall [kW],Dishwasher [kW],Furnace 1 [kW],Furnace 2 [kW],Home office [kW],Fridge [kW],Wine cellar [kW],...,Weather icon,humidity,visibility,apparentTemperature,pressure,windSpeed,windBearing,precipIntensity,dewPoint,precipProbability
50391,\,,,,,,,,,,...,,,,,,,,,,


Row 50391, the last row in the dataframe, contains invalid data in every column. This row is therefore dropped.

In [29]:
energy_usage = energy_usage.drop(50391)
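
Dropping by a hard-coded index works for this file, but it breaks if the data changes. As an alternative sketch on a made-up frame (not the notebook's method), rows whose measurement columns are all missing can be removed with `dropna`, followed by a check that no missing values remain:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the problem: the last row has no valid measurements
frame = pd.DataFrame({
    "time": ["1451624400", "1451624401", "\\"],
    "use [kW]": [0.93, 0.94, np.nan],
    "gen [kW]": [0.003, 0.004, np.nan],
})

# Every column except the timestamp holds a measurement
value_columns = [column for column in frame.columns if column != "time"]

# Drop rows in which all measurement columns are NaN
cleaned = frame.dropna(subset=value_columns, how="all")

# Confirm that nothing is missing after cleaning
assert not cleaned.isna().any().any()
print(len(frame), "->", len(cleaned))
```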