# Energy Consumption Forecasting Using Machine Learning

The following code cells take you through a step-by-step process: you will perform exploratory analysis on the dataset you upload and then use machine learning to predict energy consumption.

Please make sure you read and understand the user manual before you continue. The instructions contain specific details on how to load your data and execute each section of the notebook.

Once all the packages are installed, you can proceed and import them:

In [None]:
from ipywidgets import interact, interactive, interact_manual
import ipywidgets as widgets
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
import Loading_and_cleansing_01 as ld  #This is the script that loads the data.
import pandas as pd
import numpy as np
from datetime import datetime
import calendar
import time
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

Use the following cell to provide the full file path of the file you want to upload (between the quotation marks next to r):

In [None]:
file=r""
categorical=[] #optional - depends on the dataset
data=ld.load(file)
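
Before moving on, it can help to confirm that the loaded frame has the shape the rest of the notebook relies on: a datetime index (later cells call `data.index.min()`) and an `energy_consumption` column (used by the line chart). The helper below is a hypothetical sketch and is not part of `Loading_and_cleansing_01`; the name `check_loaded_frame` and its messages are invented for illustration.

```python
import pandas as pd

def check_loaded_frame(frame, target="energy_consumption"):
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    if not isinstance(frame.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    elif not frame.index.is_monotonic_increasing:
        problems.append("index is not sorted in time order")
    if target not in frame.columns:
        problems.append("missing column: " + target)
    elif frame[target].isna().any():
        problems.append("column contains missing values: " + target)
    return problems

# Small synthetic frame (invented values, for illustration only)
example = pd.DataFrame(
    {"energy_consumption": [1.2, 1.4, 1.1]},
    index=pd.date_range("2020-01-01", periods=3, freq="30min"),
)
problems_found = check_loaded_frame(example)
```

On your own data you would call `check_loaded_frame(data)` and inspect the returned list.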

## Exploratory Data Analysis

Now that the data have been loaded, let's have a look at the first and last rows of the dataset.

In [None]:
print("First 4 rows:")
data.head(4)

In [None]:
print("Last 4 rows:")
data.tail(4)

The following are the data types of each column and some general properties of the dataset:


In [None]:
data.info(memory_usage='deep')
data.describe()

If you want to view a different time period, execute the following cell and change start and end dates using the date pickers: 

In [None]:
print (' ')
print ('Minimum date in dataset: '+str(data.index.min()))
print ('Maximum date in dataset: '+str(data.index.max()))
print (' ')

def filter_data(initial_d, last_d):
    filtered = data.loc[initial_d : last_d].copy()
    return filtered
    
return_1 = interact(filter_data,
             initial_d=widgets.DatePicker(value=pd.to_datetime(str(data.index.min()))),
             last_d=widgets.DatePicker(value=pd.to_datetime(str(data.index.max()))))
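
The filtering above relies on label-based slicing of a datetime index with `.loc`, which includes both end points. A minimal sketch on synthetic data (the values are invented):

```python
import pandas as pd

# Six daily observations with a datetime index
idx = pd.date_range("2020-01-01", periods=6, freq="D")
frame = pd.DataFrame({"energy_consumption": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]}, index=idx)

# Both the start and the end label are included in the result
subset = frame.loc["2020-01-02":"2020-01-04"].copy()
```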




Now we will create a graphical representation of the data using a line chart. Select an appropriate time range using the date pickers and press the "Run Interact" button. Please note that the first time you execute the cell below, and every time you change the parameters, you have to press the button so that the values you provide are captured. Once the dates have been captured, run the next cell to create the chart:



In [None]:
print (' ')
print ('Minimum date in dataset: '+str(data.index.min()))
print ('Maximum date in dataset: '+str(data.index.max()))
print (' ')

def filter_for_chart(initial_date, last_date):
    filtered = data.loc[initial_date : last_date].copy()
    return filtered
    
return_2 = interactive(filter_for_chart, {'manual': True},
             initial_date=widgets.DatePicker(value=pd.to_datetime(str(data.index.min()))),
             last_date=widgets.DatePicker(value=pd.to_datetime(str(data.index.max()))))

display(return_2)


In [None]:

p = return_2.kwargs               # dictionary of values captured by the widget
i = str(p.get('initial_date'))    # extract the dates captured in the date pickers
f = str(p.get('last_date'))
plot_data = data.loc[i : f].copy()

# Define the date format
myFmt = DateFormatter('%Y-%m-%d  %H:%M') 

# plot the data
fig, ax = plt.subplots()
ax.plot(plot_data.loc[:,'energy_consumption'], color='green', marker='o', linewidth=0.3, markersize=3 )
ax.set(xlabel="Date-Time", ylabel='Energy consumption.')
ax.set(title="Energy consumption")
ax.xaxis.set_major_formatter(myFmt) 
fig.autofmt_xdate()
fig.set_size_inches(17.5, 5.2)

del(p, i, f, plot_data)

The following is the decomposition of your time series data (make sure you update freq and filter parameters according to your data):

In [None]:
filter= 500 #<-- update with the number of observations that you want to see

decomposition = seasonal_decompose(data.iloc[0:filter,-1],
                                   freq=48, # <-- Update (newer statsmodels releases name this argument `period`)
                                   model='multiplicative')

decomposition.plot()
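
A multiplicative decomposition treats each observation as the product trend × seasonal × residual. The sketch below reproduces the idea with pandas only, on synthetic data. It is a simplified illustration, not the statsmodels implementation: the series, the seven-step cycle, and all variable names are assumptions.

```python
import numpy as np
import pandas as pd

period = 7                                   # odd, so the centred window is symmetric
n = 8 * period
index = pd.date_range("2020-01-01", periods=n, freq="D")
true_trend = np.linspace(10.0, 20.0, n)
pattern = np.array([0.7, 0.9, 1.0, 1.2, 1.3, 1.0, 0.9])   # averages to 1
series = pd.Series(true_trend * np.tile(pattern, n // period), index=index)

# 1. Trend: centred moving average over one full seasonal cycle
trend = series.rolling(window=period, center=True).mean()

# 2. Seasonal: mean detrended ratio per position in the cycle, rescaled to mean 1
ratio = series / trend
position = np.arange(n) % period
indices = ratio.groupby(position).mean()
indices = indices / indices.mean()
seasonal = pd.Series(indices.to_numpy()[position], index=index)

# 3. Residual: what the trend and seasonal parts leave unexplained
residual = series / (trend * seasonal)
```

Because the synthetic series contains no noise, the residual stays close to 1 wherever the moving average is defined. The first and last three values of `trend` are NaN because the window does not fit there.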

## Machine Learning Models

### Single-Step Ahead Predictions

The following section contains a number of machine learning algorithms that you can use to make one-step-ahead predictions.
Please specify the frequency of your time-series data below. For more information on the format of the frequency variable, please refer to the user manual:

In [None]:
data_frequency='30min'    #Please type in the frequency between the quotation marks
duplicate=pd.DataFrame(data, copy=True)       #<-- create a copy of the original dataset in case you need to run this again
new_data=pd.DataFrame(duplicate, copy=True)
print("Executed")
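
The frequency string follows the pandas offset-alias format. One plausible use, assumed here because the helper functions are not shown, is to work out the timestamp of the first unseen observation t from the last timestamp in the data. The variable names below are illustrative.

```python
import pandas as pd

example_frequency = "30min"

# Three half-hourly timestamps standing in for the end of a dataset
index = pd.date_range("2020-01-01 00:00", periods=3, freq=example_frequency)

# Convert the string into an offset and step once past the last observation
step = pd.tseries.frequencies.to_offset(example_frequency)
next_timestamp = index[-1] + step
```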


#### Data Preparation:

If you want to reduce the size of your dataset, use the following date pickers to select a different time period. Once you have selected the range of dates you want, click on "Run Interact" and then run the following cell:

In [None]:
print (' ')
print ('Minimum date in dataset: '+str(new_data.index.min()))
print ('Maximum date in dataset: '+str(new_data.index.max()))
print (' ')

def reduce_size(initial_date, last_date):
    new_data = pd.DataFrame(duplicate.loc[initial_date : last_date],copy=True)
    
    return new_data
    
return_3 = interactive(reduce_size, {'manual': True},
             initial_date=widgets.DatePicker(value=pd.to_datetime(str(duplicate.index.min()))),
             last_date=widgets.DatePicker(value=pd.to_datetime(str(duplicate.index.max()))))

display(return_3)

In [None]:
p=return_3.kwargs#get the dictionary produced by the widget
i=str(p.get('initial_date'))#extract dates captured in date picker
f=str(p.get('last_date'))

if i != 'None' and f != 'None':
    new_data = pd.DataFrame(duplicate.loc[i : f],copy=True)
    print('')
    print('Your data has been re-sized.')
    print('Number of rows and columns=  '+ str(new_data.shape))
    print('')

Now you will perform a number of transformations that will help to improve the models' accuracy: 

In [None]:
#To subtract trend and seasonal components from time series data
#Please update the frequency argument accordingly (only integer values)
trend, seasonal, residual, new_data, decomposition = ld.ts_decomposition(new_data, 48)# <-- Update with an integer value
print("Executed")

In [None]:
#You might get some warnings due to data type conversions. If so, you can ignore them.

s, new_data= ld.scale_data(new_data)             #<-- Data normalization
print("Executed")
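
For reference, a common form of normalization is min-max scaling, which maps each column to the range [0, 1] and keeps the parameters needed to undo the transformation later. Whether `ld.scale_data` uses this particular scaler is an assumption; the functions below are hypothetical and only sketch the idea.

```python
import pandas as pd

def min_max_scale(frame):
    """Scale every column to [0, 1]; return the parameters and the scaled frame."""
    low = frame.min()
    span = (frame.max() - low).replace(0, 1)   # avoid dividing by zero on constant columns
    return (low, span), (frame - low) / span

def min_max_invert(params, scaled):
    """Map scaled values back to the original units."""
    low, span = params
    return scaled * span + low

raw = pd.DataFrame({"energy_consumption": [10.0, 20.0, 30.0]})
params, scaled = min_max_scale(raw)
restored = min_max_invert(params, scaled)
```

Keeping the returned parameters matters because predictions made on scaled data have to be mapped back to the original units before they can be interpreted.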


In [None]:
#The variable to_predict contains the values of X that will be used to make a prediction on t
#(the next value after the end of the dataset)

new_data, to_predict=ld.shift_data(new_data, 1)  #<-- Number of lags you would like to create (change 2nd argument as needed)
print("Executed")
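
Creating lags turns a time series into a supervised-learning table: each row pairs the current value with the values that preceded it. The sketch below shows the idea with `Series.shift`. It is an assumption about what `ld.shift_data` does, and the name `make_lags` and the column names are hypothetical.

```python
import pandas as pd

def make_lags(series, n_lags):
    """Return (supervised table, feature values for the first unseen step)."""
    frame = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        frame["lag_" + str(lag)] = series.shift(lag)
    supervised = frame.dropna()               # the first n_lags rows have no full history
    # Most recent observations first, matching the order lag_1, lag_2, ...
    next_features = series.to_numpy()[-n_lags:][::-1]
    return supervised, next_features

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
supervised, next_features = make_lags(values, n_lags=2)
```

Here `next_features` plays the role of `to_predict`: it holds the inputs for the observation that comes right after the end of the dataset.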


In [None]:
x_train, y_train, x_test, y_test= ld.split_data(new_data, 75)#You can specify the percentage of the training data size (as int)
print("Executed")
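
With time series, the split has to be chronological: the training set is the earliest part of the data and the test set is what follows, with no shuffling. A minimal sketch, assuming the target column is called `y` (the function name and column names are hypothetical, not those of `ld.split_data`):

```python
import pandas as pd

def chronological_split(frame, train_percent, target="y"):
    """Split rows in time order; the first train_percent % become the training set."""
    cut = int(len(frame) * train_percent / 100)
    train, test = frame.iloc[:cut], frame.iloc[cut:]
    features = [c for c in frame.columns if c != target]
    # Targets are returned as one-column DataFrames so that .iloc[rows, :] works on them
    return train[features], train[[target]], test[features], test[[target]]

table = pd.DataFrame({"y": range(8), "lag_1": range(-1, 7)})
x_tr, y_tr, x_te, y_te = chronological_split(table, 75)
```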


In [None]:
#Dataset for supervised learning - VISUAL INSPECTION:
new_data.head(5) #<-- first 5  observations

The following cell will import all the machine learning algorithms that will be utilized to predict energy consumption:

In [None]:
import ML_models as mlm
print("Executed")


Before using the ML models, you need to define the window size and horizon parameters: 

In [None]:
window = 100 # update accordingly
horizon = 10 # update accordingly
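
How `ML_models` uses these two parameters is not shown in this notebook. One common reading, consistent with the test plots below (which show `y_test.iloc[window:window+horizon]`), is that a model is fitted on `window` consecutive rows and evaluated on the `horizon` rows that follow. The sketch below enumerates such blocks; the function name is hypothetical.

```python
def rolling_blocks(n_rows, window, horizon):
    """List (fit_rows, predict_rows) pairs, moving forward by one horizon each time."""
    blocks = []
    start = 0
    while start + window + horizon <= n_rows:
        fit_rows = range(start, start + window)
        predict_rows = range(start + window, start + window + horizon)
        blocks.append((fit_rows, predict_rows))
        start += horizon
    return blocks

example_blocks = rolling_blocks(n_rows=25, window=10, horizon=5)
```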

### K-NEAREST NEIGHBOURS

The following cell will fit the KNN model to your data. The window and horizon arguments will be passed to the function:

In [None]:

c_v_results, best_k, KNN_train_MSE, KNN_test_MSE, KNN_test_y_hat = mlm.fit_knn_forecasting(x_train, y_train, x_test, y_test, window, horizon)
print('')
print('Best K: ' + str(best_k) + '    Train MSE: ' + str(KNN_train_MSE) + '        Test MSE: '+ str(KNN_test_MSE))
print('')
print('')
print('')
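
As a reminder of what is being fitted, KNN regression predicts the average target of the K training rows closest to the query, and MSE is the mean of the squared errors. The NumPy sketch below illustrates both. It is not the code in `ML_models`; the function names and the toy data are assumptions.

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k):
    """Predict each query row as the mean target of its k nearest training rows."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train)
    distances = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(distances, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

demo_prediction = knn_regress([[0], [1], [2], [10]], [0, 1, 2, 10], [[1.2]], k=2)
demo_error = mse([1, 2], [1, 4])
```

In the toy example the two nearest neighbours of 1.2 are the rows at 1 and 2, so the prediction is their average.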

The following table shows the results of forward-chaining cross-validation, where K represents the number of nearest neighbours and 'avg' represents the average MSE across all folds (from fold 1 to fold 9):

In [None]:
c_v_results
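
Forward chaining means that each fold trains on everything up to a point in time and validates on the block that follows, so the model is never evaluated on data older than its training set. The exact fold layout used by `ML_models` is not shown here; the sketch below is one simple scheme with a hypothetical function name.

```python
def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices) with a growing training set."""
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        validation_end = train_end + fold_size
        yield list(range(0, train_end)), list(range(train_end, validation_end))

splits = list(forward_chaining_splits(n_samples=10, n_folds=4))
```

The best K is then the candidate with the lowest average validation MSE across the folds.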

At this point, the algorithm has already found the best K using the training data and it has performed single-step-ahead predictions on the test set. The following graph shows a comparison between the test data and the predictions that were made for the same time range:

In [None]:
to_plot=y_test.merge(KNN_test_y_hat, left_index=True, right_index=True)
to_plot.columns=['Y', 'Y_HAT']

# Define the date format
myFmt = DateFormatter('%Y-%m-%d  %H:%M') 

# plot the data
fig, ax = plt.subplots()
ax.plot(to_plot.iloc[:,0], color='green', marker='o', linewidth=0.8, markersize=2, label='Observed')
ax.plot(to_plot.iloc[:,1], color='blue', marker='o', linewidth=0.8, markersize=2, label='KNN Prediction' )
ax.set(xlabel="Date-Time", ylabel='Energy consumption.')
ax.set(title="Comparison between actual and predicted energy consumption.")
ax.xaxis.set_major_formatter(myFmt) 
fig.autofmt_xdate()
plt.legend()
fig.set_size_inches(17.5, 5.2)
del(to_plot)

Now that you have trained the model and found an optimal value for K, the following cell will generate a prediction on t (the value that comes right after the last observation in the test set), which means that this is a prediction on unseen data. Note that the variable best_k, which was produced when the model was fitted, is passed to this function as "best_k". If you wish to change the parameter K, you can replace "best_k" with the value you want:

In [None]:
knn_prediction =mlm.knn_predict_t(best_k, x_test.iloc[-window:,:], y_test.iloc[-window:,:], to_predict,f= data_frequency)


Graphical representation of the prediction you just made: 

In [None]:
# Define the date format
myFmt = DateFormatter('%Y-%m-%d  %H:%M') 

# plot the data
fig, ax = plt.subplots()
ax.plot(y_test.iloc[-window:,:], color='green', marker='o', linewidth=0.8, markersize=2, label='Original Time-Series')
ax.plot(knn_prediction, color='blue', marker='o', linewidth=0.8, markersize=5, label='Future prediction' )
ax.set(xlabel="Date-Time", ylabel='Energy consumption.')
ax.set(title="Energy Consumption Prediction")
ax.xaxis.set_major_formatter(myFmt) 
fig.autofmt_xdate()
plt.legend()
fig.set_size_inches(17.5, 5.2)

### Gaussian Process Regression

Now Gaussian Process Regression will be trained and tested:


In [None]:
#Again in this case the window and horizon parameters you defined above are passed to this function
#You might get a "FutureWarning" when you run the model, if that is the case, you can ignore it
gp_kernel_parameters, gp_train_mse, gp_test_mse, gp_test_y_hat= mlm.fit_gaussian_process_forecasting(x_train, y_train, x_test, y_test, [0.01, 0.01, 0.01], window, horizon)

print(" ")
print(" ")
print("Training MSE= "+str(gp_train_mse)+"     Test MSE= "+str(gp_test_mse))
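
For intuition, the core of Gaussian Process Regression is a kernel that measures similarity between inputs plus a linear solve that yields the posterior mean. The NumPy sketch below uses an RBF kernel with fixed, hand-picked parameters. It is an illustration only: the kernel used by `ML_models`, the meaning of the three initial values passed above, and all names below are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential similarity between every row of a and every row of b."""
    squared = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * squared / length_scale ** 2)

def gp_posterior_mean(x_train, y_train, x_new, length_scale=0.2, noise=1e-6):
    """Posterior mean of a zero-mean GP with an RBF kernel and Gaussian noise."""
    k_train = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_new = rbf_kernel(x_new, x_train, length_scale)
    weights = np.linalg.solve(k_train, y_train)
    return k_new @ weights

x_demo = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y_demo = np.sin(2 * np.pi * x_demo).ravel()
fitted = gp_posterior_mean(x_demo, y_demo, x_demo)
```

With almost no noise, the posterior mean passes through the training points. In practice the kernel parameters are optimized during fitting rather than fixed by hand.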


Graphical representation of the test results: 

In [None]:
# Define the date format
myFmt = DateFormatter('%Y-%m-%d  %H:%M') 

# plot the data
fig, ax = plt.subplots()
ax.plot(gp_test_y_hat, color='blue', linewidth=1, markersize=2, label='prediction_GP')
ax.plot(y_test.iloc[window:window+horizon,:], color='green', marker='o', linewidth=1, markersize=2, label='Observed')
ax.set(xlabel="Date-Time", ylabel='Energy consumption.')
ax.set(title="Energy Consumption Prediction")
ax.xaxis.set_major_formatter(myFmt) 
fig.autofmt_xdate()
plt.legend()
fig.set_size_inches(17.5, 5.2)

Now you will predict an unseen observation t (which is right after the end of the dataset). Please note that the kernel parameters that were obtained in mlm.fit_gaussian_process_forecasting are now passed as arguments to the following function: 

In [None]:
gp_prediction= mlm.gaussian_process_predict(x_test.iloc[-window:,:], y_test.iloc[-window:,:], to_predict, data_frequency,gp_kernel_parameters)

Graphical representation:

In [None]:
# Define the date format
myFmt = DateFormatter('%Y-%m-%d  %H:%M') 
# plot the data
fig, ax = plt.subplots()
ax.plot(y_test.iloc[-window:,:], color='green', marker='o', linewidth=0.8, markersize=2, label='Original Time-Series')
ax.plot(gp_prediction, color='blue', marker='o', linewidth=0.8, markersize=7, label='Future prediction' )
ax.set(xlabel="Date-Time", ylabel='Energy consumption.')
ax.set(title="Energy Consumption Prediction")
ax.xaxis.set_major_formatter(myFmt)
fig.autofmt_xdate()
plt.legend()
fig.set_size_inches(17.5, 5.2)

### Support Vector Regression

In the following section you will use Support Vector Regression (SVR) to make predictions in the same manner as with the previous models. You might get a "FutureWarning" or "DeprecationWarning" during the execution of the function. If that is the case, you can ignore those warnings:

In [None]:
search_result, svr_training_mse, svr_test_mse, svr_test_pred= mlm.train_svm(x_train, y_train, x_test, y_test, window, horizon)
print(" ")
print(" ")
print("Training MSE= "+str(svr_training_mse)+"     Test MSE= "+str(svr_test_mse))
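
The training step returns the outcome of a hyperparameter search. The sketch below shows what such a search might look like with scikit-learn, using time-ordered cross-validation folds. The parameter grid, the synthetic data, and the choice of `GridSearchCV` with `TimeSeriesSplit` are assumptions; `mlm.train_svm` may be implemented differently.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Synthetic, time-ordered data (invented for illustration)
rng = np.random.default_rng(0)
X = np.linspace(0.0, 6.0, 120).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(120)

# Candidate values to try; every combination is evaluated on each fold
param_grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1]}

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),        # validation folds always follow the training rows
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
best_params = search.best_params_
```

After fitting, the search object holds the best parameter combination and can make predictions with the refitted model.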

Graphical representation of the test results:

In [None]:
# Define the date format
myFmt = DateFormatter('%Y-%m-%d  %H:%M') 

# plot the data
fig, ax = plt.subplots()
ax.plot(svr_test_pred, color='blue', linewidth=1, markersize=2, label='SVR Prediction')
ax.plot(y_test.iloc[window:window+horizon,:], color='green', marker='o', linewidth=1, markersize=2, label='Observed')
ax.set(xlabel="Date-Time", ylabel='Energy consumption.')
ax.set(title="Energy Consumption Prediction")
ax.xaxis.set_major_formatter(myFmt) 
fig.autofmt_xdate()
plt.legend()
fig.set_size_inches(17.5, 5.2)

Now you will predict an unseen observation t (which is right after the end of the dataset). Please note that the results of the grid search obtained during the training phase are passed to the following function:

In [None]:
svm_prediction= mlm.predict_svm(search_result, x_test.iloc[[-1],:],  to_predict, data_frequency)
print("Executed")

In [None]:
# Define the date format
myFmt = DateFormatter('%Y-%m-%d  %H:%M') 
# plot the data
fig, ax = plt.subplots()
ax.plot(y_test.iloc[-window:,:], color='green', marker='o', linewidth=0.8, markersize=2, label='Original Time-Series')
ax.plot(svm_prediction, color='blue', marker='o', linewidth=0.8, markersize=7, label='Future prediction' )
ax.set(xlabel="Date-Time", ylabel='Energy consumption.')
ax.set(title="Energy Consumption Prediction")
ax.xaxis.set_major_formatter(myFmt)
fig.autofmt_xdate()
plt.legend()
fig.set_size_inches(17.5, 5.2)