

# A Forecast of the Impact of AI and Crypto Mining on Global Energy Consumption


Predictive modelling of the future impact of AI and crypto mining on world energy consumption.


## Content

* [1. Problem Statement](#0)
* [2. Getting Started - Load Libraries and Dataset](#1)
    * [2.1. Load Libraries](#1.1)    
    * [2.2. Load Dataset](#1.2)
* [3. Exploratory Data Analysis](#2)
    * [3.1 Descriptive Statistics](#2.1)    
    * [3.2. Data Visualisation](#2.2)
* [4. Data Preparation](#3)
    * [4.1 Data Cleaning](#3.1)
    * [4.2. Feature Selection](#3.2)
    * [4.3. Data Transformation](#3.3)
        * [4.3.1 Rescaling ](#3.3.1)
        * [4.3.2 Standardization](#3.3.2)
        * [4.3.3 Normalization](#3.3.3)    
* [5. Evaluate Algorithms and Models](#4)
    * [5.1. Train/Test Split](#4.1)
    * [5.2. Test Options and Evaluation Metrics](#4.2)
    * [5.3. Compare Models and Algorithms](#4.3)
        * [5.3.1 Common Regression Models](#4.3.1)
        * [5.3.2 Ensemble Models](#4.3.2)
        * [5.3.3 Deep Learning Models](#4.3.3)  
    * [5.4. Time Series based Models-ARIMA and LSTM](#4.4)
        * [5.4.1 ARIMA Model](#4.4.1)
        * [5.4.2 LSTM Model](#4.4.2)
* [6. Model Tuning and Grid Search](#5)
    * [6.1 Common Regression, Ensemble and DeepNNRegressor Grid Search](#5.1)
    * [6.2 ARIMA and LSTM Grid Search](#5.2)
* [7. Finalize the Model](#6)  
    * [7.1. Results on test dataset](#6.1)
    * [7.2. Variable Intuition/Feature Selection](#6.2)
    * [7.3. Save model for later use](#6.3)


<a id='0'></a>
# 1. Problem Statement

The goal of this Jupyter notebook is to build a pipeline for predicting energy consumption over the next 10 years, evaluating the increase in demand from AI and cryptocurrency mining.
The pipeline is based on the following steps:
- Load the data available in CSV files and other sources from the IEA (International Energy Agency) https://www.iea.org/
- Transform the data by applying the required formats and merging the datasets
- Plot and analyze the statistics to understand patterns and prepare the data for training machine learning models
- Apply ensemble methods and tune them to improve model performance.
- The following models are implemented and evaluated

    * Ada Boost
    * Gradient Boosting Method
    * Random Forest
    * Extra Trees
    * Neural Network - Shallow - Using sklearn
    * Deep Neural Network - Using Keras
- Time Series Models
    * ARIMA Model
    * LSTM - Using Keras
    

<a id='1'></a>
# 2. Getting Started- Loading the data and python packages

<a id='1.1'></a>
## 2.1. Loading the python packages

In [1]:
!pip install scikeras
!pip install --upgrade pandas-datareader

Collecting scikeras
  Downloading scikeras-0.13.0-py3-none-any.whl.metadata (3.1 kB)
Downloading scikeras-0.13.0-py3-none-any.whl (26 kB)
Installing collected packages: scikeras
Successfully installed scikeras-0.13.0


In [3]:
# Load libraries
import numpy as np
import pandas as pd
from datetime import datetime
from matplotlib import pyplot
from pandas.plotting import scatter_matrix
import seaborn as sns
from sklearn.preprocessing import StandardScaler

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.neural_network import MLPRegressor

#Libraries for Deep Learning Models
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.layers import LSTM
# KerasRegressor (not KerasClassifier) is the wrapper used for the DNN regressor below
from scikeras.wrappers import KerasRegressor

#Libraries for Statistical Models
import statsmodels.api as sm

#Libraries for Saving the Model
from pickle import dump
from pickle import load

<a id='1.2'></a>
## 2.2. Loading the Data

In [1]:
# load dataset
# Raw strings (r'...') keep the backslashes in the Windows paths from being read as escape sequences

world = pd.read_excel(r'C:\Users\gusta\Documents\PyCodes\crypto_ai_energy/World Energy Balances Highlights 2024.xlsx', sheet_name='TimeSeries_1971-2023', header=1)

bitcoin = pd.read_excel(r'C:\Users\gusta\Documents\PyCodes\crypto_ai_energy/Bitcoin_Energy.xlsx')

etherium = pd.read_csv(r'C:\Users\gusta\Documents\PyCodes\crypto_ai_energy/Historical Ethereum network annualised electricity consumption (PoS).csv')

In [None]:
# Transforming the datasets
world = world[(world.Country == 'World') & (world.Product == 'Total') & (world.Flow == 'Total final consumption (PJ)')].iloc[:,6:world.shape[1]-1]

world = world.T

world.reset_index(inplace=True)

world.columns = ['Year','Consumption']

# Converting the energy consumption from PJ to TWh to analyze the data on the same basis as crypto mining and data centre energy consumption
world['Consumption'] = pd.to_numeric(world['Consumption']) * 0.27778

# Converting the column Date and Time of Bitcoin and Etherium to date
bitcoin['Date and Time'] = pd.to_datetime(bitcoin['Date and Time'], errors='coerce')

etherium['Date and Time'] = pd.to_datetime(etherium['Date and Time'], errors='coerce')

# Converting Ethereum consumption from GWh to TWh
etherium.iloc[:,4:] =  etherium.iloc[:,4:]/1000

etherium.columns = etherium.columns.str.split(',').str[0]

bitcoin.columns = bitcoin.columns.str.split(',').str[0]

# Merging both crypto datasets
crypto = pd.merge(bitcoin[['Date and Time','annualised consumption MAX']], etherium[['Date and Time','annualised consumption GUESS']], on= 'Date and Time',how='left')

# Summarising data by year
crypto_year = crypto.groupby(crypto['Date and Time'].dt.year)[['annualised consumption MAX','annualised consumption GUESS']].mean().reset_index()

crypto_year.fillna(0, inplace=True)

crypto_year['Consumption'] = crypto_year['annualised consumption MAX'] + crypto_year['annualised consumption GUESS']

crypto_year.drop(columns=['annualised consumption MAX','annualised consumption GUESS'], inplace=True)

crypto_year = crypto_year.rename(columns={'Date and Time':'Year'})

# Creating the dataframe of Data Center Energy consumption with data provided by IEA
datacenter = pd.DataFrame({'Year':list(range(2014,2025)), 'Consumption':[194, 191, 195, 195, 198, 192, 269, 300, 330, 361, 416]})

# Merging Crypto and Data Centers in a single dataframe
crypto_datacenter = pd.merge(crypto_year, datacenter, on='Year', how='inner')

crypto_datacenter['Total Consumption'] = crypto_datacenter['Consumption_x'] +  crypto_datacenter['Consumption_y']

crypto_datacenter.drop(columns=['Consumption_x', 'Consumption_y'], inplace=True)
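
The factor 0.27778 used in the PJ to TWh conversion above follows from 1 Wh = 3600 J, which gives 1 TWh = 3.6 PJ. A minimal sketch of that arithmetic, where `pj_to_twh` and the sample value are illustrative names and numbers that do not come from the IEA data:

```python
# 1 Wh = 3600 J, so 1 TWh = 3.6e15 J = 3.6 PJ
PJ_PER_TWH = 3.6

def pj_to_twh(petajoules):
    """Convert an energy amount from petajoules to terawatt-hours."""
    return petajoules / PJ_PER_TWH

# Illustrative value only, not taken from the IEA dataset
example_pj = 360.0
example_twh = pj_to_twh(example_pj)
print(f"{example_pj} PJ = {example_twh:.2f} TWh")
print(f"Conversion factor: {1 / PJ_PER_TWH:.5f} TWh per PJ")
```

Dividing by 3.6 instead of multiplying by the rounded factor avoids a small rounding error, although the difference is negligible at this scale.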

In [None]:
# Disable the warnings
import warnings
warnings.filterwarnings('ignore')

<a id='2'></a>
# 3. Exploratory Data Analysis

<a id='2.1'></a>
## 3.1. Descriptive Statistics

In [None]:
# Crypto and Data Center shape
crypto_datacenter.shape

In [None]:
# World Energy shape
world.shape

In [None]:
# Crypto and Data Center types
pd.set_option('display.max_rows', 500)
crypto_datacenter.dtypes

In [None]:
# World Energy types
pd.set_option('display.max_rows', 500)
world.dtypes

In [None]:
# describe Crypto and Data Center data
pd.set_option('display.precision', 3)
crypto_datacenter.describe()

In [None]:
# describe World Energy data
pd.set_option('display.precision', 3)
world.describe()

<a id='2.2'></a>
## 3.2. Data Visualization

In [None]:
# Plotting data to visualize
import plotly.express as px

df = pd.merge(world, crypto_datacenter, how = 'left')
df.columns = ['Year', 'World', 'Crypto and AI']

fig = px.line(df, x='Year', y=['World','Crypto and AI'], labels={'variable':'Source'}, title="Total Energy Consumption by Source",
              line_dash_sequence=['solid', 'dash'], height=500, width=1000)
fig.show()

In [None]:
# Plotting individual charts to analyze the patterns - Crypto and Data Centers Energy Consumption
fig = px.line(crypto_datacenter, x='Year', y=['Total Consumption'], color_discrete_sequence=['red'], title="Total Energy Consumption by Crypto Mining and Datacenters", height=500, width=1000)
fig.show()

In [None]:
# Plotting individual charts to analyze the patterns - World Energy Consumption
fig = px.line(world, x='Year', y=['Consumption'], color_discrete_sequence=['blue'], title="Total World Energy Consumption", height=500, width=1000)
fig.show()

In [None]:
# histograms
df.hist(sharex=False, sharey=False, xlabelsize=1, ylabelsize=1, figsize=(10,5))

pyplot.show()

In [None]:
# density
df.plot(kind='density', subplots=True, layout=(4,4), sharex=False, legend=True, fontsize=1, figsize=(18,15))
pyplot.show()

In [None]:
#Box and Whisker Plots
df.plot(kind='box', subplots=True, layout=(4,4), sharex=False, sharey=False, figsize=(20,15))
pyplot.show()

In [None]:
# correlation
correlation = df.corr()
pyplot.figure(figsize=(8,8))
pyplot.title('Correlation Matrix')
sns.heatmap(correlation, vmax=1, square=True,annot=True,cmap='cubehelix')

In [None]:
# Scatterplot Matrix
from pandas.plotting import scatter_matrix
pyplot.figure(figsize=(12,12))
scatter_matrix(df,figsize=(12,12))
pyplot.show()

<a id='2.3'></a>
## 3.3. Time Series Analysis

The time series is broken down into its components: trend, seasonality and residual.
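
To make the decomposition concrete, here is a minimal NumPy sketch of a classical additive decomposition (centered moving-average trend, per-period seasonal means, residual). It is an illustration only and not the statsmodels implementation; `additive_decompose` and `toy_series` are hypothetical names and values. It suggests why a period of 1, as used for yearly data, leaves the whole signal in the trend component.

```python
import numpy as np

def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + resid."""
    y = np.asarray(y, dtype=float)
    # Centered moving average; even periods use half weights at both ends
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(len(y), np.nan)
    for i in range(half, len(y) - half):
        trend[i] = np.dot(weights, y[i - half:i + half + 1])
    detrended = y - trend
    # Average the detrended values at each position within the period
    seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal_means -= seasonal_means.mean()
    seasonal = np.tile(seasonal_means, len(y) // period + 1)[:len(y)]
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Hypothetical yearly values, not the notebook's data
toy_series = np.array([200.0, 210.0, 225.0, 260.0, 310.0, 380.0])
trend, seasonal, resid = additive_decompose(toy_series, period=1)
print("trend:   ", trend)
print("seasonal:", seasonal)
print("resid:   ", resid)
```

With `period=1` the moving average has a single weight, so the trend equals the series and the seasonal and residual components are zero.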

In [None]:
Y= crypto_datacenter["Total Consumption"]
res = sm.tsa.seasonal_decompose(Y, period=1)
fig = res.plot()
fig.set_figheight(8)
fig.set_figwidth(15)
pyplot.show()

<a id='3'></a>
# 4. Data Preparation

<a id='3.1'></a>
## 4.1. Data Cleaning
Check the rows for NAs, then either drop them or fill them with the mean of the column.
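
The two options can be sketched on a small frame as follows. The `toy` DataFrame is made up for illustration and is not one of the notebook's datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value (not the notebook's data)
toy = pd.DataFrame({'Year': [2020, 2021, 2022, 2023],
                    'Consumption': [100.0, np.nan, 120.0, 140.0]})

# Option 1: drop every row that contains an NA
dropped = toy.dropna(axis=0)

# Option 2: fill the NA with the mean of its column
filled = toy.fillna({'Consumption': toy['Consumption'].mean()})

print(dropped)
print(filled)
```

Dropping shortens an already short yearly series, while filling with the mean keeps the row count but flattens the trend at that point, so the choice depends on how many observations can be spared.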

In [None]:
# Checking for any null values
print('Null Values =',crypto_datacenter.isnull().values.any())
print('Null Values =',world.isnull().values.any())

If null values are present, drop the rows containing them or fill them using one of the options in the next cell.

In [None]:
# Drop the rows containing NA
#dataset.dropna(axis=0)
# Fill na with 0
#dataset.fillna('0')

#Filling the NAs with the mean of the column.
#dataset['col'] = dataset['col'].fillna(dataset['col'].mean())

<a id='3.3'></a>
## 4.3. Splitting train and validation set

In [None]:
Ycd= crypto_datacenter['Total Consumption']
Yw = world['Consumption']
Xcd = crypto_datacenter.loc[:,crypto_datacenter.columns != 'Total Consumption']
Xw = world.loc[:,world.columns != 'Consumption']

<a id='3.4'></a>
## 4.4. Data Transformation

<a id='3.4.1'></a>
### 4.4.1. Rescale Data
When your data is comprised of attributes with varying scales, many machine learning algorithms
can benefit from rescaling the attributes to all have the same scale. Often this is referred to
as normalization and attributes are often rescaled into the range between 0 and 1.
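
As a sketch of what this rescaling computes, each value is mapped to `(x - min) / (max - min)`. The helper `min_max_rescale` and the sample years below are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def min_max_rescale(column, feature_range=(0.0, 1.0)):
    """Rescale a 1D array linearly so its values span feature_range."""
    column = np.asarray(column, dtype=float)
    low, high = feature_range
    span = column.max() - column.min()
    if span == 0:
        # A constant column has no spread; map it to the lower bound
        return np.full_like(column, low)
    return low + (column - column.min()) / span * (high - low)

# Hypothetical sample of years
years = np.array([2014, 2016, 2019, 2024])
scaled = min_max_rescale(years)
print(scaled)
```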

In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
rescaledXcd = pd.DataFrame(scaler.fit_transform(Xcd))
rescaledXw = pd.DataFrame(scaler.fit_transform(Xw))
# summarize transformed data
rescaledXcd.head(5)
rescaledXw.head(5)

<a id='3.4.2'></a>
### 4.4.2. Standardize Data
Standardization is a useful technique to transform attributes with a Gaussian distribution and
differing means and standard deviations to a standard Gaussian distribution with a mean of
0 and a standard deviation of 1.
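
In formula terms, each value becomes `z = (x - mean) / std`. The sketch below applies it with NumPy; `standardize` and the sample values are illustrative, and the population standard deviation (`ddof=0`) is used because that is what scikit-learn's `StandardScaler` computes.

```python
import numpy as np

def standardize(column):
    """Shift a 1D array to mean 0 and scale it to standard deviation 1."""
    column = np.asarray(column, dtype=float)
    mean = column.mean()
    std = column.std()  # population standard deviation (ddof=0)
    return (column - mean) / std

# Hypothetical consumption values
values = np.array([194.0, 191.0, 195.0, 269.0, 416.0])
z_scores = standardize(values)
print(z_scores)
print("mean:", z_scores.mean(), "std:", z_scores.std())
```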

In [None]:
from sklearn.preprocessing import StandardScaler
scaler_cd = StandardScaler().fit(Xcd)
scaler_w = StandardScaler().fit(Xw)
# The scalers are already fitted, so transform() is enough here
StandardisedX_cd = pd.DataFrame(scaler_cd.transform(Xcd))
StandardisedX_w = pd.DataFrame(scaler_w.transform(Xw))
# summarize transformed data
StandardisedX_cd.head(5)
StandardisedX_w.head(5)

<a id='3.4.3'></a>
### 4.4.3. Normalize Data
Normalizing in scikit-learn refers to rescaling each observation (row) to have a length of 1 (called
a unit norm or a vector with the length of 1 in linear algebra).
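
A small NumPy sketch of row-wise L2 normalization is given below; `l2_normalize_rows` and both sample matrices are illustrative. It also shows a caveat that seems relevant when the feature matrix has a single column, as appears to be the case here with only `Year`: every row then normalizes to 1, so the feature carries no information afterwards.

```python
import numpy as np

def l2_normalize_rows(matrix):
    """Divide each row by its Euclidean length so every row has norm 1."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return matrix / norms

# Two features per row: each row is scaled to unit length
two_features = np.array([[3.0, 4.0], [1.0, 0.0]])
print(l2_normalize_rows(two_features))

# One feature per row: every value collapses to 1.0
single_feature = np.array([[2014.0], [2015.0], [2016.0]])
print(l2_normalize_rows(single_feature))
```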

In [None]:
from sklearn.preprocessing import Normalizer
scaler_cd = Normalizer().fit(Xcd)
scaler_w = Normalizer().fit(Xw)
NormalizedX_cd = pd.DataFrame(scaler_cd.fit_transform(Xcd))
NormalizedX_w = pd.DataFrame(scaler_w.fit_transform(Xw))
# summarize transformed data
NormalizedX_cd.head(5)
NormalizedX_w.head(5)

<a id='4'></a>
# 5. Evaluate Algorithms and Models

<a id='4.1'></a>
## 5.1. Train Test Split

In [None]:
# split out validation dataset for the end

validation_size = 0.2

#In case the data is not dependent on the time series, then train and test split randomly
seed = 7
# X_train, X_validation, Y_train, Y_validation = train_test_split(X, Y, test_size=validation_size, random_state=seed)

#In case the data is dependent on the time series, then train and test split should be done based on sequential sample
#This can be done by selecting an arbitrary split point in the ordered list of observations and creating two new datasets.

train_size_cd = int(len(Xcd) * (1-validation_size))
train_size_w = int(len(Xw) * (1-validation_size))
X_train_cd, X_validation_cd = Xcd[0:train_size_cd], Xcd[train_size_cd:len(Xcd)]
X_train_w, X_validation_w = Xw[0:train_size_w], Xw[train_size_w:len(Xw)]
Y_train_w, Y_validation_w = Yw[0:train_size_w], Yw[train_size_w:len(Xw)]
Y_train_cd, Y_validation_cd = Ycd[0:train_size_cd], Ycd[train_size_cd:len(Xcd)]
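
The sequential split used above can be summarized in a small helper. This is a sketch under the assumption that the observations are already ordered by year; `sequential_split` and the sample list are illustrative names.

```python
def sequential_split(values, validation_size=0.2):
    """Split an ordered sequence so the validation part comes last in time."""
    split_point = int(len(values) * (1 - validation_size))
    return values[:split_point], values[split_point:]

# Hypothetical ordered observations: ten consecutive years
years = list(range(2014, 2024))
train_years, validation_years = sequential_split(years)
print("train:     ", train_years)
print("validation:", validation_years)
```

Because the validation years all come after the training years, the model is always evaluated on data it could not have seen, which mirrors how a forecast is used in practice.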

<a id='4.2'></a>
## 5.2. Test Options and Evaluation Metrics


In [None]:
# test options for regression
num_folds = 8
scoring = 'neg_mean_squared_error'
#scoring ='neg_mean_absolute_error'
#scoring = 'r2'

<a id='4.3'></a>
## 5.3. Compare Models and Algorithms

<a id='4.3.1'></a>
### 5.3.1. Common Models

In [None]:
# spot check the algorithms
models = []
models.append(('LR', LinearRegression()))
models.append(('LASSO', Lasso()))
models.append(('EN', ElasticNet()))
models.append(('KNN', KNeighborsRegressor()))
models.append(('CART', DecisionTreeRegressor()))
models.append(('SVR', SVR()))
#Neural Network
#models.append(('MLP', MLPRegressor()))

<a id='4.3.2'></a>
### 5.3.2. Ensemble Models

In [None]:
# Ensemble models
# Boosting methods
models.append(('ABR', AdaBoostRegressor()))
models.append(('GBR', GradientBoostingRegressor()))
# Bagging methods
models.append(('RFR', RandomForestRegressor()))
models.append(('ETR', ExtraTreesRegressor()))

<a id='4.3.3'></a>
### 5.3.3. Deep Learning Model-NN Regressor

In [None]:
#Running deep learning models and performing cross validation takes time
#Set the following flag to 0 to disable the deep learning models
EnableDeepLearningRegreesorFlag = 0

def create_model(neurons=12, activation='relu', learn_rate = 0.01, momentum=0):
        # create model
        model = Sequential()
        # X_train_cd and X_train_w both contain only the Year column, so either gives the input size
        model.add(Dense(neurons, input_dim=X_train_cd.shape[1], activation=activation))
        #The number of hidden layers can be increased
        model.add(Dense(2, activation=activation))
        # Final output layer
        model.add(Dense(1, kernel_initializer='normal'))
        # Compile model
        optimizer = SGD(learning_rate=learn_rate, momentum=momentum)
        # The SGD optimizer above is built but not passed here; the model is compiled with Adam
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

In [None]:
#Add Deep Learning Regressor
if ( EnableDeepLearningRegreesorFlag == 1):
    models.append(('DNN', KerasRegressor(build_fn=create_model, epochs=100, batch_size=100, verbose=1)))


### K-folds cross validation - Crypto and AI

In [None]:
results_cd = []
names_cd = []
for name, model in models:
    kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
    #converted mean squared error to positive. The lower the better
    cv_results_cd = -1* cross_val_score(model, X_train_cd, Y_train_cd, cv=kfold, scoring=scoring)
    results_cd.append(cv_results_cd)
    names_cd.append(name)
    msg = "%s: %f (%f)" % (name, cv_results_cd.mean(), cv_results_cd.std())
    print(msg)

### K-folds cross validation - World

In [None]:
results_w = []
names_w = []
for name, model in models:
    kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
    #converted mean squared error to positive. The lower the better
    cv_results_w = -1* cross_val_score(model, X_train_w, Y_train_w, cv=kfold, scoring=scoring)
    results_w.append(cv_results_w)
    names_w.append(name)
    msg = "%s: %f (%f)" % (name, cv_results_w.mean(), cv_results_w.std())
    print(msg)

### Algorithm comparison

In [None]:
# compare algorithms - Crypto and AI
fig = pyplot.figure()
fig.suptitle('Algorithm Comparison - Crypto and AI')
ax = fig.add_subplot(111)
pyplot.boxplot(results_cd)
ax.set_xticklabels(names_cd)
fig.set_size_inches(15,8)
pyplot.show()

The chart shows the MSE of each model; the lower the MSE, the better the model performance.

In [None]:
# compare algorithms - World
fig = pyplot.figure()
fig.suptitle('Algorithm Comparison - World')
ax = fig.add_subplot(111)
pyplot.boxplot(results_w)
ax.set_xticklabels(names_w)
fig.set_size_inches(15,8)
pyplot.show()

<a id='4.4'></a>
## 5.4. Time Series based Models- ARIMA and LSTM

<a id='4.4.1'></a>
### 5.4.1 Time Series Model - ARIMA Model

In [None]:
# Preparing data for the ARIMAX model, separating endogenous and exogenous variables
# Crypto and AI
X_train_ARIMA_cd = X_train_cd
X_validation_ARIMA_cd =X_validation_cd
tr_len_cd = len(X_train_ARIMA_cd)
te_len_cd = len(X_validation_ARIMA_cd)
to_len_cd = len(Xcd)
#World
X_train_ARIMA_w = X_train_w
X_validation_ARIMA_w = X_validation_w
tr_len_w = len(X_train_ARIMA_w)
te_len_w = len(X_validation_ARIMA_w)
to_len_w = len(Xw)

In [None]:
# ARIMA Crypto and AI
from statsmodels.tsa.arima.model import ARIMA # Corrected import path
# from statsmodels.tsa.statespace.sarimax import SARIMAX # Keep this if you need SARIMAX later

from sklearn.metrics import mean_squared_error

modelARIMA_cd = ARIMA(endog = Y_train_cd, exog = X_train_ARIMA_cd, order=(1,0,0)) # Note: order is a tuple
# modelARIMA= SARIMAX(Y_train,order=(1,1,0),seasonal_order=(1,0,0,0),exog = X_train_ARIMA)

model_fit_cd = modelARIMA_cd.fit()

print(model_fit_cd.summary())

In [None]:
error_Training_ARIMA_cd = mean_squared_error(Y_train_cd, model_fit_cd.fittedvalues)
predicted_cd = model_fit_cd.predict(start = tr_len_cd -1 ,end = to_len_cd -1, exog = X_validation_ARIMA_cd)[1:]
error_Test_ARIMA_cd = mean_squared_error(Y_validation_cd, predicted_cd)
error_Test_ARIMA_cd

In [None]:
# ARIMA World
from statsmodels.tsa.arima.model import ARIMA # Corrected import path
# from statsmodels.tsa.statespace.sarimax import SARIMAX # Keep this if you need SARIMAX later

from sklearn.metrics import mean_squared_error

modelARIMA_w = ARIMA(endog=Y_train_w, exog=X_train_ARIMA_w, order=(1,0,0)) # Note: order is a tuple
# modelARIMA= SARIMAX(Y_train,order=(1,1,0),seasonal_order=(1,0,0,0),exog = X_train_ARIMA)

model_fit_w = modelARIMA_w.fit()

print(model_fit_w.summary())

In [None]:
error_Training_ARIMA_w = mean_squared_error(Y_train_w, model_fit_w.fittedvalues)
predicted_w = model_fit_w.predict(start = tr_len_w -1 ,end = to_len_w -1, exog = X_validation_ARIMA_w)[1:]
error_Test_ARIMA_w = mean_squared_error(Y_validation_w,predicted_w)
error_Test_ARIMA_w

In [None]:
#Add Cross validation if possible
# #model = build_model(_alpha=1.0, _l1_ratio=0.3)
# from sklearn.model_selection import TimeSeriesSplit
# tscv = TimeSeriesSplit(n_splits=5)
# scores = cross_val_score(modelARIMA, X_train, Y_train, cv=tscv, scoring=scoring)
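
The commented lines above hint at `TimeSeriesSplit`. A statsmodels ARIMA object is not a scikit-learn estimator, so it likely cannot be passed to `cross_val_score` directly, but the splitter itself can still drive a manual loop. The sketch below only shows the folds it produces on a made-up array (`X_toy`), which is an assumption for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical feature matrix: ten consecutive years in one column
X_toy = np.arange(2014, 2024).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X_toy))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    # Each fold trains on an expanding window and tests on the block that follows it
    print(f"fold {fold_number}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Inside such a loop, a time series model could be refit on `train_idx` and scored on `test_idx`, so every evaluation uses only past data to predict later observations.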

<a id='4.4.2'></a>
### 5.4.2 LSTM Model

The LSTM model expects 3D input of shape (samples, time steps, features), so the data is reshaped accordingly.
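
A minimal sketch of that reshape with NumPy, using a made-up 2D array (`X_2d`) in place of the notebook's training data and a single time step per sample:

```python
import numpy as np

# Hypothetical 2D feature matrix: 8 samples, 1 feature
X_2d = np.arange(2014, 2022, dtype=float).reshape(-1, 1)

# Insert a time-step axis of length 1: (samples, time steps, features)
X_3d = X_2d.reshape((X_2d.shape[0], 1, X_2d.shape[1]))

print(X_2d.shape, "->", X_3d.shape)
```

With one time step per sample the LSTM sees each observation in isolation; a longer window of past years per sample would be needed for it to learn from sequences.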

In [None]:
# LSTM training - Crypto and AI
X_train_LSTM_cd, X_validation_LSTM_cd = np.array(X_train_cd), np.array(X_validation_cd)
Y_train_LSTM_cd, Y_validation_LSTM_cd = np.array(Y_train_cd), np.array(Y_validation_cd)
X_train_LSTM_cd = X_train_LSTM_cd.reshape((X_train_LSTM_cd.shape[0], 1, X_train_LSTM_cd.shape[1]))
X_validation_LSTM_cd = X_validation_LSTM_cd.reshape((X_validation_LSTM_cd.shape[0], 1, X_validation_LSTM_cd.shape[1]))
print(X_train_LSTM_cd.shape, Y_train_LSTM_cd.shape, X_validation_LSTM_cd.shape, Y_validation_LSTM_cd.shape)

In [None]:
# LSTM training - World
X_train_LSTM_w, X_validation_LSTM_w = np.array(X_train_w), np.array(X_validation_w)
Y_train_LSTM_w, Y_validation_LSTM_w = np.array(Y_train_w), np.array(Y_validation_w)
X_train_LSTM_w = X_train_LSTM_w.reshape((X_train_LSTM_w.shape[0], 1, X_train_LSTM_w.shape[1]))
X_validation_LSTM_w = X_validation_LSTM_w.reshape((X_validation_LSTM_w.shape[0], 1, X_validation_LSTM_w.shape[1]))
print(X_train_LSTM_w.shape, Y_train_LSTM_w.shape, X_validation_LSTM_w.shape, Y_validation_LSTM_w.shape)

In [None]:
# design network - Crypto and AI
from matplotlib import pyplot

def create_LSTMmodel(neurons=12, learn_rate = 0.01, momentum=0.1):
        # create model
    model = Sequential()
    model.add(LSTM(50, input_shape=(X_train_LSTM_cd.shape[1], X_train_LSTM_cd.shape[2])))
    #More number of cells can be added if needed
    model.add(Dense(1))
    optimizer = SGD(learning_rate=learn_rate, momentum=0.1)
    model.compile(loss='mse', optimizer='adam')
    return model
LSTMModel_cd = create_LSTMmodel(12, learn_rate = 0.01, momentum=0)
LSTMModel_fit_cd = LSTMModel_cd.fit(X_train_LSTM_cd, Y_train_LSTM_cd, validation_data=(X_validation_LSTM_cd, Y_validation_LSTM_cd),epochs=50, batch_size=72, verbose=0, shuffle=False)# plot history


In [None]:
# design network - World
from matplotlib import pyplot

def create_LSTMmodel(neurons=12, learn_rate = 0.01, momentum=0.1):
        # create model
    model = Sequential()
    model.add(LSTM(50, input_shape=(X_train_LSTM_w.shape[1], X_train_LSTM_w.shape[2])))
    #More number of cells can be added if needed
    model.add(Dense(1))
    optimizer = SGD(learning_rate=learn_rate, momentum=0.1)
    model.compile(loss='mse', optimizer='adam')
    return model
LSTMModel_w = create_LSTMmodel(12, learn_rate = 0.01, momentum=0)
LSTMModel_fit_w = LSTMModel_w.fit(X_train_LSTM_w, Y_train_LSTM_w, validation_data=(X_validation_LSTM_w, Y_validation_LSTM_w),epochs=50, batch_size=72, verbose=0, shuffle=False)# plot history


In [None]:
#Visual plot to check if the error is reducing- Crypto and AI
pyplot.plot(LSTMModel_fit_cd.history['loss'], label='train')
pyplot.plot(LSTMModel_fit_cd.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

In [None]:
error_Training_LSTM_cd = mean_squared_error(Y_train_LSTM_cd, LSTMModel_cd.predict(X_train_LSTM_cd))
predicted_cd = LSTMModel_cd.predict(X_validation_LSTM_cd)
error_Test_LSTM_cd = mean_squared_error(Y_validation_cd, predicted_cd)
error_Test_LSTM_cd

In [None]:
#Visual plot to check if the error is reducing- World
pyplot.plot(LSTMModel_fit_w.history['loss'], label='train')
pyplot.plot(LSTMModel_fit_w.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

In [None]:
error_Training_LSTM_w = mean_squared_error(Y_train_LSTM_w, LSTMModel_w.predict(X_train_LSTM_w))
predicted_w = LSTMModel_w.predict(X_validation_LSTM_w)
error_Test_LSTM_w = mean_squared_error(Y_validation_w, predicted_w)
error_Test_LSTM_w

### Overall Comparison of all the algorithms (including Time Series Algorithms)

In [None]:
# compare algorithms - Crypto and AI
results_cd.append(error_Test_ARIMA_cd)
names_cd.append("ARIMA")
fig = pyplot.figure()
fig.suptitle('Algorithm Comparison-Post Time Series - Crypto and AI')
ax = fig.add_subplot(111)
pyplot.boxplot(results_cd)
ax.set_xticklabels(names_cd)
fig.set_size_inches(15,8)
pyplot.show()

Grid search relies on cross-validation, which is not appropriate for time series models such as LSTM because random folds break the temporal order of the observations.
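
For scikit-learn estimators, one possible workaround is to hand `GridSearchCV` a time-aware splitter instead of shuffled folds. The sketch below is an assumption-laden illustration: it uses a small decision tree and synthetic data (`X_toy`, `y_toy`) rather than the notebook's models or datasets, and it does not cover Keras or statsmodels models.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.tree import DecisionTreeRegressor

# Synthetic ordered data: a simple linear trend over 20 time steps
X_toy = np.arange(20, dtype=float).reshape(-1, 1)
y_toy = 2.0 * X_toy.ravel() + 5.0

grid = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=7),
    param_grid={'max_depth': [1, 2, 3]},
    scoring='neg_mean_squared_error',
    cv=TimeSeriesSplit(n_splits=4),  # folds always train on the past, test on the future
)
grid.fit(X_toy, y_toy)
print("Best params:", grid.best_params_)
print("Best score (negated MSE):", grid.best_score_)
```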

In [None]:
# compare algorithms - World
results_w.append(error_Test_ARIMA_w)
names_w.append("ARIMA")
fig = pyplot.figure()
fig.suptitle('Algorithm Comparison-Post Time Series - World')
ax = fig.add_subplot(111)
pyplot.boxplot(results_w)
ax.set_xticklabels(names_w)
fig.set_size_inches(15,8)
pyplot.show()

<a id='5'></a>
# 6. Model Tuning and Grid Search
This section shows the grid search for the machine learning and time series models.

<a id='5.1'></a>
## 6.1. Common Regression, Ensemble and DeepNNRegressor Grid Search


### Grid Search - Crypto and AI

In [None]:
# 8. Grid search : RandomForestRegressor - Crypto and AI
'''
n_estimators : integer, optional (default=100)
    The number of trees in the forest.
'''
param_grid = {'n_estimators': [50,100,150,200,250,300,350,400]}
model = RandomForestRegressor()
kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring=scoring, cv=kfold)
grid_result = grid.fit(X_train_cd, Y_train_cd)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

In [None]:
# 10. Grid search : ExtraTreesRegressor - Crypto and AI
'''
n_estimators : integer, optional (default=100)
    The number of trees in the forest.
'''
param_grid = {'n_estimators': [50,100,150,200,250,300,350,400]}
model = ExtraTreesRegressor(random_state=seed)
kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring=scoring, cv=kfold)
grid_result = grid.fit(X_train_cd, Y_train_cd)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

In [None]:
# 11. Grid search : AdaBoostRegressor - Crypto and AI
'''
n_estimators : integer, optional (default=50)
    The maximum number of estimators at which boosting is terminated.
    In case of perfect fit, the learning procedure is stopped early.

learning_rate : float, optional (default=1.)
    Learning rate shrinks the contribution of each regressor by
    ``learning_rate``. There is a trade-off between ``learning_rate`` and
    ``n_estimators``.
'''
param_grid = {'n_estimators': [50,100,150,200,250,300,350,400],
             'learning_rate': [1, 2, 3]}
model = AdaBoostRegressor(random_state=seed)
kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring=scoring, cv=kfold)
grid_result = grid.fit(X_train_cd, Y_train_cd)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

### Grid Search - World

In [None]:
# 8. Grid search : RandomForestRegressor - World
'''
n_estimators : integer, optional (default=100)
    The number of trees in the forest.
'''
param_grid = {'n_estimators': [50,100,150,200,250,300,350,400]}
model = RandomForestRegressor()
kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring=scoring, cv=kfold)
grid_result = grid.fit(X_train_w, Y_train_w)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

In [None]:
# 10. Grid search : ExtraTreesRegressor - World
'''
n_estimators : integer, optional (default=100)
    The number of trees in the forest.
'''
param_grid = {'n_estimators': [50,100,150,200,250,300,350,400]}
model = ExtraTreesRegressor(random_state=seed)
kfold = KFold(n_splits=num_folds, random_state=seed, shuffle=True)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring=scoring, cv=kfold)
grid_result = grid.fit(X_train_w, Y_train_w)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

<a id='5.2'></a>
## 6.2. Grid Search - Time Series Models

In [None]:
#Grid Search for ARIMA Model - Crypto and AI
#Change p,d and q and check for the best result

# evaluate an ARIMA model for a given order (p,d,q)
#Assuming that the train and Test Data is already defined before
def evaluate_arima_model(arima_order):
    #predicted = list()
    modelARIMA_cd = ARIMA(endog=Y_train_cd,exog=X_train_ARIMA_cd,order=arima_order)
    model_fit_cd = modelARIMA_cd.fit()
    #error on the test set
#     tr_len = len(X_train_ARIMA)
#     to_len = len(X_train_ARIMA) + len(X_validation_ARIMA)
#     predicted = model_fit.predict(start = tr_len -1 ,end = to_len -1, exog = X_validation_ARIMA)[1:]
#     error = mean_squared_error(predicted, Y_validation)
    # error on the training set
    error = mean_squared_error(Y_train_cd, model_fit_cd.fittedvalues)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(p_values, d_values, q_values):
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    mse = evaluate_arima_model(order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.7f' % (order,mse))
                except Exception:
                    continue
    print('Best ARIMA%s MSE=%.7f' % (best_cfg, best_score))

# evaluate parameters
p_values = [0, 1, 2, 3]
d_values = range(0, 3)
q_values = range(0, 3)
warnings.filterwarnings("ignore")
evaluate_models(p_values, d_values, q_values)

In [None]:
#Grid Search for ARIMA Model - World
#Change p,d and q and check for the best result

# evaluate an ARIMA model for a given order (p,d,q)
#Assuming that the train and Test Data is already defined before
def evaluate_arima_model(arima_order):
    #predicted = list()
    modelARIMA=ARIMA(endog=Y_train_w,exog=X_train_ARIMA_w,order=arima_order)
    model_fit = modelARIMA.fit()
    #error on the test set
#     tr_len = len(X_train_ARIMA)
#     to_len = len(X_train_ARIMA) + len(X_validation_ARIMA)
#     predicted = model_fit.predict(start = tr_len -1 ,end = to_len -1, exog = X_validation_ARIMA)[1:]
#     error = mean_squared_error(predicted, Y_validation)
    # error on the training set
    error = mean_squared_error(Y_train_w, model_fit.fittedvalues)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(p_values, d_values, q_values):
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    mse = evaluate_arima_model(order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.7f' % (order,mse))
                except Exception:
                    continue
    print('Best ARIMA%s MSE=%.7f' % (best_cfg, best_score))

# evaluate parameters
p_values = [0, 1, 2, 3]
d_values = range(0, 3)
q_values = range(0, 3)
#warnings.filterwarnings("ignore")
evaluate_models(p_values, d_values, q_values)

<a id='6'></a>
# 7. Finalise the Model

Let us select the models to finalize. The cells below fit an ExtraTreesRegressor and an ARIMA model for each dataset and evaluate them on the validation set.

<a id='6.1'></a>
## 7.1. Results on the Test Dataset

In [None]:
# Fit model - Crypto and AI
# prepare model
#scaler = StandardScaler().fit(X_train)
#rescaledX = scaler.transform(X_train)
model_cd_et = ExtraTreesRegressor(n_estimators=50) # ensemble of 50 extremely randomized trees
model_cd_et.fit(X_train_cd, Y_train_cd)

# ARIMA
modelARIMA_cd = ARIMA(endog = Y_train_cd, exog = X_train_ARIMA_cd, order=(3,0,1)) # Note: order is a tuple
model_fit_ARIMA_cd = modelARIMA_cd.fit()
# Training-set MSE of the fitted ARIMA model
error_Training_ARIMA_cd = mean_squared_error(Y_train_cd, model_fit_ARIMA_cd.fittedvalues)

In [None]:
# estimate accuracy on validation set - Crypto and AI
# transform the validation dataset
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
#rescaledValidationX = scaler.transform(X_validation)
predictions_et_cd = model_cd_et.predict(X_validation_cd)
print('-'*50)
print('EXTRA TREES MODEL')
print('-'*50)
print('Mean Squared Error:',mean_squared_error(Y_validation_cd, predictions_et_cd))
print('r2 Score:', r2_score(Y_validation_cd, predictions_et_cd))

print('-'*50)
print('ARIMA MODEL')
print('-'*50)
predictions_arima_cd = model_fit_ARIMA_cd.predict(start = tr_len_cd - 1, end = to_len_cd - 1, exog = X_validation_ARIMA_cd)[1:]
print('Mean Squared Error:',mean_squared_error(Y_validation_cd, predictions_arima_cd))
print('r2 Score:', r2_score(Y_validation_cd, predictions_arima_cd))

In [None]:
# Fit model - World
# prepare model
#scaler = StandardScaler().fit(X_train)
#rescaledX = scaler.transform(X_train)
model_w_et = ExtraTreesRegressor(n_estimators=50) # ensemble of 50 extremely randomized trees
model_w_et.fit(X_train_w, Y_train_w)

modelARIMA_w = ARIMA(endog = Y_train_w, exog = X_train_ARIMA_w, order=(3,0,2)) # Note: order is a tuple
model_fit_ARIMA_w = modelARIMA_w.fit()
# Training-set MSE of the fitted ARIMA model
error_Training_ARIMA_w = mean_squared_error(Y_train_w, model_fit_ARIMA_w.fittedvalues)

In [None]:
# estimate accuracy on validation set - World
# transform the validation dataset
#rescaledValidationX = scaler.transform(X_validation)
predictions_et_w = model_w_et.predict(X_validation_w)
print('-'*50)
print('EXTRA TREES MODEL')
print('-'*50)
print('Mean Squared Error:',mean_squared_error(Y_validation_w, predictions_et_w))
print('r2 Score:', r2_score(Y_validation_w, predictions_et_w))

print('-'*50)
print('ARIMA MODEL')
print('-'*50)
predictions_arima_w = model_fit_ARIMA_w.predict(start = tr_len_w - 1, end = to_len_w - 1, exog = X_validation_ARIMA_w)[1:]
print('Mean Squared Error:',mean_squared_error(Y_validation_w, predictions_arima_w))
print('r2 Score:', r2_score(Y_validation_w, predictions_arima_w))
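
For reference, the two metrics printed above can be computed by hand. The sketch below uses made-up numbers (not results from this notebook) and pure Python to show the definitions behind `mean_squared_error` and `r2_score`: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the observed values. The helper names `mse` and `r2` are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 17.0]

print("Mean Squared Error:", mse(y_true, y_pred))
print("r2 Score:", r2(y_true, y_pred))
```

Note that R² becomes negative when a model predicts worse than a constant equal to the mean of the observed values, so a negative score on the validation set signals a poor fit rather than a computation error.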

In [None]:
df = pd.merge(X_validation_ARIMA_w, Y_validation_w, left_index=True, right_index=True)
df_f = pd.merge(df, predictions_arima_w, left_index=True, right_index=True)
df_f

In [None]:
model_fit_ARIMA_w.fittedvalues

In [None]:
fig = px.line(df_f, x= 'Year', y=['Consumption','predicted_mean'], color_discrete_sequence=['blue', 'red'], title="Total World Energy Consumption", height=500, width=1000)
fig.show()

In [None]:
df2 = pd.merge(X_validation_ARIMA_cd, Y_validation_cd, left_index=True, right_index=True)
df2_f = pd.merge(df2, predictions_arima_cd, left_index=True, right_index=True)
fig = px.line(df2_f, x= 'Year', y=['Total Consumption','predicted_mean'], color_discrete_sequence=['blue', 'red'], title="Total Crypto and AI Energy Consumption", height=500, width=1000)
fig.show()

<a id='6.2'></a>
## 7.2. Variable Intuition/Feature Importance
Let us look at the feature importances of the Random Forest model.

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot
from sklearn.ensemble import RandomForestRegressor

# X, X_train and Y_train are assumed to be defined in an earlier cell
model = RandomForestRegressor()
model.fit(X_train, Y_train)
# feature_importances_ is a built-in attribute of fitted tree-based regressors
print(model.feature_importances_)
# Plot the ten most important features as a horizontal bar chart
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
pyplot.show()

<a id='6.3'></a>
## 7.3. Save Model for Later Use

In [None]:
# Save Model Using Pickle
from pickle import dump
from pickle import load

# save the model to disk
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:
    dump(model, f)

In [None]:
# some time later...
# load the model from disk
with open(filename, 'rb') as f:
    loaded_model = load(f)
# estimate accuracy on validation set
#rescaledValidationX = scaler.transform(X_validation) #in case the data is scaled.
#predictions = loaded_model.predict(rescaledValidationX)
predictions = loaded_model.predict(X_validation)
result = mean_squared_error(Y_validation, predictions)
print(result)
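
The save and load steps above can be checked in isolation with a small round trip. The sketch below is a minimal illustration under stated assumptions: `toy_model` is a hypothetical stand-in (a plain dictionary) for a fitted estimator, and the file is written to a temporary directory so nothing is left on disk. It uses only the standard library.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model: any picklable object works the same way
toy_model = {"name": "toy", "coefficients": [0.5, 1.5]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "finalized_model.sav")

    # Save: the context manager closes the file even if dump raises
    with open(path, "wb") as f:
        pickle.dump(toy_model, f)

    # Load the object back from disk
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object is an equal copy, not the same object
print(restored == toy_model, restored is toy_model)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.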