# **<center>Predictive Analytics: Bidirectional Long Short-Term Memory (Bi-LSTM), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRUs)</center>**

<a id="TOC"></a> <br>
# <center>Table of Contents</center>

* [Importing Libraries](#0)
* [Set Seed](#1)
* [Universal Downcasting Function for Dataframes](#2)
* [Universal Basic Summary Function for Dataframes](#3)

<h4>Asset Details</h4>

* [Importing and Reading Asset Data](#4)
* [Calculate Percentage Weight for Each Coin.](#5)
* [Plotting the Weight Each Cryptoasset Receives in the Metric](#6)

<h4>Train Data</h4>

* [Import, read, and convert Train Data into the proper format.](#7)
* [History of Crypto Data, 2018 to September 2021](#8)
* [Checking Null Values](#9)
* [Handling the Null values](#10)
* [Data Preprocessing](#11)
* [Replace Outliers](#12)
* [Splitting the dataset into train and test data](#13)
* [Train and Test Data Set Plot](#14)
* [Separating the features and the target variable](#15)
* [Data Transformation](#16)
* [Normalization](#17)
* [Create a 3D input dataset for the recurrent layers](#18)
* [Bi-LSTM, LSTM and GRUs models](#19)
* [Fit the Models](#20)
* [Bidirectional Long Short-Term Memory Model History](#21)
* [Long Short-Term Memory Model History](#22)
* [Gated Recurrent Units Model History](#23)
* [Train and Validation Loss Plot Function](#24)
* [Train/Validation loss Plot for Bidirectional Long Short-Term Memory ](#25)
* [Train/Validation loss Plot for Long Short-Term Memory](#26)
* [Train/Validation loss Plot for Gated Recurrent Units](#27)
* [Inverse-transform the target variable](#28)
* [BiLSTM, LSTM and GRU Models Predictions](#29)
* [Actual vs Prediction Plot Function](#30)
* [Actual vs Prediction Plot for Bidirectional Long Short-Term Memory](#31)
* [Actual vs Prediction Plot for Long Short-Term Memory](#32)
* [Actual vs Prediction Plot for Gated Recurrent Units](#33)
* [Calculate RMSE and MAE for Performance](#34)
* [Future Forecasting](#35)

<a id="0"></a> <br>
# Importing Libraries

In [None]:
import random
from tensorflow.keras import regularizers
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression 
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, Bidirectional, GRU
import warnings
warnings.filterwarnings("ignore")

**<p id="1">Set Seed</p>**
<p><li>Only the global seed is set here, not the operation-level seed. Successive calls to a random op within one run therefore return different values, but the sequence of values is the same on every re-run of the program.</li></p>

In [None]:
tf.random.set_seed(1234)
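
`tf.random.set_seed` fixes only TensorFlow's global seed. The notebook also imports `random` and NumPy, so a minimal sketch of seeding those generators as well could look like this, assuming the goal is run-to-run repeatability. The helper name `seed_everything` is hypothetical and not part of any library.

```python
import random

import numpy as np


def seed_everything(seed=1234):
    """Seed Python's and NumPy's global generators (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    # In the notebook, tf.random.set_seed(seed) would be called here as well.


seed_everything(1234)
first_draw = (random.random(), np.random.rand())

seed_everything(1234)
second_draw = (random.random(), np.random.rand())

# Re-seeding reproduces the same sequence of draws
print(first_draw == second_draw)
```

Seeding does not remove every source of nondeterminism (GPU kernels, for example, may still vary), so results can differ slightly between machines.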

<a id="2"></a> <br>
# Universal Downcasting Function for Dataframes

<p>Reduces memory usage by converting each column to the smallest suitable data type.</p>

In [None]:
def downcastMemoryUsage(dataFrame):
    startMemoryOptimization = dataFrame.memory_usage().sum() / 1024 ** 2
    print('Memory usage of dataframe is: {:.2f} MB'.format(startMemoryOptimization))
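    # Candidate types are tried in order; the first one whose range fits is used
    subTypeInt = ['uint8','uint16','uint32','uint64','int8','int16','int32','int64']
    # float16 is left out: prices fit its range but lose precision when cast to it
    subTypeFloat = ['float32','float64']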
    for column in dataFrame.columns:
        columnType = str(dataFrame[column].dtypes)
        maximumColumn = dataFrame[column].max()
        minimumColumn = dataFrame[column].min()
        if 'int' in columnType:
            for element in subTypeInt:
                # Inclusive bounds, so a column whose minimum is 0 can use an unsigned type
                if minimumColumn >= np.iinfo(element).min and maximumColumn <= np.iinfo(element).max:
                    dataFrame[column] = dataFrame[column].astype(element)
                    break
        elif 'float' in columnType:
            for element in subTypeFloat:
                if minimumColumn > np.finfo(element).min and maximumColumn < np.finfo(element).max:
                    dataFrame[column] = dataFrame[column].astype(element)
                    break
        elif 'object' in columnType:
            if column =='date':
                dataFrame['date'] = pd.to_datetime(dataFrame['date'],format='%Y-%m-%d')
            else:
                numberOfUnique = len(dataFrame[column].unique())
                numberOfTotal = len(dataFrame[column])
                if numberOfUnique / numberOfTotal < 0.5:
                    dataFrame[column] = dataFrame[column].astype('category')
    endMemoryOptimization = dataFrame.memory_usage().sum() / 1024 ** 2
    print('Memory usage after optimization is: {:.2f} MB'.format(endMemoryOptimization))
    print('Compressed by: {:.2f} %'.format(100*(startMemoryOptimization - endMemoryOptimization) / startMemoryOptimization))
    
    return dataFrame
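
A range check alone does not guarantee that values survive a cast unchanged. The sketch below uses only NumPy to print the limits that `np.iinfo` and `np.finfo` report, and then casts one illustrative price (an assumed value, not taken from the data) to `float16` and `float32` to compare the rounding.

```python
import numpy as np

# Value ranges that the downcasting checks compare against
print(np.iinfo('int8').min, np.iinfo('int8').max)
print(np.iinfo('uint16').max)
print(np.finfo('float16').max)

# An illustrative price that fits the float16 range but not its precision
price = 43210.5
as_float16 = float(np.float16(price))
as_float32 = float(np.float32(price))

print(as_float16, as_float32)
```

The `float32` value equals the original price, while the `float16` value is off by several units. That matters for a price series whose minute-to-minute moves are small.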

<a id="3"></a> <br>
# Universal Basic Summary Function for Dataframes

In [None]:
def basicSummary(dataFrameForSummary):
    print(f'Shape : {dataFrameForSummary.shape}')
    summary = pd.DataFrame(dataFrameForSummary.dtypes, columns=['Data Type'])
    summary = summary.reset_index()
    summary = summary.rename(columns={'index': 'Feature'})
    summary['Num of Nulls'] = dataFrameForSummary.isnull().sum().values
    summary['Num of Unique'] = dataFrameForSummary.nunique().values
    # iloc selects rows by position, so this also works when the index is not 0..n-1
    summary['First Value'] = dataFrameForSummary.iloc[0].values
    summary['Second Value'] = dataFrameForSummary.iloc[1].values
    summary['Third Value'] = dataFrameForSummary.iloc[2].values
    summary['Fourth Value'] = dataFrameForSummary.iloc[3].values
    summary['Fifth Value'] = dataFrameForSummary.iloc[4].values
    return summary

<a id="AssetData"></a> <br>
# <center>Assets Data</center>

**<p id="4">Importing and Reading Asset Data</p>**
<p><li>Provides the real name of the cryptoasset for each Asset_ID and the weight each cryptoasset receives in the metric.</li></p>

In [None]:
assetDetailsData = pd.read_csv('../input/g-research-crypto-forecasting/asset_details.csv')
downcastMemoryUsage(assetDetailsData)

**<p id="5"><li>Calculate Percentage Weight for Each Coin.</li></p>**

In [None]:
assetDetailsData.sort_values(by=['Weight'],ascending=False,inplace=True)
# df[percent] = (df['column_name'] / df['column_name'].sum()) * 100
assetDetailsData['coinWeightPercent'] = (assetDetailsData['Weight'] / assetDetailsData['Weight'].sum()) * 100
assetDetailsData

<a id="6"></a> <br>
# Plotting the Weight Each Cryptoasset Receives in the Metric

In [None]:
fig = plt.figure()
ax = plt.gca()
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 5, box.height * 3])
colors = sns.color_palette('colorblind', n_colors=len(assetDetailsData))
# Take the labels from the sorted frame so each wedge is labeled with its own asset
labels = assetDetailsData['Asset_Name'].tolist()
explode = (0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0)
plt.pie(assetDetailsData['coinWeightPercent'], colors=colors, autopct='%.0f%%', labels=labels, explode=explode,
        startangle=30, shadow=True, textprops={'fontweight': 'semibold', 'fontsize': 15},
        wedgeprops={'linewidth': 2, 'edgecolor': 'k'}, labeldistance=1.1)
plt.title("Weight Each Cryptoasset Receives in the Metric", fontweight="bold", fontsize=22, pad=21)
plt.axis('equal')
plt.show()

<a id="TrainData"></a> <br>
# <center>Train Data</center>

**<p id="7">Import, Read, and Convert Train Data into the Proper Format.</p>**
<p><li>Convert the Unix "timestamp" (in seconds) to a timezone-aware datetime.</li></p>

In [None]:
trainData = pd.read_csv('../input/g-research-crypto-forecasting/train.csv')
# Vectorized conversion: parse as UTC, then convert the whole column to London time
trainData['DateAndTime'] = pd.to_datetime(trainData['timestamp'], unit='s', utc=True).dt.tz_convert('Europe/London')
# Calendar date in local time, stored as a naive datetime64[ns] at midnight
trainData['Date'] = trainData['DateAndTime'].dt.tz_localize(None).dt.normalize()
trainData.set_index(['DateAndTime'], inplace=True)
downcastMemoryUsage(trainData)
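
The timestamp conversion can be checked on a tiny example. The sketch below uses two hand-picked Unix timestamps (midnight UTC on 2021-01-01 and on 2021-07-01, chosen for illustration) to show how the London local time differs between winter and summer.

```python
import pandas as pd

# Two Unix timestamps in seconds: midnight UTC on 2021-01-01 and 2021-07-01
timestamps = pd.Series([1609459200, 1625097600])

# Parse as UTC, then convert the whole column at once with the .dt accessor
london = pd.to_datetime(timestamps, unit='s', utc=True).dt.tz_convert('Europe/London')

# Naive calendar date (midnight) in London local time
dates = london.dt.tz_localize(None).dt.normalize()

print(london)
print(dates)
```

In January, London time coincides with UTC. In July it is one hour ahead, so rows close to midnight UTC can fall on a different local calendar date.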

<a id="8"></a> <br>
# History of Crypto Data, 2018 to September 2021

In [None]:
assetsNamesDictionary = {row["Asset_Name"]:row["Asset_ID"] for x, row in assetDetailsData.iterrows()}

assetNames = ['Bitcoin', 'Ethereum', 'Cardano', 'Binance Coin', 'Dogecoin', 'Bitcoin Cash', 'Litecoin', 'Ethereum Classic',
          'Stellar', 'TRON', 'Monero', 'EOS.IO', 'IOTA', 'Maker']

fromToListTimes = []
for crypto in assetNames:
    cryptoDataFrame = trainData[trainData["Asset_ID"] == assetsNamesDictionary[crypto]]
    fromTime = cryptoDataFrame.index[0]
    toTime = cryptoDataFrame.index[-1]
    fromToListTimes.append([crypto, fromTime, toTime])
fromToDataframe = pd.DataFrame(fromToListTimes)
fromToDataframe.columns = ["AssetName", "StartsFrom", "EndsTo"]
fromToDataframe

**<p id="9">Checking Null Values</p>**

In [None]:
trainData.isnull().sum()

**<p id="10">Handling the Null Values</p>**

In [None]:
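def replace_missing(dataFrame, column):
    # Interpolate within each asset: all assets share one table, so a plain
    # column-wise interpolation could fill a gap from another coin's values.
    return dataFrame.groupby('Asset_ID')[column].transform(lambda s: s.interpolate())


# Assign the result back instead of relying on inplace=True, which returns None
trainData['VWAP'] = replace_missing(trainData, 'VWAP')
trainData['Target'] = replace_missing(trainData, 'Target')

trainData.isnull().sum()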
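
Linear interpolation fills a gap from its neighbouring rows, so the row order matters. The toy frame below is invented for illustration and assumes that rows of several assets are interleaved in one table, as the `Asset_ID` column suggests. It compares a plain column-wise interpolation with a per-asset one.

```python
import numpy as np
import pandas as pd

# Toy frame with two interleaved assets and one missing VWAP each
frame = pd.DataFrame({
    'Asset_ID': [1, 6, 1, 6, 1, 6],
    'VWAP': [100.0, 10.0, np.nan, np.nan, 104.0, 14.0],
})

# Plain interpolation walks down the rows and mixes both assets
mixed = frame['VWAP'].interpolate()

# Group-wise interpolation fills each gap from the same asset only
per_asset = frame.groupby('Asset_ID')['VWAP'].transform(lambda s: s.interpolate())

print(mixed.tolist())
print(per_asset.tolist())
```

The per-asset version fills the gaps with 102.0 and 12.0, the midpoints of each asset's own neighbours. The plain version produces values that belong to neither asset.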

<a id="11"></a> <br>
# Data Preprocessing

In [None]:
startDate = '2021-01-01'
endDate = '2021-09-21'
# Use >= so that the first day of the range is included
mask = (trainData['Date'] >= startDate) & (trainData['Date'] <= endDate) & (trainData['Asset_ID'] == 1)
newTrainData = trainData.loc[mask]
newTrainData.head()

**<p>Take an independent copy of the filtered DataFrame with copy(), so that the in-place edits below do not affect trainData.</p>**

In [None]:
data_training = newTrainData[newTrainData['Date'] >= '2021-01-01'].copy()

**<p id="12">Replace Outliers</p>**
<p><li>Outliers are detected with a statistical rule that assumes the data points are approximately normally distributed. Outliers are values that fall in a low-probability region of that distribution.</li></p>
<p><li>Here, values outside the range μ±2σ are labeled as outliers. Under the assumption of a normal distribution, μ±2σ contains about 95% of the data.</li></p>

In [None]:
# Outlier detection: fences at mean ± 2 standard deviations
upperFence = data_training['Close'].mean() + 2*data_training['Close'].std()
lowerFence = data_training['Close'].mean() - 2*data_training['Close'].std()

# Replace outliers with NaN, then fill them by linear interpolation
data_training.loc[data_training['Close'] > upperFence, 'Close'] = np.nan
data_training.loc[data_training['Close'] < lowerFence, 'Close'] = np.nan
data_training['Close'] = data_training['Close'].interpolate()
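
The μ±2σ rule can be traced on a short series. The numbers below are invented for illustration. One spike is masked because it lies above the upper fence, and linear interpolation then fills the gap from its two neighbours.

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 10.0, 12.0, 11.0, 10.0, 100.0, 11.0, 10.0, 12.0])

# Fences at mean ± 2 standard deviations
upper = close.mean() + 2 * close.std()
lower = close.mean() - 2 * close.std()

# Mask values outside the fences, then fill them by linear interpolation
cleaned = close.where((close >= lower) & (close <= upper)).interpolate()

print(round(upper, 2), round(lower, 2))
print(cleaned.tolist())
```

The spike itself inflates both the mean and the standard deviation, so the fences are wide. With several large outliers, some of them can end up inside the fences and go undetected.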

**<p id="13">Splitting the dataset into train and test data.</p>**
<p><li>I use the first 80% of the 2021 data as train data and the remaining 20% as test data. The train data is used to fit the model and the test data to evaluate its performance.</li></p>

In [None]:
# Split train data and test data
train_size = int(len(data_training)*0.8)
train_dataset, test_dataset = data_training.iloc[:train_size],data_training.iloc[train_size:]
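
The split above keeps the rows in time order instead of shuffling them, so every test observation comes after every training observation. A minimal sketch with a stand-in array of ten ordered values (illustrative data only):

```python
import numpy as np

series = np.arange(10)                  # stand-in for time-ordered rows
train_size = int(len(series) * 0.8)     # first 80% of the rows

train, test = series[:train_size], series[train_size:]

print(train, test)
```

A shuffled split, which is what `train_test_split` produces by default, would let the model train on observations that are later in time than some test observations.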

<a id="14"></a> <br>
# Train and Test Data Set Plot

In [None]:
fig, ax = plt.subplots(figsize = (20,10))
ax.plot(train_dataset.Close,color="#004C99")
ax.plot(test_dataset.Close,color="#D96552")
ax.set_facecolor("#D3D3D3")
# The `b` keyword was removed from newer Matplotlib releases; pass the flag positionally
ax.grid(True, axis = 'y')
plt.ylabel('USD')
plt.xlabel('Time')
plt.legend(['Train set', 'Test set'], loc='upper right',prop={'size': 15})
print('Dimension of train data: ',train_dataset.shape)
print('Dimension of test data: ', test_dataset.shape)

**<p id="15">Separating the features and the target variable.</p>**

In [None]:
# Split train data to X and y
X_train = train_dataset.drop(['timestamp','Asset_ID','Count','VWAP','Target','Date'], axis = 1)
y_train = train_dataset.loc[:,['Close']]

# Split test data to X and y
X_test = test_dataset.drop(['timestamp','Asset_ID','Count','VWAP','Target','Date'], axis = 1)
y_test = test_dataset.loc[:,['Close']]

In [None]:
test_dataset

<a id="16"></a> <br>
# Data Transformation

In [None]:
print("X_train Dimensions:", X_train.shape)
print("y_train Dimensions:", y_train.shape)
print("X_test Dimensions:", X_test.shape)
print("y_test Dimensions:", y_test.shape)

**<p id="17">Normalization</p>**

In [None]:
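# MinMaxScaler is used to normalize the data.
# Features and target get separate scalers; `scaler` stays the target scaler
# because it is reused later to invert the scaling.
scaler_x = MinMaxScaler()
scaler = MinMaxScaler()

# Fit the scalers on the training data only
X_train = scaler_x.fit_transform(X_train)
y_train = scaler.fit_transform(y_train)

# Apply the already fitted scalers to the test data (no refitting, to avoid leakage)
X_test = scaler_x.transform(X_test)
y_test = scaler.transform(y_test)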
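
A common convention is to learn the scaling range from the training data and reuse it for the test data. The sketch below re-implements min-max scaling in plain NumPy for illustration (it is not `MinMaxScaler` itself). The helper names `scale` and `unscale` are hypothetical and the numbers are invented.

```python
import numpy as np

train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[25.0], [40.0]])

# "Fit": learn the range from the training data only
data_min, data_max = train.min(axis=0), train.max(axis=0)


def scale(values):
    """Map values to [0, 1] relative to the training range."""
    return (values - data_min) / (data_max - data_min)


def unscale(values):
    """Invert scale() to recover the original units."""
    return values * (data_max - data_min) + data_min


train_scaled = scale(train)
test_scaled = scale(test)   # can leave [0, 1] when test exceeds the training range

print(train_scaled.ravel(), test_scaled.ravel())
print(unscale(test_scaled).ravel())
```

Because the inverse uses the same minimum and maximum as the forward step, scaled predictions can be mapped back to prices only with the scaler that was fitted on the training target.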

In [None]:
X_test

**<p id="18">Create a 3D input dataset for the recurrent layers</p>**

In [None]:
# Create a 3D input of shape (samples, time_steps, features) for the Keras recurrent layers
def create_dataset(X, y, time_steps = 1):
    Xs, ys = [], []
    for i in range(len(X)-time_steps):
        v = X[i:i+time_steps, :]
        Xs.append(v)
        ys.append(y[i+time_steps])
    return np.array(Xs), np.array(ys)
TIME_STEPS = 30
X_test, y_test = create_dataset(X_test, y_test, TIME_STEPS)
X_train, y_train = create_dataset(X_train, y_train,TIME_STEPS)

print("X_train Dimensions:", X_train.shape)
print("y_train Dimensions:", y_train.shape)
print("X_test Dimensions:", X_test.shape)
print("y_test Dimensions:", y_test.shape)
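
To see what the windowing does, the sketch below repeats `create_dataset` on a toy array with 10 time steps and 2 features (invented data). Each sample holds `time_steps` consecutive rows, and its label is the target value of the row that follows the window.

```python
import numpy as np


def create_dataset(X, y, time_steps=1):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X[i:i + time_steps, :])   # window of consecutive rows
        ys.append(y[i + time_steps])        # target right after the window
    return np.array(Xs), np.array(ys)


# Toy data: 10 time steps, 2 features; the target is the first feature
X = np.arange(20, dtype=float).reshape(10, 2)
y = X[:, [0]]

Xs, ys = create_dataset(X, y, time_steps=3)

print(Xs.shape, ys.shape)
print(Xs[0], ys[0])
```

With 10 rows and a window of 3, there are 7 samples, so the result has shape `(7, 3, 2)` for the inputs and `(7, 1)` for the labels.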

<a id="19"></a> <br>
# Bi-LSTM, LSTM and GRUs models

In [None]:
# Create BiLSTM model
def create_model_bilstm(units):
    model = Sequential()
    model.add(Bidirectional(LSTM(units = units,return_sequences=True),input_shape=(X_train.shape[1], X_train.shape[2])))
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(units = units)))
    model.add(Dense(1,activation='relu',kernel_regularizer=regularizers.l2(0.01)))
    #Compile model
    model.compile(loss='mse', optimizer='adam')
    return model

# Create LSTM or GRU model
def create_model(units, m):
    model = Sequential()
    model.add(m (units = units, return_sequences = True,input_shape = [X_train.shape[1], X_train.shape[2]]))
    model.add(Dropout(0.5))
    model.add(m (units = units))
    model.add(Dropout(0.5))
    model.add(Dense(units = 1,activation='relu',kernel_regularizer=regularizers.l2(0.01)))
    #Compile model
    model.compile(loss='mse', optimizer='adam')
    return model

# BiLSTM
model_bilstm = create_model_bilstm(4)

# GRU and LSTM
model_gru = create_model(16, GRU)
model_lstm = create_model(32, LSTM)
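
The `units` argument sets the size of each layer's hidden state and therefore the number of trainable weights. As a rough worked example, the functions below count the parameters of the first recurrent layer of each model. They assume five input features (Open, High, Low, Close, Volume) and, for the GRU, the `reset_after=True` default of TensorFlow 2. The function names are hypothetical.

```python
def lstm_params(units, features):
    # 4 gates, each with input weights, recurrent weights, and one bias vector
    return 4 * (units * (features + units) + units)


def gru_params(units, features):
    # 3 gates; with reset_after=True each gate carries two bias vectors
    return 3 * (units * (features + units) + 2 * units)


features = 5  # assumed: Open, High, Low, Close, Volume

lstm_first = lstm_params(32, features)          # first LSTM layer of model_lstm
gru_first = gru_params(16, features)            # first GRU layer of model_gru
bilstm_first = 2 * lstm_params(4, features)     # forward + backward LSTM of model_bilstm

print(lstm_first, gru_first, bilstm_first)
```

These counts can be compared with the output of `model.summary()` to confirm the input shape each model actually received.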

<a id="20"></a> <br>
# Fit the Models

In [None]:
# Fit BiLSTM, LSTM and GRU
def fit_model(model):
    early_stop = tf.keras.callbacks.EarlyStopping(monitor = 'val_loss',
                                               patience = 10)
    history = model.fit(X_train, y_train, epochs = 100,  
                        validation_split = 0.2, batch_size = 1024, 
                        shuffle = False, callbacks = [early_stop])
    return history
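
The `patience` setting controls how long training continues without an improvement in `val_loss`. The function below is a simplified pure-Python model of that logic, not the Keras implementation. Its name and the loss values are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the zero-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss     # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1       # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None


losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
stop_epoch = early_stopping_epoch(losses, patience=3)

print(stop_epoch)
```

With `validation_split = 0.2`, Keras takes the validation samples from the end of the training arrays, so for this time-ordered data the monitored loss is measured on the most recent 20% of the training windows.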

**<p id="21">Bidirectional Long Short-Term Memory Model History</p>**

In [None]:
history_bilstm = fit_model(model_bilstm)

**<p id="22">Long Short-Term Memory Model History</p>**

In [None]:
# history_lstm = fit_model(model_lstm)

**<p id="23">Gated Recurrent Units Model History</p>**

In [None]:
# history_gru = fit_model(model_gru)

**<p id="24">Train and Validation Loss Plot Function</p>**

In [None]:
# # Plot train loss and validation loss
# def plot_loss (history):
#     fig, ax = plt.subplots(figsize = (20,10))
#     ax.plot(history.history['loss'],color="#004C99")
#     ax.plot(history.history['val_loss'],color="#D96552")
#     ax.set_facecolor("#D3D3D3")
#     ax.grid(True, axis = 'y')
#     plt.ylabel('Loss')
#     plt.xlabel('epoch')
#     plt.legend(['Train loss', 'Validation loss'], loc='upper right',prop={'size': 15})

<a id="25"></a> <br>
# Train/Validation loss Plot for Bidirectional Long Short-Term Memory (Good Fit)

In [None]:
# plot_loss (history_bilstm)

<a id="26"></a> <br>
# Train/Validation loss Plot for Long Short-Term Memory

In [None]:
# plot_loss (history_lstm)

<a id="27"></a> <br>
# Train/Validation loss Plot for Gated Recurrent Units

In [None]:
# plot_loss (history_gru)

**<p id="28">Inverse-transform the target variable</p>**

In [None]:
y_test = scaler.inverse_transform(y_test)
y_train = scaler.inverse_transform(y_train)

**<p id="29">BiLSTM, LSTM and GRU Models Predictions</p>**

In [None]:

import gresearch_crypto
env = gresearch_crypto.make_env()
iter_test = env.iter_test()
def prediction(model):
    for (test_df, sample_prediction_df) in iter_test:
        sample_prediction_df['Target'] = model.predict(X_test)
        env.predict(sample_prediction_df)


# # Make prediction
# def prediction(model):
#     prediction = model.predict(X_test)
#     prediction = scaler.inverse_transform(prediction)
#     return prediction


# prediction_bilstm = prediction(model_bilstm)
# prediction_lstm = prediction(model_lstm)
# prediction_gru = prediction(model_gru)

**<p id="30">Actual vs Prediction Plot Function</p>**

In [None]:
# # Plot true future vs prediction
# def plot_future(prediction, y_test):
#     fig, ax = plt.subplots(figsize = (20,10))
#     range_future = len(prediction)
#     ax.plot(np.arange(range_future), np.array(y_test),label='Actual',color="#004C99")
#     ax.plot(np.arange(range_future),np.array(prediction),label='Prediction',color="#D96552")
#     ax.set_facecolor("#D3D3D3")
#     ax.grid(True, axis = 'y')
#     plt.ylabel('USD')
#     plt.legend(loc='upper left',prop={'size': 15})
#     plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)    

<a id="31"></a> <br>
# Actual vs Prediction Plot for Bidirectional Long Short-Term Memory

In [None]:
# plot_future(prediction_bilstm, y_test)

<a id="32"></a> <br>
# Actual vs Prediction Plot for Long Short-Term Memory

In [None]:
# plot_future(prediction_lstm, y_test)

<a id="33"></a> <br>
# Actual vs Prediction Plot for Gated Recurrent Units

In [None]:
# plot_future(prediction_gru, y_test)

**<p id="34">Calculate RMSE and MAE for Performance</p>**

In [None]:
# # Define a function to calculate MAE and RMSE
# def evaluate_prediction(predictions, actual, model_name):
#     errors = predictions - actual
#     mse = np.square(errors).mean()
#     rmse = np.sqrt(mse)
#     mae = np.abs(errors).mean()
#     print(model_name + ':')
#     print('Mean Absolute Error: {:.4f}'.format(mae))
#     print('Root Mean Square Error: {:.4f}'.format(rmse))
#     print('')
    
    
# evaluate_prediction(prediction_bilstm, y_test, 'Bidirectional LSTM')
# evaluate_prediction(prediction_lstm, y_test, 'LSTM')
# evaluate_prediction(prediction_gru, y_test, 'GRU')
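
Both metrics are computed from the element-wise errors. MAE averages their absolute values, and RMSE is the square root of the mean squared error. A minimal runnable sketch with three invented prediction/actual pairs, returning the values instead of printing them inside the function:

```python
import numpy as np


def evaluate_prediction(predictions, actual):
    """Return (MAE, RMSE) for two equally shaped sequences."""
    errors = np.asarray(predictions) - np.asarray(actual)
    rmse = np.sqrt(np.square(errors).mean())
    mae = np.abs(errors).mean()
    return mae, rmse


mae, rmse = evaluate_prediction([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
print('Mean Absolute Error: {:.4f}'.format(mae))
print('Root Mean Square Error: {:.4f}'.format(rmse))
```

RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest.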

<a id="35"></a> <br>
# Future Forecasting

In [None]:
# # Import new CRYPTO data 
# newinput = pd.read_csv('new.csv', parse_dates=['Date'], index_col = 'Date')

# # Order of the variable are important
# X_new = newinput.loc['2022-01-01':'2032-01-01',:] 
# X_new

In [None]:
# # Plot history and future data
# def plot_history_future(y_train, prediction):
#     fig, ax = plt.subplots(figsize = (20,10))
#     range_history = len(y_train)
#     range_future = list(range(range_history, range_history + len(prediction)))
#     ax.plot(np.arange(range_history), np.array(y_train),label='History',color="#004C99")
#     ax.plot(range_future, np.array(prediction),label='Prediction',color="#D96552")
#     ax.set_facecolor("#D3D3D3")
#     ax.grid(True, axis = 'y')
#     plt.ylabel('USD')
#     plt.legend(loc='upper left',prop={'size': 15})
#     plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)

In [None]:
# # Multi-step forecasting 
# def forecast(X_input, time_steps):
#     X = scaler_x.transform(X_input)  # needs a scaler fitted on the training features; X_train is an array and has no transform()
#     Xs = []
#     for i in range(len(X) - time_steps):
#         v = X[i:i+time_steps, :]
#         Xs.append(v)
        
#     X_transformed = np.array(Xs)
    
#     prediction = model_bilstm.predict(X_transformed)
#     prediction_actual = scaler.inverse_transform(prediction)
#     return prediction_actual

# prediction = forecast(X_new, TIME_STEPS)
# plot_history_future(y_train, prediction)

In [None]:
trainData.head()

In [None]:
# startDateEthereum = '2021-01-01'
# endDateEthereum = '2021-09-21'
# maskEthereum  = (trainData['Date'] > startDateEthereum) & (trainData['Date'] <= endDateEthereum) & (trainData['Asset_ID'] == 6)
# newTrainDataEthereum  = trainData.loc[maskEthereum]
# newTrainDataEthereum .head()

In [None]:
# dataTrainingEthereum = newTrainDataEthereum[newTrainDataEthereum['Date'] >= '2021-01-01'].copy()

In [None]:
# # Outlier detection: fences at mean ± 2 standard deviations
# upperFenceEthereum = dataTrainingEthereum['Close'].mean() + 2*dataTrainingEthereum['Close'].std()
# lowerFenceEthereum = dataTrainingEthereum['Close'].mean() - 2*dataTrainingEthereum['Close'].std()

# # Replace outliers with NaN, then fill them by linear interpolation
# dataTrainingEthereum.loc[dataTrainingEthereum['Close'] > upperFenceEthereum, 'Close'] = np.nan
# dataTrainingEthereum.loc[dataTrainingEthereum['Close'] < lowerFenceEthereum, 'Close'] = np.nan
# dataTrainingEthereum['Close'] = dataTrainingEthereum['Close'].interpolate()

In [None]:
# # Split train data and test data
# train_size_ethereum = int(len(dataTrainingEthereum)*0.8)
# train_dataset_ethereum, test_dataset_ethereum = dataTrainingEthereum.iloc[:train_size_ethereum],dataTrainingEthereum.iloc[train_size_ethereum:]

In [None]:
# fig, ax = plt.subplots(figsize = (20,10))
# ax.plot(train_dataset_ethereum.Close,color="#004C99")
# ax.plot(test_dataset_ethereum.Close,color="#D96552")
# ax.set_facecolor("#D3D3D3")
# ax.grid(True, axis = 'y')
# plt.ylabel('USD')
# plt.xlabel('Time')
# plt.legend(['Train set', 'Test set'], loc='upper right',prop={'size': 15})
# print('Dimension of train data: ',train_dataset.shape)
# print('Dimension of test data: ', test_dataset.shape)

In [None]:
# # Split train data to X and y
# xTrainEthereum = train_dataset_ethereum.drop(['timestamp','Asset_ID','Count','VWAP','Target','Date'], axis = 1)
# yTrainEthereum = train_dataset_ethereum.loc[:,['Close']]

# # Split test data to X and y
# xTestEthereum = test_dataset_ethereum.drop(['timestamp','Asset_ID','Count','VWAP','Target','Date'], axis = 1)
# yTestEthereum = test_dataset_ethereum.loc[:,['Close']]

# # #output
# # yEthereumOutput= test_dataset_ethereum[['Close']]
 
# # #input
# # xEthereumInput=test_dataset_ethereum.drop(['timestamp','Asset_ID','Count','VWAP','Target','Date'],axis=1)

# # #splitting
# # xTrainEthereum,xTestEthereum,yTrainEthereum,yTestEthereum=train_test_split(xEthereumInput,yEthereumOutput,test_size=0.2)

In [None]:
# #printing shapes of testing and training sets :
# print("shape of original dataset :", dataTrainingEthereum.shape)
# print("shape of input - training set", xTrainEthereum.shape)
# print("shape of output - training set", yTrainEthereum.shape)
# print("shape of input - testing set", xTestEthereum.shape)
# print("shape of output - testing set", yTestEthereum.shape)

In [None]:
# # MinMaxScaler is used to normalize the data
# scaler_x = MinMaxScaler()  # feature scaler
# scaler = MinMaxScaler()    # target scaler, reused later for inverse_transform

# # Fit the scalers on the training data only
# xTrainEthereum = scaler_x.fit_transform(xTrainEthereum)
# yTrainEthereum = scaler.fit_transform(yTrainEthereum)

# # Apply the fitted scalers to the test data without refitting
# xTestEthereum = scaler_x.transform(xTestEthereum)
# yTestEthereum = scaler.transform(yTestEthereum)

In [None]:
# # Create a 3D input of shape (samples, time_steps, features) for the Keras recurrent layers
# def create_dataset_ethereum (X, y, time_steps = 1):
#     Xs, ys = [], []
#     for i in range(len(X)-time_steps):
#         v = X[i:i+time_steps, :]
#         Xs.append(v)
#         ys.append(y[i+time_steps])
#     return np.array(Xs), np.array(ys)
# TIME_STEPS = 30
# xTestEthereum, yTestEthereum = create_dataset_ethereum(xTestEthereum, yTestEthereum, TIME_STEPS)
# xTrainEthereum, yTrainEthereum = create_dataset_ethereum(xTrainEthereum, yTrainEthereum,TIME_STEPS)

# print("X_train Dimensions:", xTrainEthereum.shape)
# print("y_train Dimensions:", yTrainEthereum.shape)
# print("X_test Dimensions:", xTestEthereum.shape)
# print("y_test Dimensions:", yTestEthereum.shape)

In [None]:
# # Create BiLSTM model
# def create_model_bilstm_ethereum(units):
#     model = Sequential()
#     model.add(Bidirectional(LSTM(units = units,return_sequences=True),input_shape=(xTrainEthereum.shape[1], xTrainEthereum.shape[2])))
# #     model.add(Dropout(0.5))
#     model.add(Bidirectional(LSTM(units = units)))
#     model.add(Dense(1))
#     #Compile model
#     model.compile(loss='mse', optimizer='adam')
#     return model

# # Create LSTM or GRU model
# def create_model_ethereum(units, m):
#     model = Sequential()
#     model.add(m (units = units, return_sequences = True,input_shape = [xTrainEthereum.shape[1], xTrainEthereum.shape[2]]))
#     model.add(Dropout(0.5))
#     model.add(m (units = units))
#     model.add(Dropout(0.5))
#     model.add(Dense(units = 1,activation='relu',kernel_regularizer=regularizers.l2(0.01)))
#     #Compile model
#     model.compile(loss='mse', optimizer='adam')
#     return model

# # BiLSTM
# model_bilstm_ethereum = create_model_bilstm_ethereum(4)

# # GRU and LSTM
# model_gru_ethereum = create_model_ethereum(16, GRU)
# model_lstm_ethereum = create_model_ethereum(16, LSTM)

In [None]:
# # Fit BiLSTM, LSTM and GRU
# def fit_model_ethereum(model):
#     early_stop = tf.keras.callbacks.EarlyStopping(monitor = 'val_loss',
#                                                patience = 10)
#     historyEthereum = model.fit(xTrainEthereum, yTrainEthereum, epochs = 100,  
#                         validation_split = 0.2, batch_size = 1024, 
#                         shuffle = False, callbacks = [early_stop])
#     return historyEthereum

In [None]:
# history_bilstm_ethereum = fit_model_ethereum(model_bilstm_ethereum)

In [None]:
# history_lstm_ethereum = fit_model_ethereum(model_lstm_ethereum)

In [None]:
# history_gru_ethereum = fit_model_ethereum(model_gru_ethereum)

In [None]:
# # Plot train loss and validation loss
# def plot_loss_ethereum (historyEthereum):
#     fig, ax = plt.subplots(figsize = (20,10))
#     ax.plot(historyEthereum.history['loss'],color="#004C99")
#     ax.plot(historyEthereum.history['val_loss'],color="#D96552")
#     ax.set_facecolor("#D3D3D3")
#     ax.grid(True, axis = 'y')
#     plt.ylabel('Loss')
#     plt.xlabel('epoch')
#     plt.legend(['Train loss', 'Validation loss'], loc='upper right',prop={'size': 15})

In [None]:
# plot_loss_ethereum (history_bilstm_ethereum)

In [None]:
# plot_loss_ethereum (history_lstm_ethereum)

In [None]:
# plot_loss_ethereum (history_gru_ethereum)

In [None]:
# yTestEthereum = scaler.inverse_transform(yTestEthereum)
# yTrainEthereum = scaler.inverse_transform(yTrainEthereum)

In [None]:
# # Make prediction
# def predictionEthereum(model):
#     predictionEthereum = model.predict(xTestEthereum)
#     predictionEthereum = scaler.inverse_transform(predictionEthereum)
#     return predictionEthereum


# prediction_bilstm_ethereum = predictionEthereum(model_bilstm_ethereum)
# prediction_lstm_ethereum = predictionEthereum(model_lstm_ethereum)
# prediction_gru_ethereum = predictionEthereum(model_gru_ethereum)

In [None]:
# # Plot true future vs prediction
# def plot_future_ethereum(predictionEthereum, yTestEthereum):
#     fig, ax = plt.subplots(figsize = (20,10))
#     range_future_ethereum = len(predictionEthereum)
#     ax.plot(np.arange(range_future_ethereum), np.array(yTestEthereum),label='Actual',color="#004C99")
#     ax.plot(np.arange(range_future_ethereum),np.array(predictionEthereum),label='Prediction',color="#D96552")
#     ax.set_facecolor("#D3D3D3")
#     ax.grid(True, axis = 'y')
#     plt.ylabel('USD')
#     plt.legend(loc='upper left',prop={'size': 15})
#     plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)    

In [None]:
# plot_future_ethereum(prediction_bilstm_ethereum, yTestEthereum)

In [None]:
# plot_future_ethereum(prediction_lstm_ethereum, yTestEthereum)

In [None]:
# plot_future_ethereum(prediction_gru_ethereum, yTestEthereum)

In [None]:
# # Define a function to calculate MAE and RMSE
# def evaluate_prediction_ethereum(predictions, actual, model_name):
#     errors = predictions - actual
#     mse = np.square(errors).mean()
#     rmse = np.sqrt(mse)
#     mae = np.abs(errors).mean()
#     print(model_name + ':')
#     print('Mean Absolute Error: {:.4f}'.format(mae))
#     print('Root Mean Square Error: {:.4f}'.format(rmse))
#     print('')
    
    
# evaluate_prediction_ethereum(prediction_bilstm_ethereum, yTestEthereum, 'Bidirectional LSTM')
# evaluate_prediction_ethereum(prediction_lstm_ethereum, yTestEthereum, 'LSTM')
# evaluate_prediction_ethereum(prediction_gru_ethereum, yTestEthereum, 'GRU')