In this notebook, we'll be using a GRU model for a time series prediction task and we will compare the performance of the GRU model against an LSTM model as well. The dataset that we will be using is the Hourly Energy Consumption dataset which can be found on [Kaggle](https://www.kaggle.com/robikscube/hourly-energy-consumption). The dataset contains power consumption data across different regions around the United States recorded on an hourly basis.

You can run the code implementation in this article on FloydHub using their GPUs on the cloud by clicking the following link and using the main.ipynb notebook.

[![Run on FloydHub](https://static.floydhub.com/button/button-small.svg)](https://floydhub.com/run?template=https://github.com/gabrielloye/GRU_Prediction)

This will speed up the training process significantly. Alternatively, the code is available in the [GitHub repository](https://github.com/gabrielloye/GRU_Prediction).

The goal of this implementation is to create a model that can accurately predict the energy usage in the next hour given historical usage data. We will train both the GRU and LSTM models on a set of historical data and evaluate both on an unseen test set. To do so, we'll start with feature selection and data preprocessing, followed by defining, training and finally evaluating the models.

We will be using the PyTorch library to implement both types of models along with other common Python libraries used in data analytics.

In [1]:
#https://blog.floydhub.com/gru-with-pytorch/

import os
import time
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import random
from datetime import datetime

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import torch.nn.functional as F
import torch.optim as optim

from tqdm import tqdm_notebook
from sklearn.preprocessing import MinMaxScaler,QuantileTransformer
from sklearn.model_selection import train_test_split






### Local ###
#from data_processing import *



# Define data root directory

#data_dir = "./data/"
#print(os.listdir(data_dir))


We have a total of **12** *.csv* files containing hourly energy trend data (*'est_hourly.parquet'* and *'pjm_hourly_est.csv'* are not used). In our next step, we will read these files and pre-process the data in this order:
- Getting the time data of each individual time step and generalizing them
    - Hour of the day *i.e. 0-23*
    - Day of the week *i.e. 1-7*
    - Month *i.e. 1-12*
    - Day of the year *i.e. 1-365*
    
    
- Scale the data to values between 0 and 1
    - Algorithms tend to perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed
    - Scaling preserves the shape of the original distribution and doesn't reduce the importance of outliers.
    
    
- Group the data into sequences to be used as inputs to the model and store their corresponding labels
    - The **sequence length** or **lookback period** is the number of data points in history that the model will use to make the prediction
    - The label will be the next data point in time after the last one in the input sequence
    

- The inputs and labels will then be split into training and test sets
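
The steps listed above can be sketched on a small synthetic series. This is a minimal illustration under stated assumptions, not the notebook's actual pipeline: the column name `load_mw`, the synthetic sine-wave values, the helper `make_windows`, and the simple chronological 80/20 split are all made up for the example. The scaling is the usual per-column min-max formula, which is also what `MinMaxScaler` applies with its default range.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for one region's consumption (assumed data)
hours = np.arange(48)
index = pd.Timestamp("2020-01-01") + pd.to_timedelta(hours, unit="h")
df = pd.DataFrame({"load_mw": 1000 + 200 * np.sin(2 * np.pi * hours / 24)}, index=index)

# 1. Generalized time features taken from each timestamp
df["hour"] = df.index.hour
df["dayofweek"] = df.index.dayofweek   # pandas counts Monday as 0
df["month"] = df.index.month
df["dayofyear"] = df.index.dayofyear

# 2. Min-max scale every column to the range [0, 1]
values = df.to_numpy(dtype=float)
col_min, col_max = values.min(axis=0), values.max(axis=0)
span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard constant columns
scaled = (values - col_min) / span

# 3. Group the rows into lookback windows; the label is the next value of the target column
def make_windows(data, lookback, target_col=0):
    n = len(data) - lookback
    inputs = np.stack([data[i:i + lookback] for i in range(n)])
    labels = data[lookback:, target_col]
    return inputs, labels

inputs, labels = make_windows(scaled, lookback=11)

# 4. Split into training and test sets (chronological split, chosen for illustration)
split = int(0.8 * len(inputs))
train_x, test_x = inputs[:split], inputs[split:]
train_y, test_y = labels[:split], labels[split:]

print(inputs.shape, labels.shape)  # → (37, 11, 5) (37,)
```

Each window here has shape `(lookback, features)`, so the stacked array is already in the `(batch, seq_len, features)` layout that a `batch_first=True` recurrent layer reads.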

In [2]:
choppeddata=pd.read_csv('choppeddata_9_21_2021.csv')#.head()

choppedheaders=[]
lookback=11 #save only the last 11 timesteps
for i in range(lookback):  
    label=str(i)
    choppedheaders.append("header"+label)

#put chopped data in np.arrays
State=np.zeros((96,5,11)) #96 runs,with 5 sets of data (x,y,z,roll,pitch) each, and each run is 11 timesteps long
Labels=np.zeros((96,11)) #96 runs, each run is 11 timesteps long
runcounter=0

for i in range(0,575,6):  # each run occupies 6 consecutive rows: 5 state channels, then 1 label row
    for channel in range(5):
        State[runcounter][channel][:]=(choppeddata[choppedheaders].iloc[i+channel]).tolist()
    Labels[runcounter][:]=(choppeddata[choppedheaders].iloc[i+5]).tolist()  #labels
    runcounter+=1
print(State[0])
print(Labels[0])

[[0.27735378 0.26661055 0.40000611 0.25023186 0.27677533 0.41424219
  0.37389517 0.27581724 0.3110282  0.31710912 0.46400543]
 [0.56315052 0.57954961 0.5818278  0.58090968 0.53162291 0.54802549
  0.55059242 0.54762166 0.52668456 0.51825401 0.53976031]
 [0.85986378 0.78037356 0.77334782 0.77890898 0.78463198 0.77979283
  0.93127611 0.93926496 0.93533267 0.94344913 0.93128845]
 [0.46247134 0.44671463 0.45213433 0.44932497 0.48087943 0.48424637
  0.48025341 0.47566825 0.49441604 0.49760116 0.49163918]
 [0.29442925 0.28496873 0.37848349 0.28663437 0.29311135 0.39597582
  0.36943065 0.29519415 0.31469048 0.32070428 0.42058123]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [3]:
#X= range(0,575,6)
#y= range(0,575,6)

X=State
y=Labels
 
random_seed=int(time.time())
#print(int(time.time()))
train_x, test_x, train_y,test_y = train_test_split(X, y, test_size=0.33, random_state=random_seed)
print("Train")
print(train_x[0])
print(train_y[0])
print("Test")
print(test_x[0])
print(test_y[0])


Train
[[0.51183473 0.46222369 0.48206075 0.48050173 0.55946227 0.55813073
  0.60989736 0.58846769 0.6215347  0.51098129 0.55628836]
 [0.69961641 0.7175335  0.75191146 0.72680027 0.73684956 0.78262294
  0.75973344 0.7606505  0.74575952 0.74026182 0.75378531]
 [0.96243408 0.94080944 0.95356181 0.95758873 0.9617879  0.95752641
  0.94652153 0.95101888 0.95271051 0.95383118 0.9358358 ]
 [0.33876607 0.32425633 0.29917555 0.31009013 0.31528458 0.27944411
  0.28687748 0.28835731 0.2971522  0.30142814 0.28823484]
 [0.54240844 0.51242499 0.51466233 0.51784909 0.56372218 0.56955146
  0.61123468 0.60252763 0.63136454 0.56049055 0.57762667]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Test
[[0.45419251 0.46686163 0.45483211 0.46185944 0.51856605 0.52281725
  0.41090235 0.42638899 0.41858621 0.42139499 0.45600691]
 [0.6784647  0.62306744 0.60217101 0.56717729 0.58446159 0.59226002
  0.58623637 0.58682605 0.5680028  0.57527609 0.56401073]
 [0.9483827  0.94431913 0.94400126 0.93889676 0.94132392 0.93755264
  0

In [4]:
print(train_x.shape)  #example was (980185, 90, 5)

(64, 5, 11)


We have a total of 64 training sequences here, each of shape (5, 11); the original energy-consumption example had 980,185.

To improve the speed of our training, we can process the data in batches so that the model does not need to update its weights as frequently. The Torch *Dataset* and *DataLoader* classes are useful for splitting our data into batches and shuffling them.

In [5]:
batch_size = 8
#with a batch size of 8, each element in the dataloader iterable will return a batch of 8 features and labels.

train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size, drop_last=True)

test_data   = TensorDataset( torch.from_numpy( test_x ), torch.from_numpy( test_y ) )
test_loader = DataLoader( test_data, shuffle = True, batch_size = batch_size, drop_last = True )


In [6]:
print(train_loader)

<torch.utils.data.dataloader.DataLoader object at 0x7f5c5d84e0f0>
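
Printing the loader only shows the object's address. To see what it actually yields, one can pull a single batch and look at its shape. The sketch below uses random arrays with the same shapes as `train_x` and `train_y` above; the `demo_*` names and the random data are assumptions made for the example.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Random stand-ins with the same shapes as train_x (64, 5, 11) and train_y (64, 11)
demo_x = np.random.rand(64, 5, 11)
demo_y = np.random.rand(64, 11)

demo_loader = DataLoader(
    TensorDataset(torch.from_numpy(demo_x), torch.from_numpy(demo_y)),
    shuffle=True, batch_size=8, drop_last=True,
)

# Pull one batch of features and labels
x_batch, y_batch = next(iter(demo_loader))
print(len(demo_loader))  # number of batches per epoch
print(x_batch.shape)     # (batch, 5, 11)
print(y_batch.shape)     # (batch, 11)
```

With `batch_first=True`, `nn.GRU` reads an input of shape `(8, 5, 11)` as 8 sequences of 5 steps with 11 features each. If the 11 time steps are meant to be the sequence axis, the arrays would need to be rearranged to `(batch, 11, 5)` first, for example with `x_batch.permute(0, 2, 1)`.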


We can also check whether a GPU is available, which can cut training time many times over. If you're using FloydHub with a GPU to run this code, training will be significantly faster.

In [7]:
# torch.cuda.is_available() checks and returns a Boolean True if a GPU is available, else it'll return False
is_cuda = torch.cuda.is_available()

# If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
if is_cuda:
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
    
print(device)



def get_torch_device( v=0 ):
    # torch.cuda.is_available() checks and returns a Boolean True if a GPU is available, else it'll return False
    is_cuda = torch.cuda.is_available()
    # If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
    if is_cuda:
        device = torch.device("cuda")
        if v:  print( "CUDA Available!" )
    else:
        device = torch.device("cpu")
        if v:  print( "NO CUDA" )
    return device

cuda


In [30]:

class GRUNet( nn.Module ):
    """ Simplest GRU Imaginable, with a dense output layer for classification """
    # NOTE: This is taken directly from Gabriel Loye:  https://blog.floydhub.com/gru-with-pytorch/
    
    def __init__( self, input_dim, hidden_dim, output_dim, n_layers, drop_prob = 0.2 ):
        """ Set up the layers and their parameters """
        print( "GRU with Input:", input_dim, ", Hidden:", hidden_dim, ", Output:", output_dim, "...\n" )
        super().__init__()
        # 0. Get device (the layers are moved onto it after they are created below)
        self.device = get_torch_device()
        # 1. Set params
        self.hidden_dim = hidden_dim
        self.n_layers   = n_layers
        # 2. Create layers
        self.gru  = nn.GRU( input_dim, hidden_dim, n_layers, batch_first = True, dropout = drop_prob )
        self.fc   = nn.Linear( hidden_dim, output_dim )
        self.sfmx = nn.Softmax( dim=1 ) # https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html
        self.gru.to( self.device )
        self.fc.to( self.device )
        self.sfmx.to( self.device )
        self.h_np = None
        self.h_pt = None
        
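    def forward( self, x, h = None ):
        """ Return the prediction and the new hidden state for a batch of input sequences """
        
        if type( x ) != torch.Tensor:
            x = torch.tensor( 
                np.array([x]) 
            )
            
        x = x.float().to( self.device )
        # Use the hidden state passed by the caller, else the one stored by `init_hidden`
        if h is None:
            h = self.h_pt
        if h is not None:
            h = h.to( self.device ) # `.to` returns a new tensor, so keep the result
        
        # NOTE: No `torch.no_grad()` here, otherwise `loss.backward()` has nothing to differentiate
        out, self.h_pt = self.gru( 
            x, 
            h 
        )
        out = self.fc( self.sfmx( out[:,-1] ) )
        return out, self.h_pt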
    
    def init_hidden( self, batch_size = 1 ):
        """ Reset the stored hidden state `self.h_pt` to zeros to support a new training session """
        weight = next( self.parameters() ).data
        self.h_pt = weight.new( self.n_layers, batch_size, self.hidden_dim ).zero_().to( self.device )
    
    def init_hddn_half( self, batch_size = 1 ):
        """ Reset the stored hidden state `self.h_pt` to 0.5 to support a new classification session """
        weight = next( self.parameters() ).data
        self.h_pt = weight.new( self.n_layers, batch_size, self.hidden_dim ).fill_( 0.5 ).to( self.device )
        
    def get_hidden_as_np( self ):
        """ Get the current hidden state as a numpy array """
        return self.h_pt.to("cpu").detach().numpy()
    
    def get_hidden_as_pt( self ):
        """ Get the current hidden state as a detached PyTorch tensor on the CPU """
        return self.h_pt.to("cpu").detach()
    
    
    
##### GRU Training and Evaluation ##############
    
    
def train( train_loader, learn_rate, batch_size=8, 
           hidden_dim = 256, output_dim = 11 , n_layers = 2,  EPOCHS = 5, alpha = 0.85 ):
    """ Train the model for the specified number of epochs """
    print( "Training...\n" )
    
    device = get_torch_device()
    
    # Setting common hyperparameters
    input_dim  = next( iter(train_loader) )[0].shape[2]
    print("input_dim",input_dim)
    
    # Instantiating the models
    model = GRUNet( input_dim, hidden_dim, output_dim, n_layers )
    model.to( device )
    
    # Defining loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam( model.parameters(), lr = learn_rate )
    model.train() # Set model to training mode
    
    print( "Starting Training of GRU model ..." )
    epoch_times = []
    
    _SMOOTH = 0
    
    # 1. For each epoch
    for epoch in range( 1, EPOCHS+1 ):
        
        # 2. Reset hidden state
        #print("batch_size",batch_size)
        h = model.init_hidden( batch_size )
        print("h",h)
        # 3. Init bookkeeping
        start_time = time.monotonic()
        avg_loss   = 0.0
        counter    = 0
        
        # 4. For each example
        for x, label in train_loader:
            counter += 1
            
            # 5. Reset gradient
            model.zero_grad()
            
            if _SMOOTH:
                hp = h.data
                out, hn = model( x.to(device).float(), hp ) # `train` is a plain function, so there is no `self`
                h       = alpha*hn  + (1-alpha)*hp
            else:
                h = h.data
                # 6. Get predict and new hidden state
                out, h = model( x.to(device).float(), h )
            
            # 7. Calc loss and run opt update
            loss = criterion( out, label.to(device).float() )
            loss.backward()
            optimizer.step()
            avg_loss += loss.item()
            
            # Report Progress
            if counter%500 == 0:
                print( "Epoch {}......Step: {}/{}....... Average Loss for Epoch: {}".format(
                    epoch, counter, len(train_loader), avg_loss/counter
                ))
        
        # Report Epoch
        current_time = time.monotonic()
        print("Epoch {}/{} Done, Total Loss: {}".format(epoch, EPOCHS, avg_loss/len(train_loader)))
        print("Time Elapsed for Epoch: {} seconds".format(str(current_time-start_time)))
        epoch_times.append( current_time - start_time )
    
    # Final report and return trained model
    print( "Total Training Time: {} seconds".format( str( sum( epoch_times ) ) ) )
    return model


# def evaluate( model, test_x, test_y, label_scalers ):
def evaluate( model, test_x, test_y ):
    """ Evaluate a trained model """
    print( "Evaluating...\n" )
    
    device = get_torch_device()
    
    # Set model for evaluation mode
    model.eval()
   
    # Init Bookkeeping
    outputs    = []
    targets    = []
    start_time = time.monotonic()
    
    # 1. For each example in the test set
    for i in range( len( test_x ) ):
        
        # 2. Fetch test sample
        inp  = torch.from_numpy( np.array( test_x[i] ) )
        labs = torch.from_numpy( np.array( test_y[i] ) )
        
        # 3. Fetch hidden state and predict
        h      = model.init_hidden( inp.shape[0] )
        out, h = model( inp.to(device).float(), h )
        
        # 4. Package the target and prediction in comparable formats
#         outputs.append( label_scalers[i].inverse_transform( out.cpu().detach().numpy()).reshape(-1) )
#         targets.append( label_scalers[i].inverse_transform( labs.numpy()).reshape(-1) )
        outputs.append( out.cpu().detach().numpy().reshape(-1) )
        targets.append( labs.numpy().reshape(-1) )
        
    # Report on evaluation results
    print( "Evaluation Time: {}".format( str( time.monotonic() - start_time ) ) )
    sMAPE = 0
    for i in range( len( outputs ) ):
        # Divide by the mean of prediction and target, (|F + A|)/2, as in the sMAPE formula
        sMAPE += np.mean( np.abs( outputs[i]-targets[i] ) / ( np.abs( targets[i]+outputs[i] )/2 ) ) / len( outputs )
    print( "sMAPE: {}%".format( sMAPE*100 ) )
    return outputs, targets, sMAPE

In [31]:
lr = 0.001
gru_model = train(train_loader, lr, batch_size)  # third argument is batch_size; it must match the DataLoader's batch size


Training...

input_dim 11
GRU with Input: 11 , Hidden: 256 , Output: 11 ...

Starting Training of GRU model ...
h None


AttributeError: 'NoneType' object has no attribute 'data'
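
The traceback follows from how `init_hidden` is written in the class above: it stores the new state on `self.h_pt` and has no `return` statement, so `h = model.init_hidden( batch_size )` binds `h` to `None` (which is what the `h None` line in the output shows), and `h.data` then fails. One possible repair, sketched here on a stripped-down stand-in class (`TinyGRU` and its sizes are hypothetical, not the notebook's model), is to return the state as well as store it.

```python
import torch
import torch.nn as nn

class TinyGRU(nn.Module):
    """Stand-in model used only to illustrate the hidden-state handling."""

    def __init__(self, input_dim=11, hidden_dim=16, n_layers=2):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True)
        self.h_pt = None

    def init_hidden(self, batch_size=1):
        """Reset the hidden state to zeros, store it, and also return it."""
        weight = next(self.parameters())
        # new_zeros creates the tensor on the same device and dtype as the weights
        self.h_pt = weight.new_zeros((self.n_layers, batch_size, self.hidden_dim))
        return self.h_pt

    def forward(self, x, h):
        out, h = self.gru(x, h)
        return out[:, -1], h  # output at the last time step, plus the new hidden state

model = TinyGRU()
h = model.init_hidden(batch_size=8)   # now a tensor rather than None
out, h = model(torch.rand(8, 5, 11), h)
print(h.shape)
```

An alternative with the same effect would be to keep `init_hidden` as it is and read `model.h_pt` in the training loop instead of using the return value.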

# Old Stuff below------------------------------------------------

In [55]:
#device = torch.device("cpu")

Next, we'll be defining the structure of the GRU and LSTM models. Both models have the same structure, with the only difference being the **recurrent layer** (GRU/LSTM) and the initializing of the hidden state. The hidden state for the LSTM is a tuple containing both the **cell state** and the **hidden state**, whereas the GRU only has a single hidden state.

In [8]:
class GRUNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(GRUNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        
        self.gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
        
    def forward(self, x, h):
        out, h = self.gru(x, h)
        out = self.fc(self.relu(out[:,-1]))
        return out, h
    
    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device)
        return hidden
    

class LSTMNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(LSTMNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        
        self.lstm = nn.LSTM(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
        
    def forward(self, x, h):
        out, h = self.lstm(x, h)
        out = self.fc(self.relu(out[:,-1]))
        return out, h
    
    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device))
        return hidden
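
The difference in hidden state between the two layer types can be checked directly on PyTorch's `nn.GRU` and `nn.LSTM`. The sizes below are arbitrary values chosen for this sketch, not the notebook's hyperparameters.

```python
import torch
import torch.nn as nn

x = torch.rand(8, 5, 11)  # (batch, seq_len, features)

gru = nn.GRU(11, 16, 2, batch_first=True)
lstm = nn.LSTM(11, 16, 2, batch_first=True)

# When no initial state is passed, both layers start from zeros
gru_out, gru_h = gru(x)                 # GRU: a single hidden-state tensor
lstm_out, (lstm_h, lstm_c) = lstm(x)    # LSTM: a (hidden state, cell state) tuple

print(gru_h.shape)                 # (n_layers, batch, hidden_dim)
print(lstm_h.shape, lstm_c.shape)  # both (n_layers, batch, hidden_dim)
```

This is why `LSTMNet.init_hidden` returns a pair of zero tensors while `GRUNet.init_hidden` returns just one.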


The training process is defined in a function below so that we can reuse it for both models. Both models will have the same number of **dimensions** in the *hidden state* and the same number of *layers*, will be trained for the same number of **epochs** with the same **learning rate**, and will be trained and tested on exactly the same data.

To compare the performance of both models, we'll track the time each takes to train and then compare their final accuracy on the test set. For our accuracy measure, we'll use *Symmetric Mean Absolute Percentage Error (sMAPE)*. *sMAPE* is the mean, over all predictions, of the **absolute difference** between the predicted and actual values divided by the average of the predicted and actual values, which gives a percentage measuring the amount of error.

This is the formula for *sMAPE*:

$sMAPE = \frac{100\%}{n} \sum_{t=1}^n \frac{|F_t - A_t|}{(|F_t + A_t|)/2}$
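
As a quick check of the formula above, here is a small NumPy sketch. The function name `smape` and the sample numbers are assumptions made for the example.

```python
import numpy as np

def smape(forecast, actual):
    """Symmetric MAPE in percent, following the formula above."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    denom = np.abs(forecast + actual) / 2.0   # average of forecast and actual
    return 100.0 * np.mean(np.abs(forecast - actual) / denom)

# First point: |1 - 1| / 1 = 0.  Second point: |3 - 1| / ((3 + 1) / 2) = 1.
print(smape([1.0, 3.0], [1.0, 1.0]))  # → 50.0
```

Note that the ratio is undefined wherever the forecast and the actual value sum to zero, so in practice such points need to be masked or handled separately.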

In [14]:
def train(train_loader, learn_rate, hidden_dim=256, EPOCHS=5, model_type="GRU"):
    
    # Setting common hyperparameters
    input_dim = next(iter(train_loader))[0].shape[2]
    output_dim = 1
    n_layers = 2
    # Instantiating the models
    if model_type == "GRU":
        model = GRUNet(input_dim, hidden_dim, output_dim, n_layers)
    else:
        model = LSTMNet(input_dim, hidden_dim, output_dim, n_layers)
    model.to(device)
    
    # Defining loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)
    
    model.train()
    print("Starting Training of {} model".format(model_type))
    epoch_times = []
    # Start training loop
    for epoch in range(1,EPOCHS+1):
        start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
        h = model.init_hidden(batch_size)
        avg_loss = 0.
        counter = 0
        for x, label in train_loader:
            counter += 1
            # Detach the hidden state so gradients do not flow back through earlier batches
            if model_type == "GRU":
                h = h.data
            else:
                h = tuple([e.data for e in h])
            model.zero_grad()

            out, h = model(x.to(device).float(), h)
            loss = criterion(out, label.to(device).float())
            loss.backward()
            optimizer.step()
            avg_loss += loss.item()
            if counter%200 == 0:
                print("Epoch {}......Step: {}/{}....... Average Loss for Epoch: {}".format(epoch, counter, len(train_loader), avg_loss/counter))
        current_time = time.perf_counter()
        print("Epoch {}/{} Done, Total Loss: {}".format(epoch, EPOCHS, avg_loss/len(train_loader)))
        print("Time Elapsed for Epoch: {} seconds".format(str(current_time-start_time)))
        epoch_times.append(current_time-start_time)
    print("Total Training Time: {} seconds".format(str(sum(epoch_times))))
    return model

def evaluate(model, test_x, test_y):#, label_scalers):
    model.eval()
    outputs = []
    targets = []
    start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
    for i in range( len( test_x )): #in test_x.keys():

        inp = torch.from_numpy(np.array(test_x[i]))
        labs = torch.from_numpy(np.array(test_y[i]))
        h = model.init_hidden(inp.shape[0])
        out, h = model(inp.to(device).float(), h)
        #outputs.append(label_scalers[i].inverse_transform(out.cpu().detach().numpy()).reshape(-1))
        #targets.append(label_scalers[i].inverse_transform(labs.numpy()).reshape(-1))
        outputs.append( out.cpu().detach().numpy().reshape(-1) )
        targets.append( labs.numpy().reshape(-1) )
    print("Evaluation Time: {}".format(str(time.perf_counter()-start_time)))
    sMAPE = 0
    for i in range(len(outputs)):
        # Divide by the mean of prediction and target, (|F + A|)/2, as in the sMAPE formula
        sMAPE += np.mean(np.abs(outputs[i]-targets[i])/(np.abs(targets[i]+outputs[i])/2))/len(outputs)
    print("sMAPE: {}%".format(sMAPE*100))
    return outputs, targets, sMAPE

In [10]:
lr = 0.001
gru_model = train(train_loader, lr, model_type="GRU")

Starting Training of GRU model
Epoch 1/5 Done, Total Loss: 0.05266908719204366
Time Elapsed for Epoch: 0.029360000000000497 seconds
Epoch 2/5 Done, Total Loss: 0.052190449787303805
Time Elapsed for Epoch: 0.019283999999999857 seconds
Epoch 3/5 Done, Total Loss: 0.051224210765212774
Time Elapsed for Epoch: 0.017984000000000222 seconds
Epoch 4/5 Done, Total Loss: 0.05127128306776285
Time Elapsed for Epoch: 0.017778000000000738 seconds
Epoch 5/5 Done, Total Loss: 0.05128198768943548
Time Elapsed for Epoch: 0.020160999999999873 seconds
Total Training Time: 0.10456700000000119 seconds




In [19]:
lstm_model = train(train_loader, lr, model_type="LSTM")

Starting Training of LSTM model


KeyboardInterrupt: 

In the original tutorial's run on the energy dataset, the GRU model was the clear winner in terms of speed, finishing 5 training epochs 72 seconds faster than the LSTM model. That timing is not reproduced here, since the LSTM run above was interrupted.

Moving on to measuring the accuracy of both models, we’ll now use our evaluate() function and test dataset.

In [17]:
def evaluate( model, test_x, test_y ):
    """ Evaluate a trained model """
    print( "Evaluating...\n" )
    
    device = get_torch_device()
    
    # Set model for evaluation mode
    model.eval()
   
    # Init Bookkeeping
    outputs    = []
    targets    = []
    start_time = time.monotonic()
    
    # 1. For each example in the test set
    for i in range( len( test_x ) ):
        
        # 2. Fetch test sample
        inp  = torch.from_numpy( np.array( test_x[i] ) )
        labs = torch.from_numpy( np.array( test_y[i] ) )
        
        # 3. Fetch hidden state and predict
        h      = model.init_hidden( inp.shape[0] )
        out, h = model( inp.to(device).float(), h )
        
        # 4. Package the target and prediction in comparable formats
#         outputs.append( label_scalers[i].inverse_transform( out.cpu().detach().numpy()).reshape(-1) )
#         targets.append( label_scalers[i].inverse_transform( labs.numpy()).reshape(-1) )
        outputs.append( out.cpu().detach().numpy().reshape(-1) )
        targets.append( labs.numpy().reshape(-1) )
        
    # Report on evaluation results
    print( "Evaluation Time: {}".format( str( time.monotonic() - start_time ) ) )
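    sMAPE = 0
    for i in range( len( outputs ) ):
        # Divide by the mean of prediction and target, (|F + A|)/2, as in the sMAPE formula
        sMAPE += np.mean( np.abs( outputs[i]-targets[i] ) / ( np.abs( targets[i]+outputs[i] )/2 ) ) / len( outputs )
    print( "sMAPE: {}%".format( sMAPE*100 ) )
    return outputs, targets, sMAPE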

In [19]:
gru_outputs, targets, gru_sMAPE = evaluate(gru_model, test_x, test_y)#, label_scalers)


Evaluating...



RuntimeError: input must have 3 dimensions, got 2
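
The error indicates that the GRU layer received a 2-D tensor where it expected a batched 3-D one of shape `(batch, seq_len, features)`. Indexing `test_x[i]` yields a single sample of shape `(5, 11)`, with no batch axis. A minimal sketch of one way around this, assuming a stand-alone `nn.GRU` layer with arbitrary sizes rather than the notebook's trained model, is to add a batch axis of size 1 and size the hidden state to match.

```python
import numpy as np
import torch
import torch.nn as nn

gru = nn.GRU(11, 16, 2, batch_first=True)   # arbitrary sizes for the sketch
sample = np.random.rand(5, 11)              # one test sample, like test_x[i]

# Add a leading batch axis: (5, 11) -> (1, 5, 11)
inp = torch.from_numpy(sample).float().unsqueeze(0)

# The hidden state needs shape (n_layers, batch, hidden_dim), with batch = 1 here
h0 = torch.zeros(2, inp.shape[0], 16)
out, h = gru(inp, h0)
print(inp.shape, out.shape)
```

Another option is to pass the whole `test_x` array at once, since it is already 3-D, and initialize the hidden state with `len(test_x)` as the batch size.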

In [18]:
lstm_outputs, targets, lstm_sMAPE = evaluate(lstm_model, test_x, test_y)#, label_scalers)

RuntimeError: Input and parameter tensors are not at the same device, found input tensor at cpu and parameter tensor at cuda:0
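
This error means the input tensor was still on the CPU while the model's parameters were on the GPU. One way to avoid the mismatch, sketched below on a stand-alone `nn.GRU` layer with arbitrary sizes (not the notebook's model), is to read the device from the model's own parameters and move every input and hidden state there before the forward pass.

```python
import torch
import torch.nn as nn

# Use the GPU when there is one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
gru = nn.GRU(11, 16, 2, batch_first=True).to(device)

# Ask the model where its parameters live and put the tensors in the same place
model_device = next(gru.parameters()).device
x = torch.rand(1, 5, 11).to(model_device)
h0 = torch.zeros(2, 1, 16, device=model_device)

out, h = gru(x, h0)
print(out.device == model_device)
```

Reading the device from the model avoids relying on a global `device` variable that may have been reassigned in another cell.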

While the LSTM model may have made smaller errors and edged out the GRU model slightly in accuracy, the difference is too small to be conclusive. Many other comparisons of the two architectures have been published, but none has produced a clear overall winner.