In this notebook, we'll be using a GRU model for a time series prediction task and we will compare the performance of the GRU model against an LSTM model as well. The dataset that we will be using is the Hourly Energy Consumption dataset which can be found on [Kaggle](https://www.kaggle.com/robikscube/hourly-energy-consumption). The dataset contains power consumption data across different regions around the United States recorded on an hourly basis.

You can run the code implementation in this article on FloydHub using their GPUs on the cloud by clicking the following link and using the main.ipynb notebook.

[![Run on FloydHub](https://static.floydhub.com/button/button-small.svg)](https://floydhub.com/run?template=https://github.com/gabrielloye/https://github.com/gabrielloye/GRU_Prediction)

This will speed up the training process significantly. Alternatively, the link to the GitHub repository can be found [here]().

The goal of this implementation is to create a model that can accurately predict the energy usage in the next hour given historical usage data. We will be using both the GRU and LSTM model to train on a set of historical data and evaluate both models on an unseen test set. To do so, we’ll start with feature selection, data-preprocessing, followed by defining, training and eventually evaluating the models.

We will be using the PyTorch library to implement both types of models along with other common Python libraries used in data analytics.

In [1]:
#https://www.python-engineer.com/posts/pytorch-rnn-lstm-gru/

#https://blog.floydhub.com/gru-with-pytorch/

import os
import time
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import random
from datetime import datetime

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import torch.nn.functional as F
import torch.optim as optim

from tqdm import tqdm_notebook
from sklearn.preprocessing import MinMaxScaler,QuantileTransformer
from sklearn.model_selection import train_test_split






### Local ###
#from data_processing import *



# Define data root directory

#data_dir = "./data/"
#print(os.listdir(data_dir))

In /home/scott/.local/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test.mplstyle: 
The text.latex.preview rcparam was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
In /home/scott/.local/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test.mplstyle: 
The mathtext.fallback_to_cm rcparam was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
In /home/scott/.local/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test.mplstyle: Support for setting the 'mathtext.fallback_to_cm' rcParam is deprecated since 3.3 and will be removed two minor releases later; use 'mathtext.fallback : 'cm' instead.
In /home/scott/.local/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test.mplstyle: 
The validate_bool_maybe_none function was deprecated in Matplotlib 3.3 and will be removed two minor releases later.
In /home/scott/.local/lib/python3.6/site-packages/matplotlib/mpl-data/stylel

We have a total of **12** *.csv* files containing hourly energy trend data (*'est_hourly.paruqet'* and *'pjm_hourly_est.csv'* are not used). In our next step, we will be reading these files and pre-processing these data in this order:
- Getting the time data of each individual time step and generalizing them
    - Hour of the day *i.e. 0-23*
    - Day of the week *i.e. 1-7*
    - Month *i.e. 1-12*
    - Day of the year *i.e. 1-365*
    
    
- Scale the data to values between 0 and 1
    - Algorithms tend to perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed
    - Scaling preserves the shape of the original distribution and doesn't reduce the importance of outliers.
    
    
- Group the data into sequences to be used as inputs to the model and store their corresponding labels
    - The **sequence length** or **lookback period** is the number of data points in history that the model will use to make the prediction
    - The label will be the next data point in time after the last one in the input sequence
    

- The inputs and labels will then be split into training and test sets

In [2]:
choppeddata=pd.read_csv('choppeddata_9_21_2021.csv')#.head()

choppedheaders=[]
lookback=11 #save only the last 11 timesteps
for i in range(lookback):  
    label=str(i)
    choppedheaders.append("header"+label)

#put chopped data in np.arrays
State=np.zeros((96,5,11)) #96 runs,with 5 sets of data (x,y,z,roll,pitch) each, and each run is 11 timesteps long
Labels=np.zeros((96,11)) #96 runs, each run is 11 timesteps long
runcounter=0

for i in range(0,575,6):
            State[runcounter][0][:]=(choppeddata[choppedheaders[:]].iloc[i]).tolist()
            State[runcounter][1][:]=(choppeddata[choppedheaders[:]].iloc[i+1]).tolist()
            State[runcounter][2][:]=(choppeddata[choppedheaders[:]].iloc[i+2]).tolist()
            State[runcounter][3][:]=(choppeddata[choppedheaders[:]].iloc[i+3]).tolist()
            State[runcounter][4][:]=(choppeddata[choppedheaders[:]].iloc[i+4]).tolist()
            Labels[runcounter][:]=(choppeddata[choppedheaders[:]].iloc[i+5]).tolist()  #labels   
            runcounter+=1
print(State[0])
print(Labels[0])

[[0.27735378 0.26661055 0.40000611 0.25023186 0.27677533 0.41424219
  0.37389517 0.27581724 0.3110282  0.31710912 0.46400543]
 [0.56315052 0.57954961 0.5818278  0.58090968 0.53162291 0.54802549
  0.55059242 0.54762166 0.52668456 0.51825401 0.53976031]
 [0.85986378 0.78037356 0.77334782 0.77890898 0.78463198 0.77979283
  0.93127611 0.93926496 0.93533267 0.94344913 0.93128845]
 [0.46247134 0.44671463 0.45213433 0.44932497 0.48087943 0.48424637
  0.48025341 0.47566825 0.49441604 0.49760116 0.49163918]
 [0.29442925 0.28496873 0.37848349 0.28663437 0.29311135 0.39597582
  0.36943065 0.29519415 0.31469048 0.32070428 0.42058123]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [3]:
#X= range(0,575,6)
#y= range(0,575,6)

X=State
y=Labels
 
random_seed=int(time.time())
#print(int(time.time()))
train_x, test_x, train_y,test_y = train_test_split(X, y, test_size=0.33, random_state=random_seed)
print("Train")
print(train_x[0])
print(train_y[0])
print("Test")
print(test_x[0])
print(test_y[0])


Train
[[0.40539547 0.31443459 0.341721   0.4050648  0.40182568 0.40766046
  0.399578   0.40480583 0.39891334 0.40167018 0.39993624]
 [0.75253834 0.74760889 0.69387439 0.67836455 0.68019756 0.68148563
  0.69561342 0.6366178  0.65612821 0.6504404  0.66142242]
 [0.80009078 0.81216116 0.82171327 0.95050878 0.943081   0.94995447
  0.94334595 0.94540174 0.94667555 0.94469142 0.87575717]
 [0.28119582 0.27539747 0.31247541 0.33037702 0.328287   0.32922454
  0.31871721 0.3620868  0.35313335 0.35808517 0.34557225]
 [0.44886968 0.38029385 0.39406757 0.4394908  0.43427474 0.4361103
  0.43719741 0.43797787 0.43632384 0.4372016  0.43181407]]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
Test
[[0.54781388 0.52467338 0.48279334 0.48698063 0.48984306 0.4812084
  0.43468989 0.45365582 0.34357181 0.33510071 0.36711351]
 [0.61362526 0.60508136 0.6198988  0.62654126 0.62657678 0.62042025
  0.62153308 0.62722235 0.63083059 0.61136368 0.6168552 ]
 [0.94571654 0.94512104 0.9449886  0.9473057  0.94689729 0.94814949
  0.9

In [4]:
print(train_x.shape)  #example was (980185, 90, 5)

(64, 5, 11)


We have a total of 980,185 sequences of training data

To improve the speed of our training, we can process the data in batches so that the model does not need to update its weights as frequently. The Torch *Dataset* and *DataLoader* classes are useful for splitting our data into batches and shuffling them.

In [5]:
batch_size = 8
#a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size, drop_last=True)

test_data   = TensorDataset( torch.from_numpy( test_x ), torch.from_numpy( test_y ) )
test_loader = DataLoader( test_data, shuffle = True, batch_size = batch_size, drop_last = True )


In [6]:
print(train_loader)

<torch.utils.data.dataloader.DataLoader object at 0x7f3e30d76470>


We can also check if we have any GPUs to speed up our training time by many folds. If you’re using FloydHub with GPU to run this code, the training time will be significantly reduced.

In [7]:
# torch.cuda.is_available() checks and returns a Boolean True if a GPU is available, else it'll return False
is_cuda = torch.cuda.is_available()

# If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
if is_cuda:
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
    
print(device)


def get_torch_device( v=0 ):
    # torch.cuda.is_available() checks and returns a Boolean True if a GPU is available, else it'll return False
    is_cuda = torch.cuda.is_available()
    # If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
    if is_cuda:
        device = torch.device("cuda")
        if v:  print( "CUDA Available!" )
    else:
        device = torch.device("cpu")
        if v:  print( "NO CUDA" )
    return device

cpu


In [9]:
lr = 0.001
gru_model = train(train_loader, lr,batch_size=8)

Training...

GRU with Input: 11 , Hidden: 256 , Output: 2 ...

Starting Training of GRU model ...


AttributeError: 'NoneType' object has no attribute 'data'

In [77]:
class GRUNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(GRUNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        
        self.gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
        
    def forward(self, x, h):
        out, h = self.gru(x, h)
        out = self.fc(self.relu(out[:,-1]))
        return out, h
    
    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device)
        return hidden
def train(train_loader, learn_rate, hidden_dim=64, EPOCHS=5, model_type="GRU"):
    
    # Setting common hyperparameters
    input_dim = next(iter(train_loader))[0].shape[2]  #  = 11
    #print("input_dim",input_dim)
    output_dim = 11
    n_layers = 2
    # Instantiating the models
    if model_type == "GRU":
        model = GRUNet(input_dim, hidden_dim, output_dim, n_layers)
    else:
        model = LSTMNet(input_dim, hidden_dim, output_dim, n_layers)
    model.to(device)
    
    # Defining loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)
    
    model.train()
    print("Starting Training of {} model".format(model_type))
    epoch_times = []
    # Start training loop
    for epoch in range(1,EPOCHS+1):
        start_time = time.clock()
        h = model.init_hidden(batch_size)
        avg_loss = 0.
        counter = 0
        for x, label in train_loader:
            counter += 1
            if model_type == "GRU":
                h = h.data
            else:
                h = tuple([e.data for e in h])
            model.zero_grad()
            
            out, h = model(x.to(device).float(), h)
            loss = criterion(out, label.to(device).float())
            loss.backward()
            optimizer.step()
            avg_loss += loss.item()
            if counter%200 == 0:
                print("Epoch {}......Step: {}/{}....... Average Loss for Epoch: {}".format(epoch, counter, len(train_loader), avg_loss/counter))
        current_time = time.clock()
        print("Epoch {}/{} Done, Total Loss: {}".format(epoch, EPOCHS, avg_loss/len(train_loader)))
        print("Total Time Elapsed: {} seconds".format(str(current_time-start_time)))
        epoch_times.append(current_time-start_time)
    print("Total Training Time: {} seconds".format(str(sum(epoch_times))))
    return model

def evaluate(model, test_x, test_y):
    model.eval()
    outputs = []
    targets = []
    start_time = time.clock()
    #for i in test_x.keys():
    for i in range( len( test_x ) ):    
        inp = torch.from_numpy(np.array(test_x)[i]) # should be 5x1
        labs = torch.from_numpy(np.array(test_y[i])) #should be 1x1
        h = model.init_hidden(inp.shape[0])
        #print("inp",inp)
        #print("labs",labs)
        #print("h",h)
        out, h = model(inp.to(device).float(), h)
        #outputs.append(label_scalers[i].inverse_transform(out.cpu().detach().numpy()).reshape(-1))
        #targets.append(label_scalers[i].inverse_transform(labs.numpy()).reshape(-1))
        outputs.append( out.cpu().detach().numpy().reshape(-1) )
        targets.append( labs.numpy().reshape(-1) )
        
    print("Evaluation Time: {}".format(str(time.clock()-start_time)))
    sMAPE = 0
    for i in range(len(outputs)):
        sMAPE += np.mean(abs(outputs[i]-targets[i])/(targets[i]+outputs[i])/2)/len(outputs)
    print("sMAPE: {}%".format(sMAPE*100))
    return outputs, targets, sMAPE

In [78]:
lr = 0.001
gru_model = train(train_loader, lr, model_type="GRU")

Starting Training of GRU model
Epoch 1/5 Done, Total Loss: 0.03722783434204757
Total Time Elapsed: 0.3100810000000003 seconds
Epoch 2/5 Done, Total Loss: 0.024139864835888147
Total Time Elapsed: 0.25616999999999734 seconds
Epoch 3/5 Done, Total Loss: 0.020770379924215376
Total Time Elapsed: 0.2584789999999977 seconds
Epoch 4/5 Done, Total Loss: 0.022062607808038592
Total Time Elapsed: 0.25701799999999864 seconds
Epoch 5/5 Done, Total Loss: 0.020740614039823413
Total Time Elapsed: 0.2526989999999998 seconds
Total Training Time: 1.3344469999999937 seconds


In [72]:
model=gru_model
i=1
inp = torch.from_numpy(np.array(test_x))
labs = torch.from_numpy(np.array(test_y))
#h = model.init_hidden(inp.shape[0])
h = model.init_hidden(inp.shape[0])
#print("inp",inp)
#print("INP SHAPE",inp.shape)
#print("INP SHAPE[0]",inp.shape[0])
#print("labs",labs)
#print("h",h)
#print("h.shape",h.shape)
#print(inp.to(device).float())
#print(inp.to(device).float().shape)


out, h = model(inp.to(device).float(), h)
print(out)

tensor([[-1.4086e-02, -6.0219e-03,  4.1642e-02, -1.6095e-02,  5.6613e-03,
         -2.0674e-03, -3.2131e-03, -1.8629e-02, -6.0395e-03,  1.0921e-02,
          5.4747e-01],
        [-1.3201e-02, -4.8687e-03,  3.7036e-02, -1.6419e-02,  3.7723e-03,
         -6.4013e-03, -2.5587e-03, -1.8187e-02, -4.7280e-03,  1.4584e-02,
          5.6907e-01],
        [-1.4211e-02, -6.0222e-03,  4.1011e-02, -1.5537e-02,  4.2788e-03,
         -3.9017e-03, -3.4968e-03, -1.7982e-02, -7.1083e-03,  1.2374e-02,
          5.4199e-01],
        [-1.3055e-02, -4.4857e-03,  3.1844e-02, -2.1391e-02,  2.7604e-03,
         -7.7797e-03, -3.1675e-03, -2.1791e-02, -3.5933e-03,  1.8636e-02,
          5.9519e-01],
        [-1.3453e-02, -4.6706e-03,  3.6728e-02, -1.6767e-02,  3.6640e-03,
         -5.9912e-03, -2.5351e-03, -1.8593e-02, -5.0201e-03,  1.4725e-02,
          5.6738e-01],
        [-1.3607e-02, -6.1279e-03,  4.2681e-02, -1.2811e-02,  4.5003e-03,
         -4.6837e-03, -2.7325e-03, -1.6155e-02, -7.0204e-03,  1.1456e-0

In [76]:
gru_outputs, targets, gru_sMAPE = evaluate(gru_model, test_x, test_y)

Evaluation Time: 0.03812200000000132
sMAPE: 2.996084209654857%
