# 3. Sequences & Recurrent Neural Networks

Build and train recurrent neural networks (RNNs) for processing sequential data such as time series, text, or audio. You will learn about the two most popular recurrent architectures, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, as well as how to prepare sequential data for model training. You will practice your skills by training and evaluating a recurrent model for predicting electricity consumption.

In [53]:
import pandas as pd
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import torchmetrics

### Generating sequences

To train neural networks on sequential data, you need to pre-process it first. You'll chunk the data into input-target pairs, where the inputs are some number of consecutive data points and the target is the data point that immediately follows them.

Your task is to define a function called `create_sequences()` that does this. As inputs, it receives the data stored in a DataFrame, `df`, and `seq_length`, the length of each input sequence. As outputs, it should return two NumPy arrays: one with the input sequences and one with the corresponding targets.

In [54]:
train_data = pd.read_csv("datasets/electricity_consump/electricity_train.csv")
train_data.head()

| Unnamed: 0 | timestamp | consumption |
|---|---|---|
| 0 | 2011-01-01 00:15:00 | -0.704319 |
| 1 | 2011-01-01 00:30:00 | -0.704319 |
| 2 | 2011-01-01 00:45:00 | -0.678983 |
| 3 | 2011-01-01 01:00:00 | -0.653647 |
| 4 | 2011-01-01 01:15:00 | -0.704319 |


Instructions:

- Iterate over the range of the number of data points minus the length of an input sequence.
- Define the inputs `x` as the slice of `df` from row `i` up to (but not including) row `i + seq_length`, taking the column at index `1`.
- Define the target `y` as the slice of `df` at row index `i + seq_length` and the column at index `1`.

In [55]:
def create_sequences(df: pd.DataFrame, seq_length: int):
    xs, ys = [], []
    # Iterate over data indices
    for i in range(len(df) - seq_length):
        # Define inputs
        x = df.iloc[i : (i + seq_length), 1]
        # Define target
        y = df.iloc[i + seq_length, 1]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
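
For long series, a Python loop like the one above can be slow. The following is a minimal sketch of a vectorized alternative, assuming NumPy 1.20 or later for `sliding_window_view`. The function name `create_sequences_vectorized` and the toy DataFrame are illustrative assumptions, not part of the exercise.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def create_sequences_vectorized(df: pd.DataFrame, seq_length: int):
    # Take the column at index 1, as in the loop-based version
    values = df.iloc[:, 1].to_numpy()
    # Each window holds seq_length inputs followed by one target
    windows = sliding_window_view(values, seq_length + 1)
    xs = windows[:, :-1].copy()  # copy: the windows are read-only views
    ys = windows[:, -1].copy()
    return xs, ys


# Hypothetical toy data: six observations, inputs of length three
toy = pd.DataFrame(
    {"timestamp": range(6), "consumption": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}
)
xs, ys = create_sequences_vectorized(toy, 3)
print(xs.shape, ys.shape)  # (3, 3) (3,)
print(xs[0], ys[0])        # [0. 1. 2.] 3.0
```

With six data points and a sequence length of three, both versions yield three input-target pairs, matching `len(df) - seq_length`.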

### Sequential Dataset

Just like tabular and image data, sequential data is most easily passed to a model through a torch Dataset and DataLoader. To build a sequential Dataset, you will call `create_sequences()` to get the NumPy arrays with inputs and targets, and inspect their shapes. Next, you will pass them to a `TensorDataset` to create a proper torch Dataset, and inspect its length.

Instructions:

- Call `create_sequences()`, passing it the training DataFrame and a sequence length of `24 * 4` (one day of 15-minute readings), assigning the result to `X_train`, `y_train`.
- Define `dataset_train` by calling `TensorDataset` and passing it two arguments, the inputs and the targets created by `create_sequences()`, both converted from NumPy arrays to tensors of floats.

In [56]:
# Use create_sequences to create inputs and targets
X_train, y_train = create_sequences(train_data, 24 * 4)
print(X_train.shape, y_train.shape)

# Create TensorDataset
dataset_train = TensorDataset(
    torch.from_numpy(X_train).float(),
    torch.from_numpy(y_train).float(),
)
print(len(dataset_train))

(105119, 96) (105119,)
105119
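
To see what a `TensorDataset` and a `DataLoader` return, here is a small sketch on made-up data. The array sizes and the `*_toy` names are illustrative assumptions, not the electricity data.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy stand-ins for X_train / y_train: 10 sequences of length 4
X_toy = np.arange(40, dtype=np.float64).reshape(10, 4)
y_toy = np.arange(10, dtype=np.float64)

dataset_toy = TensorDataset(
    torch.from_numpy(X_toy).float(),
    torch.from_numpy(y_toy).float(),
)

# Indexing a TensorDataset returns one (inputs, target) pair
x0, y0 = dataset_toy[0]
print(x0.shape, y0.shape)  # torch.Size([4]) torch.Size([])

# A DataLoader groups pairs into batches; the last batch may be smaller
loader_toy = DataLoader(dataset_toy, batch_size=4)
batch_sizes = [seqs.shape[0] for seqs, _ in loader_toy]
print(batch_sizes)  # [4, 4, 2]
```

The smaller final batch is worth keeping in mind whenever a later step assumes a fixed batch size.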


## Recurrent Neural Networks

### Building a forecasting RNN

It's time to build your first recurrent network! It will be a sequence-to-vector model consisting of a recurrent module with two stacked RNN layers and a `hidden_size` of `32`. After the recurrent layers, a simple linear layer will map the last output to the single value to be predicted.

Instructions:

- Define the RNN layer, passing it the correct values for `input_size`, `hidden_size`, `num_layers`, and `batch_first`, and assign it to `self.rnn`.
- Initialize the first hidden state `h0` as a tensor of zeros of the appropriate shape.
- Pass the input `x` and the first hidden state `h0` through the recurrent layer.
- Pass the recurrent layer's last output through the linear layer.

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Define RNN layer
        self.rnn = nn.RNN(
            input_size=1,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # Initialize first hidden state with zeros
        h0 = torch.zeros(2, x.size(0), 32)
        # Pass x and h0 through recurrent layer
        out, _ = self.rnn(x, h0)
        # Pass recurrent layer's last output through linear layer
        out = self.fc(out[:, -1, :])
        return out
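
A quick shape check can help confirm why `h0` has shape `(num_layers, batch size, hidden_size)` and why the model indexes `out[:, -1, :]`. The sketch below uses a random batch of 5 sequences, an arbitrary size chosen for illustration.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=32, num_layers=2, batch_first=True)

x = torch.randn(5, 96, 1)           # (batch, sequence length, features)
h0 = torch.zeros(2, x.size(0), 32)  # (num_layers, batch, hidden_size)

out, h_n = rnn(x, h0)
print(out.shape)  # torch.Size([5, 96, 32]): top layer's output at every step
print(h_n.shape)  # torch.Size([2, 5, 32]): final hidden state of each layer

# The last time step of `out` equals the top layer's final hidden state
last_step = out[:, -1, :]
print(torch.allclose(last_step, h_n[-1]))  # True
```

Because only the last time step feeds the linear layer, the model maps a whole sequence to a single prediction per sample.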

## LSTM and GRU cells

### LSTM network

As you already know, plain RNN cells are not used that much in practice. A more frequently used alternative that handles long sequences much better is the Long Short-Term Memory cell, or LSTM. In this exercise, you will build an LSTM network yourself!

The most important implementation difference from the RNN you built previously is that LSTMs have two hidden states rather than one: a short-term hidden state and a long-term memory state. This means you will need to initialize this additional state and pass it to the LSTM layer.

Instructions:

- In the `.__init__()` method, define an LSTM layer and assign it to `self.lstm`.
- In the `forward()` method, initialize the first long-term memory hidden state `c0` with zeros.
- In the `forward()` method, pass all inputs to the `LSTM` layer: the input sequences `x`, and a tuple containing the two hidden states.

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Define lstm layer
        self.lstm = nn.LSTM(
            input_size=1,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        h0 = torch.zeros(2, x.size(0), 32)
        # Initialize long-term memory
        c0 = torch.zeros(2, x.size(0), 32)
        # Pass all inputs to lstm layer
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out
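
The sketch below illustrates the two states on a random batch (the sizes are arbitrary). It also relies on documented `nn.LSTM` behavior: when the state tuple is omitted, both states default to zeros, so passing explicit zero tensors should give the same result.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=32, num_layers=2, batch_first=True)

x = torch.randn(5, 96, 1)
h0 = torch.zeros(2, x.size(0), 32)  # short-term (hidden) state
c0 = torch.zeros(2, x.size(0), 32)  # long-term (cell) state

# The layer returns its outputs plus a tuple with both final states
out, (h_n, c_n) = lstm(x, (h0, c0))
print(out.shape, h_n.shape, c_n.shape)

# Omitting the states makes PyTorch initialize them with zeros
out_default, _ = lstm(x)
print(torch.allclose(out, out_default))  # True
```

Initializing `h0` and `c0` explicitly is therefore mainly useful for clarity, or when you want to start from non-zero states.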

### GRU network

Next to LSTMs, another popular recurrent neural network variant is the Gated Recurrent Unit, or GRU. Its appeal is in its simplicity: GRU cells require less computation than LSTM cells while often matching them in performance.

The code you are provided with is the RNN model definition that you coded previously. Your task is to adapt it such that it produces a GRU network instead. 

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Define RNN layer
        self.rnn = nn.RNN(
            input_size=1,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        h0 = torch.zeros(2, x.size(0), 32)
        out, _ = self.rnn(x, h0)  
        out = self.fc(out[:, -1, :])
        return out

Instructions:

- Update the RNN model definition in order to obtain a GRU network.

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Define GRU layer
        self.gru = nn.GRU(
            input_size=1,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        h0 = torch.zeros(2, x.size(0), 32)
        out, _ = self.gru(x, h0)
        out = self.fc(out[:, -1, :])
        return out
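
One rough way to compare the cost of the three recurrent layers is to count their trainable parameters. This is only a proxy for computation, and the helper `count_parameters` is a hypothetical name introduced for this sketch.

```python
import torch.nn as nn


def count_parameters(module: nn.Module) -> int:
    # Total number of scalar weights and biases in the module
    return sum(p.numel() for p in module.parameters())


# Same configuration as the models in this section
kwargs = dict(input_size=1, hidden_size=32, num_layers=2, batch_first=True)
counts = {
    "RNN": count_parameters(nn.RNN(**kwargs)),
    "GRU": count_parameters(nn.GRU(**kwargs)),
    "LSTM": count_parameters(nn.LSTM(**kwargs)),
}
print(counts)  # {'RNN': 3232, 'GRU': 9696, 'LSTM': 12928}
```

The GRU layer holds three sets of weights per layer and the LSTM four, against one for the plain RNN, which is why the GRU sits between the two in size.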

## Training and evaluating RNNs

### RNN training loop

It's time to train the electricity consumption forecasting model!

You will use the LSTM network you defined previously, which is available to you as `Net`, together with `dataloader_train`, a DataLoader built from the `dataset_train` you created before.

In this exercise, you will train the model for only three epochs to make sure the training progresses as expected. Let's get to it!

In [58]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Define lstm layer
        self.lstm = nn.LSTM(
            input_size=1,
            hidden_size=32,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        h0 = torch.zeros(2, x.size(0), 32)
        # Initialize long-term memory
        c0 = torch.zeros(2, x.size(0), 32)
        # Pass all inputs to lstm layer
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out

In [59]:
dataloader_train = DataLoader(dataset_train, batch_size=16)

Instructions:

- Set up the Mean Squared Error loss and assign it to `criterion`.
- Reshape `seqs` to `(batch size, sequence length, num features)`, which in our case is `(16, 96, 1)` for a full batch, and re-assign the result to `seqs`.
- Pass `seqs` to the model to get its `outputs`.
- Based on previously computed quantities, calculate the loss, assigning it to `loss`.

In [None]:
net = Net()
# Set up MSE loss
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)

for epoch in range(3):
    for seqs, labels in dataloader_train:
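        # Reshape model inputs to (batch size, sequence length, num features);
        # -1 infers the batch size, so the smaller final batch also works
        seqs = seqs.view(-1, 96, 1)
        # Get model outputs
        outputs = net(seqs)
        # Compute loss; squeeze outputs from (batch, 1) to (batch,) to match labels
        loss = criterion(outputs.squeeze(-1), labels)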
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")
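
One shape detail is worth checking in a loop like this one: the model returns outputs of shape `(batch size, 1)`, while the targets have shape `(batch size,)`. When the shapes differ this way, `nn.MSELoss` broadcasts them against each other (with a warning) instead of comparing element by element. The toy tensors below are made up to illustrate the effect.

```python
import warnings

import torch
import torch.nn as nn

criterion = nn.MSELoss()
outputs = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1), like the model's output
labels = torch.tensor([1.0, 2.0, 3.0])         # shape (3,), like the targets

# Mismatched shapes broadcast to (3, 3): every output is compared with every label
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # PyTorch warns about the size mismatch
    mismatched = criterion(outputs, labels)

# Matching shapes compare each output with its own label
matched = criterion(outputs.squeeze(-1), labels)

print(mismatched.item())  # about 1.333, although the predictions are perfect
print(matched.item())     # 0.0
```

Squeezing the outputs (or reshaping the labels) so that both tensors share the same shape avoids this silent distortion of the loss.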

### Evaluating forecasting models

It's evaluation time! The same LSTM network that you have trained in the previous exercise has been trained for you for a few more epochs and is available as `net`.

In [66]:
test_data = pd.read_csv("datasets/electricity_consump/electricity_test.csv")
test_data.head()

| Unnamed: 0 | timestamp | consumption |
|---|---|---|
| 0 | 2014-01-01 00:00:00 | -0.932595 |
| 1 | 2014-01-01 00:15:00 | -0.957931 |
| 2 | 2014-01-01 00:30:00 | -0.932595 |
| 3 | 2014-01-01 00:45:00 | -0.907259 |
| 4 | 2014-01-01 01:00:00 | -0.881923 |


In [67]:
# Use create_sequences to create inputs and targets
X_test, y_test = create_sequences(test_data, 24 * 4)
print(X_test.shape, y_test.shape)

# Create TensorDataset
dataset_test = TensorDataset(
    torch.from_numpy(X_test).float(),
    torch.from_numpy(y_test).float(),
)
print(len(dataset_test))

(34944, 96) (34944,)
34944


In [73]:
dataloader_test = DataLoader(dataset=dataset_test, batch_size=32)

Instructions:

- Define the Mean Squared Error metric and assign it to `mse`.
- Pass the input sequence to `net`, and squeeze the result before you assign it to `outputs`.
- Compute the final value of the test metric assigning it to `test_mse`.

In [74]:
# Define MSE metric
mse = torchmetrics.MeanSquaredError()

net.eval()
with torch.no_grad():
    for seqs, labels in dataloader_test:
        # -1 infers the batch size, so a smaller final batch would also work
        seqs = seqs.view(-1, 96, 1)
        # Pass seqs to net and squeeze the result
        outputs = net(seqs).squeeze()
        mse(outputs, labels)

# Compute final metric value
test_mse = mse.compute()
print(f"Test MSE: {test_mse}")

Test MSE: 0.14219625294208527
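
Calling `mse(outputs, labels)` on each batch updates the metric's running state, and `mse.compute()` aggregates over everything seen so far. As a rough, hand-rolled illustration of that idea (not torchmetrics' actual implementation, and using made-up random tensors), the sketch below accumulates squared errors batch by batch and checks the result against the MSE over the full data.

```python
import torch

torch.manual_seed(0)
# Made-up predictions and targets standing in for model outputs and labels
preds = torch.randn(10, dtype=torch.float64)
targets = torch.randn(10, dtype=torch.float64)

sum_squared_error = 0.0
n_observations = 0
# Process the data in batches of 4, 4 and 2 elements
for start in range(0, len(preds), 4):
    batch_preds = preds[start:start + 4]
    batch_targets = targets[start:start + 4]
    sum_squared_error += torch.sum((batch_preds - batch_targets) ** 2).item()
    n_observations += batch_targets.numel()

accumulated_mse = sum_squared_error / n_observations
full_mse = torch.mean((preds - targets) ** 2).item()
print(abs(accumulated_mse - full_mse) < 1e-9)  # True
```

Accumulating the sum and the count, rather than averaging per-batch means, keeps the result correct even when the last batch is smaller than the others.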
