# LSTM time series forecasting

This notebook contains a brief demonstration of **recurrent neural networks** (RNNs) for time series forecasting. In particular, we use a **long short-term memory** (LSTM) model to predict the future values of a synthetically generated time series. A simple bivariate sine/cosine time series is constructed to that end. After training the LSTM neural network, its performance is tested and compared against a naive baseline approach.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
sys.path.append('..')

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split

from tsutils import (
    make_sine_cosine,
    SlidingWindows,
    LSTM,
    train
)

In [None]:
np.random.seed(12345)
_ = torch.manual_seed(54321)

## Data generation

Let us start by generating a sequence of data. A sine and a cosine function are sampled at regularly spaced points and corrupted with random noise. These two curves constitute a bivariate time series.
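
The internals of `make_sine_cosine` are not shown in this notebook. As a rough illustration of the idea, the following sketch builds a comparable array with plain NumPy. The function name `noisy_sine_cosine`, the use of additive Gaussian noise, and the sampling interval `[0, max_length]` are assumptions made for this example, not the actual `tsutils` implementation.

```python
import numpy as np

def noisy_sine_cosine(num_steps, max_length, noise_level, seed=0):
    """Hypothetical stand-in for make_sine_cosine; returns shape (num_steps, 2)."""
    rng = np.random.default_rng(seed)
    # regularly spaced sample locations on [0, max_length] (assumed interval)
    t = np.linspace(0.0, max_length, num_steps)
    # column 0 holds the sine curve, column 1 the cosine curve
    clean = np.stack([np.sin(t), np.cos(t)], axis=1)
    # additive Gaussian noise with standard deviation noise_level (assumed noise model)
    noise = noise_level * rng.standard_normal(clean.shape)
    return clean + noise

sketch_data = noisy_sine_cosine(num_steps=2000, max_length=100.0, noise_level=0.1)
print('Sketch data shape:', sketch_data.shape)
```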

In [None]:
num_steps = 2000
max_length = 100.
noise_level = 0.1

data = make_sine_cosine(
    num_steps=num_steps,
    max_length=max_length,
    noise_level=noise_level,
    val_size=None
)

print('Data shape:', data.shape)

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(np.arange(len(data)) + 1, data[:,0], alpha=0.7, label='sine data')
ax.plot(np.arange(len(data)) + 1, data[:,1], alpha=0.7, label='cosine data')
ax.set(xlabel='x', ylabel='y')
ax.set_xlim((0, len(data)))
ax.legend(loc='lower left')
ax.grid(visible=True, which='both', color='lightgray', linestyle='-')
ax.set_axisbelow(True)
fig.tight_layout()

The data is split into a training and a validation set. Because the data is sequential, the time series is simply cut into two contiguous parts, without shuffling of any kind. This is done by setting `shuffle=False` in the code below.
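
To make the point concrete, an unshuffled split is nothing more than slicing the array at a cut-off index. The sketch below shows this with NumPy only. The helper `chronological_split` is a hypothetical name introduced for illustration, and rounding the validation count up is an assumption of this sketch.

```python
import numpy as np

def chronological_split(series, val_fraction):
    """Hypothetical helper: the last val_fraction of the rows becomes the validation part."""
    n_val = int(np.ceil(val_fraction * len(series)))  # assumed rounding rule
    n_train = len(series) - n_val
    # earlier time steps go to training, later ones to validation
    return series[:n_train], series[n_train:]

series = np.arange(20).reshape(10, 2)
head, tail = chronological_split(series, val_fraction=0.2)
print('Head shape:', head.shape)
print('Tail shape:', tail.shape)
```

Every row of `head` precedes every row of `tail` in time, so no future information leaks into the training part.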

In [None]:
val_size = 0.2

train_data, val_data = train_test_split(
    data,
    test_size=val_size,
    shuffle=False
)

print('Train data shape:', train_data.shape)
print('Val. data shape:', val_data.shape)

## Data set/loader

For training purposes, the data is best accessed through a PyTorch dataset. The `SlidingWindows` class implements a sliding-window mechanism over the full sequence and yields overlapping intervals of data. As a first step, we initialize sliding-window datasets for the training and validation sequences. Windows of twenty consecutive time steps are used.
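
The sliding-window idea can be sketched in a few lines of NumPy. The function `sliding_windows` below is a hypothetical illustration, not the `SlidingWindows` class itself. It assumes that each input window is paired with the `next_steps` values that immediately follow it.

```python
import numpy as np

def sliding_windows(series, window_size, next_steps=1):
    """Hypothetical sketch: inputs of shape (n, window_size, d), targets of shape (n, next_steps, d)."""
    # number of windows that still leave room for a full target
    n = len(series) - window_size - next_steps + 1
    inputs = np.stack([series[i:i + window_size] for i in range(n)])
    targets = np.stack([
        series[i + window_size:i + window_size + next_steps] for i in range(n)
    ])
    return inputs, targets

toy = np.arange(12, dtype=float).reshape(6, 2)
X_toy, y_toy = sliding_windows(toy, window_size=3, next_steps=1)
print('Inputs shape:', X_toy.shape)
print('Targets shape:', y_toy.shape)
```

Consecutive windows are shifted by one time step, so they overlap in all but one row.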

In [None]:
window_size = 20

train_set = SlidingWindows(
    train_data,
    window_size,
    mode='next',
    next_steps=1
)

val_set = SlidingWindows(
    val_data,
    window_size,
    mode='next',
    next_steps=1
)

print('No. train points:', len(train_set))
print('No. val. points:', len(val_set))

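The corresponding data loaders are created in the following cell. Each mini-batch consists of sixteen windows. They are sampled at random for training (`shuffle=True`) and kept in order for validation. Incomplete final batches are dropped (`drop_last=True`).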
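
The number of batches per epoch follows directly from the dataset size and the batch size. The small helper below, `num_batches`, is a hypothetical name used only to spell out that arithmetic.

```python
def num_batches(num_samples, batch_size, drop_last):
    """Number of mini-batches per epoch for a given dataset size."""
    full, remainder = divmod(num_samples, batch_size)
    # with drop_last=True an incomplete final batch is discarded
    if drop_last or remainder == 0:
        return full
    return full + 1

print('With drop_last:', num_batches(100, 16, drop_last=True))
print('Without drop_last:', num_batches(100, 16, drop_last=False))
```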

In [None]:
batch_size = 16

train_loader = DataLoader(
    train_set,
    batch_size=batch_size,
    shuffle=True,
    drop_last=True
)

val_loader = DataLoader(
    val_set,
    batch_size=batch_size,
    shuffle=False,
    drop_last=True
)

print('No. train batches:', len(train_loader))
print('No. val. batches:', len(val_loader))

A single batch is loaded to check that the pipeline produces tensors of the expected shapes.

In [None]:
X_batch, y_batch = next(iter(train_loader))
print('Input shape:', X_batch.shape)
print('Target shape:', y_batch.shape)

## Model training

A simple LSTM architecture is defined and initialized next. Moreover, an MSE loss function and an optimizer are set up.
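
The `LSTM` class comes from `tsutils` and its definition is not part of this notebook. As a hedged sketch of what such a model might look like, the class `TinyLSTM` below combines `nn.LSTM` with a linear read-out that maps the last hidden state back to the feature dimension. The class name, the read-out layer, and the output shape are assumptions of this example.

```python
import torch
import torch.nn as nn

class TinyLSTM(nn.Module):
    """Hypothetical stand-in for tsutils.LSTM: LSTM layer plus linear read-out."""

    def __init__(self, input_size, hidden_size, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, input_size)

    def forward(self, x):
        out, _ = self.lstm(x)          # (batch, time, hidden)
        return self.head(out[:, -1:])  # (batch, 1, features), next-step prediction

model = TinyLSTM(input_size=2, hidden_size=5)
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
y_hat = model(torch.zeros(16, 20, 2))

print('No. weights (sketch):', num_params)
print('Output shape (sketch):', y_hat.shape)
```

For this sketch, the LSTM layer contributes 4 * (2 * 5 + 5 * 5 + 2 * 5) = 180 parameters and the read-out adds 5 * 2 + 2 = 12. The count reported for the actual `tsutils` model may differ.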

In [None]:
lstm = LSTM(
    input_size=2,
    hidden_size=5,
    num_layers=1
)

print('No. weights:', lstm.num_trainable)

In [None]:
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(lstm.parameters(), lr=0.001)

Subsequently, the model is trained for 100 epochs. The training converges fairly quickly.
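
The `train` function is imported from `tsutils` and is not shown here. A typical loop of this kind alternates between gradient updates on the training batches and a loss evaluation on the validation batches. The sketch below, with the hypothetical name `train_sketch`, illustrates that pattern on a toy regression problem. It is an assumption about the general structure, not the actual implementation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_sketch(model, criterion, optimizer, num_epochs, train_loader, val_loader):
    """Hypothetical loop; returns the mean validation loss after each epoch."""
    history = []
    for _ in range(num_epochs):
        model.train()
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(X_batch), y_batch)
            loss.backward()
            optimizer.step()

        # evaluate on the validation batches without tracking gradients
        model.eval()
        with torch.no_grad():
            val_losses = [criterion(model(X), y).item() for X, y in val_loader]
        history.append(sum(val_losses) / len(val_losses))
    return history

# toy regression problem (y = 2x), used only to exercise the loop
torch.manual_seed(0)
x = torch.linspace(-1, 1, 64).unsqueeze(1)
toy_set = TensorDataset(x, 2 * x)
toy_model = nn.Linear(1, 1)

history = train_sketch(
    toy_model,
    criterion=nn.MSELoss(),
    optimizer=torch.optim.Adam(toy_model.parameters(), lr=0.1),
    num_epochs=5,
    train_loader=DataLoader(toy_set, batch_size=16, shuffle=True),
    val_loader=DataLoader(toy_set, batch_size=16),
)
print('Val. loss per epoch:', history)
```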

In [None]:
train(
    lstm,
    criterion=criterion,
    optimizer=optimizer,
    num_epochs=100,
    train_loader=train_loader,
    val_loader=val_loader
)

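The final model outperforms the naive forecast in terms of the validation MSE. The naive forecast simply repeats the last observed value as the one-step-ahead prediction, so its MSE is the mean squared difference between consecutive time steps.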
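
The equivalence between this naive (persistence) forecast and `np.diff` can be checked on a toy series. The random array and variable names below are placeholders chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_series = rng.standard_normal((50, 2))

# persistence forecast: the value at time t is used as the prediction for time t + 1
persistence_preds = toy_series[:-1]
persistence_targets = toy_series[1:]
persistence_mse = np.mean((persistence_targets - persistence_preds) ** 2)

# the same quantity expressed through first differences along the time axis
diff_mse = np.mean(np.diff(toy_series, axis=0) ** 2)

print('Explicit persistence MSE: {:.4e}'.format(persistence_mse))
print('Via np.diff: {:.4e}'.format(diff_mse))
```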

In [None]:
naive_mse = np.mean(np.diff(val_data, axis=0)**2)
print('Naive forecast MSE: {:.4e}'.format(naive_mse))

## Test predictions

Finally, we visualize recursively computed model forecasts starting from the first validation set window. Each prediction is fed back into the model as input for the next step.
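
The method `lstm.forecast` is part of `tsutils`. The general idea of a recursive forecast can be sketched independently of the model: predict one step, append the prediction to the window, drop the oldest entry, and repeat. The function `recursive_forecast` and the toy one-step predictor `last_value` below are hypothetical names used for illustration only.

```python
import numpy as np

def recursive_forecast(predict_next, window, steps):
    """Hypothetical sketch: roll a one-step predictor forward for a number of steps.

    window has shape (time, features); predict_next maps it to shape (features,).
    """
    window = np.array(window, dtype=float)
    preds = []
    for _ in range(steps):
        next_value = predict_next(window)
        preds.append(next_value)
        # drop the oldest time step and append the newest prediction
        window = np.concatenate([window[1:], next_value[None, :]], axis=0)
    return np.stack(preds)  # (steps, features)

def last_value(window):
    """Toy one-step predictor that repeats the most recent observation."""
    return window[-1]

start = np.arange(8, dtype=float).reshape(4, 2)
rolled = recursive_forecast(last_value, start, steps=5)
print('Forecast shape:', rolled.shape)
```

Because each step consumes earlier predictions rather than observations, errors can accumulate over the forecast horizon.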

In [None]:
seq = torch.as_tensor(val_data[:window_size]) # (time, features)
seq = seq.unsqueeze(0) # (batch=1, time, features)

lstm.eval()
with torch.no_grad():
    preds = lstm.forecast(
        seq,
        steps=len(val_data) - window_size
    )

print('Pred. shape:', preds.shape)

In [None]:
fig, ax = plt.subplots(figsize=(6, 4))

ax.plot(
    np.arange(window_size) + 1,
    val_data[:window_size,0],
    color=plt.cm.Dark2(0), alpha=0.7,
    label='test sine'
)

ax.plot(
    np.arange(window_size) + 1,
    val_data[:window_size,1],
    color=plt.cm.Dark2(1), alpha=0.7,
    label='test cosine'
)

ax.plot(
    np.arange(window_size, len(val_data)) + 1,
    preds[0,:,0],
    color=plt.cm.Dark2(0), alpha=0.7, linestyle='--',
    label='sine forecast'
)

ax.plot(
    np.arange(window_size, len(val_data)) + 1,
    preds[0,:,1],
    color=plt.cm.Dark2(1), alpha=0.7, linestyle='--',
    label='cosine forecast'
)

ax.set(xlabel='x', ylabel='y')
ax.set_xlim((0, len(val_data)))
ax.legend(loc='lower left')

ax.grid(visible=True, which='both', color='lightgray', linestyle='-')
ax.set_axisbelow(True)
fig.tight_layout()