<a href="https://colab.research.google.com/github/rahiakela/deep-learning-for-time-series-forecasting/blob/part-3-deep-learning-methods/2_time_series_forecasting_using_multi_layer_perceptron.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Time Series Forecasting using Multi-Layer Perceptron

Multilayer Perceptrons, or MLPs for short, can be applied to time series forecasting. A challenge with using MLPs for time series forecasting is in the preparation of the data. Specifically, lag observations must be flattened into feature vectors.

In this notebook, we will discover how to develop a suite of Multilayer Perceptron models for a range of standard time series forecasting problems.

* Develop MLP models for univariate time series forecasting.
* Develop MLP models for multivariate time series forecasting.
* Develop MLP models for multi-step time series forecasting.

The notebook is divided into four parts:

1. Univariate MLP Models
2. Multivariate MLP Models
3. Multi-step MLP Models
4. Multivariate Multi-step MLP Models

Traditionally, a lot of research has gone into using MLPs for time series forecasting, with modest results. Perhaps the most promising area in the application of deep learning methods to time series forecasting is the use of CNNs, LSTMs, and hybrid models. As such, we will not see more examples of straight MLP models for time series forecasting.

## Setup

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input, concatenate

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

## Univariate MLP Models

Multilayer Perceptrons, or MLPs for short, can be used to model univariate time series forecasting problems. A univariate time series is a dataset comprised of a single series of observations with a temporal ordering; a model is required to learn from the series of past observations to predict the next value in the sequence. This section is divided into two parts:

1. Data Preparation
2. MLP Model


### Data Preparation

Before a univariate series can be modeled, it must be prepared. The MLP model will learn a function that maps a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple examples from which the model can learn.

```python
[10, 20, 30, 40, 50, 60, 70, 80, 90]
```

We can divide the sequence into multiple input/output patterns called samples, where three time steps are used as input and one time step is used as output for the one-step prediction that is being learned.

```python
X,          y
10, 20, 30, 40
20, 30, 40, 50
30, 40, 50, 60
..............
```

The split_sequence() function below implements this behavior and will split a given univariate sequence into multiple samples where each sample has a specified number of time steps and the output is a single time step.


In [0]:
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
  X, y = list(), list()
  for i in range(len(sequence)):
    # find the end of this pattern
    end_ix = i + n_steps
    # check if we are beyond the sequence
    if end_ix > len(sequence) - 1:
      break
    # gather input and output parts of the pattern
    seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
    X.append(seq_x)
    y.append(seq_y)
  return np.array(X), np.array(y)

In [3]:
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# choose a number of time steps
n_steps = 3

# split into samples
X, y = split_sequence(raw_seq, n_steps)

# summarize the data
for i in range(len(X)):
  print(X[i], y[i])

[10 20 30] 40
[20 30 40] 50
[30 40 50] 60
[40 50 60] 70
[50 60 70] 80
[60 70 80] 90


Now that we know how to prepare a univariate series for modeling, let's look at developing an MLP model that can learn the mapping of inputs to outputs.

### MLP Model

A simple MLP model has a single hidden layer of nodes, and an output layer used to make a prediction.

We almost always have multiple samples, so the model expects the input component of the training data to have the two-dimensional shape $[samples, features]$.

The same shape is required at prediction time, so we must reshape a single input sample before making a prediction, e.g. to the shape $[1, 3]$ for 1 sample and 3 time steps used as input features.
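As a quick shape check, the sketch below uses only NumPy and trains no model. It re-creates the six training samples from the sequence above and shows the reshape of a single window; the variable names are illustrative.

```python
import numpy as np

# Training inputs built from [10, ..., 90] with 3 time steps per sample
X = np.array([[10, 20, 30],
              [20, 30, 40],
              [30, 40, 50],
              [40, 50, 60],
              [50, 60, 70],
              [60, 70, 80]])
print(X.shape)  # (6, 3) -> [samples, features]

# A single new window is 1D and must be reshaped to [1, features]
x_input = np.array([70, 80, 90])
print(x_input.shape)  # (3,)
x_input = x_input.reshape((1, 3))
print(x_input.shape)  # (1, 3)
```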

We can make this concept concrete with a worked example.


In [4]:
model = Sequential()
model.add(Dense(100, activation='relu', input_dim=n_steps))
model.add(Dense(1))

# compile model
model.compile(optimizer='adam', loss='mse')

# fit model
model.fit(X, y, epochs=200, verbose=0)

# demonstrate prediction
x_input = np.array([70, 80, 90])
x_input = x_input.reshape((1, n_steps))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[109.83776]]


## Preparing Time Series Data for LSTMs

Consider the following situation:

``
I have two columns in my data file with 5,000 rows, column 1 is time (with 1 hour interval) and column 2 is the number of sales and I am trying to forecast the number of sales for future time steps. Help me to set the number of samples, time steps and features in this data for an LSTM?
``

There are a few problems here:

* **Data Shape**: LSTMs expect 3D input, and it can be challenging to get your head around this the first time.
* **Sequence Length**: LSTMs don't like sequences of more than 200-400 time steps, so the data will need to be split into subsamples.

We will work through this example, broken down into the following 4 steps:

1. Load the Data
2. Drop the Time Column
3. Split Into Samples
4. Reshape Subsequences



### Load the Data

In [5]:
# load time series dataset
# series = pd.read_csv('filename.csv', header=0, index_col=0)

# We will mock loading by defining a new dataset in memory with 5,000 time steps.
# define the dataset
data = list()
n = 5000
for i in range(n):
  data.append([i+1, (i+1) * 10])
data = np.array(data)
print(data[:5, :])
print(data.shape)

[[ 1 10]
 [ 2 20]
 [ 3 30]
 [ 4 40]
 [ 5 50]]
(5000, 2)


We can see we have 5,000 rows and 2 columns: a standard univariate time series dataset.

### Drop the Time Column

If your time series data is uniform over time and there are no missing values, you can drop the time column. If not, you may want to look at imputing the missing values, resampling the data to a new time scale, or developing a model that can handle missing values.
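One way to test the uniformity assumption before dropping the column is to compare the gaps between consecutive time stamps. The sketch below is only an illustration: the `time` and `time_gap` arrays are hypothetical stand-ins for a real time column.

```python
import numpy as np

# Hypothetical time column: hourly steps numbered 1..10
time = np.arange(1, 11)

# Differences between consecutive time stamps
steps = np.diff(time)

# The series is uniform if every step equals the first one
is_uniform = bool(np.all(steps == steps[0]))
print(is_uniform)

# A series with a missing hour (5 is absent) fails the check
time_gap = np.array([1, 2, 3, 4, 6, 7])
gap_steps = np.diff(time_gap)
has_gap = not bool(np.all(gap_steps == gap_steps[0]))
print(has_gap)
```

If the check fails, the time column carries information that would be lost by dropping it.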

Here, we just drop the first column:

In [6]:
# define the dataset
data = list()
n = 5000
for i in range(n):
  data.append([i+1, (i+1) * 10])
data = np.array(data)

# drop time
data = data[:, 1]
print(data.shape)

(5000,)


### Split Into Samples

LSTMs need to process samples where each sample is a single sequence of observations. In this case, 5,000 time steps is too long; LSTMs work better with 200-to-400 time steps. Therefore, we need to split the 5,000 time steps into multiple shorter sub-sequences.

How best to split depends on the problem. For example, you may need overlapping sub-sequences, or non-overlapping sub-sequences may be fine but your model needs to carry state across them.
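For the overlapping case, a minimal sketch might look like the following. The helper `split_overlapping` and its `stride` parameter are hypothetical and not part of the original notebook; with `stride` equal to `length` it reduces to a non-overlapping split.

```python
import numpy as np

def split_overlapping(series, length, stride):
    """Split a 1D series into windows of `length`, advancing by `stride`."""
    windows = []
    # stop once a full window no longer fits
    for start in range(0, len(series) - length + 1, stride):
        windows.append(series[start:start + length])
    return np.array(windows)

# same values as the mock sales column: 10, 20, ..., 50000
series = np.arange(1, 5001) * 10

# 200-step windows that overlap by half
windows = split_overlapping(series, length=200, stride=100)
print(windows.shape)  # (49, 200)
```

Overlap yields more training samples from the same series, at the cost of correlated samples.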

In this example, we will split the 5,000 time steps into 25 sub-sequences of 200 time steps each.

In [7]:
# define the dataset
data = list()
n = 5000
for i in range(n):
  data.append([i+1, (i+1) * 10])
data = np.array(data)

# drop time
data = data[:, 1]
print(data.shape)

# split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200

# step over the 5,000 in jumps of 200
for i in range(0, n , length):
  sample = data[i: i + length]  # grab from i to i + 200
  samples.append(sample)
print(len(samples))

(5000,)
25


In [8]:
len(samples[:5][0])

200

In [9]:
samples[:2]

[array([  10,   20,   30,   40,   50,   60,   70,   80,   90,  100,  110,
         120,  130,  140,  150,  160,  170,  180,  190,  200,  210,  220,
         230,  240,  250,  260,  270,  280,  290,  300,  310,  320,  330,
         340,  350,  360,  370,  380,  390,  400,  410,  420,  430,  440,
         450,  460,  470,  480,  490,  500,  510,  520,  530,  540,  550,
         560,  570,  580,  590,  600,  610,  620,  630,  640,  650,  660,
         670,  680,  690,  700,  710,  720,  730,  740,  750,  760,  770,
         780,  790,  800,  810,  820,  830,  840,  850,  860,  870,  880,
         890,  900,  910,  920,  930,  940,  950,  960,  970,  980,  990,
        1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100,
        1110, 1120, 1130, 1140, 1150, 1160, 1170, 1180, 1190, 1200, 1210,
        1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, 1300, 1310, 1320,
        1330, 1340, 1350, 1360, 1370, 1380, 1390, 1400, 1410, 1420, 1430,
        1440, 1450, 1460, 1470, 1480, 

We now have 25 subsequences of 200 time steps each.

### Reshape Subsequences

The LSTM needs data with the format of $[samples, timesteps, features]$. We have 25 samples, 200 time steps per sample, and 1 feature. 

First, we need to convert our list of arrays into a 2D NumPy array with the shape $[25, 200]$.

In [10]:
# define the dataset
data = list()
n = 5000
for i in range(n):
  data.append([i+1, (i+1) * 10])
data = np.array(data)

# drop time
data = data[:, 1]
print(data.shape)

# split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200

# step over the 5,000 in jumps of 200
for i in range(0, n , length):
  sample = data[i: i + length]  # grab from i to i + 200
  samples.append(sample)
print(len(samples))

# convert list of arrays into 2d array
data = np.array(samples)
print(data.shape)

(5000,)
25
(25, 200)


Now we have 25 rows and 200 columns. Interpreted in a machine learning context, this dataset has 25 samples and 200 features per sample.

Next, we can use the reshape() function to add one additional dimension for our single feature and use the existing columns as time steps instead.

In [11]:
# reshape into [samples, timesteps, features]
data = data.reshape((len(samples), length, 1))
print(data.shape)

(25, 200, 1)


And that is it. The data can now be used as an input (X) to an LSTM model, or even a CNN model.
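When the sub-sequence length divides the series length exactly, the split and both reshapes can be collapsed into one call. This is a sketch under that assumption, not a replacement for the loop above, which also copes with a shorter final chunk.

```python
import numpy as np

n, length = 5000, 200
series = np.arange(1, n + 1) * 10  # same values as the sales column above

# Works only because length divides n exactly (5000 / 200 = 25)
assert n % length == 0
data = series.reshape((n // length, length, 1))
print(data.shape)  # (25, 200, 1)

# Row-major order keeps each sub-sequence contiguous in time
print(data[1, 0, 0])  # first value of the second sub-sequence: 2010
```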

## Multivariate MLP Models

Multivariate time series data means data where there is more than one observation for each time step. There are two main models that we may require with multivariate time series data.

1. Multiple Input Series.
2. Multiple Parallel Series.


### Multiple Input Series

A problem may have two or more parallel input time series and an output time series that is dependent on the input time series. The input time series are parallel because each series has an observation at the same time step. 

We can demonstrate this with a simple example of two parallel input time series where the output series is the simple addition of the input series.

We can reshape these three arrays of data as a single dataset where each row is a time step and each column is a separate time series. This is a standard way of storing parallel time series in a CSV file.

In [12]:
# define input sequence
in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = np.array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
print(out_seq)

# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))

# horizontally stack columns
dataset = np.hstack((in_seq1, in_seq2, out_seq))
print(dataset)

[ 25  45  65  85 105 125 145 165 185]
[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]


As with the univariate time series, we must structure these data into samples with input and output samples. We need to split the data into samples maintaining the order of observations across the two input sequences. 

If we chose three input time steps, then the first sample would look as follows:

```python
Input:
10, 15
20, 25
30, 35

Output:
65
```

That is, the first three time steps of each parallel series are provided as input to the model and the model associates this with the value in the output series at the third time step, in this case 65.

We can see that, in transforming the time series into input/output samples to train the model, we will have to discard some values from the output time series for which we do not have input values at prior time steps. In turn, the choice of the number of input time steps will have an important effect on how much of the training data is used.
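The trade-off can be made explicit with a little arithmetic. Assuming the output is aligned with the last input time step, as in the first sample above, a dataset with N rows yields N - n_steps + 1 samples. The helper `n_samples` below is hypothetical and exists only to illustrate the count.

```python
def n_samples(n_rows, n_steps):
    """Number of samples when the output is aligned with the last input time step."""
    return max(n_rows - n_steps + 1, 0)

# The example dataset has 9 rows
for n_steps in (2, 3, 5):
    print(n_steps, n_samples(9, n_steps))
```

With 9 rows, three input time steps give 7 samples, while five input time steps give only 5.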

We can define a function named split_sequences() that will take a dataset as we have defined it, with rows for time steps and columns for parallel series, and return input/output samples.

In [0]:
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
  X, y = list(), list()
  for i in range(len(sequences)):
    # find the end of this pattern
    end_ix = i + n_steps
    # check if we are beyond the dataset
    if end_ix > len(sequences):
      break
    # gather input and output parts of the pattern
    seq_x, seq_y = sequences[i: end_ix, :-1], sequences[end_ix - 1, -1]
    X.append(seq_x)
    y.append(seq_y)
  return np.array(X), np.array(y)

We can test this function on our dataset using three time steps for each input time series as input.

In [14]:
# define input sequence
in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = np.array([in_seq1[i] + in_seq2[i] for i in range(len(in_seq1))])
print(out_seq)

# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))

# horizontally stack columns
dataset = np.hstack((in_seq1, in_seq2, out_seq))
print(dataset)

# choose a number of time steps
n_steps = 3

# convert into input/output
X, y = split_sequences(dataset, n_steps)
print(X.shape, y.shape)

# summarize the data
for i in range(len(X)):
  print(X[i], y[i])

[ 25  45  65  85 105 125 145 165 185]
[[ 10  15  25]
 [ 20  25  45]
 [ 30  35  65]
 [ 40  45  85]
 [ 50  55 105]
 [ 60  65 125]
 [ 70  75 145]
 [ 80  85 165]
 [ 90  95 185]]
(7, 3, 2) (7,)
[[10 15]
 [20 25]
 [30 35]] 65
[[20 25]
 [30 35]
 [40 45]] 85
[[30 35]
 [40 45]
 [50 55]] 105
[[40 45]
 [50 55]
 [60 65]] 125
[[50 55]
 [60 65]
 [70 75]] 145
[[60 65]
 [70 75]
 [80 85]] 165
[[70 75]
 [80 85]
 [90 95]] 185


We can see that the X component has a three-dimensional structure. The first dimension is the number of samples, in this case 7. The second dimension is the number of time steps per sample, in this case 3, the value specified to the function.

Finally, the last dimension specifies the number of parallel time series or the number of variables, in this case 2 for the two parallel series. We can then see that the input and output for each sample is printed, showing the three time steps for each of the two input series and the associated output for each sample.

#### MLP Model

Before we can fit an MLP on this data, we must flatten the shape of the input samples. MLPs require that the shape of the input portion of each sample is a vector. With a multivariate input, we will have multiple vectors, one for each time step. 

We can flatten the temporal structure of each input sample, so that:
```python
[[10 15]
[20 25]
[30 35]]

Becomes:
[10, 15, 20, 25, 30, 35]
```
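The flattening order matters: NumPy's reshape reads values in row-major order by default, so the two series stay interleaved time step by time step. A minimal check on the first sample, using illustrative variable names:

```python
import numpy as np

# First input sample: 3 time steps x 2 parallel series
sample = np.array([[10, 15],
                   [20, 25],
                   [30, 35]])

# reshape uses row-major (C) order by default, so values are read
# one time step at a time: both series at t, then both at t+1, ...
flat = sample.reshape(-1)
print(flat)  # [10 15 20 25 30 35]
```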

**Step-1**:
First, we can calculate the length of each input vector as the number of time steps multiplied by the number of features or time series. We can then use this vector size to reshape the input.

**Step-2**: We can now define an MLP model for the multivariate input where the vector length is used for the input dimension argument.

**Step-3**: When making a prediction, the model expects three time steps for two input time series. The shape of the 1 sample with 3 time steps and 2 variables would be $[1, 3, 2]$. We must again reshape this to be 1 sample with a vector of 6 elements or $[1, 6]$. We would expect the next value in the sequence to be 100 + 105 or 205.


In [15]:
# Step-1: flatten input
n_input = X.shape[1] * X.shape[2]
X = X.reshape((X.shape[0], n_input))

# Step-2: define model
model = Sequential()
model.add(Dense(100, activation='relu', input_dim=n_input))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# fit the model
model.fit(X, y, epochs=200, verbose=0)

# Step-3: demonstrate prediction
x_input = np.array([[80, 85], [90, 95], [100, 105]])
x_input = x_input.reshape((1, n_input))
yhat = model.predict(x_input, verbose=0)
print(yhat)

[[212.04596]]


#### Multi-headed MLP Model

There is another, more elaborate way to model the problem. Each input series can be handled by a separate MLP, and the output of each of these submodels can be combined before a prediction is made for the output sequence. We can refer to this as a multi-headed input MLP model. It may offer more flexibility or better performance depending on the specifics of the problem being modeled. This type of model can be defined in Keras using the Keras functional API.

We can do this using the following steps:

* First, we can define each of the two input models as an MLP with an input layer that expects vectors with n_steps features.
* Next, we can merge the output from each model into one long vector, which can be interpreted before making a prediction for the output sequence.
* Finally, we can tie the inputs and outputs together.

The image below provides a schematic for how this model looks, including the shape of the inputs and outputs of each layer.

<img src='https://github.com/rahiakela/img-repo/blob/master/multi-headed-mlp.png?raw=1' width='800'/>

This model requires input to be provided as a list of two elements, where each element in the list contains data for one of the submodels. To achieve this, we can split the 3D input data into two separate arrays of input data: that is, from one array with the shape $[7, 3, 2]$ to two 2D arrays with the shape $[7, 3]$.

These data can then be provided in order to fit the model.

Similarly, we must prepare the data for a single sample as two separate two-dimensional arrays when making a single one-step prediction.


In [19]:
# define input sequence
in_seq1 = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
in_seq2 = np.array([15, 25, 35, 45, 55, 65, 75, 85, 95])
out_seq = np.array([in_seq1[i]+in_seq2[i] for i in range(len(in_seq1))])

# convert to [rows, columns] structure
in_seq1 = in_seq1.reshape((len(in_seq1), 1))
in_seq2 = in_seq2.reshape((len(in_seq2), 1))
out_seq = out_seq.reshape((len(out_seq), 1))

# horizontally stack columns
dataset = np.hstack((in_seq1, in_seq2, out_seq))

# choose a number of time steps
n_steps = 3

# convert into input/output
X, y = split_sequences(dataset, n_steps)

# separate input data
X1 = X[:, :, 0]
X2 = X[:, :, 1]

# first input model
visible1 = Input(shape=(n_steps,))
dense1 = Dense(100, activation='relu')(visible1)

# second input model
visible2 = Input(shape=(n_steps,))
dense2 = Dense(100, activation='relu')(visible2)

# merge input models
merge = concatenate([dense1, dense2])
output = Dense(1)(merge)
model = Model(inputs=[visible1, visible2], outputs=output)

model.compile(optimizer='adam', loss='mse')

model.fit([X1, X2], y, epochs=2000, verbose=0)

# demonstrate prediction
x_input = np.array([[80, 85], [90, 95], [100, 105]])
x1 = x_input[:, 0].reshape((1, n_steps))
x2 = x_input[:, 1].reshape((1, n_steps))

yhat = model.predict([x1, x2], verbose=0)
print(yhat)

[[206.63481]]


### Multiple Parallel Series