# Deep Learning for Time Series Forecasting

**Deep learning** is a fashionable term referring to machine learning with artificial neural networks, particularly networks with many layers, so-called **deep networks**. Advances in algorithm design, hardware (GPUs...), software tools (TensorFlow etc.) and the availability of training data on the internet have led to a renaissance for neural networks and the ability to train complex models to perform tasks that can rightly be called intelligent.

The following notebooks assume that you are familiar with the concepts of deep learning, including:
- artificial neurons
- artificial neural network architectures and layers
- activation functions
- training in batches and epochs
- ...

These concepts are the focus of our course **[📓 Deep Learning with TensorFlow](index/dlt2-intro-dl-tensorflow-2day.ipynb)**.

As we have seen in the chapter [**📓 Forecasting with "Shallow" Learning**](../timeseries/mlts-forecasting-shallow.ipynb), _basically any regressor can be applied to recursive time series forecasting by transforming the time series data into a supervised learning problem._ That of course includes any neural network capable of regression, including **feed-forward neural networks**.

In this chapter, however, we are going to focus on types of networks especially suited for learning from sequences of data: **recurrent neural networks**.


## Preamble

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import pandas
import numpy
import sklearn
import sklearn.preprocessing  # submodules are not loaded by `import sklearn` alone

In [None]:
import data_science_learning_paths
import forecast_lab

In [None]:
data_science_learning_paths.setup_plot_style()

## Recurrent Neural Networks for Time Series Forecasting

As discussed in the introduction, any regressor, including a **feed-forward neural network**, can be used for recursive forecasting once the time series has been transformed into a supervised learning problem. This chapter focuses instead on networks especially suited for learning from sequences of data: [**📓 Recurrent Neural Networks**](../dl/dl-recurrent-neural-networks.ipynb).

`keras` provides a number of recurrent neural network architectures as layers:

- `keras.layers.SimpleRNN`: fully connected RNN where the output is fed back as the new input
- `keras.layers.LSTM`: Long Short-Term Memory (LSTM) layer
- `keras.layers.GRU`: Gated Recurrent Unit (GRU) layer

In [None]:
import tensorflow
from tensorflow import keras

## Example: Engineering an LSTM Forecasting Model

Consider again the taxi trips dataset:

In [None]:
taxi_trips = data_science_learning_paths.datasets.read_chicago_taxi_trips_daily()

In [None]:
taxi_trips.head()

In [None]:
taxi_trips.plot()

### Data Preparation

More than other machine learning algorithms, neural networks are sensitive to the magnitude of the input values. It is generally recommended to **scale the input variables to the range of the activation function.** 


In [None]:
scaler = sklearn.preprocessing.MinMaxScaler(
    feature_range=(0,1)
)
scaler.fit(
    taxi_trips
)
scaled_values = scaler.transform(
    taxi_trips
)

In [None]:
# Reuse the values scaled above instead of fitting a second scaler
taxi_trips_scaled = pandas.Series(
    scaled_values.reshape(-1),
    index=taxi_trips.index,
)
taxi_trips_scaled.freq = "d"

In [None]:
taxi_trips_scaled.head()
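A model trained on scaled values also forecasts on the scaled range, so forecasts usually have to be mapped back before they can be read as trip counts. The following is a minimal sketch of that round trip with `MinMaxScaler.inverse_transform`. The numbers are made up for illustration and are not taken from the taxi trips dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy daily counts standing in for the trips column (illustrative values only)
trips = np.array([120.0, 80.0, 200.0, 160.0, 40.0]).reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(trips)

# Pretend 0.5 is a forecast made on the scaled range and map it back
scaled_forecast = np.array([[0.5]])
forecast = scaler.inverse_transform(scaled_forecast)

print(scaled.ravel())
print(forecast.ravel())
```

In a stricter setup one would fit the scaler on the training portion only and reuse it for the test portion, so that no information from the test period leaks into training.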

### Transforming to Supervised Learning Format

For training an LSTM network to forecast this time series, we are first preparing the training data by following the approach described in [**Forecasting with "Shallow" Learning**](../timeseries/mlts-forecasting-shallow.ipynb): Transforming the time series to a set of labelled data points:

In [None]:
w = 10

In [None]:
X_train, y_train = forecast_lab.transform_to_labelled_points(taxi_trips["Trips"][:1000], w)
X_test, y_test = forecast_lab.transform_to_labelled_points(taxi_trips["Trips"][1000:2000], w)

In [None]:
X_train.head()

In [None]:
y_train.head()
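The implementation of `forecast_lab.transform_to_labelled_points` is not shown in this notebook. As a rough sketch of the sliding-window idea, the hypothetical helper `make_labelled_points` below (its name, column labels and return types are assumptions) turns each run of `window` consecutive values into one feature row and uses the value that follows as the label.

```python
import numpy as np
import pandas as pd


def make_labelled_points(series: pd.Series, window: int):
    """Hypothetical helper: rows of `window` consecutive values, labelled with the next value."""
    values = series.to_numpy()
    n = len(values) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([values[i : i + window] for i in range(n)])
    y = values[window:]
    # Each sample is indexed by the timestamp of the value it predicts
    index = series.index[window:]
    columns = [f"lag_{window - j}" for j in range(window)]
    return pd.DataFrame(X, index=index, columns=columns), pd.Series(y, index=index)


toy = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)
X_toy, y_toy = make_labelled_points(toy, window=3)
print(X_toy)
print(y_toy)
```

A series of length 6 with a window of 3 yields 3 labelled points, because the first 3 values have no complete window before them.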

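Keras recurrent layers expect a 3D input of shape $(n, t, k)$, where:

- $n$: number of samples
- $t$: number of time steps per sample
- $k$: number of features per time step

In this notebook, each window of $w$ lagged values is passed as a single time step with $w$ features, so the input has shape $(n, 1, w)$.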

In [None]:
X_train.shape

In [None]:
X_train = X_train.values.reshape(-1, 1, w)
X_test = X_test.values.reshape(-1, 1, w)
y_train = y_train.values
y_test = y_test.values

In [None]:
X_train.shape

In [None]:
X_train[0]
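Keras reads the second axis of a recurrent layer's input as time steps and the third as features. The sketch below uses toy numbers to contrast the layout used above, one time step carrying `w` features, with the alternative of `w` time steps carrying one feature each. The variable names are illustrative.

```python
import numpy as np

w = 4
# Five windows of w lagged values each (toy numbers)
X = np.arange(5 * w, dtype=float).reshape(5, w)

# Layout used in this notebook: one time step carrying w features
X_one_step = X.reshape(-1, 1, w)

# Alternative layout: w time steps carrying one feature each
X_w_steps = X.reshape(-1, w, 1)

print(X_one_step.shape)  # (5, 1, 4)
print(X_w_steps.shape)   # (5, 4, 1)
```

With the alternative layout the recurrent layer would step through the window value by value, and the layer's input shape would have to be changed to match.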

### Network Architecture

In [None]:
network = keras.models.Sequential(
    [
        keras.layers.LSTM(
            batch_input_shape=(1, 1, w),
            units=w,
            stateful=True,
        ),
        keras.layers.Dense(units=1,  activation="linear"),
    ]
)

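Note the parameters of the LSTM layer:

- **batch_input_shape**: A stateful layer needs to know the batch size in advance, so the full input shape is given including the batch size: here batches of one sample of shape `(1, w)`.
- **stateful**: The documentation states that "If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch." We want the layer to maintain state across the training series.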
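To make the effect of carrying state across batches concrete, here is a small NumPy sketch of a plain tanh recurrent cell. It is not Keras code, and the weights and inputs are random. Processing a sequence in two chunks gives the same final state as processing it in one go only if the second chunk starts from the state the first one ended with.

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(1, 3))  # input-to-hidden weights
W_h = rng.normal(size=(3, 3))  # hidden-to-hidden weights


def run_rnn(inputs, state):
    """Feed the inputs one step at a time and return the final hidden state."""
    for x in inputs:
        state = np.tanh(x @ W_x + state @ W_h)
    return state


sequence = rng.normal(size=(6, 1))
zero_state = np.zeros(3)

# Whole sequence in one pass
full = run_rnn(sequence, zero_state)

# "Stateful": the second chunk starts from the state the first chunk ended with
carried = run_rnn(sequence[3:], run_rnn(sequence[:3], zero_state))

# "Stateless": the state is reset before the second chunk
reset = run_rnn(sequence[3:], zero_state)

print(np.allclose(full, carried))
print(np.allclose(full, reset))
```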


In [None]:
network.compile(
    loss="mean_absolute_percentage_error",
    optimizer="adam"
)

In [None]:
network.summary()

### Training

**Training Parameters**


- **don't shuffle**: The order of data points matters, so keep the training data in order by passing `shuffle=False`
- **batch size: 1**: matches the batch size fixed in the layer's `batch_input_shape`
- **reset states**: When using a [stateful recurrent network](https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/), we want it to maintain state during an epoch (one pass through the training series), but not between passes. Therefore the state is reset manually after each epoch.


In [None]:
epochs = 10

In [None]:
for i in range(epochs):
    network.fit(
        X_train, 
        y_train,
        batch_size=1,
        shuffle=False,
        epochs=1
    )
    network.reset_states()

In [None]:
X_test[-1]

In [None]:
network.predict(X_test[-1].reshape(1, 1, w).astype("float"), batch_size=1)
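A single call to `predict` yields a one-step forecast. Recursive forecasting repeats that step and feeds each forecast back into the window. The sketch below shows the loop with a stand-in predictor, the mean of the window, in place of the network. `recursive_forecast` and `mean_of_window` are hypothetical names and not part of `forecast_lab`.

```python
import numpy as np


def recursive_forecast(predict_one_step, history, window, horizon):
    """Forecast `horizon` steps ahead by feeding each forecast back into the window."""
    window_values = list(history[-window:])
    forecasts = []
    for _ in range(horizon):
        next_value = float(predict_one_step(np.asarray(window_values, dtype=float)))
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new forecast
        window_values = window_values[1:] + [next_value]
    return np.array(forecasts)


def mean_of_window(window_values):
    """Stand-in one-step predictor: the mean of the current window."""
    return window_values.mean()


history = np.array([1.0, 2.0, 3.0, 4.0])
forecasts = recursive_forecast(mean_of_window, history, window=2, horizon=3)
print(forecasts)  # [3.5   3.75  3.625]
```

For the network above, the predictor could wrap the call to `network.predict`, reshaping the window to `(1, 1, w)` as in the preceding cell. Because every step builds on earlier forecasts, errors can accumulate over the horizon.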

### Wrapping the Network for Evaluation

We provide the `RNNWrapper` class to wrap the code above and make it easy to use the recurrent network in the `ForecastEvaluation`:

In [None]:
w = 50

In [None]:
lstm_forecasting = forecast_lab.RNNWrapper(
    estimator_class=keras.models.Sequential,
    estimator_params={
        "layers": [
            keras.layers.LSTM(
                batch_input_shape=(1, 1, w),
                units=w,
                stateful=True,
            ),
            keras.layers.Dense(units=1,  activation="linear"),
        ]
    },
    sliding_window_size=w,
    epochs=epochs
)

## Evaluation

In the following, a few recurrent neural networks are passed to the `ForecastEvaluation`. Let's see whether they do well.

In [None]:
metrics = {
    "MAPE": data_science_learning_paths.mlp.mean_absolute_percentage_error,
    "RMSE": data_science_learning_paths.mlp.root_mean_squared_error
}
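The metric functions in `data_science_learning_paths.mlp` are not shown here. The following sketch uses the standard textbook definitions of both metrics; whether the library version reports MAPE as a percentage or as a fraction is an assumption.

```python
import numpy as np


def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent (assumed convention): mean of |error / true value| times 100."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the mean squared error, in the units of the series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
mape = mean_absolute_percentage_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(round(mape, 2), round(rmse, 2))  # 6.67 12.91
```

MAPE divides by the true values, so it becomes unstable when they are zero or close to zero. This can happen for a series scaled to the range [0, 1], where the minimum maps to 0.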

In [None]:
train_window_size = 365
test_window_size = 60
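How `ForecastEvaluation` selects its evaluation segments is not documented in this notebook. One plausible scheme, sketched below purely as an assumption, is to take a fixed number of evenly spaced segments, each consisting of a training window followed directly by a test window. `evaluation_segments` and its parameter `k` are hypothetical.

```python
import numpy as np


def evaluation_segments(n_points, train_window_size, test_window_size, k):
    """Hypothetical scheme: k evenly spaced (train_slice, test_slice) pairs."""
    segment_size = train_window_size + test_window_size
    if segment_size > n_points:
        raise ValueError("series is shorter than one train+test segment")
    # Evenly spaced start positions, converted to plain Python ints
    starts = np.linspace(0, n_points - segment_size, num=k).astype(int).tolist()
    return [
        (
            slice(s, s + train_window_size),
            slice(s + train_window_size, s + segment_size),
        )
        for s in starts
    ]


segments = evaluation_segments(
    n_points=1000, train_window_size=365, test_window_size=60, k=3
)
for train, test in segments:
    print(train, test)
```

Averaging a metric over several such segments gives a more stable estimate than a single train/test split, at the cost of training the model once per segment.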

In [None]:
epochs = 30
w = 50

### LSTM Network

In [None]:
lstm_layers = [
    keras.layers.LSTM(
        batch_input_shape=(1, 1, w),
        units=128,
        stateful=True,
    ),
    keras.layers.Dense(units=1,  activation="linear"),
]

In [None]:
forecast_lab.ForecastEvaluation(
    ts=taxi_trips_scaled,
    forecasting=forecast_lab.RNNWrapper(
        estimator_class=keras.models.Sequential,
        estimator_params={
            "layers": lstm_layers
        },
        epochs=epochs,
        sliding_window_size=w
    ),
    train_window_size=train_window_size,
    test_window_size=test_window_size,
    metrics=metrics
).evaluate(
    k=3, 
    plot_segments=True,
    plot_residuals=True,
    plot_pulls=True
).get_metrics().mean()

### Simple RNN

In [None]:
rnn_layers = [
    keras.layers.SimpleRNN(
        batch_input_shape=(1, 1, w),
        units=128,
        stateful=True,
    ),
    keras.layers.Dense(units=1, activation="linear"),
]

In [None]:
forecast_lab.ForecastEvaluation(
    ts=taxi_trips_scaled,
    forecasting=forecast_lab.RNNWrapper(
        estimator_class=keras.models.Sequential,
        estimator_params={
            "layers": rnn_layers
        },
        epochs=epochs,
        sliding_window_size=w
    ),
    train_window_size=train_window_size,
    test_window_size=test_window_size,
    metrics=metrics
).evaluate(
    k=2, 
    plot_segments=True,
    plot_residuals=True,
    plot_pulls=True
).get_metrics().mean()

### GRU Network

In [None]:
gru_layers = [
    keras.layers.GRU(
        batch_input_shape=(1, 1, w),
        units=128,
        stateful=True,
    ),
    keras.layers.Dense(units=1,  activation="linear"),
]

In [None]:
forecast_lab.ForecastEvaluation(
    ts=taxi_trips_scaled,
    forecasting=forecast_lab.RNNWrapper(
        estimator_class=keras.models.Sequential,
        estimator_params={
            "layers": gru_layers,
        },
        epochs=epochs,
        sliding_window_size=w
    ),
    train_window_size=train_window_size,
    test_window_size=test_window_size,
    metrics=metrics
).evaluate(
    k=2, 
    plot_segments=True,
    plot_residuals=True,
    plot_pulls=True
).get_metrics().mean()

## Model Engineering Options

With neural networks we enter a vast space of possibilities for engineering better models. Parameters to experiment with include:
- network architecture: combine RNN layers with the full range of neural network architecture patterns (layers, activation functions...) 
- dropout: a technique to combat overfitting that is implemented in `keras` RNN layers
- [training parameters](https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/)
- preprocessing: experiment with [different scaling methods](https://stats.stackexchange.com/questions/7757/data-normalization-and-standardization-in-neural-networks)
- [stacking LSTM layers](https://machinelearningmastery.com/stacked-long-short-term-memory-networks/)
- [different forecasting strategies](https://machinelearningmastery.com/multi-step-time-series-forecasting/) (e.g. direct forecasting of multi-step sequence) 
- ...
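As an illustration of the direct multi-step strategy from the list above, the training targets can be built so that each window is labelled with the next `horizon` values instead of a single value. This is a minimal sketch; `make_direct_targets` is a hypothetical helper and not part of `forecast_lab`.

```python
import numpy as np


def make_direct_targets(values, window, horizon):
    """Label each window of `window` values with the `horizon` values that follow it."""
    values = np.asarray(values, dtype=float)
    n = len(values) - window - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([values[i : i + window] for i in range(n)])
    Y = np.stack([values[i + window : i + window + horizon] for i in range(n)])
    return X, Y


X_direct, Y_direct = make_direct_targets(np.arange(8), window=3, horizon=2)
print(X_direct.shape, Y_direct.shape)  # (4, 3) (4, 2)
print(X_direct[0], Y_direct[0])        # [0. 1. 2.] [3. 4.]
```

A network trained on such targets would end in an output layer with `horizon` units and produce the whole forecast in one pass, without feeding forecasts back into the input.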

## Exercise: Incorporating External Variables

We can use an approach similar to the one described in [**Forecasting with "Shallow" Learning**](../timeseries/mlts-forecasting-shallow.ipynb) to incorporate multivariate time series or other external variables into the training and forecasting of our model.

**Extend the class `forecast_lab.RNNWrapper` to enable passing the `ext_vars` parameter to the `fit` and `forecast` methods!**

## Summary

**Pros**

+ powerful learning algorithm that can in principle learn any time series pattern
+ extensible to multivariate time series: the expected input is a 3D tensor
+ vast space of model engineering options 

**Cons**

- vast space of model engineering options
- randomness: training results may vary


## References

- [Time Series Forecasting with the Long Short-Term Memory Network in Python](https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/)
- [Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras](https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/)

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_