## About Time Series Forecasting

In this chapter we introduce and define the problem of forecasting time series, discuss how to properly evaluate the performance of a forecasting model, and get to know some metrics to quantify the performance.

## Preamble

In [None]:
import pandas
import seaborn
import matplotlib.pyplot as plt
import numpy

In [None]:
import data_science_learning_paths

In [None]:
data_science_learning_paths.setup_plot_style()

## For Example: Forecasting Taxi Demand

Consider, for example, the problem of forecasting demand for taxi rides in a city. Based on [a public dataset from the City of Chicago](https://catalog.data.gov/dataset/taxi-trips), we have extracted a time series of the number of taxi trips per day. This time series has several realistic properties, such as seasonal (e.g. weekly) patterns and a non-trivial trend.

In [None]:
taxi_trips = data_science_learning_paths.datasets.read_chicago_taxi_trips_daily()

In [None]:
taxi_trips.head()

In [None]:
taxi_trips.plot()

In [None]:
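# select January 2013 via partial string indexing on the datetime index
# (row selection with plain [] on a DataFrame no longer works in recent pandas versions)
taxi_trips.loc["2013-01"].plot()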

In the following, we are going to have a look at various statistical modelling and machine learning techniques that can be applied to forecasting a time series like this.

## Forecasting Essentials

**A Forecast's Horizon**

An important question to ask before building a forecast model: How far into the future do we need to look? The number of steps to forecast is called the forecast's **horizon**.


A true forecasting model can **predict a time series $h$ steps ahead**, for the desired number of steps $h$. (This is a more difficult task than **one-step-ahead** prediction: given the previous $k$ points of the time series, predict the next value. That task can be solved by the supervised ML methods we already know at this point.)


**Recursive and Direct Forecasting**


We need to distinguish two different approaches to forecasting $h$ steps ahead:
- **recursive forecasting**: The model can predict one step ahead, and we apply it repeatedly to its own predictions to forecast $h$ steps ahead.
- **direct forecasting**: The model can directly predict the next $h$ steps without recursively using its own predictions as input.
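
To make the difference concrete, here is a minimal sketch in plain Python. It assumes a deliberately simple one-step model (a straight-line fit of a later value on the current value). The helper names `fit_line`, `recursive_forecast` and `direct_forecast` are made up for illustration and are not part of any library used in this chapter.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recursive_forecast(series, h):
    """Fit ONE one-step-ahead model and feed its predictions back in."""
    intercept, slope = fit_line(series[:-1], series[1:])
    forecast, last = [], series[-1]
    for _ in range(h):
        last = intercept + slope * last  # the prediction becomes the next input
        forecast.append(last)
    return forecast


def direct_forecast(series, h):
    """Fit a SEPARATE model per step 1..h; each uses only observed values.

    Assumes len(series) >= h + 2 so every per-step fit has at least two points.
    """
    forecast = []
    for step in range(1, h + 1):
        # pairs (y_t, y_{t+step}) taken from the observed series
        intercept, slope = fit_line(series[:-step], series[step:])
        forecast.append(intercept + slope * series[-1])
    return forecast


# toy series that doubles at every step
series = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
recursive = recursive_forecast(series, h=3)
direct = direct_forecast(series, h=3)
```

On this noise-free toy series both approaches continue the doubling pattern. On real data they generally differ: recursive forecasting accumulates its own errors over the horizon, while direct forecasting needs one model per step.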



## Evaluating Forecasting Models

How good is the performance of my forecasting model? How do I set up an evaluation in order to produce a model that works in practice? This section introduces:

- **error metrics for time series forecasts**
- **splitting into training and test data**

### Metrics

In general, a performance/error metric is a function $M$ that takes the actual values of the time series $y$ and the corresponding forecasted values $\hat{y}$ and returns a single number.

$$M(y, \hat{y}) = \dots$$

Measuring the error in time series forecasting is in many ways similar to how we do it with classical **regression** problems, so let's revisit the error metrics discussed in [📓 Machine Learning with Python: About Regression](../ml/ml-regression-intro.ipynb).

- **Mean Absolute Error (MAE)**
- **Root Mean Squared Error (RMSE)**
- **$R^2$ score**
- **Mean Absolute Percentage Error (MAPE)**
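
As a reminder, the four metrics can be written down in a few lines. This is a plain-Python sketch with illustrative function names (`mae`, `rmse`, `r2`, `mape`); for real work a tested library implementation is preferable. Note that MAPE is undefined when an actual value is zero.

```python
import math


def mae(y, y_hat):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root Mean Squared Error: penalizes large errors more strongly."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    """R^2 score: 1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def mape(y, y_hat):
    """Mean Absolute Percentage Error in percent; requires non-zero actuals."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(y, y_hat)) / len(y)


y = [100, 200, 300, 400]      # actual values
y_hat = [110, 190, 330, 380]  # forecasted values

print(mae(y, y_hat))   # 17.5
print(rmse(y, y_hat))  # about 19.36
print(r2(y, y_hat))    # about 0.97
print(mape(y, y_hat))  # about 7.5
```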


### Business Case-Specific Metrics

While the general error metrics above are widely applicable, a metric specific to your business case is often more appropriate and easier to interpret. It pays to spend some time on designing an appropriate performance metric: perhaps the error can be expressed in monetary terms, connected to an important KPI, etc.

**Exercise: Pick a real-life forecasting problem, then brainstorm and discuss specific metrics that could be relevant!**

### Splitting the Data for Evaluation

At this point we assume you already know about evaluation strategies like **train-test-split** and **cross-validation** and why they are necessary. You can read up on this in [📓 Machine Learning with Python: About Classification](../ml/ml-classification-intro.ipynb).

When dealing with time series, we have to approach things somewhat differently: randomly shuffling and splitting the data points does not make sense, because the model would then be trained on observations that lie after the ones it is tested on. Rather, we want to use a past segment of the time series to predict a future segment.

How large should these segments be? This is very much dependent on our application and use case. How far into the future do we need to look to make good decisions for our business case? What is more important - short-term or long-term accuracy? 
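
A minimal sketch of such a split, assuming the series is held in a plain Python list (the same slicing works on a pandas Series via `.iloc`). The function name `temporal_split` and its parameters are made up for illustration.

```python
def temporal_split(series, start, train_size, test_size):
    """Return (train, test): a past segment and the adjacent future segment."""
    split = start + train_size
    end = split + test_size
    if start < 0 or end > len(series):
        raise ValueError("segments do not fit into the series")
    # no shuffling: the test segment starts exactly where the training segment ends
    return series[start:split], series[split:end]


series = list(range(20))  # stand-in for 20 consecutive daily observations
train, test = temporal_split(series, start=5, train_size=10, test_size=3)
print(train)  # observations 5 to 14
print(test)   # observations 15 to 17
```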

#### A Helper Class for Forecast Evaluation

We have prepared some code to make the evaluation of forecast performance more convenient. It is free software on [GitHub](https://github.com/point8/forecast-lab), so feel free to contribute!

The `ForecastEvaluation` class implements a couple of training and evaluation strategies. Here, we use it to:

1. Perform evaluation similar to _cross-validation_: Pick a random position in the time series and cut out a training segment and an adjacent test segment of the given sizes. Fit the model to the training segment and forecast the test segment.
2. Evaluate the performance of the forecast through the given metrics.
3. Plot the forecast and diagnostic information.
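
The first two steps can be mimicked in plain Python as follows. This is only an illustration of the idea, not the `ForecastEvaluation` implementation; the names `evaluate_forecast`, `mean_forecast` and `mae` are hypothetical, and the "model" is simply the mean of the training segment.

```python
import random


def mean_forecast(train, h):
    """Toy model: forecast the training mean for every future step."""
    mean = sum(train) / len(train)
    return [mean] * h


def mae(y, y_hat):
    """Mean absolute error between actual and forecasted values."""
    return sum(abs(a - f) for a, f in zip(y, y_hat)) / len(y)


def evaluate_forecast(series, fit_and_forecast, metric,
                      train_size, test_size, k, seed=0):
    """Score a forecasting function on k randomly placed train/test windows."""
    rng = random.Random(seed)  # fixed seed keeps the evaluation reproducible
    last_start = len(series) - train_size - test_size
    scores = []
    for _ in range(k):
        start = rng.randint(0, last_start)  # both bounds are inclusive
        split = start + train_size
        train = series[start:split]
        test = series[split:split + test_size]  # adjacent future segment
        forecast = fit_and_forecast(train, len(test))
        scores.append(metric(test, forecast))
    return scores


# toy series with a weekly pattern
series = [float(day % 7) for day in range(100)]
scores = evaluate_forecast(series, mean_forecast, mae,
                           train_size=28, test_size=7, k=3)
print(sum(scores) / len(scores))  # average error over the k windows
```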

In [None]:
import forecast_lab

In [None]:
metrics = {
    "RMSE": data_science_learning_paths.mlp.root_mean_squared_error,
    "MAPE": data_science_learning_paths.mlp.mean_absolute_percentage_error,
}

The `forecast_lab` module also provides a few wrapper classes so that we can evaluate forecasts made with different approaches and libraries. For example, the `StatsmodelsWrapper` is used to package a `statsmodels`-style time series model. Note the parameters:

- `estimator_params`: supplied to the constructor of the `estimator_class`
- `fit_params`: supplied to the `fit` method call

In [None]:
import statsmodels.api as sm

In [None]:
try:
    forecast_lab.ForecastEvaluation(
        ts=taxi_trips["Trips"],
        metrics=metrics,
        forecasting=forecast_lab.StatsmodelsWrapper(
            estimator_class=sm.tsa.ARIMA,
            estimator_params={
                "order": (4,1,2)
            },
            fit_params={
                "max_iter": 10
            },
        ),
        train_window_size=365,
        test_window_size=60,
    ).evaluate(
        k=2,
        plot_segments=True,
        plot_residuals=True,
        plot_pulls=True
    ).get_metrics().mean()
except numpy.linalg.LinAlgError:  # LinAlgError lives in numpy.linalg and was not imported by name
    print("The ARIMA model did not converge")

## Dummy Models

Before spending time on a sophisticated forecast based on statistical modelling and machine learning, we should be able to answer one question: how does it compare to simple, "trivial" forecasts? Only significantly better performance justifies the engineering that goes into an ML-based solution.

Here are implementations of a few **dummy models** that are really simple, but can be surprisingly hard to beat with more advanced techniques.

In [None]:
forecast_lab.dummy.MeanForecast??

In [None]:
forecast_lab.dummy.LinearForecast??
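
For readers without the library at hand, the two ideas can be sketched in plain Python. This is an assumption about what such dummy models typically do (forecast the training mean, or extrapolate a straight line fitted over time), not the actual `forecast_lab.dummy` source; the function names are illustrative.

```python
def mean_forecast(train, h):
    """Forecast the mean of the training segment for all h future steps."""
    mean = sum(train) / len(train)
    return [mean] * h


def linear_forecast(train, h):
    """Fit a straight line over the time index and extrapolate it h steps."""
    n = len(train)
    mean_t = (n - 1) / 2  # mean of the time index 0, 1, ..., n-1
    mean_y = sum(train) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(train))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return [intercept + slope * t for t in range(n, n + h)]


print(mean_forecast([1.0, 2.0, 3.0], h=2))         # [2.0, 2.0]
print(linear_forecast([1.0, 2.0, 3.0, 4.0], h=2))  # [5.0, 6.0]
```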

In [None]:
forecast_lab.ForecastEvaluation(
    ts=taxi_trips["Trips"],
    metrics=metrics,
    forecasting=forecast_lab.dummy.MeanForecast(),
    train_window_size=365,
    test_window_size=100,
).evaluate(
    k=2,
    plot_segments=True,
    plot_residuals=True
).get_metrics()

In [None]:
forecast_lab.ForecastEvaluation(
    ts=taxi_trips["Trips"],
    metrics=metrics,
    forecasting=forecast_lab.dummy.LinearForecast(),
    train_window_size=365,
    test_window_size=100,
).evaluate(
    k=2,
    plot_segments=True,
    plot_residuals=True
).get_metrics().mean()

## References

- [Forecasting - Metrics for Time Series Forecasts](https://www.edscave.com/forecasting---time-series-metrics.html)
- [Recursive and Direct Forecasting](https://stats.stackexchange.com/questions/346714/forecasting-several-periods-with-machine-learning)
- [Simple Forecast Methods](https://otexts.com/fpp2/simple-methods.html)

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_