# Classical Time Series Forecasting Models

In this chapter we learn about so-called _classical_ time series models that statisticians have developed to model and forecast time series data.

## Is this Machine Learning or Statistics?

That's a trick question: In many ways, machine learning is just conveniently automated statistical modelling. Let's adopt the following precise definition of machine learning: **Machine learning is when a computer program improves its performance with experience**, i.e. by seeing data points/examples and using them to build a good model for the task at hand. This definition is rather broad, and not only encompasses fancy methods like neural networks, but also quite simple ones, like iteratively fitting a regression line to a set of points in 2D.

In this chapter we are going to introduce some **classical time series models** that can be used for generating a time series, and therefore, for **forecasting**. In a broad sense, this is also machine learning: To achieve good forecasting performance, such a classical model also needs to be fitted to the data of a time series to estimate its parameters. Why then do data scientists often make a distinction between _classical time series modelling_ and _machine learning on time series_? There are historical reasons, with ML being considered more modern. But we also note some technical differences between the approaches:


| classical statistical approach | machine learning approach |
|--------------------------------|---------------------------|
| careful statistical modeling: theory, preconditions/assumptions, explainability... | whatever works: focus on performance |
| focus on univariate models | more open to incorporating multivariate time series or external variables |
| often linear models | more algorithmic variety, nonlinear models |
| stochastic process perspective | anything goes... |

Common to all classical time series models is the view that **time series values are the result of a stochastic process**: A **stochastic process** is any system that changes over time, with randomness involved, and outputs a time series of values. The goal is to estimate a good model of the stochastic process that has generated the time series data we see.
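
To make the stochastic process view concrete, the following minimal sketch simulates a simple process in which each value depends on the previous value plus random noise. The helper name `simulate_process`, the coefficient `0.8` and the noise scale are illustrative assumptions and are not taken from the dataset or libraries used in this chapter.

```python
import numpy


def simulate_process(n_steps, coefficient=0.8, noise_scale=1.0, seed=None):
    """Simulate y_t = coefficient * y_{t-1} + noise_t (hypothetical helper)."""
    rng = numpy.random.default_rng(seed)
    noise = rng.normal(0.0, noise_scale, size=n_steps)
    y = numpy.zeros(n_steps)
    for t in range(1, n_steps):
        y[t] = coefficient * y[t - 1] + noise[t]
    return y


# The same process yields a different time series for every seed:
# each series is one realization of the underlying process.
realizations = [simulate_process(200, seed=seed) for seed in range(3)]
for i, y in enumerate(realizations):
    print(f"realization {i}: mean={y.mean():.2f}, std={y.std():.2f}")
```

Fitting a classical model means going in the opposite direction: given one observed realization, we estimate the parameters of the process (here `coefficient` and `noise_scale`).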

## Preamble

In [None]:
import matplotlib.pyplot as plt
import numpy
import pandas
import seaborn

In [None]:
import data_science_learning_paths

In [None]:
data_science_learning_paths.setup_plot_style()

In [None]:
# Register the pandas date/time converters with matplotlib for plotting
pandas.plotting.register_matplotlib_converters()

## Example: Climate Data Prediction

This is a dataset of the average monthly temperature in the USA over more than one century.

In [None]:
usa_temp = data_science_learning_paths.datasets.read_usa_temperature()

In [None]:
usa_temp["Value"].plot()

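We use `statsmodels` to fit a model of the **ARMA** type to the data. Note that the `sm.tsa.ARMA` class used below is only available in `statsmodels` versions before 0.13; later versions provide `sm.tsa.ARIMA` with `order=(p, 0, q)` instead.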

In [None]:
import statsmodels.api as sm

In [None]:
estimator = sm.tsa.ARMA(
    usa_temp["Value"],
    order=(12,1)
)
model = estimator.fit(maxiter=1e3)

The `summary` method outputs a large amount of diagnostic information about the model parameters:

In [None]:
model.summary()

The `plot_predict` method shows actuals and one-step-ahead forecasts by the model. What we see is a first indication that the model has the ability to predict the correct values. However, this is not yet a proper multi-step forecast.

In [None]:
model.plot_predict(start=0, end=10 * 12);
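
The difference between one-step-ahead and multi-step forecasts can be sketched without any fitting. The example below assumes a synthetic process `y_t = phi * y_{t-1} + eps_t` whose coefficient `phi` is simply taken as known; all names and values are illustrative.

```python
import numpy

rng = numpy.random.default_rng(0)
phi = 0.9  # assumed known coefficient, for illustration only
n, horizon = 300, 50

# Simulate y_t = phi * y_{t-1} + eps_t
eps = rng.normal(size=n)
y = numpy.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

train, test = y[:-horizon], y[-horizon:]

# One-step-ahead: every prediction uses the actual previous observation.
previous_actuals = y[-horizon - 1:-1]
one_step = phi * previous_actuals

# Multi-step: only the last training value is known, so each prediction
# builds on the previous prediction instead of on an actual value.
multi_step = train[-1] * phi ** numpy.arange(1, horizon + 1)

print("one-step MAE:  ", numpy.abs(test - one_step).mean())
print("multi-step MAE:", numpy.abs(test - multi_step).mean())
```

One-step-ahead errors consist only of the new noise term, while multi-step forecasts accumulate uncertainty and decay towards the process mean, so their error is typically larger.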

For a true forecast, we fit to a prefix of the time series, call the `forecast` method with the length of the remainder as horizon and compare actuals to predictions:

In [None]:
estimator = sm.tsa.ARMA(
    usa_temp["Value"][:"1999"],  # train up to the end of 1999; the test set starts in 2000
    order=(14,1)
)
model = estimator.fit(maxiter=1e3)

In [None]:
y_test = usa_temp["Value"]["2000":]

In [None]:
y_test.shape

The `forecast` method returns the point forecasts together with their standard errors and confidence intervals:

In [None]:
y_f, err, conf_int = model.forecast(steps=len(y_test))

In [None]:
pandas.DataFrame(
    {
        "forecast": y_f,
        "conf_low": conf_int[:, 0],
        "conf_high": conf_int[:, 1],
        "actual": y_test
    }
).plot()

## Stationarity

A stochastic process is **stationary** if its **probability distribution does not change over time**. Consequently, parameters such as mean and variance also do not change over time.

_An ARMA model has stationarity as a precondition._
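
As a rough illustration of what stationarity means in practice, the following sketch compares mean and variance of the first and second half of two synthetic series: white noise (stationary) and a random walk (non-stationary). The helper `half_statistics` is a hypothetical name introduced only for this example.

```python
import numpy

rng = numpy.random.default_rng(42)
n = 2000

white_noise = rng.normal(size=n)         # constant mean and variance
random_walk = numpy.cumsum(white_noise)  # variance grows with time


def half_statistics(series):
    """Return (mean, variance) of the first and of the second half of a series."""
    first, second = numpy.array_split(series, 2)
    return (first.mean(), first.var()), (second.mean(), second.var())


for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    (m1, v1), (m2, v2) = half_statistics(series)
    print(f"{name}: mean {m1:.2f} -> {m2:.2f}, variance {v1:.2f} -> {v2:.2f}")
```

For the white noise series both halves show nearly the same statistics, whereas for the random walk they usually differ clearly. Such a comparison is only a heuristic, not a statistical test.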

### Testing for Stationarity

Proving stationarity is not trivial. However, we can use statistical testing to do a quick check on whether a time series should be assumed to be stationary. 

One such test is the **Augmented Dickey-Fuller Test**. It can be understood as a **hypothesis test** about whether the time series is defined by a time-dependent structure, such as a trend:

- H0 (null hypothesis): The time series has a unit root, i.e. it is non-stationary.
- H1 (alternative hypothesis): The time series is stationary.


In [None]:
from statsmodels.tsa.stattools import adfuller

In [None]:
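def show_adfuller_result(result):
    """Print test statistic, p-value and critical values of an `adfuller` result."""
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))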

In [None]:
show_adfuller_result(
    adfuller(usa_temp["Value"])
)

How to read the output:

1. Check whether the ADF statistic is **lower than the critical value at the desired significance level**. (The **significance level** is the probability of rejecting the null hypothesis although it is actually true.)
2. If it is, we can **reject the null hypothesis**, i.e. we can assume the time series to be stationary. The reported p-value offers the same check: a p-value below the significance level also lets us reject the null hypothesis.

## Model (Hyper)Parameters

The [**Autoregressive Moving Average (ARMA) model**](https://en.m.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model) describes a _stationary_ stochastic process in terms of two polynomials:

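**Moving Averages - $MA(q)$**

$$MA(q): y_t = \epsilon_t + b_1 \epsilon_{t-1} + \dots + b_q \epsilon_{t-q}$$

**Autoregressive Process - $AR(p)$**

$$AR(p): y_t = \sum_{i=1}^p a_i y_{t-i} + \epsilon_t$$

**ARMA - $ARMA(p,q)$**

$$ARMA(p,q): y_t = \sum_{i=1}^p a_i y_{t-i} + \epsilon_t + \sum_{j=1}^q b_j \epsilon_{t-j}$$

Here $\epsilon_t$ is a white noise term, $a_i$ are the autoregressive coefficients and $b_j$ are the moving average coefficients.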
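
The ARMA recursion can be written down directly in code. The following is a minimal sketch, not the `statsmodels` implementation: `simulate_arma` is a hypothetical helper, values before the start of the series are taken as zero, and the coefficients are chosen arbitrarily.

```python
import numpy


def simulate_arma(ar_coeffs, ma_coeffs, n_samples, sigma=1.0, seed=None):
    """Simulate y_t = sum_i a_i y_{t-i} + eps_t + sum_j b_j eps_{t-j}.

    Hypothetical helper; values before the start of the series count as zero.
    """
    rng = numpy.random.default_rng(seed)
    p, q = len(ar_coeffs), len(ma_coeffs)
    eps = rng.normal(0.0, sigma, size=n_samples)
    y = numpy.zeros(n_samples)
    for t in range(n_samples):
        ar_part = sum(ar_coeffs[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma_part = sum(ma_coeffs[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y[t] = ar_part + eps[t] + ma_part
    return y, eps


# ARMA(2, 1) with AR coefficients 0.6 and -0.3 and MA coefficient 0.4
y, eps = simulate_arma([0.6, -0.3], [0.4], n_samples=500, seed=0)
print(y[:5])
```

Passing an empty list for `ar_coeffs` gives a pure $MA(q)$ process, and an empty list for `ma_coeffs` gives a pure $AR(p)$ process.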

In order to estimate the hyperparameters $p$ and $q$, called the **order** of the model, two strategies present themselves:

1. estimate the order of the model manually by statistical means
2. use parameter search and raw compute power to select the best performing model
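
The second strategy can be sketched as a simple search loop. To stay self-contained, the example below searches only over the autoregressive order $p$, fits each candidate by ordinary least squares with `numpy`, and scores it by one-step-ahead error on held-out data. The helper names, the synthetic AR(2) data and the candidate range are assumptions for illustration; a full search would also vary $q$.

```python
import numpy


def fit_ar_least_squares(y, p):
    """Estimate AR(p) coefficients a_1..a_p by ordinary least squares."""
    # The row for time t holds the lagged values y_{t-1}, ..., y_{t-p}.
    X = numpy.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    coeffs, *_ = numpy.linalg.lstsq(X, y[p:], rcond=None)
    return coeffs


def one_step_mse(y, coeffs, start):
    """Mean squared one-step-ahead error on y[start:]."""
    p = len(coeffs)
    predictions = [coeffs @ y[t - p:t][::-1] for t in range(start, len(y))]
    return numpy.mean((y[start:] - numpy.array(predictions)) ** 2)


# Synthetic AR(2) data: y_t = 0.5 y_{t-1} - 0.6 y_{t-2} + eps_t
rng = numpy.random.default_rng(0)
n = 1500
eps = rng.normal(size=n)
y = numpy.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.6 * y[t - 2] + eps[t]

split = 1000  # fit on y[:split], evaluate on y[split:]
candidate_orders = range(1, 7)
scores = {}
for p in candidate_orders:
    coeffs = fit_ar_least_squares(y[:split], p)
    scores[p] = one_step_mse(y, coeffs, start=split)

best_p = min(scores, key=scores.get)
print({p: round(float(s), 3) for p, s in scores.items()}, "best order:", best_p)
```

Scoring on held-out data rather than on the training data guards against always preferring the largest order.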

**1. estimating the order of the model**

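Plotting the **autocorrelation function** (ACF) and the **partial autocorrelation function** (PACF) provides information on choosing $q$ and $p$ respectively: for a pure $MA(q)$ process the ACF drops to about zero after lag $q$, and for a pure $AR(p)$ process the PACF drops to about zero after lag $p$. [More information here](https://people.duke.edu/~rnau/411arim3.htm).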

In [None]:
from statsmodels.graphics.tsaplots import plot_pacf, plot_acf

In [None]:
plot_pacf(
    usa_temp["Value"],
    lags=50,
);

In [None]:
plot_acf(
    usa_temp["Value"],
    lags=50,
);
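
How the autocorrelation function hints at the moving average order can be checked on a synthetic series with known order. The sketch below computes the sample autocorrelation with plain `numpy`; `sample_acf` is a hypothetical helper and the MA(1) coefficient `0.8` is chosen arbitrarily.

```python
import numpy


def sample_acf(y, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    y = numpy.asarray(y, dtype=float) - numpy.mean(y)
    denominator = numpy.sum(y ** 2)
    return numpy.array(
        [numpy.sum(y[lag:] * y[:len(y) - lag]) / denominator for lag in range(max_lag + 1)]
    )


rng = numpy.random.default_rng(0)
eps = rng.normal(size=20_000)
# MA(1) process: y_t = eps_t + 0.8 * eps_{t-1}
y = eps[1:] + 0.8 * eps[:-1]

acf = sample_acf(y, max_lag=5)
print(numpy.round(acf, 2))
```

For this process the theoretical autocorrelation at lag 1 is $0.8 / (1 + 0.8^2) \approx 0.49$ and zero for all higher lags, so a clear value at lag 1 followed by values near zero suggests $q = 1$.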

## Generating Synthetic Time Series

Stochastic process models like ARMA can also be used for generating time series data from scratch by specifying the coefficients of the model.

In [None]:
from statsmodels.tsa.arima_process import arma_generate_sample

In [None]:
ar_coeff = numpy.random.uniform(-1, 1, 4)
ar_coeff

In [None]:
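# `ar` and `ma` are lag-polynomial coefficients including the zero lag,
# so the AR coefficients enter with a flipped sign:
# ar=[1, -1] means y_t = y_{t-1} + eps_t, i.e. a (non-stationary) random walk.
y = arma_generate_sample(
    ar=[1, -1],
    ma=[1],
    sigma=.2,  # noise standard deviation (named `scale` in newer statsmodels versions)
    nsample=int(1e4)
)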

In [None]:
plt.plot(y)

In [None]:
plot_pacf(
    y,
    lags=50,
);

## Non-Stationary Time Series

If a time series is not stationary, there are two main options:

- **stationarize** the time series, e.g. by subtracting the trend component and adding it back to the forecasts later
- use models that can deal with non-stationary data
    - e.g. [**ARIMA**](https://en.m.wikipedia.org/wiki/Autoregressive_integrated_moving_average), which has one additional hyperparameter: the order of differencing $d$
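
The first option, removing a trend and adding it back later, can be sketched as follows. The data is synthetic (a linear trend plus noise), and the forecast of the stationary remainder is simply its mean, which stands in for a fitted ARMA model; all names and values are illustrative assumptions.

```python
import numpy

rng = numpy.random.default_rng(0)
t = numpy.arange(240)
# Synthetic series: linear trend plus stationary noise (illustrative data only)
y = 0.05 * t + rng.normal(scale=0.5, size=t.size)

train_t, test_t = t[:200], t[200:]
y_train, y_test = y[:200], y[200:]

# 1. Estimate the trend on the training data and remove it.
slope, intercept = numpy.polyfit(train_t, y_train, deg=1)
residual_train = y_train - (slope * train_t + intercept)

# 2. Forecast the stationary remainder. Its mean is used here as a
#    placeholder for the forecast of a fitted ARMA model.
residual_forecast = numpy.full(test_t.size, residual_train.mean())

# 3. Add the extrapolated trend back to obtain the final forecast.
forecast = residual_forecast + slope * test_t + intercept

print("MAE with trend handling:", numpy.abs(y_test - forecast).mean())
print("MAE of naive mean:      ", numpy.abs(y_test - y_train.mean()).mean())
```

Differencing, as used by ARIMA, follows the same pattern: the model is fitted to the differences of the series, and forecasts are summed up again to return to the original scale.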

## Summary

**Pros**

- rich theory: statistical motivation and explainability
- error and confidence interval estimation

**Cons**

- rich theory
- manual estimation of hyperparameters (model order)
- compute time for fitting increases strongly with model order
- adding external variables is not straightforward

## References

- [Machine Learning Mastery: How to Check if Time Series Data is Stationary with Python](https://machinelearningmastery.com/time-series-data-stationary-python/)
- [O'Reilly: Machine Learning for Time Series Data Analysis—Best Practices in Prediction and Anomaly Detection Using Python](https://learning.oreilly.com/learning-paths/learning-path-machine/9781492025528/9781492025504-video318126)

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_