# Classical Forecasting Methods: ARIMA

**Approximate Learning Time**: Up to 3 hours 

--- 

In this module, we will explore the following classical time series forecasting methods:

- **Auto-Regressive Integrated Moving Average (ARIMA) Models:** Covered in Notebook 3.1.
- **Holt-Winters Model:** An approach to exponential smoothing for time series forecasting, covered in Notebook 3.2.
- **Vector Auto-Regression (VAR):** A method for modeling multiple time series simultaneously, covered in Notebook 3.3.

We will follow a standard workflow that includes:
- Splitting the data into training, validation, and test subsets.
- Fitting the models on a predefined set of hyperparameters.
- Conducting a hyperparameter search to find the best-performing model.
- Evaluating test metrics using the best model.

---

## SARIMA Model


**Autoregressive (AR) models** assume that future values are a linear function of past values. The order of the AR model, denoted by $p$, is the number of past values used to predict future values. The residual error, $e_t$, is assumed to be white noise.

**Moving Average (MA) models** assume that future values can be modeled using past error terms ($e_t$). The term "moving average" here refers to a weighted sum of past errors, not to a rolling average over the data. The error terms are not observed directly; they are estimated together with the coefficients when the model is fitted (for example, by maximum likelihood), so an MA model can be used on its own, although in practice it is often combined with AR terms. The order of an MA model, denoted by $q$, is the number of past error terms used.
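To make the "weighted sum of past errors" concrete, here is a minimal sketch that simulates an MA(1) process with NumPy and checks a textbook property: the autocorrelation is non-zero at lag 1 and close to zero at larger lags. The helper names `simulate_ma1` and `autocorr`, the coefficient `theta`, and the sample size are illustrative assumptions, not part of this tutorial's `utils` module.

```python
import numpy as np

def simulate_ma1(theta, n, seed=0):
    """Simulate y_t = e_t + theta * e_{t-1} with standard normal white noise e_t."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + 1)  # one extra draw so that e_{t-1} exists for the first step
    return e[1:] + theta * e[:-1]

def autocorr(y, lag):
    """Sample autocorrelation of y at a positive lag."""
    y = y - y.mean()
    return float(np.dot(y[lag:], y[:-lag]) / np.dot(y, y))

theta = 0.6  # hypothetical MA coefficient
y = simulate_ma1(theta, n=20_000)

# For an MA(1) process the theoretical lag-1 autocorrelation is theta / (1 + theta^2),
# and the autocorrelation at every lag above 1 is zero.
print(f"lag-1 autocorrelation: {autocorr(y, 1):.3f} (theory: {theta / (1 + theta**2):.3f})")
print(f"lag-2 autocorrelation: {autocorr(y, 2):.3f} (theory: 0.000)")
```

The sample values will differ slightly from theory because the series is finite.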

**ARMA (Autoregressive Moving Average) models** combine the AR and MA approaches, modeling future values as a linear combination of both past values and past error terms. The order of an ARMA model is given by $(p, q)$, where $p$ is the AR order and $q$ is the MA order.
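In equation form, the three models can be written as follows. The constant $c$ and the coefficients $\phi_i$ (AR) and $\theta_j$ (MA) are symbols introduced here for illustration, and sign conventions for the MA terms vary between textbooks and libraries.

$$
\begin{aligned}
\text{AR}(p):\quad & y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + e_t \\
\text{MA}(q):\quad & y_t = c + e_t + \sum_{j=1}^{q} \theta_j\, e_{t-j} \\
\text{ARMA}(p, q):\quad & y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + e_t + \sum_{j=1}^{q} \theta_j\, e_{t-j}
\end{aligned}
$$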

The core assumption of AR and ARMA models is that the underlying time series is stationary. If a series is non-stationary, it can often be transformed to achieve stationarity, commonly by **differencing**: replacing each value with its difference from the previous value. When differencing is incorporated into an ARMA model, the result is an **ARIMA (Autoregressive Integrated Moving Average)** model, where:
- $p$ is the AR order,
- $d$ is the number of differencing steps,
- $q$ is the MA order.

The order of an ARIMA model is specified as $(p, d, q)$.
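As a rough illustration of differencing with $d = 1$, the sketch below builds a random walk (a standard example of a non-stationary series), differences it with `np.diff`, and then undoes the differencing with a cumulative sum. The series is synthetic and the variable names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(42)

# A random walk is non-stationary: it is the running sum of white-noise steps.
steps = rng.standard_normal(500)
random_walk = np.cumsum(steps)

# First-order differencing (d = 1): y'_t = y_t - y_{t-1}.
differenced = np.diff(random_walk, n=1)

# Differencing can be inverted if the first observation is kept.
restored = np.concatenate(
    ([random_walk[0]], random_walk[0] + np.cumsum(differenced))
)

print(len(random_walk), len(differenced))  # differencing drops one observation
print(np.allclose(restored, random_walk))
```

The differenced series is exactly the sequence of white-noise steps (apart from the first one), which is stationary. This is why an ARIMA model can forecast the differenced series and then integrate the forecasts back to the original scale.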


**SARIMA (Seasonal ARIMA)** explicitly accounts for seasonality in time series data. Seasonality refers to patterns that repeat after a fixed time period, or "season." Modeling it explicitly can improve forecasts because dependence on values one season back is captured with a few seasonal terms rather than with a high non-seasonal order.

SARIMA adds seasonal terms to the ARIMA model, introducing seasonal parameters $(P, D, Q, S)$ where:
- $P$ is the seasonal autoregressive order,
- $D$ is the seasonal differencing order,
- $Q$ is the seasonal moving average order,
- $S$ is the length of the seasonal cycle (e.g., number of time steps in a season).

For example, $(0, 1, 0, 4)$ describes a season of 4 time steps with one round of seasonal differencing (each value minus the value 4 time steps earlier) and no seasonal AR or MA terms.
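A minimal sketch of seasonal differencing with $S = 4$ and $D = 1$ follows. The series is synthetic: a hypothetical four-step pattern repeated several times plus a slow linear drift.

```python
import numpy as np

S = 4                                        # season length, as in (0, 1, 0, 4)
pattern = np.array([10.0, -5.0, 3.0, 7.0])   # hypothetical repeating seasonal pattern
n_seasons = 6
trend = 0.5 * np.arange(S * n_seasons)       # slow linear drift
y = np.tile(pattern, n_seasons) + trend

# Seasonal differencing with D = 1: z_t = y_t - y_{t-S}.
z = y[S:] - y[:-S]
print(z)
```

The repeating pattern cancels out, leaving only the change accumulated by the drift over one season (here a constant `0.5 * 4`). The first `S` observations are lost in the process.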


**Note**: The **ARIMA model** is designed for univariate time series, meaning it models one time series at a time. Therefore, we will build individual univariate models for each of the time series in our dataset. To handle **multivariate time series**, where multiple variables are modeled together, we will introduce the **Vector Autoregression (VAR)** approach later in this module.

---

Let's load the log daily returns of exchange rates, and split the data into train, validation, and test subsets!

In [None]:
import pathlib
import numpy as np
import itertools
import pandas as pd
from tqdm.notebook import tqdm

from statsmodels.tsa.statespace.sarimax import SARIMAX

# To avoid flooding of the screen with convergence warnings during hyperparameter tuning
import warnings
warnings.filterwarnings("ignore")

from termcolor import colored
import sys; sys.path.append("../")  # make the shared `utils` module importable
import utils

# WARNING: To compare different models on the same horizons, keep these values the same across the notebooks
FORECASTING_HORIZON = [4, 8, 12] # weeks 
MAX_FORECASTING_HORIZON = max(FORECASTING_HORIZON)

SEQUENCE_LENGTH = 2 * MAX_FORECASTING_HORIZON
PREDICTION_LENGTH = MAX_FORECASTING_HORIZON

DIRECTORY_PATH_TO_SAVE_RESULTS = pathlib.Path('../results/DIY/').resolve()
MODEL_NAME = "ARIMA"

RESULTS_DIRECTORY = DIRECTORY_PATH_TO_SAVE_RESULTS / MODEL_NAME
if RESULTS_DIRECTORY.exists():
    print(colored(f'Directory {str(RESULTS_DIRECTORY)} already exists.'
           '\nThis notebook will overwrite results in the same directory.'
           '\nYou can also create a new directory if you want to keep this directory untouched.'
           ' Just change the `MODEL_NAME` in this notebook.\n', "red" ))
else:
    RESULTS_DIRECTORY.mkdir(parents=True)

# load data
data, transformed_data = utils.load_tutotrial_data(dataset='exchange_rate', log_transform=True)
data = transformed_data

## DATA SPLITTING
train_val_data = data.iloc[:-MAX_FORECASTING_HORIZON]
train_data, val_data = train_val_data.iloc[:-MAX_FORECASTING_HORIZON], train_val_data.iloc[-MAX_FORECASTING_HORIZON:]
test_data = data.iloc[-MAX_FORECASTING_HORIZON:]

print(f"Number of steps in training data: {len(train_data)}\nNumber of steps in validation data: {len(val_data)}\nNumber of steps in test data: {len(test_data)}")

%load_ext autoreload
%autoreload 2
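
The split above holds out the last `MAX_FORECASTING_HORIZON` steps for testing and the `MAX_FORECASTING_HORIZON` steps before them for validation, so no future value leaks into training. A minimal sketch of the same idea on a toy array follows; `chronological_split` is a hypothetical helper written for this example and is not part of `utils`.

```python
import numpy as np

def chronological_split(series, horizon):
    """Split into train / validation / test while keeping time order.

    The last `horizon` points form the test set, the `horizon` points
    before them form the validation set, and everything earlier is training data.
    """
    if len(series) <= 2 * horizon:
        raise ValueError("series is too short for two hold-out windows")
    train = series[: -2 * horizon]
    val = series[-2 * horizon : -horizon]
    test = series[-horizon:]
    return train, val, test

toy = np.arange(100)  # stand-in for one time series of 100 steps
train, val, test = chronological_split(toy, horizon=12)
print(len(train), len(val), len(test))
```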

--- 

## Hyperparameter Tuning

The hyperparameters for the **SARIMA model** include the non-seasonal order $(p, d, q)$ and the seasonal order $(P, D, Q, S)$. Given these hyperparameters, the model coefficients are fitted by maximizing the likelihood of the training data. Fitted models with different orders are commonly compared using the **Akaike Information Criterion (AIC)**, which combines the maximized likelihood with a penalty on the number of estimated parameters; lower is better.
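For reference, the usual definition of AIC is given below, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized value of the likelihood (both symbols are introduced here and do not appear elsewhere in this notebook):

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}
$$

The first term penalizes model complexity and the second rewards goodness of fit, so a lower AIC indicates a better trade-off between the two.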

However, an in-sample criterion such as AIC does not necessarily correspond to out-of-sample forecasting metrics such as the **Mean Absolute Scaled Error (MASE)**, which is more aligned with practical forecasting needs. Therefore, we evaluate each candidate on the validation dataset and select the hyperparameters with the lowest validation MASE.
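As a rough guide to what MASE measures, the sketch below implements the common definition: the mean absolute forecast error divided by the in-sample mean absolute error of a naive forecast that repeats the value from `m` steps earlier. The function `mase_sketch` is an assumption made for illustration; the tutorial's own `utils.mean_absolute_scaled_error` may differ in argument order or scaling details.

```python
import numpy as np

def mase_sketch(forecast, actual, insample, m=1):
    """Mean Absolute Scaled Error (illustrative version).

    Scales the forecast MAE by the in-sample MAE of a naive forecast that
    repeats the value observed `m` steps earlier (m=1: the previous value).
    """
    forecast, actual, insample = (
        np.asarray(a, dtype=float) for a in (forecast, actual, insample)
    )
    naive_mae = np.mean(np.abs(insample[m:] - insample[:-m]))
    return float(np.mean(np.abs(actual - forecast)) / naive_mae)

insample = [1.0, 2.0, 4.0, 3.0, 5.0]  # toy training history
actual = [6.0, 5.0]                   # toy hold-out values
forecast = [5.0, 5.5]                 # toy model forecasts

print(mase_sketch(forecast, actual, insample))
```

A value below 1 means the forecast has a smaller average error than the naive one-step forecast had on the training data; a value above 1 means it is worse.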

For this tutorial, we will explore the following range of hyperparameters for the SARIMA model:

- $p \in \{1, 2, 3\}$
- $d \in \{0, 1\}$
- $q \in \{1, 2\}$
- $S \in \{0, 4, 12\}$, where $S = 0$ means no seasonal component; otherwise the seasonal order is fixed to $(0, 1, 0, S)$

One of the simplest ways to find the best combination of these hyperparameters is through a brute-force search (grid search), which evaluates all possible combinations. However, this approach becomes computationally expensive as the search space grows. Smarter search strategies can reduce the cost. For example, [`pmdarima.auto_arima`](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.AutoARIMA.html#pmdarima.arima.AutoARIMA) uses a stepwise search by default, which fits only a subset of the candidate orders and compares them with an information criterion such as AIC.

In this tutorial, we will stick to brute-force search, but leave it as an exercise for you to explore AutoARIMA for smarter hyperparameter tuning.

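**Note:** Some parameter combinations may fit poorly or fail to converge cleanly because the underlying model is under-constrained. The resulting warnings are suppressed by the `warnings.filterwarnings("ignore")` call above.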


In [None]:
# define the grid
p_values = range(1, 4)
q_values = range(1, 3)
d_values = range(0, 2)
s_values = [0, 4, 12] # a large value will result in long running times
pdqs = list(itertools.product(p_values, d_values, q_values, s_values))
print(f"Number of parameter combinations: {len(pdqs)}")

In [None]:
# Grid search

# only for understanding
best_aic = {}
best_aic_model = {}

# we will use the models with best mase on validation dataset
best_mase = {}
best_mase_model = {}
for col in tqdm(train_data.columns, leave=False):
    ts = train_data[col]
    best_aic[col] = np.inf
    best_mase[col] = np.inf
    
    for p,d,q,s  in tqdm(pdqs, leave=False, desc=f"Column:{col}"):
        if s == 0:
            seasonal_order = (0, 0, 0, 0)  # no seasonal component (the SARIMAX default)
        else:
            seasonal_order = (0, 1, 0, s)  # seasonal differencing only

        model = SARIMAX(
            ts,
            trend="c",
            order=(p, d, q), # non-seasonal ARIMA parameters
            seasonal_order=seasonal_order,
            enforce_stationarity=False,
            enforce_invertibility=False,
        )
    
        results = model.fit(disp=False)
        # forecast as many steps ahead as there are validation points
        predictions = results.get_forecast(steps=len(val_data))

        # the forecast object exposes the predicted mean and its standard error
        forecast = predictions.predicted_mean.values 
        actual = val_data[col].values
        mase = utils.mean_absolute_scaled_error(forecast, actual, insample_data=ts.values)

        if mase < best_mase[col]:
            best_mase[col] = mase
            best_mase_model[col] = (model, results, (p, d, q, s))
        
        if results.aic < best_aic[col]:
            best_aic[col] = results.aic
            best_aic_model[col] = (model, results, (p, d, q, s))
    
    print(f"Col: {col}. Best AIC parameters: {best_aic_model[col][-1]}. Best AIC:{best_aic[col]}. Best Mase: {best_mase[col]}. Best Mase Parameters: {best_mase_model[col][-1]}")

**Observations**

- As suspected, the hyperparameters that give the best validation MASE are not the same as those that perform best according to AIC.

---

## Refit on Train-Val Subset & Forecast

To measure the model's performance on the test data, we will first retrain the model using the combined train-validation dataset. Then, we will compute the MASE metric on the test dataset to evaluate its performance.
Additionally, we will store the test predictions for later comparison with other forecasting methods.



In [None]:
test_predictions = {}
best_model_metrics = {}
for col in test_data.columns:
    best_model_metrics[col] = {}
    
    # retrain the model with the best-MASE parameters on train_val_data
    p, d, q, s = best_mase_model[col][2]
    ts = train_val_data[col]
    if s == 0:
        seasonal_order = (0, 0, 0, 0)  # no seasonal component (the SARIMAX default)
    else:
        seasonal_order = (0, 1, 0, s)  # seasonal differencing only
    
    model = SARIMAX(
        ts,
        trend="c",
        order=(p, d, q),
        seasonal_order=seasonal_order,
        enforce_stationarity=False,
        enforce_invertibility=False,
    )

    results = model.fit(disp=False)

    # get metrics and predictions
    predictions = results.get_forecast(steps=len(test_data))
    test_predictions[f"{MODEL_NAME}_{col}_mean"] = predictions.predicted_mean.values
    test_predictions[f"{MODEL_NAME}_{col}_se"] = predictions.se_mean.values

# store these predictions 
test_predictions_df = pd.DataFrame(test_predictions, index=test_data.index)

test_predictions_df.to_csv(f"{str(RESULTS_DIRECTORY)}/predictions.csv", index=True)
print(test_predictions_df.shape)
test_predictions_df.head()

---

## Evaluate

Let's compute the metrics by comparing the predictions with the target data. Note that we have to rename the columns of the predictions dataframe to match the column names the function expects.

In [None]:
# compute MASE metrics
model_metrics, records = utils.get_mase_metrics(
    historical_data=train_val_data,
    test_predictions=test_predictions_df[[x for x in test_predictions_df.columns if 'mean' in x]].rename(
            columns={x:x.split("_")[1] for x in test_predictions_df.columns
        }),
    target_data=test_data,
    forecasting_horizons=FORECASTING_HORIZON,
    columns=data.columns, 
    model_name=MODEL_NAME,
)

records = pd.DataFrame(records)

records.to_csv(f"{str(RESULTS_DIRECTORY)}/metrics.csv", index=False)
records[['col', 'horizon', 'mase']].pivot(index=['horizon'], columns='col')

**Exercise**: What do these values mean?

---

## Compare Models

In [None]:
utils.display_results(path=DIRECTORY_PATH_TO_SAVE_RESULTS, metric='mase')

**Exercise**: Change `metric` in the above function to "mae" or "mse" and observe the values. What do you think about the performance of the models?

---

## Plot Forecasts

In [None]:
fig, axs = utils.plot_forecasts(
    historical_data=train_val_data,
    forecast_directory_path=DIRECTORY_PATH_TO_SAVE_RESULTS,
    target_data=test_data,
    columns=data.columns,
    n_history_to_plot=10, 
    forecasting_horizon=MAX_FORECASTING_HORIZON,
    dpi=200
)

--- 

## Conclusions

We learned about ARIMA models, searched for the best ARIMA model for each of the 8 time series in the dataset, and finally evaluated their performance using the MASE metric. 

--- 

## Exercises

- The [`pmdarima` library](https://alkaline-ml.com/pmdarima/index.html) in Python provides an automatic parameter tuning feature for ARIMA models. Refer to the [documentation](https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.AutoARIMA.html#pmdarima.arima.AutoARIMA) for `pmdarima.auto_arima` to familiarize yourself with the parameters. Note that it does not search over the seasonal period (the `m` argument), so you will need to select it manually based on the AIC score or validation score.

- Apply a normalization procedure (e.g., **min-max scaling**) to the data, ensuring that only the training data is used for fitting the scaler. Perform the modeling process on the normalized data and, after generating the final model's predictions, invert the normalization to return the output to its original scale. See `sklearn.preprocessing.MinMaxScaler` ([documentation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)).

- Additionally, perform the modeling on the **raw data**, without applying any transformation (such as converting it into log daily returns), to compare results directly with the untransformed dataset.
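As a starting point for the normalization exercise above, here is a minimal sketch of the fit-on-train, transform, and invert pattern. `MinMaxSketch` is a hand-rolled stand-in written for this example (it is not the scikit-learn class, although the method names mirror it), and the arrays are toy data.

```python
import numpy as np

class MinMaxSketch:
    """Hand-rolled min-max scaler to [0, 1]; a stand-in for sklearn's MinMaxScaler."""

    def fit(self, train):
        train = np.asarray(train, dtype=float)
        self.min_ = train.min(axis=0)
        self.range_ = train.max(axis=0) - self.min_
        if np.any(self.range_ == 0):
            raise ValueError("constant column: cannot min-max scale")
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.min_) / self.range_

    def inverse_transform(self, x_scaled):
        return np.asarray(x_scaled, dtype=float) * self.range_ + self.min_

train = np.array([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])  # toy training split, two series
test = np.array([[4.0, 15.0]])                             # toy unseen data

scaler = MinMaxSketch().fit(train)     # statistics come from the training split only
test_scaled = scaler.transform(test)   # values outside [0, 1] are possible on unseen data
restored = scaler.inverse_transform(test_scaled)
print(test_scaled, restored)
```

In the exercise, the same inverse step would be applied to the model's forecasts before computing metrics on the original scale.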

---

## Next Steps

- To learn about other classical methods, proceed to notebook 3.2 (Holt-Winters exponential smoothing) or notebook 3.3 (Vector Auto-Regression).

- To learn about machine learning based approaches, check out module 4 (XGBoost), module 5 (LSTM-based models), module 6 (Transformer-based models), or module 7 (LLM-based models).

---