# Goals

Last week we introduced the ARIMA family of models. The goal of this notebook is to show how to find the best values for the model parameters using the *Akaike information criterion* (AIC).

# Reminder: ARIMA models and SARIMAX
AR and MA models can be combined with differencing to give the **ARIMA(p, d, q)** family of models. Last week, we introduced the **tsa.statespace.SARIMAX** class implemented in *statsmodels*: https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

The SARIMAX class takes as input the data (here, the pandas DataFrame *series*) and two tuples specifying the model: *order*, i.e. (p, d, q), and *seasonal_order*, i.e. (P, D, Q, s), where s is the number of periods in a season (12 for monthly data).
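The *d* and *D* entries of these tuples are the numbers of ordinary and seasonal differences applied to the data before the AR and MA terms are fitted. To make this step concrete, here is a minimal pure-Python sketch; the helpers `difference` and `arima_difference` and the toy series are hypothetical, for illustration only (statsmodels handles differencing internally).

```python
# Illustrative sketch (not statsmodels code): what the d and D entries of
# order=(p, d, q) and seasonal_order=(P, D, Q, s) do to the data.

def difference(values, lag=1):
    """Return y[t] - y[t - lag]; the result is `lag` items shorter."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]

def arima_difference(values, d, D, s):
    """Apply d ordinary differences, then D seasonal differences of period s."""
    out = list(values)
    for _ in range(d):
        out = difference(out, lag=1)
    for _ in range(D):
        out = difference(out, lag=s)
    return out

# Hypothetical monthly series: linear trend plus a repeating 12-month pattern.
pattern = [5, 3, 8, 1, 0, 4, 9, 7, 2, 6, 3, 1]
y = [2 * t + pattern[t % 12] for t in range(48)]

z = arima_difference(y, d=1, D=1, s=12)
print(len(y), len(z))  # 48 35: d + D*s = 13 observations are lost
print(set(z))          # {0}: both the trend and the seasonality are removed
```

With d = 1 and D = 1 (as in the model fitted below), the linear trend and the 12-month pattern are both removed, at the cost of the first 13 observations.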

In [1]:
import pandas as pd
from pandas import read_excel
import matplotlib.pyplot as plt
import statsmodels.api as sm

In [2]:
series = read_excel('BuildingMaterials.xls', sheet_name='Data', header=0, index_col=0, parse_dates=True)
series.index.freq = 'MS'

mod = sm.tsa.statespace.SARIMAX(series, order=(1,1,1), seasonal_order=(0,1,1,12))
results = mod.fit(disp=False)
print(results.summary())

                                     SARIMAX Results                                      
Dep. Variable:                         Production   No. Observations:                  265
Model:             SARIMAX(1, 1, 1)x(0, 1, 1, 12)   Log Likelihood               -1030.968
Date:                            Mon, 13 May 2024   AIC                           2069.936
Time:                                    11:45:49   BIC                           2084.054
Sample:                                09-01-1986   HQIC                          2075.617
                                     - 09-01-2008                                         
Covariance Type:                              opg                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.1172      0.134     -0.877      0.380      -0.379       0.145
ma.L1         -0.3691      0.120   


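The AIC trades goodness of fit against model size: AIC = 2k − 2 ln L̂, where k is the number of estimated parameters and L̂ is the maximized likelihood, so a lower AIC is better. The BIC replaces the penalty 2k with k ln n. As a sanity check, the sketch below recomputes both criteria from the *Log Likelihood* reported in the summary above. Two assumptions, labeled in the code: k = 4 (ar.L1, ma.L1, the seasonal MA term and the noise variance sigma2), and an effective sample size of 265 − 13 = 252, i.e. excluding the d + D·s = 1 + 12 observations consumed by differencing. These assumptions reproduce the reported values.

```python
import math

# Value copied from the SARIMAX(1, 1, 1)x(0, 1, 1, 12) summary above.
log_likelihood = -1030.968
# Assumption: k counts ar.L1, ma.L1, the seasonal MA term and sigma2.
k = 4
# Assumption: the effective sample size excludes the d + D*s = 1 + 12
# observations consumed by differencing.
n_eff = 265 - (1 + 1 * 12)

aic = 2 * k - 2 * log_likelihood
bic = k * math.log(n_eff) - 2 * log_likelihood

print(round(aic, 3))  # 2069.936
print(round(bic, 3))  # 2084.054
```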

The cell below uses the *itertools* python library to create all possible combinations of *p*, *d* and *q* triplets: https://docs.python.org/3/library/itertools.html

Using *itertools* is preferable to building the triplets manually with nested for loops: the code is shorter, and the library is written to be fast and memory-efficient.
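To see that the two approaches produce the same triplets in the same order, here is a small standalone comparison (variable names are illustrative). `itertools.product(values, repeat=3)` is equivalent to `itertools.product(p, d, q)` when all three ranges are the same.

```python
import itertools

values = range(0, 2)  # each of p, d, q takes the value 0 or 1

# Nested-loop version.
loops = []
for p in values:
    for d in values:
        for q in values:
            loops.append((p, d, q))

# itertools version: the rightmost element advances fastest, like the inner loop.
triplets = list(itertools.product(values, repeat=3))

print(triplets == loops)   # True
print(len(triplets))       # 8 = 2**3 non-seasonal triplets
print(len(triplets) ** 2)  # 64 (order, seasonal_order) pairs to fit in the search
```

The last line also shows the size of the grid search: with 8 non-seasonal and 8 seasonal triplets, 64 SARIMAX models are fitted.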

In [8]:
import itertools

# Define the p, d and q parameters to take the values 0 or 1
p = d = q = range(0, 2)

# Generate all different combinations of p, d and q triplets
pdq = list(itertools.product(p, d, q))

# Generate all different combinations of seasonal p, d and q triplets (i.e., P, D, Q), with a seasonal period of 12 months
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in pdq]

The cell below selects the best model parameters by reading the AIC score directly from each SARIMAX fit result and keeping the combination with the lowest AIC.

In [7]:
import warnings
warnings.filterwarnings("ignore") # specify to ignore warning messages

# Identification of the best model among all combinations of pdq and seasonal_pdq
best_score, best_param, best_paramSeasonal = float("inf"), None, None
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(series, order=param, seasonal_order=param_seasonal, enforce_invertibility=False)
            results = mod.fit(disp=False)
            # A lower AIC means a better trade-off between fit and model complexity
            if results.aic < best_score:
                best_score, best_param, best_paramSeasonal = results.aic, param, param_seasonal
            print('ARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            continue  # if the fit fails, skip to the next parameter combination

ARIMA(0, 0, 0)x(0, 0, 0, 12) - AIC:3718.8589858085606
ARIMA(0, 0, 0)x(0, 0, 1, 12) - AIC:3407.987314430595
ARIMA(0, 0, 0)x(0, 1, 0, 12) - AIC:2463.755234631169
ARIMA(0, 0, 0)x(0, 1, 1, 12) - AIC:2456.249081845319
ARIMA(0, 0, 0)x(1, 0, 0, 12) - AIC:2633.9315173431496
ARIMA(0, 0, 0)x(1, 0, 1, 12) - AIC:2625.5089242544527
ARIMA(0, 0, 0)x(1, 1, 0, 12) - AIC:2460.088181568989
ARIMA(0, 0, 0)x(1, 1, 1, 12) - AIC:2448.1526057045826
ARIMA(0, 0, 1)x(0, 0, 0, 12) - AIC:3420.3934063820543
ARIMA(0, 0, 1)x(0, 0, 1, 12) - AIC:3136.4889686695988
ARIMA(0, 0, 1)x(0, 1, 0, 12) - AIC:2328.702839352657
ARIMA(0, 0, 1)x(0, 1, 1, 12) - AIC:2330.6840807059393
ARIMA(0, 0, 1)x(1, 0, 0, 12) - AIC:2487.3682882170615
ARIMA(0, 0, 1)x(1, 0, 1, 12) - AIC:2489.9482419659416
ARIMA(0, 0, 1)x(1, 1, 0, 12) - AIC:2330.690978775181
ARIMA(0, 0, 1)x(1, 1, 1, 12) - AIC:2326.822308502616
ARIMA(0, 1, 0)x(0, 0, 0, 12) - AIC:2714.7674515796884
ARIMA(0, 1, 0)x(0, 0, 1, 12) - AIC:2529.9678612143025
ARIMA(0, 1, 0)x(0, 1, 0, 12) - AIC:
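As an alternative to tracking `best_score` by hand, the loop could store every score, e.g. `aic_scores[(param, param_seasonal)] = results.aic`, and select the minimum afterwards; this also makes it easy to inspect the runners-up. The sketch below applies the idea to a subset of the AIC values printed above, copied verbatim (the dictionary name `aic_scores` is hypothetical). Within this subset the winner is ARIMA(0, 0, 1)x(1, 1, 1, 12); the full search covers all 64 combinations.

```python
# AIC values copied from the grid-search output above (a subset only).
aic_scores = {
    ((0, 0, 0), (0, 0, 0, 12)): 3718.8589858085606,
    ((0, 0, 0), (1, 1, 1, 12)): 2448.1526057045826,
    ((0, 0, 1), (0, 1, 0, 12)): 2328.702839352657,
    ((0, 0, 1), (1, 1, 1, 12)): 2326.822308502616,
    ((0, 1, 0), (0, 0, 0, 12)): 2714.7674515796884,
    ((0, 1, 0), (0, 0, 1, 12)): 2529.9678612143025,
}

# Lowest AIC wins: take the key whose stored score is smallest.
best = min(aic_scores, key=aic_scores.get)
print(best)  # ((0, 0, 1), (1, 1, 1, 12))

# Rank the candidates to see how close the runners-up are.
for (order, seasonal), aic in sorted(aic_scores.items(), key=lambda kv: kv[1])[:3]:
    print('ARIMA{}x{} - AIC:{:.2f}'.format(order, seasonal, aic))
```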

Let's now print the best set of parameters according to the AIC.

In [9]:
print('The best model is ARIMA{}x{} - AIC:{}'.format(best_param, best_paramSeasonal, best_score))

The best model is ARIMA(0, 1, 1)x(1, 1, 1, 12) - AIC:2062.4968772167613
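The search cell calls `warnings.filterwarnings("ignore")`, which silences every warning for the rest of the session, including convergence warnings from later fits that you may want to see. A minimal alternative, using only the standard-library context manager `warnings.catch_warnings()`, scopes the suppression to the search itself. In the sketch below, `noisy_step` is a hypothetical stand-in for `mod.fit(...)`.

```python
import warnings

def noisy_step():
    # Hypothetical stand-in for a model fit that emits a warning.
    warnings.warn("possible convergence problem", UserWarning)
    return 42

# Suppress warnings only inside this block; the previous filters are
# restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

# Outside that block the warning is emitted again; here it is recorded
# instead of printed, to show that it is no longer suppressed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_step()

print(result, len(caught))  # 42 1
```

In the notebook, the whole double loop over `pdq` and `seasonal_pdq` would go inside the first `with` block.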
