# [How to Grid Search ARIMA Model Hyperparameters with Python - For loop](https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/)

#### [endog, exog, what’s that?](https://www.statsmodels.org/stable/endog_exog.html)

#### [How to Make Out-of-Sample Forecasts with ARIMA in Python](https://machinelearningmastery.com/make-sample-forecasts-arima-python/)

After completing this tutorial, you will know:

- A general procedure that you can use to tune the ARIMA hyperparameters for a rolling one-step forecast.
- How to apply ARIMA hyperparameter optimization on a standard univariate time series dataset.
- Ideas for extending the procedure for more elaborate and robust models.

#### Grid Searching Method

We can automate the process of training and evaluating ARIMA models on different combinations of model hyperparameters. In machine learning this is called a grid search or model tuning.

In this tutorial, we will develop a method to grid search ARIMA hyperparameters for a one-step rolling forecast.

The approach is broken down into two parts:

1. Evaluate an ARIMA model.
2. Evaluate sets of ARIMA parameters.

### 1. Evaluate an ARIMA model

We can evaluate an ARIMA model by preparing it on a training dataset and evaluating predictions on a test dataset.

This approach involves the following steps:

1. Split the dataset into training and test sets.
2. Walk the time steps in the test dataset.
        a. Train an ARIMA model.
        b. Make a one-step prediction.
        c. Store prediction; get and store actual observation.
3. Calculate error score for predictions compared to expected values.

We can implement this in Python as a new standalone function called `evaluate_arima_model()` that takes a time series dataset as input as well as a tuple with the p, d, and q parameters for the model to be evaluated.

The dataset is split in two: 66% for the initial training dataset and the remaining 34% for the test dataset.

Each time step of the test set is iterated. Just one iteration provides a model that you could use to make predictions on new data. The iterative approach allows a new ARIMA model to be trained each time step.

A prediction is made each iteration and stored in a list. This is so that at the end of the test set, all predictions can be compared to the list of expected values and an error score calculated. In this case, a mean squared error score is calculated and returned.

In [79]:
# evaluate an ARIMA model for a given order (p,d,q)
# Load specific forecasting tools
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tools.eval_measures import rmse

def evaluate_arima_models(X, arima_order):
    # prepare training datasets
    train_size = int(len(X)*0.66)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions =  list()
    for t in range(len(test)):
        model = ARIMA(history, arima_order)
        model_fit = model.fit()
        yhat = model_fit.forecast()[0][0] # one-step out-of sample forecast
        predictions.append(yhat)
        history.append(test[t])
    # calculate rmse
    error = rmse(test, predictions)
    return(error)
    

### 2. Iterate ARIMA parameters

The user must specify a grid of p, d, and q ARIMA parameters to iterate. A model is created for each parameter and its performance evaluated by calling the `evaluate_arima_model()` function described in the previous section.

The function must keep track of the lowest error score observed and the configuration that caused it. This can be summarized at the end of the function with a print to standard out.

We can implement this function called `evaluate_models()` as a series of four loops.

There are two additional considerations. The first is to ensure the input data are floating point values (as opposed to integers or strings), as this can cause the ARIMA procedure to fail.

Second, the statsmodels ARIMA procedure internally uses numerical optimization procedures to find a set of coefficients for the model. These procedures can fail, which in turn can throw an exception. We must catch these exceptions and skip those configurations that cause a problem. This happens more often then you would think.

Additionally, it is recommended that warnings be ignored for this code to avoid a lot of noise from running the procedure. This can be done as follows:

In [2]:
import warnings
warnings.filterwarnings("ignore")

The complete procedure for evaluating a grid of ARIMA hyperparameters is listed below.

In [92]:
# evaluate combinations of p,d,q values for an ARIMA model

def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float('inf'), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    rmse = evaluate_arima_models(dataset, order)
                    if rmse < best_score:
                        best_score, best_cfg = rmse, order
                        print("ARIMA order: %s and RMSE = %.3f" % (order,rmse))
                except:
                    continue
                    
    print("Best ARIMA order %s and RMSE = %.3f"%(best_cfg, best_score))

Now that we have a procedure to grid search ARIMA hyperparameters, let’s test the procedure on two univariate time series problems.

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

In [11]:
from datetime import datetime
def parser(x):
    return(datetime.strptime('190'+x, '%Y-%m'))

df = pd.read_csv('../Data/shampoo.csv', header = 0, parse_dates = [0], index_col = 0, date_parser = parser)
df.index.freq = 'MS'
df.head()

Unnamed: 0_level_0,Sales
Month,Unnamed: 1_level_1
1901-01-01,266.0
1901-02-01,145.9
1901-03-01,183.1
1901-04-01,119.3
1901-05-01,180.3


Once loaded, we can specify a site of p, d, and q values to search and pass them to the `evaluate_models()` function.

We will try a suite of lag values (p) and just a few difference iterations (d) and residual error lag values (q).

In [12]:
import warnings
warnings.filterwarnings('ignore')

In [86]:
p_values = [0, 1, 2, 4, 6, 8]
d_values = range(0,3)
q_vales = range(0,3)

In [90]:
evaluate_models(df['Sales'], p_values, d_values, q_values)

(0, 0, 0)
ARIMA order: (0, 0, 0) and RMSE = 52425.268
(0, 0, 1)
ARIMA order: (0, 0, 1) and RMSE = 38145.141
(0, 0, 2)
ARIMA order: (0, 0, 2) and RMSE = 23989.607
(0, 0, 3)
(0, 0, 4)
(0, 1, 0)
ARIMA order: (0, 1, 0) and RMSE = 18003.173
(0, 1, 1)
ARIMA order: (0, 1, 1) and RMSE = 9558.246
(0, 1, 2)
ARIMA order: (0, 1, 2) and RMSE = 6306.315
(0, 1, 3)
(0, 1, 4)
(0, 2, 0)
(0, 2, 1)
(0, 2, 2)
(0, 2, 3)
ARIMA order: (0, 2, 3) and RMSE = 3935.567
(0, 2, 4)
(1, 0, 0)
(1, 0, 1)
(1, 0, 2)
(1, 0, 3)
(1, 0, 4)
(1, 1, 0)
(1, 1, 1)
(1, 1, 2)
(1, 1, 3)
(1, 1, 4)
(1, 2, 0)
(1, 2, 1)
(1, 2, 2)
(1, 2, 3)
(1, 2, 4)
(2, 0, 0)
(2, 0, 1)
(2, 0, 2)
(2, 0, 3)
(2, 0, 4)
(2, 1, 0)
(2, 1, 1)
(2, 1, 2)
(2, 1, 3)
(2, 1, 4)
(2, 2, 0)
(2, 2, 1)
(2, 2, 2)
(2, 2, 3)
(2, 2, 4)
(4, 0, 0)
(4, 0, 1)
(4, 0, 2)
(4, 0, 3)
(4, 0, 4)
(4, 1, 0)
(4, 1, 1)
(4, 1, 2)
(4, 1, 3)
(4, 1, 4)
(4, 2, 0)
(4, 2, 1)
(4, 2, 2)
(4, 2, 3)
(4, 2, 4)
(6, 0, 0)
(6, 0, 1)
(6, 0, 2)
(6, 0, 3)
(6, 0, 4)
(6, 1, 0)
(6, 1, 1)
(6, 1, 2)
(6, 1, 3)
(6, 1

### Example-2

In [93]:
df2 = pd.read_csv('../Data/DailyTotalFemaleBirths.csv',index_col='Date',parse_dates=True)
df2.index.freq = 'D'
df2.head()

Unnamed: 0_level_0,Births
Date,Unnamed: 1_level_1
1959-01-01,35
1959-01-02,32
1959-01-03,30
1959-01-04,31
1959-01-05,44


In [94]:
p_values = [0, 1, 2, 4, 6, 8, 10]
d_values = range(0, 3)
q_values = range(0, 3)

In [None]:
evaluate_models(df2['Births'], p_values, d_values, q_values)

ARIMA order: (0, 0, 0) and RMSE = 8.189
ARIMA order: (0, 0, 1) and RMSE = 7.884
ARIMA order: (0, 0, 2) and RMSE = 7.771
ARIMA order: (0, 1, 1) and RMSE = 7.527
ARIMA order: (0, 1, 2) and RMSE = 7.434
ARIMA order: (1, 1, 1) and RMSE = 7.425
ARIMA order: (2, 0, 1) and RMSE = 7.421
ARIMA order: (2, 1, 1) and RMSE = 7.417


### Extensions

This section lists some ideas to extend the approach you may wish to explore.

1. __Seed Grid__. The classical diagnostic tools of ACF and PACF plots can still be used with the results used to seed the grid of ARIMA parameters to search.
2. __Alternate Measures__. The search seeks to optimize the out-of-sample mean squared error. This could be changed to another out-of-sample statistic, an in-sample statistic, such as AIC or BIC, or some combination of the two. You can choose a metric that is most meaningful on your project.
3. __Residual Diagnostics__. Statistics can automatically be calculated on the residual forecast errors to provide an additional indication of the quality of the fit. Examples include statistical tests for whether the distribution of residuals is Gaussian and whether there is an autocorrelation in the residuals.
4. __Update Model__. The ARIMA model is created from scratch for each one-step forecast. With careful inspection of the API, it may be possible to update the internal data of the model with new observations rather than recreating it from scratch.
5. __Preconditions__. The ARIMA model can make assumptions about the time series dataset, such as normality and stationarity. These could be checked and a warning raised for a given of a dataset prior to a given model being trained.