## Auto ARIMA 

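Auto ARIMA automates ARIMA model selection. Instead of choosing the orders by hand, pmdarima's `auto_arima` picks the differencing order $d$ with a unit-root test, then compares candidate $(p, q)$ orders (plus seasonal orders when a seasonal period is given) on an information criterion, AIC by default, and keeps the best-scoring model.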

Benefits:

- Saves time 
- Removes ambiguity 
- Reduces the risk of human error
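The core idea of "fit candidates, keep the one with the best information criterion" can be shown with a deliberately simplified sketch that assumes only NumPy. It fits pure AR(p) models by ordinary least squares and keeps the order with the lowest AIC. This is not pmdarima's algorithm, which runs a stepwise search over full (S)ARIMA orders with maximum-likelihood fits. The helper names `ar_ols_aic` and `select_ar_order` are hypothetical.

```python
import numpy as np


def ar_ols_aic(y: np.ndarray, p: int, max_p: int) -> float:
    """Fit AR(p) with an intercept by OLS and return its Gaussian AIC.

    Every order is fit on the same sample (observations max_p onward) so
    the AIC values are comparable; additive constants are dropped.
    """
    target = y[max_p:]
    n = len(target)
    # Design matrix: a column of ones plus lags 1..p of the series
    lags = [y[max_p - lag:len(y) - lag] for lag in range(1, p + 1)]
    design = np.column_stack([np.ones(n)] + lags)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    rss = float(np.sum((target - design @ coef) ** 2))
    k = p + 2  # AR coefficients + intercept + noise variance
    return n * np.log(rss / n) + 2 * k


def select_ar_order(y: np.ndarray, max_p: int = 5):
    """Return the AR order with the lowest AIC, plus all scores."""
    aics = {p: ar_ols_aic(y, p, max_p) for p in range(max_p + 1)}
    return min(aics, key=aics.get), aics


# Simulated AR(2) series: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + e_t
rng = np.random.default_rng(0)
e = rng.standard_normal(2000)
y = np.zeros(2000)
for t in range(2, 2000):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]

best_p, aics = select_ar_order(y, max_p=5)
print("selected order:", best_p)
```

Fitting every order on the same sample matters: comparing AICs computed on different numbers of observations would bias the choice.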

In [1]:
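import numpy as np
import pandas as pd
import scipy
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
# statsmodels.tsa.arima_model.ARIMA is deprecated and unusable in recent statsmodels;
# the current ARIMA class lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA
from pmdarima.arima import auto_arima
from arch import arch_model
import yfinance
import warnings
# Note: this also silences library deprecation warnings (e.g. about renamed keywords)
warnings.filterwarnings("ignore")
sns.set()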


In [2]:
# Daily prices for the S&P 500, FTSE 100, Nikkei 225 and DAX, grouped by ticker
raw_data = yfinance.download(tickers="^GSPC ^FTSE ^N225 ^GDAXI", start="1994-01-07", end="2018-01-29",
                             interval="1d", group_by='ticker', auto_adjust=True, threads=True)




In [3]:
raw_data.head()

| Date | ^N225 Open | ^N225 High | ^N225 Low | ^N225 Close | ^N225 Volume | ^GDAXI Open | ^GDAXI High | ^GDAXI Low | ^GDAXI Close | ^GDAXI Volume | ^FTSE Open | ^FTSE High | ^FTSE Low | ^FTSE Close | ^FTSE Volume | ^GSPC Open | ^GSPC High | ^GSPC Low | ^GSPC Close | ^GSPC Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1994-01-07 | 17842.980469 | 18131.410156 | 17787.480469 | 18124.009766 | 0.0 | 2218.959961 | 2227.639893 | 2201.820068 | 2224.949951 | 0.0 | 3401.399902 | 3446.800049 | 3398.699951 | 3446.0 | 0.0 | 467.089996 | 470.26001 | 467.029999 | 469.899994 | 324920000.0 |
| 1994-01-10 | 18186.519531 | 18567.060547 | 18186.519531 | 18443.439453 | 0.0 | 2231.840088 | 2238.01001 | 2222.0 | 2225.0 | 0.0 | 3465.699951 | 3468.100098 | 3430.0 | 3440.600098 | 0.0 | 469.899994 | 475.269989 | 469.549988 | 475.269989 | 319490000.0 |
| 1994-01-11 | 18481.849609 | 18671.669922 | 18373.039062 | 18485.25 | 0.0 | 2225.429932 | 2235.610107 | 2225.179932 | 2228.100098 | 0.0 | 3442.5 | 3442.5 | 3413.5 | 3413.800049 | 0.0 | 475.269989 | 475.279999 | 473.269989 | 474.130005 | 305490000.0 |
| 1994-01-12 | 18447.339844 | 18807.080078 | 18301.929688 | 18793.880859 | 0.0 | 2227.120117 | 2227.790039 | 2182.060059 | 2182.060059 | 0.0 | 3394.800049 | 3402.399902 | 3372.0 | 3372.0 | 0.0 | 474.130005 | 475.059998 | 472.140015 | 474.170013 | 310690000.0 |
| 1994-01-13 | 18770.380859 | 18823.380859 | 18548.75 | 18577.259766 | 0.0 | 2171.5 | 2183.709961 | 2134.100098 | 2142.370117 | 0.0 | 3380.699951 | 3383.300049 | 3356.899902 | 3360.0 | 0.0 | 474.170013 | 474.170013 | 471.799988 | 472.470001 | 277970000.0 |


In [4]:
df = raw_data.copy()

In [5]:
df['spx'] = df['^GSPC'].Close[:]
df['dax'] = df['^GDAXI'].Close[:]
df['ftse'] = df['^FTSE'].Close[:]
df['nikkei'] = df['^N225'].Close[:]
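`raw_data` has two column levels (ticker first, then price field), which is why `df['^GSPC'].Close` first selects the ticker and then the field. The small sketch below uses a hypothetical frame with made-up numbers, shaped like a `group_by='ticker'` download, to show that selection next to `DataFrame.xs`, which pulls the `Close` field for every ticker in one call:

```python
import numpy as np
import pandas as pd

# Hypothetical two-ticker frame: level 0 = ticker, level 1 = price field
dates = pd.date_range("1994-01-07", periods=3, freq="B")
columns = pd.MultiIndex.from_product([["^GSPC", "^FTSE"], ["Open", "Close"]])
raw = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns)

# One ticker at a time, as in the cell above
spx_close = raw["^GSPC"]["Close"]

# All tickers at once: take "Close" from the second column level
closes = raw.xs("Close", axis=1, level=1)
print(closes)
```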


Creating returns

In [6]:
df['ret_spx'] = df.spx.pct_change(1)*100
df['ret_ftse'] = df.ftse.pct_change(1)*100
df['ret_dax'] = df.dax.pct_change(1)*100
df['ret_nikkei'] = df.nikkei.pct_change(1)*100
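`pct_change(1) * 100` produces simple percentage returns, $r_t = 100\,(P_t - P_{t-1}) / P_{t-1}$. The first value has no previous price and is therefore `NaN`, which is why the later cells slice the returns with `[1:]`. A quick check on three made-up prices:

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.96])

# Library version, as used for ret_spx, ret_ftse, ret_dax and ret_nikkei
ret = prices.pct_change(1) * 100

# The same formula written out: 100 * (P_t - P_{t-1}) / P_{t-1}
manual = 100 * (prices - prices.shift(1)) / prices.shift(1)

print(ret.round(4).tolist())
```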


Splitting the data

In [7]:
size = int(len(df)*0.8)
df, df_test = df.iloc[:size], df.iloc[size:]
df.shape, df_test.shape

((5008, 28), (1252, 28))
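The split is chronological: the first 80% of rows train the model and the last 20% are kept for testing, so the test period lies strictly after the training period (no shuffling, which would leak future information). With 6260 rows, `int(6260 * 0.8)` is 5008, matching the shapes above. A minimal sketch with a hypothetical helper `chronological_split` on a dummy frame of the same length:

```python
import pandas as pd


def chronological_split(frame: pd.DataFrame, train_frac: float = 0.8):
    """Split a date-sorted frame into train/test without shuffling."""
    size = int(len(frame) * train_frac)
    return frame.iloc[:size], frame.iloc[size:]


# Dummy frame with 6260 business-day rows (5008 + 1252)
frame = pd.DataFrame({"x": range(6260)},
                     index=pd.date_range("1994-01-07", periods=6260, freq="B"))
train_df, test_df = chronological_split(frame)
print(train_df.shape, test_df.shape)
```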

#### Fitting the Model 

In [8]:
model_auto = auto_arima(df.ret_ftse[1:])

In [9]:
model_auto

      with_intercept=False)

In [10]:
model_auto.summary()

| Dep. Variable: | y | No. Observations: | 5007 |
|---|---|---|---|
| Model: | SARIMAX(5, 0, 2) | Log Likelihood | -7872.435 |
| Date: | Wed, 24 Aug 2022 | AIC | 15760.87 |
| Time: | 18:19:47 | BIC | 15813.019 |
| Sample: | 0 - 5007 | HQIC | 15779.146 |
| Covariance Type: | opg | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| ar.L1 | 0.2012 | 0.044 | 4.549 | 0.000 | 0.115 | 0.288 |
| ar.L2 | -0.8024 | 0.040 | -19.947 | 0.000 | -0.881 | -0.724 |
| ar.L3 | -0.0914 | 0.011 | -7.983 | 0.000 | -0.114 | -0.069 |
| ar.L4 | 0.0184 | 0.009 | 2.078 | 0.038 | 0.001 | 0.036 |
| ar.L5 | -0.1107 | 0.008 | -13.098 | 0.000 | -0.127 | -0.094 |
| ma.L1 | -0.2252 | 0.043 | -5.264 | 0.000 | -0.309 | -0.141 |
| ma.L2 | 0.7617 | 0.041 | 18.451 | 0.000 | 0.681 | 0.843 |
| sigma2 | 1.3589 | 0.015 | 93.334 | 0.000 | 1.330 | 1.387 |

| Ljung-Box (L1) (Q): | 0.0 | Jarque-Bera (JB): | 6496.31 |
|---|---|---|---|
| Prob(Q): | 0.98 | Prob(JB): | 0.0 |
| Heteroskedasticity (H): | 1.99 | Skew: | -0.16 |
| Prob(H) (two-sided): | 0.0 | Kurtosis: | 8.57 |


Note the model name: the summary reports SARIMAX, but with no seasonal order and no exogenous variables, which makes it a plain ARIMA(5,0,2).
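Written out with the estimates from the summary, and using statsmodels' sign convention (the `ar.L*` coefficients multiply lagged returns and the `ma.L*` coefficients multiply lagged shocks, both entering with a plus sign), the selected model for FTSE returns $r_t$ is the following. It has no constant, since the model was fit with `with_intercept=False`, and the shock variance is the estimated `sigma2`:

$$
r_t = 0.2012\, r_{t-1} - 0.8024\, r_{t-2} - 0.0914\, r_{t-3} + 0.0184\, r_{t-4} - 0.1107\, r_{t-5} + \varepsilon_t - 0.2252\, \varepsilon_{t-1} + 0.7617\, \varepsilon_{t-2}, \qquad \varepsilon_t \sim \mathcal{N}(0,\ 1.3589)
$$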

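**Important note:** `auto_arima` takes the size of the validation hold-out as `out_of_sample_size` (pmdarima has no `out_of_sample` argument), and recent pmdarima releases take exogenous regressors as `X` instead of the older `exogenous`. A keyword the installed version does not recognize may be ignored without an error, especially with warnings silenced. When the regressors are used, the summary lists a coefficient for each of `ret_spx`, `ret_dax` and `ret_nikkei`; the summary shown below has none, so check which keyword your pmdarima version expects.


- `X` (`exogenous` in older pmdarima releases) -> outside factors used as regressors (e.g. other time series)
- `m` -> seasonal cycle length (number of observations per season; 5 here)
- `max_order` -> maximum total order $p + q + P + Q$; only enforced when the search is not stepwise, and `None` removes the limit
- `max_p` -> maximum number of AR terms
- `max_q` -> maximum number of MA terms
- `max_d` -> maximum order of differencing (integration)
- `max_P`, `max_Q`, `max_D` -> seasonal counterparts of `max_p`, `max_q` and `max_d`
- `maxiter` -> maximum number of optimizer iterations allowed for the coefficients to converge (convergence gets harder as the order increases)
- `alpha` -> significance level of the stationarity tests used to choose the differencing order; the default of 5% is appropriate most of the time
- `n_jobs` -> how many models to fit in parallel (-1 means "as many as possible"); this only takes effect for a non-stepwise grid search (`stepwise=False`)
- `trend` -> deterministic trend: `'c'` (constant), `'t'` (linear drift), or `'ct'` (both, used here)
- `information_criterion` -> `'aic'`, `'aicc'`, `'bic'`, `'hqic'` or `'oob'` (Akaike, corrected Akaike, Bayesian, Hannan-Quinn, or "out of bag" validation scoring, respectively)
- `out_of_sample_size` -> number of observations held out at the end of the series to validate model selection when `information_criterion='oob'` (here the whole training set is passed and 20% of it is held out)

In [11]:
model_auto = auto_arima(df.ret_ftse[1:],
                        X=df[['ret_spx', 'ret_dax', 'ret_nikkei']][1:],  # regressors (`exogenous=` in older pmdarima)
                        m=5,                                 # seasonal cycle length
                        max_order=None, max_p=7, max_q=7, max_d=2,
                        max_P=4, max_Q=4, max_D=2,
                        maxiter=50, alpha=0.05,
                        n_jobs=-1,                           # only used when stepwise=False
                        trend='ct', information_criterion='oob',
                        out_of_sample_size=int(len(df)*0.2))  # hold out 20% for 'oob' scoring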
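To make the `'oob'` criterion concrete, here is a simplified sketch assuming only NumPy. It holds out the last 20% of the series, fits each candidate on the rest, scores one-step-ahead forecasts of the held-out tail by mean squared error (pmdarima's default `scoring='mse'`), and keeps the lowest score. The candidates are pure AR(p) models fit by OLS, and `ar_design`, `holdout_mse` and `holdout_select` are hypothetical names; pmdarima fits full (S)ARIMA models and may generate its held-out forecasts differently.

```python
import numpy as np


def ar_design(y: np.ndarray, p: int, start: int) -> np.ndarray:
    """Rows t = start, ..., len(y)-1 of [1, y[t-1], ..., y[t-p]]."""
    n = len(y) - start
    lags = [y[start - lag:len(y) - lag] for lag in range(1, p + 1)]
    return np.column_stack([np.ones(n)] + lags)


def holdout_mse(y: np.ndarray, p: int, holdout: int, max_p: int) -> float:
    """Fit AR(p) on all but the last `holdout` points, then score them."""
    train = y[:-holdout]
    coef, *_ = np.linalg.lstsq(ar_design(train, p, max_p), train[max_p:], rcond=None)
    # One-step-ahead forecasts of the held-out tail: coefficients stay frozen
    # at their training values, but each forecast uses the actual past values.
    split = len(y) - holdout
    preds = ar_design(y, p, split) @ coef
    return float(np.mean((y[split:] - preds) ** 2))


def holdout_select(y: np.ndarray, max_p: int = 5, holdout_frac: float = 0.2):
    """Pick the AR order with the lowest out-of-sample MSE."""
    holdout = int(len(y) * holdout_frac)
    scores = {p: holdout_mse(y, p, holdout, max_p) for p in range(max_p + 1)}
    return min(scores, key=scores.get), scores


# Simulated AR(2) series: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + e_t
rng = np.random.default_rng(1)
e = rng.standard_normal(1500)
y = np.zeros(1500)
for t in range(2, 1500):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]

best_p, scores = holdout_select(y)
print("selected order:", best_p)
```

Unlike AIC, this score never sees the held-out observations during fitting, which is why it is described as validation rather than an in-sample penalty.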

In [12]:
model_auto.summary()

| Dep. Variable: | y | No. Observations: | 5007 |
|---|---|---|---|
| Model: | SARIMAX(2, 0, 3)x(0, 0, [1], 5) | Log Likelihood | -7877.945 |
| Date: | Wed, 24 Aug 2022 | AIC | 15773.89 |
| Time: | 18:26:51 | BIC | 15832.558 |
| Sample: | 0 - 5007 | HQIC | 15794.451 |
| Covariance Type: | opg | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 0.0268 | 0.050 | 0.542 | 0.588 | -0.070 | 0.124 |
| drift | -2.34e-06 | 1.61e-05 | -0.146 | 0.884 | -3.39e-05 | 2.92e-05 |
| ar.L1 | -0.3409 | 0.111 | -3.064 | 0.002 | -0.559 | -0.123 |
| ar.L2 | -0.1509 | 0.127 | -1.193 | 0.233 | -0.399 | 0.097 |
| ma.L1 | 0.3172 | 0.111 | 2.861 | 0.004 | 0.100 | 0.534 |
| ma.L2 | 0.0930 | 0.128 | 0.725 | 0.468 | -0.158 | 0.344 |
| ma.L3 | -0.1051 | 0.009 | -11.291 | 0.000 | -0.123 | -0.087 |
| ma.S.L5 | -0.0510 | 0.016 | -3.246 | 0.001 | -0.082 | -0.020 |
| sigma2 | 1.3687 | 0.015 | 91.001 | 0.000 | 1.339 | 1.398 |

| Ljung-Box (L1) (Q): | 0.0 | Jarque-Bera (JB): | 6413.59 |
|---|---|---|---|
| Prob(Q): | 0.99 | Prob(JB): | 0.0 |
| Heteroskedasticity (H): | 2.0 | Skew: | -0.18 |
| Prob(H) (two-sided): | 0.0 | Kurtosis: | 8.53 |
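The notation `SARIMAX(2, 0, 3)x(0, 0, [1], 5)` combines a non-seasonal ARMA(2,3) with a single seasonal MA term at lag $m = 5$, and `trend='ct'` contributes the `intercept` ($c$) and `drift` ($\delta$) rows. In lag-operator form, following statsmodels' SARIMAX convention in which the trend enters the ARMA equation directly, the fitted model for FTSE returns $r_t$ reads:

$$
(1 - \phi_1 L - \phi_2 L^2)\, r_t = c + \delta t + (1 + \theta_1 L + \theta_2 L^2 + \theta_3 L^3)(1 + \Theta_1 L^5)\, \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)
$$

with $\phi_1 = -0.3409$, $\phi_2 = -0.1509$, $\theta_1 = 0.3172$, $\theta_2 = 0.0930$, $\theta_3 = -0.1051$, $\Theta_1 = -0.0510$, $c = 0.0268$, $\delta = -2.34 \times 10^{-6}$ and $\sigma^2 = 1.3687$. Because the two MA polynomials multiply, the seasonal term also produces interaction terms at lags 6, 7 and 8 (coefficients $\theta_1\Theta_1$, $\theta_2\Theta_1$ and $\theta_3\Theta_1$).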
