#### ARIMAX 

This is simply an ARIMA model with one or more exogenous (independent) variables added to the right-hand side of the equation.

$ \Delta P_t = C + \beta X + \phi_1 \Delta P_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t$

- $\Delta P_t$: $ P_t -  P_{t-1}$

- $\beta X$: exogenous variable(s) $X$ multiplied by the coefficient(s) $\beta$

- $P_t, P_{t-1}$: Values in the current period and a period ago respectively 

- $ \epsilon_t, \epsilon_{t-1}$: Error terms 

- $C$: Constant

- $\phi_1$: how much of last period's change ($\Delta P_{t-1}$) is relevant in explaining the current one

- $\theta_1$: how much of last period's error ($\epsilon_{t-1}$) is relevant in explaining the current value
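
To make the recursion in the equation concrete, here is a minimal sketch that simulates it step by step using only the standard library. The function name `simulate_arimax`, its default coefficients, and the starting price are hypothetical values chosen for illustration; they are not estimates from the dataset.

```python
import random

def simulate_arimax(x, c=0.1, beta=0.5, phi=0.3, theta=-0.4,
                    sigma=1.0, p0=100.0, seed=0):
    """Simulate prices whose first difference follows the ARIMAX(1,1,1) equation.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    prices = [p0]
    prev_diff = 0.0  # Delta P_{t-1}, assumed 0 before the first step
    prev_eps = 0.0   # epsilon_{t-1}, assumed 0 before the first step
    for x_t in x:
        eps = rng.gauss(0.0, sigma)  # epsilon_t
        # Delta P_t = C + beta*X_t + phi_1*Delta P_{t-1} + theta_1*eps_{t-1} + eps_t
        diff = c + beta * x_t + phi * prev_diff + theta * prev_eps + eps
        prices.append(prices[-1] + diff)  # integrate: P_t = P_{t-1} + Delta P_t
        prev_diff, prev_eps = diff, eps
    return prices

exog = [0.0, 1.0, 0.0, -1.0, 0.0]  # hypothetical exogenous series
prices = simulate_arimax(exog)
print(len(prices))  # one starting value plus one value per exogenous observation
```

Setting `sigma=0.0` removes the noise, which makes it easy to check each term of the equation by hand.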


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.graphics.tsaplots as sgt
from statsmodels.tsa.arima.model import ARIMA
from scipy.stats.distributions import chi2
import statsmodels.tsa.stattools as sts
import seaborn as sns
sns.set()

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
raw_csv_data = pd.read_csv("./../datasets/Index2018.csv")
df = raw_csv_data.copy()
df.date = pd.to_datetime(df.date, dayfirst=True)
df.set_index("date", inplace=True)
df = df.asfreq('b')  # business-day frequency
df = df.ffill()  # forward-fill the gaps; fillna(method='ffill') is deprecated in recent pandas

In [4]:
df.head()

date,spx,dax,ftse,nikkei
1994-01-07,469.9,2224.95,3445.98,18124.01
1994-01-10,475.27,2225.0,3440.58,18443.44
1994-01-11,474.13,2228.1,3413.77,18485.25
1994-01-12,474.17,2182.06,3372.02,18793.88
1994-01-13,472.47,2142.37,3360.01,18577.26


In [5]:
# picking market value for FTSE
df['market_value'] = df.ftse

# df.drop(["ftse", "nikkei", "dax"], axis=1, inplace=True)
df.describe()

,spx,dax,ftse,nikkei,market_value
count,6277.0,6277.0,6277.0,6277.0,6277.0
mean,1288.642547,6083.381061,5423.679824,14597.672753,5423.679824
std,487.86821,2755.563853,1145.616719,4043.795272,1145.616719
min,438.92,1911.7,2876.6,7054.98,2876.6
25%,992.715221,4070.46,4486.73,10701.13,4486.73
50%,1233.761241,5774.26,5663.3,15030.51,5663.3
75%,1460.25,7445.56,6304.630175,17860.47,6304.630175
max,2872.867839,13559.6,7778.637689,24124.15,7778.637689


In [6]:
train_locs = int(df.shape[0]*0.8)
train_locs

5021

In [7]:
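# copy the slices so that columns added later do not write into views of the original frame
df, df_test = df.iloc[:train_locs].copy(), df.iloc[train_locs:].copy()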
df, df_test

(                    spx      dax     ftse    nikkei  market_value
 date                                                             
 1994-01-07   469.900000  2224.95  3445.98  18124.01       3445.98
 1994-01-10   475.270000  2225.00  3440.58  18443.44       3440.58
 1994-01-11   474.130000  2228.10  3413.77  18485.25       3413.77
 1994-01-12   474.170000  2182.06  3372.02  18793.88       3372.02
 1994-01-13   472.470000  2142.37  3360.01  18577.26       3360.01
 ...                 ...      ...      ...       ...           ...
 2013-04-01  1562.173837  7795.31  6411.74  12135.02       6411.74
 2013-04-02  1570.252238  7943.87  6490.66  12003.43       6490.66
 2013-04-03  1553.686978  7874.75  6420.28  12362.20       6420.28
 2013-04-04  1559.979316  7817.39  6344.11  12634.54       6344.11
 2013-04-05  1553.278930  7658.75  6249.77  12833.64       6249.77
 
 [5021 rows x 5 columns],
                     spx       dax         ftse    nikkei  market_value
 date                        

In [8]:
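def llr_test(model_one, model_two, df=1):
    # Log-likelihood ratio test for nested models; model_two is the larger one.
    # Here df is the degrees of freedom (it shadows the DataFrame only inside this function).
    l1 = model_one.fit().llf
    l2 = model_two.fit().llf
    lr = 2 * (l2 - l1)  # test statistic
    p = chi2.sf(lr, df).round(3)  # p-value from the chi-square survival function
    return p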
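
As a rough illustration of the arithmetic inside `llr_test`, the sketch below computes the same p-value without SciPy for the special case `df=1`, where the chi-square survival function reduces to `erfc(sqrt(lr / 2))`. The helper name `llr_p_value_df1` and the two log-likelihood values are hypothetical; they do not come from the models fitted in this notebook.

```python
import math

def llr_p_value_df1(llf_small, llf_big):
    """p-value of the log-likelihood ratio test with 1 degree of freedom.

    Assumes the models are nested and llf_big belongs to the larger model.
    """
    lr = max(2 * (llf_big - llf_small), 0.0)  # test statistic, clamped at 0
    # For df=1: P(chi2 > lr) = P(|Z| > sqrt(lr)) = erfc(sqrt(lr / 2))
    return round(math.erfc(math.sqrt(lr / 2)), 3)

# Hypothetical log-likelihoods: the larger model improves the fit by 2 units,
# which gives a test statistic of 4.
p = llr_p_value_df1(-1000.0, -998.0)
print(p)
```

A p-value below the chosen significance level suggests that the extra parameter of the larger model is worth keeping.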


In [9]:
df['returns']= df.market_value.pct_change(1)*100
df[:5]

date,spx,dax,ftse,nikkei,market_value,returns
1994-01-07,469.9,2224.95,3445.98,18124.01,3445.98,
1994-01-10,475.27,2225.0,3440.58,18443.44,3440.58,-0.156704
1994-01-11,474.13,2228.1,3413.77,18485.25,3413.77,-0.779229
1994-01-12,474.17,2182.06,3372.02,18793.88,3372.02,-1.222988
1994-01-13,472.47,2142.37,3360.01,18577.26,3360.01,-0.356166


In [10]:
model_111_xspx = ARIMA(df.market_value, exog=df.spx, order=(1,1,1))
result_111_xspx = model_111_xspx.fit()
result_111_xspx.summary()

0,1,2,3
Dep. Variable:,market_value,No. Observations:,5021.0
Model:,"ARIMA(1, 1, 1)",Log Likelihood,-26693.392
Date:,"Sun, 21 Aug 2022",AIC,53394.784
Time:,14:18:56,BIC,53420.869
Sample:,01-07-1994,HQIC,53403.925
,- 04-05-2013,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
spx,2.6928,0.035,76.407,0.000,2.624,2.762
ar.L1,0.2571,0.029,8.812,0.000,0.200,0.314
ma.L1,-0.5460,0.025,-21.791,0.000,-0.595,-0.497
sigma2,2433.0771,27.350,88.961,0.000,2379.472,2486.682

0,1,2,3
Ljung-Box (L1) (Q):,0.24,Jarque-Bera (JB):,4423.9
Prob(Q):,0.62,Prob(JB):,0.0
Heteroskedasticity (H):,1.25,Skew:,-0.57
Prob(H) (two-sided):,0.0,Kurtosis:,7.45


We can see that the summary now contains an additional row for the S&P prices (`spx`).

In [11]:
model_111_xdax = ARIMA(df.market_value, exog=df.dax, order=(1,1,1))
result_111_xdax = model_111_xdax.fit()
result_111_xdax.summary()

0,1,2,3
Dep. Variable:,market_value,No. Observations:,5021.0
Model:,"ARIMA(1, 1, 1)",Log Likelihood,-25049.811
Date:,"Sun, 21 Aug 2022",AIC,50107.623
Time:,14:18:57,BIC,50133.707
Sample:,01-07-1994,HQIC,50116.763
,- 04-05-2013,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
dax,0.6306,0.004,148.424,0.000,0.622,0.639
ar.L1,0.6378,0.067,9.537,0.000,0.507,0.769
ma.L1,-0.7024,0.063,-11.149,0.000,-0.826,-0.579
sigma2,1264.1092,14.130,89.460,0.000,1236.414,1291.804

0,1,2,3
Ljung-Box (L1) (Q):,0.1,Jarque-Bera (JB):,5691.91
Prob(Q):,0.75,Prob(JB):,0.0
Heteroskedasticity (H):,0.96,Skew:,0.02
Prob(H) (two-sided):,0.42,Kurtosis:,8.22


In [13]:
model_111_xnik = ARIMA(df.market_value, exog=df.nikkei, order=(1,1,1))
result_111_nik = model_111_xnik.fit()
result_111_nik.summary()

0,1,2,3
Dep. Variable:,market_value,No. Observations:,5021.0
Model:,"ARIMA(1, 1, 1)",Log Likelihood,-27418.225
Date:,"Sun, 21 Aug 2022",AIC,54844.451
Time:,14:20:21,BIC,54870.535
Sample:,01-07-1994,HQIC,54853.591
,- 04-05-2013,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
nikkei,0.0825,0.003,24.620,0.000,0.076,0.089
ar.L1,0.5931,0.049,12.064,0.000,0.497,0.690
ma.L1,-0.6919,0.043,-16.245,0.000,-0.775,-0.608
sigma2,3252.8928,41.734,77.944,0.000,3171.096,3334.690

0,1,2,3
Ljung-Box (L1) (Q):,0.0,Jarque-Bera (JB):,2355.25
Prob(Q):,0.95,Prob(JB):,0.0
Heteroskedasticity (H):,1.73,Skew:,-0.25
Prob(H) (two-sided):,0.0,Kurtosis:,6.32
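
The three summaries can be compared through their AIC, which is defined as $2k - 2 \cdot \text{log-likelihood}$. The sketch below recomputes it from the log-likelihoods printed above, taking $k = 4$ because each summary lists four estimated parameters (the exogenous coefficient, `ar.L1`, `ma.L1` and `sigma2`). The variable names are illustrative, and the results may differ from the printed AIC in the last decimal because the log-likelihoods are already rounded.

```python
# Log-likelihoods copied from the three ARIMA(1,1,1) summaries above
log_likelihoods = {"spx": -26693.392, "dax": -25049.811, "nikkei": -27418.225}
k = 4  # exogenous coefficient, ar.L1, ma.L1, sigma2

def aic(llf, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * llf

scores = {name: round(aic(llf, k), 3) for name, llf in log_likelihoods.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(name, score)
print("lowest AIC:", best)
```

All three models use the same dependent variable and sample, so their AIC values are directly comparable; by this criterion the DAX regressor gives the best in-sample fit of the three.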


### Seasonality SARIMAX
 
SARIMAX stands for Seasonal AutoRegressive Integrated Moving Average with eXogenous variables.

It is the seasonal equivalent of the ARIMAX model. Seasonal versions of the other models exist as well (SARMA, SARIMA, SARMAX, etc.).


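The non-seasonal part of the model has the same form as the ARIMAX equation:

$ \Delta P_t = C + \beta X + \phi_1 \Delta P_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t$

- $\Delta P_t$: $ P_t -  P_{t-1}$

- $\beta X$: exogenous variable(s) $X$ multiplied by the coefficient(s) $\beta$

- $P_t, P_{t-1}$: Values in the current period and a period ago respectively 

- $ \epsilon_t, \epsilon_{t-1}$: Error terms 

- $C$: Constant

- $\phi_1$: how much of last period's change ($\Delta P_{t-1}$) is relevant in explaining the current one

- $\theta_1$: how much of last period's error ($\epsilon_{t-1}$) is relevant in explaining the current value

A seasonal model adds a second set of orders $(P, D, Q, s)$: autoregressive and moving-average terms at lags that are multiples of the season length $s$, and, if $D > 0$, seasonal differencing of the form $P_t - P_{t-s}$.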