# ARIMAX Set Up

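#### We first need to split the data. Because this is a time series, a random train/test split (e.g. scikit-learn's `train_test_split`, which shuffles by default) would leak future information into training, so we split by time instead: the earlier observations become the training set and the later ones the test set.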
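A minimal sketch of a time-based split, assuming a DataFrame indexed by date. The hypothetical `time_split` helper and the 80/20 ratio are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def time_split(df: pd.DataFrame, train_frac: float = 0.8):
    """Hypothetical helper: split chronologically, earlier rows -> train, later rows -> test."""
    df = df.sort_index()  # make sure rows are in time order before cutting
    cutoff = int(len(df) * train_frac)
    return df.iloc[:cutoff], df.iloc[cutoff:]


# Toy daily series standing in for the real data
idx = pd.date_range("2020-01-01", periods=100, freq="D")
df = pd.DataFrame({"value": np.arange(100, dtype=float)}, index=idx)

train, test = time_split(df, train_frac=0.8)
print(train.index.max(), test.index.min())  # every training date precedes every test date
```

Unlike a shuffled split, every test observation here lies strictly after the training window, which mirrors how the model will be used to forecast.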

## Necessary Imports

### statsmodels (ARIMA, SARIMAX, VARMAX, adfuller, grangercausalitytests, plot_acf, plot_pacf), pmdarima, dateutil, pyflux

## First, let's check for stationarity with the Augmented Dickey-Fuller test (`adfuller`)

In [None]:
from statsmodels.tsa.stattools import adfuller

adfuller(df.value)[1]  # p-value

# adfuller returns a tuple; element [1] is the p-value of the Augmented Dickey-Fuller test.
# Its null hypothesis is that the series has a unit root, i.e. is NOT stationary.
# Common rule of thumb: p-value above 0.05 -> cannot reject the null, treat the series as
# non-stationary and difference it; p-value below 0.05 -> treat the series as stationary.


## ARIMAX Parameters

### p = number of lagged values of the series to include in the AR part (read from plot_pacf), q = number of lagged forecast errors to include in the MA part (read from plot_acf), d = number of times the series is differenced to make it stationary (estimated with pmdarima.arima.utils.ndiffs)
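For reference, the statsmodels SARIMAX documentation describes a model with exogenous variables as a regression with ARIMA errors. In the non-seasonal case, with $L$ the lag operator and $x_t$ the exogenous regressors, $p$ counts the AR coefficients $\phi_i$, $d$ the differences, and $q$ the MA coefficients $\theta_j$ (a constant or trend term is omitted here for brevity):

$$
y_t = \beta^\top x_t + u_t, \qquad
\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)(1 - L)^d\, u_t
  = \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big)\varepsilon_t
$$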

#### Let's look at how to determine p and q

In [None]:
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plot_acf(df.value, lags=50)
plt.show()

# ACF -> MA order q: the last lag whose spike sits outside the shaded confidence band
# before the ACF cuts off is a candidate for q

plot_pacf(df.value, lags=50)
plt.show()

# PACF -> AR order p: the last lag whose spike sits outside the shaded confidence band
# before the PACF cuts off is a candidate for p




Next, let's use `ndiffs` to estimate d:

In [None]:
from pmdarima.arima.utils import ndiffs
ndiffs(df['target'])

## Model Shape

### ARIMAX - ARIMA model with exogenous variables

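In [None]:
from statsmodels.tsa.arima.model import ARIMA  # replaces the deprecated statsmodels.tsa.arima_model

p, d, q = 1, 1, 1  # placeholders: use the values found from the PACF, ndiffs and ACF steps
# 'exog_col' is a hypothetical column name; exog must be the regressor data itself, not a string.
# train is the earlier slice from the time-based split.
model = ARIMA(train['FX'], exog=train[['exog_col']], order=(p, d, q)).fit()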




### Next, look at the model summary and check the p-value of each coefficient (the `P>|z|` column): values below 0.05 are conventionally read as statistically significant

In [None]:
model.summary()

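# Forecast the test horizon; exog must supply the regressor values for those future dates.
# 'exog_col' is a hypothetical name for the exogenous column(s) used when fitting.
forecast = model.get_forecast(steps=len(test), exog=test[['exog_col']])
results = forecast.predicted_mean
conf_int = forecast.conf_int(alpha=0.05)  # 95% prediction interval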


### At this point, we can start to play around with different features

# VARMAX Set Up

## First, we use Granger causality tests to check whether past values of one feature help predict another feature; the lags at which this holds help determine the lag order

In [None]:
from statsmodels.tsa.stattools import grangercausalitytests

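c = 4  # hypothetical: maximum number of lags to test

# grangercausalitytests checks whether the SECOND column Granger-causes the FIRST,
# so feature_b goes first to test feature_a -> feature_b at lags 1..c
granger = grangercausalitytests(df[['feature_b', 'feature_a']], maxlag=c)

# granger[lag][0]['ssr_ftest'][1] is the F-test p-value at that lag;
# p < 0.05 suggests past feature_a values help predict feature_b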
