# Notes

## Stationarity

A time series is called __stationary__ if its statistical properties, such as mean, variance, and autocorrelation, are constant over time.
The __augmented Dickey-Fuller test__ can help us decide whether a time series is stationary:

```python
from statsmodels.tsa.stattools import adfuller

results = adfuller(df['close'])
```

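The second value of `results` is the p-value. The null hypothesis of the test is that the series has a unit root, i.e., that it is non-stationary, so if `results[1] < 0.05` we reject the null hypothesis and treat the series as stationary.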

## ARMA models

An __Autoregressive__ (__AR(N)__) model is one that depends on the previous __N__ values: $y_t = \sum_{i=1}^{N} a_i y_{t-i} + \epsilon_t$, with $\epsilon_t$ being the shock value (error).
A __Moving Average__ (__MA(N)__) model is one that depends on the previous __N__ shock values: $y_t = \sum_{i=1}^{N} m_i \epsilon_{t-i} + \epsilon_t$.
Finally, an __ARMA(p,q)__ model is a combination of an __AR(p)__ and an __MA(q)__ model.

To create an __ARMA(1,1)__ series, we can use `statsmodels`:

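```python
from statsmodels.tsa.arima_process import arma_generate_sample

# Example coefficient values (placeholders; pick your own)
a_1, m_1 = 0.5, 0.3

# Coefficients of the lag polynomials, including the zero-lag term 1
ar_coefs = [1, -a_1]
ma_coefs = [1, m_1]
y = arma_generate_sample(ar_coefs, ma_coefs, nsample=100, scale=0.5)
```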

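Note the negative sign in front of $a_1$: `arma_generate_sample` expects the coefficients of the lag polynomials, including the zero-lag coefficient 1, and moving the AR term to the left-hand side, $y_t - a_1 y_{t-1} = \epsilon_t + m_1 \epsilon_{t-1}$, flips its sign.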

We can fit an __ARMA(p,q)__ model using `statsmodels`:

```python
from statsmodels.tsa.arima.model import ARIMA

# Instantiate the model; ARMA(p,q) is ARIMA with d = 0
model = ARIMA(series, order=(p, 0, q))

# Fit; fit() returns a results object
results = model.fit()

# Print results
print(results.summary())
```

### ARIMAX

We can use `ARIMA` to fit an __ARIMAX__ model, i.e., one with an additional linear dependency on exogenous variables, which are passed through `exog`:

```python
from statsmodels.tsa.arima.model import ARIMA

# Instantiate the model; exog holds the exogenous variable(s)
model = ARIMA(series, order=(p, 0, q), exog=dependency)

# Fit; fit() returns a results object
results = model.fit()

# Print results
print(results.summary())
```

### What is the I in ARIMA?

Not all time series are stationary, but we can often make them stationary by differencing, i.e., taking the difference between consecutive values, a certain number of times. A series that needs to be differenced __d__ times can be modeled by an __ARIMA__ model of order __(p, d, q)__.

## Finding the right p and q in ARMA(p,q)

We can use the autocorrelation function (__ACF__) and the partial autocorrelation function (__PACF__) to determine the correct __p__ and __q__ values:
1. If the __ACF__ tails off gradually but the __PACF__ cuts off after lag __p__, then the time series is an __AR(p)__ one.
1. If the __ACF__ cuts off after lag __q__ and the __PACF__ tails off gradually, then the time series is an __MA(q)__ one.
1. Finally, if both the __ACF__ and the __PACF__ tail off gradually, then the model is an __ARMA__ one, and we cannot infer the orders from the plots alone.

Both functions can be inspected visually using `statsmodels`:

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
```

## Is the model a good model?

We can use several criteria and diagnostics to judge whether the model is good:
1. The __Akaike information criterion__ (__AIC__): lower is better. It penalizes the number of parameters, so it favors simpler models, and it is good at choosing models for prediction.
1. The __Bayesian information criterion__ (__BIC__): lower is better as well. It typically penalizes extra parameters more heavily than the __AIC__, and it is good at choosing simple explanatory models.
1. We can also use the `plot_diagnostics()` method of the fitted results to inspect four plots of the residuals:
    1. The __Standardized Residual__ plot: the errors should show no obvious pattern,
    1. The __Histogram plus estimated density__: the distribution of the errors should be close to Gaussian,
    1. The __Normal Q-Q__ plot, showing how close the errors are to a normal distribution,
    1. The __Correlogram__, showing whether the errors are correlated.
1. The `summary()` method also gives us:
    1. The __Ljung–Box test__ (__Prob(Q)__): p-value for the null hypothesis that the residuals are uncorrelated.
    1. The __Jarque-Bera test__ (__Prob(JB)__): p-value for the null hypothesis that the residuals are normally distributed.

## Seasonal ARIMA models (SARIMA)

Just as the seasons repeat every twelve months, similar repeating patterns can be found in time series. We refer to such series as seasonal time series. As before, `statsmodels` provides several tools to analyse and predict them.

### seasonal_decompose

`seasonal_decompose` splits the time series into four parts: the series itself (__Observed__), the __Seasonal__ component (the repeating cycle), the __Trend__ (the slowly varying mean level) and the __Residual__, which is what remains after removing the other two.

```python
# Imports
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Decomposition with a period of 12 observations (e.g. monthly data, yearly cycle)
decomp_results = seasonal_decompose(series, period=12)

# Plot decomposed data
decomp_results.plot()
plt.show()
```

### SARIMAX

To model and predict a __SARIMA__ series, we can use `SARIMAX`:

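```python
# Imports
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Instantiate model: (p,d,q) are the non-seasonal orders,
# (P,D,Q) the seasonal ones and S the length of the season
model = SARIMAX(df, order=(p, d, q), seasonal_order=(P, D, Q, S))

# Fit model
results = model.fit()
```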