# Univariate TS Forecasting: Autoregression (AR)

Autoregression models the output (the value at the next time step) as a linear combination of input variables (the values at prior time steps). For comparison, in linear regression $\hat{y}$ is the prediction, $\beta_0$ and $\beta_1$ are coefficients the model estimates from training data, and $X_1$ is an input value:

$ \hat{y} = \beta_{0} + \beta_{1} X_{1} $

Similarly, in a time series we can predict the value at the next time step from the observations at previous time steps:

$ X(t) = \beta_{0} + \beta_{1} X(t-1) + \beta_{2} X(t-2) + ... + \beta_{p} X(t-p) $

A pure autoregressive (AR-only) model of order $p$ is one where $X_t$ depends only on its own lags; that is, $X_t$ is a function of the lags of $X_t$. The order $p$ is the number of lagged terms in the model, and a good value for it can be read from the partial autocorrelation (PACF) plot, as described below.

$ X_{t} = \alpha + \sum_{i=1}^{p} \beta_{i} X_{t-i} + \epsilon_{t}$

where,

- $X_{t-1}$ is lag 1 of the series
- $\beta_1$ is the coefficient of lag 1 that the model estimates
- $\alpha$ is the intercept term, also estimated by the model
- $\epsilon_{t}$ is white noise
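To make the equation concrete, here is a minimal NumPy sketch that generates an AR($p$) series by applying it one step at a time, with Gaussian white noise for $\epsilon_t$. The helper `simulate_ar`, the burn-in length, and the example coefficients are illustrative assumptions, not part of statsmodels.

```python
import numpy as np

def simulate_ar(alpha, betas, n, sigma=1.0, burn_in=200, seed=0):
    """Simulate X_t = alpha + sum_i betas[i-1] * X_{t-i} + eps_t (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p = len(betas)
    total = n + burn_in
    x = np.zeros(total)
    eps = rng.normal(0.0, sigma, size=total)  # Gaussian white noise
    for t in range(p, total):
        # x[t-p:t] reversed gives [X_{t-1}, ..., X_{t-p}], aligned with [beta_1, ..., beta_p]
        recent = x[t - p:t][::-1]
        x[t] = alpha + np.dot(betas, recent) + eps[t]
    # drop the burn-in so the zero starting values do not influence the sample
    return x[burn_in:]

# AR(2): X_t = 1.0 + 0.5 X_{t-1} - 0.3 X_{t-2} + eps_t
series = simulate_ar(alpha=1.0, betas=[0.5, -0.3], n=1000)
print(series.shape)  # (1000,)
print(round(series.mean(), 2))
```

For a stationary AR process the long-run mean is $\alpha / (1 - \beta_1 - \dots - \beta_p)$, here $1 / 0.8 = 1.25$, so the printed sample mean should land near 1.25.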


## Finding p (order of AR term)

You can find out the required number of AR terms by inspecting the Partial Autocorrelation (PACF) plot.

Partial autocorrelation can be thought of as the correlation between the series and one of its lags after removing the contributions of the intermediate lags. The PACF therefore measures the direct relationship between a lag and the series, which tells you whether that lag is needed in the AR term.

The partial autocorrelation at lag $k$ of a series is the coefficient of that lag in the autoregression equation of $X$ that includes lags 1 through $k$. The autoregression equation of $X$ is simply the linear regression of $X$ on its own lags.

For example, if $X_{t}$ is the current series and $X_{t-1}$ is lag 1 of $X$, then the partial autocorrelation at lag 3 ($X_{t-3}$) is the coefficient $\alpha_3$ of $X_{t-3}$ in the following equation:

$X_{t} = \alpha_{0} + \alpha_{1} X_{t-1} + \alpha_{2} X_{t-2} + \alpha_{3} X_{t-3} + \epsilon_{t}$
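The regression definition can be checked directly. This NumPy-only sketch (the helper `pacf_by_regression` is a hypothetical name) regresses $X_t$ on an intercept and lags $1, \dots, k$ and returns the last coefficient, i.e. $\alpha_k$. On a simulated AR(1) series with coefficient 0.6, the lag-1 value should be close to 0.6 and the lag-2 and lag-3 values close to 0, since an AR(1) has no direct dependence beyond lag 1.

```python
import numpy as np

def pacf_by_regression(x, k):
    """PACF at lag k: the OLS coefficient on X_{t-k} when X_t is regressed
    on an intercept and lags 1..k (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = x[k:]  # X_t for t = k, ..., n-1
    # column j-1 holds X_{t-j}
    lagged = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
    design = np.column_stack([np.ones(len(y)), lagged])  # [1, X_{t-1}, ..., X_{t-k}]
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[-1]  # alpha_k, the coefficient of X_{t-k}

# simulate an AR(1) series X_t = 0.6 X_{t-1} + eps_t
rng = np.random.default_rng(42)
phi, n = 0.6, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

print([round(pacf_by_regression(x, k), 2) for k in (1, 2, 3)])
```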

Any autocorrelation remaining in a stationarized series can be removed by adding enough AR terms. So we initially set the order of the AR term equal to the number of lags that cross the significance limit in the PACF plot.
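The rule of counting lags outside the significance limit can be sketched with NumPy alone. This sketch assumes the usual large-sample 95% band of $\pm 1.96/\sqrt{n}$, which is what PACF plots typically draw, and computes each PACF value with the regression definition given earlier. The helpers `pacf_values` and `significant_lags` are hypothetical names, and the AR(2) coefficients are made up for illustration.

```python
import numpy as np

def pacf_values(x, max_lag):
    """PACF at lags 1..max_lag, each from its own OLS regression on an intercept and k lags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    values = []
    for k in range(1, max_lag + 1):
        y = x[k:]
        lagged = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        design = np.column_stack([np.ones(len(y)), lagged])
        coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
        values.append(coefs[-1])  # coefficient of the furthest lag
    return np.array(values)

def significant_lags(x, max_lag=10, z=1.96):
    """Return (lags outside +/- z/sqrt(n), PACF values, limit)."""
    pacf = pacf_values(x, max_lag)
    limit = z / np.sqrt(len(x))
    lags = [lag for lag, value in enumerate(pacf, start=1) if abs(value) > limit]
    return lags, pacf, limit

# AR(2) example: X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + eps_t
rng = np.random.default_rng(7)
n = 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

sig_lags, pacf, limit = significant_lags(x)
print("significant lags:", sig_lags)
print("PACF at lag 2:", round(pacf[1], 3), "limit:", round(limit, 4))
```

For this AR(2) series, lags 1 and 2 should fall well outside the band. A later lag can occasionally cross it by chance (roughly 1 time in 20 for a lag with no real effect), which is why the count from the PACF plot is treated as an initial choice of $p$ rather than a final one.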

The Ljung-Box test checks whether the autocorrelations up to a chosen lag are, as a group, significantly different from zero. The null hypothesis is that the previous lags as a whole are not correlated with the current period. If the p-value is below the chosen significance level (say 0.05), we reject the null and conclude that the past lags have some correlation with the current period.
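The test statistic is $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ sample autocorrelation and $h$ is the number of lags tested; under the null it is approximately chi-square with $h$ degrees of freedom. The NumPy sketch below computes $Q$ directly and, to avoid extra dependencies, compares it with the 5% critical value for $h = 10$ (18.307) instead of computing a p-value. statsmodels provides the full test as `acorr_ljungbox` in `statsmodels.stats.diagnostic`; the helper `ljung_box_q` here is a hypothetical name.

```python
import numpy as np

CHI2_95_DF10 = 18.307  # 95th percentile of the chi-square distribution with 10 degrees of freedom

def ljung_box_q(x, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1}^{h} r_k^2 / (n-k) (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.dot(d, d)
    total = 0.0
    for k in range(1, h + 1):
        r_k = np.dot(d[k:], d[:-k]) / denom  # lag-k sample autocorrelation
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

# strongly autocorrelated AR(1) series: X_t = 0.7 X_{t-1} + eps_t
rng = np.random.default_rng(1)
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()

q = ljung_box_q(x, h=10)
print("Q =", round(q, 1), "reject H0 at 5%:", q > CHI2_95_DF10)
```

For this series $Q$ lands far above the critical value, so the null of no autocorrelation is rejected. When the test is applied to the residuals of a fitted AR($p$) model, the degrees of freedom are commonly reduced to $h - p$.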

## Import libraries

```python
%pip install --upgrade statsmodels
```

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
```

## Load data

```python
# generate 100 draws of standard normal white noise
data = np.random.randn(100)
data.shape
```

```
(100,)
```

## AR Model Implementation

```python
# fit an AR(3) model; the data is white noise, so this is only a demonstration
# (the summary values are not meaningful here)
model = AutoReg(data, lags=3)
model_fit = model.fit()
print(model_fit.summary())
```

```
                            AutoReg Model Results                             
Dep. Variable:                      y   No. Observations:                  100
Model:                     AutoReg(3)   Log Likelihood                -131.710
Method:               Conditional MLE   S.D. of innovations              0.941
Date:                Fri, 20 Aug 2021   AIC                             -0.019
Time:                        10:58:38   BIC                              0.114
Sample:                             3   HQIC                             0.035
                                  100                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept     -0.0512      0.096     -0.535      0.593      -0.239       0.136
y.L1           0.1526      0.099      1.539      0.124      -0.042       0.347
y.L2          -0.2854      0.097     -2.937      0.0
```
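The statsmodels documentation describes `AutoReg` estimation as conditional maximum likelihood, which with Gaussian errors amounts to ordinary least squares of $X_t$ on a constant and its first $p$ lags, conditioning on the first $p$ observations (hence `Sample: 3` above). The NumPy sketch below reproduces that idea; it is not statsmodels' code, and `fit_ar_ols` plus the simulated AR(3) coefficients are illustrative assumptions. On the same `data`, its output should agree with `model_fit.params` up to numerical precision.

```python
import numpy as np

def fit_ar_ols(x, p):
    """Estimate alpha and beta_1..beta_p of X_t = alpha + sum_i beta_i X_{t-i} + eps_t
    by OLS on t = p, ..., n-1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = x[p:]
    # column i-1 holds X_{t-i}
    lagged = np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])
    design = np.column_stack([np.ones(n - p), lagged])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[0], coefs[1:]  # intercept, lag coefficients

# simulate X_t = 0.2 + 0.4 X_{t-1} - 0.2 X_{t-2} + 0.1 X_{t-3} + eps_t
rng = np.random.default_rng(3)
n = 5000
x = np.zeros(n)
for t in range(3, n):
    x[t] = 0.2 + 0.4 * x[t - 1] - 0.2 * x[t - 2] + 0.1 * x[t - 3] + rng.normal()

alpha_hat, beta_hat = fit_ar_ols(x, p=3)
print("intercept:", round(alpha_hat, 3))
print("lag coefficients:", np.round(beta_hat, 3))
```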



## Make Prediction

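```python
# one-step out-of-sample forecast: start = end = len(data) is the index
# immediately after the last observation (index 99)
yhat = model_fit.predict(start=len(data), end=len(data))
print(yhat)
```

```
[-0.455281]
```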
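The one-step forecast is the AR equation evaluated at the next time step, with the estimated intercept and lag coefficients plugged in. A minimal sketch of that arithmetic, using a hypothetical helper `one_step_forecast` and made-up coefficients:

```python
import numpy as np

def one_step_forecast(x, intercept, betas):
    """Forecast the next value as intercept + sum_i betas[i-1] * X_{n-i} (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    p = len(betas)
    recent = x[-p:][::-1]  # [X_{n-1}, X_{n-2}, ..., X_{n-p}], the last p observations
    return intercept + float(np.dot(betas, recent))

history = [1.0, 2.0, 3.0, 4.0]
print(one_step_forecast(history, intercept=0.5, betas=[0.6, -0.2, 0.1]))
```

Here the forecast is 0.5 + 0.6·4 - 0.2·3 + 0.1·2 = 2.5, up to floating-point rounding. With the fitted model above, `one_step_forecast(data, model_fit.params[0], model_fit.params[1:])` should reproduce `yhat[0]`, assuming the default constant trend so that `params` is ordered as intercept followed by lags 1 through 3.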
