In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# ARIMA models

__ARIMA__ models are used to forecast a time series using the series' own past values.

__Forecasting a time series can be broadly divided into two types__:
- if we use only the previous values of the time series to predict its future values, it is called __Univariate Time Series Forecasting__;


- if we use predictors other than the series itself (such as exogenous variables) to forecast it, it is called __Multivariate Time Series Forecasting__.

__ARIMA__ stands for __Autoregressive Integrated Moving Average__. It belongs to a class of models that explain a given time series based on its own past values, i.e. its own lags and the lagged forecast errors. The resulting equation can be used to forecast future values. Any 'non-seasonal' time series that exhibits patterns and is not random white noise can be modeled with ARIMA models.


So, __ARIMA__ is a forecasting algorithm based on the idea that the information in the past values of the time series can alone be used to predict the future values.


__ARIMA Models__ are specified by three order parameters (p, d, q):
- p is the order of the __AR__ term;


- q is the order of the __MA__ term;


- d is the number of differencing operations required to make the time series stationary.


__AR(p) Autoregression__ - a regression model that utilizes the dependent relationship between a current observation and observations over a previous period. An auto regressive (AR(p)) component refers to the use of past values in the regression equation for the time series.


__I(d) Integration__ - uses differencing of observations (subtracting the observation at the previous time step from the current observation) in order to make the time series stationary. The differencing is applied d times in succession.


__MA(q) Moving Average__ - a model that uses the dependency between an observation and the residual errors from a moving average model applied to lagged observations. A moving average component expresses the error of the model as a combination of previous error terms. The order q is the number of lagged error terms to be included in the model.

## Types of ARIMA models

- __ARIMA__: Non-seasonal Autoregressive Integrated Moving Averages;


- __SARIMA__: Seasonal ARIMA


- __SARIMAX__: Seasonal ARIMA with exogenous variables

If a time series has seasonal patterns, we need to add seasonal terms, and the model becomes __SARIMA__.

## The meaning of p, d and q in ARIMA model

### The meaning of p

__p__ is the order of the __Auto Regressive (AR)__ term. It refers to the number of lags of Y to be used as predictors.

### The meaning of d

- the term __Auto Regressive__ in ARIMA means it is a linear regression model that uses its own lags as predictors. Linear regression models work best when the predictors are not correlated and are independent of each other. So we need to make the time series stationary.


- the most common approach to make the series stationary is to difference it, that is, to subtract the previous value from the current one. Depending on the complexity of the series, more than one round of differencing may be needed.


- therefore, the value of __d__ is the minimum number of differencing operations needed to make the series stationary. If the time series is already stationary, then d = 0.
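
The effect of repeated differencing can be checked on a small example. The sketch below uses a made-up series with a quadratic trend (the coefficients are arbitrary and chosen only for illustration) and NumPy's `np.diff`. One round of differencing still leaves a linear trend, while two rounds leave a constant, which suggests d = 2 for this particular series.

```python
import numpy as np

# Hypothetical series with a quadratic trend: y_t = 0.5 * t^2 + 2 * t + 3
t = np.arange(10)
y = 0.5 * t**2 + 2 * t + 3

# First difference: y_t - y_{t-1}. A linear trend remains, so d = 1 is not enough.
diff1 = np.diff(y, n=1)

# Second difference: the difference of the first difference. The result is constant.
diff2 = np.diff(y, n=2)

print(diff1)
print(diff2)
```

Each round of differencing shortens the series by one observation, so `diff1` has 9 values and `diff2` has 8.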

### The meaning of q

__q__ is the order of the __Moving Average (MA)__ term. It refers to the number of lagged forecast errors that should go into the ARIMA model.

# AR and MA models

## AR model

- An __Auto Regressive (AR)__ model is one where Y_t depends only on its own lags.


- that is, Y_t is a function of the lags of Y_t. It is depicted by the following equation:


        Y_t = alpha + beta1 x Y_t-1 + beta2 x Y_t-2 + ... + betap x Y_t-p + epsilon_t
        
        
where:

- Y_t-1 is the lag1 of the series,


- beta1 is the coefficient of lag1 that the model estimates, and


- alpha is the intercept term, also estimated by the model.
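
To make the AR equation concrete, here is a minimal sketch that simulates an AR(2) series and then recovers the intercept and lag coefficients by ordinary least squares on the lagged values. The "true" parameter values, the sample size and the random seed are assumptions made for illustration. Dedicated time series libraries normally estimate these coefficients by other means (such as maximum likelihood), so this is only a way to see the regression-on-lags idea at work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" parameters for the simulation (illustrative values only)
alpha, beta1, beta2 = 1.0, 0.5, -0.3
n = 5000

# Simulate Y_t = alpha + beta1 * Y_{t-1} + beta2 * Y_{t-2} + epsilon_t
eps = rng.normal(size=n)
y = np.zeros(n)
for i in range(2, n):
    y[i] = alpha + beta1 * y[i - 1] + beta2 * y[i - 2] + eps[i]

# Design matrix: a column of ones (intercept), lag 1 and lag 2 of the series
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
target = y[2:]

# Least-squares regression of Y_t on its own lags
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
alpha_hat, beta1_hat, beta2_hat = coef

print(alpha_hat, beta1_hat, beta2_hat)
```

With a few thousand observations the estimates should land close to the values used in the simulation.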

## MA model

- a __Moving Average (MA)__ model is one where Y_t depends only on the lagged forecast errors. It is depicted by the following equation:


        Y_t = alpha + epsilon_t + theta1 x epsilon_t-1 + theta2 x epsilon_t-2 + ... + thetaq x epsilon_t-q
        
        
where the error terms are the errors of the autoregressive models of the respective lags.


The errors epsilon_t and epsilon_t-1 are the errors from the following equations:


        Y_t = beta1 x Y_t-1 + beta2 x Y_t-2 + ... + beta0 x Y0 + epsilon_t
        
        
        Y_t-1 = beta1 x Y_t-2 + beta2 x Y_t-3 + ... + beta0 x Y0 + epsilon_t-1
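
The MA equation can be explored with a short simulation. The sketch below generates an MA(1) series and treats the error terms as independent standard normal noise, which is the usual textbook simplification and an assumption here. The parameter values, the seed and the helper `autocorr` are hypothetical. For an MA(1) process the theoretical lag-1 autocorrelation is theta1 / (1 + theta1^2), and the autocorrelation at higher lags is zero, so the sample values should be near those.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed parameters for the simulation (illustrative values only)
alpha, theta1 = 2.0, 0.6
n = 20000

# White-noise errors; one extra value is drawn so that every Y_t has a lagged error
eps = rng.normal(size=n + 1)

# Y_t = alpha + epsilon_t + theta1 * epsilon_{t-1}
y = alpha + eps[1:] + theta1 * eps[:-1]

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (hypothetical helper)."""
    x = x - x.mean()
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))

rho1 = autocorr(y, 1)
rho2 = autocorr(y, 2)
rho1_theory = theta1 / (1 + theta1**2)

print(rho1, rho1_theory, rho2)
```

The autocorrelation dropping to roughly zero after lag q is one common hint for choosing the MA order of a series.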

## ARIMA model

- an ARIMA model is one where the time series was differenced at least once to make it stationary and we combine the AR and the MA terms. So the equation of an ARIMA model becomes:

        
        Y_t = alpha + beta1 x Y_t-1 + beta2 x Y_t-2 + ... + betap x Y_t-p + epsilon_t + theta1 x epsilon_t-1 + theta2 x epsilon_t-2 + ... + thetaq x epsilon_t-q
        
        
__ARIMA__ model in words:
- Predicted Y_t = Constant + Linear combination of lags of Y (up to p lags) + Linear combination of lagged forecast errors (up to q lags)
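
The verbal formula can be traced by hand for an ARIMA(1, 1, 1) model. The sketch below is not a fitting procedure: the coefficients are fixed, made-up values, the data are invented, and `arma_one_step_forecast` is a hypothetical helper. It differences the series once, rebuilds the past forecast errors recursively (assuming the errors before the first usable observation are zero), forms the prediction as constant + AR part + MA part, and finally adds the last observed value back to undo the differencing.

```python
import numpy as np

def arma_one_step_forecast(w, alpha, betas, thetas):
    """One-step forecast of a stationary series w with fixed coefficients."""
    p, q = len(betas), len(thetas)
    start = max(p, q)
    errors = np.zeros(len(w))  # errors before `start` are assumed to be zero

    # Rebuild past forecast errors: error_t = actual_t - predicted_t
    for t in range(start, len(w)):
        ar_part = sum(betas[i] * w[t - 1 - i] for i in range(p))
        ma_part = sum(thetas[j] * errors[t - 1 - j] for j in range(q))
        errors[t] = w[t] - (alpha + ar_part + ma_part)

    # Prediction for the next step: constant + lags of w + lagged errors
    n = len(w)
    ar_part = sum(betas[i] * w[n - 1 - i] for i in range(p))
    ma_part = sum(thetas[j] * errors[n - 1 - j] for j in range(q))
    return alpha + ar_part + ma_part

# Invented data and coefficients, for illustration only
y = np.array([10.0, 12.0, 13.0, 15.0, 18.0])
w = np.diff(y)  # d = 1: work on the differenced series

w_next = arma_one_step_forecast(w, alpha=0.5, betas=[0.4], thetas=[0.2])
y_next = y[-1] + w_next  # undo the differencing to get back to the original scale

print(w_next, y_next)
```

Note that with d = 1 the AR and MA terms describe the differenced series, so the forecast on the original scale is the last observation plus the predicted change.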