In [1]:
import pandas as pd
import numpy as np

## ARIMA
### AutoRegressive Integrated Moving Average

ARIMA cannot perfectly predict every time series. Stock prices, for example, are driven by many outside factors that the model never sees. ARIMA performs well when the series depends directly on the time stamp, that is, when it shows clear growth and seasonality.

An ARIMA model is a generalization of the autoregressive moving average (ARMA) model. Both models are fitted to time series data either to better understand the data or to predict future points in the series.

Non-seasonal ARIMA models are generally denoted ARIMA$(p,d,q)$, where the parameters $p$, $d$ and $q$ are non-negative integers.
ARIMA models are applied in cases where the data show evidence of non-stationarity; a differencing step (the "integrated" part of the model) can then be applied one or more times to eliminate the non-stationarity.

### Parts of ARIMA model
<strong>ARIMA</strong>, or <em>Autoregressive Integrated Moving Average</em>, is actually a combination of 3 models:
* <strong>AR($p$)</strong> Autoregression - a regression model that utilizes the dependent relationship between a current observation and observations over a previous period
* <strong>I($d$)</strong> Integration - uses differencing of observations (subtracting an observation from an observation at the previous time step) in order to make the time series stationary
* <strong>MA($q$)</strong> Moving Average - a model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

### Stationarity

To use ARIMA effectively, we need to understand stationarity in our data. A stationary series has a mean and variance that do not change over time, which lets the model assume they will be the same in future periods. We can use the augmented Dickey-Fuller test to check whether the data is stationary. If it is not, you will need to transform it to be stationary before you can evaluate it and decide which ARIMA terms to use.

### Differencing
Non-stationary data can be made to look stationary through <em>differencing</em>. A simple method called <em>first order differencing</em> calculates the difference between consecutive observations.

&nbsp;&nbsp;&nbsp;&nbsp;$y^{\prime}_t = y_t - y_{t-1}$

In this way a linear trend is transformed into a horizontal set of values. You can continue differencing until you reach stationarity. However, each differencing step comes at the cost of losing a row of data.


In [4]:
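# Load the sample data, parsing the first column as a date index
df = pd.read_csv('../../Data/samples.csv',index_col=0,parse_dates=True)
# First-order difference of column 'b' (equivalent to df['b'].diff())
df['d1b'] = df['b'] - df['b'].shift(1)

df[['b','d1b']].head()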

Unnamed: 0,b,d1b
1950-01-01,27,
1950-02-01,22,-5.0
1950-03-01,17,-5.0
1950-04-01,15,-2.0
1950-05-01,13,-2.0
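
As a self-contained illustration of the same idea, the sketch below differences a made-up series with a perfectly linear trend (the numbers are hypothetical and unrelated to `samples.csv`). It uses `Series.diff()`, which gives the same result as subtracting `shift(1)`, and shows that each differencing step costs one row.

```python
import pandas as pd

# Hypothetical series with a linear trend: 10, 13, 16, ...
idx = pd.date_range("2020-01-01", periods=6, freq="D")
trend = pd.Series([10 + 3 * i for i in range(6)], index=idx)

# First-order differencing: y'_t = y_t - y_{t-1}
d1 = trend.diff()

# Second-order differencing is the difference of the differences
d2 = d1.diff()

print(d1.tolist())  # first value is NaN, the rest are all 3.0
print(d2.tolist())  # first two values are NaN, the rest are all 0.0
```

The constant first difference is the "horizontal set of values" described above; differencing a second time is unnecessary here and only discards another row.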


## Autoregression Model AR($p$)

In an autoregression model, we forecast using a linear combination of past values of the variable. The term autoregression describes a regression of the variable against itself. An autoregression is run against a set of lagged values of order $p$. The AR model specifies that the output variable depends linearly on its own previous values and on an imperfectly predictable term.
### $y_{t} = c + \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + \dots + \phi_{p}y_{t-p} + \varepsilon_{t}$

where $c$ is a constant, $\phi_{1}, \dots, \phi_{p}$ are the lag coefficients up to order $p$, and $\varepsilon_{t}$ is white noise.
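
To make the equation concrete, here is a minimal sketch that simulates an AR(2) process and then recovers its coefficients by regressing $y_t$ on its own two lags. The parameter values (`c`, `phi1`, `phi2`), the sample size and the seed are arbitrary choices for illustration, and ordinary least squares stands in for whatever estimator a full ARIMA library would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (illustrative) AR(2) parameters:
# y_t = c + phi1 * y_{t-1} + phi2 * y_{t-2} + eps_t
c, phi1, phi2 = 1.0, 0.6, -0.3
n = 5000

y = np.zeros(n)
eps = rng.normal(0.0, 1.0, size=n)  # white noise
for t in range(2, n):
    y[t] = c + phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]

# Regress y_t on [1, y_{t-1}, y_{t-2}] to estimate c, phi1 and phi2
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
target = y[2:]
c_hat, phi1_hat, phi2_hat = np.linalg.lstsq(X, target, rcond=None)[0]

print(round(c_hat, 2), round(phi1_hat, 2), round(phi2_hat, 2))
```

With this many observations the estimates should land close to the values used in the simulation, which is exactly the "regression of the variable against itself" described above.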

## Choosing ARIMA Orders

This section discusses how to choose the $p,d,q$ (and, for seasonal models, $P,D,Q$) values for ARIMA-based models. The main goal is to determine the orders of the AR and MA components and whether the data needs to be differenced (the I component).

If the autocorrelation plot shows positive autocorrelation at the first lag (lag 1), it suggests using AR terms in relation to the lag. If it shows negative autocorrelation at the first lag, it suggests using MA terms.

* $p$: The number of lag observations included in the model.
* $d$: The number of times that the raw observations are differenced.
* $q$: The size of the moving average window, also called the order of the moving average.

Typically, a sharp drop in the PACF after lag $k$ suggests that an AR($k$) model should be used. If the decline is gradual instead, it suggests an MA model.

* Identification of an AR model is often best done with the *PACF*.
* Identification of an MA model is often best done with the *ACF* rather than the *PACF*.
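
The following sketch shows what these rules look like numerically for a simulated AR(1) series. The helper functions `sample_acf` and `sample_pacf` are written here for illustration only (they are not library functions), and the PACF is estimated by the regression method: the value at lag $k$ is the last coefficient of a least-squares AR($k$) fit.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.dot(y, y)
    return np.array([np.dot(y[k:], y[:len(y) - k]) / denom
                     for k in range(nlags + 1)])

def sample_pacf(y, nlags):
    """PACF at lag k = coefficient on y_{t-k} in a least-squares AR(k) fit."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    out = [1.0]
    for k in range(1, nlags + 1):
        # Columns are y_{t-1}, ..., y_{t-k}
        X = np.column_stack([y[k - j:len(y) - j] for j in range(1, k + 1)])
        coef = np.linalg.lstsq(X, y[k:], rcond=None)[0]
        out.append(coef[-1])
    return np.array(out)

# Simulated AR(1) series; phi = 0.7 is an arbitrary illustrative value
rng = np.random.default_rng(1)
n, phi = 3000, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

print(np.round(sample_acf(y, 4), 2))   # decays gradually, roughly phi**k
print(np.round(sample_pacf(y, 4), 2))  # drops to near zero after lag 1
```

The PACF cutting off after lag 1 while the ACF tails off is the signature of an AR(1) process; for an MA process the roles of the two plots are reversed.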

It can be very difficult to read these plots, so it is often more effective to perform a grid search across various combinations of $p,d,q$ values.

pmdarima (Pyramid ARIMA) is a separate library designed to perform grid searches across multiple combinations of $p,d,q$ and $P,D,Q$. This is often the most effective way to get a well-fitting model. The pmdarima library uses the *AIC* as the metric to compare ARIMA-based models; when comparing models, we want to minimize the *AIC* value.

Suppose that we have a statistical model of some data. Let $k$ be the number of estimated parameters in the model. Let $\hat L$ be the maximum value of the likelihood function for the model.

$AIC = 2k - 2\ln(\hat L)$
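
To show how the AIC drives a grid search, the sketch below fits pure AR($p$) models for several values of $p$ and keeps the one with the lowest AIC. This is a hand-rolled illustration, not how pmdarima is implemented: the helper `ar_aic` is hypothetical, the models are fitted by least squares with Gaussian errors assumed, and only the AR order is searched.

```python
import numpy as np

def ar_aic(y, p, max_p):
    """AIC of an AR(p) model fitted by least squares (Gaussian errors assumed)."""
    y = np.asarray(y, dtype=float)
    target = y[max_p:]  # same observations for every p, so AICs are comparable
    n = len(target)
    cols = [np.ones(n)] + [y[max_p - j:len(y) - j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    resid = target - X @ np.linalg.lstsq(X, target, rcond=None)[0]
    sigma2 = resid @ resid / n
    # Maximized Gaussian log-likelihood, ln(L-hat)
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 2  # p lag coefficients + constant + noise variance
    return 2 * k - 2 * log_lik

# Simulated AR(2) data; the coefficients are arbitrary illustrative values
rng = np.random.default_rng(2)
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

max_p = 5
aics = {p: ar_aic(y, p, max_p) for p in range(max_p + 1)}
best_p = min(aics, key=aics.get)
print(aics)
print("order with lowest AIC:", best_p)
```

The AIC falls sharply up to the true order and then flattens, because extra lags barely improve the likelihood while each one adds 2 to the penalty. The minimum often, though not always, lands on the true order.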

# Descriptive Statistics and Tests

## Tests for Stationarity 

To determine whether a series is stationary, we can use the augmented Dickey-Fuller test. It takes the form of a classic null hypothesis test and returns a p-value.

### Dickey-Fuller Test 
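* In this test the null hypothesis states that $\phi = 1$, meaning the series has a unit root (which is why this is also called a unit root test).
* If the p-value is high (> 0.05), we <strong>fail to reject</strong> the null hypothesis and treat the series as non-stationary. If it is low (≤ 0.05), we reject the null hypothesis and treat the series as stationary.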
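
The mechanics of the test can be sketched with a plain (non-augmented) Dickey-Fuller regression, $\Delta y_t = \alpha + \gamma y_{t-1} + e_t$, where $\gamma = \phi - 1$, so the null hypothesis $\phi = 1$ becomes $\gamma = 0$. The helper `dickey_fuller_t` is a simplified illustration written for this note. It compares the t-statistic with the asymptotic 5% critical value of about -2.86 for the constant-only case instead of computing a p-value. In practice one would normally call `adfuller` from `statsmodels.tsa.stattools`, which also adds lagged differences and reports a p-value.

```python
import numpy as np

def dickey_fuller_t(y):
    """t-statistic for gamma in: diff(y)_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    dof = len(dy) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(3)
random_walk = np.cumsum(rng.normal(size=500))  # has a unit root by construction
differenced = np.diff(random_walk)             # white noise, stationary

CRIT_5PCT = -2.86  # asymptotic 5% critical value (constant, no trend)
for name, series in [("random walk", random_walk), ("differenced", differenced)]:
    t_stat = dickey_fuller_t(series)
    verdict = "reject unit root" if t_stat < CRIT_5PCT else "fail to reject"
    print(name, round(t_stat, 2), verdict)
```

The differenced series produces a strongly negative statistic, so the unit root is rejected, which ties the test back to differencing as the remedy for non-stationarity.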

### Granger Causality Test 
* The Granger causality test is a hypothesis test to determine if one time series is useful in forecasting another.
* While it is fairly easy to measure correlations between series, it is another thing to observe changes in one series that are correlated with changes in another after a consistent amount of time.
* This test is used to see if there is an indication of causality, but keep in mind that the relationship could always be driven by some outside factor that is unaccounted for.
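
A minimal sketch of the idea behind the test: fit one regression of a series on its own lags, fit a second that also includes lags of the other series, and compare the residual sums of squares with an F-statistic. The helper `granger_f` and the simulated data are hypothetical, and only the statistic is computed (no p-value). A full implementation is available as `grangercausalitytests` in `statsmodels.tsa.stattools`.

```python
import numpy as np

def granger_f(cause, effect, lags):
    """F-statistic: do lags of `cause` help predict `effect` beyond its own lags?"""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    n = len(effect)
    target = effect[lags:]
    own = [effect[lags - j:n - j] for j in range(1, lags + 1)]
    other = [cause[lags - j:n - j] for j in range(1, lags + 1)]
    ones = np.ones(len(target))

    def rss(columns):
        X = np.column_stack(columns)
        resid = target - X @ np.linalg.lstsq(X, target, rcond=None)[0]
        return resid @ resid

    rss_restricted = rss([ones] + own)    # own lags only
    rss_full = rss([ones] + own + other)  # own lags + lags of `cause`
    dof = len(target) - (2 * lags + 1)
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)

# Simulated example: y follows x with a delay of two steps
rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.8 * x[t - 2] + 0.5 * rng.normal()

f_xy = granger_f(x, y, lags=2)  # do lags of x help predict y?
f_yx = granger_f(y, x, lags=2)  # do lags of y help predict x?
print(round(f_xy, 1), round(f_yx, 1))
```

Here `f_xy` comes out far larger than `f_yx`, matching the way the data was built. As noted above, a large statistic only says that one series helps forecast the other; it does not rule out a shared outside cause.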

## Evaluating Forecasts

### AIC - Akaike Information Criterion
* The AIC evaluates a collection of models and estimates the quality of each model <strong>relative</strong> to the others.
* **Penalties** are provided for the <strong>number of parameters</strong> used in an effort to prevent overfitting

### BIC - Bayesian Information Criterion
Very similar to the AIC, except that the mathematics behind the model comparisons uses a Bayesian approach.
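
For comparison with the AIC formula, the BIC is commonly written as follows, where $k$ and $\hat L$ are defined as for the AIC and $n$ (a symbol not used elsewhere in these notes) is the number of observations:

$BIC = k\ln(n) - 2\ln(\hat L)$

Because $\ln(n) > 2$ once $n \geq 8$, the BIC penalizes each extra parameter more heavily than the AIC for all but the smallest samples, so it tends to select simpler models.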