<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Seasonal ARIMA forecasting lab

---


### Lab Guide

- [Load the European retail data](#load-the-european-retail-data)
- [Decompose the timeseries and plot](#decompose-the-timeseries-and-plot)
- [Perform differencing and seasonal differencing of the time series](#take-a-second-order-difference-of-the-retail-timeseries)
- [Dickey-Fuller test of stationarity](#dickey-fuller-test-of-stationarity)
- [Seasonal ARIMA model (SARIMAX)](#seasonal-arima-with-additional-predictors-sarimax)
- [Forecast using the SARIMAX model](#forecast-using-the-sarimax-model)

In [1]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(font_scale=1.5)
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

<a id="statsmodels-timeseries-tools"></a>
## Statsmodels timeseries tools
---


In [2]:
# this will filter out a lot of future warnings from statsmodels
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


import statsmodels.api as sm  
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose

<a id="load-the-european-retail-data"></a>
## Load the European retail data
---

Create an index representing each quarter.

In [3]:
df = pd.read_csv('../../../../resource-datasets/european_retail_trade/euretail.csv')
df = df.set_index(['Year'])
df.head()

| Year | Qtr1 | Qtr2 | Qtr3 | Qtr4 |
|------|-------|-------|-------|-------|
| 1996 | 89.13 | 89.52 | 89.88 | 90.12 |
| 1997 | 89.19 | 89.78 | 90.03 | 90.38 |
| 1998 | 90.27 | 90.77 | 91.85 | 92.51 |
| 1999 | 92.21 | 92.52 | 93.62 | 94.15 |
| 2000 | 94.69 | 95.34 | 96.04 | 96.3 |


In [4]:
df_stacked = df.stack()
df_stacked.head()

Year         
1996     Qtr1    89.13
         Qtr2    89.52
         Qtr3    89.88
         Qtr4    90.12
1997     Qtr1    89.19
dtype: float64
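
The stacked series is still indexed by a `(Year, quarter label)` pair rather than by dates. One possible way to turn that into a quarterly date index is sketched below. The names `labels` and `retail` are made up for this illustration, and the small `df` is typed in from the first two rows of the table above so that the cell runs on its own; in the lab you would use the full `df` loaded from the CSV.

```python
import pandas as pd

# First two rows of the table above, entered by hand (illustration only).
df = pd.DataFrame(
    {"Qtr1": [89.13, 89.19], "Qtr2": [89.52, 89.78],
     "Qtr3": [89.88, 90.03], "Qtr4": [90.12, 90.38]},
    index=pd.Index([1996, 1997], name="Year"),
)
df_stacked = df.stack()

# (1996, 'Qtr1') -> '1996Q1', a string pandas can parse as a quarterly period.
labels = [f"{year}Q{qtr[-1]}" for year, qtr in df_stacked.index]

# Convert the quarterly periods to timestamps at the start of each quarter.
quarter_index = pd.PeriodIndex(labels, freq="Q").to_timestamp()
retail = pd.Series(df_stacked.values, index=quarter_index, name="retail")
print(retail.head())
```

A date-based index like this lets the statsmodels tools used later in the lab pick up the quarterly spacing of the observations.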

<a id="decompose-the-timeseries-and-plot"></a>
## Decompose the timeseries and plot
---

Which period (number of observations per seasonal cycle) would you choose for the seasonality?

<a id="take-a-second-order-difference-of-the-retail-timeseries"></a>
## Perform differencing and seasonal differencing of the time series

Plot the ACF and PACF for various combinations of differencing steps.

In [5]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

<a id="dickey-fuller-test-of-stationarity"></a>
## Dickey-Fuller test of stationarity
---

[Perform an (augmented) Dickey-Fuller test of stationarity](https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test) to evaluate whether or not the timeseries (or the differenced versions you created) are stationary.

In [6]:
from statsmodels.tsa.stattools import adfuller

<a id="seasonal-arima-with-additional-predictors-sarimax"></a>
## Seasonal ARIMA model (SARIMAX)
---

#### Fit a seasonal ARIMA model.

**Plot the residuals of the SARIMAX model.**

**Plot the ACF and PACF of the residuals.**

What should we expect from the ACF and PACF of our residuals if the model is good?

**Increase the order of the SARIMAX model.**

How do the results change?

<a id="forecast-using-the-sarimax-model"></a>
## Forecast using the SARIMAX model

Forecast 12 additional timepoints and plot them.