# Store Item Demand Forecasting Challenge

## Autoregressive Integrated Moving Average (ARIMA)

<a href="https://www.kaggle.com/c/demand-forecasting-kernels-only">Link to competition on Kaggle.</a>

The <a href="https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average">ARIMA</a> model is a generalisation of an ARMA model that can be applied to non-stationary time series.
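
In lag-operator notation ($L y_t = y_{t-1}$), an ARIMA(p, d, q) model without a constant term can be written as below. Here $\phi_i$ are the autoregressive coefficients, $\theta_j$ the moving-average coefficients and $\varepsilon_t$ white noise. The factor $(1 - L)^d$ is the differencing ("integrated") step. With $d = 0$ this is an ARMA(p, q) model, and with $d = 1$ an ARMA(p, q) model is fitted to the first differences $y_t - y_{t-1}$.

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d\, y_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t$$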

In [None]:
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

pd.options.display.max_columns = 99
plt.rcParams['figure.figsize'] = (12, 8)

## Load Data

In [None]:
df_train = pd.read_csv('../input/train.csv', parse_dates=['date'], index_col=['date'])
df_test = pd.read_csv('../input/test.csv', parse_dates=['date'], index_col=['date'])
df_train.shape, df_test.shape

In [None]:
df_train.head()

In [None]:
stores = sorted(df_train['store'].unique())
fig, axes = plt.subplots(len(stores), figsize=(8, 16))

# Weekly total sales for each store, one panel per store
for ax, s in zip(axes, stores):
    t = df_train.loc[df_train['store'] == s, 'sales'].resample('W').sum()
    t.plot(ax=ax)
    ax.grid()
    ax.set_xlabel('')
    ax.set_ylabel('sales')
    ax.set_title('Store {}'.format(s))
fig.tight_layout();

All stores appear to show very similar trends and seasonality; they differ mainly in scale.
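
One way to check the "same shape, different scale" observation numerically is to divide each store's weekly series by its own mean and compare the results. The sketch below does this on made-up data in which store 2 sells exactly 1.5 times as much as store 1 every day; the data and the sine-shaped pattern are hypothetical and only serve to show the pandas calls.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: store 2 sells exactly 1.5x store 1 every day
dates = pd.date_range('2013-01-01', periods=3 * 365, freq='D')
base = 100 + 20 * np.sin(2 * np.pi * np.arange(len(dates)) / 365.25)
df = pd.DataFrame({
    'date': np.concatenate([dates, dates]),
    'store': np.repeat([1, 2], len(dates)),
    'sales': np.concatenate([base, 1.5 * base]),
}).set_index('date')

# Weekly totals with one column per store
weekly = df.groupby([pd.Grouper(freq='W'), 'store'])['sales'].sum().unstack('store')

# Dividing each store by its own mean removes the scale difference
normalised = weekly / weekly.mean()
print(normalised.head())
```

On the real data, the same two lines applied to `df_train` (which is indexed by date) would give one normalised column per store; if the stores really differ only in scale, those columns should lie almost on top of each other.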

## ARIMA

We will build an ARIMA model for a single store and item, and then refit it on every store-item time series in the dataset to generate predictions.

### Example store and item

In [None]:
s1i1 = df_train.loc[(df_train['store'] == 1) & (df_train['item'] == 1)]
s1i1.head()

In [None]:
s1i1['sales'].plot();

### Time Series Decomposition

Decompose the example time series into trend, seasonal, and residual components.
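
To make the decomposition less of a black box, here is a minimal sketch of a classical additive decomposition for an odd period (365 is odd): the trend is a centred moving average over one period, and the seasonal component is the average detrended value at each position in the cycle, shifted to sum to zero. This is an illustrative re-implementation under those assumptions, not statsmodels' own code; the function name `additive_decompose` and the synthetic weekly series are hypothetical.

```python
import numpy as np
import pandas as pd

def additive_decompose(y, period):
    """Classical additive decomposition for an odd period: y = trend + seasonal + resid."""
    y = pd.Series(y, dtype=float)
    # Trend: centred moving average spanning exactly one period (NaN at both ends)
    trend = y.rolling(window=period, center=True).mean()
    detrended = y - trend
    # Seasonal: mean detrended value at each position in the cycle, centred to sum to zero
    position = np.arange(len(y)) % period
    means = detrended.groupby(position).mean()
    means -= means.mean()
    seasonal = pd.Series(means.values[position], index=y.index)
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Synthetic series: linear trend plus a weekly pattern that sums to zero
pattern = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -2.0, 1.0])
t = np.arange(70)
y = 0.5 * t + pattern[t % 7]
trend, seasonal, resid = additive_decompose(y, period=7)
```

For this noise-free example the trend recovers the straight line, the seasonal component recovers the weekly pattern, and the residual is zero wherever the moving average is defined.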


In [None]:
fig = seasonal_decompose(s1i1['sales'], model='additive', period=365).plot()

There is clear yearly seasonality and an upward trend, so the series is not stationary. We can run an augmented Dickey-Fuller (ADF) test with `adfuller` to test for a unit root.

In [None]:
dftest = adfuller(s1i1['sales'], autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
dfoutput

The augmented Dickey-Fuller p-value is lower than I would have expected, but the test statistic is not below the 1% critical value, so at that level we cannot reject the null hypothesis of a unit root (non-stationarity). The plot also shows a clear upward trend.
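
The test is built on a regression of the differenced series on its own lagged level. Below is a minimal numpy sketch of the plain (non-augmented) Dickey-Fuller regression with a constant, for illustration only: `adfuller` additionally includes lagged differences (their number chosen here by `autolag='AIC'`) and reports MacKinnon p-values and critical values. The null hypothesis is γ = 0 (a unit root); for large samples in the constant-only case, a statistic below roughly -3.43 rejects it at the 1% level. The helper name `dickey_fuller_stat` and the simulated series are hypothetical.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in the regression dy_t = alpha + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance of the coefficients
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
e = rng.normal(size=1000)
random_walk = np.cumsum(e)        # unit root: non-stationary
ar1 = np.empty_like(e)            # stationary AR(1) with phi = 0.5
ar1[0] = e[0]
for t in range(1, len(e)):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]

print(dickey_fuller_stat(random_walk), dickey_fuller_stat(ar1))
```

The stationary AR(1) series gives a strongly negative statistic, while the random walk should typically stay above the critical value.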

### Take first differences

We can try to remove the trend by applying a first difference to the time series.
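
A small numpy illustration on a made-up series with a linear trend: the first difference turns the trend into a roughly constant mean, and forecasts made on the differenced scale are mapped back to levels by cumulatively summing from the last observed value. That second step is what turns ARIMA(p, 1, q) forecasts of differences back into forecasts of sales. The series and the "forecast" of differences below are hypothetical.

```python
import numpy as np

# Hypothetical series: linear trend (slope 2) plus noise
rng = np.random.default_rng(0)
level = 50 + 2.0 * np.arange(100) + rng.normal(scale=1.0, size=100)

diff_1 = np.diff(level)   # first difference: level[t] - level[t-1]
# The trend becomes a roughly constant mean of about 2; the series no longer drifts upward
print(diff_1.mean())

# Forecasts of the differences map back to levels by cumulative summing from the last value
diff_forecast = np.full(5, 2.0)   # hypothetical forecast of the next 5 differences
level_forecast = level[-1] + np.cumsum(diff_forecast)
print(level_forecast)
```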

In [None]:
# First difference; the first value is NaN and is dropped
diff_1 = s1i1['sales'].diff().dropna()
fig = seasonal_decompose(diff_1, model='additive', period=365).plot()

In [None]:
dftest = adfuller(diff_1, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
dfoutput

The trend has been eliminated, and the Dickey-Fuller test now rejects the unit-root null, indicating that the differenced series is stationary. There is, however, still some evidence of seasonality.

### Plot ACF and PACF

The <a href="https://en.wikipedia.org/wiki/Autocorrelation">Autocorrelation Function</a> (ACF) is the correlation of a signal with a delayed copy of itself as a function of delay.

The <a href="https://en.wikipedia.org/wiki/Partial_autocorrelation_function">Partial Autocorrelation Function</a> (PACF) is the partial correlation of a signal with a delayed copy of itself, controlling for the values of the time series at all shorter delays, as a function of delay.
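
Both definitions can be written in a few lines of numpy. In this sketch the ACF uses the common biased estimator (dividing by the full sum of squares at every lag), and the PACF at lag k is the last coefficient of an OLS regression of $y_t$ on $y_{t-1}, \dots, y_{t-k}$, which directly implements "controlling for all shorter delays". statsmodels offers several PACF estimators, and the regression-based one corresponds to its `'ols'` method; the function names here are hypothetical.

```python
import numpy as np

def acf(y, nlags):
    """Sample autocorrelation at lags 0..nlags (biased estimator)."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = y @ y
    return np.array([y[: len(y) - k] @ y[k:] / denom for k in range(nlags + 1)])

def pacf_ols(y, nlags):
    """PACF at lags 1..nlags: last coefficient of an OLS fit of y_t on y_{t-1}, ..., y_{t-k}."""
    y = np.asarray(y, dtype=float)
    out = []
    for k in range(1, nlags + 1):
        # Intercept plus one column per lag j, holding y_{t-j} for t = k..n-1
        X = np.column_stack([np.ones(len(y) - k)] +
                            [y[k - j: len(y) - j] for j in range(1, k + 1)])
        beta, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(beta[-1])
    return np.array(out)

# Usage: an AR(1) process with phi = 0.7 should show PACF of about 0.7 at lag 1, near 0 after
rng = np.random.default_rng(0)
e = rng.normal(size=5000)
ar1 = np.empty_like(e)
ar1[0] = e[0]
for t in range(1, len(e)):
    ar1[t] = 0.7 * ar1[t - 1] + e[t]
print(acf(ar1, 3), pacf_ols(ar1, 3))
```

For an AR(1) process the ACF decays geometrically (about 0.7, 0.49, ...) while the PACF cuts off after lag 1, which is the kind of pattern one would normally read off the plots.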

In [None]:
fig, ax = plt.subplots(2)
# plot_acf / plot_pacf draw onto the given axes (they return a Figure, not an Axes)
sm.graphics.tsa.plot_acf(diff_1, lags=50, ax=ax[0])
sm.graphics.tsa.plot_pacf(diff_1, lags=50, ax=ax[1]);

The ACF and PACF show clear seasonal patterns, which makes them too complex to read the appropriate AR and MA orders for the ARIMA model directly from the plots.

### Build Model

We will run a grid search over the ARIMA(p,d,q) orders, fixing d = 1 and keeping the combination with the lowest Akaike Information Criterion (AIC), using the following possible values:

In [None]:
from itertools import product

ps = range(0, 7) # Up to 6 AR terms
d = 1            # Differencing is 1
qs = range(0, 7) # Up to 6 MA terms

params = product(ps, qs)
params_list = list(params)
print("Number of parameter combinations for grid search: {}".format(len(params_list)))
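
The AIC trades off goodness of fit against the number of parameters k: $\mathrm{AIC} = 2k - 2\ln\hat{L}$. As a numpy-only sketch of AIC-based order selection, the code below fits pure AR(p) models by OLS and uses the Gaussian AIC up to an additive constant, $n\ln(\mathrm{RSS}/n) + 2k$. This is a simplification for illustration; the statsmodels ARIMA fits in the grid search use maximum likelihood on the full ARMA model instead. The helper `ar_aic` and the simulated AR(2) series are hypothetical.

```python
import numpy as np

def ar_aic(y, p, max_p):
    """Gaussian AIC (up to a constant) of an OLS-fitted AR(p) with intercept.

    Every order is fitted on the same sample, y[max_p:], so the AIC values are comparable.
    """
    y = np.asarray(y, dtype=float)
    n = len(y) - max_p
    X = np.column_stack([np.ones(n)] + [y[max_p - j: len(y) - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y[max_p:], rcond=None)
    rss = np.sum((y[max_p:] - X @ beta) ** 2)
    k = p + 2   # AR coefficients + intercept + noise variance
    return n * np.log(rss / n) + 2 * k

# Simulate an AR(2) process: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + e_t
rng = np.random.default_rng(1)
e = rng.normal(size=2000)
y = np.zeros_like(e)
for t in range(2, len(e)):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]

max_p = 6
aics = {p: ar_aic(y, p, max_p) for p in range(max_p + 1)}
best_p = min(aics, key=aics.get)
print(best_p, aics)
```

Fitting every candidate on the same trimmed sample matters: AIC values computed on different numbers of observations are not comparable.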

In [None]:
def optimiseARIMA(ts, params_list, d):
    """Fit ARIMA(p, d, q) for every (p, q) in params_list; return the fits sorted by AIC."""
    results = []

    for p, q in params_list:
        try:
            arima = sm.tsa.ARIMA(ts.astype(float), freq='D', order=(p, d, q)).fit()
        except Exception:
            # Skip parameter combinations that fail to fit
            continue
        results.append([(p, q), arima.aic])

    df_results = pd.DataFrame(results, columns=['parameters', 'aic'])
    return df_results.sort_values(by='aic', ascending=True).reset_index(drop=True)

In [None]:
%%time
results = optimiseARIMA(s1i1['sales'], params_list, d)

In [None]:
results.head(10)

Unsurprisingly, the more complex models have the lowest AIC values. We proceed with the ARIMA(6,1,6) model.

In [None]:
%%time
arima = sm.tsa.ARIMA(s1i1['sales'].astype(float), freq='D', order=(6, 1, 6)).fit()
print(arima.summary())

## Make Predictions

In [None]:
arima_results = df_test.reset_index()
arima_results['sales'] = 0.0  # float placeholder, overwritten by the forecasts below

In [None]:
tic = time.time()

for s in arima_results['store'].unique():
    for i in arima_results['item'].unique():
        si = df_train.loc[(df_train['store'] == s) & (df_train['item'] == i), 'sales']
        try:
            arima = sm.tsa.ARIMA(si.astype(float), freq='D', order=(6, 1, 6)).fit()
        except Exception:
            # Fall back to a smaller model if the ARIMA(6,1,6) fit raises an error
            arima = sm.tsa.ARIMA(si.astype(float), freq='D', order=(2, 1, 2)).fit()
            print("ARIMA(6,1,6) failed to fit for store {} item {}. ARIMA(2,1,2) used instead.".format(s, i))
        # Forecast the test period, 2018-01-01 to 2018-03-31
        fcst = arima.predict(start='2018-01-01', end='2018-03-31', dynamic=True)
        arima_results.loc[(arima_results['store'] == s) & (arima_results['item'] == i), 'sales'] = fcst.values

        if i % 10 == 0:
            toc = time.time()
            print("Completed store {} item {}. Cumulative time: {:.1f}m".format(s, i, (toc-tic)/60))
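
In the loop above, `fcst.values` is assigned positionally, which relies on the test rows for each store-item pair being in date order. A sketch of an alternative: iterate over the series with `groupby` and merge the forecasts back onto the test rows on `(date, store, item)`, so the row order no longer matters. `naive_forecast` is a hypothetical placeholder for the ARIMA fit-and-predict step, and the toy data are made up.

```python
import numpy as np
import pandas as pd

def naive_forecast(history, index):
    """Hypothetical stand-in for the ARIMA step: repeat the last observed value."""
    return pd.Series(history.iloc[-1], index=index)

# Hypothetical toy data in the competition's long format: 2 stores x 2 items
keys = [(1, 1), (1, 2), (2, 1), (2, 2)]
train = pd.DataFrame(
    [(d, s, i, float(10 * s + i)) for s, i in keys
     for d in pd.date_range('2017-12-29', '2017-12-31')],
    columns=['date', 'store', 'item', 'sales'],
).set_index('date')
test = pd.DataFrame(
    [(d, s, i) for s, i in keys for d in pd.date_range('2018-01-01', '2018-01-03')],
    columns=['date', 'store', 'item'],
)
test.insert(0, 'id', range(len(test)))

pieces = []
for (s, i), history in train.groupby(['store', 'item'])['sales']:
    # Future dates requested for this pair, in chronological order
    future = pd.DatetimeIndex(sorted(test.loc[(test['store'] == s) & (test['item'] == i), 'date']))
    fcst = naive_forecast(history, future)
    pieces.append(pd.DataFrame({'date': fcst.index, 'store': s, 'item': i, 'sales': fcst.values}))

forecasts = pd.concat(pieces, ignore_index=True)
# A left merge on the keys puts each forecast on its matching test row, in test-row order
submission = test.merge(forecasts, on=['date', 'store', 'item'], how='left')[['id', 'sales']]
print(submission)
```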

In [None]:
arima_results.drop(['date', 'store', 'item'], axis=1, inplace=True)
arima_results.head()

In [None]:
arima_results.to_csv('arima_results.csv', index=False)

### Example forecast

In [None]:
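# `arima` is the last model fitted in the loop above (store 10, item 50).
# With dynamic=True, predictions from 2017-10-01 onward use earlier predictions instead of
# observed sales. The model was trained on data that includes this period, so this is an
# in-sample check rather than a true out-of-sample test.
forecast = arima.predict(start='2017-10-01', end='2017-12-31', dynamic=True)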
actual = df_train.loc[(df_train['store'] == 10) & (df_train['item'] == 50), 'sales']

forecast.plot()
actual.loc['2017-10-01':].plot()
plt.legend(['ARIMA', 'Actual'])
plt.ylabel('Sales');
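
`dynamic=True` means that, from `start` onward, the model is fed its own earlier predictions instead of the observed lagged values, so the result behaves like a multi-step forecast rather than a series of one-step-ahead predictions. A numpy-only sketch of the difference for a hypothetical AR(1) model with known parameters (c = 2 and phi = 0.8, chosen purely for illustration):

```python
import numpy as np

# Hypothetical AR(1) model y_t = c + phi * y_{t-1} + e_t with known parameters
c, phi = 2.0, 0.8
mu = c / (1 - phi)   # process mean (10 here)
rng = np.random.default_rng(0)
y = np.empty(200)
y[0] = mu
for t in range(1, len(y)):
    y[t] = c + phi * y[t - 1] + rng.normal()

start = 150  # first index to predict
# Static (dynamic=False) predictions: each step uses the observed previous value
one_step = c + phi * y[start - 1:-1]

# Dynamic predictions: after the first step, each step uses the previous *prediction*
dynamic = np.empty(len(y) - start)
prev = y[start - 1]
for h in range(len(dynamic)):
    prev = c + phi * prev
    dynamic[h] = prev

print(one_step[:5], dynamic[:5])
```

The dynamic path decays geometrically toward the process mean, while the one-step predictions keep tracking the data; this is why a long dynamic forecast tends to look much smoother than the actual series.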