# Intro to Forecasting methodologies

How are we going to do that? By:
1. splitting our sales data into a training and a testing set (let’s say Oct 2018 - Feb 2019 for training and Mar - Apr 2019 for testing),
2. trying a few forecasting methodologies and asking them to predict what would happen in Mar - Apr 2019,
3. scoring their predictions against the testing set, and
4. picking the winner (the one with the lowest error).

Then that model will give us a forecast for the future (May and onwards).
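The four steps above can be sketched in a few lines of plain Python. This is only an illustration: the toy numbers are made up, and the two candidate "models" (`naive_forecast` and `mean_forecast`) are hypothetical stand-ins for the real methodologies tried later.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def naive_forecast(train, horizon):
    """Hypothetical baseline: repeat the last observed value."""
    return [train[-1]] * horizon

def mean_forecast(train, horizon):
    """Hypothetical baseline: repeat the training mean."""
    return [sum(train) / len(train)] * horizon

# Made-up toy series, not the real sales data.
toy_sales = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 16.0, 18.0]

# 1. split: hold out the last two points for testing
n_holdout = 2
train, test = toy_sales[:-n_holdout], toy_sales[-n_holdout:]

# 2. + 3. let each candidate predict the hold-out period and score it
candidates = {"naive": naive_forecast, "mean": mean_forecast}
scores = {name: rmse(test, model(train, len(test))) for name, model in candidates.items()}

# 4. pick the winner (lowest error)
winner = min(scores, key=scores.get)
print(scores, winner)
```

The split keeps the hold-out at the end of the series on purpose: a forecasting model must never see data from the period it is asked to predict.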

In [1]:
# imports
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
from matplotlib import rcParams
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf
from sklearn.metrics import mean_squared_error
from math import sqrt
from statsmodels.tsa.api import Holt
# statsmodels.tsa.arima_model was removed in newer statsmodels releases; ARIMA now lives here
from statsmodels.tsa.arima.model import ARIMA
from multiprocessing import cpu_count
from joblib import Parallel
from joblib import delayed
from warnings import catch_warnings
from warnings import filterwarnings

## What is forecasting?

Forecasting is a special category of predictive modeling. <br>
Traditional machine learning predicts A (let's say iced coffee sales) from B (temperature). <br>
Forecasting predicts A from A – future coffee sales from past coffee sales. This is only possible if past A correlates with present A (aka, there's autocorrelation).
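To make "past A correlates with present A" concrete, here is a minimal sketch of the lag-k sample autocorrelation written by hand. The helper name `autocorrelation` and both toy series are made up for illustration; in practice `plot_acf` from statsmodels does this work for us.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(values)
    mean = sum(values) / n
    # total variation around the mean
    denom = sum((v - mean) ** 2 for v in values)
    # co-variation between each point and the point `lag` steps earlier
    num = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Made-up examples: a steadily rising series and one that flips sign every step.
trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
alternating = [1.0, -1.0] * 4

print(autocorrelation(trending, 1))     # positive: yesterday resembles today
print(autocorrelation(alternating, 1))  # negative: yesterday is the opposite of today
```

A value near zero at every lag would mean the past tells us nothing about the present, and forecasting from the series alone would not work.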

How does forecasting work?
* We pick a form to describe the data (for example, a line if there's steady sales)
* Then find the best parameters to fit the form to the data (the slope to the line, for instance).
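As a small illustration of "pick a form, then fit its parameters", the sketch below fits a straight line to a toy series with ordinary least squares and extends it one step ahead. The function `fit_line` and the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def fit_line(y):
    """Least-squares slope and intercept for y against time steps 0, 1, 2, ..."""
    n = len(y)
    x = list(range(n))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    slope = (
        sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        / sum((xi - x_mean) ** 2 for xi in x)
    )
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Made-up series that grows by 2 every step.
toy = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(toy)

# The "forecast" is just the fitted form evaluated at the next time step.
next_value = intercept + slope * len(toy)
print(slope, intercept, next_value)
```

Here the form is the line and the parameters are the slope and intercept; the forecasting families below follow the same recipe with richer forms.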

So let's start by checking out our data - do we have autocorrelation and if so, what form might we fit through our data?

In [None]:

series_s = pd.read_csv('data/TotalSales_2019-4-19_1621.csv', parse_dates=['query_date'], index_col=['query_date'])



# set train/test split
n_test = 49

In [2]:
# load dataset
df = pd.read_table('data/birth.txt')
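# pd.DatetimeIndex no longer accepts start/end; pd.date_range builds the same month-end index
# (newer pandas versions spell this frequency 'ME' instead of 'M')
birth = df.set_index(pd.date_range(start='1/1/1980', end='12/31/2010', freq='M'))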
birth.head()

# get values
data_s = birth.values

FileNotFoundError: [Errno 2] No such file or directory: 'data/birth.txt'

In [None]:
# let's make the plots bigger
rcParams['figure.figsize'] = 18, 6

### Birth rates: autocorrelation and decomposition

In [None]:
# plot autocorrelation
plot_acf(series_s, lags=20, alpha=0.05)
plt.show()

In [None]:
# additive seasonal decompose
seasonal_decompose(series_s, model='additive').plot()
plt.show()

In [None]:
# then multiplicative
seasonal_decompose(series_s, model='multiplicative').plot()
plt.show()

### Exponential Smoothing

The first family of models we tried is called Holt-Winters or exponential smoothing.

Exponential smoothing has several versions:
* simple exponential smoothing for a level only
* double (aka Holt) for level + trend (up or down)
* a seasonal variant without trend for level + seasonality
* triple (aka Holt-Winters) for level + trend + seasonality.

Here I used Holt with a multiplicative trend and grid-searched whether the trend should be damped or not.
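To show what "damped or not" means, here is a hand-rolled sketch of Holt's method with an optional damping factor, plus a two-point grid search over it. Note the assumptions: it uses the additive trend form for simplicity (not the multiplicative one used in the notebook), the smoothing parameters `alpha`, `beta` and the damping value `phi=0.9` are fixed arbitrary choices, and the series is made up. It is not the statsmodels implementation.

```python
from math import sqrt

def holt_forecast(series, horizon, alpha=0.8, beta=0.2, phi=1.0):
    """Holt's linear trend method; phi < 1 damps the trend, phi = 1 leaves it undamped."""
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # update the level, then the trend, from the newest observation
        level = alpha * y + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts = []
    damp_sum = 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return forecasts

def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up series: steady growth that flattens out in the hold-out period.
toy = [10.0, 12.0, 14.0, 16.0, 17.0, 17.5]
train, test = toy[:4], toy[4:]

# Grid search over a single switch: damped trend or not.
grid = {"undamped": 1.0, "damped": 0.9}
scores = {name: rmse(test, holt_forecast(train, len(test), phi=phi)) for name, phi in grid.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Damping shrinks the trend a little more with every step ahead, so the forecast levels off instead of climbing forever. On this toy series that is exactly what the hold-out data does, which is why the damped variant scores better here.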

### ARIMA

The other family of forecasting methodologies is called Box-Jenkins or ARIMA. It is built on autocorrelation. The idea is that if we can figure out a way to remove all autocorrelation (let's say, we take steps 1, 2, 3 and then land on a dataset that is just noise), then we can take today's iced coffee sales, apply steps 3, 2, 1 in reverse and get tomorrow's iced coffee sales.

The full name is Autoregressive Integrated Moving Average - it refers to the various steps you take (the number of time lags in the autoregressive part, the number of times the data have had past values subtracted in the integrated part, and the order of the moving-average part). Just like Holt-Winters, Box-Jenkins has a whole bunch of versions - plain AR, ARIMA, SARIMA, etc.
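The "take steps, then undo them" idea can be sketched by hand for the simplest interesting case: difference the series once (the "integrated" step), fit a one-lag autoregression on the differences, forecast the differences, then add them back up. This roughly corresponds to an ARIMA(1,1,0) without a constant, but it is fitted here by plain least squares rather than the estimation statsmodels uses, and the function names and toy numbers are made up.

```python
def difference(series):
    """Integrated step: subtract the previous value from each value."""
    return [cur - prev for prev, cur in zip(series, series[1:])]

def fit_ar1(values):
    """Least-squares coefficient for values[t] ~ coef * values[t-1] (no intercept)."""
    num = sum(cur * prev for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den

def forecast_diff_ar1(series, horizon):
    """Forecast by modelling the differences with AR(1), then undoing the differencing."""
    diffs = difference(series)
    coef = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        last_diff = coef * last_diff         # autoregressive step on the differences
        last_value = last_value + last_diff  # undo the differencing
        forecasts.append(last_value)
    return forecasts

# Made-up series whose step-to-step changes halve each time: 8, 4, 2, 1.
toy = [0.0, 8.0, 12.0, 14.0, 15.0]
print(fit_ar1(difference(toy)))
print(forecast_diff_ar1(toy, 2))
```

The order in which the steps are undone matters: the model predicts the next difference first, and only then is that difference added back onto the last known value.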