# <center> Australia's Beer Production Forecast </center>
The goal is to <i> provide a forecast of monthly Australian beer production for the year 1996 using only the https://www.kaggle.com/sergiomora823/monthly-beer-production data with a verbal summarization of the forecast and a comment on what was done, why, and how the final forecast was made. </i> <br><br>
But first, the missing libraries (PyTorch Forecasting, PyStan and Prophet) have to be installed and all the necessary modules loaded.

In [None]:
%%capture
# Installing missing libraries
!pip install pytorch_forecasting
!pip install pystan==2.19.1.1
!pip install prophet

# Loading modules
import pandas as pd
import numpy as np

from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.filters.hp_filter import hpfilter
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.stattools import adfuller
import statsmodels.api as sm

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
import torch

from pytorch_forecasting import Baseline, NBeats, TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder
from pytorch_forecasting.data.examples import generate_ar_data
from pytorch_forecasting.metrics import SMAPE

from prophet import Prophet

from sklearn.metrics import mean_squared_error

from scipy.stats import boxcox
from scipy.special import inv_boxcox

import plotly.graph_objects as go
from plotly.subplots import make_subplots

import matplotlib.pyplot as plt 
%matplotlib inline

from IPython.display import display, HTML
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## <center> Data Loading and Exploration </center>
First, it is crucial to inspect the data and establish:
1. Whether there are missing data, clearly wrong data or extreme outliers
2. Whether the data are in a format compatible with statistical or ML packages
3. Whether the data are stationary
4. Whether there is seasonality in the data
5. Whether there is a trend in the data
6. Whether the data have to be transformed
7. What the overall nature of the data is (data types, number of variables etc.)

In [None]:
# Loading data
data = pd.read_csv("/kaggle/input/monthly-beer-production/datasets_56102_107707_monthly-beer-production-in-austr.csv")

def display_side_by_side(dfs:list, captions:list):
    """Display tables side by side to save vertical space
    Input:
        dfs: list of pandas.DataFrame
        captions: list of table captions
    """
    output = ""
    combined = dict(zip(captions, dfs))
    for caption, df in combined.items():
        output += df.style.set_table_attributes("style='display:inline'").set_caption(caption)._repr_html_()
        output += "\xa0\xa0\xa0"
    display(HTML(output))
    
# Displaying head, tail and descriptive statistics of data
display_side_by_side([data.head(), data.tail(), data.describe()], ['<b>First 5 rows</b>', '<b>Last 5 rows</b>', '<b>Descriptive Statistics</b>'])

In [None]:
# Showing number of missing values
print(f"Missing data: {data.isnull().sum().sum()}","\n")

# Rows and columns
print(f"Data shape: {data.shape}","\n")

# Column names
print(data.columns,"\n")

# Data types
print(data.dtypes)

# Data info
# print(data.info())

The dataframe has 476 rows, one per month from 1956-01 to 1995-08, and only two columns: the date and the monthly beer production. No missing data are observed in the dataset. Monthly beer production is stored as a 64-bit float, which does not need changing. The date follows the "%Y-%m" format and is currently stored as a string object rather than a datetime object, so it is converted to make later use easier.
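
The conversion can be sketched as follows. This is a minimal illustration that assumes the dates look like `1956-01`; the three-row frame and its values are made up and are not taken from the Kaggle file.

```python
import pandas as pd

# Made-up stand-in for the raw file: dates arrive as "YYYY-MM" strings
raw = pd.DataFrame({"Month": ["2000-01", "2000-02", "2000-03"],
                    "Monthly beer production": [90.0, 95.0, 100.0]})

# An explicit format avoids ambiguous parsing and fails loudly on malformed rows
raw["Month"] = pd.to_datetime(raw["Month"], format="%Y-%m")

# "MS" is the pandas alias for month-start frequency
monthly = raw.set_index("Month").asfreq("MS")

print(monthly.index.freqstr)  # MS
```

Using `asfreq` instead of assigning `index.freq` directly has the side effect of inserting NaN rows for any missing month, which makes gaps visible early.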

In [None]:
# Data manipulation: setting month as the index with datetime format and monthly frequency, renaming columns
data["Month"] = pd.to_datetime(data["Month"])
data.rename(columns={"Month":"month","Monthly beer production":"beer_prod"}, inplace=True)
data.set_index('month', inplace=True)
data.index.freq = "MS"

In [None]:
# First data exploration plot
data_exploration_plot = go.Figure()

data_exploration_plot.add_trace(go.Scatter(x=data.index, y=data["beer_prod"],
                                mode='lines',
                                name='Monthly Beer Production'))

data_exploration_plot.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

data_exploration_plot.update_layout(
                    plot_bgcolor = "rgba(0,0,0,0)",
                    autosize=True,
                    xaxis_title="Date",
                    yaxis_title="Beer Production",
                    title={
                        'text': "Monthly Beer Production",
                        'y':0.9,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'})

data_exploration_plot.show()

From the visual inspection of the data, several observations can be drawn: <br>
1. There seems to be seasonality in the data on a yearly basis <br>
    1.1 The production seems to be highest during the Nov-Jan period and lowest during the June-August period. This makes sense, since beer production can reasonably be expected to be higher in the warm months around the Australian summer and lower during the Australian winter
2. The data do not seem to be stationary
3. There seems to be a non-linear trend in the data
4. The variance seems to be increasing with time <br>

Let's investigate further with a seasonal decomposition based on moving averages!
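
To make the idea concrete, here is a minimal sketch of a classical additive decomposition with a centered moving average. It is a simplified illustration on synthetic data, not the exact statsmodels implementation; the helper name `additive_decompose` and the toy series are assumptions.

```python
import numpy as np
import pandas as pd

def additive_decompose(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + resid."""
    # Trend: moving average over one full period, centered on each point
    trend = series.rolling(period, center=True).mean()
    if period % 2 == 0:
        # Even period: average two adjacent windows (a "2 x period" moving average)
        trend = trend.rolling(2).mean().shift(-1)
    detrended = series - trend
    # Seasonal effect: mean detrended value per position in the cycle, centered at zero
    phase = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(phase).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[phase].to_numpy(), index=series.index)
    resid = series - trend - seasonal
    return trend, seasonal, resid

# Synthetic monthly series: linear trend plus a fixed zero-sum yearly pattern
pattern = np.array([10, 5, 0, -5, -10, -12, -10, -5, 0, 5, 10, 12], dtype=float)
t = np.arange(120)
toy = pd.Series(50 + 0.5 * t + pattern[t % 12],
                index=pd.date_range("2000-01-01", periods=120, freq="MS"))

trend, seasonal, resid = additive_decompose(toy)
print(seasonal.iloc[:12].round(2).tolist())
```

On this noise-free toy series the seasonal component recovers the yearly pattern and the residuals are zero wherever the trend is defined; the first and last six months have no trend estimate because the centered window does not fit there.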

In [None]:
# Running seasonal decomposition
season_decomp_result = seasonal_decompose(data["beer_prod"], model="additive")

# Making seasonal decomposition plotly chart
season_trend_resid_plot = make_subplots(rows=4, cols=1)

season_trend_resid_plot.add_trace(go.Scatter(x=data.index, y=data["beer_prod"],
                                name="Raw Data",
                                mode='lines'),
                                 row=1,
                                 col=1)

season_trend_resid_plot.add_trace(go.Scatter(x=data.index, y=season_decomp_result.trend,
                                name="Trend",
                                mode='lines'),
                                 row=2,
                                 col=1)

season_trend_resid_plot.add_trace(go.Scatter(x=data.index, y=season_decomp_result.seasonal,
                                name="Seasonality",
                                mode='lines'),
                                 row=3,
                                 col=1)


season_trend_resid_plot.add_trace(go.Scatter(x=data.index, y=season_decomp_result.resid,
                                name="Residuals",
                                mode='lines'),
                                 row=4,
                                 col=1)

season_trend_resid_plot.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

season_trend_resid_plot.update_layout(
                                        plot_bgcolor = "rgba(0,0,0,0)",
                                        autosize=True,
                                        title={
                                            'text': "Seasonal Decomposition",
                                            'y':0.9,
                                            'x':0.5,
                                            'xanchor': 'center',
                                            'yanchor': 'top'})

season_trend_resid_plot.show()

The seasonal decomposition provides further evidence for the aforementioned observations, as it clearly separates a non-linear trend, seasonality and residuals whose spread is not uniform over time. Although visual inspection already makes it very likely that the data are not stationary, it is desirable to run the Augmented Dickey-Fuller test. Its null hypothesis is that the time series has a unit root (i.e. is not stationary), so a p-value below the chosen cutoff is evidence of stationarity. The raw data, the natural logarithm of the raw data and the Box-Cox transformation of the raw data will all be tested.
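
For reference, the Box-Cox transformation can be written in a few lines. The sketch below is a plain NumPy illustration with hypothetical helper names (`boxcox_transform`, `boxcox_inverse`) and made-up values; it shows that the natural logarithm is the special case lambda = 0 and that the transformation can be inverted when lambda is known.

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform for strictly positive data."""
    y = np.asarray(y, dtype=float)
    # lambda = 0 is defined as the natural logarithm
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def boxcox_inverse(z, lam):
    """Inverse Box-Cox transform, needed to bring forecasts back to the original scale."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)

values = np.array([80.0, 120.0, 200.0])  # made-up positive values
for lam in (0, 0.5):
    round_trip = boxcox_inverse(boxcox_transform(values, lam), lam)
    print(lam, np.allclose(round_trip, values))
```

Unlike this sketch, `scipy.stats.boxcox` also estimates lambda from the data when it is not supplied.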

In [None]:
def test_stationarity(timeseries, cutoff = 0.05):
    # Perform the Augmented Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries)
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key,value in dftest[4].items():
        dfoutput['Critical Value (%s)'%key] = value
    pvalue = dftest[1]
    if pvalue < cutoff:
        print('p-value = %.4f. The series is likely stationary.' % pvalue)
    else:
        print('p-value = %.4f. The series is likely non-stationary.' % pvalue)
    
    print(dfoutput)
    print("\n")
    
# Log and boxcox transforms
data["beer_prod_log"] = np.log(data["beer_prod"]) # log transform
data["beer_prod_box_cox"], lam = boxcox(data["beer_prod"]) # lam stores lambda param for inverse boxcox transformation    
    
# Train-test split with 6 years of test data
data_train = data.iloc[:-72]
data_test = data.iloc[-72:]
    
# Testing stationarity
print("Raw Data - Beer production")
test_stationarity(data["beer_prod"])
print("Natural Log Transformed Data - Beer production")
test_stationarity(data["beer_prod_log"])
print("Box Cox Transformed Data - Beer production")
test_stationarity(data["beer_prod_box_cox"])

From the results of the Augmented Dickey-Fuller tests we can conclude that, at the 5% significance level, the unit-root null hypothesis cannot be rejected for the raw monthly beer production data or for either of its transformations, although the natural log transformation seems to have at least brought the data closer to stationarity. Further visual inspection of the data might provide more clues.

In [None]:
# Plotting all 3 transformations of target variable
data_exploration_trans_plot = make_subplots(specs=[[{"secondary_y": True}]])

data_exploration_trans_plot.add_trace(go.Scatter(x=data.index, y=data["beer_prod"],
                                mode='lines',
                                name='Beer Production'),
                                secondary_y=False)

data_exploration_trans_plot.add_trace(go.Scatter(x=data.index, y=data["beer_prod_log"],
                                mode='lines',
                                name='Beer Production - Log '),
                                secondary_y=True)

data_exploration_trans_plot.add_trace(go.Scatter(x=data.index, y=data["beer_prod_box_cox"],
                                mode='lines',
                                name='Beer Production - BoxCox'),
                                secondary_y=False)

data_exploration_trans_plot.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

data_exploration_trans_plot.update_layout(
                            plot_bgcolor = "rgba(0,0,0,0)",
                            autosize=True,
                            xaxis_title="Date",
                            yaxis_title="Beer Production",
                            title={
                                'text': "Monthly Beer Production",
                                'y':0.9,
                                'x':0.5,
                                'xanchor': 'center',
                                'yanchor': 'top'})

data_exploration_trans_plot.show()

Based on visual inspection of the chart, the log transformation seems to have reduced the unequal variance across the span of the time series. Although it is very common to model the natural logarithm of a time series, I will avoid this for several reasons:
1. Even after the log transformation the data are not stationary and, as Lütkepohl & Xu (2012) suggest, using a log transform when stationarity is not achieved might have a negative impact on forecast precision. For more info see the article: Lütkepohl, H., Xu, F. 2012. <i>The role of the log transformation in forecasting economic variables.</i> Empir Econ 42, 619–638. https://doi.org/10.1007/s00181-010-0440-1
2. Several of the models used contain built-in means of making the data stationary (such as differencing) or provide other means of transforming the data.
3. I have empirically tested whether the log transformation has a positive impact on the forecast precision of the models used and did not find any substantial positive effects (for the sake of efficiency I avoid duplicating the results of all the analyses, as it would almost double the length of an already lengthy notebook)
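
As an illustration of the differencing mentioned in point 2, the sketch below applies a seasonal difference and then a first difference to a made-up series with a linear trend and a repeating 12-step pattern. The series and variable names are assumptions for this example only.

```python
import numpy as np
import pandas as pd

# Made-up series: linear trend plus a pattern that repeats every 12 steps
t = np.arange(48)
season = np.tile([3.0, 1.0, -1.0, -3.0, -2.0, 0.0, 2.0, 4.0, 1.0, -1.0, -2.0, -2.0], 4)
y = pd.Series(10.0 + 0.5 * t + season)

# Seasonal difference (D=1, period 12): removes the repeating pattern,
# leaving the trend increase over 12 steps (0.5 * 12 = 6)
seasonal_diff = y.diff(12)

# First difference (d=1): removes what is left of the linear trend
both_diff = seasonal_diff.diff(1)

print(seasonal_diff.dropna().unique())
print(both_diff.dropna().unique())
```

Real data will of course leave noise behind rather than exact constants, but the same two operations are what the d and D orders of a seasonal ARIMA model apply internally.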

## <center> Searching for the best model </center>

Due to the nature of the data, four models will be tested and evaluated against each other. The first two are common statistical models for analyzing time series data with seasonality and trend: <b><i>Holt-Winters Exponential Smoothing</i></b> and <b><i>SARIMA (Seasonal Autoregressive Integrated Moving Average)</i></b>. <b><i>Facebook's Prophet</i></b> algorithm will also be fit to the data, as it generally performs well on univariate time series data with seasonal components. Moreover, <b><i>N-BEATS (Neural basis expansion analysis for interpretable time series forecasting)</i></b>, a deep learning based model that achieved state-of-the-art results in several univariate time series forecasting competitions, will be fit to the data.

#### <b><center>Test set evaluation methodology</center></b>
Since the <b>goal is to forecast 16 months ahead</b> of the last observation (1995-09 through 1996-12) with the highest possible precision, the models are evaluated on 16-step-ahead forecasts. For every model, four precision metrics are calculated on each 16-month window of the test set and then averaged across all windows. All models except N-BEATS are re-estimated after each forecast with one additional time point. That way, each 16-step forecast is always based on the maximum available data, while the train dataset is kept for initial model selection and training. For N-BEATS, the neural network weights are estimated only once. All models, although very different in complexity and statistical background, are compared on the same set of fit metrics.
These are:  <br>
* Mean Squared Error (MSE)
* Root Mean Squared Error (RMSE)
* Mean Absolute Percentage Error (MAPE)
* Symmetric Mean Absolute Percentage Error (SMAPE)
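
The evaluation scheme described above can be sketched with a deliberately simple forecaster. The example below is an illustration only: `seasonal_naive` is a hypothetical stand-in for the real models, the data are synthetic, and the sketch uses every forecast origin that still has a full 16-step window of actual values.

```python
import numpy as np

def seasonal_naive(history, horizon, period=12):
    """Hypothetical stand-in forecaster: repeat the last observed season."""
    last_season = np.asarray(history[-period:], dtype=float)
    return np.array([last_season[h % period] for h in range(horizon)])

def rolling_origin_rmse(train, test, horizon=16):
    """Average RMSE of horizon-step forecasts over an expanding training window."""
    train, test = list(train), list(test)
    scores = []
    for start in range(len(test) - horizon + 1):
        history = train + test[:start]            # expanding window: all data seen so far
        forecast = seasonal_naive(history, horizon)
        actual = np.asarray(test[start:start + horizon], dtype=float)
        scores.append(np.sqrt(np.mean((actual - forecast) ** 2)))
    return float(np.mean(scores)), len(scores)

# Synthetic, perfectly periodic series: 4 years of training data, 6 years of test data
pattern = [10, 5, 0, -5, -10, -12, -10, -5, 0, 5, 10, 12]
series = np.tile(pattern, 10) + 100.0
avg_rmse, n_windows = rolling_origin_rmse(series[:48], series[48:])
print(avg_rmse, n_windows)
```

Averaging over many forecast origins makes the comparison less dependent on one particular 16-month stretch of the test set.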

In [None]:
# Defining functions for evaluations
def mape(y_true, y_pred): 
    # Mean absolute percentage error
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def smape(y_true, y_pred):
    # Symmetric mean absolute percentage error
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    denominator = (np.abs(y_true) + np.abs(y_pred))
    diff = np.abs(y_true - y_pred) / denominator
    diff[denominator == 0] = 0.0
    return 200 * np.mean(diff)
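
A small worked example shows how the two percentage metrics differ. The numbers are made up, and the formulas are repeated inline so the snippet runs on its own.

```python
import numpy as np

# Made-up values: one over-forecast and one under-forecast, each off by 10% of the true value
y_true = np.array([100.0, 200.0])
y_pred = np.array([110.0, 180.0])

# MAPE scales each error by the true value only
mape_value = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# SMAPE scales each error by the sum of |true| and |predicted|
smape_value = 200 * np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

print(round(mape_value, 3), round(smape_value, 3))  # 10.0 10.025
```

MAPE treats both errors as 10%, while SMAPE weighs the under-forecast slightly more because its denominator (380) is smaller relative to the error than in the over-forecast case (210).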

### <center> Holt-Winters Exponential Smoothing </center>

First, Holt-Winters Exponential Smoothing model is implemented as it takes into account both trend and seasonality that are present in the current data. Both additive and multiplicative methods are calculated and their performance is compared.
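
For intuition, the additive variant can be written as three update equations for level, trend and seasonal effect. The sketch below is a bare-bones illustration with fixed smoothing parameters and a simple initialisation chosen for this example (it needs at least two full seasons of data); statsmodels instead estimates the parameters and initial states by optimisation. The function name and toy data are assumptions.

```python
import numpy as np

def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters recursions with fixed smoothing parameters."""
    y = np.asarray(y, dtype=float)
    # Simple initial states (an assumption of this sketch)
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)                  # one seasonal effect per position in the cycle
    for t in range(len(y)):
        s_prev = season[t % m]
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[t % m] = gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_prev
    n = len(y)
    # h-step forecast: last level, h times the last trend, matching seasonal effect
    return np.array([level + h * trend + season[(n + h - 1) % m]
                     for h in range(1, horizon + 1)])

# Made-up series: constant level of 100 plus a repeating zero-sum yearly pattern
pattern = np.array([10, 5, 0, -5, -10, -12, -10, -5, 0, 5, 10, 12], dtype=float)
toy = 100 + np.tile(pattern, 4)
forecast = holt_winters_additive(toy, m=12, alpha=0.3, beta=0.1, gamma=0.2, horizon=16)
print(forecast.round(2))
```

On this noise-free toy series the forecast simply continues the yearly pattern around the level of 100. The multiplicative variant replaces the additions and subtractions of the seasonal (and optionally trend) terms with multiplications and divisions.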

In [None]:
%%capture
# Additive
tripple_additive_holt_winters = ExponentialSmoothing(data_train["beer_prod"], trend="add", seasonal="add")
tripple_additive_holt_winters_fitted = tripple_additive_holt_winters.fit()

# Multiplicative
tripple_multiplicative_holt_winters = ExponentialSmoothing(data_train["beer_prod"], trend="mul", seasonal="mul")
tripple_multiplicative_holt_winters_fitted = tripple_multiplicative_holt_winters.fit()

In [None]:
# Chart Holt-Winters Exponential Smoothing
hw_plot = go.Figure()

hw_plot.add_trace(go.Scatter(x=data_train.index, y=data_train["beer_prod"].values,
                                mode='lines',
                                name='Monthly Beer Production'))

hw_plot.add_trace(go.Scatter(x=data_train.index, y=tripple_additive_holt_winters_fitted.fittedvalues.values,
                                mode='lines',
                                name='Additive Trend + Seasonality'))

hw_plot.add_trace(go.Scatter(x=data_train.index, y=tripple_multiplicative_holt_winters_fitted.fittedvalues.values,
                                mode='lines',
                                name='Multiplicative Trend + Seasonality'))

hw_plot.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

hw_plot.update_layout(
                    plot_bgcolor = "rgba(0,0,0,0)",
                    autosize=True,
                    xaxis_title="Date",
                    yaxis_title="Beer Production",
                    title={
                        'text': "Holt-Winters Exponential Smoothing Train Data Prediction",
                        'y':0.9,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'})

hw_plot.show()

As the visualisation suggests, both the additive and the multiplicative model generally follow the pattern of the train data. The additive model tends to slightly overestimate both highs and lows at the beginning of the time series. Below, I calculate the fit statistics for both models on the train data.

In [None]:
# Additive Train Metrics
ad_hwes_mse = mean_squared_error(y_true=data_train["beer_prod"].values, 
                                 y_pred=tripple_additive_holt_winters_fitted.fittedvalues.values)
ad_hwes_rmse = mean_squared_error(y_true=data_train["beer_prod"].values,
                                  y_pred=tripple_additive_holt_winters_fitted.fittedvalues.values,
                                  squared=False)
ad_hwes_mape = mape(y_true=data_train["beer_prod"].values,
                    y_pred=tripple_additive_holt_winters_fitted.fittedvalues.values)
ad_hwes_smape = smape(y_true=data_train["beer_prod"].values,
                      y_pred=tripple_additive_holt_winters_fitted.fittedvalues.values)

# Multiplicative Train Metrics
ml_hwes_mse = mean_squared_error(y_true=data_train["beer_prod"].values, 
                                 y_pred=tripple_multiplicative_holt_winters_fitted.fittedvalues.values)
ml_hwes_rmse = mean_squared_error(y_true=data_train["beer_prod"].values,
                                  y_pred=tripple_multiplicative_holt_winters_fitted.fittedvalues.values,
                                  squared=False)
ml_hwes_mape = mape(y_true=data_train["beer_prod"].values,
                    y_pred=tripple_multiplicative_holt_winters_fitted.fittedvalues.values)
ml_hwes_smape = smape(y_true=data_train["beer_prod"].values,
                      y_pred=tripple_multiplicative_holt_winters_fitted.fittedvalues.values)

print(f"Additive HWES Train Fit: MSE: {ad_hwes_mse:.3f}, RMSE: {ad_hwes_rmse:.3f}, MAPE: {ad_hwes_mape:.3f}, SMAPE: {ad_hwes_smape:.3f}")
print(f"Multiplicative HWES Train Fit: MSE: {ml_hwes_mse:.3f}, RMSE: {ml_hwes_rmse:.3f}, MAPE: {ml_hwes_mape:.3f}, SMAPE: {ml_hwes_smape:.3f}")

According to the four fit statistics, the multiplicative model seems to model the train data slightly better, although the difference is not substantial. However, it is important to evaluate how both models perform on the test dataset with a 16-points-ahead forecast.

In [None]:
def iterate_hw_over_evals(data_train_hw, data_test_hw, target, method):
    """
    Function iterates over the whole testing datasets
        1) Makes out of sample prediction of length 16
        2) Saves the model performance on the current iteration 
        3) Moves one time step ahead and appends one new datapoint to the train_data
        4) Refits the model and repeats steps 1-3 until the end of the test dataset is reached
        5) Calculates average MSE, RMSE, MAPE and SMAPE metrics
        6) Optionally returns all predictions with corresponding true values for further inspection
    """
    list_of_reals = []
    list_of_forecasts = []
    mse_list = []
    rmse_list = []
    mape_list = []
    smape_list = []
    
    for x in range(len(data_test_hw)-16):
        if x > 0:
            # Expanding window: add the newest observed test point to the training data
            data_train_hw = pd.concat([data_train_hw, data_test_hw.iloc[[x-1]]])

        # Refit on all data available so far and forecast 16 steps ahead
        holt_winters_model = ExponentialSmoothing(data_train_hw[target], trend=method, seasonal=method)
        holt_winters_model_fit = holt_winters_model.fit()
        holt_winters_forecast_values = holt_winters_model_fit.forecast(16).values

        list_of_reals.append(data_test_hw[target].values[x:x+16])
        list_of_forecasts.append(holt_winters_forecast_values)

        mse_list.append(mean_squared_error(list_of_reals[x], list_of_forecasts[x]))
        rmse_list.append(mean_squared_error(list_of_reals[x], list_of_forecasts[x], squared=False))
        mape_list.append(mape(list_of_reals[x], list_of_forecasts[x]))
        smape_list.append(smape(list_of_reals[x], list_of_forecasts[x]))
            
    avg_mse = np.array(mse_list).mean()
    avg_rmse = np.array(rmse_list).mean()
    avg_mape = np.array(mape_list).mean()
    avg_smape = np.array(smape_list).mean()
        
    return list_of_reals, list_of_forecasts, avg_mse, avg_rmse, avg_mape, avg_smape

In [None]:
%%capture
# Calculating fit statistics for both additive and multiplicative model on test data without saving the prediction time series results
# Additive
_, _, avg_mse_hw_ad, avg_rmse_hw_ad, avg_mape_hw_ad, avg_smape_hw_ad = iterate_hw_over_evals(data_train_hw=data_train,
                                                                                       data_test_hw=data_test,
                                                                                       target="beer_prod",
                                                                                       method="add")
# Multiplicative
_, _, avg_mse_hw_ml, avg_rmse_hw_ml, avg_mape_hw_ml, avg_smape_hw_ml = iterate_hw_over_evals(data_train_hw=data_train,
                                                                                       data_test_hw=data_test,
                                                                                       target="beer_prod",
                                                                                       method="mul")

In [None]:
print(f"Additive HWES Test Fit - AVG MSE: {avg_mse_hw_ad:.3f}, AVG RMSE: {avg_rmse_hw_ad:.3f}, AVG MAPE: {avg_mape_hw_ad:.3f}, AVG SMAPE: {avg_smape_hw_ad:.3f}")
print(f"Multiplicative HWES Test Fit - AVG MSE: {avg_mse_hw_ml:.3f}, AVG RMSE: {avg_rmse_hw_ml:.3f}, AVG MAPE: {avg_mape_hw_ml:.3f}, AVG SMAPE: {avg_smape_hw_ml:.3f}")

On the training data, the multiplicative model slightly outperforms the additive model, but on the test dataset the additive model performs much better. Therefore, the additive model is used as the HWES model for the final comparison and prediction.

In [None]:
# Final Forecast of Holt-Winters Exponential Smoothing Model
final_holt_winters_model = ExponentialSmoothing(data["beer_prod"],trend="add", seasonal="add")
final_holt_winters_model_fitted = final_holt_winters_model.fit()
final_holt_winters_forecast = final_holt_winters_model_fitted.forecast(16).values

### <center> SARIMA </center>
The second suitable forecasting model for the current data is SARIMA, the Autoregressive Integrated Moving Average model with a seasonal component. It has a trend part and a seasonal part, with 7 hyperparameters to tune:
* p: Trend autoregression order.
* d: Trend difference order.
* q: Trend moving average order.
* P: Seasonal autoregressive order.
* D: Seasonal difference order.
* Q: Seasonal moving average order.
* s (often also written m): The number of time steps for a single seasonal period. <br>

Although it is possible to base the (p,d,q) and (P,D,Q,s) parameters on clues such as autocorrelation and partial autocorrelation plots, as well as on knowledge about the data and its seasonality, the interactions between the parameters are complex. Given the available computational power, it is therefore often useful to perform some variation of a grid search for the best-performing parameter combinations.
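
The size of such a search grows multiplicatively with every parameter list. The sketch below enumerates a grid with `itertools.product`; the candidate values are illustrative assumptions, and the model fitting itself is left out.

```python
from itertools import product

# Illustrative candidate values (assumptions for this sketch)
p_values, d_values, q_values = [1, 2, 3], [0, 1], [0, 1]
P_values, D_values, Q_values = [1, 2, 3], [0, 1], [0, 1]
s_values = [12]

# Every combination as an (order, seasonal_order) pair
grid = [((p, d, q), (P, D, Q, s))
        for p, d, q, P, D, Q, s in product(p_values, d_values, q_values,
                                           P_values, D_values, Q_values, s_values)]

print(len(grid))  # 144
print(grid[0])    # ((1, 0, 0), (1, 0, 0, 12))
```

Even this modest grid requires 144 model fits, which is why the candidate ranges are usually kept narrow.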

In [None]:
# Defining functions for grid search and evaluation of models on test data
def find_optimal_arima(data, target, p_list=[0], d_list=[0], q_list=[0], P_list=[0], D_list=[0], Q_list=[0], s_list=[0], top_perfomers=3):
    """
    Iterates over lists of parameters and calculates ARIMA/SARIMA with all their combinations.
    Fit statistics for all the models are calculated and top N performing parameter combinations for each of the fit statistics are returned.
    The models are not duplicated so in case that top 4 models would be searched for and 4 models would perform the best on all these fit statistics,
    only 4 combination of parameters would be returned
    """
    all_models_params = []
    all_models_mse = []
    all_models_rmse = []
    all_models_mape = []
    all_models_smape =[]
    best_models = []
    
    x = 0
    
    for p in p_list:
        for d in d_list:
            for q in q_list:
                for P in P_list:
                    for D in D_list:
                        for Q in Q_list:
                            for s in s_list:
                                
#                                 total_iters = len(p_list) * len(d_list) * len(q_list) * len(P_list) * len(D_list) * len(Q_list) * len(s_list)
#                                 x += 1
#                                 print(f"Running iteration {x} out of total {total_iters}.")
                                
                                try:
                                    arima_result = ARIMA(endog=data[target].values, order=(p,d,q), seasonal_order=(P,D,Q,s), dates=data.index)
                                    arima_result_fit = arima_result.fit()
                                    predicted = arima_result_fit.predict()
                                    real = data[target].values

                                    mse_result = mean_squared_error(real, predicted)
                                    rmse_result = mean_squared_error(real, predicted, squared=False)
                                    mape_result = mape(real, predicted)
                                    smape_result = smape(real, predicted)

                                    all_models_params.append(f"order={p,d,q},seasonal_order={P,D,Q,s}")
                                    all_models_mse.append(mse_result)
                                    all_models_rmse.append(rmse_result)
                                    all_models_mape.append(mape_result)
                                    all_models_smape.append(smape_result)
                                except Exception:
                                    # Skip parameter combinations that fail to fit
                                    pass
                                    
    top_mse = np.argsort(all_models_mse)[:top_perfomers]
    top_rmse = np.argsort(all_models_rmse)[:top_perfomers]
    top_mape = np.argsort(all_models_mape)[:top_perfomers]
    top_smape = np.argsort(all_models_smape)[:top_perfomers]                
                                    
    top_indices = np.unique(np.concatenate([top_mse,top_rmse,top_mape,top_smape]))
    
    for q in top_indices:
    
        best_models.append({all_models_params[q]:[f"mse:{all_models_mse[q]:.3f}",f"rmse:{all_models_rmse[q]:.3f}",f"mape:{all_models_mape[q]:.3f}",f"smape:{all_models_smape[q]:.3f}"]})
    
    return best_models


def iterate_sarima_over_evals(data_train_ar, data_test_ar, target, arima_order, arima_seasonal_order):
    """
    Function iterates over the whole testing datasets
        1) Makes out of sample prediction of length 16
        2) Saves the model performance on the current iteration 
        3) Moves one time step ahead and appends one new datapoint to the train_data
        4) Refits the model and repeats steps 1-3 until the end of the test dataset is reached
        5) Calculates average MSE, RMSE, MAPE and SMAPE metrics
        6) Optionally returns all predictions with corresponding true values for further inspection
    """
    list_of_reals = []
    list_of_forecasts = []
    mse_list = []
    rmse_list = []
    mape_list = []
    smape_list = []
    
    for x in range(len(data_test_ar)-16):
        if x > 0:
            # Expanding window: add the newest observed test point to the training data
            data_train_ar = pd.concat([data_train_ar, data_test_ar.iloc[[x-1]]])

        # Refit on all data available so far and forecast 16 steps ahead
        sarima_eval_model = ARIMA(data_train_ar[target], order=arima_order, seasonal_order=arima_seasonal_order, dates=data_train_ar.index)
        sarima_eval_model_fit = sarima_eval_model.fit()
        sarima_forecast_values = sarima_eval_model_fit.forecast(steps=16).values

        list_of_reals.append(data_test_ar[target].values[x:x+16])
        list_of_forecasts.append(sarima_forecast_values)

        mse_list.append(mean_squared_error(list_of_reals[x], list_of_forecasts[x]))
        rmse_list.append(mean_squared_error(list_of_reals[x], list_of_forecasts[x], squared=False))
        mape_list.append(mape(list_of_reals[x], list_of_forecasts[x]))
        smape_list.append(smape(list_of_reals[x], list_of_forecasts[x]))
            
    avg_mse = np.array(mse_list).mean()
    avg_rmse = np.array(rmse_list).mean()
    avg_mape = np.array(mape_list).mean()
    avg_smape = np.array(smape_list).mean()
        
    return list_of_reals, list_of_forecasts, avg_mse, avg_rmse, avg_mape, avg_smape

In [None]:
# Autocorrelation and partial autocorrelation plots
ac_pc_fig = plt.figure(figsize=(12,8))
ax1 = ac_pc_fig.add_subplot(211)
ac_pc_fig = sm.graphics.tsa.plot_acf(data_train["beer_prod"].values, lags=60, ax=ax1)
ax2 = ac_pc_fig.add_subplot(212)
ac_pc_fig = sm.graphics.tsa.plot_pacf(data_train["beer_prod"].values, lags=60, ax=ax2)

The autocorrelation plot shows an obvious seasonal pattern. It also serves as guidance for selecting the range in which the parameter combinations will be tested. Although it would be beneficial to test a large number of parameter values, for computational reasons only a smaller set of the most plausible values will be tested.
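
What the autocorrelation plot measures can be shown with a few lines of NumPy. The sketch below computes the sample autocorrelation of a made-up series with a 12-step cycle; `sample_acf` is a hypothetical helper, not the statsmodels function.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    denom = np.sum(centered ** 2)
    # Correlate the series with a copy of itself shifted by k steps
    return np.array([np.sum(centered[:len(y) - k] * centered[k:]) / denom
                     for k in range(max_lag + 1)])

t = np.arange(120)
toy = np.sin(2 * np.pi * t / 12)   # made-up series with a 12-step cycle
acf = sample_acf(toy, 24)

# Strongly negative at half a cycle (lag 6), strongly positive at a full cycle (lag 12)
print(np.round(acf[[6, 12]], 2))
```

Peaks at multiples of 12 in the autocorrelation of monthly data are the signature of a yearly cycle and motivate a seasonal period of 12.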

In [None]:
# List of parameters for grid search
p_list = [1,2,3]
d_list = [0,1]
q_list = [0,1]
P_list = [1,2,3]
D_list = [0,1]
Q_list = [0,1]
s_list = [12]

In [None]:
%%capture
# Finding parameters 
best_arima_models = find_optimal_arima(data=data_train, target="beer_prod", 
                                       p_list=p_list, d_list=d_list, q_list=q_list,
                                       P_list=P_list, D_list=D_list, Q_list=Q_list, 
                                       s_list=s_list)

In [None]:
# Printing best performing ARIMA models
print(best_arima_models)

The grid search returned a larger number of well-performing models. However, since their performance on the training dataset is very similar, I select only two of them for evaluation on the test dataset: the best one without differencing and the best one with differencing.
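
The selection of one model per category can be sketched as follows. The results table, its column names and all scores are hypothetical and serve only to illustrate the idea; the actual output format of the grid search may differ.

```python
import pandas as pd

# Hypothetical grid-search results; orders and scores are made up for illustration
results = pd.DataFrame({
    "order": [(1, 0, 0), (2, 0, 1), (1, 1, 0), (2, 1, 1)],
    "seasonal_order": [(1, 0, 0, 12), (2, 0, 1, 12), (1, 1, 0, 12), (2, 1, 1, 12)],
    "score": [10.2, 10.9, 9.8, 10.1],  # lower is better
})

# A model counts as "differenced" if either d or D is non-zero
results["differenced"] = [
    order[1] > 0 or seasonal[1] > 0
    for order, seasonal in zip(results["order"], results["seasonal_order"])
]

# Keep the best-scoring row within each category
best_per_category = results.loc[results.groupby("differenced")["score"].idxmin()]
print(best_per_category)
```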

In [None]:
%%capture
# Best model without differencing
sarima_no_differencing = ARIMA(endog=data_train["beer_prod"].values, order=(3,0,1), seasonal_order=(3,0,1,12), dates=data_train.index)
sarima_no_differencing_fit = sarima_no_differencing.fit()

_, _, avg_mse_sarima_ndif, avg_rmse_sarima_ndif, avg_mape_sarima_ndif, avg_smape_sarima_ndif = iterate_sarima_over_evals(data_train_ar=data_train,
                                                                                                                         data_test_ar=data_test,
                                                                                                                         target="beer_prod",
                                                                                                                         arima_order=(3,0,1),
                                                                                                                         arima_seasonal_order=(3,0,1,12))

In [None]:
%%capture
# Best model with differencing
sarima_differencing = ARIMA(endog=data_train["beer_prod"].values, order=(2,1,1), seasonal_order=(3,1,1,12), dates=data_train.index)
sarima_differencing_fit = sarima_differencing.fit()

_, _, avg_mse_sarima_dif, avg_rmse_sarima_dif, avg_mape_sarima_dif, avg_smape_sarima_dif = iterate_sarima_over_evals(data_train_ar=data_train,
                                                                                                                     data_test_ar=data_test,
                                                                                                                     target="beer_prod",
                                                                                                                     arima_order=(2,1,1),
                                                                                                                     arima_seasonal_order=(3,1,1,12))

In [None]:
# printing test fit metrics of both models in a separate cell
print(f"SARIMA without Differencing Test Fit - AVG MSE: {avg_mse_sarima_ndif:.3f}, AVG RMSE: {avg_rmse_sarima_ndif:.3f}, AVG MAPE: {avg_mape_sarima_ndif:.3f}, AVG SMAPE: {avg_smape_sarima_ndif:.3f}")
print(f"SARIMA with Differencing Test Fit - AVG MSE: {avg_mse_sarima_dif:.3f}, AVG RMSE: {avg_rmse_sarima_dif:.3f}, AVG MAPE: {avg_mape_sarima_dif:.3f}, AVG SMAPE: {avg_smape_sarima_dif:.3f}")

In [None]:
# sarima train set predictions chart
arima_plot = go.Figure()

arima_plot.add_trace(go.Scatter(x=data_train.index, y=data_train["beer_prod"].values,
                                mode='lines',
                                name='Monthly Beer Production'))

arima_plot.add_trace(go.Scatter(x=data_train.index, y=sarima_no_differencing_fit.predict(),
                                mode='lines',
                                name="SARIMA without differencing"))

arima_plot.add_trace(go.Scatter(x=data_train.index, y=sarima_differencing_fit.predict(),
                                mode='lines',
                                name="SARIMA with differencing"))

arima_plot.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

arima_plot.update_layout(
                    plot_bgcolor = "rgba(0,0,0,0)",
                    autosize=True,
                    xaxis_title="Date",
                    yaxis_title="Beer Production",
                    title={
                        'text': "SARIMA Train Data Predictions",
                        'y':0.9,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'})

arima_plot.show()

In [None]:
# sarima residuals plots
fig_residuals = make_subplots(rows=1, cols=2)

fig_residuals.add_trace(go.Histogram(x=sarima_differencing_fit.resid,
                                     name="Histogram"),row=1, col=1)

fig_residuals.add_trace(go.Scatter(x=np.arange(len(sarima_differencing_fit.resid)),y=sarima_differencing_fit.resid,
                                   mode='markers',
                                   name="Scatterplot"),row=1, col=2)

fig_residuals.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

fig_residuals.update_layout(
                    plot_bgcolor = "rgba(0,0,0,0)",
                    autosize=True,
                    title={
                        'text': "SARIMA with differencing<br>Train Data Residual Plots",
                        'y':0.9,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'})

fig_residuals.show()

On the test dataset, the SARIMA model with differencing performs better on all fit metrics. However, the fit is still not perfect: the residuals are not normally distributed (even when taking into account that the biggest residual outlier comes from the differencing itself). Nevertheless, the model with order (2,1,1) and seasonal order (3,1,1,12) is used for the final forecast.
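
The visual impression of the residuals can be complemented with a formal normality test. The following is a minimal sketch on synthetic residuals, not on the fitted model: it assumes that the first d + D*s = 13 residuals are affected by the differencing start-up and compares a Shapiro-Wilk test with and without them.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)

# Synthetic stand-in for model residuals: well-behaved noise plus one large
# start-up residual of the kind differencing can produce
residuals = rng.normal(loc=0.0, scale=5.0, size=200)
residuals[0] = 90.0

# Assumption: with d=1, D=1 and s=12 the first d + D*s = 13 residuals are
# affected by the differencing start-up and are dropped before testing
burn_in = 1 + 1 * 12
trimmed = residuals[burn_in:]

stat_all, p_all = shapiro(residuals)
stat_trim, p_trim = shapiro(trimmed)
print(f"Shapiro-Wilk p-value, all residuals:   {p_all:.4f}")
print(f"Shapiro-Wilk p-value, burn-in removed: {p_trim:.4f}")
```

A small p-value that persists after the burn-in residuals are removed would indicate that the non-normality is not caused by the start-up outlier alone.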

In [None]:
# Final Forecast Sarima
final_sarima_model = ARIMA(endog=data["beer_prod"].values, order=(2,1,1), seasonal_order=(3,1,1,12), dates=data.index)
final_sarima_model_fitted = final_sarima_model.fit()
final_sarima_full_forecast = final_sarima_model_fitted.get_forecast(16)
final_sarima_full_forecast_yhat = final_sarima_full_forecast.predicted_mean
final_sarima_full_forecast_yhat_conf_int = final_sarima_full_forecast.conf_int(alpha=0.05)

### <center> Prophet </center>
Prophet is a time series forecasting framework developed by Facebook. As they describe it, it is <i>"a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well."</i> (Facebook, https://facebook.github.io/prophet/) <br>
More information about Prophet can be found in the article: Taylor SJ, Letham B. 2017. Forecasting at scale. PeerJ Preprints 5:e3190v2 https://doi.org/10.7287/peerj.preprints.3190v2 <br>
Prophet requires data in a specific format: a "ds" column with datetime values and a "y" column with the target variable to be forecasted. For this reason, the data are first transformed into this format so that they can be used for making predictions with Prophet.

In [None]:
# Changing test and train data to format required by Prophet package
data_all_prophet = data.copy()
data_all_prophet.reset_index(inplace=True)
data_all_prophet.rename(columns={"month":"ds","beer_prod":"y"}, inplace=True)
data_all_prophet_pred = data_all_prophet[["ds","y"]]
data_train_prophet_pred = data_all_prophet_pred.iloc[:-72]
data_test_prophet_pred = data_all_prophet_pred.iloc[-72:]

In [None]:
# Initializing model
prophet_model = Prophet(weekly_seasonality=False, daily_seasonality=False)
# Fitting the model
prophet_model_fit = prophet_model.fit(data_train_prophet_pred)

In [None]:
# Calculating train fit statistics
prophet_train_mse = mean_squared_error(data_train_prophet_pred["y"].values, prophet_model_fit.predict()["yhat"].values)
prophet_train_rmse = mean_squared_error(data_train_prophet_pred["y"].values, prophet_model_fit.predict()["yhat"].values, squared=False)
prophet_train_mape = mape(data_train_prophet_pred["y"].values, prophet_model_fit.predict()["yhat"].values)
prophet_train_smape = smape(data_train_prophet_pred["y"].values, prophet_model_fit.predict()["yhat"].values)

print(f"Prophet Train Fit: MSE: {prophet_train_mse:.3f}, RMSE: {prophet_train_rmse:.3f}, MAPE: {prophet_train_mape:.3f}, SMAPE: {prophet_train_smape:.3f}")

In [None]:
# Chart Prophet
ph_plot = go.Figure()

ph_plot.add_trace(go.Scatter(x=data_train.index, y=data_train["beer_prod"].values,
                                mode='lines',
                                name='Monthly Beer Production'))

ph_plot.add_trace(go.Scatter(x=data_train.index, y=prophet_model_fit.predict()["yhat"].values,
                                mode='lines',
                                name="Prophet's Prediction"))

ph_plot.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

ph_plot.update_layout(
                    plot_bgcolor = "rgba(0,0,0,0)",
                    autosize=True,
                    xaxis_title="Date",
                    yaxis_title="Beer Production",
                    title={
                        'text': "Prophet's Train Data Prediction",
                        'y':0.9,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'})

ph_plot.show()

Even with default parameters, Prophet seems to fit the training dataset very well compared to the previous models. It tends to overestimate the highs and lows of the peaks at the beginning of the time series, which reverts into a slight underestimation of the peaks towards the end of the training dataset.

In [None]:
def iterate_prophet_over_evals(data_train_pr, data_test_pr):
    """
    Function iterates over the whole testing datasets
        1) Makes out of sample prediction of length 16
        2) Saves the model performance on the current iteration 
        3) Moves one time step ahead
        4) Refits the model and repeats steps 1-3 until the end of test dataset is reached
        5) Calculates average MSE, RMSE, MAPE and SMAPE metrics
        6) Optionally returns all predictions with corresponding true values for further inspection
    """
    list_of_reals = []
    list_of_forecasts = []
    mse_list = []
    rmse_list = []
    mape_list = []
    smape_list = []
    
    for x in range(len(data_test_pr)-16):
        if x > 0:
            # Extend the training window with the most recently observed test point
            data_train_pr = pd.concat([data_train_pr, data_test_pr.iloc[[x-1]]])

        # Refit the model on the current training window
        prophet_eval_model = Prophet(weekly_seasonality=False, daily_seasonality=False)
        prophet_eval_model_fit = prophet_eval_model.fit(data_train_pr)
        prophet_forecast_values = prophet_eval_model_fit.predict(pd.DataFrame(data_test_pr["ds"].iloc[x:x+16]))["yhat"].values

        list_of_reals.append(data_test_pr["y"].values[x:x+16])
        list_of_forecasts.append(prophet_forecast_values)

        mse_list.append(mean_squared_error(list_of_reals[x], list_of_forecasts[x]))
        rmse_list.append(mean_squared_error(list_of_reals[x], list_of_forecasts[x], squared=False))
        mape_list.append(mape(list_of_reals[x], list_of_forecasts[x]))
        smape_list.append(smape(list_of_reals[x], list_of_forecasts[x]))
            
    avg_mse = np.array(mse_list).mean()
    avg_rmse = np.array(rmse_list).mean()
    avg_mape = np.array(mape_list).mean()
    avg_smape = np.array(smape_list).mean()
        
    return list_of_reals, list_of_forecasts, avg_mse, avg_rmse, avg_mape, avg_smape

In [None]:
_, _, avg_mse_proph, avg_rmse_proph, avg_mape_proph, avg_smape_proph = iterate_prophet_over_evals(data_train_pr=data_train_prophet_pred,
                                                                                                  data_test_pr=data_test_prophet_pred)
print(f"Prophet Test Fit - AVG MSE: {avg_mse_proph:.3f}, AVG RMSE: {avg_rmse_proph:.3f}, AVG MAPE: {avg_mape_proph:.3f}, AVG SMAPE: {avg_smape_proph:.3f}")

As with the other models, the same methodology is used to measure the performance of the Prophet model on the test dataset. Since only the default model is tested, it is also used for the final forecast and the forecast comparison.
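
The evaluation methodology shared by all models (forecast 16 steps, move the origin one step ahead, refit, repeat, then average) can be sketched independently of any particular library. In the sketch below, `seasonal_naive` and `rolling_origin_rmse` are hypothetical helpers written for illustration; the seasonal naive forecaster merely stands in for a real model.

```python
import numpy as np

def seasonal_naive(history, horizon, period=12):
    """Repeat the last observed season; a stand-in for any forecasting model."""
    last_season = history[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last_season, reps)[:horizon]

def rolling_origin_rmse(train, test, horizon=16, forecaster=seasonal_naive):
    """Average RMSE over expanding-window forecasts of fixed length."""
    history = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    rmse_per_origin = []
    for x in range(len(test) - horizon):
        if x > 0:
            # Move the origin one step ahead: add the newest observed test point
            history = np.append(history, test[x - 1])
        forecast = forecaster(history, horizon)
        actual = test[x:x + horizon]
        rmse_per_origin.append(np.sqrt(np.mean((actual - forecast) ** 2)))
    return float(np.mean(rmse_per_origin)), len(rmse_per_origin)

# Toy series with a perfectly repeating 12-month pattern
months = np.arange(120)
series = 100 + 10 * np.sin(2 * np.pi * months / 12)
avg_rmse, n_origins = rolling_origin_rmse(series[:-72], series[-72:])
print(f"{n_origins} forecast origins, average RMSE {avg_rmse:.6f}")
```

With 72 test points and a horizon of 16, this loop produces 56 forecast origins, each contributing one value to the averaged metric.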

In [None]:
# Final Forecast Prophet
final_prophet_model = Prophet(weekly_seasonality=False, daily_seasonality=False)  # same specification as the evaluated model
final_prophet_model_fitted = final_prophet_model.fit(data_all_prophet_pred)
final_prophet_forecast = final_prophet_model_fitted.predict(pd.DataFrame({"ds":pd.date_range(start='1995-09-01', end='1996-12-01', freq='MS')}))["yhat"]

### <center> N-BEATS </center>

N-BEATS is a deep neural architecture based on backward and forward residual links and a deep stack of fully-connected layers. The architecture is interpretable, applicable without modification to a wide array of target domains, and relatively fast to train. It was also tested on several competition datasets, including the M3, M4 and TOURISM datasets, which contain time series from diverse domains. Here, its implementation in the pytorch-forecasting package is used.<br>
For more information, see the article <i> N-BEATS: Neural basis expansion analysis for interpretable time series forecasting </i> by Oreshkin et al. (2020), accessible at https://arxiv.org/abs/1905.10437. <br>
Pytorch-forecasting requires data in a specific format from which a TimeSeriesDataSet and the corresponding dataloaders can be constructed. Thus, initial data wrangling and transformation have to be conducted.


In [None]:
# Reformatting data to format required by pytorch-forecasting
data_nbeats = data.copy()
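data_nbeats.reset_index(inplace=True) # moves the "month" index into a regular column
data_nbeats.reset_index(inplace=True) # adds an integer "index" column that becomes time_idx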
data_nbeats.rename(columns={'index':'time_idx', "beer_prod":"target"}, inplace=True)
data_nbeats["grouping"] = 1

In [None]:
# Time series dataset and data loaders
max_encoder_length = 24 # two years backcast length
max_prediction_length = 16 # 16 months prediction length

training_cutoff = data_nbeats["time_idx"].max() - 72 # taking 6 years as validation data

context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data_nbeats[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="target",
    group_ids=["grouping"],
    # the only unknown variable is "target"; N-BEATS cannot take any additional variables
    time_varying_unknown_reals=["target"],
    max_encoder_length=context_length,
    max_prediction_length=prediction_length,
)

# getting scaling and encoder parameters so that they are reused when transforming validation and prediction data
training_parameters = training.get_parameters()

validation = TimeSeriesDataSet.from_parameters(training_parameters, data_nbeats, min_prediction_idx=training_cutoff + 1)

batch_size = 128

train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=4)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=4)

# calculate baseline
actuals = torch.cat([y[0] for x, y in iter(val_dataloader)])
baseline_predictions = Baseline().predict(val_dataloader)

In [None]:
# Finding the best initial learning rate
pl.seed_everything(42)
trainer = pl.Trainer(gpus=0, gradient_clip_val=0.1)
net = NBeats.from_dataset(training, learning_rate=3e-2, weight_decay=1e-2, widths=[256, 2048], backcast_loss_ratio=0.1)
res = trainer.tuner.lr_find(net, train_dataloader=train_dataloader, val_dataloaders=val_dataloader, min_lr=0.0001, max_lr=0.1)
print(f"suggested learning rate: {res.suggestion()}")
suggested_lr = res.suggestion()

In [None]:
%%capture
# training n-beats network
# early stopping to avoid overfitting
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
trainer = pl.Trainer(
    max_epochs=100,
    gpus=0,
    weights_summary="top",
    gradient_clip_val=0.1,
    callbacks=[early_stop_callback],
)

# nbeats net specifications
net = NBeats.from_dataset(
    training,
    learning_rate=suggested_lr,
    log_interval=-1,
    log_val_interval=1,
    weight_decay=0.01,
    widths=[256, 2048],
    backcast_loss_ratio=1.0,
)

# fitting model to the data
trainer.fit(
    net,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
)

In [None]:
# Initial evaluation of the model
# loading the best iteration of the model
best_model_path = trainer.checkpoint_callback.best_model_path
best_model = NBeats.load_from_checkpoint(best_model_path)

# getting predictions
predictions = best_model.predict(val_dataloader)
print(f"Baseline SMAPE: {SMAPE()(baseline_predictions, actuals)}")
print(f"Model SMAPE: {SMAPE()(predictions, actuals)}")

# getting raw predictions
raw_predictions = best_model.predict(val_dataloader, mode="raw", return_x=False)

There is a substantial improvement over the baseline model in SMAPE. However, for the model's performance to be comparable to that of the other models, its forecasting ability has to be measured on the test data using the same methodology. Thus, the forecast part of the prediction (not the backcast) is run through the same loop as for the other models and the same fit metrics are calculated.
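
Because MAPE and SMAPE are easy to confuse, a minimal sketch of the two definitions may help. The function names below are hypothetical, and the formulas are the common textbook variants; the notebook's own helper functions are assumed to be similar, but may differ in scaling.

```python
import numpy as np

def mape_sketch(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def smape_sketch(actual, forecast):
    """Symmetric MAPE, in percent: the error is scaled by the mean of |actual| and |forecast|."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denominator = (np.abs(actual) + np.abs(forecast)) / 2
    return float(np.mean(np.abs(actual - forecast) / denominator) * 100)

actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(f"MAPE:  {mape_sketch(actual, forecast):.3f}")
print(f"SMAPE: {smape_sketch(actual, forecast):.3f}")
```

MAPE scales each error by the actual value only, while SMAPE scales it by the average magnitude of actual and forecast, so the two generally give different numbers for the same forecast.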

In [None]:
def iterate_nbeats_over_evals(np_evals, np_predictions):
    """
    Iterates over all predictions and their actual values
    Return mean values of fit statistics for all forecasts predicted by the model
    """
    mse_list = []
    rmse_list = []
    mape_list = []
    smape_list = []
    
    for x in range(len(np_predictions)):
        
        mse_list.append(mean_squared_error(np_evals[x], np_predictions[x]))
        rmse_list.append(mean_squared_error(np_evals[x], np_predictions[x], squared=False))
        mape_list.append(mape(np_evals[x], np_predictions[x]))
        smape_list.append(smape(np_evals[x], np_predictions[x]))
        
    avg_mse = np.array(mse_list).mean()
    avg_rmse = np.array(rmse_list).mean()
    avg_mape = np.array(mape_list).mean()
    avg_smape = np.array(smape_list).mean()
    
    return avg_mse, avg_rmse, avg_mape, avg_smape 

In [None]:
# Evaluating model
raw_predictions_np = raw_predictions["prediction"].numpy()
val_data_nbeats = next(iter(val_dataloader))
val_data_nbeats_np = val_data_nbeats[0]["decoder_target"].numpy()
avg_mse_nbeats, avg_rmse_nbeats, avg_mape_nbeats, avg_smape_nbeats = iterate_nbeats_over_evals(val_data_nbeats_np, raw_predictions_np)
print(f"N-BEATS Test Fit - AVG MSE: {avg_mse_nbeats:.3f}, AVG RMSE: {avg_rmse_nbeats:.3f}, AVG MAPE: {avg_mape_nbeats:.3f}, AVG SMAPE: {avg_smape_nbeats:.3f}")

It seems that in this particular case N-BEATS does not beat the classical time series models in terms of performance on the test dataset. There are, however, many hyperparameters that could be tuned further. Nevertheless, the trained model is used to produce forecasts for the final comparison. This requires a special way of constructing the dataloader: the last n datapoints of the original dataset (n being the backcast length) have to be merged with a dataset with empty target values whose time_idx continues where the test dataset ended. This way, a dataloader for the out-of-sample prediction can be constructed.
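
The construction described above can be sketched on a toy frame using plain pandas. The frame, its length and the names `encoder_part`, `decoder_part` and `oos_frame` are assumptions made for illustration; the point is that the continuation of time_idx can be derived from the data rather than hard-coded.

```python
import pandas as pd

encoder_length, horizon = 24, 16

# Toy history with an integer time index (values are placeholders)
history = pd.DataFrame({
    "time_idx": range(100),
    "target": [float(v) for v in range(100)],
    "grouping": 1,
})

# Last `encoder_length` observed points feed the encoder (backcast window)
encoder_part = history.iloc[-encoder_length:]

# Placeholder rows for the months to be forecast; the target is a dummy value
next_idx = int(history["time_idx"].max()) + 1
decoder_part = pd.DataFrame({
    "time_idx": range(next_idx, next_idx + horizon),
    "target": 0.0,
    "grouping": 1,
})

oos_frame = pd.concat([encoder_part, decoder_part], ignore_index=True)
print(oos_frame["time_idx"].min(), oos_frame["time_idx"].max(), len(oos_frame))
```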

In [None]:
# Data for out of sample prediction
data_oos_part_one = data_nbeats.iloc[-24:]
# Creating semi-empty dataset
data_oos_part_two = pd.DataFrame({"month":pd.date_range(start='1995-09-01', end='1996-12-01', freq='MS')})
data_oos_part_two["target"] = 0
data_oos_part_two["beer_prod_log"] = 0
data_oos_part_two["beer_prod_box_cox"] = 0
data_oos_part_two["grouping"] = 1
data_oos_part_two.reset_index(inplace=True)
data_oos_part_two["time_idx"] = data_oos_part_two["index"] + 476
data_oos_part_two = data_oos_part_two[["time_idx","month","target","beer_prod_log","beer_prod_box_cox","grouping"]]
# Putting both datasets together
data_oos = pd.concat([data_oos_part_one, data_oos_part_two])

In [None]:
# Final Forecast N-BEATS
oos_pred = TimeSeriesDataSet.from_parameters(training_parameters, data_oos, min_prediction_idx=len(data_nbeats) - 24 + 1)
oos_dataloader = oos_pred.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
final_nbeats_forecast = best_model.predict(oos_dataloader).numpy()[0]

## <center> Best Models Evaluation </center>
Below is a table with the performance of all selected final models on the test dataset. The same methodology was applied to all algorithms. Although only one model is selected for presenting the final forecast, I visualize all predictions to show that, despite differences in fit on the evaluation data, all models forecast very similar beer production values in Australia for 1996.

In [None]:
# constructing final comparisons table
final_comparison = pd.DataFrame.from_dict({"Model":["Holt-Winters Exponential Smoothing","SARIMA","Prophet","N-BEATS"],
                                           "AVG MSE":[avg_mse_hw_ad,avg_mse_sarima_dif,avg_mse_proph,avg_mse_nbeats],
                                           "AVG RMSE":[avg_rmse_hw_ad,avg_rmse_sarima_dif,avg_rmse_proph,avg_rmse_nbeats],
                                           "AVG MAPE":[avg_mape_hw_ad,avg_mape_sarima_dif,avg_mape_proph,avg_mape_nbeats],
                                           "AVG SMAPE":[avg_smape_hw_ad,avg_smape_sarima_dif,avg_smape_proph,avg_smape_nbeats]})
final_comparison.set_index("Model",inplace=True)

#Final forecasts data table
forecasted_data = pd.DataFrame({"date":pd.date_range(start='1995-09-01', end='1996-12-01', freq='MS')})
forecasted_data["HWES"] = final_holt_winters_forecast
forecasted_data["SARIMA"] = final_sarima_full_forecast_yhat
forecasted_data["Prophet"] = final_prophet_forecast
forecasted_data["N-BEATS"] = final_nbeats_forecast

display(final_comparison)

In [None]:
# Chart of all modeled predictions for 1996
final_pred_plot = go.Figure()

final_pred_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=forecasted_data["HWES"].values[-12:],
                                mode='lines',
                                name='Forecast - HWES'))

final_pred_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=forecasted_data["SARIMA"].values[-12:],
                                mode='lines',
                                name='Forecast - SARIMA'))

final_pred_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=forecasted_data["Prophet"].values[-12:],
                                mode='lines',
                                name='Forecast - Prophet'))

final_pred_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=forecasted_data["N-BEATS"].values[-12:],
                                mode='lines',
                                name='Forecast - N-BEATS'))


final_pred_plot.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

final_pred_plot.update_layout(
                    plot_bgcolor = "rgba(0,0,0,0)",
                    autosize=True,
                    xaxis_title="Date",
                    yaxis_title="Beer Production",
                    title={
                        'text': "Beer Production Forecasts for 1996",
                        'y':0.9,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'})

final_pred_plot.show()

<b> SARIMA was selected as the best performing model </b> for the final forecast for several reasons:
1. Overall best fit on the evaluation dataset, achieving the best performance in all of the selected fit metrics (MSE, RMSE, MAPE, SMAPE)
2. It allows confidence intervals to be constructed for the forecasts
3. Relative simplicity of the model and speed of estimation

However, it should be taken into consideration that all 4 models performed reasonably well and, as can be seen in the comparison of all 4 forecasts, all of them provided very similar forecasted values. Moreover, further improvements in model performance could be achieved by feature engineering (adding new data as covariates), by further tuning more complex models such as N-BEATS (more hyperparameter tuning, setting different decays, number of stacks, neurons in stacks, backcast length and backcast weights), or by manually testing different Prophet hyperparameters.

## <center> Final SARIMA beer production forecast for 1996 </center>

Below is the SARIMA forecast for 1996, plotted with 95% confidence intervals around the point estimate. For comparison, the chart also shows beer production from 1994 and the partially estimated beer production from 1995.
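
When the interval bounds are available as a two-column array, the lower and upper bounds can be read off by column. The sketch below uses a synthetic array and assumes the layout (n, 2) with the lower bound in column 0 and the upper bound in column 1; it also shows that column slicing is equivalent to taking every second element of the flattened array.

```python
import numpy as np

# Stand-in for a (16, 2) array of interval bounds: column 0 lower, column 1 upper
point = np.linspace(120.0, 180.0, 16)
conf_int = np.column_stack([point - 15.0, point + 15.0])

lower = conf_int[:, 0]
upper = conf_int[:, 1]

# Equivalent to flattening the array and taking every second element
assert np.array_equal(lower, conf_int.reshape(-1)[::2])
assert np.array_equal(upper, conf_int.reshape(-1)[1::2])

# Bounds for the last 12 of the 16 forecast steps
lower_last12, upper_last12 = lower[-12:], upper[-12:]
print(lower_last12.shape, upper_last12.shape)
```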

In [None]:
# Chart with forecasted and historical data. Forecasts based on SARIMA
final_forecast_plot = go.Figure()

final_forecast_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=final_sarima_full_forecast_yhat_conf_int[-12:, 0],
                                mode='lines',
                                name='Lower 95% CFI',
                                line=dict(color='green', width=0.5)))

final_forecast_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=final_sarima_full_forecast_yhat_conf_int[-12:, 1],
                                mode='lines',
                                fill="tonexty",
                                fillcolor='rgba(30, 130, 76, 0.1)',
                                name="Upper 95% CFI",
                                line=dict(color='green', width=0.5)))

final_forecast_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=final_sarima_full_forecast_yhat[-12:],
                                mode='lines',
                                name='1996 Beer Production Forecast',
                                line=dict(color='green', width=2)))

final_forecast_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=np.concatenate([data["beer_prod"].values[-8:],final_sarima_full_forecast_yhat[:4]]),
                                mode='lines',
                                name='1995 Beer Production',
                                line = dict(color='red', width=0.8, dash='dash')))

final_forecast_plot.add_trace(go.Scatter(x=forecasted_data["date"][-12:], y=data["beer_prod"].values[-20:-8],
                                mode='lines',
                                name='1994 Beer Production',
                                line = dict(color='royalblue', width=0.8, dash='dash')))

final_forecast_plot.update_yaxes(showline=True, linewidth=1, linecolor='black', gridcolor='black')

final_forecast_plot.update_layout(
                    plot_bgcolor = "rgba(0,0,0,0)",
                    autosize=True,
                    xaxis_title="Date",
                    yaxis_title="Beer Production",
                    title={
                        'text': "Final Beer Production Forecasts for 1996",
                        'y':0.9,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'})

final_forecast_plot.show()

In [None]:
# Table with previous years data and 1996 year performance expectations
bottom_production_1996 = final_sarima_full_forecast_yhat_conf_int[-12:, 0].sum()
top_production_1996 = final_sarima_full_forecast_yhat_conf_int[-12:, 1].sum()
mean_production_1996 = final_sarima_full_forecast_yhat[-12:].sum()
mean_production_1995 = np.concatenate([data["beer_prod"].values[-8:],final_sarima_full_forecast_yhat[:4]]).sum()
production_1994 = data["beer_prod"].values[-20:-8].sum()

final_pred_df = pd.DataFrame({"1994":[round(production_1994,2)],"1995":[round(mean_production_1995,2)],
                              "Mean 1996":[round(mean_production_1996,2)],"Lower 1996":[round(bottom_production_1996,2)],
                              "Upper 1996":[round(top_production_1996,2)]})
final_pred_df.rename(index={0:'Yearly Beer Prod'},inplace=True)

data_final_monthly = pd.DataFrame( {"Jan":[round(final_sarima_full_forecast_yhat[-12],2)],
                                    "Feb":[round(final_sarima_full_forecast_yhat[-11],2)],
                                    "Mar":[round(final_sarima_full_forecast_yhat[-10],2)],
                                    "Apr":[round(final_sarima_full_forecast_yhat[-9],2)],
                                    "May":[round(final_sarima_full_forecast_yhat[-8],2)],
                                    "Jun":[round(final_sarima_full_forecast_yhat[-7],2)],
                                    "Jul":[round(final_sarima_full_forecast_yhat[-6],2)],
                                    "Aug":[round(final_sarima_full_forecast_yhat[-5],2)],
                                    "Sep":[round(final_sarima_full_forecast_yhat[-4],2)],
                                    "Oct":[round(final_sarima_full_forecast_yhat[-3],2)],
                                    "Nov":[round(final_sarima_full_forecast_yhat[-2],2)],
                                    "Dec":[round(final_sarima_full_forecast_yhat[-1],2)]})       
data_final_monthly.rename(index={0:'Monthly Beer Prod'},inplace=True)

display_side_by_side([final_pred_df, data_final_monthly], ['<b>Yearly Beer Production Comparison</b>', '<b>1996 Expected Monthly Beer Production</b>'])

The model predicts no major movement in beer production in 1996 compared to the two previous years. Total yearly production is expected to be between -14% and +13% relative to both 1995 and 1994, with the mean value being almost exactly the same. As in previous years, the highest production is expected in the warmer months around the Australian summer (January, October, November, December) and the lowest in the cooler months around winter (April, May, June, August, September).
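
The percentage range quoted above is a relative change of the summed bounds against a reference year. The arithmetic can be sketched as follows; the yearly totals are hypothetical round numbers chosen only to illustrate the calculation, and `relative_change_pct` is a helper introduced for this example.

```python
def relative_change_pct(new_value, reference_value):
    """Percentage change of new_value relative to reference_value."""
    return (new_value - reference_value) / reference_value * 100

# Hypothetical yearly totals chosen only to illustrate the arithmetic
reference_total = 1000.0
lower_total, mean_total, upper_total = 860.0, 1000.0, 1130.0

for label, total in [("lower", lower_total), ("mean", mean_total), ("upper", upper_total)]:
    print(f"{label}: {relative_change_pct(total, reference_total):+.1f}%")
```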