## This kernel features...
1. [Intro](#SectionIntro)
2. [EDA (feat. Bokeh)](#SectionEDA)
3. [Prophet](#SectionProphet)
4. [The winning "dumb" solution](#SectionDumb)
5. [Outro](#SectionOutro)

<a id="SectionIntro"></a>
# Intro
Hi Kaggle! 

This kernel features a highly visual EDA and a couple of simple time series forecasting models for the [Store Item Demand Forecasting Challenge](https://www.kaggle.com/c/demand-forecasting-kernels-only): 
- [Facebook Prophet](https://facebook.github.io/prophet/docs/quick_start.html);
- A manual solution that brought us to Top 1.

I started writing this kernel when we entered the competition, to share data exploration with my teammate and to keep track of the performance of my models. Now that the competition has ended, I decided to tidy it up a little and make it public. The kernel roughly shows our team's path from EDA to the winning submission - I hope you'll find it useful! And I'd really appreciate your feedback, both positive ;) and negative -_-

**Special thanks to...**
- Aditya Soni for [his helpful EDA plots](https://www.kaggle.com/adityaecdrid/my-first-time-series-comp-added-prophet);
- XYZT for sharing [his amazing solution](https://www.kaggle.com/thexyzt/keeping-it-simple-by-xyzt);
- My teammate for being there for me <3

**Known bugs**
- Ipywidgets does not work in a committed notebook due to Kaggle's custom environment ¯\\\_(ツ)\_/¯ You can still fork the notebook to see it in action (it works correctly in a notebook editing session).

**To do (maybe)**
-  We've also experimented with feature engineering + Gradient Boosting (LightGBM and CatBoost), as a standalone model and for model blending. Although it was not a part of our final solution (mostly due to the time constraint), it was pretty cool.
- It would be interesting to compare the performance of Prophet with a custom state space model in Stan or PyMC3. The latter could fit multivariate time series, which could give a solid performance boost for this problem.
- In the realm of neural networks, WaveNet shows promising results for multivariate time series - could be worth trying too.

<a id="SectionEDA"></a>
# Exploratory Data Analysis

Exploratory Data Analysis a.k.a. EDA - that's the place we always begin. Let's quickly go through the problem description, data cleaning and exploration.

## Problem Description

The training dataset contains 5 years of daily sales volumes of 50 items in 10 stores (500 time series in total, from 2013 to 2017). No additional features are available, only historical sales volumes.
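As a quick sanity check on the size of the problem, here is a minimal sketch of the expected row count. It assumes the training data covers the full calendar years 2013 to 2017 and that every series has exactly one row per day with no gaps (both are assumptions to verify after loading).

```python
from datetime import date

# 2013-01-01 .. 2017-12-31 inclusive; the span contains the 2016 leap day
n_days = (date(2018, 1, 1) - date(2013, 1, 1)).days
n_series = 50 * 10  # items x stores
expected_rows = n_days * n_series
print(n_days, n_series, expected_rows)  # 1826 500 913000
```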

The goal is to predict sales volumes of all items in all stores in the first quarter of 2018. The evaluation metric is [SMAPE](https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error):

$$SMAPE = \text{average}_i \frac{|y_i-\hat{y}_i|}{\frac{1}{2}(y_i+\hat{y}_i)} \cdot 100\%.$$
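To make the formula concrete, here is a minimal sketch of SMAPE on a toy example. The helper name `smape_toy` and the numbers are made up for illustration; the function follows the formula literally and assumes all values are positive, so the denominator needs no absolute values.

```python
import numpy as np

def smape_toy(y, yhat):
    """SMAPE in percent, following the formula above (assumes y + yhat > 0)."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs(y - yhat) / (0.5 * (y + yhat))) * 100)

# Hypothetical sales and forecasts for two days:
# day 1: |10 - 12| / 11 = 0.1818..., day 2: |20 - 18| / 19 = 0.1053...
example_smape = smape_toy([10, 20], [12, 18])
print(round(example_smape, 2))  # 14.35
```

Note that the error is relative: missing by 2 units costs more on a day with 10 sales than on a day with 20.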

So, we have a time series forecasting problem. A cool thing about time series problems is that our data is likely to be non-stationary: sales probably have a trend and a seasonal pattern of some kind. The trend, seasonality and even the whole data distribution may change, depending on the time period. In addition, we are dealing with a multivariate time series. Sales of items in one store or of one item in different stores could be correlated - we should use this to fit our model more accurately.

The problem is described >>>>> our next stop is data cleaning!

## Import packages
We will need:
- Numpy and Pandas to work with tabular data
- Matplotlib, Ipywidgets and Bokeh for data visualisation (BUG: Ipywidgets does not work in a committed notebook)
- StatsModels and Prophet for exploring time series properties and forecasting

To be honest, Bokeh is probably overkill for a one-off project like this one... But Bokeh plots are pretty, so it's all worth it!

In [None]:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    import numpy as np
    import pandas as pd
    from collections import OrderedDict
    import re
    import statsmodels
    from statsmodels.nonparametric.smoothers_lowess import lowess
    import statsmodels.api as sm
#     import sklearn
    import fbprophet
    from fbprophet import Prophet
#     import pymc3
#     import lightgbm as lgb
#     import xgboost as xgb
#     import tensorflow as tf
    import os
    import datetime as dt
    import matplotlib.pyplot as plt
#     import seaborn as sns
    import bokeh
    from bokeh.models import CustomJS, ColumnDataSource, Slider, Label, Div, HoverTool, Band, Span, BoxAnnotation
    from bokeh.plotting import figure
    from bokeh.palettes import Spectral11
    import ipywidgets as widgets
    from IPython.display import display
    from typing import Union, Dict, List, Callable
    from contextlib import contextmanager
    import sys

print('numpy version: ', np.__version__)
print('pandas version: ', pd.__version__)
print('statsmodels version: ', statsmodels.__version__)
print('prophet version: ', fbprophet.__version__)
# print('xgboost version: ', xgb.__version__)
# print('pymc3 version: ', pymc3.__version__)
# print('tensorflow version: ', tf.__version__)
print('ipywidgets version: ', widgets.__version__)
warnings.filterwarnings('ignore', module='matplotlib')
bokeh.io.output_notebook()

In [None]:
# Timer
# Console output suppressors

class timer_gen:
    """Simple timer"""
    
    def __init__(self):
        self.t0 = dt.datetime.now()
        self.t1, self.t2 = None, self.t0
    def __iter__(self):
        return self
    def __next__(self):
        self.t1, self.t2 = self.t2, dt.datetime.now()
        return "<timer = {} ({})>".format(self.t2 - self.t1, self.t2 - self.t0)

@contextmanager
def suppress_stdout(on=True):
    """Suppress console output"""
    
    if on:
        with open(os.devnull, "w") as devnull:
            old_stdout = sys.stdout
            sys.stdout = devnull
            try:  
                yield
            finally:
                sys.stdout = old_stdout
    else:
        yield

class suppress_stdout_stderr(object):
    """Suppress console output 2.0"""
    
    def __init__(self):
        # Open a pair of null files
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for x in range(2)]
        # Save the actual stdout (1) and stderr (2) file descriptors.
        self.save_fds = (os.dup(1), os.dup(2))

    def __enter__(self):
        # Assign the null pointers to stdout and stderr.
        os.dup2(self.null_fds[0], 1)
        os.dup2(self.null_fds[1], 2)

    def __exit__(self, *_):
        # Re-assign the real stdout/stderr back to (1) and (2)
        os.dup2(self.save_fds[0], 1)
        os.dup2(self.save_fds[1], 2)
        # Close the null files
        os.close(self.null_fds[0])
        os.close(self.null_fds[1])
        
from IPython.display import display_html
def display_side_by_side(*args):
    html_str=''
    for df in args:
        if isinstance(df, pd.Series):
            # to_frame keeps the values even when the Series already has another name
            df = df.to_frame(name='value')
        html_str+=df.to_html()
    display_html(html_str.replace('table','table style="display:inline"'),raw=True)

## Load data
The data is given in csv format. The training set has less than one million rows and a size of just a few megabytes, so no fancy big data tech is required. We also use model predictions pre-computed in other kernels to speed up this notebook's execution time.

Sales values are given as integers, so one unit probably corresponds to one item. Safety is our number one priority: to avoid any accidental number rounding in the future, we transform sales to float. For our submission, we do the opposite: we will round the predicted sales.
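The rounding step for the submission could look roughly like this. It is a minimal sketch: the prediction values are invented, and the final submission code is not shown in this part of the kernel.

```python
import pandas as pd

# Hypothetical float predictions for three test rows
pred = pd.Series([12.4, 0.49, 27.6], name='sales')

# Round to the nearest whole item and cast back to int for the submission file
sales_int = pred.round().astype(int)
print(sales_int.tolist())  # [12, 0, 28]
```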

The data seems to be clean. If we missed any quirks in data, we will catch them during data analysis. Wonderful, we are done >>>>>> we're ready to move to EDA! In this kernel, we'll only do a basic one but with some nice looking interactive plots.

In [None]:
print("Working directory: ", os.getcwd())
print("Input directory: ", os.path.abspath("../input"))
print("Input data: ", os.listdir("../input"))
print(os.listdir("../input/sales-prophet-cv-2017"))

df_train = pd.read_csv('../input/demand-forecasting-kernels-only/train.csv', parse_dates=['date'])
df_test = pd.read_csv('../input/demand-forecasting-kernels-only/test.csv', parse_dates=['date'])
df_train.sales = df_train.sales.astype(float)  # the np.float alias was removed in NumPy 1.24
display_side_by_side(df_train.head(3), df_test.head(3))
print('Entries (Train / Test) : {} / {}'.format(len(df_train), len(df_test)))
s_train, s_test = df_train.store.unique(), df_test.store.unique()
print('Stores (Train / Test) : {} - {} / {} - {}'.format(s_train[0], s_train[-1], s_test[0], s_test[-1]))
s_train, s_test = df_train.item.unique(), df_test.item.unique()
print('Items (Train / Test) : {} - {} / {} - {}'.format(s_train[0], s_train[-1], s_test[0], s_test[-1]))
dates_train, dates_test = df_train.date.unique(), df_test.date.unique()
print('Dates (Train / Test) : {:.10} - {:.10} / {:.10} - {:.10}'.format(dates_train[0], dates_train[-1], dates_test[0], dates_test[-1]))
display(pd.concat([df_train.isnull().sum().rename('Training NaNs'),
                   df_test.isnull().sum().rename('Test NaNs')], axis=1))

## Plot data
Let's plot sales time series to get some general ideas about the data. The data is averaged (weekly) to make plots more readable.

Time series look quite regular, with no obvious outliers. Sales volumes across different items and stores definitely have similarities. In particular, the sales have a trend and a yearly seasonality pattern with a spike during the summer.

We can also see that sales of one item or in one store are definitely correlated. Good news: with this information we can create a more stable estimator.
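One way to put a number on "definitely correlated" is to pivot the series into columns and compute pairwise correlations. The sketch below uses a small synthetic stand-in for `df_train` (two stores, one item, a shared yearly pattern plus independent noise), so the resulting values say nothing about the real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range('2013-01-01', periods=730, freq='D')

# Synthetic stand-in for df_train: same columns, invented values
season = 10 * np.sin(2 * np.pi * np.arange(len(dates)) / 365.25)
toy = pd.concat([
    pd.DataFrame({'date': dates, 'store': s, 'item': 1,
                  'sales': 50 + 5 * s + season + rng.normal(0, 3, len(dates))})
    for s in (1, 2)
])

# One column per store, weekly means as in the plots, then pairwise correlation
wide = toy.pivot(index='date', columns='store', values='sales').resample('W').mean()
corr = wide.corr()
print(corr.round(2))
```

On the real data the same pivot could be done per item across stores, or per store across items.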

Actually, the time series look even a little too regular. Hmmm that's weird... Could the data be synthetic? Too early to tell, we will get back to this question later.

In [None]:
# --- Matplot + Ipywidgets

# %matplotlib notebook
# %matplotlib inline

def update_ts_simple(s1_num=1, s2_num=2, i1_num=1, i2_num=2):
    fig, ax = plt.subplots(4, figsize = (12, 8))
    df_train.query('store == @s1_num & item == @i1_num').set_index('date')['sales'].resample('W').mean().plot(ax = ax[0])
    df_train.query('store == @s1_num & item == @i2_num').set_index('date')['sales'].resample('W').mean().plot(ax = ax[1])
    df_train.query('store == @s2_num & item == @i1_num').set_index('date')['sales'].resample('W').mean().plot(ax = ax[2])
    df_train.query('store == @s2_num & item == @i2_num').set_index('date')['sales'].resample('W').mean().plot(ax = ax[3])
    ax[0].set_title('item {} store {}'.format(i1_num, s1_num))
    ax[1].set_title('item {} store {}'.format(i2_num, s1_num))
    ax[2].set_title('item {} store {}'.format(i1_num, s2_num))
    ax[3].set_title('item {} store {}'.format(i2_num, s2_num))
    fig.suptitle('Sales Volume TS (Ipywidgets)')
    fig.tight_layout(rect=[0, 0, 1, 0.94])
    fig.canvas.draw()
    fig.show()
    
s1_slider = widgets.IntSlider(value=1, min=1, max=10, continuous_update=False, description='store A', layout={'width': '2.1in', 'height': '1in'})
s2_slider = widgets.IntSlider(value=2, min=1, max=10, continuous_update=False, description='store B', layout={'width': '2.1in', 'height': '1in'})
i1_slider = widgets.IntSlider(value=1, min=1, max=50, continuous_update=False, description='item A', layout={'width': '2.1in', 'height': '1in'})
i2_slider = widgets.IntSlider(value=2, min=1, max=50, continuous_update=False, description='item B', layout={'width': '2.1in', 'height': '1in'})
ui = widgets.HBox([s1_slider, s2_slider, i1_slider, i2_slider], layout={'min_width': '12in'})
out = widgets.interactive_output(update_ts_simple, {'s1_num': s1_slider, 's2_num': s2_slider, 'i1_num': i1_slider, 'i2_num': i2_slider})
display(ui, out)

Here are some interactive plots in Bokeh - because I can!

(please don't ask me how much time I wasted on debugging of this plot...)

In [None]:
# --- Bokeh + Ipywidgets
# --- Should be more efficient but does NOT work

# store_num = 1
# a, b = 1, 10
# sales = df_train.loc[(df_train.store == store_num), ['date', 'sales', 'item']]
# sales = sales.pivot(index='date', columns='item', values='sales')
# sales.columns = ['it_{}'.format(x) for x in sales.columns]
# sales_w = sales.resample('W').sum()
# # display(sales_w.head(3))
# source_data = ColumnDataSource(data=sales_w)
# p1 = figure(plot_width=750, plot_height=150, title='item {}'.format(a), x_axis_type='datetime', tools="pan,box_zoom,reset")
# line1 = p1.line(x='date', y='it_{}'.format(a), source=source_data)
# p2 = figure(plot_width=750, plot_height=150, title='item {}'.format(b), x_axis_type='datetime', tools=p1.tools,
#             x_range=pa.x_range)
# p2.line('date', 'it_{}'.format(b), source=source_data)

# def update(w):
#     linea.glyph.line_width = w
#     linea.glyph.y = 'it_{}'.format(w)
#     bokeh.io.push_notebook()
# bokeh.io.show(column(p1, p2), notebook_handle=True)
# wid = widgets.interactive(update, w=(1,50))
# wid.children[0].description = ""
# display(wid)

In [None]:
# --- Pure Bokeh

i1, i2 = '1_1', '2_1'
df_train['it_sa'] = df_train.item.astype(str) + '_' + df_train.store.astype(str) 
sales = df_train.pivot(index='date', columns='it_sa', values='sales').resample('W').mean()
df_train.drop(columns=['it_sa'], inplace=True)
# display(sales_w.head(3))
sales_source = sales.loc[:, [i1, i2]].copy()
source = ColumnDataSource(data=sales_source)
source_ref = ColumnDataSource(data=sales)
p1 = figure(plot_width=750, plot_height=150, title=i1, x_axis_type='datetime', tools="pan,wheel_zoom,reset")
line1 = p1.line(x='date', y=i1, source=source)
p2 = figure(plot_width=750, plot_height=150, title=i2, x_axis_type='datetime', tools=p1.tools,
            x_range=p1.x_range)
line2 = p2.line('date', i2, source=source)
p1.add_tools(HoverTool(tooltips=[('sales', '@{}'.format(i1)), ('vs.', '@{}'.format(i2))], 
                       renderers=[line1, line2], mode='vline'))
p2.add_tools(HoverTool(tooltips=[('sales', '@{}'.format(i2)), ('vs.', '@{}'.format(i1))], 
                       renderers=[line1, line2], mode='vline'))

slider_it1 = Slider(start=1, end=50, value=1, step=1, title="item a", callback_policy="mouseup")
slider_it2 = Slider(start=1, end=50, value=2, step=1, title="item b")  # starts at item 2 to match i2 = '2_1'
slider_sa1 = Slider(start=1, end=10, value=1, step=1, title="store a")
slider_sa2 = Slider(start=1, end=10, value=1, step=1, title="store b")
js_code = """
    var v = cb_obj.value;
    var y_old = source.data['{old}'];
    var y_new = ref.data[{new}];
    for (var i = 0; i < y_old.length; i++) {
        y_old[i] = y_new[i];
    }
    source.change.emit();
"""
callback_it1 = CustomJS(args=dict(source=source, ref=source_ref, s=slider_sa1), code=js_code.replace('{old}', i1).replace('{new}', 'v + "_" + s.value'))
callback_it2 = CustomJS(args=dict(source=source, ref=source_ref, s=slider_sa2), code=js_code.replace('{old}', i2).replace('{new}', 'v + "_" + s.value'))
callback_sa1 = CustomJS(args=dict(source=source, ref=source_ref, s=slider_it1), code=js_code.replace('{old}', i1).replace('{new}', 's.value + "_" + v'))
callback_sa2 = CustomJS(args=dict(source=source, ref=source_ref, s=slider_it2), code=js_code.replace('{old}', i2).replace('{new}', 's.value + "_" + v'))
slider_it1.js_on_change('value', callback_it1)
slider_it2.js_on_change('value', callback_it2)
slider_sa1.js_on_change('value', callback_sa1)
slider_sa2.js_on_change('value', callback_sa2)

layout = bokeh.layouts.column(Div(text='<h4>Sales Volumes TS (Bokeh)</h4>'),
                              bokeh.layouts.row(slider_it1, slider_sa1), p1, 
                              bokeh.layouts.row(slider_it2, slider_sa2), p2)
bokeh.io.show(layout)

In [None]:
# --- Bokeh + Ipywidgets
# --- Probably a simpler way to update data source in a Bokeh plot

# x = np.linspace(0, 2*np.pi, 2000)
# y = np.sin(x)

# p = figure(title="simple line example", plot_height=300, plot_width=600, y_range=(-5,5))
# r = p.line(x, y, color="#2222aa", line_width=3)

# def update(f, w=1, A=1, phi=0):
#     if   f == "sin": func = np.sin
#     elif f == "cos": func = np.cos
#     elif f == "tan": func = np.tan
#     r.data_source.data['y'] = A * func(w * x + phi)
#     bokeh.io.push_notebook()
    
# bokeh.io.show(p, notebook_handle=True)
# widgets.interact(update, f=["sin", "cos", "tan"], w=(0,100), A=(1,5), phi=(0, 20, 0.1))

## Trend and seasonality

Let's analyze sales trends and seasonalities. First, we add columns with some date-related information: year, month, weekday, etc.

In [None]:
def add_datepart(df, fldname, inplace=False, drop=False):
    if not inplace: df = df.copy()        
    fld = df[fldname]
    fld_dtype = fld.dtype
    if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):
        fld_dtype = np.datetime64
    if not np.issubdtype(fld_dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    
    attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear','Weekofyear']
#     attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear','Weekofyear',
#             'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']
    for n in attr: 
        df[targ_pre + n] = getattr(fld.dt, n.lower())
    if drop: 
        df.drop(fldname, axis=1, inplace=True)
    if not inplace: return df 

df_trainext = add_datepart(df_train, 'date', inplace=False)
# df_testext = add_datepart(df_test, 'date', inplace=False)
display(df_trainext.head(3))

Let's have a look at the aggregated data: sales time series averaged across all items and stores. At first glance, it may look like the aggregated time series is much more turbulent than the individual ones. In reality, that's not the case: I just plotted it without weekly averaging.

We can see clear yearly and weekly seasonality patterns, which don't really change with time. This is definitely interesting, and also confirms our hypothesis that the sales could be correlated. Meanwhile, monthly seasonality is not very pronounced.

In [None]:
df_trainext.groupby('date').mean()['sales'].plot(figsize=(12,3), title='Sales TS (aggregated data)')

fig, ax = plt.subplots(2, 2, figsize=(10, 10))
_ = pd.pivot_table(df_trainext, values='sales', columns='Year', index='Month').plot(title="Yearly seasonality", ax=ax[0,0])
_ = pd.pivot_table(df_trainext, values='sales', columns='Month', index='Day').plot(title="Monthly seasonality", ax=ax[0,1])
_ = pd.pivot_table(df_trainext, values='sales', columns='Year', index='Dayofweek').plot(title="Weekly seasonality (by year)", ax=ax[1,0])
_ = pd.pivot_table(df_trainext, values='sales', columns='Month', index='Dayofweek').plot(title="Weekly seasonality (by month)", ax=ax[1,1])
fig.suptitle('Sales seasonality patterns (aggregated data)')
fig.tight_layout(rect=[0, 0, 1, 0.96])

There is also a clear upward trend. Markets are bullish, time to buy! The trend is almost linear, with higher than average increase in 2014 and lower in 2017.

In [None]:
_ = pd.pivot_table(df_trainext, values='sales', index='Year').plot(style='-o', title="Annual trend (aggregated data)")

## Volatility

For sales data we are likely to observe periods with high and low volatility. For example, periods before holidays or certain seasons (depending on what kind of items we have) may have higher customer turnover.

We check the standard deviation of sales volumes across all stores and items, computed independently for every day (and normalized by the average sales volume for every item and store). This plot is almost exactly the same as the plot with aggregated sales above: meaning, periods with higher sales have higher dispersion across different items and stores. The match between these two plots is really good, but the standard deviation is "flatter". It indicates that "noise" in sales data is somewhere between multiplicative and additive. Let's keep that in mind.

30-day rolling volatility shows a similar picture. No new insights, except that this plot seems to be consistent with what we've seen so far. 

In [None]:
df_train_norm = df_trainext.copy()
df_train_norm['sales'] /= df_trainext.groupby(['item', 'store'])['sales'].transform('mean')
_ = df_train_norm.groupby(['date'])['sales'].std().plot(figsize=(12,3), title='Volatility (across items and stores)')
_ = (df_train_norm.groupby(['store', 'item'])[['date', 'sales']].rolling(30, on='date').std().groupby(['date']).mean()
     .plot(figsize=(12,3), title='Volatility (30-d rolling, aggregated data)'))

## STL decomposition

We saw rather strong evidence that sales have both trend and seasonality components. As the last step in our EDA, let's apply classical [STL decomposition](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html) to individual sales time series. (Strictly speaking, statsmodels' `seasonal_decompose` is a moving-average decomposition rather than a LOESS-based STL; I keep calling it STL for short.)

We plot the trend, seasonal and residual components of the STL decomposition. For the residuals, partial autocorrelation (PACF) is also plotted; in the title, you can see the p-values of the Ljung-Box test for 7 and 30 lags. A p-value < 0.05 means that the residuals are likely to be autocorrelated, and there is still some information to be extracted.

It seems yearly seasonality is a must: without it, our trend becomes incredibly noisy. However, yearly seasonality alone is not sufficient: the PACF plot shows a weekly seasonality pattern in the residuals, and the Ljung-Box test fails badly. Thus, both annual and weekly seasonalities should be modelled. 

We also notice that multiplicative STL fits data better - but only marginally so.
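To see why a leftover weekly pattern makes the Ljung-Box test fail, here is a minimal numpy-only sketch of the test statistic $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2 / (n-k)$. The helper `ljung_box_q` and both series are made up for illustration; the p-value (which comes from a chi-squared distribution with $h$ degrees of freedom) is left out to avoid extra dependencies. A larger $Q$ means a smaller p-value.

```python
import numpy as np

def ljung_box_q(x, n_lags):
    """Ljung-Box Q statistic over lags 1..n_lags (statistic only, no p-value)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.sum(x ** 2)
    q = 0.0
    for k in range(1, n_lags + 1):
        rho_k = np.sum(x[k:] * x[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(0)
t = np.arange(730)
white_noise = rng.normal(size=t.size)                            # "good" residuals
weekly_leftover = np.sin(2 * np.pi * t / 7) + rng.normal(size=t.size)  # weekly pattern not removed

q_noise = ljung_box_q(white_noise, 7)
q_weekly = ljung_box_q(weekly_leftover, 7)
print(q_noise, q_weekly)
```

The residuals with the weekly pattern should give a much larger statistic than pure noise, which is what the tiny p-values in the plot titles reflect.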

In [None]:
freq_season_mapping = {'None':None, 'Weekly': 7, 'Monthly':30, 'Yearly':365}

def update_stl_decompose(i_num, s_num, seasonality='Yearly', stl_style='additive'):
    ts = df_train.query('store == @s_num & item == @i_num').set_index('date')['sales']
    freq = freq_season_mapping[seasonality]
    
    fig, ax = plt.subplots(5, 1, figsize=(12,10))
    decomposition = sm.tsa.seasonal_decompose(ts, model=stl_style, freq=freq)
    _ = decomposition.observed.plot(ax=ax[0], title='observed')
    _ = decomposition.trend.plot(ax=ax[1], title='trend')
    _ = decomposition.seasonal.plot(ax=ax[2], title='seasonal')
    _ = decomposition.resid.plot(ax=ax[3], title='residual')
    res = decomposition.resid.values
    res = res[np.isfinite(res)]
#     adfuller_stat = statsmodels.tsa.stattools.adfuller(res)
    ljungbox_stat = statsmodels.stats.diagnostic.acorr_ljungbox(res)
    # Ljung-Box results start at lag 1, so lags 7 and 30 sit at positions 6 and 29
    statsmodels.graphics.tsaplots.plot_pacf(res, ax=ax[4], lags=40, 
                                           title='residuals pacf; ljung-box p-value = {:.2E} / {:.2E}'.format(ljungbox_stat[1][6], 
                                                                                                              ljungbox_stat[1][29]))
    fig.suptitle('STL decomposition')
    fig.tight_layout(rect=[0, 0, 1, 0.96])
    
s_slider = widgets.IntSlider(min=1, max=10, continuous_update=False, description='store', layout={'width': '2.1in', 'height': '1in'})
i_slider = widgets.IntSlider(min=1, max=50, continuous_update=False, description='item', layout={'width': '2.1in', 'height': '1in'})
season_drop = widgets.Dropdown(value='Yearly', options=['Weekly', 'Monthly', 'Yearly'], description='seasonality', layout={'width': '2.1in'})
stltype_drop = widgets.Dropdown(value='multiplicative', options=['additive', 'multiplicative'], description='STL type', layout={'width': '2.1in'})
ui = widgets.HBox([s_slider, i_slider, season_drop, stltype_drop], layout={'min_width': '6in', 'max_width': '6in'})
out = widgets.interactive_output(update_stl_decompose, {'s_num': s_slider, 'i_num': i_slider, 
                                                        'seasonality': season_drop, 'stl_style': stltype_drop})
display(ui, out)

<a id="SectionProphet"></a>
# Prophet

Now, let's test the Facebook Prophet package. Like statsmodels STL, Prophet also uses a trend-seasonality decomposition; however, it is more flexible and allows making predictions. Under the hood, Prophet fits an additive regression model (trend plus seasonalities plus special events) using the Bayesian framework of [Stan](http://mc-stan.org/).

**Pros:**
- Quite easy to use
- Allows specifying multiple seasonalities
- Allows specifying special events
- Can compute quick MAP and slow but accurate Bayesian estimates
- Provides methods for basic plotting out of the box

**Cons:**
- Can only treat univariate time series
- Assumes Gaussian priors
- Does not provide methods to tune hyper-parameters out of the box (for example, seasonality and trend flexibility priors)
- Outputs useless warning messages in the console - and I found no way to hide them

## First look

Prophet here is fit on 2013 - 2016 data with annual and weekly seasonalities using additive decomposition. The latest version of the package allows selecting multiplicative decomposition, but we don't have it on Kaggle. Alas! Our analysis above has shown that it might have been beneficial. Of course, the keen ones can manually log-transform the data to obtain a similar effect.
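Why the log trick works, as a minimal sketch: if sales are a product of a trend and a seasonal factor, their logarithm is a sum of two components, which is exactly what an additive model expects. The trend and seasonal shapes below are invented, and no Prophet call is made; the model's forecast in log space is replaced by a stand-in. With zero sales in the data one would need `np.log1p` / `np.expm1` instead.

```python
import numpy as np

t = np.arange(365 * 2)
trend = 20 + 0.01 * t                               # hypothetical slowly growing level
season = 1 + 0.3 * np.sin(2 * np.pi * t / 365.25)   # hypothetical multiplicative seasonal factor
y = trend * season                                  # multiplicative structure

# After a log transform the components add up instead of multiplying
log_y = np.log(y)
is_additive_in_logs = bool(np.allclose(log_y, np.log(trend) + np.log(season)))

# An additive model would be fit on log_y; its forecast is mapped back with exp
yhat = np.exp(log_y)  # stand-in for exp(model prediction in log space)
max_roundtrip_error = float(np.max(np.abs(yhat - y)))
print(is_additive_in_logs, max_roundtrip_error)
```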

In any case, the fit looks reasonable. Trend and seasonalities of sales across stores and items look similar and consistent with our previous analysis. We see that Prophet adjusted the trend quite a bit in 2014 - well, we saw the trend in 2014 was kind of an outlier on the aggregate level. We see almost zero uncertainty in the trend component, which is probably good (uncertainty in the seasonal component is not plotted). 

The partial autocorrelation plot of the predicted residuals looks much better than the one for STL decomposition. The Ljung-Box test is definitely doing better as well; the p-value for monthly lags is a bit low but still acceptable.

I also output SMAPE over time for the training (2013 - 2016) and validation (2017 Q1) data, smoothed using LOWESS for better visibility. On average, SMAPE is around 15 and 16 for the training and validation data, respectively. Of course, we have no idea yet how it would perform on the real test data (2018 Q1). 

Overall, the results look okay.

In [None]:
# SMAPE - the official metric for the submission

def smape(y: Union[np.ndarray, float], yhat: Union[np.ndarray, float], average=True, signed=False) -> float:
    """SMAPE evaluation metric"""
    
    if signed:
        result = 2. * (yhat - y) / (np.abs(y) + np.abs(yhat)) * 100
    else:
        result = 2. * np.abs(yhat - y) / (np.abs(y) + np.abs(yhat)) * 100
    if average: return np.mean(result)
    return result

def smape_df(df: pd.DataFrame, average=True, signed=False) -> pd.DataFrame:
    return smape(df.y, df.yhat, average=average, signed=signed)

In [None]:
# -- TODO: Add Ipywidgets?

def prophet_show(item, store, cutoff_train, cutoff_eval, prophet_kwargs, title, 
                 plot_components=True, display_df=True):
    ts = (df_train.query('item == @item & store == @store & date < @cutoff_eval')[['date', 'sales']]
          .rename(columns={'date':'ds', 'sales':'y'})).reset_index(drop=True)
    ind_train = pd.eval('ts.ds < cutoff_train')
    ind_eval = ~ ind_train
    len_train, len_eval = ind_train.sum(), ind_eval.sum()
    ts_train = ts.loc[ind_train]
    m = Prophet(**prophet_kwargs)
    m.fit(ts_train)
    ts_hat = m.predict(ts).merge(ts[['ds', 'y']], on='ds', how='left')
    if display_df: display(ts_hat.tail(3))

    df_combined = ts_hat.assign(smape=0, smape_smooth=0)
    df_combined.smape = smape_df(df_combined, average=False)
    df_combined.loc[ind_train, 'smape_smooth'] = lowess(df_combined.loc[ind_train, 'smape'], range(len_train), frac=0.03, return_sorted=False)
    df_combined.loc[ind_eval, 'smape_smooth'] = lowess(df_combined.loc[ind_eval, 'smape'], range(len_eval), frac=0.35, return_sorted=False)
    smape_in = df_combined.loc[ind_train].smape.mean()
    smape_oos = df_combined.loc[ind_eval].smape.mean()
    
    source = ColumnDataSource(data=df_combined)
    p = figure(plot_width=750, plot_height=200, title=("**{}**     item = {} store = {}     train / test = ..{} / ..{}"
                                                       .format(title, item, store, cutoff_train, cutoff_eval)), 
               x_axis_type='datetime', tools="pan,wheel_zoom,reset")
    _ = p.line(x='ds', y='yhat', source=source)
    _ = p.line(x='ds', y='yhat_lower', source=source, line_alpha=0.4)
    _ = p.line(x='ds', y='yhat_upper', source=source, line_alpha=0.4)
    _ = p.scatter(x='ds', y='y', source=source, color='black', radius=0.2, radius_dimension='y', alpha=0.4)
       
    deltas = np.abs(m.params['delta'][0])
    delta_max = np.max(deltas)
    df_deltas = pd.DataFrame({'ds': m.changepoints.values, 'delta':deltas, 'delta_scaled':ts_hat.yhat.mean() * deltas / delta_max})
    source2 = ColumnDataSource(df_deltas)
    cp1 = p.vbar(x='ds', source=source2, width=1, top=ts_hat.yhat.mean(), color='red', alpha=0.2, hover_color='red', hover_alpha=1)
    cp2 = p.vbar(x='ds', source=source2, width=1.5e+9, top='delta_scaled', color='red', alpha=0.5)
    p.add_tools(HoverTool(tooltips=[('trend delta', '@delta{.000}')], renderers=[cp2], mode='mouse'))
    # p.add_layout(Label(x=1e+10, y=10, text='xasfdfsdfsd'))
    p.add_layout(BoxAnnotation(left=ts_train.ds.iloc[-1], right=ts.ds.iloc[-1]))
    
    p2 = figure(plot_width=750, plot_height=100, title="SMAPE IS / OOS = {:.3f} / {:.3f}".format(smape_in, smape_oos), x_axis_type='datetime', tools="",
                x_range=p.x_range)
    sm1 = p2.line(x='ds', y='smape_smooth', source=source, color='green')
    p2.add_tools(HoverTool(tooltips=[('smape', '@smape')], renderers=[sm1], mode='vline', line_policy='interp'))
    p2.add_layout(BoxAnnotation(left=ts_train.ds.iloc[-1], right=ts.ds.iloc[-1]))
    p2.yaxis[0].ticker.desired_num_ticks = 2
    bokeh.io.show(bokeh.layouts.column(p, p2))
    
    if plot_components:
        _ = m.plot_components(ts_hat, uncertainty=True)
        fig, ax = plt.subplots(1, 1, figsize=(12, 2))
#         res = ts_hat.query('ds < @cutoff_train').yhat - ts_train.y
        res = (df_combined['y'] - df_combined['yhat'])
#         adfuller_stat = statsmodels.tsa.stattools.adfuller(res.values)
        ljungbox_stat = statsmodels.stats.diagnostic.acorr_ljungbox(res.values)
        # Ljung-Box results start at lag 1, so lags 7 and 30 sit at positions 6 and 29
        _ = statsmodels.graphics.tsaplots.plot_pacf(res, lags=40, ax=ax,
                                                    title='residuals pacf; ljung-box p-value = {:.2E} / {:.2E}'.format(ljungbox_stat[1][6], 
                                                                                                                      ljungbox_stat[1][29]))
    
prophet_show(item=1, store=1, cutoff_train="2017-01-01", cutoff_eval="2017-04-01",
             prophet_kwargs={'yearly_seasonality':True, 'weekly_seasonality':True,
                            'uncertainty_samples':500},
            title='Prophet')

## Cross-validated error

Now let's properly evaluate Prophet's predictive strength using cross-validation (CV). It is always tricky to cross-validate a time series. Here I used the following strategy: a model is fit on the data from day 0 to day $X_k$, then the model is evaluated on the data from day $X_k$ to day $X_{k+1}$. The plot below shows CV SMAPE evaluated on quarterly CV folds, ranging from 2015 Q1 to 2017 Q4.
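The fold boundaries for this expanding-window scheme can be sketched as follows. The helper `expanding_window_folds` is hypothetical, and the exact cutoffs used to produce the pre-computed CV file are an assumption based on the description above.

```python
import pandas as pd

def expanding_window_folds(first_cutoff, last_cutoff, freq='QS'):
    """Yield (train_end, eval_end): fit on dates < train_end, evaluate on [train_end, eval_end)."""
    cutoffs = pd.date_range(first_cutoff, last_cutoff, freq=freq)
    for train_end, eval_end in zip(cutoffs[:-1], cutoffs[1:]):
        yield train_end, eval_end

# Quarterly folds whose evaluation windows run from 2015 Q1 to 2017 Q4
folds = list(expanding_window_folds('2015-01-01', '2018-01-01'))
print(len(folds))  # 12
print(folds[0])
print(folds[-1])
```

Each fold trains on everything before its cutoff, so later folds see more history. This mirrors how the final model is used: fit on all available data, then predict the next quarter.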

While in-sample SMAPE is quite stable, out-of-sample SMAPE varies quite a lot: it ranges from 11 to 18. Unfortunately, the worst performance is always in the first quarter of the year - and the evaluation period for the submission is 2018 Q1.

In [None]:
prophet_cv = pd.read_csv('../input/sales-prophet-cv-2017/cv_prophet_yk.csv', index_col=[0,1,2,3])
display(prophet_cv.head(3))
print('N rows = ', prophet_cv.shape[0])

In [None]:
def show_cv_sampe_agg(data_cv):
    cv_error = data_cv.groupby(['cv_fold', 'sample']).apply(smape_df)
    display(pd.DataFrame(cv_error).T)
    source_index = cv_error[:, 'oos'].index
    source_in = ColumnDataSource(data=pd.DataFrame(cv_error[:, 'in'],
                                                   index=source_index,
                                                   columns=['smape']))
    source_oos = ColumnDataSource(data=pd.DataFrame(cv_error[:, 'oos'], 
                                                   index=source_index,
                                                   columns=['smape']))
    p = figure(plot_width=750, plot_height=250, title="Prophet CV SMAPE", 
               x_range=source_index.values, tools="pan,wheel_zoom,reset")
    p.xaxis.major_label_orientation = -np.pi / 4
    l1 = p.circle(x='cv_fold', y='smape', source=source_in, legend='in-sample', size=7)
    l2 = p.circle(x='cv_fold', y='smape', source=source_oos, color='red', legend='out-of-sample', size=6)
    p.legend.click_policy="hide"
    p.add_tools(HoverTool(tooltips=[('smape', '@smape')], renderers=[l1, l2], mode='mouse'))
    bokeh.io.show(p)
    
show_cv_sampe_agg(prophet_cv)

If we take a closer look at one CV fold, we can see a clear monthly seasonal pattern for both in- and out-of-sample fit.

We saw that sales volatility is higher in winter - so a worse in-sample fit in winter is to be expected. High monthly oscillations out-of-sample are more difficult to interpret. There are a number of possible explanations: either we could not fit monthly seasonality properly, or we overfit yearly seasonality, or our trend does not generalize well, or it is once again due to oscillations in volatility...
 
Overall, however, there are no obvious red flags.

In [None]:
def show_cv_smape_by_time(data_cv, cutoff_eval):
    df = pd.DataFrame({'smape_in': data_cv.query('cv_fold == @cutoff_eval & sample == "in"').groupby(['ds']).apply(smape_df),
                       'smape_oos': data_cv.query('cv_fold == @cutoff_eval & sample == "oos"').groupby(['ds']).apply(smape_df)
                       })
    df['smape_in_smooth'] = lowess(df.smape_in, range(df.shape[0]), frac=0.015, return_sorted=False)
    df['smape_oos_smooth'] = lowess(df.smape_oos, range(df.shape[0]), frac=0.15, return_sorted=False)
    df.index = pd.to_datetime(df.index)

    source = ColumnDataSource(df)
    p = figure(plot_width=750, plot_height=200, title="Prophet SMAPE by time: {} cv fold".format(cutoff_eval),
               x_axis_type='datetime', tools="pan,wheel_zoom,reset")
    sm1 = p.line(x='ds', y='smape_in_smooth', source=source, color='green')
    sm2 = p.line(x='ds', y='smape_oos_smooth', source=source, color='red')
    p.add_tools(HoverTool(tooltips=[('smape', '@smape_in')], renderers=[sm1], mode='vline', line_policy='interp'))
    p.add_tools(HoverTool(tooltips=[('smape', '@smape_oos')], renderers=[sm2], mode='vline', line_policy='interp'))
    # p.yaxis[0].ticker.desired_num_ticks = 2
    bokeh.io.show(p)
    
show_cv_smape_by_time(prophet_cv, cutoff_eval="2017-04-01")

Let's check out-of-sample SMAPE only. The OOS error also has a clear repetitive pattern: in particular, 2015 Q1 kind of resembles 2016 Q1 and 2017 Q1.

What does it tell us? It is yet another indication that our data is quite regular. Apart from that - not much really, but it may become more useful when compared to other models.

In [None]:
def show_cv_smape_by_time_oos(data_cv):
    df = data_cv.query('sample == "oos" & ds > "2015-01-01"').groupby(['cv_fold', 'ds']).apply(smape_df).unstack(level='cv_fold')
#     folds = df['cv_fold'].unique()
#     df['smape_smooth'] = np.nan
    for col in df.columns:
        df[col] = lowess(df[col], range(df.shape[0]), frac=0.15, return_sorted=False)
    df.index = pd.to_datetime(df.index)

    source = ColumnDataSource(df)
    p = figure(plot_width=750, plot_height=200, title="Prophet OOS SMAPE by time",
               x_axis_type='datetime', tools="pan,wheel_zoom,reset")
    for col in df.columns.values:
        _ = p.line(x='ds', y=col, source=source, color='red')
    # Mark every fold boundary (plus the start of the first fold) with a dashed line
    for col in np.append(df.columns.values, "2015-01-01"):
        p.add_layout(Span(location=pd.to_datetime(col).value / 1e6,
                          dimension='height', line_color='black',
                          line_dash='dashed', line_width=1))
    # p.yaxis[0].ticker.desired_num_ticks = 2
    p.add_tools(HoverTool(tooltips=[('smape', '$y')], mode='vline', line_policy='interp'))
    bokeh.io.show(p)
    
show_cv_smape_by_time_oos(prophet_cv)

Let's check which item-store combinations correspond to the best and worst predictions (i.e., predictions with the lowest and the highest SMAPE, respectively). 

We see that "best performers" have a smoother shape of their historical sales, while "worst performers" are generally noisier - well, nothing unexpected. Interestingly, all best performers have higher sales volume values. It means that noise in sales data is not purely multiplicative, which is consistent with our volatility plots.

In [None]:
def plot_data_cv_example(df, cv_fold_idx, num_idx, ax):
    ts = df.loc[(cv_fold_idx, 'in', num_idx, slice(None)), 'y'].reset_index(level=[0,1,2], drop=True)
    ts = ts[pd.notnull(ts)]
    ts.index = pd.to_datetime(ts.index)
    ax.plot(ts.resample('w').mean())
    ax.set_title(num_idx)

def show_cv_best_worst(data_cv, cutoff_eval):
    cv_error_num = data_cv.groupby(['sample', 'num']).apply(smape_df)
    cv_error_num_oos = cv_error_num.loc['oos', :].sort_values()
    cv_error_num_in = cv_error_num.loc['in', :].sort_values()
    print("-------- Best: Lowest SMAPE ---------")
    display_side_by_side(cv_error_num_in.head(), cv_error_num_oos.head())
    print("-------- Worst: Highest SMAPE ---------")
    display_side_by_side(cv_error_num_in.tail(), cv_error_num_oos.tail())

    cv_fold_idx = cutoff_eval
    fig, ax = plt.subplots(3, 1, figsize=(10, 5))
    plot_data_cv_example(df=data_cv, cv_fold_idx=cv_fold_idx, num_idx=cv_error_num_oos.index[0][1], ax=ax[0])
    plot_data_cv_example(df=data_cv, cv_fold_idx=cv_fold_idx, num_idx=cv_error_num_oos.index[1][1], ax=ax[1])
    plot_data_cv_example(df=data_cv, cv_fold_idx=cv_fold_idx, num_idx=cv_error_num_oos.index[2][1], ax=ax[2])
    fig.suptitle('Best: Lowest OOS SMAPE')
    fig.tight_layout(rect=[0, 0, 1, 0.94])

    fig, ax = plt.subplots(3, 1, figsize=(10, 5))
    plot_data_cv_example(df=data_cv, cv_fold_idx=cv_fold_idx, num_idx=cv_error_num_oos.index[-1][1], ax=ax[0])
    plot_data_cv_example(df=data_cv, cv_fold_idx=cv_fold_idx, num_idx=cv_error_num_oos.index[-2][1], ax=ax[1])
    plot_data_cv_example(df=data_cv, cv_fold_idx=cv_fold_idx, num_idx=cv_error_num_oos.index[-3][1], ax=ax[2])
    fig.suptitle('Worst: Highest OOS SMAPE')
    fig.tight_layout(rect=[0, 0, 1, 0.94])
    
show_cv_best_worst(prophet_cv, cutoff_eval="2017-04-01")

## Adding US holidays

Prophet can incorporate information about holidays or any other special events. Splendid! Let's try to include US federal holidays. We don't know what these sales are, but there is a good chance that the firm is US-based or at least has US customers.

Unfortunately, including US federal holidays does not seem to improve the fit. On the bright side, now I know when the birthday of Martin Luther King is!

In [None]:
holidays_us_raw = pd.read_csv('../input/federal-holidays-usa-19662020/usholidays.csv')
display(holidays_us_raw.head(3))
holidays_us = pd.DataFrame({
    'holiday':'US',
    'ds':holidays_us_raw.Date, 
    'lower_window': -1,
    'upper_window': 0})
display(holidays_us.head(3))

In [None]:
prophet_show(item=1, store=1, cutoff_train="2017-01-01", cutoff_eval="2017-04-01",
             prophet_kwargs={'yearly_seasonality':True, 'weekly_seasonality':True,
                             'holidays':holidays_us, 'uncertainty_samples':500},
            title='Prophet (US holidays)', plot_components=True)

## Tuning hyper-parameters by cross-validation

So far, we used Prophet with the default hyper-parameters that define seasonal and trend flexibility. This model gives a public score of 16.9 and a private score of 14.1.

I tuned the hyper-parameters by cross-validation on 2017 Q1 in a separate notebook. The resulting model has a less flexible yearly seasonality and does not account for US holidays. That model gives a public score of 15.4 and a private score of 13.9. That's better... but the performance is still far from the top models. By the time I was testing Prophet, top public scores were already below 14.0.

So, what do we do now? We could of course work a little more on tuning Prophet hyper-parameters. Instead, we'll try a new approach.

In [None]:
# -- TODO: add CV plots 


<a id="SectionDumb"></a>
# The winning "dumb" solution

Let's get back to our data analysis. If you remember, we've seen quite a few quirks:
- The data has no apparent outliers at all;
- Visually, shapes of sales volume time series across all items and stores look suspiciously regular and similar to each other;
- Weekly and yearly seasonal patterns are very stable for every store-item combination, and trend is relatively stable too;
- Consistency between time series level and standard deviation indicates unusual regularity in time series noise component;
- Very simple state space models fit the data quite well, which indicates regularity in time series trend and seasonal components;
- Out-of-sample error has a seasonal pattern, which may indicate some structural regularity in sales time series.

Okay, we cannot ignore this anymore. This sales volume data is way too clean and regular... it's totally synthetic! To be honest, I was not very happy when I found out: that was not a real-world problem I had hoped to solve... Well, two chocolate bars managed to make me feel better.

Shortly after I discovered this horrific truth, I stumbled upon [the kernel of XYZT](https://www.kaggle.com/thexyzt/keeping-it-simple-by-xyzt). It showed me how deep the rabbit hole goes.

In [None]:
# sales_total_average = df_train['sales'].mean()
# df_train_norm = df_train.copy()
# df_train_norm['sales'] /= df_train.groupby(['item', 'store'])['sales'].transform('mean')

fig, ax = plt.subplots(3, 2, figsize=(12, 16))
_ = df_train_norm.groupby(['item', 'Dayofweek'])['sales'].mean().unstack('item').plot(title='Weekly sales by item', legend=False, ax=ax[0,0])
_ = df_train_norm.groupby(['store', 'Dayofweek'])['sales'].mean().unstack('store').plot(title='Weekly sales by store', legend=False, ax=ax[1,0])
_ = df_train_norm.groupby(['item', 'store', 'Dayofweek'])['sales'].mean().unstack(['item', 'store']).plot(title='Weekly sales by item and store', legend=False, ax=ax[2,0])
_ = df_train_norm.groupby(['item', 'Year'])['sales'].mean().unstack('item').plot(title='Yearly sales by item', legend=False, ax=ax[0,1])
_ = df_train_norm.groupby(['store', 'Year'])['sales'].mean().unstack('store').plot(title='Yearly sales by store', legend=False, ax=ax[1,1])
_ = df_train_norm.groupby(['item', 'store', 'Year'])['sales'].mean().unstack(['item', 'store']).plot(title='Yearly sales by item and store', legend=False, ax=ax[2,1])
fig.suptitle('This is how deep the rabbit hole goes')
fig.tight_layout(rect=[0, 0, 1, 0.96])

## Math

So, the data is not just synthetic. Basically, a sales volume time series for any item and store seems to be generated approximately as 

$TS(item, store, date) = TS_{base}(date) * c(item, store) * \eta_1(item, store, date) + \eta_2(item, store, date)$,

where 
- $TS_{base}$ is a fixed time series common for all items and stores,
- $c(item, store)$ is a multiplier unique to each item-store time series,
- $\eta_1$ and $\eta_2$ are white noise variables.

Now we need to model $TS_{base}$. Judging by what we've seen so far, it can be expressed as

$TS_{base}(date) = trend(year)*seasonality_{weekly}(weekday)*seasonality_{yearly}(month)$.

Thus, to model sales volumes we need to estimate 4 functions: $c(item, store)$, $trend(year)$, $seasonality_{weekly}(weekday)$ and $seasonality_{yearly}(month)$. Let's do it in the "dumbest" way possible: by averaging data.

$c(item, store) = average_{date} \, TS(item, store, date)$

$trend(year) = \frac{average_{item, store, year} \, TS(item, store, date)}{average_{total} \, TS}$

$seasonality_{weekly}(weekday) = \frac{average_{item, store, weekday} \, TS(item, store, date)}{average_{total} \, TS}$

$seasonality_{yearly}(month) = \frac{average_{item, store, month} \, TS(item, store, date)}{average_{total} \, TS}$

Now to make predictions, we simply look up values of $c$, $seasonality$ and $trend$. Okay, there is still one small issue: we have to extrapolate our trend.
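The averaging recipe above can be sketched in a few lines of pandas. Everything in this example is an assumption made for illustration: the synthetic data, the made-up levels and seasonal shapes, and the helper `predict`. It is not the code used for the submission, only a check that plain averages recover a multiplicative structure reasonably well.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-01", "2016-12-31", freq="D")
levels = {(1, 1): 20.0, (1, 2): 35.0, (2, 1): 50.0}  # made-up c(item, store)

# Made-up multiplicative components shared by all series
weekly = 1.0 + 0.1 * (dates.dayofweek.to_numpy() - 3) / 3
yearly = 1.0 + 0.2 * np.sin(2 * np.pi * (dates.month.to_numpy() - 1) / 12)
growth = 1.0 + 0.05 * (dates.year.to_numpy() - 2013)

frames = []
for (item, store), level in levels.items():
    noise = rng.normal(1.0, 0.02, len(dates))  # multiplicative noise
    frames.append(pd.DataFrame({"date": dates, "item": item, "store": store,
                                "sales": level * weekly * yearly * growth * noise}))
df = pd.concat(frames, ignore_index=True)
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["dayofweek"] = df["date"].dt.dayofweek

# The four "dumb" estimates: plain group averages, normalized by the total mean
total_mean = df["sales"].mean()
c = df.groupby(["item", "store"])["sales"].mean()
trend = df.groupby("year")["sales"].mean() / total_mean
s_weekly = df.groupby("dayofweek")["sales"].mean() / total_mean
s_yearly = df.groupby("month")["sales"].mean() / total_mean

def predict(item, store, date):
    """Look up and multiply the four estimated factors for one date."""
    date = pd.Timestamp(date)
    return (c.loc[(item, store)] * trend.loc[date.year]
            * s_weekly.loc[date.dayofweek] * s_yearly.loc[date.month])

print(predict(1, 1, "2016-07-04"))
```

In this sketch the prediction only works for years present in the data; forecasting a future year needs an extrapolated `trend` value.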


## Extrapolating trend

Function extrapolation is never an easy task as even functions that have a similar fit on internal data points can have a very different behaviour outside of them. Check some examples below.

In [None]:
annual_sales_avg = pd.pivot_table(df_trainext, values='sales', index='Year')
x_range = np.linspace(2013, 2018, 50)
annual_growth_linear = np.poly1d(np.polyfit(annual_sales_avg.index.values, annual_sales_avg['sales'].values, 1))
annual_growth_quadratic = np.poly1d(np.polyfit(annual_sales_avg.index.values, annual_sales_avg['sales'].values, 2))
annual_growth_linear_ll = (statsmodels.nonparametric
                           .kernel_regression.KernelReg(annual_sales_avg['sales'].values,
                                                        annual_sales_avg.index.values, 
                                                        'c', bw=[1]))
_ = annual_sales_avg.plot(style='o', title="Average annual trend", figsize=(8, 6))
_ = plt.plot(x_range, annual_growth_linear(x_range), '--', label='linear')
_ = plt.plot(x_range, annual_growth_linear_ll.fit(x_range)[0], '-.', label='local linear')
_ = plt.plot(x_range, annual_growth_quadratic(x_range), '.', label='quadratic')
_ = plt.legend()

We could try using cross-validation to find the best extrapolation method. The problem is that we only have 5 years of data, so it is unlikely to be very effective. When little data is available, stable methods usually perform best: they have low variance, and there is not much hope of achieving low bias anyway. So the best solution seems to be linear extrapolation or local linear extrapolation with strong regularization. The solution with a linear trend gives us a public score of 13.875. Pretty good for such a "dumb" algorithm! This is also one of the earlier solutions described in XYZT's kernel.
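For illustration, here is what such a cross-validation of extrapolation methods could look like: each year is forecast from the preceding ones, and the absolute errors are averaged. The annual averages below are invented placeholders (not the competition's values) and `holdout_errors` is a hypothetical helper, so the printed numbers say nothing about the real data. With 5 years and at least 3 training points, only two forecasts can be scored, which shows how little evidence this procedure yields.

```python
import numpy as np

# Hypothetical annual average sales; NOT the competition's actual values
years = np.array([2013, 2014, 2015, 2016, 2017])
annual_avg = np.array([45.0, 52.0, 54.0, 59.0, 61.0])

def holdout_errors(years, values, degrees=(1, 2), min_train=3):
    """Forecast each held-out year from the preceding years with polynomial fits."""
    errors = {deg: [] for deg in degrees}
    for k in range(min_train, len(years)):
        # Shift years to start at 0 to keep the polynomial fit well conditioned
        x_train = years[:k] - years[0]
        for deg in degrees:
            coefs = np.polyfit(x_train, values[:k], deg)
            pred = np.polyval(coefs, years[k] - years[0])
            errors[deg].append(abs(pred - values[k]))
    return {deg: float(np.mean(errs)) for deg, errs in errors.items()}

errs = holdout_errors(years, annual_avg)
print(errs)  # mean absolute error per polynomial degree
```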

It's unfortunate that we don't have more data to tune the trend. I guess we should stop here. Unless we use one ancient dark and forbidden technique... We can tune the trend using the public score! Yes, it is typically better to use the public score only for final model validation rather than for parameter tuning - otherwise it's very easy to overfit the model. However, in this case we should be fine: we are only tuning one parameter of a pretty stable model. This method gives us an expected value of sales in 2018 approximately equal to 60.5, or $trend(2018) \approx 1.158$. Let's just hardcode this value.

The solution with a hardcoded trend gives us a public score of 13.852 (and a private score of 12.587), which is pretty damn good!
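As a small worked example of how the two hardcoded numbers relate: since $trend(year)$ is a yearly mean divided by the overall mean, the values 60.5 and 1.158 imply an overall training mean of roughly 52.2. The sketch below also shows one way the hardcoded value could be stored as a lookup table indexed by year with a `sales` column, which is the shape the model code below reads when `growth` is a DataFrame. The names `growth_hc` and `annual_sales` are hypothetical, and a table holding only 2018 is an assumption that suffices when all predicted dates fall in 2018.

```python
import pandas as pd

EXPECTED_SALES_2018 = 60.5  # from the text above
TREND_2018 = 1.158          # from the text above

# trend(year) = yearly mean / overall mean, so the overall mean is implied
implied_total_mean = EXPECTED_SALES_2018 / TREND_2018

# Hardcoded trend as a lookup table: index = year, column = 'sales'
growth_hc = pd.DataFrame({"sales": [TREND_2018]},
                         index=pd.Index([2018], name="year"))

def annual_sales(year):
    """Look up the normalized trend value for a given year."""
    return growth_hc.loc[year, "sales"]

print(implied_total_mean, annual_sales(2018))
```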

Just to be on the safe side, I cross-validated "dumb" models with linear, quadratic and hardcoded trend extrapolation, from 2016 Q1 to 2017 Q4, using the same methodology as before. The hardcoded version (HC) generally shows a lower error, which is a pretty good sign.

In [None]:
# Just for the reference, here is the code of the "dumb" model
from abc import abstractmethod, ABC

class DumbBase(ABC):
    """Dumb base model"""
    
    name = "dumb_base"
    
    def __init__(self, growth='linear', fit_window_years=None):
        self.growth = growth
        self.fit_window_years = fit_window_years
        self.verbose = True
    
    @staticmethod
    def expand_data(data):
        data_exp = data.copy()
        data_exp['day'] = data['date'].apply(lambda x: x.day)
        data_exp['month'] = data['date'].apply(lambda x: x.month)
        data_exp['year'] = data['date'].apply(lambda x: x.year)
        data_exp['dayofweek'] = data['date'].apply(lambda x: x.dayofweek)
        data_exp['weekofyear'] = data['date'].apply(lambda x: x.weekofyear)
        data_exp['dayofyear'] = data['date'].apply(lambda x: x.dayofyear)
        return data_exp
    
    def _fit_annual_sales(self):
        if isinstance(self.growth, pd.DataFrame):
            self._annual_sales = lambda x: self.growth.loc[x, 'sales']
        
        else:
            print('Dumb fit: functional annual growth')
            year_table = pd.pivot_table(self.data, index='year', values='sales', aggfunc=np.mean)
            years = year_table.index.values
            annual_sales_avg = year_table.values.squeeze()

            # Use the instance attribute: a bare `growth` is not defined in this scope
            if self.growth == 'linear': 
                self._annual_sales = np.poly1d(np.polyfit(years, annual_sales_avg, 1))
            elif self.growth == 'quadratic': 
                self._annual_sales = np.poly1d(np.polyfit(years, annual_sales_avg, 2))
            else:
                raise KeyError
    
    @abstractmethod
    def _fit_base_seasonality(self):
        pass
    
    def fit(self, data):
        if 'year' in data.columns:
            self.data = data.copy()
        else:
            print('Dumb fit: Expand data')
            self.data = self.expand_data(data)
        
        if self.fit_window_years is not None:
            date_max = self.data['date'].max()
            date_min = date_max.replace(year=date_max.year-self.fit_window_years)
            self.data = self.data.query('date > @date_min')
            
        self.data['sales'] /= self.data['sales'].mean()
        self._fit_base_seasonality()
        self._fit_annual_sales()        
        
    @abstractmethod
    def _predict_base_seasonality(self, item, store, date):
        pass
    
    def _predict_annual_sales(self, year):
        return self._annual_sales(year)
        
    def predict(self, data):
        data = data.assign(sales_hat=.001)
        with suppress_stdout(not self.verbose):
            timer = timer_gen()
            count = 1
            for i, row in data.iterrows():
                if count % 100000 == 0: print("dumb predict {} {}".format(count, next(timer)), end=' | ')
                date, item, store = row['date'], row['item'], row['store']
                pred_sales = self._predict_base_seasonality(item, store, date) * self._predict_annual_sales(date.year)
                data.at[i, 'sales_hat'] = pred_sales
                count += 1
        return data
    
    
class DumbOriginal(DumbBase):
    """
    Original Dumb model
    
    sales = base(item, store) * s(dayofweek) * s(month) * trend(year)
    """
    
    name = "dumb_original"
    
    def _fit_base_seasonality(self):
        self.store_item_table = pd.pivot_table(self.data, index='store', columns='item',
                                               values='sales', aggfunc=np.mean)
        self.month_table = pd.pivot_table(self.data, index='month', values='sales', aggfunc=np.mean)
        self.dow_table = pd.pivot_table(self.data, index='dayofweek', values='sales', aggfunc=np.mean)
        
    def _predict_base_seasonality(self, item, store, date):
        dow, month, year = date.dayofweek, date.month, date.year
        base_sales = self.store_item_table.at[store, item]
        seasonal_sales = self.month_table.at[month, 'sales'] * self.dow_table.at[dow, 'sales']
        return base_sales * seasonal_sales    

In [None]:
dumb_hc_cv = pd.read_csv('../input/sales-dumb-all-cv2017/cv_dumb_hc_oos_yk.csv', index_col=[0,1,2,3])
print('N rows = ', dumb_hc_cv.shape[0])
display(dumb_hc_cv.head(3))

In [None]:
dumb_linear_cv = pd.read_csv('../input/sales-dumb-all-cv2017/cv_dumb_linear_oos_yk.csv', index_col=[0,1,2,3])
display(dumb_linear_cv.head(3))
dumb_quadratic_cv = pd.read_csv('../input/sales-dumb-all-cv2017/cv_dumb_quadratic_oos_yk.csv', index_col=[0,1,2,3])
display(dumb_quadratic_cv.head(3))

In [None]:
def create_df_cv_smape_arrg(df_cv_dict):
    df_smape_aggr = pd.concat((val.groupby(['cv_fold']).apply(smape_df).rename(key)
                               for key, val in df_cv_dict.items()), axis=1)
    df_smape_aggr.index = [x - pd.Timedelta(days=1) for x in pd.to_datetime(df_smape_aggr.index)]
    return df_smape_aggr
    
df_dumb_smape_aggr = create_df_cv_smape_arrg(OrderedDict((
    ('Dumb HC', dumb_hc_cv),
    ('Dumb Linear', dumb_linear_cv),
    ('Dumb Quadratic', dumb_quadratic_cv),
)))
_ = df_dumb_smape_aggr.plot(figsize=(10, 3), style='o--', title='Dumb CV SMAPE')

To be on an even safer side, let's check the cross-validated errors date by date. *Signed* SMAPE is SMAPE without the absolute value operator:

$$SMAPE_{signed} = average_i \frac{y_i-\hat{y_i}}{\frac{1}{2}(y_i+\hat{y_i})} \cdot 100\%.$$

If our model is not strongly biased, signed SMAPE should be close to zero. 

The hardcoded version seems to perform alright. Awesome!
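A minimal NumPy sketch of both metrics, following the formula above. The function `smape` here is a standalone illustration and not the notebook's `smape_df` helper; treating 0/0 terms as zero error is an assumption of this sketch. Note the sign convention: because the numerator is $y_i-\hat{y_i}$, a model that systematically over-predicts gets a negative signed SMAPE.

```python
import numpy as np

def smape(y, yhat, signed=False):
    """SMAPE in percent; with signed=True the absolute value is dropped."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    diff = y - yhat
    if not signed:
        diff = np.abs(diff)
    denom = 0.5 * (y + yhat)
    # Assumption: a term with zero denominator counts as zero error
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * ratio.mean()

y = np.array([10.0, 20.0, 30.0])
balanced = np.array([11.0, 19.0, 30.0])   # errors in both directions
overshoot = np.array([12.0, 24.0, 36.0])  # always 20% too high

print(smape(y, balanced), smape(y, balanced, signed=True))
print(smape(y, overshoot), smape(y, overshoot, signed=True))
```

For the balanced forecast the signed errors partly cancel, so the signed value is much closer to zero than the unsigned one; for the biased forecast the two have the same magnitude.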

In [None]:
def show_cv_smape_by_time_oos_2(data_cv, m_name):
    data_gb = data_cv.groupby(['cv_fold', 'ds'])
    df_u = data_gb.apply(smape_df, signed=False).unstack(level='cv_fold')
    df_s = data_gb.apply(smape_df, signed=True).unstack(level='cv_fold')
    cols_original = df_u.columns.copy()
    df_u.columns = [x + '_u' for x in cols_original]
    df_s.columns = [x + '_s' for x in cols_original]
    df = pd.concat([df_u, df_s], axis=1)
    for col in df.columns:
        df[col] = lowess(df[col], range(df.shape[0]), frac=0.15, return_sorted=False)
    df.index = pd.to_datetime(df.index)

    source = ColumnDataSource(df)
    p_u = figure(plot_width=750, plot_height=200, title="{} OOS SMAPE by time".format(m_name),
                 x_axis_type='datetime', tools="pan,wheel_zoom,reset")
    for col in df_u.columns:
        _ = p_u.line(x='ds', y=col, source=source, color='red')
        
    p_s = figure(plot_width=750, plot_height=200, title="{} OOS SMAPE by time (signed)".format(m_name),
                 x_axis_type='datetime', tools="pan,wheel_zoom,reset", x_range=p_u.x_range)
    for col in df_s.columns:
        _ = p_s.line(x='ds', y=col, source=source, color='blue')
        
    for col in np.append(cols_original.values, "2016-01-01"):
        p_s.add_layout(Span(location=pd.to_datetime(col).value / 1e6,
                            dimension='height', line_color='black',
                            line_dash='dashed', line_width=1))
        p_u.add_layout(Span(location=pd.to_datetime(col).value / 1e6,
                            dimension='height', line_color='black',
                            line_dash='dashed', line_width=1))
    p_s.add_tools(HoverTool(tooltips=[('smape', '$y')], mode='vline', line_policy='interp'))    
    p_u.add_tools(HoverTool(tooltips=[('smape', '$y')], mode='vline', line_policy='interp'))
    
    bokeh.io.show(bokeh.layouts.column(p_u, p_s))
    
show_cv_smape_by_time_oos_2(dumb_hc_cv, 'Dumb HC')
show_cv_smape_by_time_oos_2(dumb_linear_cv, 'Dumb Linear')
show_cv_smape_by_time_oos_2(dumb_quadratic_cv, 'Dumb Quadratic')

Like before, we can check best and worst performers across items and stores. Okay, I don't see any obvious pattern here and the time is limited... Let's move on!

In [None]:
def plot_data_cv_example_2(df, item, store, ax):
    ts = df.loc[(slice(None), slice(None), item, store), ['y', 'yhat']].reset_index(level=[2,3], drop=True).unstack(level=['cv_fold'])
    ts.index = pd.to_datetime(ts.index)
    (ts['yhat'] - ts['y']).abs().resample('w').mean().plot(ax=ax, title="{}_{}".format(item, store), legend=False)

def show_cv_best_worst_2(data_cv, cutoff_eval):
    smape_itsa = data_cv.groupby(['item', 'store']).apply(smape_df).sort_values()
    print("-------- Best: Lowest SMAPE OOS ---------")
    display_side_by_side(smape_itsa.head())
    print("-------- Worst: Highest SMAPE OOS ---------")
    display_side_by_side(smape_itsa.tail())

    cv_fold_idx = cutoff_eval
    fig, ax = plt.subplots(3, 1, figsize=(12, 6))
    plot_data_cv_example_2(df=data_cv, item=smape_itsa.index[0][0], store=smape_itsa.index[0][1], ax=ax[0])
    plot_data_cv_example_2(df=data_cv, item=smape_itsa.index[1][0], store=smape_itsa.index[1][1], ax=ax[1])
    plot_data_cv_example_2(df=data_cv, item=smape_itsa.index[2][0], store=smape_itsa.index[2][1], ax=ax[2])
    fig.suptitle('Best: Lowest OOS SMAPE')
    fig.tight_layout(rect=[0, 0, 1, 0.94])

    fig, ax = plt.subplots(3, 1, figsize=(12, 6))
    plot_data_cv_example_2(df=data_cv, item=smape_itsa.index[-1][0], store=smape_itsa.index[-1][1], ax=ax[0])
    plot_data_cv_example_2(df=data_cv, item=smape_itsa.index[-2][0], store=smape_itsa.index[-2][1], ax=ax[1])
    plot_data_cv_example_2(df=data_cv, item=smape_itsa.index[-3][0], store=smape_itsa.index[-3][1], ax=ax[2])
    fig.suptitle('Worst: Highest OOS SMAPE')
    fig.tight_layout(rect=[0, 0, 1, 0.94])
    
show_cv_best_worst_2(dumb_hc_cv, cutoff_eval="2017-04-01")

## Tuning seasonality by CV

The "dumb" model works pretty well already. However, this model is quite rigid, so it could benefit from some added flexibility. There are a few ways to do it:
- Custom seasonalities: fit seasonalities individually for different stores or items;
- More granular seasonality: e.g., add monthly seasonality;
- Fit the model using recent data only;
- Fit another model on residuals: e.g., Prophet;
- Blend with other models: e.g., with Boosted Trees.

I've tried all these approaches. Some of them did alright and might have made it into our final submission if we had had more time; others less so. In this kernel we'll focus on the very first approach.

## Custom item-store weekly seasonalities

To begin, let's try fitting weekly seasonalities separately for every store, item or both. For example, if we fit seasonalities separately by item, $seasonality_{weekly}(weekday)$ becomes $seasonality_{item, weekly}(item, weekday)$. 
 
Great, there is a slight improvement for seasonalities fit by item and seasonalities fit by store! If we try fitting seasonalities by item and store at once though, our model seems to overfit.

In [None]:
# Just for the reference, here is the code of the seasonal variations

class DumbItemDayofweek(DumbBase):
    """
    sales = base(store) * s(dayofweek, item) * s(month) * trend(year)
    """
    
    name = "dumb_item_dow"
    
    def _fit_base_seasonality(self):
        self.store_table = pd.pivot_table(self.data, index='store', values='sales', aggfunc=np.mean)

        self.month_table = pd.pivot_table(self.data, index='month', values='sales', aggfunc=np.mean)

        self.dow_item_table = pd.pivot_table(self.data, index='dayofweek', columns='item',
                                             values='sales', aggfunc=np.mean)
        
    def _predict_base_seasonality(self, item, store, date):
        dow, month, year = date.dayofweek, date.month, date.year
        base_sales = self.store_table.at[store, 'sales']
        seasonal_sales = self.month_table.at[month, 'sales'] * self.dow_item_table.at[dow, item]
        return base_sales * seasonal_sales
    
    
class DumbStoreDayofweek(DumbBase):
    """
    sales = base(item) * s(dayofweek, store) * s(month) * trend(year)
    """
    
    name = "dumb_store_dow"
    
    def _fit_base_seasonality(self):
        self.item_table = pd.pivot_table(self.data, index='item', values='sales', aggfunc=np.mean)

        self.month_table = pd.pivot_table(self.data, index='month', values='sales', aggfunc=np.mean)

        self.dow_store_table = pd.pivot_table(self.data, index='dayofweek', columns='store',
                                             values='sales', aggfunc=np.mean)
        
    def _predict_base_seasonality(self, item, store, date):
        dow, month, year = date.dayofweek, date.month, date.year
        base_sales = self.item_table.at[item, 'sales']
        seasonal_sales = self.month_table.at[month, 'sales'] * self.dow_store_table.at[dow, store]
        return base_sales * seasonal_sales
    
    
class DumbItemStoreDayofweek(DumbBase):
    """
    sales = base() * s(dayofweek, item, store) * s(month) * trend(year)
    """
    
    name = "dumb_item_store_dow"
    
    def _fit_base_seasonality(self):
        self.base_scalar = 1.

        self.month_table = pd.pivot_table(self.data, index='month', values='sales', aggfunc=np.mean)

        self.dow_item_store_table = pd.pivot_table(self.data, index=['item', 'store'],
                                                   columns='dayofweek',
                                                   values='sales', aggfunc=np.mean)
        
    def _predict_base_seasonality(self, item, store, date):
        dow, month, year = date.dayofweek, date.month, date.year
        base_sales = self.base_scalar
        seasonal_sales = self.month_table.at[month, 'sales'] * self.dow_item_store_table.at[(item, store), dow]
        return base_sales * seasonal_sales

In [None]:
# --- Tweaking item-store dependencies

dumb_original_cv = pd.read_csv('../input/sales-dumb-all-alt-cv2017/cv_dumb_original.csv', index_col=[0,1,2,3])
dumb_item_dow_cv = pd.read_csv('../input/sales-dumb-all-alt-cv2017/cv_dumb_item_dow.csv', index_col=[0,1,2,3])
dumb_store_dow_cv = pd.read_csv('../input/sales-dumb-all-alt-cv2017/cv_dumb_store_dow.csv', index_col=[0,1,2,3])
dumb_item_store_dow_cv = pd.read_csv('../input/sales-dumb-all-alt-cv2017/cv_dumb_item_store_dow.csv', index_col=[0,1,2,3])
print('N rows = ', dumb_original_cv.shape[0])

In [None]:
df_dumb_smape_aggr = create_df_cv_smape_arrg(OrderedDict((
    ('Dumb HC Original', dumb_original_cv),
    ('Dumb HC Item DOW', dumb_item_dow_cv),
    ('Dumb HC Store DOW', dumb_store_dow_cv),
    ('Dumb HC Item-Store DOW', dumb_item_store_dow_cv),
)))
_ = df_dumb_smape_aggr.plot(figsize=(10, 3), style='o--', title='Dumb HC CV SMAPE')

df_dumb_smape_aggr['Dumb HC Item DOW'] -= df_dumb_smape_aggr['Dumb HC Original']
df_dumb_smape_aggr['Dumb HC Store DOW'] -= df_dumb_smape_aggr['Dumb HC Original']
df_dumb_smape_aggr['Dumb HC Item-Store DOW'] -= df_dumb_smape_aggr['Dumb HC Original']
df_dumb_smape_aggr.drop(columns=['Dumb HC Original'], inplace=True)
_ = df_dumb_smape_aggr.plot(figsize=(10, 3), style='o--', title='Difference with Dumb HC Original')

In [None]:
show_cv_smape_by_time_oos_2(dumb_original_cv, 'Dumb HC Original')
show_cv_smape_by_time_oos_2(dumb_item_dow_cv, 'Dumb HC Item DOW')
show_cv_smape_by_time_oos_2(dumb_store_dow_cv, 'Dumb HC Store DOW')
show_cv_smape_by_time_oos_2(dumb_item_store_dow_cv, 'Dumb HC Item-Store DOW')

<a id="SectionOutro"></a>
# Outro

For our final submission, we selected the original "dumb" model (with a hardcoded trend) and the "dumb" model with weekly seasonalities fit for every item separately. The latter had a low and stable cross-validated error and a pretty good public score of 13.845. This choice paid off: the model got a private score of 12.580 and brought our team to Top 1! Yay! It was quite close to the second best private score of 12.584 by jnng. 

Going through this kernel again, I think the key points contributing to *Fantastic 2*'s victory in this competition were:
- Realization that the data is synthetic;
- Ideas inspired by other public kernels;
- Careful use of a public score to tune the trend;
- Cross-validation to select the best model;
- Luck =)

On ideas from public kernels: [XYZT's kernel](https://www.kaggle.com/thexyzt/keeping-it-simple-by-xyzt) made a big difference. To be honest, in this era of Gradient Boosting and DNN, I'm not sure we would have thought of such a simple approach to model sales volumes. Simple yet very effective! The leaderboard changed drastically after XYZT shared it - it was very funny to see how at least a dozen top submissions appeared by simply forking his script ^^

And... that's it folks! Thanks for reading. Please share your thoughts in the comment section. And cross-validate responsibly.

> \---------------------------
>
> Stay awesome Kaggle,
>
> Mysterious Ben
>
> \---------------------------