#### Title
In this notebook I would like to share my experience with Facebook [Prophet](https://facebook.github.io/prophet/) in the TPS January 2022 competition.<br>
I never managed to get a good score, but this notebook may still be useful for anyone interested in Prophet.

In [None]:
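import numpy as np
import pandas as pd
pd.options.display.max_rows = None
pd.options.display.max_columns = None
import os

try:
    from prophet import Prophet        # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet      # older releases, as originally used here

import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")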

This hidden cell silences Prophet, which prints a lot of output while fitting.

In [None]:
class suppress_stdout_stderr:
    """Context manager that silences everything written to stdout/stderr at the
    file-descriptor level, including output from compiled code such as Stan."""
    def __init__(self):
        # Open a pair of null files
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
        # Save the actual stdout (1) and stderr (2) file descriptors
        self.save_fds = [os.dup(1), os.dup(2)]

    def __enter__(self):
        # Point stdout and stderr at the null files
        os.dup2(self.null_fds[0], 1)
        os.dup2(self.null_fds[1], 2)

    def __exit__(self, *_):
        # Restore the real stdout/stderr to (1) and (2)
        os.dup2(self.save_fds[0], 1)
        os.dup2(self.save_fds[1], 2)
        # Close all four descriptors so that repeated use does not leak them
        for fd in self.null_fds + self.save_fds:
            os.close(fd)
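Why redirect file descriptors instead of simply using `contextlib.redirect_stdout`? Output printed by compiled code (as Stan's messages typically are in the pystan-based `fbprophet`) goes straight to file descriptors 1 and 2 and never passes through Python's `sys.stdout` object, so only an `os.dup2`-level redirect catches it. The self-contained sketch below demonstrates this with a temporary file instead of `os.devnull`; `capture_fd_output` and `noisy_native_code` are hypothetical names, and `os.write(1, ...)` merely stands in for native-code output.

```python
import contextlib
import io
import os
import sys
import tempfile


def capture_fd_output(fd, func):
    """Run func() while file descriptor `fd` points at a temporary file and
    return the raw bytes written to that descriptor during the call."""
    sys.stdout.flush()                     # don't let pending Python output leak in
    sys.stderr.flush()
    saved_fd = os.dup(fd)                  # keep a copy of the real descriptor
    with tempfile.TemporaryFile(buffering=0) as tmp:
        os.dup2(tmp.fileno(), fd)          # fd now writes into the temp file
        try:
            func()
        finally:
            os.dup2(saved_fd, fd)          # restore the original descriptor
            os.close(saved_fd)             # release the duplicate
        tmp.seek(0)
        return tmp.read()


def noisy_native_code():
    # Hypothetical stand-in for compiled code: os.write goes straight to
    # file descriptor 1 and never touches Python's sys.stdout object.
    os.write(1, b"noise from native code\n")


python_buffer = io.StringIO()


def try_redirect_stdout():
    # redirect_stdout only swaps the sys.stdout object...
    with contextlib.redirect_stdout(python_buffer):
        noisy_native_code()


# ...so the line bypasses python_buffer and lands on fd 1,
# where the fd-level capture picks it up.
leaked = capture_fd_output(1, try_redirect_stdout)
print("redirect_stdout saw:", repr(python_buffer.getvalue()))
print("fd 1 received:", leaked)
```

The `suppress_stdout_stderr` class applies the same `os.dup`/`os.dup2` trick, but points both descriptors at `os.devnull` so the output is discarded rather than kept.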

#### Data loading and some preparation

In [None]:
train_raw = pd.read_csv('../input/tabular-playground-series-jan-2022/train.csv', parse_dates = ['date'])
test_raw = pd.read_csv('../input/tabular-playground-series-jan-2022/test.csv', parse_dates = ['date'])
full = pd.concat([train_raw, test_raw])

We have 3 countries, 2 stores and 3 products, so there are 3 x 2 x 3 = 18 segments, and every operation has to be repeated 18 times, once per segment.<br>
Here I create the 18 segments (encoded as 0-17) from the ```country```, ```store``` and ```product``` features.

In [None]:
full['seg'] = (full['country'] + full['store'] + full['product']).astype('category').cat.codes
seg_count = full['seg'].nunique()
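To see how the encoding assigns numbers, here is the same expression applied to a tiny hand-made frame (the rows are illustrative only, not taken from the competition data). Category codes follow the sorted order of the concatenated strings. As an alternative, `groupby(...).ngroup()` numbers the (country, store, product) combinations directly, without joining strings, which rules out accidental collisions between different label combinations that concatenate to the same string; on this toy frame both give the same codes.

```python
import pandas as pd

# Toy rows for illustration only: 2 countries x 1 store x 2 products
toy = pd.DataFrame({
    'country': ['Finland', 'Finland', 'Norway', 'Norway'],
    'store':   ['KaggleMart'] * 4,
    'product': ['Kaggle Mug', 'Kaggle Hat', 'Kaggle Mug', 'Kaggle Hat'],
})

# Same approach as the cell above: concatenate the labels, then take category codes.
# Categories are sorted, so codes follow the alphabetical order of the joined string.
toy['seg'] = (toy['country'] + toy['store'] + toy['product']).astype('category').cat.codes

# Alternative: number each (country, store, product) group directly;
# with sort=True (the default) groups are numbered in sorted key order.
toy['seg_ngroup'] = toy.groupby(['country', 'store', 'product']).ngroup()

print(toy[['country', 'product', 'seg', 'seg_ngroup']])
```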

To use Prophet's built-in holidays (```add_country_holidays```), I replace the country names with their two-letter country codes.

In [None]:
# Replace country names with two-letter codes for Prophet's built-in holidays
full['country'] = full['country'].replace({'Finland': 'FI', 'Norway': 'NO', 'Sweden': 'SE'})

# Drop unused columns
full = full.drop(['store', 'product'], axis=1)

# Rename columns as Prophet requires: 'ds' for dates, 'y' for the target
full = full.rename(columns={'date': 'ds', 'num_sold': 'y'})

Split a validation set off the training set. Although Prophet has its own cross-validation module, I think there is not enough training data for it, so I use 2018 for validation and the earlier years for training, i.e. a single holdout split instead of cross-validation.

In [None]:
test = full[full['ds'] >= '2019-1-1']
valid = full[(full['ds'] >= '2018-1-1') & (full['ds'] < '2019-1-1') ]
train = full[full['ds'] < '2018-1-1']

test = test.drop('y', axis=1)
print(train.shape, valid.shape, test.shape, 'Segments:', seg_count)
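As a quick sanity check of the split boundaries, the sketch below applies the same date masks to a toy single-segment calendar covering 2015-01-01 to 2018-12-31. The toy frame is an assumption for illustration; if the real data has one row per segment per day, the 18 segments would give 18 x 1096 = 19,728 training rows and 18 x 365 = 6,570 validation rows.

```python
import pandas as pd

# One toy row per day over the training period, i.e. a single segment
days = pd.DataFrame({'ds': pd.date_range('2015-01-01', '2018-12-31', freq='D')})

# Same masks as in the cell above
train_part = days[days['ds'] < '2018-01-01']
valid_part = days[(days['ds'] >= '2018-01-01') & (days['ds'] < '2019-01-01')]

# 2015 (365) + 2016 (366, leap year) + 2017 (365) = 1096 training days
print(len(train_part), len(valid_part))  # 1096 365
```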

Although Prophet has a built-in holiday database, in my experience its effect is not strong enough for Easter and Christmas. That is why I add these dates once more as custom holidays, to reinforce their effect. Luckily, Prophet allows this.<br>
The ```lower_window``` and ```upper_window``` parameters extend each holiday over a range of days around the given date: for example, ```lower_window = -3``` and ```upper_window = 4``` cover the 3 days before and the 4 days after it.

In [None]:
# Easter holidays (Easter Sunday of each year)
Easter = pd.DataFrame({
    'holiday': 'Easter',
    'ds': pd.to_datetime(['2015-04-05', '2016-03-27', '2017-04-16', '2018-04-01', '2019-04-21']),
    'lower_window': 0,
    'upper_window': 7,
})

# Christmas holidays
Christmas = pd.DataFrame({
    'holiday': 'Christmas',
    'ds': pd.to_datetime(['2015-12-24', '2016-12-24', '2017-12-24', '2018-12-24', '2019-12-24']),
    'lower_window': -3,
    'upper_window': 4,
})

add_holidays = pd.concat([Easter, Christmas])
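To check which calendar days a window actually covers, the sketch below expands a holidays frame into one row per day, from `ds + lower_window` to `ds + upper_window` inclusive. `expand_holiday_windows` is a hypothetical helper for inspection only; it is not part of Prophet and is not needed for fitting.

```python
import pandas as pd


def expand_holiday_windows(holidays):
    """List every calendar day covered by each holiday row, i.e. the days
    from ds + lower_window to ds + upper_window inclusive."""
    rows = []
    for _, h in holidays.iterrows():
        for offset in range(int(h['lower_window']), int(h['upper_window']) + 1):
            rows.append({'holiday': h['holiday'],
                         'offset': offset,
                         'date': h['ds'] + pd.Timedelta(days=offset)})
    return pd.DataFrame(rows)


christmas_2019 = pd.DataFrame({
    'holiday': 'Christmas',
    'ds': pd.to_datetime(['2019-12-24']),
    'lower_window': -3,
    'upper_window': 4,
})

covered = expand_holiday_windows(christmas_2019)
print(covered['date'].min().date(), covered['date'].max().date(), len(covered))
# 2019-12-21 2019-12-28 8
```

The same call on `Easter` shows that each Easter window spans Easter Sunday plus the following 7 days.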

This function sets up, fits and applies the model for one segment. Prophet has several hyperparameters that can be tuned, but after a couple of days of experimenting I decided to use only a few of them to avoid overfitting.<br>
You can read about the parameters that can be tuned [here](https://facebook.github.io/prophet/docs/diagnostics.html#hyperparameter-tuning).

In [None]:
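def proph(train_df, valid_df, seg):
    """Fit Prophet on one segment of train_df and forecast the dates of the
    same segment in valid_df.

    Returns (forecast re-indexed to valid_df's rows, fitted model, raw forecast).
    """
    # Rows belonging to this segment
    seg_train = train_df.loc[train_df['seg'] == seg]
    seg_valid = valid_df.loc[valid_df['seg'] == seg]
    country = seg_train.iloc[0]['country']

    # Model
    m = Prophet(
        holidays=add_holidays,            # additional Easter and Christmas holidays
        yearly_seasonality=52,            # Fourier order of the yearly component
        seasonality_mode='multiplicative',
    )

    # Built-in country holidays
    m.add_country_holidays(country_name=country)

    # Fit silently
    with suppress_stdout_stderr():
        m.fit(seg_train[['ds', 'y']])

    forecast = m.predict(seg_valid['ds'].to_frame())

    # Re-index the forecast with this segment's original row labels.
    # predict() returns rows ordered by ds, so this assumes seg_valid is
    # already in date order (true for this dataset).
    forecast_idx = forecast.set_index(seg_valid.index)
    return forecast_idx, m, forecast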
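A note on `yearly_seasonality=52`: it is not "52 weeks". When Prophet receives an integer here, it uses it as the Fourier order $N$ of the yearly component, which is modelled as $s(t) = \sum_{n=1}^{N}\left(a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P}\right)$ with $P = 365.25$ days. An order of 52 therefore adds 2 x 52 = 104 regressors per model (Prophet's default yearly order is 10): the yearly curve can follow sharp seasonal swings, but it also gains flexibility that can overfit. The sketch below builds such a feature matrix with NumPy to show the idea; `fourier_features` is a hypothetical helper, not Prophet's internal code.

```python
import numpy as np
import pandas as pd


def fourier_features(dates, period=365.25, order=52):
    """Return a (len(dates), 2 * order) matrix of sin/cos terms.

    t is measured in days since 1970-01-01; column pairs are
    sin(2*pi*n*t/period), cos(2*pi*n*t/period) for n = 1..order.
    """
    t = (pd.to_datetime(dates) - pd.Timestamp('1970-01-01')) / pd.Timedelta(days=1)
    t = np.asarray(t, dtype=float)
    n = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, n) / period   # shape (len(dates), order)
    features = np.empty((len(t), 2 * order))
    features[:, 0::2] = np.sin(angles)
    features[:, 1::2] = np.cos(angles)
    return features


# Three years of daily dates, like one segment of the training split
dates = pd.date_range('2015-01-01', '2017-12-31', freq='D')
X = fourier_features(dates, order=52)
print(X.shape)  # (1096, 104)
```

Prophet then fits one coefficient per column ($a_n$, $b_n$); with `seasonality_mode='multiplicative'` the resulting seasonal term scales the trend instead of being added to it.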

#### Validation

In [None]:
result = valid[['ds', 'seg', 'y']].copy()   # copy to avoid SettingWithCopyWarning
for i in range(seg_count):
    seg_forecast, _, _ = proph(train, valid, i)
    result.loc[result['seg'] == i, 'yhat'] = seg_forecast['yhat'].round()
# Calculate SMAPE per row (averaged below)
result['smape'] = 200 * (result['yhat'] - result['y']).abs() / (result['yhat'].abs() + result['y'].abs())
# Calculate percentage error (PE)
result['pe'] = 100 * (result['yhat'] - result['y']) / result['y']
print('SMAPE on validation is:', "%.3f" % result['smape'].mean())
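SMAPE averages 200 * |F - A| / (|F| + |A|) over all rows, where A is the actual value and F the forecast. The worked sketch below computes it on a few hand-picked numbers and also shows that the metric is asymmetric: missing by the same amount costs more when under-forecasting than when over-forecasting. Treating 0/0 pairs as 0 is an assumption of this sketch; the cell above does not handle that case (sales are never zero in this data).

```python
import numpy as np


def smape(y_true, y_pred):
    """Symmetric MAPE in percent: mean of 200 * |F - A| / (|F| + |A|).
    Pairs where both values are 0 are scored as 0 (an assumption)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    diff = np.abs(y_pred - y_true)
    # Avoid 0/0: where denom == 0 the error term is defined as 0
    safe_denom = np.where(denom == 0, 1.0, denom)
    terms = np.where(denom == 0, 0.0, 200 * diff / safe_denom)
    return terms.mean()


# (200*10/210 + 200*20/380) / 2
print(round(smape([100, 200], [110, 180]), 4))  # 10.0251
# Same absolute miss of 10: over-forecast vs. under-forecast
print(round(smape([100], [110]), 2), round(smape([100], [90]), 2))  # 9.52 10.53
```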

#### Plots
Caution: long output (two plots for each segment).

In [None]:
for x in range(seg_count):
    seg_result = result[result['seg'] == x].set_index('ds')
    # Plot real values and predictions
    ax = seg_result['y'].plot(color='gray', figsize=(18, 4),
                              title=f'Real values / predictions for segment {x}',
                              label='Real values')
    seg_result['yhat'].plot(color='red', ax=ax, label='Predictions')
    ax.legend(loc='upper center')
    plt.show()
    # Plot percentage error
    seg_result['pe'].plot(color='blue', figsize=(18, 4),
                          title=f'Percentage error for segment {x}')
    plt.show()

#### Predictions

In [None]:
pred = test[['row_id', 'ds', 'seg']].copy()
train_full = pd.concat([train, valid])   # refit on all labelled data for the final forecast
for i in range(seg_count):
    seg_forecast, _, _ = proph(train_full, test, i)
    pred.loc[pred['seg'] == i, 'yhat'] = seg_forecast['yhat'].round()
print('Predictions complete')
pred.head()

#### Submission

In [None]:
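sub = pd.read_csv('../input/tabular-playground-series-jan-2022/sample_submission.csv')
# Match predictions to submission rows by row_id instead of relying on index alignment
sub['num_sold'] = sub['row_id'].map(pred.set_index('row_id')['yhat'])
sub.to_csv('submission.csv', index=False)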
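A pattern like `sub['num_sold'] = pred['yhat']` aligns rows by index label, not by position or by `row_id`. It gives the right answer only if `pred` still carries the 0-based index of `test_raw`, which matches `sample_submission.csv`. The toy sketch below (hypothetical values) shows what happens when the indexes differ, and the safer alternative of matching on `row_id`.

```python
import pandas as pd

# Toy frames: the submission is in row_id order, while the predictions
# carry a different index and come in a different order.
sub = pd.DataFrame({'row_id': [10, 11, 12]})
pred = pd.DataFrame({'row_id': [12, 10, 11], 'yhat': [3.0, 1.0, 2.0]},
                    index=[102, 100, 101])

# Column assignment aligns on the index: no labels overlap, so every value is NaN
sub['by_index'] = pred['yhat']

# Matching on the row_id key puts every prediction in the right row
sub['by_row_id'] = sub['row_id'].map(pred.set_index('row_id')['yhat'])
print(sub)
```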