<a href="https://www.kaggle.com/code/lorresprz/forecasting-sales-with-simple-prophet?scriptVersionId=144954251" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

This notebook is an entry in the Playground Series competition on forecasting fictional course sales. We use Facebook Prophet with its default settings (no customized parameters such as seasonality mode, no additional regressors or exogenous factors) to forecast each individual time series in the dataset, then combine the results for submission. Because we entered the competition late, there was not enough time to explore more sophisticated models or to improve on this baseline.

In [None]:
from prophet import Prophet

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Preliminary data exploration

In [None]:
df = pd.read_csv('/kaggle/input/playground-series-s3e19/train.csv')

In [None]:
df

In [None]:
list_store = df['store'].unique()
list_country = df['country'].unique()
list_prod = df['product'].unique()

print('Unique country values:', list_country)
print('Unique store values:', list_store)
print('Unique products:', list_prod)

In what follows, we extract the individual time series in the dataset (there are $5 \times 3 \times 5 = 75$ of them) and call each one $Y[i,j,k]$, where

 - $i = 0, \ldots, 4$ labels  *country* 
 - $j = 0, \ldots, 2$ labels *store*
 - $k = 0, \ldots, 4$ labels *product* 
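
The index ranges above imply $5 \times 3 \times 5 = 75$ distinct $(i,j,k)$ tuples. A minimal sketch of that count, using hypothetical placeholder labels rather than the actual values found in `train.csv`:

```python
from itertools import product

# Placeholder labels (hypothetical); the notebook takes the real ones from df[...].unique()
countries = [f"country_{i}" for i in range(5)]
stores = [f"store_{j}" for j in range(3)]
products = [f"product_{k}" for k in range(5)]

# One (i, j, k) key per individual time series
keys = list(product(range(len(countries)), range(len(stores)), range(len(products))))
print(len(keys))  # 75
```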

In [None]:
#Extract the time series Y_ijk corresponding to country i, store j, product k
Y = {}
for i in range(len(list_country)):
    for j in range(len(list_store)):
        for k in range(len(list_prod)):
            mask = (df['country']==list_country[i]) & (df['store']==list_store[j]) & (df['product']==list_prod[k])
            #Explicit copy so that the 'date' conversion below does not act on a slice of df
            Y[i,j,k] = df[mask].drop(['country','store','product'], axis = 1).copy()
            Y[i,j,k]['date'] = pd.to_datetime(Y[i,j,k]['date'])
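
The same split can also be sketched with `DataFrame.groupby`, which avoids the triple loop. The toy frame below is a made-up stand-in for `train.csv`, and the dictionary here is keyed by label tuples rather than by the integer indices $(i,j,k)$ used in this notebook:

```python
import pandas as pd

# Toy stand-in for train.csv (values are invented for illustration)
toy = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02"] * 4,
    "country": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "store": ["s1", "s1", "s2", "s2", "s1", "s1", "s2", "s2"],
    "product": ["p1"] * 8,
    "num_sold": [10, 12, 7, 9, 20, 18, 5, 6],
})
toy["date"] = pd.to_datetime(toy["date"])

# One sub-frame per (country, store, product) combination
series = {
    key: group.drop(columns=["country", "store", "product"]).reset_index(drop=True)
    for key, group in toy.groupby(["country", "store", "product"])
}
print(sorted(series))
```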

In [None]:
#Plot the time series of 'num_sold' vs 'date' for any chosen tuple (i,j,k)
def plot_time_series(i,j,k):
    fig, ax = plt.subplots(figsize = (36,12))
    ax.plot(Y[i,j,k]['date'], Y[i,j,k]['num_sold'], color = 'C2')
    ax.set_xlabel('date', fontsize = 24)
    ax.set_ylabel('num_sold', fontsize = 24)
    ax.set_title(list_prod[k]+ '---'+ list_country[i] +'---'+ list_store[j] ,
                 fontsize=36)
    plt.show()

Plot a few of the time series to visualize the data.

In [None]:
i_n, j_n, k_n = 1,2,2
for i in range(i_n):
    for j in range(j_n):
        for k in range(k_n):
            plot_time_series(i,j,k)

## Using Prophet to predict

In [None]:
#Test dataset 
df_test = pd.read_csv('/kaggle/input/playground-series-s3e19/test.csv')
df_test

In [None]:
#Extract the time series Y_ijk for df_test (containing the date data for 2022)
Yt = {}
for i in range(len(list_country)):
    for j in range(len(list_store)):
        for k in range(len(list_prod)):
            mask = (df_test['country']==list_country[i]) & (df_test['store']==list_store[j]) & (df_test['product']==list_prod[k])
            #Explicit copy so that later assignments do not act on a slice of df_test
            Yt[i,j,k] = df_test[mask].drop(['country','store','product'], axis = 1).copy()
            Yt[i,j,k]['date'] = pd.to_datetime(Yt[i,j,k]['date'])
            Yt[i,j,k] = Yt[i,j,k].set_index('id')

In [None]:
#Create an uninitialized array to store the 365 predictions for each Y[i,j,k] series
pred_list = np.empty((len(list_country), len(list_store), len(list_prod), 365))
pred_list.shape
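
Note that `np.empty` leaves the array filled with arbitrary values until they are overwritten. As a hedged alternative, a NaN-initialized array makes any series skipped by the loop easy to spot. The shape below assumes the 5 countries, 3 stores and 5 products indexed earlier, and `preds` is an illustrative name rather than the notebook's `pred_list`:

```python
import numpy as np

# Assumed dimensions: 5 countries, 3 stores, 5 products, 365 forecast days
preds = np.full((5, 3, 5, 365), np.nan)

# Fill a single series with dummy values to mimic one loop iteration
preds[0, 0, 0] = np.arange(365)

# Count the series that still contain unfilled (NaN) entries
unfilled = np.isnan(preds).any(axis=-1).sum()
print(unfilled)  # 74
```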

Loop through all $(i,j,k)$ values, using Prophet to predict the next 365 days of sales for each $Y[i,j,k]$ series.

In [None]:
#These are used to silence some of the warnings and verbose outputs
import logging
logging.getLogger("prophet").setLevel(logging.ERROR)
logging.getLogger("cmdstanpy").setLevel(logging.ERROR)
import matplotlib as mpl
mpl.rcParams['figure.max_open_warning'] = 0

In [None]:
for i in range(len(list_country)):
    for j in range(len(list_store)):
        for k in range(len(list_prod)):                      
            #input dataframe
            dfi = Y[i,j,k][['date', 'num_sold']].copy()
            #renaming the columns of the input dataframe to Prophet's format
            dfi.columns = ['ds', 'y']
            m = Prophet()
            m.fit(dfi)
            #Create another dataframe holding future date values
            ft = m.make_future_dataframe(periods = 365)
            #Calling m.predict returns another dataframe
            forecast = m.predict(ft)
            
            #Extract the forecast values stored in the 'yhat' column
            #(rows 0-1825 are the training dates; rows 1826 onward are the 365 future days)
            pred_list[i,j,k] = forecast['yhat'][1826:].to_numpy()
            
            #Store the 365 predictions for 2022 in Yt[i,j,k]
            y_test_idx = Yt[i,j,k][Yt[i,j,k]['date']>='2022-01-01'].index
            Yt[i,j,k].loc[y_test_idx, 'num_sold'] = pred_list[i,j,k]
  
            #Plot the forecast and real values
            fv= forecast[['ds','yhat']].copy()
            train_idx = fv.index <= 1825
            test_idx = fv.index>1825
            fv.loc[train_idx,'real'] = dfi['y'].to_numpy()
            fv = fv.set_index('ds')
            #print(f'Forecasting for series {(i,j,k)}')
            plt.rcParams.update({'font.size': 26})
            fv[['real', 'yhat']].plot(figsize = (30,10),
                                      title = f'Forecast for series {[i,j,k]}--{list_country[i]}--{list_store[j]}--{list_prod[k]}')
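
The hard-coded index 1826 in the loop relies on the training data spanning exactly 1826 daily rows. A sketch of a less brittle selection filters the forecast by date instead. The frame below is only a stand-in with the `ds` and `yhat` columns that Prophet's forecast contains, and the 2017-01-01 start date is an assumption inferred from the row count:

```python
import numpy as np
import pandas as pd

# Stand-in for Prophet's forecast output (dummy yhat values)
ds = pd.date_range("2017-01-01", "2022-12-31", freq="D")
forecast_demo = pd.DataFrame({"ds": ds, "yhat": np.arange(len(ds), dtype=float)})

# Select the forecast horizon by date rather than by row position
horizon = forecast_demo.loc[forecast_demo["ds"] >= "2022-01-01", "yhat"].to_numpy()

print(len(ds) - len(horizon), len(horizon))  # 1826 365
```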


## Preparing submission file

In [None]:
#Concatenate all Yt series together to form a single dataframe
df_putback = pd.concat([Yt[i,j,k]
                        for i in range(len(list_country))
                        for j in range(len(list_store))
                        for k in range(len(list_prod))])

In [None]:
df_putback = df_putback.sort_values(by ='id')

In [None]:
df_putback

In [None]:
df_sub = pd.read_csv('/kaggle/input/playground-series-s3e19/sample_submission.csv')


In [None]:
df_sub['num_sold'] = np.round(df_putback['num_sold'].to_numpy())

In [None]:
df_sub.to_csv('submission.csv', index = False)
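
Because `num_sold` is copied into the submission frame by position, it may be worth checking that the row order really matches before saving. A minimal sketch with toy frames follows; `check_alignment` is a hypothetical helper, not part of the original notebook:

```python
import numpy as np
import pandas as pd

def check_alignment(pred_df: pd.DataFrame, sub_df: pd.DataFrame) -> bool:
    """Return True if predictions (indexed by id) line up row-for-row with the submission ids."""
    same_ids = np.array_equal(pred_df.index.to_numpy(), sub_df["id"].to_numpy())
    no_missing = not pred_df["num_sold"].isna().any()
    return bool(same_ids and no_missing)

# Toy data (invented): predictions indexed by id, deliberately out of order
pred = pd.DataFrame({"num_sold": [3.2, 1.7, 2.5]}, index=pd.Index([12, 10, 11], name="id"))
sub = pd.DataFrame({"id": [10, 11, 12], "num_sold": 0})

print(check_alignment(pred, sub))               # False
print(check_alignment(pred.sort_index(), sub))  # True
```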