### Objective:  Sales Forecasting Using Facebook Prophet

#### Summary Of Steps Followed: 
1. Exploratory Data Analysis to understand the data, its features and associated trends 
2. Extract Inference from Data Analysis 
3. Prepare Data for Training & Testing Model 
4. Create Forecast Model using Facebook Prophet 
5. Create Future Data Frame for the duration over which we need prediction values 
6. Predict Sales for Future Data Frame
7. Evaluate accuracy of your model 
8. Conclusion Report

If you like this notebook or learned anything from it, share a token of appreciation by casting an upvote :)

In [None]:
import numpy as np 
import pandas as pd 
import time
import os
import seaborn as sns
import matplotlib.pyplot as plt
from fbprophet import Prophet
import statsmodels.api as sm
from plotly.offline import init_notebook_mode, iplot
from plotly import graph_objs as go
init_notebook_mode(connected=True)
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

### Step 1: Exploratory Data Analysis to understand the data, its features and associated trends

In [None]:
df_raw= pd.read_csv('/kaggle/input/demand-forecasting-kernels-only/train.csv')
df_test=pd.read_csv('/kaggle/input/demand-forecasting-kernels-only/test.csv')
df_raw.head()

In [None]:
df_raw.describe().T

In [None]:
plt.figure(figsize=(16,6))
sns.barplot(data=df_raw,x='store',y='sales')

### From the above analysis we can see that Store 2 has the highest sales, so let's take this store for further analysis to identify which item sold the most in the most recent year.

In [None]:
df_store=df_raw[(df_raw['store']==2) & (df_raw['date']>='2017-01-01')]
df_sc=df_store.copy()
df_sc.loc[:,'month'] = pd.DatetimeIndex(df_sc['date']).month
#df_sc['month'] = pd.DatetimeIndex(df_sc['date']).month
df_sc.head()

In [None]:
# Total sales per (month, item); selecting 'sales' before summing avoids aggregating the other columns
df_store1=df_sc.groupby(['month','item'], as_index=False)['sales'].sum()

In [None]:
import plotly.express as px
fig = px.line(df_store1, x='month', y='sales',color='item')
fig.show()

### From the above visualisation it is evident that the best-selling item is number 15.

In [None]:
df1=pd.DataFrame(df_raw.groupby('date').sum()['sales'],columns=['sales'])
df2=df1.reset_index()
df2['date']=pd.to_datetime(df2['date'])
df2['sales']=df2['sales']*1.0
df2.head()

In [None]:
plt.figure(figsize=(16,6))
sns.lineplot(data=df2,x='date',y='sales')

In [None]:
df3=df2.set_index(pd.to_datetime(df2['date']))
df3.info()

In [None]:
y = df3['sales'].resample('MS').mean() 
decomposition = sm.tsa.seasonal_decompose(y)
plt.figure(figsize=(16,12))
decomposition.plot()

### From the above decomposition we can see that the series is additive, since the seasonal component keeps a similar amplitude over time (it is not multiplied as the trend grows). 
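
As a rough illustration of the additive versus multiplicative distinction, the sketch below builds two synthetic monthly series (not the competition data) and compares the within-year spread of each. The helper `within_year_spread` and all the numbers are assumptions made up for this example. With a linear trend, the spread stays the same from year to year for the additive series and grows for the multiplicative one.

```python
import numpy as np

def within_year_spread(y, period=12):
    """Max minus min of each full period, after removing that period's mean."""
    y = np.asarray(y, dtype=float)
    n_full = len(y) // period
    blocks = y[: n_full * period].reshape(n_full, period)
    centered = blocks - blocks.mean(axis=1, keepdims=True)
    return centered.max(axis=1) - centered.min(axis=1)

# Four years of synthetic monthly data (illustrative values only)
t = np.arange(48)
trend = 100 + 2.0 * t
wave = np.sin(2 * np.pi * t / 12)

additive = trend + 10 * wave                # seasonal swing is a fixed amount
multiplicative = trend * (1 + 0.1 * wave)   # seasonal swing scales with the trend

spread_add = within_year_spread(additive)
spread_mul = within_year_spread(multiplicative)

print("additive spread per year:      ", np.round(spread_add, 2))
print("multiplicative spread per year:", np.round(spread_mul, 2))
```

The same helper could be applied to the monthly series `y` from the decomposition cell as a quick sanity check, although the decomposition plot remains the main evidence here.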

In [None]:
df4=df3.reset_index(drop=True)
df4.columns=['ds','y']
df4.head()

In [None]:
#df4['ds'].dt.strftime('%Y-%m')
df4['year'] = pd.DatetimeIndex(df4['ds']).year
df4['month'] = pd.DatetimeIndex(df4['ds']).month
df4['week'] = df4['ds'].dt.strftime('%A')
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Period.strftime.html
df4.tail()

In [None]:
plt.figure(figsize=(10,6))
sns.lineplot(data=df4,x='year',y='y',ci=1)

In [None]:
plt.figure(figsize=(16,6))
sns.lineplot(data=df4,x='month',y='y', hue='year',ci=1)

In [None]:
plt.figure(figsize=(16,6))
sns.lineplot(data=df4,x='week',y='y',sort=True,hue='year',ci=1)

In [None]:
#df4.head()
df_raw.tail()

### Step 2: Inference from EDA
1. Store 2 has the maximum sales, and its most popular item is number 15.
2. The decomposition shows that the series is additive (the seasonal component stays similar over time rather than multiplying).
3. The trend is positive, with sales increasing year over year.
4. Sales rise through Q2 and peak in July, after which they decline again.
5. The weekly pattern suggests that consumers prefer to shop on weekends: sales start rising on Friday, peak on Sunday, and return to normal on weekdays.
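
The weekly inference is easier to verify when the weekdays appear in calendar order rather than alphabetical or first-seen order. A minimal sketch, assuming a small made-up frame `demo` in place of `df4`: the day names are turned into an ordered categorical before averaging.

```python
import pandas as pd

day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# Two weeks of made-up daily values starting on a Monday (2017-01-02)
demo = pd.DataFrame({
    'ds': pd.date_range('2017-01-02', periods=14, freq='D'),
    'y': range(14),
})

# Ordered categorical so grouping and plotting follow Monday..Sunday
demo['weekday'] = pd.Categorical(demo['ds'].dt.day_name(),
                                 categories=day_order, ordered=True)

weekday_mean = demo.groupby('weekday', observed=False)['y'].mean()
print(weekday_mean)
```

Passing a column built this way as `x` should keep the weekday axis in calendar order in most plotting libraries.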

### Step 3: Prepare Data for Training & Testing Model  

In [None]:
df_train = df_raw[(df_raw['item'] == 15) & (df_raw['store'] == 2) & (df_raw['date']<='2016-12-31')]
df_train.columns=['ds','store','item','y']
# Renaming is required since Facebook Prophet expects the date column to be named ds and the metric column to be named y
df_train.tail()

### Step 4: Create Forecast Model using Facebook Prophet 

In [None]:
m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.fit(df_train[['ds','y']])

### Step 5: Create Future Data Frame for the duration over which we need prediction values
In this case we are keeping it as 365 i.e. next complete year. 
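
With the default daily frequency, the future frame consists of the training dates followed by 365 further daily dates. The pandas-only sketch below reproduces that shape for a made-up one-week `history` frame. It is an approximation for illustration, not Prophet's own code.

```python
import pandas as pd

# Hypothetical stand-in for the training dates (last week of 2016)
history = pd.DataFrame({'ds': pd.date_range('2016-12-25', '2016-12-31', freq='D')})

# 365 daily dates starting the day after the last training date
last_date = history['ds'].max()
future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1),
                             periods=365, freq='D')

# History first, then the new dates, in one 'ds' column
future_sketch = pd.concat(
    [history[['ds']], pd.DataFrame({'ds': future_dates})],
    ignore_index=True,
)
print(future_sketch.tail(3))
```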

In [None]:
future = m.make_future_dataframe(periods=365)
future.tail(n=3)

### Step 6: Predict Sales for Future Data Frame

In [None]:
forecast = m.predict(future)
forecast.head(n=3)

In [None]:
m.plot(forecast)

In [None]:
m.plot_components(forecast)

### Step 7: Evaluate Accuracy Of Your Model 

In [None]:
df_orig = df_raw[(df_raw['item'] == 15) & (df_raw['store'] == 2)].copy()
df_orig.columns=['ds','store','item','y']
# Convert to datetime so 'ds' has the same dtype as forecast['ds'] when the two frames are merged
df_orig['ds']=pd.to_datetime(df_orig['ds'])

In [None]:
df_forecast=forecast[['ds','yhat_lower','yhat_upper','yhat']]
df_result= pd.merge(df_orig,df_forecast,on='ds')
df_result.tail()
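
Since the merged frame carries `yhat_lower` and `yhat_upper` next to the actuals, one quick extra check is how often the actual value falls inside the predicted interval. The sketch below uses a made-up four-row frame `toy` with the same column names. As far as I know, Prophet's default interval is 80% (`interval_width=0.8`), so on real data a coverage near 0.8 would be the expectation.

```python
import pandas as pd

# Made-up values with the same column names as df_result
toy = pd.DataFrame({
    'y':          [10, 12, 15, 9],
    'yhat_lower': [8, 13, 11, 7],
    'yhat_upper': [12, 16, 14, 11],
})

# True where the actual lies within [yhat_lower, yhat_upper], bounds included
inside = toy['y'].between(toy['yhat_lower'], toy['yhat_upper'])
coverage = inside.mean()
print(f"share of actuals inside the interval: {coverage:.2f}")
```

On the notebook's data the same two lines could be run on `df_result`, ideally restricted to the 2017 rows so that only unseen dates are counted.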

In [None]:
from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='730 days', period='90 days', horizon = '365 days')

In [None]:
from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p.tail()

In [None]:
from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='mape')

> As you can see from the above cross-validation report, the MAPE across the whole horizon is less than 0.1, i.e. 10%, which is a decent percentage error. 
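
To make the `initial`, `period` and `horizon` arguments more concrete, the sketch below lists the cutoff dates such a rolling-origin scheme would produce. It reflects my understanding of the scheme rather than Prophet's actual implementation: the last cutoff sits one horizon before the end of the training data, and earlier cutoffs step back by `period` as long as at least `initial` days of history remain. The helper `sketch_cutoffs` is hypothetical, and the start date 2013-01-01 is an assumption about the training data.

```python
from datetime import date, timedelta

def sketch_cutoffs(start, end, initial_days, period_days, horizon_days):
    """Rolling-origin cutoffs, newest first internally, returned in ascending order."""
    initial = timedelta(days=initial_days)
    period = timedelta(days=period_days)
    cutoff = end - timedelta(days=horizon_days)
    cutoffs = []
    # Keep a cutoff only while it leaves at least `initial` days of history
    while cutoff >= start + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

# Assumed training range: 2013-01-01 (assumption) to 2016-12-31
cutoffs = sketch_cutoffs(date(2013, 1, 1), date(2016, 12, 31),
                         initial_days=730, period_days=90, horizon_days=365)
for c in cutoffs:
    print(c)
```

Each cutoff corresponds to one refit of the model on data up to that date, followed by a 365-day forecast that is compared with the actuals.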

> Below is another, traditional way of calculating MAPE. Let's see what results it gives. 

In [None]:
def mean_absolute_percentage_error(y_true, y_pred): 
    """MAPE in percent. Undefined when y_true contains zeros (division by zero)."""
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
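
The formula above divides by the actual value, so a day with zero sales would produce an infinite or undefined result. A minimal sketch of one possible workaround, assuming it is acceptable to leave zero-sales days out of the average (the name `mape_ignoring_zeros` is made up for this example):

```python
import numpy as np

def mape_ignoring_zeros(y_true, y_pred):
    """MAPE in percent, computed only over rows where the actual is non-zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    nonzero = y_true != 0
    if not nonzero.any():
        return float('nan')  # nothing left to average over
    pct_err = np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])
    return float(np.mean(pct_err) * 100)

# Errors of 10% and 10% on the two non-zero actuals; the zero row is skipped
example = mape_ignoring_zeros([100, 200, 0], [110, 180, 5])
print(example)
```

Dropping rows changes what the metric measures, so the number of skipped rows is worth reporting alongside the score.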

In [None]:
mean_absolute_percentage_error(df_result['y'],df_result['yhat'])

In [None]:
df_result_2017= df_result[df_result['ds']>='2017-01-01']
mean_absolute_percentage_error(df_result_2017['y'],df_result_2017['yhat'])

> With this we can see that the MAPE value is similar to what we calculated via the Prophet cross-validation report.
Let's plot these errors and see how large they are.

In [None]:
df_result['y - yhat']=df_result['y'] - df_result['yhat']
plt.figure(figsize=(16,6))
sns.lineplot(data=df_result,x='ds',y='y - yhat')

> Let's get the stats of the actual, forecasted and error values.

In [None]:
df_result.describe().T
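
Beyond the summary table, the error column can be reduced to a few standard figures: mean error (bias), mean absolute error and root mean squared error. The sketch below uses short made-up arrays in place of `df_result['y']` and `df_result['yhat']`.

```python
import numpy as np

# Made-up actuals and forecasts, standing in for df_result['y'] and df_result['yhat']
y = np.array([20.0, 25.0, 30.0, 22.0])
yhat = np.array([18.0, 27.0, 33.0, 22.0])

err = y - yhat                      # same sign convention as the 'y - yhat' column
bias = err.mean()                   # negative means the forecast runs high on average
mae = np.abs(err).mean()            # typical error size in sales units
rmse = np.sqrt((err ** 2).mean())   # penalises large misses more than MAE

print(f"bias={bias:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

A bias close to zero together with a small MAE would support the reading that the errors are roughly balanced around the actuals.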

### Conclusion: 

1. FB Prophet works very well for this time series sales data set: it gives a MAPE below 10%, i.e. roughly 90% accuracy or better. 
2. The measurement report and stats above show the variation (y - yhat) between the actual (y) and predicted (yhat) values, which looks pretty good. 



Cheers For The Solution! 
