# Application of Facebook's PROPHET algorithm for successful Sales Forecasting of avocado prices in the US

## Requirements


In [None]:
#pip install pandas
#pip install numpy
#pip install scikit-learn
#pip install fbprophet  # newer releases are published as `prophet`
#pip install plotly

## Importing required modules

In [None]:
# Import modules
import pandas as pd 
from fbprophet import Prophet 
from fbprophet.plot import add_changepoints_to_plot
from sklearn.metrics import mean_squared_error, mean_absolute_error 
import numpy as np 

## Reading the CSV - Avocado Sales

In [None]:
# Read Data
data = pd.read_csv("../input/avocado-prices/avocado.csv")
data.head()

### Data Preprocessing

We should aggregate the data according to our objective. In this example we aggregate the data per month, since we intend to analyze monthly forecasts of avocado sales.

Depending on our objective, we can train the model to predict different quantities, such as total turnover or profit. In this example we train the model to predict the number of avocados that will be sold.

In [None]:
data['Date'] = data['Date'].str[:-3] #Remove day

aggregation_functions = {'Total Volume': 'sum'}
data = data.groupby(data['Date']).aggregate(aggregation_functions).reset_index()
data.head()
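
A variant of the monthly aggregation, as a minimal sketch on made-up rows: instead of slicing the date string, the dates are parsed and grouped by calendar month with `dt.to_period("M")`. The sample values and the names `sample` and `monthly` are illustrative assumptions; only the `Date` and `Total Volume` column names come from the dataset.

```python
import pandas as pd

# Made-up rows that mimic the two columns used above (values are illustrative).
sample = pd.DataFrame({
    "Date": ["2015-01-04", "2015-01-11", "2015-02-01", "2015-02-08"],
    "Total Volume": [100.0, 150.0, 200.0, 50.0],
})

# Parse the dates, then group by calendar month and sum the volume.
sample["Date"] = pd.to_datetime(sample["Date"])
monthly = (
    sample.groupby(sample["Date"].dt.to_period("M"))["Total Volume"]
    .sum()
    .reset_index()
)

# Convert the monthly periods back to 'YYYY-MM' labels, as the string slice produces.
monthly["Date"] = monthly["Date"].astype(str)
print(monthly)
```

Parsing the dates first makes the grouping independent of how the date string happens to be formatted.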

## Creating the dataframe
Prophet expects a dataframe with two columns: `ds` (the datestamp) and `y` (the value to be predicted).  
In this case, `y` is the number of monthly avocado sales.

In [None]:
df = pd.DataFrame() 
df['ds'] = pd.to_datetime(data['Date']) 
df['y'] = data['Total Volume'] 
df.head()

To test our model, we split the data into a training set and a test set, where the test set is the last 3 months.

In [None]:
split_date = "2018-01-01"
df_train = df[df['ds'] < split_date].copy()
df_test = df[df['ds'] >= split_date].copy()
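
The effect of the date-based split can be checked on a stand-in series. This is a minimal sketch: the `demo` frame and its range of January 2015 to March 2018 are assumptions about the aggregated data, not values read from the file.

```python
import pandas as pd

# Hypothetical monthly index standing in for df['ds'] (assumed date range).
demo = pd.DataFrame({"ds": pd.date_range("2015-01-01", "2018-03-01", freq="MS")})

split_date = "2018-01-01"
demo_train = demo[demo["ds"] < split_date]
demo_test = demo[demo["ds"] >= split_date]

# Number of months on each side of the split, and the months held out for testing.
print(len(demo_train), len(demo_test))
print(demo_test["ds"].dt.strftime("%Y-%m").tolist())
```

Under that assumed range, the split leaves 36 months for training and the 3 months of 2018 for testing.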

## The model - GAM
The model used in Prophet is a generalized additive model (GAM): a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. In practice, Prophet models a time series as the sum of different components (a non-linear trend, periodic components, and holidays or special events) and allows extra regressors (categorical or continuous) to be incorporated.
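
The additive structure can be illustrated with a toy example. The Prophet paper listed in the references writes the model as y(t) = g(t) + s(t) + h(t) + error, with g the trend, s the seasonality and h the holiday effect. The sketch below is not Prophet's implementation; every function name and number in it is a hypothetical choice made only to show that the prediction is a plain sum of components.

```python
import math

def trend(t):
    # Hypothetical linear trend: 100 units plus 2 units per month.
    return 100.0 + 2.0 * t

def yearly_seasonality(t, period=12.0):
    # One sine wave per year; t is measured in months.
    return 10.0 * math.sin(2.0 * math.pi * t / period)

def holiday_effect(t, special_months=(11,)):
    # A fixed bump in the listed months (0-based month index within the year).
    return 15.0 if int(t) % 12 in special_months else 0.0

def additive_model(t):
    # The prediction is the sum of the independent components.
    return trend(t) + yearly_seasonality(t) + holiday_effect(t)

for month in (0, 3, 11):
    print(month, round(additive_model(month), 2))
```

Because the components are summed, each one can be inspected and plotted on its own, which is what the component plots later in this notebook rely on.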

## Initialize the model
The first step with Prophet is to instantiate the model. This is where you can set the prior scale of each component of the time series, as well as the number of Fourier terms used to model the seasonal components. In this case we don't add any extras such as country holidays or a seasonality mode, since the dataset doesn't specify a country and we therefore can't make assumptions about its holidays.

As a general rule, larger prior scales and a larger number of Fourier terms make the model more flexible, but at the potential cost of generalisation: the model might overfit, learning the noise (rather than the signal) in the training data and giving poor results on unseen data (the test data).  
Tuning these hyperparameters is specific to the problem; it is often the most time-consuming task, but it can clearly improve the results.
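
To make the link between the number of Fourier terms and flexibility concrete, here is a minimal sketch of how seasonal features of a given order can be built. The function name `fourier_features` and the orders 3 and 10 are assumptions for illustration, and this is not Prophet's internal code. Each extra term adds one sine and one cosine column, so the seasonal curve gets two more coefficients to fit.

```python
import math

def fourier_features(t, period, order):
    """Return sin/cos pairs for harmonics 1..order evaluated at time t."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

# t in days with a yearly period; the two orders are illustrative choices.
low = fourier_features(t=90.0, period=365.25, order=3)
high = fourier_features(t=90.0, period=365.25, order=10)

# The higher order yields more columns, hence a more flexible seasonal curve.
print(len(low), len(high))
```

The prior scale then acts as a regulariser on the fitted coefficients: a smaller scale shrinks them towards zero and smooths the seasonal curve.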

In [None]:
# Instantiate the model and fit it on the training data
m = Prophet() 
m.fit(df_train) 

## Testing

After fitting the model on the training dataset, we compare its predictions with the real values of the test set.

In [None]:
forecast = m.predict(df_test) 
forecast.tail()

MAPE (mean absolute percentage error) is used to quantify the overall accuracy of the forecasting framework and to estimate the expected level of reliability.
According to the paper, approximately 50% of the products (with sufficiently long historical data) can be forecast with MAPE < 30% on a monthly basis.
Also, the best-selling products have a lower MAPE.


In [None]:
def mean_abs_perc_err(y_true, y_pred):  
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

In [None]:
print("Mean Squared Error (MSE): %.2f" % mean_squared_error(y_true=df_test["y"], y_pred=forecast['yhat']))
print("Mean Absolute Error (MAE): %.2f" % mean_absolute_error(y_true=df_test["y"], y_pred=forecast['yhat']))
print("Mean Absolute Percentage Error (MAPE): %.2f%%" % mean_abs_perc_err(y_true=np.asarray(df_test["y"]), y_pred=np.asarray(forecast['yhat'])))
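
A worked MAPE calculation on made-up numbers shows what the metric measures. This is a sketch in plain Python: the name `mape` and the choice to skip actual values equal to zero (where the percentage error is undefined) are assumptions of this example, not part of the function defined above.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; skips actual values of zero."""
    ratios = [abs((a - p) / a) for a, p in zip(y_true, y_pred) if a != 0]
    if not ratios:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(ratios) / len(ratios)

# Illustrative values: errors of 10%, 10% and 0% average to about 6.67%.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(mape(actual, predicted), 2))
```

Because each error is divided by the actual value, small actual values can dominate the average, which is worth keeping in mind for low-volume months.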

## Generate the future dataframe
Here we create predictions for the next 5 years. Since the frequency is monthly, we need 12*5 = 60 periods.

In [None]:
m = Prophet(yearly_seasonality=True,
            daily_seasonality=False, weekly_seasonality=False)
m.fit(df)  # retrain on the whole dataset

# 'MS' (month start) matches the first-of-month dates in df['ds']
future = m.make_future_dataframe(periods=12*5, freq='MS')
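
The `freq` argument decides whether the future timestamps fall on month starts or month ends. Since the training dates here are parsed from `YYYY-MM` strings, they fall on the first of each month, so the distinction matters. The sketch below uses a hypothetical helper, `future_dates`, and an assumed last training date; it is not Prophet's internal code.

```python
import pandas as pd

def future_dates(last_date, periods, freq):
    """Generate `periods` timestamps strictly after `last_date` (hypothetical helper)."""
    candidates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    return candidates[candidates > last_date][:periods]

last_training_date = pd.Timestamp("2018-03-01")  # assumed last month-start in df['ds']

# Month-start stamps line up with the training dates.
starts = future_dates(last_training_date, 3, "MS").strftime("%Y-%m-%d").tolist()

# Month-end stamps begin inside the last training month.
ends = future_dates(last_training_date, 3, pd.offsets.MonthEnd()).strftime("%Y-%m-%d").tolist()

print(starts)
print(ends)
```

With month-end stamps, the first "future" date lands in a month the model has already seen, which shifts the forecast horizon by one period.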

## Analyze predictions
### Prophet forecasts

In [None]:
forecast = m.predict(future) 
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper', 'trend', 'trend_lower', 'trend_upper']].tail()

### Plotting the model
The image below shows the basic prediction. The light blue band is the uncertainty interval (`yhat_lower` to `yhat_upper`), the dark blue line is the prediction (`yhat`), and the black dots are the original data. The predicted values are close to the actual data over the period where data is available. For the next five years there is no data to compare against yet, so this in-sample fit only suggests, and does not guarantee, that the forecasts are reasonable.

#### Adding changepoints

The changepoints could be specified by the analyst using known dates of product launches
and other growth-altering events, or may be automatically selected given a set of candidates.
Automatic selection can be done quite naturally as we demonstrate below.


In [None]:
# Adding changepoints
fig = m.plot(forecast) 
a = add_changepoints_to_plot(fig.gca(), m, forecast)
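
The Prophet paper in the references describes the trend as piecewise linear, with the slope allowed to change at each changepoint. The following minimal sketch shows that idea. The function name `piecewise_linear_trend` and all numbers are hypothetical, and the code is an illustration rather than Prophet's implementation.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend at time t with base slope k, base offset m, and a slope change
    deltas[j] that takes effect from changepoints[j] onwards."""
    slope, offset = k, m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            slope += delta
            offset -= s * delta  # keeps the line continuous at the changepoint
    return slope * t + offset

# Illustrative numbers: the slope drops from 2.0 to 0.5 at t = 10.
for t in (5, 10, 20):
    print(t, piecewise_linear_trend(t, k=2.0, m=0.0, changepoints=[10], deltas=[-1.5]))
```

The red vertical lines drawn by `add_changepoints_to_plot` mark the dates where such slope changes are large enough to be shown.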

#### Plotting the components

In [None]:
fig2 = m.plot_components(forecast) 

As we can see, the component plot includes the yearly seasonality, which captures patterns that repeat every year.

From it we can see that avocado sales are fairly uniform over the year, except for a visible drop in March and November and a big increase in August and October.

## Other use cases for prophet analysis

Prophet can be used in a variety of ways, depending on what a company wants to predict about its sales. In the last example we predicted the monthly sales volume but, if required, the tool can be used to predict other quantities.

As another example, we will predict the weekly volume of products sold over the next year, based on past sales.

Since we are predicting total weekly sales, we can also enable weekly seasonality to analyze the buying habits of consumers. The weekly component suggests that consumers are much keener to buy avocados on Sundays, probably because most shopping is done on weekends.

In [None]:
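# Read the data again, since `data` was aggregated per month above
data = pd.read_csv("../input/avocado-prices/avocado.csv")

# Sum the volume over all rows that share a date to get one total per date
data = data.groupby('Date', as_index=False)['Total Volume'].sum()
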
df = pd.DataFrame() 
df['ds'] = pd.to_datetime(data['Date']) 
df['y'] = data['Total Volume'] 

m = Prophet(weekly_seasonality=True,
            daily_seasonality=False,
            yearly_seasonality=False)
m.fit(df)
future = m.make_future_dataframe(periods=52, freq='W')  # predict the next 52 weeks
forecast = m.predict(future) 
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper', 'trend', 'trend_lower', 'trend_upper']].tail()

fig2 = m.plot_components(forecast) 
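
Before reading much into a weekly seasonality plot, it helps to check which weekdays actually occur in the data. If every observation falls on the same weekday, the day-of-week effect cannot be separated from the overall level. The sketch below uses a hypothetical helper, `weekday_counts`, and three sample dates that are assumed to be representative of the `Date` column.

```python
from collections import Counter
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def weekday_counts(iso_dates):
    """Count how many of the given 'YYYY-MM-DD' dates fall on each weekday."""
    return Counter(WEEKDAY_NAMES[date.fromisoformat(d).weekday()] for d in iso_dates)

# Sample dates assumed to be representative of the `Date` column.
sample_dates = ["2015-12-27", "2015-12-20", "2018-03-25"]
print(weekday_counts(sample_dates))
```

On the real data, the same check could be run as `weekday_counts(data['Date'])`, assuming the column still holds `YYYY-MM-DD` strings.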


## References
https://peerj.com/preprints/3190/  
https://facebook.github.io/prophet/  
https://arxiv.org/ftp/arxiv/papers/2005/2005.07575.pdf  
https://www.kaggle.com/neuromusic/avocado-prices