# Use FB Prophet for Time-series Forecasting: Six Group merchant transactions

Prophet is an open-source library developed by Facebook that aims to make time-series forecasting easy and scalable. It is a type of generalized additive model (GAM), which uses a regression model with potentially non-linear smoothers. It is called additive because it adds multiple decomposed parts together to explain the series. For example, Prophet uses the following components:

$$ y(t) = g(t) + s(t) + h(t) + e(t) $$

where,  
$g(t)$: Growth. Big trend. Non-periodic changes.   
$s(t)$: Seasonality. Periodic changes (e.g. weekly, yearly, etc.) represented by a Fourier series.  
$h(t)$: Holiday effect that represents irregular schedules.   
$e(t)$: Error. Any idiosyncratic changes not explained by the model. 
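
To make the decomposition concrete, here is a minimal sketch that builds a synthetic series by summing the four parts. All numbers (slope, amplitudes, the holiday position) are made up for illustration and have nothing to do with the Six Group data.

```python
import numpy as np

# Hypothetical monthly time axis: 48 months, measured in years.
t = np.arange(48) / 12.0

# g(t): a linear growth trend (illustrative slope and intercept).
g = 100.0 + 20.0 * t

# s(t): yearly seasonality as a first-order Fourier series (period = 1 year).
s = 5.0 * np.sin(2 * np.pi * t) + 3.0 * np.cos(2 * np.pi * t)

# h(t): a one-off holiday bump at month index 11 (illustrative).
h = np.zeros_like(t)
h[11] = 8.0

# e(t): small random noise that the other parts do not explain.
rng = np.random.default_rng(0)
e = rng.normal(scale=1.0, size=t.size)

# The additive model simply sums the parts.
y = g + s + h + e
```

Prophet works in the opposite direction: it starts from `y` and estimates the parts.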

# Table of Contents 
1. [Prepare Data](#prep)
2. [Train And Predict](#train)
3. [Check Components](#components)
4. [Evaluate](#eval)
5. [Trend Change Points](#trend)
6. [Seasonality Mode](#season)
7. [Saving Model](#save)
8. [References](#ref)

<a id=prep></a>
# 1. Prepare Data

The goal of the time series model is to predict the six group merchant transactions. Prophet requires at least two columns as inputs: a ds column and a y column.

 * The ds column has the time information. Currently we have the date as the index, so we name this index as ds.
 * The y column has the time series transaction values. In this example, because we are predicting six group merchant transactions, the column name for the transactions is named y.
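
As a small sketch of this renaming step, assume a DataFrame whose index holds the dates and whose single column holds the transaction counts. The names `raw`, `date`, and `transactions` and the values are hypothetical.

```python
import pandas as pd

# Hypothetical input: transaction counts indexed by date (values are made up).
raw = pd.DataFrame(
    {"transactions": [120, 135, 150]},
    index=pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
)
raw.index.name = "date"

# Move the index into a column, then rename to the names Prophet expects.
prepared = raw.reset_index().rename(columns={"date": "ds", "transactions": "y"})
```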

In [None]:
# importing packages
import json
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tools.eval_measures import rmse

# prophet functionalities we will explore
from prophet import Prophet
from prophet.plot import add_changepoints_to_plot, plot_cross_validation_metric
from prophet.diagnostics import cross_validation, performance_metrics 
from prophet.serialize import model_to_json, model_from_json
# Model performance evaluation
import sklearn
# import the math module 
import math 
# to mute Pandas warnings Prophet needs to fix
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
# you need to change the file path
data_path = "../data/ninety_sum.csv"
df = pd.read_csv(data_path)

In [None]:
df.columns = ['ds', 'y']
df.head()

In [None]:
# plot raw data 
fig, ax = plt.subplots(figsize=(12, 7))
plt.plot(df['ds'], df['y'])
plt.xlabel('Time period')
plt.ylabel('No of Transactions')
plt.title('Six Group: No of Transactions vs Time period')
plt.grid(True)
plt.tight_layout()
plt.show()

For the train/test split, remember that we cannot split time-series data randomly. We use ONLY the earlier part of the data for training and the later part for testing, given a cut-off point. Here, let's use 2022-04-30 as our cut-off point.

In [None]:
# Train/test split: change the cut-off date to suit your data
train_end_date = '2022-04-30'
# split data 
train = df[df['ds'] <= train_end_date]
test = df[df['ds'] > train_end_date]

In [None]:
print(f"Number of months in train data: {len(train)}")
print(f"Number of months in test data: {len(test)}")

<a id=train></a>
# 2. Train And Predict

Let's train a Prophet model. You just initialize an object and `fit`! 

Prophet warns that it disabled weekly and daily seasonality. That's fine because our data set is monthly and is not granular enough to capture weekly or daily seasonality.

In [None]:
# fit model on the training data only
m = Prophet(interval_width=0.99, seasonality_mode='multiplicative')
m.fit(train)

When making predictions with Prophet, we need to prepare a special object called future dataframe. It is a Pandas DataFrame with a single column `ds` that includes all datetime within the training data plus additional periods given by user. 

The parameter `periods` is the number of points (rows) to predict after the end of the training data. The interval (parameter `freq`) is set to 'D' (day) by default, so we need to adjust it to 'MS' (month start) as our data is monthly. I set `periods=len(test)` as it is the number of points in the test data.
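
To see why `freq` matters, here is a pandas-only sketch of the dates that different frequencies generate after a hypothetical last training date. It does not call Prophet; it only illustrates the frequency aliases, and the dates are made up.

```python
import pandas as pd

# Hypothetical last training date (data stamped at month start).
last_train_date = pd.Timestamp("2022-04-01")
first_candidate = last_train_date + pd.Timedelta(days=1)

# Default daily frequency: the next 3 days, not the next 3 months.
daily = pd.date_range(first_candidate, periods=3, freq="D")

# Month-start frequency: the first day of each of the next 3 months.
month_start = pd.date_range(first_candidate, periods=3, freq="MS")

# Month-end frequency, written as an explicit offset object.
month_end = pd.date_range(first_candidate, periods=3, freq=pd.offsets.MonthEnd())

print(list(daily.strftime("%Y-%m-%d")))
print(list(month_start.strftime("%Y-%m-%d")))
print(list(month_end.strftime("%Y-%m-%d")))
```

Whichever frequency you pick should land on the same day of the month as the dates in your own `ds` column.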

In [None]:
# future dataframe - placeholder object
future = m.make_future_dataframe(periods=len(test), freq='MS')  # one period = one row = 1 month with freq='MS'

In [None]:
# start of the future df is same as the original data 
future.head()

In [None]:
# end of the future df is the training data + len(test) periods (months)
future.tail()

It's time to make actual predictions. It's simple - just `predict` with the placeholder DataFrame `future`. 

In [None]:
# predict the future
forecast = m.predict(future)
m.plot(forecast); # Add semi-colon to remove the duplicated chart

Prophet has a nice built-in plotting function to visualize forecast data. Black dots are the actual data, the blue line is the prediction, and the shaded band is the uncertainty interval. You can also use matplotlib functions to adjust the figure, such as adding a legend or setting xlim or ylim.

In [None]:
# Prophet's own plotting tool, with a legend added
fig = m.plot(forecast)
plt.legend(['Actual', 'Prediction', 'Uncertainty interval'])
plt.show()

<a id=components></a>
# 3. Check Components

So, what is in our forecast data? Let's take a look.

In [None]:
forecast.head()

There are many components in it, but the main thing you would care about is `yhat`, which holds the final predictions. The `_lower` and `_upper` suffixes mark the bounds of the uncertainty interval (99% here, as set by `interval_width=0.99`).

- Final predictions: `yhat`, `yhat_lower`, and `yhat_upper`

Other columns are components that comprise the final prediction as we discussed in the introduction. Let's compare Prophet's additive components and what we see in our forecast DataFrame. 

$$y(t) = g(t) + s(t) + h(t) + e(t) $$

- Growth ($g(t)$): `trend`, `trend_lower`, and `trend_upper`
- Seasonality ($s(t)$): `multiplicative_terms`, `multiplicative_terms_lower`, and `multiplicative_terms_upper`
    - Yearly seasonality: `yearly`, `yearly_lower`, and `yearly_upper`

The `multiplicative_terms` represent the total seasonality effect, which is the same as the yearly seasonality because weekly and daily seasonalities are disabled. All `additive_terms` are zero because we fit the model with `seasonality_mode='multiplicative'` rather than the default additive mode, which I will explain later. In multiplicative mode the seasonal term scales the trend instead of being added to it.

Holiday effect ($h(t)$) is also not represented here, as the data is monthly and we did not specify any holidays.
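
To my understanding, these columns combine into `yhat` as trend times one plus the multiplicative terms, plus the additive terms. The sketch below applies that rule to a toy DataFrame with made-up values. In additive mode the `multiplicative_terms` are zero and the rule reduces to trend plus additive terms; in multiplicative mode it is the other way round.

```python
import pandas as pd

# Toy forecast rows with made-up values; column names follow Prophet's output.
toy = pd.DataFrame(
    {
        "trend": [100.0, 110.0, 120.0],
        # Seasonality expressed as a fraction of the trend.
        "multiplicative_terms": [0.10, -0.05, 0.00],
        # Zero when every seasonality is multiplicative.
        "additive_terms": [0.0, 0.0, 0.0],
    }
)

# Combine the components into the point forecast.
toy["yhat"] = toy["trend"] * (1 + toy["multiplicative_terms"]) + toy["additive_terms"]
```

Running the same line on the real `forecast` DataFrame and comparing it with `forecast['yhat']` is a quick sanity check.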

Prophet also has a nice built-in function for plotting each component. When we plot our forecast data, we see two components: the general growth trend and the yearly seasonality that repeats throughout the years.

In [None]:
# plot components
fig = m.plot_components(forecast)

<a id="eval"></a>
# 4. Evaluate 

## 4.1. Evaluate the model on one test set

So, how good is our model? One way we can understand the model performance in this case is to simply calculate the root mean squared error (RMSE) between the actual and predicted values of the above test period.
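
For reference, RMSE is just the square root of the mean squared difference. A minimal hand-rolled sketch with made-up numbers follows; `rmse_by_hand` is a hypothetical helper, not part of statsmodels or Prophet.

```python
import numpy as np

def rmse_by_hand(actuals, predictions):
    """Root mean squared error between two equal-length sequences."""
    actuals = np.asarray(actuals, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return float(np.sqrt(np.mean((actuals - predictions) ** 2)))

# Made-up numbers: the errors are 3, -4 and 0.
example_rmse = rmse_by_hand([10, 20, 30], [7, 24, 30])
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones.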

In [None]:
predictions = forecast.iloc[-len(test):]['yhat']
actuals = test['y']

print(f"RMSE: {round(rmse(predictions, actuals))}")

## 4.2. Cross validation

Alternatively, we can perform cross validation. As previously discussed, time-series analysis strictly uses train data whose time range is always earlier than that of test data. Below is an example where we start with roughly half a year (182 days) of training data and predict the following quarter (91 days). Cut-off points are spaced 182 days apart, and each fold trains on all data up to its cut-off.

Prophet also provides built-in model diagnostics tools to make it easy to perform this cross validation. You just need to define three parameters: horizon, initial, and period. The latter two are optional. 

Make sure to define these parameters as strings in this format: 'X unit'. X is the number and unit is 'days', 'secs', etc., in a form compatible with `pd.Timedelta`.
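
To build intuition for how the three parameters interact, here is a simplified sketch of a cutoff schedule. It walks backwards from the end of the data in steps of `period` and stops once less than `initial` of history would remain. `sketch_cutoffs` is a hypothetical helper, and Prophet's own logic handles extra edge cases (such as gaps in the data) that this ignores.

```python
import pandas as pd

def sketch_cutoffs(dates, initial, period, horizon):
    """Simplified cutoff schedule: walk back from the end of the data."""
    dates = pd.to_datetime(pd.Series(dates))
    initial, period, horizon = (pd.Timedelta(x) for x in (initial, period, horizon))
    # The last cutoff leaves exactly one horizon of data to test on.
    cutoff = dates.max() - horizon
    cutoffs = []
    # Keep stepping back while at least `initial` of history remains.
    while cutoff >= dates.min() + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

# Hypothetical monthly dates, not the real data set.
monthly = pd.date_range("2020-01-01", "2022-04-01", freq="MS")
cutoffs = sketch_cutoffs(monthly, initial="182 days", period="182 days", horizon="91 days")
for c in cutoffs:
    print(c.date())
```

A longer `initial` or `period` gives fewer folds; a shorter `period` gives more folds that overlap more.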

In [None]:
# horizon = test period of each fold
horizon = '91 days'

# initial: training period. (optional. default is 3x of horizon)
initial = str(91 * 2) + ' days'  

# period: spacing between cutoff dates (optional. default is 0.5x of horizon)
period = str(91 * 2) + ' days' 

df_cv = cross_validation(m, initial=initial, period=period, horizon=horizon)

This is the predicted output from cross validation. Each row records the `cutoff` that produced it. When test periods overlap, which happens when `period` is shorter than `horizon`, the same timestamp can appear several times, once for each cutoff.

In [None]:
# predicted output using cross validation
df_cv

Below are the performance metrics at different forecast horizons. Each row corresponds to a horizon, that is, how far past the cutoff a prediction lies. As we did not define a rolling window, Prophet uses its default (`rolling_window=0.1`), so each metric is averaged over a window holding 10% of the predictions, sorted by horizon.
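
The idea behind a per-horizon metric can be sketched with plain pandas on a toy cross-validation table. The values are made up and the column names mirror `df_cv`. This simplified version groups by exact horizon and skips the rolling window, so it will not reproduce Prophet's numbers exactly.

```python
import numpy as np
import pandas as pd

# Toy cross-validation output: two cutoffs, each predicting 30 and 60 days ahead.
cutoff = pd.to_datetime(["2021-01-01", "2021-01-01", "2021-07-01", "2021-07-01"])
toy_cv = pd.DataFrame(
    {
        "cutoff": cutoff,
        "ds": cutoff + pd.to_timedelta([30, 60, 30, 60], unit="D"),
        "y": [100.0, 110.0, 120.0, 130.0],
        "yhat": [97.0, 114.0, 120.0, 136.0],
    }
)

# Horizon = how far ahead of the cutoff each prediction lies.
toy_cv["horizon"] = toy_cv["ds"] - toy_cv["cutoff"]
toy_cv["squared_error"] = (toy_cv["y"] - toy_cv["yhat"]) ** 2

# Average the squared error per horizon, then take the square root.
rmse_by_horizon = np.sqrt(toy_cv.groupby("horizon")["squared_error"].mean())
print(rmse_by_horizon)
```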

In [None]:
# performance metrics  
df_metrics = performance_metrics(df_cv)  # can set the window as a proportion of predictions, e.g. rolling_window=0.2
df_metrics

<a id="trend"></a>
# 5. Trend Change Points

Another interesting functionality of `Prophet` is `add_changepoints_to_plot`. The trend does not have to grow at a constant rate; there can be a couple of points where the growth rate changes. Prophet can find those points automatically and plot them!
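
To show what a change point does to the trend, here is a minimal sketch of a piecewise linear trend whose slope shifts at given change points while the line stays continuous. It follows the piecewise linear form described for Prophet's trend as I understand it. The function name and all values are hypothetical.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous trend whose slope shifts by deltas[j] at changepoints[j]."""
    t = np.asarray(t, dtype=float)
    changepoints = np.asarray(changepoints, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    # a[i, j] is 1 once time t[i] has reached changepoint j, else 0.
    a = (t[:, None] >= changepoints[None, :]).astype(float)
    # Base slope k plus every slope change that is already active.
    slope = k + a @ deltas
    # Offset corrections keep the line continuous at each changepoint.
    offset = m + a @ (-changepoints * deltas)
    return slope * t + offset

# Slope 1 until t = 0.5, then slope 3 afterwards (illustrative numbers).
t = np.linspace(0.0, 1.0, 11)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[0.5], deltas=[2.0])
```

Prophet estimates the size of each slope change from the data; change points with a change close to zero have little visible effect.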

In [None]:
# plot change points
fig = m.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), m, forecast)

<a id=season></a>
# 6. Seasonality Mode

Seasonality can be additive (the seasonal swing stays a constant amount) or multiplicative (the seasonal swing scales with the trend). When you look at the original data in the plots below, the amplitude of the seasonality is changing: smaller in the early years and bigger in the later years. So, this is a `multiplicative` case rather than an `additive` one. We can adjust the `seasonality_mode` parameter to take this effect into account.
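
The difference between the two modes is easy to see on synthetic data. In this sketch (made-up trend and amplitude, no Prophet involved) the additive series keeps the same yearly swing throughout, while the multiplicative series swings more as the trend rises.

```python
import numpy as np

t = np.arange(60) / 12.0                 # 5 years of monthly points, in years
trend = 100.0 + 40.0 * t                 # illustrative rising trend
seasonal = 0.2 * np.sin(2 * np.pi * t)   # yearly cycle, unit-free

additive = trend + 20.0 * seasonal       # swing is a fixed amount
multiplicative = trend * (1 + seasonal)  # swing is a fraction of the trend

def yearly_swing(series, year):
    """Peak-to-trough size of the seasonal deviation within one year."""
    months = slice(year * 12, (year + 1) * 12)
    deviation = series[months] - trend[months]
    return deviation.max() - deviation.min()

print("additive:", yearly_swing(additive, 0), yearly_swing(additive, 4))
print("multiplicative:", yearly_swing(multiplicative, 0), yearly_swing(multiplicative, 4))
```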

In [None]:
# additive mode
m = Prophet(seasonality_mode='additive', weekly_seasonality=False, daily_seasonality=False)
m.fit(df)
# make a future data frame for the next 20 months
future_months = 20
future = m.make_future_dataframe(future_months, freq='MS')
forecast = m.predict(future)
fig = m.plot(forecast)
plt.title('Additive seasonality')

In [None]:
# multiplicative mode
m = Prophet(seasonality_mode='multiplicative', weekly_seasonality=False, daily_seasonality=False)
m.fit(df)
future = m.make_future_dataframe(future_months, freq='MS')
forecast = m.predict(future)
fig = m.plot(forecast)
plt.title('Multiplicative seasonality')

<a id=save></a>
# 7. Saving Model

We can also easily export the trained model to JSON and load it back.

In [None]:
# Save model
with open('serialized_model.json', 'w') as fout:
    json.dump(model_to_json(m), fout)

# Load model
with open('serialized_model.json', 'r') as fin:
    m = model_from_json(json.load(fin))  

<a id=ref></a>
# 8. References

[1] [Prophet Documentation](https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html)