# Practice with Facebook Prophet

### Installation in Python


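In Python you can install Prophet from PyPI. This notebook uses the older `fbprophet` package name; since version 1.0 the library is published as `prophet`: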
```
$ pip install fbprophet
```


In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

# sharper plots
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [None]:
df = pd.read_csv('../input/mediumpostfbprophet/medium_posts.csv')

In [None]:
df.head()

We only need the `published` date and the `url` columns. Rows with missing values and duplicate rows are dropped as well.

In [None]:
df = df[['published', 'url']].dropna().drop_duplicates()

The `published` column is still stored as strings, so it has to be converted to a datetime type first.

In [None]:
df['published'] = pd.to_datetime(df['published'])

Check the data, sorted by publication date:

In [None]:
df.sort_values(by=['published']).head(n=3)

Medium was released on August 15, 2012. The rows above are dated earlier than that, so they appear to be dummy data. To be safe, we keep only posts published after August 15, 2012 and before June 26, 2017.

In [None]:
df = df[(df['published'] > '2012-08-15') & (df['published'] < '2017-06-26')].sort_values(by=['published'])
df.head(n=3)

In [None]:
df.tail(n=3)

This time we want to forecast the number of posts on Medium, so we aggregate the data with `count()`, grouped by publication date.

In [None]:
aggr_df = df.groupby('published')[['url']].count()
aggr_df.columns = ['posts']

The result of this aggregation is not yet the number of posts per day, because `published` holds full timestamps rather than calendar dates. Let's check:

In [None]:
aggr_df.head(n=3)

To get the number of posts per day, we can use pandas resampling to group the timestamps into daily bins and sum the counts in each bin.
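
As a rough illustration of what the resampling step does, the sketch below uses a small made-up series (the timestamps and counts are invented for the example, not taken from the Medium data). Several timestamps on the same day collapse into one daily bin, and a day without posts shows up as zero.

```python
import pandas as pd

# Hypothetical toy data: posts counted per exact timestamp.
toy = pd.DataFrame(
    {"posts": [1, 2, 1, 3]},
    index=pd.to_datetime([
        "2017-01-01 08:00",
        "2017-01-01 21:30",
        "2017-01-02 10:15",
        "2017-01-04 09:00",
    ]),
)

# Group the rows into calendar-day bins and sum the counts in each bin.
toy_daily = toy.resample("D").sum()
print(toy_daily)
```

January 3 has no rows in the toy input, yet it still appears in the output with a count of 0, which keeps the daily series free of gaps.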

In [None]:
daily_df = aggr_df.resample('D').sum()
daily_df.head(n=3)

# Plotting the data

In [None]:
from plotly.offline import init_notebook_mode, iplot
from plotly import graph_objs as go

# Initialize plotly
init_notebook_mode(connected=True)

In [None]:
def plotly_df(df, title=''):
    """Visualize all the dataframe columns as line plots."""
    common_kw = dict(x=df.index, mode='lines')
    data = [go.Scatter(y=df[c], name=c, **common_kw) for c in df.columns]
    layout = dict(title=title)
    fig = dict(data=data, layout=layout)
    iplot(fig, show_link=False)

In [None]:
plotly_df(daily_df, title='Posts on Medium (daily)')

The daily plot above looks quite cluttered, so let's switch to the number of posts per week.

In [None]:
weekly_df = daily_df.resample('W').sum()

In [None]:
plotly_df(weekly_df, title='Posts on Medium (weekly)')

Let's experiment with the data from January 2015 onward.

In [None]:
daily_df = daily_df.loc[daily_df.index >= '2015-01-01']
daily_df.head(n=3)

### Forecasting with Facebook Prophet

In [None]:
from fbprophet import Prophet

import logging
logging.getLogger().setLevel(logging.ERROR)

Convert the data into the format Prophet expects: a `ds` column with the dates and a `y` column with the values to forecast.
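
The conversion can be tried on a toy series first. This is only a sketch: the index name, dates, and counts below are made up, and the toy index is assumed to be timezone-aware, since `tz_convert(None)` raises a `TypeError` on a naive column.

```python
import pandas as pd

# Hypothetical toy series with a timezone-aware daily index.
idx = pd.date_range("2017-01-01", periods=3, freq="D", tz="UTC", name="published")
toy = pd.DataFrame({"posts": [5, 7, 6]}, index=idx)

# Move the index into a column and rename to the 'ds' / 'y' layout.
toy_prophet = toy.reset_index()
toy_prophet.columns = ["ds", "y"]

# Drop the timezone information; the timestamps become naive UTC values.
toy_prophet["ds"] = toy_prophet["ds"].dt.tz_convert(None)
print(toy_prophet.dtypes)
```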

In [None]:
df = daily_df.reset_index()
df.columns = ['ds', 'y']
# converting timezones (issue https://github.com/facebook/prophet/issues/831)
df['ds'] = df['ds'].dt.tz_convert(None)
df.tail(n=3)

The authors of Prophet recommend forecasting from at least several months, and preferably more than a year, of historical data. In this case we have more than a year of data, which is enough to fit the model.

To measure the quality of the forecast, we split the dataset into two parts: the historical part, which is the larger share of the data, and the prediction part. We remove the last month from the dataset and use it as the prediction target, so that we can later compare the forecast with the actual values.
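
A positional split like this only holds out "the last month" if the rows are sorted by date and there is exactly one row per day. The sketch below checks that on a made-up frame; the names `toy`, `toy_train`, `toy_holdout`, and `holdout_size` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical toy frame in the ds/y layout: 100 consecutive days.
toy = pd.DataFrame({
    "ds": pd.date_range("2017-01-01", periods=100, freq="D"),
    "y": range(100),
})

holdout_size = 30  # number of trailing days kept back for evaluation

# Everything except the last 30 rows is used for fitting ...
toy_train = toy[:-holdout_size]
# ... and the last 30 rows are the values the forecast is scored on.
toy_holdout = toy[-holdout_size:]

print(len(toy_train), len(toy_holdout))
print(toy_train["ds"].max(), toy_holdout["ds"].min())
```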

In [None]:
prediction_size = 30
train_df = df[:-prediction_size]
train_df.tail(n=3)

The simplest way to fit a Prophet model:

In [None]:
m = Prophet()
m.fit(train_df);

Use `Prophet.make_future_dataframe` to build the dataframe of dates that we want a forecast for:
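
Conceptually, the resulting dataframe holds the training dates followed by `periods` additional days in a single `ds` column. The snippet below is a plain pandas approximation of that idea on made-up dates, not Prophet's own implementation.

```python
import pandas as pd

# Hypothetical training dates: 5 consecutive days.
history = pd.DataFrame({"ds": pd.date_range("2017-06-01", periods=5, freq="D")})
periods = 3  # how many days past the end of the history to add

# Dates that follow the last observed day.
last_day = history["ds"].max()
extra = pd.date_range(last_day + pd.Timedelta(days=1), periods=periods, freq="D")

# History followed by the new dates, in a single 'ds' column.
future_like = pd.concat([history, pd.DataFrame({"ds": extra})], ignore_index=True)
print(future_like)
```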

In [None]:
future = m.make_future_dataframe(periods=prediction_size)
future.tail(n=3)

In [None]:
forecast = m.predict(future)
forecast.tail(n=3)

The output above has a lot of columns, including the trend and seasonality components with their confidence intervals. The forecast itself is stored in the `yhat` column.

Prophet also provides a built-in function for plotting the forecast:

In [None]:
m.plot(forecast);

The plot above is not very informative. The model seems to treat many data points as outliers, since they fall outside the confidence interval of the forecast.

The `Prophet.plot_components` function is probably more useful in this case. It shows the overall trend, the weekly seasonality, and the yearly seasonality.

In [None]:
m.plot_components(forecast);

Looking at the components, Prophet fits the data quite well: the number of posts on Medium rose sharply at the end of 2016. Weekends tend to have fewer posts, and so do public holidays such as Christmas and New Year.
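
A quick way to cross-check a weekly pattern like this against raw counts is to average `y` by day of the week. The sketch below uses invented numbers for two weeks; on the real data the same `groupby` could be applied to the `ds`/`y` dataframe.

```python
import pandas as pd

# Hypothetical toy data: two full weeks starting on a Monday,
# with fewer posts on Saturdays and Sundays.
toy = pd.DataFrame({
    "ds": pd.date_range("2017-01-02", periods=14, freq="D"),
    "y": [10, 12, 11, 13, 12, 4, 3, 11, 13, 12, 12, 11, 5, 2],
})

# Average number of posts for each day of the week (0 = Monday, 6 = Sunday).
by_weekday = toy.groupby(toy["ds"].dt.dayofweek)["y"].mean()
print(by_weekday)
```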

### Evaluating the forecast

In [None]:
print(', '.join(forecast.columns))

We join the forecast with the actual values, including the last month that we held out earlier.

In [None]:
def make_comparison_dataframe(historical, forecast):
    """Join the history with the forecast.
    
       The resulting dataset will contain columns 'yhat', 'yhat_lower', 'yhat_upper' and 'y'.
    """
    return forecast.set_index('ds')[['yhat', 'yhat_lower', 'yhat_upper']].join(historical.set_index('ds'))

In [None]:
cmp_df = make_comparison_dataframe(df, forecast)
cmp_df.tail(n=3)

Evaluate the forecast with MAPE (mean absolute percentage error) and MAE (mean absolute error):
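
For reference, these are the definitions that the function below implements, written with $y_i$ for the actual value, $\hat{y}_i$ for the forecast, and $n$ for the number of held-out days:

$$
e_i = y_i - \hat{y}_i, \qquad p_i = 100 \cdot \frac{e_i}{y_i}
$$

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i|, \qquad \text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} |p_i|
$$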

In [None]:
def calculate_forecast_errors(df, prediction_size):
    """Calculate MAPE and MAE of the forecast.
    
       Args:
           df: joined dataset with 'y' and 'yhat' columns.
           prediction_size: number of days at the end to predict.
    """
    
    # Make a copy
    df = df.copy()
    
    # Compute the errors e_i = y_i - yhat_i and the percentage errors p_i = 100 * e_i / y_i.
    df['e'] = df['y'] - df['yhat']
    df['p'] = 100 * df['e'] / df['y']
    
    # Recall that we held out the values of the last `prediction_size` days
    # in order to predict them and measure the quality of the model. 
    
    # Now cut out the part of the data which we made our prediction for.
    predicted_part = df[-prediction_size:]
    
    # Define the function that averages absolute error values over the predicted part.
    error_mean = lambda error_name: np.mean(np.abs(predicted_part[error_name]))
    
    # Now we can calculate MAPE and MAE and return the resulting dictionary of errors.
    return {'MAPE': error_mean('p'), 'MAE': error_mean('e')}

In [None]:
for err_name, err_value in calculate_forecast_errors(cmp_df, prediction_size).items():
    print(err_name, err_value)

The MAPE is 22.6%, and on average the model is off by about 70 posts (MAE).
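
To get a feel for how the two numbers relate, here is a tiny worked example with made-up values. MAE is in the units of the data (posts), while MAPE is relative to each actual value, so it is undefined for days where `y` is zero.

```python
import numpy as np

# Hypothetical actual values and forecasts for four days.
y = np.array([100.0, 200.0, 50.0, 400.0])
yhat = np.array([110.0, 180.0, 60.0, 380.0])

e = y - yhat         # errors: [-10, 20, -10, 20]
p = 100 * e / y      # percentage errors: [-10, 10, -20, 5]

mae = np.mean(np.abs(e))   # (10 + 20 + 10 + 20) / 4 = 15.0
mape = np.mean(np.abs(p))  # (10 + 10 + 20 + 5) / 4 = 11.25

print("MAE:", mae, "MAPE:", mape)
```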

### Visualization

In [None]:
def show_forecast(cmp_df, num_predictions, num_values, title):
    """Visualize the forecast."""
    
    def create_go(name, column, num, **kwargs):
        points = cmp_df.tail(num)
        args = dict(name=name, x=points.index, y=points[column], mode='lines')
        args.update(kwargs)
        return go.Scatter(**args)
    
    lower_bound = create_go('Lower Bound', 'yhat_lower', num_predictions,
                            line=dict(width=0),
                            marker=dict(color="gray"))
    upper_bound = create_go('Upper Bound', 'yhat_upper', num_predictions,
                            line=dict(width=0),
                            marker=dict(color="gray"),
                            fillcolor='rgba(68, 68, 68, 0.3)', 
                            fill='tonexty')
    forecast = create_go('Forecast', 'yhat', num_predictions,
                         line=dict(color='rgb(31, 119, 180)'))
    actual = create_go('Actual', 'y', num_values,
                       marker=dict(color="red"))
    
    # In this case the order of the series is important because of the filling
    data = [lower_bound, upper_bound, forecast, actual]

    layout = go.Layout(yaxis=dict(title='Posts'), title=title, showlegend = False)
    fig = go.Figure(data=data, layout=layout)
    iplot(fig, show_link=False)

show_forecast(cmp_df, prediction_size, 100, 'New posts on Medium')

In this plot the red line is the actual data and the blue line is the forecast. On average the forecast looks about right, but the model fails to capture the peaks and dips of the weekly seasonality.

Many of the actual values also fall outside the confidence interval of the Prophet model. This may be caused by unstable variance, so let's try a Box-Cox transformation.
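
The share of points inside the interval can also be put into a number. Prophet's intervals default to a width of 80% (`interval_width=0.8`), so a share well below that hints at intervals that are too narrow. The sketch below uses an invented frame with the same column names as `cmp_df`; the values are not from the Medium data.

```python
import pandas as pd

# Hypothetical comparison frame with the same column names as cmp_df.
toy_cmp = pd.DataFrame({
    "y":          [100, 150, 90, 200, 120],
    "yhat_lower": [80, 100, 95, 110, 100],
    "yhat_upper": [120, 140, 130, 180, 140],
})

# True where the actual value lies inside the predicted interval.
inside = toy_cmp["y"].between(toy_cmp["yhat_lower"], toy_cmp["yhat_upper"])

coverage = inside.mean()  # share of points inside the interval
print(f"{coverage:.0%} of points fall inside the interval")
```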

## Box-Cox Transformation

More info: [Box–Cox transformation](http://onlinestatbook.com/2/transformations/box-cox.html)
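
For a strictly positive value $y$ and a parameter $\lambda$, the transformation and its inverse can be written as follows (the inverse is what is applied to the forecast to get back to the original scale):

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln y, & \lambda = 0
\end{cases}
\qquad
y =
\begin{cases}
\left(\lambda \, y^{(\lambda)} + 1\right)^{1/\lambda}, & \lambda \neq 0 \\
\exp\left(y^{(\lambda)}\right), & \lambda = 0
\end{cases}
$$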


In [None]:
def inverse_boxcox(y, lambda_):
    return np.exp(y) if lambda_ == 0 else np.exp(np.log(lambda_ * y + 1) / lambda_)
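
A round-trip check is a cheap way to confirm that the inverse really undoes the transformation. The sketch below writes both directions out with NumPy only; `boxcox_forward` and `boxcox_inverse` are illustrative helper names, and the input values and lambdas are arbitrary.

```python
import numpy as np

def boxcox_forward(y, lambda_):
    """Box-Cox transform for strictly positive y, written out for illustration."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lambda_ == 0 else (y ** lambda_ - 1) / lambda_

def boxcox_inverse(z, lambda_):
    """Same formula as inverse_boxcox in the notebook."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lambda_ == 0 else np.exp(np.log(lambda_ * z + 1) / lambda_)

y = np.array([1.0, 10.0, 250.0, 1200.0])  # hypothetical daily post counts
for lam in (0, 0.5, -0.3):                # arbitrary lambdas chosen for the check
    restored = boxcox_inverse(boxcox_forward(y, lam), lam)
    print(lam, np.allclose(restored, y))
```

SciPy also ships `scipy.special.inv_boxcox`, which could be used in place of a hand-written inverse.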

In [None]:
train_df2 = train_df.copy().set_index('ds')

In [None]:
train_df2.head()

In [None]:
train_df2['y'], lambda_prophet = stats.boxcox(train_df2['y'])
train_df2.reset_index(inplace=True)

Fit a new Prophet model on the transformed data:

In [None]:
m2 = Prophet()
m2.fit(train_df2)
future2 = m2.make_future_dataframe(periods=prediction_size)
forecast2 = m2.predict(future2)

In [None]:
for column in ['yhat', 'yhat_lower', 'yhat_upper']:
    forecast2[column] = inverse_boxcox(forecast2[column], lambda_prophet)

Compare the forecast after the Box-Cox transformation with the actual data:

In [None]:
cmp_df2 = make_comparison_dataframe(df, forecast2)
for err_name, err_value in calculate_forecast_errors(cmp_df2, prediction_size).items():
    print(err_name, err_value)

We managed to improve the model: the MAPE is now about 12%.

In [None]:
show_forecast(cmp_df, prediction_size, 100, 'No transformations')
show_forecast(cmp_df2, prediction_size, 100, 'Box–Cox transformation')

Resources:
1. https://towardsdatascience.com/getting-started-with-facebook-prophet-20eccb25b06b
2. https://medium.com/analytics-vidhya/forecasting-using-facebooks-prophet-library-ce628e76586b
3. https://www.kaggle.com/jagangupta/time-series-basics-exploring-traditional-ts
4. https://machinelearningmastery.com/time-series-forecasting-with-prophet-in-python/
5. https://www.kaggle.com/kashnitsky/topic-9-part-2-time-series-with-facebook-prophet
