# Time Series Forecasting with Prophet 

## Mini-Vignette: Modeling the Price of Ethereum

By: [Paul Jeffries](https://twitter.com/ByPaulJ) 

Created: 03/27/2019

## What is Prophet?

[Prophet](https://facebook.github.io/prophet/) is an open source package (for both Python and R) for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. According to the package documentation, Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. It is released by Facebook's Core Data Science team and is [available for download](https://facebook.github.io/prophet/docs/installation.html) on CRAN and PyPI. The academic paper (still in peer-review at the time of this writing) [is available here](https://peerj.com/preprints/3190/). 
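
For reference, the additive decomposition described in the Prophet paper can be written as below, where $g(t)$ is the trend, $s(t)$ the periodic (seasonal) component, $h(t)$ the holiday effects, and $\varepsilon_t$ the error term. Each seasonal component is approximated with a Fourier series of period $P$ (for weekly seasonality on daily data, $P = 7$). The number of terms $N$ is a tuning choice; the paper suggests $N = 3$ for weekly and $N = 10$ for yearly seasonality.

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

$$
s(t) = \sum_{n=1}^{N} \left( a_n \cos\!\left(\frac{2\pi n t}{P}\right) + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right)
$$

The weekly term $s(t)$ is the piece most relevant to the 'weekend bull' question examined in this vignette.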

tl;dr --> Prophet is essentially Facebook's data science team's version of [auto-ARIMA](https://www.analyticsvidhya.com/blog/2018/08/auto-arima-time-series-modeling-python-r/).

## Use Case: Understanding Seasonal & Temporal Patterns in the Price of Ethereum

Prophet is particularly well-suited for a use case like examining the [daily price of ETH](https://coinmarketcap.com/currencies/ethereum/) (aka Ether)--the second-largest cryptoasset [after Bitcoin](https://www.youtube.com/watch?v=0UBk1e5qnr4)--as we have multiple years of daily data, and we suppose that there may be seasonal effects at various levels of granularity that moderate price movement in some way. The motivation for this vignette came from some of [my other work examining price movements in the crypto markets](https://github.com/pmaji/crypto-whale-watching-app), in which I observed many traders claiming that there was a discernible "weekend bump" or "weekend bull" effect in crypto markets. To summarize, **there have been many claims that weekends are, *ceteris paribus*, more bullish for ETH trading than weekdays. This methodology will enable us to test the veracity of such claims.** While I will build a "forecast" for the price of ETH in this vignette, the main goal is not to arrive at a perfect price forecast. Instead, the goal is to make use of the explanatory capabilities built into the Prophet package to look for evidence that may either support or contradict the 'weekend bull' thesis. 

This vignette doesn't come close to exploring all of Prophet's features, and is only meant as an introduction. Hopefully you are inspired by what you see below to think of ways to improve this work, perhaps applying it to other time series forecasting applications as well. I am by no means an expert (either in Prophet specifically, or in time-series forecasting writ large), but if you have any questions, feel free to [open an issue here on GitHub](https://github.com/pmaji/data-science-toolkit/issues), and, as always, I'll do my best to respond expeditiously. With that, let's get into the code!

## Basic Setup

In [40]:
# key packages used throughout:

# basic packages
import pandas as pd
import datetime
import numpy as np

# forecasting package
from fbprophet import Prophet
# package to import historical crypto OHLC data
from cryptory import Cryptory
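
A note for readers running this later: starting with release 1.0, the package is published as `prophet` rather than `fbprophet`. The sketch below is one hedged way to make the import tolerant of either name; `load_prophet_class` is a hypothetical helper introduced here for illustration and is not part of either package.

```python
import importlib


def load_prophet_class():
    """Return the Prophet class from whichever package name is installed.

    Tries the newer `prophet` package first, then the older `fbprophet`.
    Returns None when neither can be imported.
    """
    for module_name in ("prophet", "fbprophet"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed under this name, try the next one
        return module.Prophet
    return None


ProphetClass = load_prophet_class()
if ProphetClass is None:
    print("Neither 'prophet' nor 'fbprophet' is installed.")
```

If `ProphetClass` is not `None`, it can be used exactly like the `Prophet` name imported above.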

## Pulling Historical Price Data for Ethereum

There are plenty of ways to access historical OHLC data for top cryptocurrencies. If you're interested in data science projects pertaining to cryptocurrencies and/or how to access pertinent crypto data specifically, I highly recommend checking out two other public projects I maintain that deal in the crypto space:

- [This one here](https://github.com/pmaji/financial-asset-comparison-tool/blob/master/README.md) is an R Shiny app that allows for the comparison of the performance of traditional financial assets vs. cryptos
    - This project pulls OHLC crypto data from [Coin Metrics](https://coinmetrics.io/)
- [This one here](https://github.com/pmaji/crypto-whale-watching-app) is a Python app that tracks "whale" (aka large volume) activity in US crypto markets
    - This project pulls OHLC crypto data [from GDAX's (aka Coinbase Pro's) API](https://docs.pro.coinbase.com/?r=1)
    
That said, for this vignette, I decided to use the very simple and user-friendly [package "cryptory"](https://github.com/dashee87/cryptory) to pull in OHLC data for ETH (sourced from CoinMarketCap). Why ETH... why not Bitcoin or insert-other-cryptocurrency-here? I chose ETH simply because it's the project and community in which I am most personally interested and involved. If you want to see what a forecast for another cryptoasset would look like, simply fork this repo, change one line of code in this notebook, and if you discover something cool, let me know (my twitter is hyperlinked above in the byline)!

In [41]:
# initialize the cryptory object 
# pull data from start of 2015 to present day
my_cryptory_obj = Cryptory(from_date = "2015-01-01")

# get historical ETH OHLC data from coinmarketcap
eth_ohlc_hist_df = my_cryptory_obj.extract_coinmarketcap("ethereum")

In [42]:
# inspect the results
print(eth_ohlc_hist_df.shape)
eth_ohlc_hist_df.head()

(1331, 7)


| | date | open | high | low | close | volume | marketcap |
|---|---|---|---|---|---|---|---|
| 0 | 2019-03-29 | 139.34 | 142.55 | 138.05 | 142.5 | 5125602702 | 15025361637 |
| 1 | 2019-03-28 | 141.01 | 141.01 | 138.43 | 139.42 | 4163212475 | 14698609599 |
| 2 | 2019-03-27 | 135.45 | 141.08 | 135.34 | 140.99 | 5228240093 | 14862394451 |
| 3 | 2019-03-26 | 135.05 | 135.46 | 133.76 | 135.46 | 4499271679 | 14277816266 |
| 4 | 2019-03-25 | 137.08 | 137.7 | 133.49 | 135.03 | 4480516753 | 14230733149 |
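
Because the pull above depends on a live third-party source, it can be handy to work from a saved snapshot instead. The sketch below assumes a CSV with the same column names as the frame shown above; `load_ohlc_snapshot` is a hypothetical helper, and the five embedded rows are simply the ones printed in the output, not a full history.

```python
import io

import pandas as pd

# Five rows copied from the printed output; a real snapshot would hold the full history.
SNAPSHOT_CSV = """date,open,high,low,close,volume,marketcap
2019-03-29,139.34,142.55,138.05,142.5,5125602702,15025361637
2019-03-28,141.01,141.01,138.43,139.42,4163212475,14698609599
2019-03-27,135.45,141.08,135.34,140.99,5228240093,14862394451
2019-03-26,135.05,135.46,133.76,135.46,4499271679,14277816266
2019-03-25,137.08,137.7,133.49,135.03,4480516753,14230733149
"""


def load_ohlc_snapshot(source) -> pd.DataFrame:
    """Read an OHLC CSV (path or file-like object) and parse `date` as datetimes."""
    df = pd.read_csv(source, parse_dates=["date"])
    # Newest row first, matching the ordering of the frame shown above.
    return df.sort_values("date", ascending=False).reset_index(drop=True)


snapshot_df = load_ohlc_snapshot(io.StringIO(SNAPSHOT_CSV))
print(snapshot_df.shape)
```

Passing a file path instead of the `StringIO` object works the same way, since `pd.read_csv` accepts both.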


From the above we can see that we have some pretty clean data to work with. In most real-world use cases, we won't be that lucky, so if we did have unclean data, here is where in the workflow I would turn to handy packages [like pyjanitor](https://github.com/ericmjl/pyjanitor). If you're interested in what that would look like, you can see it at work in a recent [vignette I wrote up on radar charts here](https://github.com/pmaji/data-science-toolkit/blob/master/eda-and-visualization/radar_charts.ipynb). 

For the purpose of this project, all we really need from the columns above is our date column and our price column. For price, we could technically choose from any of the OHLC columns, but I'm going to go with closing price. There is a fundamentals vs. technicals argument to be made here about whether I should care more about closing price than market cap, but that's not the purpose of this vignette. If that subject interests you, [here's a good place to start](https://www.thebalance.com/why-per-share-price-is-not-important-3140791) to learn more. 

As for time period of interest, I chose to just look at the past ~1 year of data. The reason for this is that prior to that time period, the cryptocurrency markets were incredibly choppy. There was a relatively [large price bubble](https://seekingalpha.com/article/4234452-crypto-bubble-generation-investors-got-burned) in cryptocurrency markets that peaked around December of 2017, so limiting ourselves to a recent vintage prevents, to some extent, this volatility from affecting our forecast. To put it simply, I want this vignette to only **examine temporal trends in ETH's price after the popping of the 2017 price bubble.**

In [43]:
# subset main df to get just the date and the closing price
eth_price_hist_df = eth_ohlc_hist_df.loc[:,['date','close']]

# then filter to only data from on or after April 1st of 2018
eth_price_hist_df = eth_price_hist_df.loc[eth_price_hist_df['date'] >= '2018-04-01', :]

# rename date column to ds and continuous value column to y per fbprophet's docs
eth_price_hist_df.columns = ['ds', 'y']
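
The same preparation can be wrapped in a small function so that it is easy to rerun for another asset or start date. This is a sketch rather than the method used in this notebook: `to_prophet_frame` is a hypothetical helper, the toy prices are made up, and it uses `rename` (instead of overwriting `.columns`) so that a change in column order cannot silently mislabel the data.

```python
import pandas as pd


def to_prophet_frame(ohlc_df: pd.DataFrame, start_date: str, price_col: str = "close") -> pd.DataFrame:
    """Keep date and one price column, filter by start date, and rename to ds / y."""
    out = ohlc_df.loc[:, ["date", price_col]].copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.loc[out["date"] >= pd.Timestamp(start_date)]
    out = out.rename(columns={"date": "ds", price_col: "y"})
    # Sort oldest to newest so the frame reads chronologically.
    return out.sort_values("ds").reset_index(drop=True)


# Toy input with made-up prices, only to show the shape of the result.
toy_ohlc = pd.DataFrame(
    {
        "date": ["2018-03-31", "2018-04-02", "2018-04-01"],
        "close": [100.0, 102.0, 101.0],
        "volume": [1, 2, 3],
    }
)
toy_prophet_df = to_prophet_frame(toy_ohlc, start_date="2018-04-01")
print(toy_prophet_df)
```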

In [44]:
# re-inspecting our df post-subsetting and filtering
print(eth_price_hist_df.shape)
eth_price_hist_df.head()

(363, 2)


| | ds | y |
|---|---|---|
| 0 | 2019-03-29 | 142.5 |
| 1 | 2019-03-28 | 139.42 |
| 2 | 2019-03-27 | 140.99 |
| 3 | 2019-03-26 | 135.46 |
| 4 | 2019-03-25 | 135.03 |
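
Before fitting anything, a quick model-free look at average daily returns by day of the week can serve as a sanity check on the 'weekend bull' claim. The sketch below is an illustration, not part of the original analysis: `mean_return_by_weekday` is a hypothetical helper, and the toy series is constructed (not real ETH data) so that prices rise 1% on Saturdays and Sundays and stay flat otherwise.

```python
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def mean_return_by_weekday(price_df: pd.DataFrame) -> pd.Series:
    """Average close-to-close daily return, grouped by day name.

    Expects Prophet-style columns: ds (date) and y (price).
    """
    df = price_df.copy()
    df["ds"] = pd.to_datetime(df["ds"])
    df = df.sort_values("ds")
    df["ret"] = df["y"].pct_change()  # first row has no prior close, so it is NaN
    df["weekday"] = df["ds"].dt.day_name()
    return df.groupby("weekday")["ret"].mean().reindex(WEEKDAY_ORDER)


# Constructed toy data: two weeks starting on a Monday, +1% on each weekend day.
toy_dates = pd.date_range("2019-01-07", periods=14, freq="D")
toy_prices = []
level = 100.0
for d in toy_dates:
    if d.dayofweek >= 5:  # 5 = Saturday, 6 = Sunday
        level *= 1.01
    toy_prices.append(level)

toy_returns = mean_return_by_weekday(pd.DataFrame({"ds": toy_dates, "y": toy_prices}))
print(toy_returns)
```

Calling `mean_return_by_weekday(eth_price_hist_df)` would apply the same check to the real data. A raw average like this ignores trend and noise, which is part of why a decomposition such as Prophet's is useful.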


In [45]:
# instantiate a Prophet model with default settings and fit it to the price history
m = Prophet()
m.fit(eth_price_hist_df)

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<fbprophet.forecaster.Prophet at 0x11fe03588>
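
The two INFO messages are worth a moment: with seasonality left on "auto", Prophet decides which seasonal components to fit based on how much history it sees. The sketch below mirrors that rule as I understand it (yearly needs at least 730 days of history, weekly at least two weeks, daily at least two days of sub-daily data). The thresholds are my reading of the library's behavior rather than a documented contract, and `auto_seasonality_flags` is a hypothetical helper.

```python
import pandas as pd


def auto_seasonality_flags(ds) -> dict:
    """Approximate which seasonalities an 'auto' setting would enable."""
    ds = pd.to_datetime(pd.Series(ds)).sort_values()
    span = ds.max() - ds.min()
    gaps = ds.diff().dropna()
    min_gap = gaps[gaps > pd.Timedelta(0)].min()  # smallest spacing between observations
    return {
        "yearly": span >= pd.Timedelta(days=730),
        "weekly": span >= pd.Timedelta(weeks=2) and min_gap < pd.Timedelta(weeks=1),
        "daily": span >= pd.Timedelta(days=2) and min_gap < pd.Timedelta(days=1),
    }


# Same date range as the filtered ETH history: daily data, a little under one year.
eth_like_dates = pd.date_range("2018-04-01", "2019-03-29", freq="D")
flags = auto_seasonality_flags(eth_like_dates)
print(flags)
```

For a daily series of just under a year, only the weekly component survives, which is consistent with the messages above and is exactly the component needed to examine a weekend effect.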

In [None]:
# build a dataframe of dates covering the full history plus 30 future days
future = m.make_future_dataframe(periods=30)
# future.loc[:,'floor'] = saturating_min
# future.loc[:,'cap'] = saturating_max
future.tail()

| | ds |
|---|---|
| 388 | 2019-04-24 |
| 389 | 2019-04-25 |
| 390 | 2019-04-26 |
| 391 | 2019-04-27 |
| 392 | 2019-04-28 |
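
Conceptually, the future dataframe is just the historical dates with extra dates appended at the same frequency. The pandas-only sketch below illustrates that idea for daily data; `extend_daily_dates` is a hypothetical stand-in written for this explanation, not Prophet's own implementation.

```python
import pandas as pd


def extend_daily_dates(history_ds, periods: int) -> pd.DataFrame:
    """Return a one-column frame (ds) with the history plus `periods` future days."""
    history = pd.to_datetime(pd.Series(history_ds)).sort_values().reset_index(drop=True)
    first_new_day = history.max() + pd.Timedelta(days=1)
    new_dates = pd.Series(pd.date_range(start=first_new_day, periods=periods, freq="D"))
    all_dates = pd.concat([history, new_dates], ignore_index=True)
    return pd.DataFrame({"ds": all_dates})


history_dates = pd.date_range("2018-04-01", "2019-03-29", freq="D")
future_sketch = extend_daily_dates(history_dates, periods=30)
print(future_sketch.tail())
```

With 363 historical days and 30 added days, the frame has 393 rows, so the last index is 392, in line with the output shown above.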


In [None]:
# generate the forecast and inspect the point estimates with their uncertainty bounds
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
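
The `yhat_lower` and `yhat_upper` columns bound Prophet's uncertainty interval, which defaults to 80% (`interval_width=0.8`). One simple way to sanity-check a fit is to measure how often the observed prices fall inside that band. The sketch below assumes a history frame with `ds` / `y` and a forecast frame with the columns selected above; `interval_coverage` is a hypothetical helper and the toy numbers are made up.

```python
import pandas as pd


def interval_coverage(history_df: pd.DataFrame, forecast_df: pd.DataFrame) -> float:
    """Share of observed y values that lie within [yhat_lower, yhat_upper]."""
    merged = history_df.merge(
        forecast_df[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds", how="inner"
    )
    inside = merged["y"].between(merged["yhat_lower"], merged["yhat_upper"])
    return float(inside.mean())


# Made-up values purely for illustration: three of the four actuals sit inside the band.
toy_days = pd.date_range("2019-03-01", periods=4, freq="D")
toy_history = pd.DataFrame({"ds": toy_days, "y": [100.0, 105.0, 98.0, 130.0]})
toy_forecast = pd.DataFrame(
    {
        "ds": toy_days,
        "yhat": [101.0, 103.0, 100.0, 110.0],
        "yhat_lower": [95.0, 97.0, 94.0, 104.0],
        "yhat_upper": [107.0, 109.0, 106.0, 116.0],
    }
)
coverage = interval_coverage(toy_history, toy_forecast)
print(coverage)
```

With the real objects, `interval_coverage(eth_price_hist_df, forecast)` would give the in-sample coverage. This tends to look more flattering than out-of-sample coverage, so treat it as a rough check only.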

In [None]:
# plot the forecast against the observed data
fig1 = m.plot(forecast)

In [None]:
%run -i 'prophet_helper_functions.py'

In [None]:
dt_restricted_prophet_plt(
    m = m,
    fcst = forecast,
    visible_window_start_dt = '01JAN2019',
    visible_window_end_dt = '01JUNE2019',
    ylabel = "ETH Price ($)",
    xlabel = "Time",
    x_axis_date_format = '%Y-%m'
);

In [None]:
# plot the individual forecast components (trend and weekly seasonality)
fig2 = m.plot_components(forecast)
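
The weekly panel of the components plot is the one that speaks to the 'weekend bull' thesis. To read the same information as numbers, the `weekly` column of the forecast frame can be averaged by day name. The sketch below assumes a forecast frame that contains `ds` and `weekly` columns; `weekly_effect_by_day` is a hypothetical helper, and the toy values are invented for illustration, not taken from the fitted model.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekly_effect_by_day(forecast_df: pd.DataFrame) -> pd.Series:
    """Average additive weekly effect (in price units) for each day of the week."""
    day_names = pd.to_datetime(forecast_df["ds"]).dt.day_name()
    return forecast_df.groupby(day_names)["weekly"].mean().reindex(DAY_ORDER)


# Invented weekly effects: +1.5 on weekend days, -0.6 on weekdays (sums to zero per week).
toy_ds = pd.date_range("2019-01-07", periods=14, freq="D")  # starts on a Monday
toy_weekly = [1.5 if d.dayofweek >= 5 else -0.6 for d in toy_ds]
toy_fc = pd.DataFrame({"ds": toy_ds, "weekly": toy_weekly})

effect = weekly_effect_by_day(toy_fc)
weekend_minus_weekday = effect[["Saturday", "Sunday"]].mean() - effect[DAY_ORDER[:5]].mean()
print(effect)
print(weekend_minus_weekday)
```

Applied to the real `forecast`, a positive `weekend_minus_weekday` would be consistent with the thesis and a negative or near-zero value would count against it. The size of the effect should be judged relative to the price level and the model's uncertainty.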

In [None]:
%%capture
# capturing warnings here because one of the dependencies throws a bunch of numpy deprecation warnings

from fbprophet.diagnostics import cross_validation
df_dv = cross_validation(m, horizon = '30 days')

In [None]:
df_dv.head()
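
Since only `horizon` is passed to `cross_validation`, the other two settings fall back to their defaults: an initial training window of three times the horizon and a spacing between cutoffs of half the horizon. The sketch below reproduces how the cutoff dates would be laid out under those defaults, as I understand them. `sketch_cutoffs` is a hypothetical helper and is not Prophet's own routine.

```python
import pandas as pd


def sketch_cutoffs(ds, horizon_days: float, period_days=None, initial_days=None) -> list:
    """List simulated forecast cutoffs, working backwards from the end of the data."""
    ds = pd.to_datetime(pd.Series(ds))
    horizon = pd.Timedelta(days=horizon_days)
    period = pd.Timedelta(days=period_days if period_days is not None else 0.5 * horizon_days)
    initial = pd.Timedelta(days=initial_days if initial_days is not None else 3 * horizon_days)

    earliest_allowed = ds.min() + initial  # need at least `initial` of training data
    cutoff = ds.max() - horizon            # last cutoff leaves one full horizon to score
    cutoffs = []
    while cutoff >= earliest_allowed:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


eth_dates = pd.date_range("2018-04-01", "2019-03-29", freq="D")
cutoffs = sketch_cutoffs(eth_dates, horizon_days=30)
print(len(cutoffs), cutoffs[0].date(), cutoffs[-1].date())
```

Each cutoff corresponds to one refit of the model on the data up to that date, followed by a 30-day forecast that is compared against what actually happened.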

In [None]:
from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_dv)
df_p.head()

In [None]:
from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_dv, metric = 'mape')
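
For intuition about the metric being plotted, MAPE (mean absolute percentage error) is the average of `|y - yhat| / |y|`. The sketch below computes it per forecast horizon from a frame shaped like the cross-validation output (`ds`, `cutoff`, `y`, `yhat`). It is a simplified illustration: `mape_by_horizon` is a hypothetical helper, the toy numbers are made up, and Prophet's own `performance_metrics` additionally averages over a rolling window of horizons, so its figures will not match this exactly.

```python
import pandas as pd


def mape_by_horizon(cv_df: pd.DataFrame) -> pd.Series:
    """Mean absolute percentage error for each horizon (ds minus cutoff)."""
    df = cv_df.copy()
    df["horizon"] = pd.to_datetime(df["ds"]) - pd.to_datetime(df["cutoff"])
    df["ape"] = (df["y"] - df["yhat"]).abs() / df["y"].abs()
    return df.groupby("horizon")["ape"].mean()


# Made-up cross-validation rows: two cutoffs, each scored one and two days ahead.
toy_cv = pd.DataFrame(
    {
        "cutoff": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-16", "2019-01-16"]),
        "ds": pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-17", "2019-01-18"]),
        "y": [100.0, 100.0, 200.0, 50.0],
        "yhat": [90.0, 120.0, 210.0, 45.0],
    }
)
toy_mape = mape_by_horizon(toy_cv)
print(toy_mape)
```

A MAPE that grows with the horizon is the usual pattern, since forecasts further out have more room to drift from the observed price.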