# Prophet from Facebook

[Getting started](https://facebook.github.io/prophet/docs/quick_start.html#python-api)

[Prophet+Fastai](https://www.martinalarcon.org/2018-12-31-ab-timeseries/)

[Arima, LSTM, Prophet](https://medium.com/analytics-vidhya/time-series-forecasting-arima-vs-lstm-vs-prophet-62241c203a3b)

In [1]:
# fbprophet is built against the PyStan 2.x API, so pin the version explicitly
!pip install -qq pystan==2.19.1.1
!pip install -qq fbprophet

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib
from pathlib import Path
from fbprophet import Prophet

DATASET_NAME = '4D.zip'

In [3]:
!git clone https://github.com/mengwangk/dl-projects
!cp dl-projects/*utils* .
!cp dl-projects/preprocess* .

Cloning into 'dl-projects'...
remote: Enumerating objects: 170, done.
remote: Counting objects: 100% (170/170), done.
remote: Compressing objects: 100% (149/149), done.
remote: Total 2380 (delta 104), reused 42 (delta 21), pack-reused 2210
Receiving objects: 100% (2380/2380), 80.67 MiB | 15.42 MiB/s, done.
Resolving deltas: 100% (1483/1483), done.


In [4]:
%reload_ext autoreload
%autoreload 2

%matplotlib inline

In [5]:
from utils import *
from preprocess import *

In [6]:
from google.colab import drive
drive.mount('/content/gdrive')
GDRIVE_DATASET_FOLDER = Path('gdrive/My Drive/datasets/')
DATASET_PATH = GDRIVE_DATASET_FOLDER
ORIGIN_DATASET_PATH = Path('dl-projects/datasets')
ORIGIN_DATASET = ORIGIN_DATASET_PATH/DATASET_NAME

Mounted at /content/gdrive


In [7]:
data = format_tabular(ORIGIN_DATASET)
data.head(10)

Unnamed: 0,DrawNo,DrawDate,PrizeType,LuckyNo
0,40792,1992-05-06,1stPrizeNo,19
1,40792,1992-05-06,2ndPrizeNo,1124
2,40792,1992-05-06,3rdPrizeNo,592
3,40792,1992-05-06,ConsolationNo1,5311
4,40792,1992-05-06,ConsolationNo10,407
5,40792,1992-05-06,ConsolationNo2,1949
6,40792,1992-05-06,ConsolationNo3,1606
7,40792,1992-05-06,ConsolationNo4,3775
8,40792,1992-05-06,ConsolationNo5,6226
9,40792,1992-05-06,ConsolationNo6,1271


In [8]:
# Prophet expects the timestamp column to be named 'ds' and the target column 'y'
data.rename(columns={"DrawDate": "ds", "LuckyNo": "y"}, inplace=True)
ts_data = data.drop(columns=["DrawNo", "PrizeType"])
ts_data.head(10)

Unnamed: 0,ds,y
0,1992-05-06,19
1,1992-05-06,1124
2,1992-05-06,592
3,1992-05-06,5311
4,1992-05-06,407
5,1992-05-06,1949
6,1992-05-06,1606
7,1992-05-06,3775
8,1992-05-06,6226
9,1992-05-06,1271


In [9]:
from ts_utils import *


pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.



In [19]:
# Chronological split: train on draws before 2020, hold out 2020 as the test set
train_data = ts_data[ts_data['ds'].dt.year < 2020]
test_data = ts_data[ts_data['ds'].dt.year >= 2020]

In [20]:
len(ts_data), len(train_data), len(test_data)

(107847, 106835, 1012)
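
The year-based split above only works when `ds` has a datetime dtype, because the `.dt` accessor is not available on plain string columns. The notebook presumably gets this from `format_tabular`, which is not shown here. A minimal sketch of the same split on a small made-up frame, with an explicit `pd.to_datetime` conversion (the `split_by_year` helper and the toy values are illustrative, not part of the notebook's utilities):

```python
import pandas as pd


def split_by_year(df, cutoff_year):
    """Return (train, test): rows before cutoff_year and rows from it onward."""
    ds = pd.to_datetime(df["ds"])  # leaves an already-datetime column unchanged
    return df[ds.dt.year < cutoff_year], df[ds.dt.year >= cutoff_year]


# Toy stand-in for ts_data; the real frame comes from format_tabular().
toy = pd.DataFrame({
    "ds": ["2019-12-28", "2019-12-29", "2020-01-01", "2020-01-04"],
    "y": [19, 1124, 592, 5311],
})
toy_train, toy_test = split_by_year(toy, 2020)
print(len(toy), len(toy_train), len(toy_test))
```

Splitting on the date rather than sampling rows at random keeps every test draw strictly later than every training draw, which is what a forecasting evaluation needs.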

In [21]:
m = Prophet()
m.fit(train_data)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<fbprophet.forecaster.Prophet at 0x7fc97bcf83c8>

In [25]:
# Predict on the held-out test dates (one row per test observation)
future = pd.DataFrame(test_data['ds'])
future

Unnamed: 0,ds
106835,2020-01-01
106836,2020-01-01
106837,2020-01-01
106838,2020-01-01
106839,2020-01-01
...,...
107842,2020-07-04
107843,2020-07-04
107844,2020-07-04
107845,2020-07-04


In [30]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(23)

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper
989,2020-07-04,4994.148624,1180.202507,8668.951516
990,2020-07-04,4994.148624,1220.599797,8638.878536
991,2020-07-04,4994.148624,1499.860519,8504.334684
992,2020-07-04,4994.148624,1448.563933,8717.213672
993,2020-07-04,4994.148624,1345.00413,8684.236243
994,2020-07-04,4994.148624,1163.002173,8479.969127
995,2020-07-04,4994.148624,1142.696376,8583.786378
996,2020-07-04,4994.148624,1570.40478,8868.374577
997,2020-07-04,4994.148624,1416.605484,8987.025916
998,2020-07-04,4994.148624,1536.475456,8708.640598
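
Every row that shares a date gets the same `yhat` in the output above, since the point forecast of this model depends only on `ds`. The interval bounds differ from row to row, most likely because Prophet estimates them by random simulation. One way to avoid the redundant rows is to predict once per unique date and join the result back. A sketch on a toy frame (the values are made up, and the `m.predict` call is left as a comment because it needs the fitted model):

```python
import pandas as pd

# Toy stand-in for test_data: several prize numbers per draw date.
toy_test = pd.DataFrame({
    "ds": pd.to_datetime(["2020-07-01"] * 3 + ["2020-07-04"] * 3),
    "y": [2999, 115, 5808, 6792, 5049, 9926],
})

# One row per draw date is enough, because only `ds` is used for prediction.
future_unique = toy_test[["ds"]].drop_duplicates().reset_index(drop=True)
print(future_unique)

# With the fitted model from the notebook this would be:
# forecast = m.predict(future_unique)
# The per-date forecast can then be joined back onto every test row:
# scored = toy_test.merge(forecast[["ds", "yhat"]], on="ds", how="left")
```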


In [28]:
test_data.tail(23)

Unnamed: 0,ds,y
107824,2020-07-04,2999
107825,2020-07-04,115
107826,2020-07-04,5808
107827,2020-07-04,6792
107828,2020-07-04,5049
107829,2020-07-04,9926
107830,2020-07-04,8257
107831,2020-07-04,7643
107832,2020-07-04,204
107833,2020-07-04,6606
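
Comparing the forecast and the actual values by eye is hard, so a numeric check helps. The sketch below computes the mean absolute error and the share of actual values that fall inside the prediction interval, using only the ten aligned rows printed above (forecast rows 989 to 998 correspond to test rows 107824 to 107833). The helper names `mae` and `interval_coverage` are illustrative, and the constant baseline of 4999.5 assumes the numbers are four-digit draws in the range 0 to 9999.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def interval_coverage(y_true, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    y_true = np.asarray(y_true)
    inside = (y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper))
    return float(np.mean(inside))


# The ten rows shown in the notebook output above.
y = np.array([2999, 115, 5808, 6792, 5049, 9926, 8257, 7643, 204, 6606])
yhat = np.full(10, 4994.148624)
lower = np.array([1180.202507, 1220.599797, 1499.860519, 1448.563933, 1345.00413,
                  1163.002173, 1142.696376, 1570.40478, 1416.605484, 1536.475456])
upper = np.array([8668.951516, 8638.878536, 8504.334684, 8717.213672, 8684.236243,
                  8479.969127, 8583.786378, 8868.374577, 8987.025916, 8708.640598])

# Assumed baseline: always guess the midpoint of 0..9999.
baseline = np.full(10, 4999.5)

prophet_mae = mae(y, yhat)
baseline_mae = mae(y, baseline)
coverage = interval_coverage(y, lower, upper)

print("Prophet MAE :", round(prophet_mae, 1))
print("Baseline MAE:", round(baseline_mae, 1))
print("Coverage    :", coverage)
```

Ten rows are far too few to draw conclusions; applying the same helpers to the full `forecast` and `test_data` frames would give a more reliable picture of whether the model beats a constant guess.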
