# 2022 UOS Big Data Algorithm Competition

## Hyperparameter Tuning and Regularization for Time Series Model Using Prophet in Python

### Reference URL:
- https://medium.com/grabngoinfo/hyperparameter-tuning-and-regularization-for-time-series-model-using-prophet-in-python-9791370a07dc

### What we learn:
- What are the hyperparameters of a Prophet time series model?


- Which hyperparameters should be tuned, and which are better left untuned?


- Of the hyperparameters that should be tuned, which should be tuned automatically and which manually?


- How is hyperparameter tuning related to regularization?


- How does data transformation impact model performance?

### For beginners:
#### Reference URL:
- https://medium.com/grabngoinfo/time-series-forecasting-of-bitcoin-prices-using-prophet-1069133708bc


- https://medium.com/grabngoinfo/multivariate-time-series-forecasting-with-seasonality-and-holiday-effect-using-prophet-in-python-d5d4150eeb57



- https://medium.com/grabngoinfo/3-ways-for-multiple-time-series-forecasting-using-prophet-in-python-7a0709a117f9

<br>

## Step 0. Overview of All the Hyperparameters for a Prophet Model

#### Hyperparameters fall into three broad groups
1. The first group contains hyperparameters that are suitable for automatic tuning. We can specify a list of values and run a grid search for the best combination. For more information about grid search, see the referenced author's earlier tutorial, Hyperparameter Tuning For XGBoost.


##### XGBoost:
- https://pub.towardsai.net/lasso-l1-vs-ridge-l2-vs-elastic-net-regularization-for-classification-model-409c3d86f6e9

<br>

2. The second group contains hyperparameters that are suitable for manual tuning. A human needs to make a judgment on what hyperparameter value to use based on knowledge about data and business.

<br>

3. The third group contains the hyperparameters that are better left untuned with the default values.

### Group 1: Hyperparameters Suitable for Automatic Tuning

- Four hyperparameters can be tuned automatically with a grid search:


    - changepoint_prior_scale 
    (most impactful parameter, default=0.05, recommended=0.001~0.5, log_scale)

    - seasonality_prior_scale 
    (L2 Ridge regularization, default=10, recommended=0.01~10, log_scale)

    - holidays_prior_scale 
    (default=10, recommended=0.01~10)

    - seasonality_mode 
    (two_options=[additive, multiplicative])
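A grid search over these four hyperparameters can be sketched as follows. This is a minimal sketch, not the referenced article's exact code: the grid points, the cross-validation window sizes, and the helper names `cv_rmse` and `grid_search` are assumptions made for illustration. It relies on Prophet's `cross_validation` and `performance_metrics` diagnostics.

```python
import itertools

# Candidate values stay inside the ranges listed above; the exact grid points
# are illustrative assumptions, spaced roughly on a log scale.
param_grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
    "holidays_prior_scale": [0.01, 0.1, 1.0, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
}

# Every combination of the four hyperparameters, one dict per candidate model.
all_params = [
    dict(zip(param_grid.keys(), values))
    for values in itertools.product(*param_grid.values())
]


def cv_rmse(df, params, initial="730 days", period="180 days", horizon="30 days"):
    """Fit one Prophet model and return its cross-validated RMSE.

    `df` must have the `ds` and `y` columns that Prophet expects. The window
    sizes are placeholder assumptions, not values taken from the source.
    """
    # Imported lazily so the grid above can be built without Prophet installed.
    from prophet import Prophet
    from prophet.diagnostics import cross_validation, performance_metrics

    model = Prophet(**params).fit(df)
    df_cv = cross_validation(model, initial=initial, period=period, horizon=horizon)
    # rolling_window=1 averages the metric over the whole horizon (a single row).
    return performance_metrics(df_cv, rolling_window=1)["rmse"].values[0]


def grid_search(df, candidates=all_params):
    """Return (best_params, best_rmse) over the candidate list."""
    scores = [cv_rmse(df, params) for params in candidates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

With this grid there are 4 x 4 x 4 x 2 = 128 candidate models, so a full search refits Prophet 128 times per series. Note that `holidays_prior_scale` only matters when holidays are actually supplied to the model.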
    


### Group 2: Hyperparameters Suitable for Manual Tuning

- These require human judgment based on business knowledge or observations of the data


- Seven in detail:


    - changepoint_range
    (value=0~1, default=0.8)
    
    - growth
    (two_options=[linear, logistic], default=linear; use logistic when the growth saturation point is known)
    
    - changepoints
    (default=None)
    
    - yearly_seasonality
    (three_options=[auto, True, False], default=auto)
    
    - weekly_seasonality, daily_seasonality
    (configured the same way as yearly_seasonality)
    
    - holidays
    (special days: manually include or exclude holidays or events)
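As a rough illustration of manual tuning, the settings above can be collected in one place and passed to the `Prophet` constructor. Every value below is a hypothetical choice made for the sake of the example, not a recommendation from the source, and `build_model` is an illustrative helper name.

```python
# Hypothetical manual choices; every value here is an assumption for illustration.
manual_params = {
    "changepoint_range": 0.9,    # place changepoints in the first 90% of the history
    "growth": "linear",          # "logistic" additionally needs a `cap` column in the data
    "yearly_seasonality": True,
    "weekly_seasonality": True,
    "daily_seasonality": False,
}

# Hand-picked events in the layout Prophet expects for `holidays`:
# one row per occurrence, with `holiday` and `ds` columns and optional windows.
# New Year's Day is used purely as an example event.
holiday_rows = [
    {"holiday": "new_year", "ds": f"{year}-01-01", "lower_window": 0, "upper_window": 1}
    for year in range(2018, 2023)
]


def build_model(params=manual_params, holidays=holiday_rows):
    """Return an unfitted Prophet model configured with the manual settings."""
    # Imported lazily so the settings can be inspected without Prophet installed.
    import pandas as pd
    from prophet import Prophet

    holidays_df = pd.DataFrame(holidays)
    holidays_df["ds"] = pd.to_datetime(holidays_df["ds"])
    return Prophet(holidays=holidays_df, **params)
```

Keeping these choices in a plain dict makes it easy to record why each value was picked and to reuse the same settings across several series.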
    
    

### Group 3: Hyperparameters Suitable for No Tuning

- Five hyperparameters that should be left untuned:
    
    
    - n_changepoints
    (default=25; rather than changing this value, adjust changepoint_prior_scale instead)
    
    - interval_width
    (default=0.8, meaning yhat_upper and yhat_lower bound an 80% uncertainty interval; changing the value makes little difference)
    
    - uncertainty_samples
    (default=1000)
    
    - mcmc_samples
    (Bayesian inference [samples>0] or Maximum a Posteriori (MAP) [samples=0], default_mcmc_samples=0)
    
    - stan_backend
    (choice between pystan and cmdstanpy; no benefit from changing it)
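To make the interval_width note concrete: an 80% interval means the lower and upper bounds sit at the 10th and 90th percentiles of the simulated forecasts. The helper below is a small worked example under the assumption that the interval is symmetric, which matches my understanding of how Prophet derives yhat_lower and yhat_upper; `interval_percentiles` is an illustrative name, not a Prophet function.

```python
def interval_percentiles(interval_width=0.8):
    """Lower and upper percentiles of a symmetric interval of the given width."""
    if not 0 < interval_width < 1:
        raise ValueError("interval_width must be between 0 and 1")
    lower = 100 * (1.0 - interval_width) / 2
    upper = 100 * (1.0 + interval_width) / 2
    return lower, upper


# Default width: bounds at roughly the 10th and 90th percentiles.
print(interval_percentiles(0.8))
# A wider interval only moves the bounds outward; the point forecast is untouched.
print(interval_percentiles(0.95))
```

This is why changing interval_width is not really tuning: it rescales the reported band and leaves yhat itself unchanged.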

<br>

## Step 1. Install and Import Libraries

```python
# Import modules
# The Four Musketeers
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import itertools
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics
```
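The install half of this step is not shown in the notebook. One way to check what still needs installing before running the imports is sketched below; `missing_packages` is a hypothetical helper, and it assumes the PyPI package names match the import names, which holds for the five libraries used here.

```python
import importlib.util

# Top-level packages imported by this notebook.
REQUIRED = ["pandas", "numpy", "seaborn", "matplotlib", "prophet"]


def missing_packages(names=REQUIRED):
    """Return the packages from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```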

<br>

## Step 2. Pull Data

```python
!ls
```

```
UOS_2022.ipynb                        submission_11-14-2022 19:28:33 PM.csv
UOS_2022_custom.ipynb                 submission_11-14-2022 19:31:33 PM.csv
dacon_236029_open                     submission_11-14-2022 21:02:11 PM.csv
dacon_236029_open.zip                 train_all.csv
result                                train_test_all_2.numbers
```


```python
train = pd.read_csv('./dacon_236029_open/train.csv')
train
```

| | 일시 | 광진구 | 동대문구 | 성동구 | 중랑구 |
| --- | --- | --- | --- | --- | --- |
| 0 | 20180101 | 0.592 | 0.368 | 0.580 | 0.162 |
| 1 | 20180102 | 0.840 | 0.614 | 1.034 | 0.260 |
| 2 | 20180103 | 0.828 | 0.576 | 0.952 | 0.288 |
| 3 | 20180104 | 0.792 | 0.542 | 0.914 | 0.292 |
| 4 | 20180105 | 0.818 | 0.602 | 0.994 | 0.308 |
| ... | ... | ... | ... | ... | ... |
| 1456 | 20211227 | 3.830 | 3.416 | 2.908 | 2.350 |
| 1457 | 20211228 | 4.510 | 3.890 | 3.714 | 2.700 |
| 1458 | 20211229 | 4.490 | 3.524 | 3.660 | 2.524 |
| 1459 | 20211230 | 4.444 | 3.574 | 3.530 | 2.506 |


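```python
sample = pd.read_csv('./dacon_236029_open/sample_submission.csv')
sample
```

| | 일시 | 광진구 | 동대문구 | 성동구 | 중랑구 |
| --- | --- | --- | --- | --- | --- |
| 0 | 20220101 | 0 | 0 | 0 | 0 |
| 1 | 20220102 | 0 | 0 | 0 | 0 |
| 2 | 20220103 | 0 | 0 | 0 | 0 |
| 3 | 20220104 | 0 | 0 | 0 | 0 |
| 4 | 20220105 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... |
| 329 | 20221126 | 0 | 0 | 0 | 0 |
| 330 | 20221127 | 0 | 0 | 0 | 0 |
| 331 | 20221128 | 0 | 0 | 0 | 0 |
| 332 | 20221129 | 0 | 0 | 0 | 0 |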


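```python
# Parse the integer-like YYYYMMDD values into proper datetimes
train['일시'] = pd.to_datetime(train['일시'], format='%Y%m%d')
train
```

| | 일시 | 광진구 | 동대문구 | 성동구 | 중랑구 |
| --- | --- | --- | --- | --- | --- |
| 0 | 2018-01-01 | 0.592 | 0.368 | 0.580 | 0.162 |
| 1 | 2018-01-02 | 0.840 | 0.614 | 1.034 | 0.260 |
| 2 | 2018-01-03 | 0.828 | 0.576 | 0.952 | 0.288 |
| 3 | 2018-01-04 | 0.792 | 0.542 | 0.914 | 0.292 |
| 4 | 2018-01-05 | 0.818 | 0.602 | 0.994 | 0.308 |
| ... | ... | ... | ... | ... | ... |
| 1456 | 2021-12-27 | 3.830 | 3.416 | 2.908 | 2.350 |
| 1457 | 2021-12-28 | 4.510 | 3.890 | 3.714 | 2.700 |
| 1458 | 2021-12-29 | 4.490 | 3.524 | 3.660 | 2.524 |
| 1459 | 2021-12-30 | 4.444 | 3.574 | 3.530 | 2.506 |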
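Prophet expects a frame with exactly two named columns, `ds` (the date) and `y` (the value), so each district column has to be split out of this wide table before fitting. A minimal sketch of that reshaping is below; the helper names `prophet_columns` and `to_prophet_frame` are illustrative assumptions, and fitting one model per district is only one possible approach.

```python
DATE_COL = "일시"
DISTRICTS = ["광진구", "동대문구", "성동구", "중랑구"]


def prophet_columns(district, date_col=DATE_COL):
    """Column rename mapping from the wide table to Prophet's ds/y names."""
    return {date_col: "ds", district: "y"}


def to_prophet_frame(df, district, date_col=DATE_COL):
    """Return a two-column (ds, y) copy of `df` for a single district.

    `df` is assumed to be a pandas DataFrame shaped like `train` above,
    with the date column already converted to datetime.
    """
    return df[[date_col, district]].rename(columns=prophet_columns(district, date_col))


# Usage (assumes `train` has been loaded and parsed as in the cells above):
# frames = {district: to_prophet_frame(train, district) for district in DISTRICTS}
```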
