## **Conformal Prediction using Energy Hospital Load**

One example dataset from the NeuralProphet (NP) data repository is the electricity consumption of a hospital in San Francisco (SF). It contains hourly observations for the entire year of 2015.

### Extract Data and Split Data into Train, Val, Cal, and Test


**Extract Data From GitHub**

In [1]:
import numpy as np
import pandas as pd
from collections import defaultdict
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

from neuralprophet import NeuralProphet, set_log_level, set_random_seed

In [2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
file = 'energy/SF_hospital_load.csv'
# file = 'air_passengers.csv'

In [3]:
data_df = pd.read_csv(data_location + file)

In [4]:
# data_df.head(5)

In [5]:
# data_df.tail(5)
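
Before splitting, it can help to confirm that the frame has the layout the later cells rely on. The sketch below is only an illustration: it assumes a single series with a timestamp column `ds` and a value column `y`, the helper name `check_np_frame` is hypothetical, and it runs on a small synthetic frame rather than the downloaded file.

```python
import pandas as pd


def check_np_frame(df):
    """Hypothetical helper: basic sanity checks on a single-series frame."""
    # Assumption: the series uses a timestamp column 'ds' and a value column 'y'.
    missing = {"ds", "y"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    out = df.copy()
    out["ds"] = pd.to_datetime(out["ds"])
    if out["ds"].duplicated().any():
        raise ValueError("duplicate timestamps found")
    if not out["ds"].is_monotonic_increasing:
        raise ValueError("timestamps are not sorted in increasing order")
    return out


# Synthetic stand-in for data_df (values are made up for the demo).
demo_raw = pd.DataFrame(
    {
        "ds": ["2015-01-01 01:00:00", "2015-01-01 02:00:00", "2015-01-01 03:00:00"],
        "y": [1.0, 2.0, 3.0],
    }
)
demo_checked = check_np_frame(demo_raw)
print(demo_checked.dtypes)
```

In the notebook, the same call would be `check_np_frame(data_df)`, assuming the CSV follows that two-column layout.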

**Split data into train, val, cal, and test in that order**

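`split_df` is a method on a `NeuralProphet` instance, so a model object is instantiated here just to split the data. Whether the model's parameters affect the outcome of the split is still an open question; the working assumption for now is that they do not, so a default `NeuralProphet()` is used.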

In [6]:
train_df, test_df = NeuralProphet().split_df(data_df, freq='H', valid_p = 1.0/16)
train_df.shape, test_df.shape

INFO - (NP.df_utils._infer_frequency) - Major frequency H corresponds to 99.989% of the data.
INFO - (NP.df_utils._infer_frequency) - Defined frequency is equal to major frequency - H
INFO - (NP.df_utils.return_df_in_original_format) - Returning df with no ID column
INFO - (NP.df_utils.return_df_in_original_format) - Returning df with no ID column


((8213, 2), (547, 2))
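
The 99.989% figure in the log is numerically consistent with 8759 hourly gaps counted against 8760 rows (8213 + 547). The sketch below reproduces that number on a synthetic, perfectly regular hourly index. It is an observation about the arithmetic only, not a claim about how NeuralProphet computes the value internally; the names `demo_ds`, `gaps`, and `share_of_rows` are illustrative.

```python
import pandas as pd

# Synthetic regular hourly index with the same start and length as the dataset.
demo_ds = pd.Series(
    pd.date_range("2015-01-01 01:00:00", periods=8760, freq=pd.Timedelta(hours=1))
)

# Gap between consecutive timestamps; the first row has no predecessor.
gaps = demo_ds.diff().dropna()
major_gap = gaps.value_counts().idxmax()

# Assumption: the share is taken relative to the number of rows, not the number of gaps.
share_of_rows = (gaps == major_gap).sum() / len(demo_ds)
print(major_gap, f"{100 * share_of_rows:.3f}%")
```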

In [7]:
print(f"Train set time range:        {train_df['ds'].min()} - {train_df['ds'].max()}")
print(f"Test set time range:         {test_df['ds'].min()} - {test_df['ds'].max()}")

Train set time range:        2015-01-01 01:00:00 - 2015-12-09 05:00:00
Test set time range:         2015-12-09 06:00:00 - 2016-01-01 00:00:00
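
The shapes and time ranges above are consistent with a plain chronological hold-out in which the last `int(n * valid_p)` rows become the test set. The following is a minimal sketch under that assumption, run on a synthetic hourly frame; `chrono_split` is a hypothetical helper and not the NeuralProphet implementation, and it ignores lags and multi-step forecasts.

```python
import pandas as pd


def chrono_split(df, valid_p):
    """Hypothetical helper: hold out the last int(len(df) * valid_p) rows."""
    n_valid = max(1, int(len(df) * valid_p))
    split_idx = len(df) - n_valid
    head = df.iloc[:split_idx].reset_index(drop=True)
    tail = df.iloc[split_idx:].reset_index(drop=True)
    return head, tail


# Synthetic stand-in for data_df: 8760 hourly rows starting 2015-01-01 01:00.
demo_df = pd.DataFrame(
    {
        "ds": pd.date_range(
            "2015-01-01 01:00:00", periods=8760, freq=pd.Timedelta(hours=1)
        ),
        "y": 0.0,
    }
)

demo_train, demo_test = chrono_split(demo_df, valid_p=1.0 / 16)
print(demo_train.shape, demo_test.shape)
print(demo_train["ds"].max(), demo_test["ds"].min())
```

With 8760 rows, `8760 / 16 = 547.5` truncates to 547 test rows, which leaves 8213 rows for training.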


### Create Folds using CV Splits

In [8]:
B = 10               # Number of bootstraps and number of k-folds
val_cov_pct = 0.5    # Overall validation set coverage %
val_fold_pct = 0.05  # Validation fold size % of entire input
print(f"Fold overlap proportion: {round(B*val_fold_pct - val_cov_pct, 2)}")

Fold overlap proportion: 0.0


In [9]:
folds = NeuralProphet().crossvalidation_split_df(
    train_df,
    freq='H',
    k=B,
    fold_pct=val_fold_pct,
    fold_overlap_pct=(B*val_fold_pct - val_cov_pct),
)

INFO - (NP.df_utils._infer_frequency) - Major frequency H corresponds to 99.988% of the data.
INFO - (NP.df_utils._infer_frequency) - Defined frequency is equal to major frequency - H
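
To make the fold arithmetic concrete, here is a minimal sketch of an expanding-window scheme: each fold holds out a fixed-size block, and the training window grows by one block per fold (minus any overlap). It assumes no lags and treats the overlap as a fraction of the fold size. The function `expanding_fold_sizes` is hypothetical and only computes row counts; it is not the library's code.

```python
def expanding_fold_sizes(n_rows, k, fold_pct, fold_overlap_pct=0.0):
    """Hypothetical helper: (train_rows, cal_rows) for each of k expanding folds."""
    fold_size = max(1, int(fold_pct * n_rows))
    # Assumption: overlap is expressed as a fraction of the fold size.
    overlap = int(fold_overlap_pct * fold_size)
    step = fold_size - overlap
    # Rows left for the first (smallest) training window.
    first_train = n_rows - fold_size - (k - 1) * step
    if first_train < fold_size:
        raise ValueError("not enough rows for the requested folds")
    return [(first_train + i * step, fold_size) for i in range(k)]


# Same settings as in the notebook: 8213 training rows, B = 10, 5% folds, no overlap.
sizes = expanding_fold_sizes(n_rows=8213, k=10, fold_pct=0.05, fold_overlap_pct=0.0)
for i, (n_train, n_cal) in enumerate(sizes, start=1):
    print(f"Fold {i}: train rows = {n_train}, cal rows = {n_cal}")

# Fraction of the input covered by held-out blocks (compare with val_cov_pct).
coverage = sum(n_cal for _, n_cal in sizes) / 8213
print(f"Held-out coverage: {coverage:.4f}")
```

Under these assumptions, the fold size is `int(0.05 * 8213) = 410` and the first training window has `8213 - 10 * 410 = 4113` rows.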


In [10]:
train_df.shape

(8213, 2)

In [11]:
for i, (fold_train_df, fold_cal_df) in enumerate(folds):
    print(f"Fold {i+1}:")
    print(f"  - Train start: {fold_train_df.ds.min()}, Train end: {fold_train_df.ds.max()}")
    print(f"  - Cal start:   {fold_cal_df.ds.min()}, Cal end:   {fold_cal_df.ds.max()}")
    print(f"  - Train shape:   {fold_train_df.shape}, Cal shape:   {fold_cal_df.shape}")

Fold 1:
  - Train start: 2015-01-01 01:00:00, Train end: 2015-06-21 09:00:00
  - Cal start:   2015-06-21 10:00:00, Cal end:   2015-07-08 11:00:00
  - Train shape:   (4113, 2), Cal shape:   (410, 2)
Fold 2:
  - Train start: 2015-01-01 01:00:00, Train end: 2015-07-08 11:00:00
  - Cal start:   2015-07-08 12:00:00, Cal end:   2015-07-25 13:00:00
  - Train shape:   (4523, 2), Cal shape:   (410, 2)
Fold 3:
  - Train start: 2015-01-01 01:00:00, Train end: 2015-07-25 13:00:00
  - Cal start:   2015-07-25 14:00:00, Cal end:   2015-08-11 15:00:00
  - Train shape:   (4933, 2), Cal shape:   (410, 2)
Fold 4:
  - Train start: 2015-01-01 01:00:00, Train end: 2015-08-11 15:00:00
  - Cal start:   2015-08-11 16:00:00, Cal end:   2015-08-28 17:00:00
  - Train shape:   (5343, 2), Cal shape:   (410, 2)
Fold 5:
  - Train start: 2015-01-01 01:00:00, Train end: 2015-08-28 17:00:00
  - Cal start:   2015-08-28 18:00:00, Cal end:   2015-09-14 19:00:00
  - Train shape:   (5753, 2), Cal shape:   (410, 2)
Fold 6:
  