## Benchmark 5: Oneshot naive model structured bootstrap resampling
OK, so I covered the rationale and setup for this at length in notebooks 08 & 09. Let's give it a test. The success criterion here is that our observed public leaderboard score of ~1.09 for the naive 'carry-forward' model should not have a vanishingly small probability of coming from this dataset under this resampling procedure. If the SMAPE score distribution generated by this sampling procedure with the naive model is consistent with the observed public leaderboard score for the same model, we can declare victory and move on to better models and better data.
1. [Abbreviations & definitions](#abbreviations_definitions)
2. [Load & inspect](#load_inspect)
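
To make the success criterion concrete, here is a minimal sketch of one way to run the check: fit a normal distribution to the resampled SMAPE scores and ask how far out in the tails the observed leaderboard score falls. The normal approximation, the helper name `two_sided_tail_probability` and the example scores are all assumptions for illustration. They are not results or code from this notebook.

```python
from statistics import NormalDist

def two_sided_tail_probability(sample_scores, observed_score):
    """Fit a normal distribution to the resampled scores and return the
    two-sided probability of a score at least as far from the mean as
    the observed one. Hypothetical helper, assumes approximate normality."""
    fitted = NormalDist.from_samples(sample_scores)
    # Probability mass at or below the observed score
    lower_tail = fitted.cdf(observed_score)
    # Two-sided: double whichever tail is smaller
    return 2 * min(lower_tail, 1 - lower_tail)

# Made-up scores for illustration only, not output from the resampling procedure
example_scores = [0.9, 1.0, 1.1, 1.2, 1.3]
p_value = two_sided_tail_probability(example_scores, 1.09)
print(f'Two-sided tail probability: {p_value:.3f}')
```

A value that is not vanishingly small would count as 'consistent' in the sense used above. If the score distribution turns out to be clearly non-normal, an empirical percentile would be a safer choice than the fitted normal.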

<a name="abbreviations_definitions"></a>
### 1. Abbreviations & definitions
+ MBD: microbusiness density
+ MBC: microbusiness count
+ OLS: ordinary least squares
+ Model order: number of past timepoints used as input data for model training
+ Origin (forecast origin): last known point in the input data
+ Horizon (forecast horizon): number of future data points predicted by the model
+ SMAPE: symmetric mean absolute percentage error
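
The definitions above can be sketched in a few lines. This assumes the common percentage form of SMAPE, with a 0/0 term counted as zero error; the exact variant used for the leaderboard score is not confirmed here. The function names `smape` and `naive_forecast`, and the split into a model order of 4 and a horizon of 4, are illustrative assumptions. The MBD values are taken from the example block printed in section 2.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.
    Assumed form: mean of |F - A| / ((|A| + |F|) / 2), times 100."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    absolute_error = np.abs(forecast - actual)
    denominator = (np.abs(actual) + np.abs(forecast)) / 2
    # Where both values are zero the term is defined as zero error
    ratio = np.divide(
        absolute_error,
        denominator,
        out=np.zeros_like(absolute_error),
        where=denominator != 0
    )
    return 100 * ratio.mean()

def naive_forecast(input_values, horizon):
    """Carry the value at the forecast origin forward over the horizon."""
    return np.full(horizon, input_values[-1], dtype=float)

# MBD column of the example block from section 2
mbd = np.array([3.0076818, 2.8848701, 3.0558431, 2.9932332,
                2.9932332, 2.9690900, 2.9093256, 2.9332314])

model_order = 4  # illustrative choice, not necessarily the notebook's
horizon = 4      # illustrative choice, not necessarily the notebook's

input_values = mbd[:model_order]                   # origin is input_values[-1]
actual = mbd[model_order:model_order + horizon]
forecast = naive_forecast(input_values, horizon)

print(f'Forecast: {forecast}')
print(f'SMAPE: {smape(actual, forecast):.3f}')
```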

<a name="load_inspect"></a>
### 2. Load & inspect

In [2]:
# Add parent directory to path to allow import of config.py
import sys
sys.path.append('..')
import config as conf
import functions.data_manipulation_functions as data_funcs

import numpy as np
import pandas as pd
import multiprocessing as mp
from statistics import NormalDist

print(f'Python: {sys.version}')
print()
print(f'Numpy {np.__version__}')
print(f'Pandas {pd.__version__}')

Python: 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0]

Numpy 1.23.5
Pandas 1.4.3


In [3]:
# Load parsed data
block_size = 8

output_file = f'{conf.DATA_PATH}/parsed_data/structured_bootstrap_blocksize{block_size}.npy'
timepoints = np.load(output_file)

print(f'Timepoints shape: {timepoints.shape}')
print()
print('Column types:')

# Check the type of each value in the first row of the first block
for value in timepoints[0, 0, 0]:
    print(f'\t{type(value)}')

print()
print(f'Example block:\n{timepoints[0, 0]}')

Timepoints shape: (31, 3135, 8, 5)

Column types:
	<class 'numpy.float64'>
	<class 'numpy.float64'>
	<class 'numpy.float64'>
	<class 'numpy.float64'>
	<class 'numpy.float64'>

Example block:
[[1.0010000e+03 1.5646176e+18 3.0076818e+00 1.2490000e+03 0.0000000e+00]
 [1.0010000e+03 1.5672960e+18 2.8848701e+00 1.1980000e+03 1.0000000e+00]
 [1.0010000e+03 1.5698880e+18 3.0558431e+00 1.2690000e+03 2.0000000e+00]
 [1.0010000e+03 1.5725664e+18 2.9932332e+00 1.2430000e+03 3.0000000e+00]
 [1.0010000e+03 1.5751584e+18 2.9932332e+00 1.2430000e+03 4.0000000e+00]
 [1.0010000e+03 1.5778368e+18 2.9690900e+00 1.2420000e+03 5.0000000e+00]
 [1.0010000e+03 1.5805152e+18 2.9093256e+00 1.2170000e+03 6.0000000e+00]
 [1.0010000e+03 1.5830208e+18 2.9332314e+00 1.2270000e+03 7.0000000e+00]]
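
Since everything is stored as float64, the block is a little hard to read. Here is a rough sketch of how a block could be decoded into something friendlier. The column names are my guesses from the values (a county FIPS code, a nanosecond epoch timestamp, MBD, MBC and a within-block index) and are labeled as assumptions; nothing in this notebook confirms them. Only the first two rows of the example block are used.

```python
import numpy as np
import pandas as pd

# First two rows of the example block printed above
example_rows = np.array([
    [1.0010000e+03, 1.5646176e+18, 3.0076818e+00, 1.2490000e+03, 0.0],
    [1.0010000e+03, 1.5672960e+18, 2.8848701e+00, 1.1980000e+03, 1.0],
])

# Hypothetical column names, inferred from the values only
assumed_columns = ['cfips', 'timestamp_ns', 'mbd', 'mbc', 'timepoint_index']
block_df = pd.DataFrame(example_rows, columns=assumed_columns)

# Interpret the second column as nanoseconds since the Unix epoch (assumption)
block_df['date'] = pd.to_datetime(block_df['timestamp_ns'].astype('int64'), unit='ns')

# Integer-like columns read better as integers
block_df['cfips'] = block_df['cfips'].astype(int)
block_df['mbc'] = block_df['mbc'].astype(int)

print(block_df[['cfips', 'date', 'mbd', 'mbc']])
```

Under that reading, the timestamps land on the first day of consecutive months. The last two axes of the `(31, 3135, 8, 5)` array match `block_size = 8` and the five columns; what the first two axes index is not shown in this output.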
