# Prototype Benchmark
For each record to be scored, the county-specific model is read from external storage. Models are not cached in memory.

* `model_load_time` (seconds) is the time required to read a model file from external storage and deserialize the model object.
* `model_score_time` (seconds) is the time to score one record.

## Metrics reported in this notebook are from synthetic data and **have not** been calibrated to representative dataset or model sizes.

## Notebook run-time environment
* **Hardware:** MacBook Pro (Intel, 2019), 16 GB RAM, 1 TB SSD
* **OS:** macOS 11.6.1
* **Docker:** Docker Desktop 4.2.0 (Mac)
* **Docker Image:** Base image: `jupyter/datascience-notebook:lab-3.2.5` with ONNX packages added

In [1]:
import glob
import os
import shutil
import pandas as pd
import numpy as np
import onnxruntime as rt
import pickle

## Setup for tests

In [2]:
# required to allow import of project-specific utility functions
os.chdir('..')

In [3]:
# import project-specific utility functions
from utils.utils import BenchmarkDriver, load_config, actualsize_mb

In [4]:
# get configuration parameters
config = load_config('./config.yaml')
config

{'data_dir': '/Users/jim/Desktop/onnx_sandbox/data',
 'models_dir': '/Users/jim/Desktop/onnx_sandbox/models',
 'number_records': 100000,
 'number_features': 20,
 'number_informative': 14,
 'number_trees': 500,
 'fraction_for_test': 0.2,
 'number_counties': 20,
 'random_seed': 123}

In [5]:
TEST_DATA = os.path.join(config['data_dir'],'benchmark', 'test.parquet')
PERFORMANCE_DATA_DIR = os.path.join(config['data_dir'],'performance')
MODELS_DIR = os.path.join(config['models_dir'], 'benchmark500')


In [6]:
# setup directory to collect performance data
shutil.rmtree(PERFORMANCE_DATA_DIR, ignore_errors=True)
os.makedirs(PERFORMANCE_DATA_DIR, exist_ok=True)

## Test Design

### Model Training
In an offline process, 20 Random Forest (RF) models were trained on synthetic data generated by the `sklearn.datasets.make_regression()` function. Each data set has 20 explanatory variables (`X_00` to `X_19`) and a target variable called `y`. Apart from setting `random_state` to a known value and `n_estimators=500`, all hyper-parameters were left at their default values.
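The training setup described above can be sketched as follows. This is a minimal illustration, not the offline training script itself: the function name `train_county_model` is hypothetical, the sample size and tree count are reduced so the sketch runs quickly (the benchmark used 500 trees), and the ONNX export step is omitted because this notebook does not show which converter was used.

```python
import pickle

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Values taken from the config shown earlier in this notebook.
N_FEATURES = 20
N_INFORMATIVE = 14
RANDOM_SEED = 123


def train_county_model(n_samples=500, n_trees=10):
    """Train one RF model on synthetic regression data (hypothetical helper).

    n_samples and n_trees are kept small here for speed; they are
    assumptions of this sketch, not the benchmark settings.
    """
    X, y = make_regression(
        n_samples=n_samples,
        n_features=N_FEATURES,
        n_informative=N_INFORMATIVE,
        random_state=RANDOM_SEED,
    )
    # All other hyper-parameters keep their default values.
    model = RandomForestRegressor(n_estimators=n_trees, random_state=RANDOM_SEED)
    model.fit(X, y)
    return model, X, y


model, X, y = train_county_model()

# Serialize the model as it would be written to a .pkl file,
# then restore it to confirm the round trip preserves predictions.
payload = pickle.dumps(model)
restored = pickle.loads(payload)
```

Passing `n_trees=500` would match the tree count used for the benchmark models, at the cost of a longer fit and a much larger serialized payload.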

In [7]:
pd.read_parquet(TEST_DATA).head()

Unnamed: 0,county,X_00,X_01,X_02,X_03,X_04,X_05,X_06,X_07,X_08,...,X_11,X_12,X_13,X_14,X_15,X_16,X_17,X_18,X_19,y
0,cnty0005,-0.047924,0.876722,-1.69104,2.626353,0.663601,0.667906,0.707864,1.312462,1.974233,...,-0.736528,-0.6912,-0.903873,-1.193744,0.687999,-0.800075,-1.300539,-0.639801,0.175235,93.457985
1,cnty0010,-0.401803,-0.685433,-0.823452,-0.191975,-0.232961,0.288845,0.599367,-1.502481,-0.917875,...,1.726174,0.693494,1.264482,1.459226,-0.508734,0.324457,1.48368,-0.582111,-0.202034,202.375458
2,cnty0000,-0.699637,0.310477,-0.535438,-0.361965,0.234813,-0.303082,-0.433491,-1.283665,0.634701,...,0.599885,-0.93146,-1.013379,0.504252,-0.556672,0.119437,1.545638,-1.011144,-0.343707,-73.529205
3,cnty0012,-0.810314,0.612156,-0.563249,0.46172,0.393322,0.8736,-0.676884,0.017982,1.143867,...,-0.559798,-0.53823,0.14518,-0.487649,1.367196,0.176917,-1.886566,1.49764,0.867901,-93.428177
4,cnty0003,-0.063003,-0.254967,0.643265,2.217894,0.429902,1.054095,0.624055,1.037485,-0.754566,...,-0.610848,-0.195606,1.082975,-0.200524,-0.438538,-1.54788,-0.146857,1.458846,-0.724704,-7.727258


For each Random Forest model, two model files were saved: sklearn (`.pkl` file extension) and ONNX (`.onnx` file extension). Each file is identified by the `county` whose data was used to train the RF model. Here is a sample of the saved model files.

In [8]:
os.listdir(MODELS_DIR)[:10]

['cnty0000.onnx',
 'cnty0000.pkl',
 'cnty0001.onnx',
 'cnty0001.pkl',
 'cnty0002.onnx',
 'cnty0002.pkl',
 'cnty0003.onnx',
 'cnty0003.pkl',
 'cnty0004.onnx',
 'cnty0004.pkl']

### Model Scoring
For this test, 100 records were randomly selected from the test data. Each record was processed as follows:
```
# Process test batch
while there are input records:
    Read one record
    Based on 'county' value, load the RF model for that 'county'  
    Record time to load and make model usable
    Score the record
    Record time to score the record

# record collected metrics
Write collected run-time metrics to an external file.
```
No explicit caching of model objects was done in this test.
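The loop above can be sketched in plain Python as shown below. This is an illustration only and not the project's `BenchmarkDriver`: the names `load_model`, `score`, and `run_benchmark` are hypothetical, and the "model" is a stand-in dictionary of weights rather than a Random Forest, so that the sketch runs without any model files.

```python
import csv
import os
import pickle
import tempfile
import time

FIELDS = ["record_id", "county", "model_load_time", "model_score_time", "prediction"]


def load_model(models_dir, county):
    """Read and deserialize the model for one county; nothing is cached."""
    with open(os.path.join(models_dir, county + ".pkl"), "rb") as f:
        return pickle.load(f)


def score(model, features):
    """Stand-in scorer (an assumption of this sketch): a weighted sum."""
    return sum(w * x for w, x in zip(model["weights"], features))


def run_benchmark(records, models_dir, performance_fp):
    """Load the county model for every record, score it, and log timings."""
    rows = []
    for record in records:
        start = time.perf_counter()
        model = load_model(models_dir, record["county"])
        load_time = time.perf_counter() - start

        start = time.perf_counter()
        prediction = score(model, record["features"])
        score_time = time.perf_counter() - start

        rows.append({
            "record_id": record["record_id"],
            "county": record["county"],
            "model_load_time": load_time,
            "model_score_time": score_time,
            "prediction": prediction,
        })

    # Write collected run-time metrics to an external file.
    with open(performance_fp, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    # Create two toy "county" models on disk.
    for county, weights in {"cnty0000": [1.0, 2.0], "cnty0001": [0.5, -1.0]}.items():
        with open(os.path.join(tmp, county + ".pkl"), "wb") as f:
            pickle.dump({"weights": weights}, f)

    records = [
        {"record_id": 1, "county": "cnty0000", "features": [1.0, 1.0]},
        {"record_id": 2, "county": "cnty0001", "features": [2.0, 1.0]},
        {"record_id": 3, "county": "cnty0000", "features": [0.0, 2.0]},
    ]
    metrics = run_benchmark(records, tmp, os.path.join(tmp, "perf.csv"))
```

Records 1 and 3 share a county, yet the model file is read and deserialized both times. That repeated load is the cost this test is designed to measure.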

## Analysis of RF Tree Structure


In [9]:
# collect data on RF tree structure
tree_metrics = []
rf_models = glob.glob(os.path.join(MODELS_DIR, '*.pkl'))
for model in rf_models:
    # get file sizes
    fp_parts = os.path.splitext(model)
    metrics = {'model': os.path.basename(fp_parts[0])}
    metrics['sklearn_file_size_mb'] = os.path.getsize(model) / (1024 * 1024)
    metrics['onnx_file_size_mb'] = os.path.getsize(fp_parts[0] + '.onnx') / (1024 * 1024)
    
    # extract tree structure
    with open(model, 'rb') as f:
        rf = pickle.load(f)
    metrics['sklearn_in_memory_mb'] = actualsize_mb(rf)
    metrics['number_of_trees'] = len(rf.estimators_)
    tree_depth = [tree.tree_.max_depth for tree in rf.estimators_]
    metrics['tree_min_depth'] = np.min(tree_depth)
    metrics['tree_max_depth'] = np.max(tree_depth)
    metrics['tree_mean_depth'] = np.mean(tree_depth)
    
    # get onnx in memory size
    onnx_rf = rt.InferenceSession(fp_parts[0] + '.onnx')
    metrics['onnx_in_memory_mb'] = actualsize_mb(onnx_rf)
    
    del rf
    del onnx_rf
        
    # collect metrics
    tree_metrics.append(metrics)
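The `actualsize_mb` helper used above is a project utility whose implementation is not shown in this notebook. For readers unfamiliar with the idea, one common way to estimate the deep in-memory size of a Python object is to walk its references with `sys.getsizeof`. The sketch below uses a hypothetical name, `deep_size_mb`, and may differ from what `actualsize_mb` actually does.

```python
import sys


def _deep_size_bytes(obj, seen):
    """Sum sys.getsizeof over obj and the objects it references."""
    if id(obj) in seen:
        # Count shared objects only once.
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(
            _deep_size_bytes(k, seen) + _deep_size_bytes(v, seen)
            for k, v in obj.items()
        )
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(_deep_size_bytes(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        # Follow instance attributes of ordinary Python objects.
        size += _deep_size_bytes(vars(obj), seen)
    return size


def deep_size_mb(obj):
    """Rough estimate of the in-memory size of obj in MB (hypothetical helper)."""
    return _deep_size_bytes(obj, set()) / (1024 * 1024)


# A one-million-byte buffer referenced twice is counted once.
buffer = bytearray(1_000_000)
shared_mb = deep_size_mb([buffer, buffer])
```

An estimate of this kind only sees what each object reports through `sys.getsizeof`. Memory held inside native extensions may not be fully counted, so figures for wrapper objects around native runtimes should be read with caution.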

In [10]:
# overview of tree structure
tree_metrics_df = pd.DataFrame(tree_metrics)
tree_metrics_df

Unnamed: 0,model,sklearn_file_size_mb,onnx_file_size_mb,sklearn_in_memory_mb,number_of_trees,tree_min_depth,tree_max_depth,tree_mean_depth,onnx_in_memory_mb
0,cnty0000,156.392322,97.80603,55.0762,500,19,27,22.504,55.0822
1,cnty0001,154.341662,96.520756,55.0812,500,19,27,22.068,55.0829
2,cnty0002,152.485583,95.357236,55.0826,500,20,28,22.172,55.0842
3,cnty0003,156.238269,97.709489,55.0832,500,20,30,22.718,55.0848
4,cnty0004,152.140979,95.141204,55.0838,500,19,27,22.258,55.0855
5,cnty0005,155.043933,96.960824,55.0845,500,19,29,22.376,55.0866
6,cnty0006,152.87218,95.599707,55.0856,500,19,29,22.446,55.0874
7,cnty0007,152.600574,95.42942,55.0869,500,19,27,22.066,55.0886
8,cnty0008,155.510486,97.253286,55.0876,500,19,29,22.376,55.0892
9,cnty0009,152.87218,95.599447,55.0883,500,19,28,22.598,55.0904


In [11]:
tree_metrics_of_interest = ['sklearn_file_size_mb', 'onnx_file_size_mb', 'sklearn_in_memory_mb', 'onnx_in_memory_mb',
                           'number_of_trees', 'tree_min_depth', 'tree_mean_depth', 'tree_max_depth']
tree_metrics_df[tree_metrics_of_interest].describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
sklearn_file_size_mb,20.0,154.426062,2.117732,151.755725,152.804278,153.957141,155.539111,159.79845
onnx_file_size_mb,20.0,96.573636,1.327537,94.899602,95.556941,96.279709,97.271222,99.941508
sklearn_in_memory_mb,20.0,55.088695,0.005801,55.0762,55.084325,55.08885,55.09235,55.0977
onnx_in_memory_mb,20.0,55.090725,0.005452,55.0822,55.086325,55.09075,55.094525,55.0993
number_of_trees,20.0,500.0,0.0,500.0,500.0,500.0,500.0,500.0
tree_min_depth,20.0,19.3,0.470162,19.0,19.0,19.0,20.0,20.0
tree_mean_depth,20.0,22.4,0.222541,22.066,22.2365,22.385,22.5305,22.828
tree_max_depth,20.0,28.2,1.151658,27.0,27.0,28.0,29.0,31.0


## Model Scoring Test

### Get data for test

In [12]:
test_df = pd.read_parquet(TEST_DATA)
test_df = test_df.sample(n=100, random_state=config['random_seed'])
test_df.reset_index(inplace=True)
test_df.shape

(100, 23)

In [13]:
test_df[['index','county', 'y']].head(10)

Unnamed: 0,index,county,y
0,12136,cnty0015,-31.640778
1,16812,cnty0004,123.967995
2,10072,cnty0011,-59.197975
3,5850,cnty0010,-154.407074
4,4320,cnty0014,5.997156
5,4429,cnty0003,37.982521
6,11744,cnty0011,-19.720573
7,12106,cnty0011,271.501892
8,4230,cnty0019,140.779358
9,16858,cnty0002,45.374973


In [14]:
metrics_of_interest = ['model_memory_size_mb', 'model_load_time', 'model_score_time', 'model_prediction_time']

### sklearn Model Scoring

In [15]:
%%time
sklearn_driver = BenchmarkDriver(
    model_type='sklearn',
    models_dir=MODELS_DIR,
    performance_fp=os.path.join(PERFORMANCE_DATA_DIR, 'sklearn_benchmark.csv'),
    test_scenario='county-level'
)

# iterate over each row and collect run-time performance statistics
for idx, row in test_df.iterrows():
    one_record = pd.DataFrame(row).T
    sklearn_driver.score_one_record(row['county'], row['index'],
            one_record.drop(['index', 'county', 'y'], axis='columns'))
    
sklearn_driver.close_performance_data()

CPU times: user 38.2 s, sys: 7.14 s, total: 45.3 s
Wall time: 44.1 s


In [16]:
# display collected performance metrics
sklearn_metrics_df = pd.read_csv(os.path.join(PERFORMANCE_DATA_DIR, 'sklearn_benchmark.csv'))
sklearn_metrics_df['model_prediction_time'] = sklearn_metrics_df['model_load_time'] + sklearn_metrics_df['model_score_time']
sklearn_metrics_df.head(10)

Unnamed: 0,scenario,record_id,model_type,county_id,model_memory_size_mb,model_load_time,model_score_time,prediction,model_prediction_time
0,county-level,12136,sklearn,cnty0015,55.3205,0.089932,0.106137,-110.873899,0.19607
1,county-level,16812,sklearn,cnty0004,55.3504,0.094985,0.105705,167.356637,0.200691
2,county-level,10072,sklearn,cnty0011,55.3534,0.154069,0.089972,-26.419206,0.244041
3,county-level,5850,sklearn,cnty0010,55.3585,0.157792,0.115116,-118.903737,0.272908
4,county-level,4320,sklearn,cnty0014,55.3601,0.079035,0.095368,61.026405,0.174403
5,county-level,4429,sklearn,cnty0003,55.3606,0.074138,0.103399,41.992953,0.177537
6,county-level,11744,sklearn,cnty0011,55.3608,0.053839,0.094309,-20.34258,0.148148
7,county-level,12106,sklearn,cnty0011,55.3609,0.051595,0.101851,133.894013,0.153446
8,county-level,4230,sklearn,cnty0019,55.3615,0.06116,0.098168,99.426442,0.159328
9,county-level,16858,sklearn,cnty0002,55.3618,0.088466,0.098616,72.667956,0.187082


In [17]:
sklearn_metrics_df[metrics_of_interest].describe(percentiles=[.25, .5, .75, .9]).T

Unnamed: 0,count,mean,std,min,25%,50%,75%,90%,max
model_memory_size_mb,100.0,55.379321,0.012139,55.3205,55.37235,55.3839,55.38615,55.39224,55.3948
model_load_time,100.0,0.064808,0.022545,0.049616,0.052265,0.055492,0.06999,0.083372,0.169208
model_score_time,100.0,0.100649,0.007975,0.085623,0.097569,0.099843,0.103151,0.105931,0.165059
model_prediction_time,100.0,0.165457,0.023696,0.140118,0.151781,0.157044,0.173987,0.187333,0.272908


### ONNX Model Scoring

In [18]:
%%time
onnx_driver = BenchmarkDriver(
    model_type='onnx',
    models_dir=MODELS_DIR,
    performance_fp=os.path.join(PERFORMANCE_DATA_DIR, 'onnx_benchmark.csv'),
    test_scenario='county-level'
)

# iterate over each row and collect run-time performance statistics
for idx, row in test_df.iterrows():
    one_record = pd.DataFrame(row).T
    onnx_driver.score_one_record(row['county'], row['index'],
            one_record.drop(['index', 'county', 'y'], axis='columns').astype(np.float32).to_numpy())
    
onnx_driver.close_performance_data()

CPU times: user 5min 5s, sys: 7.43 s, total: 5min 12s
Wall time: 5min 6s


In [19]:
# display collected performance metrics
onnx_metrics_df = pd.read_csv(os.path.join(PERFORMANCE_DATA_DIR, 'onnx_benchmark.csv'))
onnx_metrics_df['model_prediction_time'] = onnx_metrics_df['model_load_time'] + onnx_metrics_df['model_score_time']
onnx_metrics_df.head(10)

Unnamed: 0,scenario,record_id,model_type,county_id,model_memory_size_mb,model_load_time,model_score_time,prediction,model_prediction_time
0,county-level,12136,onnx,cnty0015,55.1071,2.742688,0.000687,-110.873856,2.743375
1,county-level,16812,onnx,cnty0004,55.1073,2.514581,0.000499,167.356613,2.51508
2,county-level,10072,onnx,cnty0011,55.1075,2.596117,0.000427,-26.419207,2.596545
3,county-level,5850,onnx,cnty0010,55.1077,2.666501,0.000442,-118.903809,2.666943
4,county-level,4320,onnx,cnty0014,55.1082,2.510002,0.00046,61.026405,2.510462
5,county-level,4429,onnx,cnty0003,55.1084,2.588674,0.000469,41.992981,2.589144
6,county-level,11744,onnx,cnty0011,55.1085,2.537812,0.000479,-20.342573,2.538291
7,county-level,12106,onnx,cnty0011,55.1086,2.567874,0.00044,133.893936,2.568314
8,county-level,4230,onnx,cnty0019,55.1088,2.603316,0.000495,99.426453,2.603811
9,county-level,16858,onnx,cnty0002,55.109,2.511119,0.00045,72.667938,2.511569


In [20]:
onnx_metrics_df[metrics_of_interest].describe(percentiles=[.25, .5, .75, .9]).T

Unnamed: 0,count,mean,std,min,25%,50%,75%,90%,max
model_memory_size_mb,100.0,55.121673,0.009319,55.1071,55.11185,55.12465,55.12955,55.13283,55.1352
model_load_time,100.0,2.615942,0.10668,2.498341,2.550918,2.593923,2.653663,2.705412,3.168388
model_score_time,100.0,0.000478,6.8e-05,0.000362,0.00045,0.000468,0.00049,0.000504,0.00091
model_prediction_time,100.0,2.616421,0.106684,2.498791,2.551395,2.594531,2.654112,2.705901,3.168933


### Differences in sklearn vs onnx predictions

In [21]:
differences = np.abs(sklearn_metrics_df['prediction'] - onnx_metrics_df['prediction'])
differences.describe()

count    1.000000e+02
mean     3.724963e-05
std      4.179030e-05
min      3.367662e-09
25%      9.410888e-06
50%      2.558222e-05
75%      5.249660e-05
max      2.576325e-04
Name: prediction, dtype: float64