# Prototype Benchmark
For each record to be scored, the county-specific model is read from external storage. Models are not cached in memory.

* `model_load_time` (seconds) is the time required to read a model file from external storage and deserialize the model object.
* `model_score_time` (seconds) is the time to score one record.
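
The two timings defined above can be illustrated with a minimal sketch. It is not the project's `BenchmarkDriver` implementation, which is not shown in this notebook. The helper names (`timed`, `load_model`, `score`) and the dictionary used as a stand-in model are assumptions made only for illustration.

```python
import os
import pickle
import tempfile
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def load_model(path):
    # Read the file from storage and deserialize it in one step,
    # so both costs are included in the measured load time.
    with open(path, "rb") as f:
        return pickle.load(f)


def score(model, record):
    # Hypothetical stand-in for model.predict(): intercept plus dot product.
    return model["intercept"] + sum(c * x for c, x in zip(model["coef"], record))


with tempfile.TemporaryDirectory() as tmp:
    # Write a stand-in model file (assumption: a real run would use an RF model).
    model_fp = os.path.join(tmp, "cnty0000.pkl")
    with open(model_fp, "wb") as f:
        pickle.dump({"coef": [1.0, 2.0], "intercept": 0.5}, f)

    model, model_load_time = timed(load_model, model_fp)
    prediction, model_score_time = timed(score, model, [3.0, 4.0])

print(prediction, model_load_time, model_score_time)
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock intended for measuring short durations.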

## Metrics reported in this notebook are from synthetic data and **have not** been calibrated to representative dataset or model sizes.

## Notebook run-time environment
* **Hardware:** MacBook Pro (Intel, 2019), 16 GB RAM, 1 TB SSD
* **OS:** macOS 11.6.1
* **Docker:** Docker Desktop 4.2.0 (Mac)
* **Docker Image:** Base image: `jupyter/datascience-notebook:lab-3.2.5` with ONNX packages added

In [1]:
import glob
import os
import shutil
import pandas as pd
import numpy as np
import onnxruntime as rt
import pickle

## Setup for tests

In [2]:
# required to allow import of project-specific utility functions
os.chdir('..')

In [3]:
# import project-specific utility functions
from utils.utils import BenchmarkDriver, load_config, actualsize_mb

In [4]:
# get configuration parameters
config = load_config('./config.yaml')
config

{'data_dir': '/Users/jim/Desktop/onnx_sandbox/data',
 'models_dir': '/Users/jim/Desktop/onnx_sandbox/models',
 'number_records': 100000,
 'number_features': 20,
 'number_informative': 14,
 'fraction_for_test': 0.2,
 'number_counties': 20,
 'random_seed': 123}

In [5]:
TEST_DATA = os.path.join(config['data_dir'],'benchmark', 'test.parquet')
PERFORMANCE_DATA_DIR = os.path.join(config['data_dir'],'performance')
MODELS_DIR = os.path.join(config['models_dir'], 'benchmark')


In [6]:
# setup directory to collect performance data
shutil.rmtree(PERFORMANCE_DATA_DIR, ignore_errors=True)
os.makedirs(PERFORMANCE_DATA_DIR, exist_ok=True)

## Test Design

### Model Training
In an offline process, 20 Random Forest (RF) models were trained on synthetic data generated by `sklearn.datasets.make_regression()`. Each data set has 20 explanatory variables (`X_00` to `X_19`) and a target variable called `y`. Except for `random_state`, which was set to a known value, all hyper-parameters were left at their default values.
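
The offline training code is not part of this notebook. The following is a minimal sketch of what training a single model could look like under the description above. The sample count is reduced to keep the example fast, and how records were assigned to counties is not stated in the source, so only one data set is generated here.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

N_FEATURES = 20      # matches 'number_features' in the configuration
N_INFORMATIVE = 14   # matches 'number_informative' in the configuration
RANDOM_SEED = 123    # matches 'random_seed' in the configuration

# Assumption: 500 samples is an illustrative size, not the benchmark's.
X, y = make_regression(
    n_samples=500,
    n_features=N_FEATURES,
    n_informative=N_INFORMATIVE,
    random_state=RANDOM_SEED,
)

# Name the columns X_00 ... X_19 and add the target as 'y'.
feature_names = [f"X_{i:02d}" for i in range(N_FEATURES)]
train_df = pd.DataFrame(X, columns=feature_names)
train_df["y"] = y

# Only random_state is set; every other hyper-parameter keeps its default.
rf = RandomForestRegressor(random_state=RANDOM_SEED)
rf.fit(train_df[feature_names], train_df["y"])

print(len(rf.estimators_))
```

With default settings in recent scikit-learn releases, the forest contains 100 trees, which is consistent with the `number_of_trees` column reported later in this notebook.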

In [7]:
pd.read_parquet(TEST_DATA).head()

Unnamed: 0,county,X_00,X_01,X_02,X_03,X_04,X_05,X_06,X_07,X_08,...,X_11,X_12,X_13,X_14,X_15,X_16,X_17,X_18,X_19,y
0,cnty0005,-0.047924,0.876722,-1.69104,2.626353,0.663601,0.667906,0.707864,1.312462,1.974233,...,-0.736528,-0.6912,-0.903873,-1.193744,0.687999,-0.800075,-1.300539,-0.639801,0.175235,93.457985
1,cnty0010,-0.401803,-0.685433,-0.823452,-0.191975,-0.232961,0.288845,0.599367,-1.502481,-0.917875,...,1.726174,0.693494,1.264482,1.459226,-0.508734,0.324457,1.48368,-0.582111,-0.202034,202.375458
2,cnty0000,-0.699637,0.310477,-0.535438,-0.361965,0.234813,-0.303082,-0.433491,-1.283665,0.634701,...,0.599885,-0.93146,-1.013379,0.504252,-0.556672,0.119437,1.545638,-1.011144,-0.343707,-73.529205
3,cnty0012,-0.810314,0.612156,-0.563249,0.46172,0.393322,0.8736,-0.676884,0.017982,1.143867,...,-0.559798,-0.53823,0.14518,-0.487649,1.367196,0.176917,-1.886566,1.49764,0.867901,-93.428177
4,cnty0003,-0.063003,-0.254967,0.643265,2.217894,0.429902,1.054095,0.624055,1.037485,-0.754566,...,-0.610848,-0.195606,1.082975,-0.200524,-0.438538,-1.54788,-0.146857,1.458846,-0.724704,-7.727258


For each Random Forest model, two model files were saved: sklearn (`.pkl` file extension) and ONNX (`.onnx` file extension). Each file is identified by the `county` whose data was used to train the RF model. Here is a sample of the saved model files.

In [8]:
os.listdir(MODELS_DIR)[:10]

['cnty0000.onnx',
 'cnty0000.pkl',
 'cnty0001.onnx',
 'cnty0001.pkl',
 'cnty0002.onnx',
 'cnty0002.pkl',
 'cnty0003.onnx',
 'cnty0003.pkl',
 'cnty0004.onnx',
 'cnty0004.pkl']

### Model Scoring
For this test, 100 records were selected at random from the test data. For each record, the following steps were performed:
```
# Process test batch
while there are input records:
    Read one record
    Based on 'county' value, load the RF model for that 'county'  
    Record time to load and make model useable
    Score the record
    Record time to score the record

# record collected metrics
Write collected run-time metrics to an external file.
```
No explicit caching of model objects was done in this test.
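
The steps above can be sketched in Python as follows. This is an illustrative loop, not the project's `BenchmarkDriver`. The function name `score_batch`, the record layout, and the dictionary-based stand-in models are all assumptions. The output column names follow the metrics files shown later in this notebook.

```python
import csv
import os
import pickle
import tempfile
import time

FIELDS = ["record_id", "county_id", "model_load_time", "model_score_time", "prediction"]


def score_batch(records, models_dir, performance_fp):
    """Score records one at a time, reloading the county model for each one."""
    rows = []
    for record in records:
        # Load the county-specific model; nothing is kept between records.
        start = time.perf_counter()
        with open(os.path.join(models_dir, record["county"] + ".pkl"), "rb") as f:
            model = pickle.load(f)
        load_time = time.perf_counter() - start

        # Score the single record (stand-in for model.predict()).
        start = time.perf_counter()
        prediction = model["intercept"] + sum(
            c * x for c, x in zip(model["coef"], record["features"])
        )
        score_time = time.perf_counter() - start

        rows.append({
            "record_id": record["record_id"],
            "county_id": record["county"],
            "model_load_time": load_time,
            "model_score_time": score_time,
            "prediction": prediction,
        })

    # Write the collected run-time metrics to an external file.
    with open(performance_fp, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in models for two counties.
    for county, intercept in [("cnty0000", 1.0), ("cnty0001", 2.0)]:
        with open(os.path.join(tmp, county + ".pkl"), "wb") as f:
            pickle.dump({"coef": [1.0, 1.0], "intercept": intercept}, f)

    records = [
        {"record_id": 0, "county": "cnty0001", "features": [1.0, 2.0]},
        {"record_id": 1, "county": "cnty0000", "features": [3.0, 4.0]},
        {"record_id": 2, "county": "cnty0001", "features": [5.0, 6.0]},
    ]
    metrics = score_batch(records, tmp, os.path.join(tmp, "benchmark.csv"))
```

Note that `cnty0001` is loaded twice in this example, which mirrors the no-caching design of the test.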

## Analysis of RF Tree Structure


In [9]:
# collect data on RF tree structure
tree_metrics = []
rf_models = glob.glob(os.path.join(MODELS_DIR, '*.pkl'))
for model in rf_models:
    # get file sizes
    fp_parts = os.path.splitext(model)
    metrics = {'model': os.path.basename(fp_parts[0])}
    metrics['sklearn_file_size_mb'] = os.path.getsize(model) / (1024 * 1024)
    metrics['onnx_file_size_mb'] = os.path.getsize(fp_parts[0] + '.onnx') / (1024 * 1024)
    
    # extract tree structure
    with open(model, 'rb') as f:
        rf = pickle.load(f)
    metrics['sklearn_in_memory_mb'] = actualsize_mb(rf)
    metrics['number_of_trees'] = len(rf.estimators_)
    tree_depth = [tree.tree_.max_depth for tree in rf.estimators_]
    metrics['tree_min_depth'] = np.min(tree_depth)
    metrics['tree_max_depth'] = np.max(tree_depth)
    metrics['tree_mean_depth'] = np.mean(tree_depth)
    
    # get onnx in memory size
    onnx_rf = rt.InferenceSession(fp_parts[0] + '.onnx')
    metrics['onnx_in_memory_mb'] = actualsize_mb(onnx_rf)
    
    del rf
    del onnx_rf
        
    # collect metrics
    tree_metrics.append(metrics)

In [10]:
# overview of tree structure
tree_metrics_df = pd.DataFrame(tree_metrics)
tree_metrics_df

Unnamed: 0,model,sklearn_file_size_mb,onnx_file_size_mb,sklearn_in_memory_mb,number_of_trees,tree_min_depth,tree_max_depth,tree_mean_depth,onnx_in_memory_mb
0,cnty0000,31.296801,19.027223,54.7428,100,20,25,22.52,54.7457
1,cnty0001,30.882494,18.774771,54.7447,100,20,27,21.94,54.7463
2,cnty0002,30.513475,18.54988,54.7453,100,20,26,22.24,54.7469
3,cnty0003,31.265795,19.008325,54.7459,100,20,30,22.79,54.7475
4,cnty0004,30.445482,18.508421,54.7465,100,20,27,22.15,54.7481
5,cnty0005,31.025072,18.861608,54.7472,100,20,27,22.35,54.7492
6,cnty0006,30.592455,18.598022,54.7482,100,20,27,22.32,54.75
7,cnty0007,30.534838,18.562901,54.7496,100,19,25,21.91,54.7512
8,cnty0008,31.118578,18.918609,54.7502,100,19,29,22.42,54.7518
9,cnty0009,30.592455,18.597977,54.7508,100,19,27,22.53,54.7529


In [11]:
tree_metrics_of_interest = ['sklearn_file_size_mb', 'onnx_file_size_mb', 'sklearn_in_memory_mb', 'onnx_in_memory_mb',
                           'number_of_trees', 'tree_min_depth', 'tree_mean_depth', 'tree_max_depth']
tree_metrics_df[tree_metrics_of_interest].describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
sklearn_file_size_mb,20.0,30.897905,0.415385,30.369677,30.578051,30.805956,31.124498,31.940233
onnx_file_size_mb,20.0,18.784136,0.253127,18.462207,18.589208,18.728096,18.922213,19.419286
sklearn_in_memory_mb,20.0,54.751805,0.00551,54.7428,54.747025,54.7517,54.75545,54.7606
onnx_in_memory_mb,20.0,54.753645,0.005489,54.7457,54.748925,54.7536,54.757625,54.7622
number_of_trees,20.0,100.0,0.0,100.0,100.0,100.0,100.0,100.0
tree_min_depth,20.0,19.8,0.410391,19.0,20.0,20.0,20.0,20.0
tree_mean_depth,20.0,22.3625,0.284788,21.84,22.2175,22.37,22.53,22.89
tree_max_depth,20.0,27.0,1.65434,25.0,26.0,27.0,27.25,31.0


## Model Scoring Test

### Get data for test

In [12]:
test_df = pd.read_parquet(TEST_DATA)
test_df = test_df.sample(n=100, random_state=config['random_seed'])
test_df.reset_index(inplace=True)
test_df.shape

(100, 23)

In [13]:
test_df[['index','county', 'y']].head(10)

Unnamed: 0,index,county,y
0,12136,cnty0015,-31.640778
1,16812,cnty0004,123.967995
2,10072,cnty0011,-59.197975
3,5850,cnty0010,-154.407074
4,4320,cnty0014,5.997156
5,4429,cnty0003,37.982521
6,11744,cnty0011,-19.720573
7,12106,cnty0011,271.501892
8,4230,cnty0019,140.779358
9,16858,cnty0002,45.374973


In [14]:
metrics_of_interest = ['model_memory_size_mb', 'model_load_time', 'model_score_time', 'model_prediction_time']

### sklearn Model Scoring

In [15]:
%%time
sklearn_driver = BenchmarkDriver(
    model_type='sklearn',
    models_dir=MODELS_DIR,
    performance_fp=os.path.join(PERFORMANCE_DATA_DIR, 'sklearn_benchmark.csv'),
    test_scenario='county-level'
)

# iterate over each row and collect run-time performance statistics
for idx, row in test_df.iterrows():
    one_record = pd.DataFrame(row).T
    sklearn_driver.score_one_record(row['county'], row['index'],
            one_record.drop(['index', 'county', 'y'], axis='columns'))
    
sklearn_driver.close_performance_data()

CPU times: user 29.9 s, sys: 1.53 s, total: 31.5 s
Wall time: 31.4 s


In [16]:
# display collected performance metrics
sklearn_metrics_df = pd.read_csv(os.path.join(PERFORMANCE_DATA_DIR, 'sklearn_benchmark.csv'))
sklearn_metrics_df['model_prediction_time'] = sklearn_metrics_df['model_load_time'] + sklearn_metrics_df['model_score_time']
sklearn_metrics_df.head(10)

Unnamed: 0,scenario,record_id,model_type,county_id,model_memory_size_mb,model_load_time,model_score_time,prediction,model_prediction_time
0,county-level,12136,sklearn,cnty0015,54.9811,0.014623,0.023195,-116.880853,0.037818
1,county-level,16812,sklearn,cnty0004,55.0106,0.010837,0.023014,153.039992,0.033851
2,county-level,10072,sklearn,cnty0011,55.0119,0.011123,0.022389,-30.62064,0.033512
3,county-level,5850,sklearn,cnty0010,55.0124,0.011134,0.022604,-89.091659,0.033738
4,county-level,4320,sklearn,cnty0014,55.014,0.013166,0.021652,79.284982,0.034818
5,county-level,4429,sklearn,cnty0003,55.0146,0.013734,0.024009,31.401549,0.037743
6,county-level,11744,sklearn,cnty0011,55.0156,0.015233,0.02241,-5.722466,0.037642
7,county-level,12106,sklearn,cnty0011,55.0158,0.01334,0.021046,139.301654,0.034387
8,county-level,4230,sklearn,cnty0019,55.0166,0.013924,0.022233,103.776609,0.036157
9,county-level,16858,sklearn,cnty0002,55.0172,0.014255,0.021148,72.877299,0.035403


In [17]:
sklearn_metrics_df[metrics_of_interest].describe(percentiles=[.25, .5, .75, .9]).T

Unnamed: 0,count,mean,std,min,25%,50%,75%,90%,max
model_memory_size_mb,100.0,55.039222,0.012878,54.9811,55.03275,55.0444,55.046825,55.05251,55.0546
model_load_time,100.0,0.012125,0.002055,0.01039,0.010934,0.01133,0.012521,0.014292,0.023453
model_score_time,100.0,0.022234,0.001567,0.019903,0.021357,0.021953,0.02269,0.024013,0.031575
model_prediction_time,100.0,0.034358,0.00272,0.031009,0.032746,0.033592,0.035108,0.037706,0.047701


### ONNX Model Scoring

In [18]:
%%time
onnx_driver = BenchmarkDriver(
    model_type='onnx',
    models_dir=MODELS_DIR,
    performance_fp=os.path.join(PERFORMANCE_DATA_DIR, 'onnx_benchmark.csv'),
    test_scenario='county-level'
)

# iterate over each row and collect run-time performance statistics
for idx, row in test_df.iterrows():
    one_record = pd.DataFrame(row).T
    onnx_driver.score_one_record(row['county'], row['index'],
            one_record.drop(['index', 'county', 'y'], axis='columns').astype(np.float32).to_numpy())
    
onnx_driver.close_performance_data()

CPU times: user 1min 20s, sys: 1.8 s, total: 1min 21s
Wall time: 1min 17s


In [19]:
# display collected performance metrics
onnx_metrics_df = pd.read_csv(os.path.join(PERFORMANCE_DATA_DIR, 'onnx_benchmark.csv'))
onnx_metrics_df['model_prediction_time'] = onnx_metrics_df['model_load_time'] + onnx_metrics_df['model_score_time']
onnx_metrics_df.head(10)

Unnamed: 0,scenario,record_id,model_type,county_id,model_memory_size_mb,model_load_time,model_score_time,prediction,model_prediction_time
0,county-level,12136,onnx,cnty0015,55.1122,0.509519,0.000202,-116.880859,0.509721
1,county-level,16812,onnx,cnty0004,55.1124,0.461644,0.000213,153.039993,0.461857
2,county-level,10072,onnx,cnty0011,55.1126,0.480536,0.000236,-30.620642,0.480772
3,county-level,5850,onnx,cnty0010,55.1127,0.449586,0.000219,-89.091667,0.449805
4,county-level,4320,onnx,cnty0014,55.1132,0.459116,0.000245,79.284981,0.459361
5,county-level,4429,onnx,cnty0003,55.1134,0.462147,0.0002,31.401548,0.462346
6,county-level,11744,onnx,cnty0011,55.1135,0.459098,0.00019,-5.722466,0.459288
7,county-level,12106,onnx,cnty0011,55.1137,0.44655,0.000194,139.301636,0.446744
8,county-level,4230,onnx,cnty0019,55.1139,0.482862,0.000233,103.776588,0.483095
9,county-level,16858,onnx,cnty0002,55.114,0.468972,0.000223,72.877312,0.469195


In [20]:
onnx_metrics_df[metrics_of_interest].describe(percentiles=[.25, .5, .75, .9]).T

Unnamed: 0,count,mean,std,min,25%,50%,75%,90%,max
model_memory_size_mb,100.0,55.12492,0.00798,55.1122,55.11695,55.1256,55.132625,55.13483,55.1365
model_load_time,100.0,0.46496,0.013159,0.44218,0.455571,0.463533,0.471368,0.482506,0.509519
model_score_time,100.0,0.000209,1.8e-05,0.000168,0.000199,0.000207,0.000219,0.00023,0.000287
model_prediction_time,100.0,0.465169,0.013158,0.442382,0.455764,0.463753,0.471577,0.482714,0.509721
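
The mean timings in the two summary tables can be compared with some simple arithmetic. The sketch below uses only the reported means. The break-even figure additionally assumes that a loaded model could be reused for several records, which this benchmark does not do, so it should be read as a rough indication rather than a measured result.

```python
import math

# Mean timings in seconds, copied from the sklearn and ONNX summary tables.
sklearn_load, sklearn_score = 0.012125, 0.022234
onnx_load, onnx_score = 0.464960, 0.000209

# Per-record prediction time when the model is reloaded for every record.
sklearn_total = sklearn_load + sklearn_score
onnx_total = onnx_load + onnx_score

total_ratio = onnx_total / sklearn_total    # how much slower ONNX is end to end
load_ratio = onnx_load / sklearn_load       # how much slower ONNX loading is
score_speedup = sklearn_score / onnx_score  # how much faster ONNX scoring is

# Hypothetical reuse scenario: one load followed by n scored records.
# ONNX is faster once n exceeds (onnx_load - sklearn_load) / (sklearn_score - onnx_score).
break_even = (onnx_load - sklearn_load) / (sklearn_score - onnx_score)

print(f"end-to-end ratio (onnx / sklearn): {total_ratio:.1f}")
print(f"load ratio (onnx / sklearn):       {load_ratio:.1f}")
print(f"score speed-up (sklearn / onnx):   {score_speedup:.1f}")
print(f"records per load to break even:    {math.ceil(break_even)}")
```

On these means, ONNX scoring is roughly two orders of magnitude faster per record, but the session load time dominates when a model is loaded for every record.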


### Differences in sklearn vs onnx predictions

In [21]:
differences = np.abs(sklearn_metrics_df['prediction'] - onnx_metrics_df['prediction'])
differences.describe()

count    1.000000e+02
mean     1.306674e-05
std      1.780947e-05
min      1.621246e-07
25%      1.949072e-06
50%      6.562620e-06
75%      1.712799e-05
max      1.052976e-04
Name: prediction, dtype: float64
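
The comparison above subtracts the two `prediction` columns by row position, which works because both metrics files were written in the same record order. A slightly more defensive variant, sketched below, aligns the rows on `record_id` before comparing. The small frames are built from the first three rows of the two metrics tables, and the `1e-3` tolerance is an assumption chosen for illustration. The ONNX inputs were cast to `float32` before scoring, which plausibly accounts for differences of this magnitude.

```python
import numpy as np
import pandas as pd

# First three rows of each metrics table, reduced to the columns needed here.
sklearn_df = pd.DataFrame({
    "record_id": [12136, 16812, 10072],
    "prediction": [-116.880853, 153.039992, -30.620640],
})
onnx_df = pd.DataFrame({
    "record_id": [10072, 12136, 16812],  # deliberately in a different order
    "prediction": [-30.620642, -116.880859, 153.039993],
})

# Align on record_id; validate guards against duplicated record ids.
merged = sklearn_df.merge(
    onnx_df,
    on="record_id",
    suffixes=("_sklearn", "_onnx"),
    validate="one_to_one",
)

abs_diff = (merged["prediction_sklearn"] - merged["prediction_onnx"]).abs()

# Assumption: an absolute tolerance of 1e-3 is acceptable for this use case.
within_tolerance = np.allclose(
    merged["prediction_sklearn"], merged["prediction_onnx"], rtol=0.0, atol=1e-3
)

print(abs_diff.max(), within_tolerance)
```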