# Prototype Benchmark
For each record to be scored, the county-specific model is read from external storage. Models are not cached in memory.

* `model_load_time` (msec) is the time required to read a model file from external storage and deserialize the model object.
* `model_score_time` (msec) is the time to score one record.
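
The two timing metrics above can be illustrated with a minimal sketch. It is not the project's `BenchmarkDriver`: the `timed_load_and_score` helper, the dict-based stand-in model, and the temporary file are all hypothetical, chosen so the example runs with the standard library only.

```python
import os
import pickle
import tempfile
import time


def timed_load_and_score(model_path, record):
    """Return (prediction, load_ms, score_ms) for one record, without caching."""
    # model_load_time: read the file from storage and deserialize the object
    start = time.perf_counter()
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    load_ms = (time.perf_counter() - start) * 1000.0

    # model_score_time: score exactly one record
    # (a dot product stands in for the real model's predict call)
    start = time.perf_counter()
    prediction = sum(w * x for w, x in zip(model["weights"], record))
    score_ms = (time.perf_counter() - start) * 1000.0
    return prediction, load_ms, score_ms


with tempfile.TemporaryDirectory() as tmp_dir:
    # hypothetical model file; the real files are sklearn pickles or ONNX graphs
    path = os.path.join(tmp_dir, "cnty0000.pkl")
    with open(path, "wb") as f:
        pickle.dump({"weights": [0.5, -1.0, 2.0]}, f)

    prediction, load_ms, score_ms = timed_load_and_score(path, [1.0, 2.0, 3.0])
    print(f"prediction={prediction} load={load_ms:.3f} ms score={score_ms:.3f} ms")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock, which suits sub-millisecond intervals better than wall-clock time.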

## Metrics reported in this notebook are from synthetic data and **have not** been calibrated to representative dataset or model sizes.

## Notebook run-time environment
* **Hardware:** MacBook Pro (Intel, 2019), 16 GB RAM, 1 TB SSD
* **OS:** macOS 11.6.1
* **Docker:** Docker Desktop 4.2.0 (Mac)
* **Docker Image:** Base image: `jupyter/datascience-notebook:lab-3.2.5` with ONNX packages added

In [1]:
import glob
import os
import shutil
import pandas as pd
import numpy as np
import onnxruntime as rt
import pickle

## Setup for tests

In [2]:
# required to allow import of project-specific utility functions
os.chdir('..')

In [3]:
# import project-specific utility functions
from utils.utils import BenchmarkDriver, load_config, actualsize_mb

In [4]:
# get configuration parameters
config = load_config('./config.yaml')
config

{'data_dir': '/Users/jim/Desktop/onnx_sandbox/data',
 'models_dir': '/Users/jim/Desktop/onnx_sandbox/models',
 'number_records': 100000,
 'number_features': 20,
 'number_informative': 14,
 'number_trees': 500,
 'fraction_for_test': 0.2,
 'number_counties': 20,
 'random_seed': 123}

In [5]:
TEST_DATA = os.path.join(config['data_dir'],'benchmark', 'test.parquet')
PERFORMANCE_DATA_DIR = os.path.join(config['data_dir'],'performance','testbed')
MODELS_DIR = os.path.join(config['models_dir'], 'benchmark')


In [6]:
# setup directory to collect performance data
shutil.rmtree(PERFORMANCE_DATA_DIR, ignore_errors=True)
os.makedirs(PERFORMANCE_DATA_DIR, exist_ok=True)

## Test Design

### Model Training
In an offline process, 20 Random Forest (RF) models were trained on synthetic data generated by the `sklearn.datasets.make_regression()` function. Each data set has 20 explanatory variables (`X_00` to `X_19`) and a target variable called `y`. Except for `random_state`, which was set to a known value, all hyper-parameters were left at their default values.

In [7]:
pd.read_parquet(TEST_DATA).head()

Unnamed: 0,county,X_00,X_01,X_02,X_03,X_04,X_05,X_06,X_07,X_08,...,X_11,X_12,X_13,X_14,X_15,X_16,X_17,X_18,X_19,y
0,cnty0005,-0.047924,0.876722,-1.69104,2.626353,0.663601,0.667906,0.707864,1.312462,1.974233,...,-0.736528,-0.6912,-0.903873,-1.193744,0.687999,-0.800075,-1.300539,-0.639801,0.175235,93.457985
1,cnty0010,-0.401803,-0.685433,-0.823452,-0.191975,-0.232961,0.288845,0.599367,-1.502481,-0.917875,...,1.726174,0.693494,1.264482,1.459226,-0.508734,0.324457,1.48368,-0.582111,-0.202034,202.375458
2,cnty0000,-0.699637,0.310477,-0.535438,-0.361965,0.234813,-0.303082,-0.433491,-1.283665,0.634701,...,0.599885,-0.93146,-1.013379,0.504252,-0.556672,0.119437,1.545638,-1.011144,-0.343707,-73.529205
3,cnty0012,-0.810314,0.612156,-0.563249,0.46172,0.393322,0.8736,-0.676884,0.017982,1.143867,...,-0.559798,-0.53823,0.14518,-0.487649,1.367196,0.176917,-1.886566,1.49764,0.867901,-93.428177
4,cnty0003,-0.063003,-0.254967,0.643265,2.217894,0.429902,1.054095,0.624055,1.037485,-0.754566,...,-0.610848,-0.195606,1.082975,-0.200524,-0.438538,-1.54788,-0.146857,1.458846,-0.724704,-7.727258


For each Random Forest model, two model files were saved: sklearn (`.pkl` file extension) and ONNX (`.onnx` file extension). Each file is identified by the `county` whose data was used to train the RF model. Here is a sample of the saved model files.

In [8]:
os.listdir(MODELS_DIR)[:10]

['cnty0000.onnx',
 'cnty0000.pkl',
 'cnty0001.onnx',
 'cnty0001.pkl',
 'cnty0002.onnx',
 'cnty0002.pkl',
 'cnty0003.onnx',
 'cnty0003.pkl',
 'cnty0004.onnx',
 'cnty0004.pkl']

### Model Scoring
For this test, 100 random records were selected from the test data. The following steps were performed for each record:
```
# Process test batch
while there are input records:
    Read one record
    Based on 'county' value, load the RF model for that 'county'  
    Record time to load and make model usable
    Record process memory RSS value
    Score the record
    Record time to score the record

# record collected metrics
Write collected run-time metrics to an external file.
```
No explicit caching of model objects was done in this test.
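
The per-record procedure can be sketched as runnable Python. This is an illustration under stated assumptions, not the `BenchmarkDriver` implementation: `MODEL_STORE`, `load_model`, `score`, and `run_benchmark` are hypothetical names, an in-memory dict stands in for external storage, and RSS sampling is left out because it needs a platform-specific or third-party call.

```python
import csv
import io
import pickle
import time

# Hypothetical "external storage": county id -> serialized stand-in model.
MODEL_STORE = {
    "cnty0000": pickle.dumps({"weights": [1.0, 2.0]}),
    "cnty0001": pickle.dumps({"weights": [-1.0, 0.5]}),
}

FIELDS = ["county_id", "record_id", "model_load_time_ms",
          "model_score_time_ms", "predicted_score"]


def load_model(county):
    """Deserialize the model for one county; nothing is kept between calls."""
    return pickle.loads(MODEL_STORE[county])


def score(model, features):
    """Stand-in for the real model's prediction: a simple dot product."""
    return sum(w * x for w, x in zip(model["weights"], features))


def run_benchmark(records, out_file):
    """Score records one at a time, then write the collected metrics as CSV."""
    rows = []
    for record_id, county, features in records:
        start = time.perf_counter()
        model = load_model(county)  # reloaded for every record, never cached
        load_ms = (time.perf_counter() - start) * 1000.0

        start = time.perf_counter()
        predicted = score(model, features)
        score_ms = (time.perf_counter() - start) * 1000.0

        rows.append({"county_id": county, "record_id": record_id,
                     "model_load_time_ms": load_ms,
                     "model_score_time_ms": score_ms,
                     "predicted_score": predicted})

    # metrics are written once, after the whole batch has been processed
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return rows


records = [(12136, "cnty0000", [1.0, 1.0]),
           (16812, "cnty0001", [2.0, 4.0]),
           (10072, "cnty0000", [0.0, 3.0])]
out = io.StringIO()
rows = run_benchmark(records, out)
print(out.getvalue())
```

Note that `cnty0000` is loaded twice here. That repeated load is exactly what the no-caching design measures.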

## Analysis of RF Tree Structure


In [9]:
# collect data on RF tree structure
tree_metrics = []
rf_models = glob.glob(os.path.join(MODELS_DIR, '*.pkl'))
for model in rf_models:
    # get file sizes
    fp_parts = os.path.splitext(model)
    metrics = {'model': os.path.basename(fp_parts[0])}
    metrics['sklearn_file_size_mb'] = os.path.getsize(model) / (1024 * 1024)
    metrics['onnx_file_size_mb'] = os.path.getsize(fp_parts[0] + '.onnx') / (1024 * 1024)
    
    # extract tree structure
    with open(model, 'rb') as f:
        rf = pickle.load(f)
    metrics['number_of_trees'] = len(rf.estimators_)
    tree_depth = [tree.tree_.max_depth for tree in rf.estimators_]
    metrics['tree_min_depth'] = np.min(tree_depth)
    metrics['tree_max_depth'] = np.max(tree_depth)
    metrics['tree_mean_depth'] = np.mean(tree_depth)
    
    del rf
        
    # collect metrics
    tree_metrics.append(metrics)

In [10]:
# overview of tree structure
tree_metrics_df = pd.DataFrame(tree_metrics)
tree_metrics_df

| | model | sklearn_file_size_mb | onnx_file_size_mb | number_of_trees | tree_min_depth | tree_max_depth | tree_mean_depth |
|---|---|---|---|---|---|---|---|
| 0 | cnty0000 | 31.296801 | 19.027223 | 100 | 20 | 25 | 22.52 |
| 1 | cnty0001 | 30.882494 | 18.774771 | 100 | 20 | 27 | 21.94 |
| 2 | cnty0002 | 30.513475 | 18.54988 | 100 | 20 | 26 | 22.24 |
| 3 | cnty0003 | 31.265795 | 19.008325 | 100 | 20 | 30 | 22.79 |
| 4 | cnty0004 | 30.445482 | 18.508421 | 100 | 20 | 27 | 22.15 |
| 5 | cnty0005 | 31.025072 | 18.861608 | 100 | 20 | 27 | 22.35 |
| 6 | cnty0006 | 30.592455 | 18.598022 | 100 | 20 | 27 | 22.32 |
| 7 | cnty0007 | 30.534838 | 18.562901 | 100 | 19 | 25 | 21.91 |
| 8 | cnty0008 | 31.118578 | 18.918609 | 100 | 19 | 29 | 22.42 |
| 9 | cnty0009 | 30.592455 | 18.597977 | 100 | 19 | 27 | 22.53 |


In [11]:
tree_metrics_of_interest = ['sklearn_file_size_mb', 'onnx_file_size_mb',
                           'number_of_trees', 'tree_min_depth', 'tree_mean_depth', 'tree_max_depth']
tree_metrics_df[tree_metrics_of_interest].describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| sklearn_file_size_mb | 20.0 | 30.897905 | 0.415385 | 30.369677 | 30.578051 | 30.805956 | 31.124498 | 31.940233 |
| onnx_file_size_mb | 20.0 | 18.784136 | 0.253127 | 18.462207 | 18.589208 | 18.728096 | 18.922213 | 19.419286 |
| number_of_trees | 20.0 | 100.0 | 0.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| tree_min_depth | 20.0 | 19.8 | 0.410391 | 19.0 | 20.0 | 20.0 | 20.0 | 20.0 |
| tree_mean_depth | 20.0 | 22.3625 | 0.284788 | 21.84 | 22.2175 | 22.37 | 22.53 | 22.89 |
| tree_max_depth | 20.0 | 27.0 | 1.65434 | 25.0 | 26.0 | 27.0 | 27.25 | 31.0 |


## Model Scoring Test

### Get data for test

In [12]:
test_df = pd.read_parquet(TEST_DATA)
test_df = test_df.sample(n=100, random_state=config['random_seed'])
test_df.reset_index(inplace=True)
test_df.shape

(100, 23)

In [13]:
test_df[['index','county', 'y']].head(10)

| | index | county | y |
|---|---|---|---|
| 0 | 12136 | cnty0015 | -31.640778 |
| 1 | 16812 | cnty0004 | 123.967995 |
| 2 | 10072 | cnty0011 | -59.197975 |
| 3 | 5850 | cnty0010 | -154.407074 |
| 4 | 4320 | cnty0014 | 5.997156 |
| 5 | 4429 | cnty0003 | 37.982521 |
| 6 | 11744 | cnty0011 | -19.720573 |
| 7 | 12106 | cnty0011 | 271.501892 |
| 8 | 4230 | cnty0019 | 140.779358 |
| 9 | 16858 | cnty0002 | 45.374973 |


In [14]:
metrics_of_interest = ['model_process_rss_mb', 'model_load_time_ms', 'model_score_time_ms', 'model_prediction_time_ms']

### sklearn Model Scoring

In [15]:
%%time
sklearn_driver = BenchmarkDriver(
    model_type='sklearn',
    models_dir=MODELS_DIR,
    performance_fp=os.path.join(PERFORMANCE_DATA_DIR, 'sklearn_benchmark.csv'),
    test_scenario='county-level'
)

# iterate over each row and collect run-time performance statistics
for idx, row in test_df.iterrows():
    one_record = pd.DataFrame(row).T
    sklearn_driver.score_one_record(row['county'], row['index'],
            one_record.drop(['index', 'county', 'y'], axis='columns'))
    


CPU times: user 2.51 s, sys: 1.59 s, total: 4.1 s
Wall time: 4.17 s


In [16]:
# display collected performance metrics
sklearn_metrics_df = pd.read_csv(os.path.join(PERFORMANCE_DATA_DIR, 'sklearn_benchmark.csv'))
sklearn_metrics_df['model_prediction_time_ms'] = sklearn_metrics_df['model_load_time_ms'] + sklearn_metrics_df['model_score_time_ms']
sklearn_metrics_df.head(10)

| | county_id | record_id | test_scenario | model_load_time_ms | model_process_rss_mb | model_score_time_ms | predicted_score | model_prediction_time_ms |
|---|---|---|---|---|---|---|---|---|
| 0 | cnty0015 | 12136 | county-level | 16.7756 | 280.324219 | 27.536 | -116.880853 | 44.3116 |
| 1 | cnty0004 | 16812 | county-level | 13.6132 | 277.683594 | 24.4199 | 153.039992 | 38.0331 |
| 2 | cnty0011 | 10072 | county-level | 12.7818 | 278.523438 | 22.9721 | -30.62064 | 35.7539 |
| 3 | cnty0010 | 5850 | county-level | 12.61 | 278.574219 | 23.6123 | -89.091659 | 36.2223 |
| 4 | cnty0014 | 4320 | county-level | 12.5339 | 278.574219 | 23.7674 | 79.284982 | 36.3013 |
| 5 | cnty0003 | 4429 | county-level | 12.2223 | 279.359375 | 25.6955 | 31.401549 | 37.9178 |
| 6 | cnty0011 | 11744 | county-level | 10.5364 | 279.519531 | 26.8515 | -5.722466 | 37.3879 |
| 7 | cnty0011 | 12106 | county-level | 10.4079 | 279.542969 | 23.3873 | 139.301654 | 33.7952 |
| 8 | cnty0019 | 4230 | county-level | 12.6411 | 280.351562 | 24.2326 | 103.776609 | 36.8737 |
| 9 | cnty0002 | 16858 | county-level | 13.2203 | 277.867188 | 23.7277 | 72.877299 | 36.948 |


In [17]:
sklearn_metrics_df[metrics_of_interest].describe(percentiles=[.25, .5, .75, .9]).T

| | count | mean | std | min | 25% | 50% | 75% | 90% | max |
|---|---|---|---|---|---|---|---|---|---|
| model_process_rss_mb | 100.0 | 277.165508 | 3.598184 | 270.007812 | 271.863281 | 279.369141 | 279.71875 | 279.757812 | 281.035156 |
| model_load_time_ms | 100.0 | 11.843735 | 1.590135 | 10.0021 | 10.574125 | 11.4846 | 12.7335 | 13.75386 | 17.6513 |
| model_score_time_ms | 100.0 | 24.244508 | 1.618232 | 20.0178 | 23.1835 | 24.1729 | 25.056525 | 26.68531 | 29.9177 |
| model_prediction_time_ms | 100.0 | 36.088243 | 2.662321 | 30.3834 | 34.058375 | 35.8493 | 37.68785 | 39.43934 | 44.3116 |


### ONNX Model Scoring

In [18]:
%%time
onnx_driver = BenchmarkDriver(
    model_type='onnx',
    models_dir=MODELS_DIR,
    performance_fp=os.path.join(PERFORMANCE_DATA_DIR, 'onnx_benchmark.csv'),
    test_scenario='county-level'
)

# iterate over each row and collect run-time performance statistics
for idx, row in test_df.iterrows():
    one_record = pd.DataFrame(row).T
    onnx_driver.score_one_record(row['county'], row['index'],
            one_record.drop(['index', 'county', 'y'], axis='columns').astype(np.float32).to_numpy())


CPU times: user 57.2 s, sys: 1.76 s, total: 59 s
Wall time: 55.7 s


In [19]:
# display collected performance metrics
onnx_metrics_df = pd.read_csv(os.path.join(PERFORMANCE_DATA_DIR, 'onnx_benchmark.csv'))
onnx_metrics_df['model_prediction_time_ms'] = onnx_metrics_df['model_load_time_ms'] + onnx_metrics_df['model_score_time_ms']
onnx_metrics_df.head(10)

| | county_id | record_id | test_scenario | model_load_time_ms | model_process_rss_mb | model_score_time_ms | predicted_score | model_prediction_time_ms |
|---|---|---|---|---|---|---|---|---|
| 0 | cnty0015 | 12136 | county-level | 723.483 | 455.210938 | 14.4311 | -116.88086 | 737.9141 |
| 1 | cnty0004 | 16812 | county-level | 643.0511 | 462.628906 | 0.1937 | 153.04 | 643.2448 |
| 2 | cnty0011 | 10072 | county-level | 650.2063 | 463.265625 | 0.2074 | -30.620642 | 650.4137 |
| 3 | cnty0010 | 5850 | county-level | 615.2095 | 470.375 | 0.1924 | -89.09167 | 615.4019 |
| 4 | cnty0014 | 4320 | county-level | 628.8072 | 462.816406 | 0.2343 | 79.28498 | 629.0415 |
| 5 | cnty0003 | 4429 | county-level | 626.4542 | 471.441406 | 1.1803 | 31.401548 | 627.6345 |
| 6 | cnty0011 | 11744 | county-level | 515.1161 | 465.683594 | 0.2175 | -5.722467 | 515.3336 |
| 7 | cnty0011 | 12106 | county-level | 474.2158 | 471.097656 | 0.1847 | 139.30164 | 474.4005 |
| 8 | cnty0019 | 4230 | county-level | 634.9107 | 472.910156 | 0.2684 | 103.77659 | 635.1791 |
| 9 | cnty0002 | 16858 | county-level | 590.2439 | 474.558594 | 9.6054 | 72.87731 | 599.8493 |


In [20]:
onnx_metrics_df[metrics_of_interest].describe(percentiles=[.25, .5, .75, .9]).T

| | count | mean | std | min | 25% | 50% | 75% | 90% | max |
|---|---|---|---|---|---|---|---|---|---|
| model_process_rss_mb | 100.0 | 469.339023 | 4.351273 | 455.210938 | 467.236328 | 470.074219 | 472.373047 | 473.409375 | 486.964844 |
| model_load_time_ms | 100.0 | 515.360931 | 55.115653 | 463.5628 | 482.256825 | 490.78245 | 515.446775 | 612.46479 | 723.483 |
| model_score_time_ms | 100.0 | 0.796268 | 2.332304 | 0.1698 | 0.194575 | 0.203 | 0.234625 | 0.26669 | 14.4311 |
| model_prediction_time_ms | 100.0 | 516.157199 | 56.046973 | 463.8293 | 482.466975 | 491.1588 | 515.65735 | 614.73653 | 737.9141 |


### Differences in sklearn vs ONNX predictions

In [21]:
differences = np.abs(sklearn_metrics_df['predicted_score'] - onnx_metrics_df['predicted_score'])
differences.describe()

count    1.000000e+02
mean     1.332589e-05
std      1.782682e-05
min      1.535873e-07
25%      2.125662e-06
50%      7.193430e-06
75%      1.752127e-05
max      1.057077e-04
Name: predicted_score, dtype: float64
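
To put these differences in context, the sketch below compares a few prediction pairs against the spacing between adjacent float32 values at the same magnitude. The pairs are copied from the first three rows of the sklearn and ONNX metrics tables above. That float32 precision contributes to the gap is an assumption this notebook does not verify; it is suggested only by the ONNX inputs being cast to `np.float32` before scoring. The `float32_spacing` helper is hypothetical and uses the standard library only.

```python
import math

# (sklearn, onnx) predicted_score pairs for records 12136, 16812 and 10072,
# as displayed in the notebook (values are rounded for display)
pairs = [(-116.880853, -116.88086),
         (153.039992, 153.04),
         (-30.62064, -30.620642)]


def float32_spacing(x):
    """Gap between adjacent float32 values near x (x nonzero, normal range)."""
    exponent = math.floor(math.log2(abs(x)))
    return 2.0 ** (exponent - 23)  # float32 stores 23 explicit mantissa bits


results = []
for sklearn_pred, onnx_pred in pairs:
    diff = abs(sklearn_pred - onnx_pred)
    spacing = float32_spacing(sklearn_pred)
    results.append((diff, spacing))
    print(f"diff={diff:.2e}  float32 spacing={spacing:.2e}  ratio={diff / spacing:.2f}")
```

For these three pairs the difference stays within about one float32 step. The largest difference reported above (about 1e-4) spans several steps, so this sketch does not account for every case.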