# ONNX Scoring Testbed
For each record to be scored, the county-specific model is read from external storage. Models are not cached in memory.

* `model_load_time_ms` is the time, in milliseconds, required to read a model file from external storage and deserialize the model object.
* `model_score_time_ms` is the time, in milliseconds, required to score one record.
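As a rough illustration of how the two timing metrics can be measured separately, the sketch below wraps each step in a wall-clock timer built on `time.perf_counter`. The helper `timed_ms` and the stand-in `load_model` and `score` functions are hypothetical; they are not part of the project's `BenchmarkDriver`, whose internals are not shown in this notebook.

```python
import time

def timed_ms(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Hypothetical stand-ins for the real load and score steps.
def load_model(path):
    return {"path": path, "weight": 2.0}

def score(model, x):
    return model["weight"] * x

# Time the two steps independently so each metric can be reported on its own.
model, model_load_time_ms = timed_ms(load_model, "cnty0009.onnx")
prediction, model_score_time_ms = timed_ms(score, model, 1.5)
```

Timing the steps separately, rather than timing the whole request, is what makes it possible to see which step dominates.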

## Metrics reported in this notebook are from synthetic data and **have not** been calibrated to representative dataset or model sizes.

## Notebook run-time environment
* **Hardware:** MacBook Pro (Intel, 2019), 16 GB RAM, 1 TB SSD
* **OS:** macOS 11.6.1
* **Docker:** Docker Desktop 4.2.0 (Mac)
* **Docker Image:** Base image: `jupyter/datascience-notebook:lab-3.2.5` with ONNX packages added

In [1]:
import glob
import os
import shutil
import pandas as pd
import numpy as np
import onnxruntime as rt
import pickle

## Setup for tests

In [2]:
# required to allow import of project-specific utility functions
os.chdir('..')

In [3]:
# import project-specific utility functions
from utils.utils import load_config
from utils.benchmark import BenchmarkDriver

In [4]:
# get configuration parameters
config = load_config('./config.yaml')
config

{'data_dir': '/Users/jim/Desktop/onnx_sandbox/data/testbed',
 'models_dir': '/Users/jim/Desktop/onnx_sandbox/models/testbed',
 'number_records': 100000,
 'number_features': 20,
 'number_informative': 14,
 'number_trees': 100,
 'fraction_for_test': 0.2,
 'number_counties': 20,
 'random_seed': 123}

In [5]:
TEST_DATA = os.path.join(config['data_dir'], 'test.parquet')
PERFORMANCE_DATA = './testbed/performance_data/runtime_metrics.csv'
MODELS_DIR = config['models_dir']


## Test Design

### Model Training
In an offline process, 20 Random Forest (RF) models were trained on synthetic data generated by the `sklearn.datasets.make_regression()` function. Each data set has 20 explanatory variables (`X_00` to `X_19`) and a target variable named `y`. Apart from setting the `random_state` parameter to a known value, all hyperparameters were left at their default values.
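The offline training code is not part of this notebook. The following is a minimal sketch of what training for a single county might look like, assuming the values in `config.yaml` (`number_features`, `number_informative`, `fraction_for_test`, `random_seed`) are passed straight through to scikit-learn. The per-county sample size of 500 is an arbitrary choice for illustration, and how the real process assigns records to counties is not shown here.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

N_FEATURES = 20       # config['number_features']
N_INFORMATIVE = 14    # config['number_informative']
RANDOM_SEED = 123     # config['random_seed']
N_SAMPLES = 500       # assumption: small sample size chosen for illustration

# Generate synthetic regression data for one county.
X, y = make_regression(
    n_samples=N_SAMPLES,
    n_features=N_FEATURES,
    n_informative=N_INFORMATIVE,
    random_state=RANDOM_SEED,
)

# Name the explanatory variables X_00 ... X_19 and the target y.
feature_names = [f"X_{i:02d}" for i in range(N_FEATURES)]
df = pd.DataFrame(X, columns=feature_names)
df["y"] = y

# Hold out a fraction of the data for testing (config['fraction_for_test']).
train_df, holdout_df = train_test_split(df, test_size=0.2, random_state=RANDOM_SEED)

# Only random_state is set; every other hyperparameter keeps its default.
model = RandomForestRegressor(random_state=RANDOM_SEED)
model.fit(train_df[feature_names], train_df["y"])
```

With default settings, `RandomForestRegressor` builds 100 trees, which matches `number_trees` in the configuration.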

In [6]:
pd.read_parquet(TEST_DATA).head()

| | county | X_00 | X_01 | X_02 | X_03 | X_04 | X_05 | X_06 | X_07 | X_08 | ... | X_11 | X_12 | X_13 | X_14 | X_15 | X_16 | X_17 | X_18 | X_19 | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | cnty0017 | 0.520762 | -0.658164 | 0.234148 | -1.949334 | -1.042321 | 0.206521 | 1.468614 | 0.402485 | -0.329271 | ... | 1.148844 | 0.637367 | -1.228588 | -0.052683 | -1.478516 | -2.90579 | -0.816058 | -0.353458 | -1.47899 | -656.045166 |
| 1 | cnty0002 | -1.719247 | 0.145194 | 0.682192 | 1.195691 | -0.298329 | 0.274303 | 1.889344 | -1.349296 | -0.027038 | ... | 1.903116 | 0.8696 | 0.350681 | 0.064682 | -0.017302 | 0.296412 | -0.155503 | -0.43012 | 0.386999 | 174.201584 |
| 2 | cnty0005 | -0.572314 | -0.771446 | -0.112503 | 0.274605 | 0.546655 | 0.919573 | -1.015905 | 0.091013 | -0.798503 | ... | 0.244685 | 0.07757 | 0.540681 | 0.368747 | -0.5386 | 0.481323 | 0.122542 | 1.925718 | 1.102175 | 79.194344 |
| 3 | cnty0000 | 0.55809 | -0.571827 | 1.589408 | 0.340502 | -1.313514 | 0.500834 | -0.114611 | 0.248661 | -1.692944 | ... | -0.037938 | -0.481162 | 1.001708 | -1.416378 | -1.177559 | -0.419277 | 0.349913 | -1.670872 | 0.788682 | 25.678637 |
| 4 | cnty0018 | -0.38347 | -0.381675 | 1.018075 | 0.127476 | -0.138571 | -0.958881 | 0.165615 | -0.3857 | 0.624114 | ... | 1.220423 | -0.427295 | -0.472333 | -0.883766 | 0.192109 | 0.349223 | 0.846627 | 1.677837 | 0.582563 | 58.992062 |


Two model files were saved for each Random Forest model: scikit-learn (`.pkl` file extension) and ONNX (`.onnx` file extension). Each file is identified by the `county` whose data was used to train the RF model. Here is a sample of the saved model files.

In [7]:
os.listdir(MODELS_DIR)[:10]

['cnty0000.onnx',
 'cnty0000.pkl',
 'cnty0001.onnx',
 'cnty0001.pkl',
 'cnty0002.onnx',
 'cnty0002.pkl',
 'cnty0003.onnx',
 'cnty0003.pkl',
 'cnty0004.onnx',
 'cnty0004.pkl']
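Given this naming convention, a model file can be located from the county identifier and the model format alone. The sketch below shows one way to do that. The helpers `model_path` and `available_counties` are hypothetical, as is the `'sklearn'` key (only `'onnx'` appears as a `model_object_type` in this notebook). Because `os.listdir` returns entries in arbitrary order, the listing is sorted explicitly.

```python
import os
import tempfile

# Assumed mapping from model format to file extension.
EXTENSIONS = {"sklearn": ".pkl", "onnx": ".onnx"}

def model_path(models_dir, county, model_object_type):
    """Build the path of the model file for one county and one format."""
    return os.path.join(models_dir, county + EXTENSIONS[model_object_type])

def available_counties(models_dir, model_object_type):
    """Return the sorted county ids that have a model file in the given format."""
    ext = EXTENSIONS[model_object_type]
    return sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(models_dir)
        if name.endswith(ext)
    )

# Demonstrate with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["cnty0001.pkl", "cnty0000.onnx", "cnty0001.onnx", "cnty0000.pkl"]:
        open(os.path.join(demo_dir, name), "w").close()
    onnx_counties = available_counties(demo_dir, "onnx")
    example_path = model_path(demo_dir, "cnty0001", "onnx")
```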

### Model Scoring
For this test, 100 records were selected at random from the test data. The following steps were performed for each record:
```
# Process test batch
while there are input records:
    Read one record
    Based on 'county' value, load the RF model for that 'county'  
    Record time to load and make model useable
    Score the record
    Record process memory RSS value
    Record time to score the record

# record collected metrics
Write collected run-time metrics to an external file.
```
No explicit caching of model objects was done in this test.
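The loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's `BenchmarkDriver`: `run_batch`, `peak_rss_mb`, and the stand-in load and score functions are hypothetical. The memory reading uses the standard-library `resource` module (Unix only), which reports *peak* rather than current RSS, so it only approximates the `model_process_rss_mb` metric collected in this notebook.

```python
import resource
import sys
import time

def peak_rss_mb():
    """Peak resident set size of this process in MB (Unix only).

    ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 ** 2) if sys.platform == "darwin" else rss / 1024

def run_batch(records, load_model, score_model):
    """Score each (county, features) pair, reloading the model every time."""
    metrics = []
    for county, features in records:
        t0 = time.perf_counter()
        model = load_model(county)        # no cache: loaded for every record
        t1 = time.perf_counter()
        prediction = score_model(model, features)
        t2 = time.perf_counter()
        metrics.append({
            "county_id": county,
            "model_load_time_ms": (t1 - t0) * 1000.0,
            "model_score_time_ms": (t2 - t1) * 1000.0,
            "model_process_rss_mb": peak_rss_mb(),
            "prediction": prediction,
        })
    return metrics

# Usage with hypothetical stand-ins for the load and score steps.
load_calls = []

def fake_load(county):
    load_calls.append(county)
    return len(county)

def fake_score(model, features):
    return model * sum(features)

results = run_batch(
    [("cnty0009", [1.0, 2.0]), ("cnty0009", [3.0])], fake_load, fake_score
)
```

Both records belong to the same county, yet the model is loaded twice, which is the behavior this test is designed to measure.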

## Model Scoring Test

### Get data for test

In [8]:
test_df = pd.read_parquet(TEST_DATA)
test_df = test_df.sample(n=100, random_state=config['random_seed'])
test_df.reset_index(inplace=True)
test_df.shape

(100, 23)

In [9]:
test_df[['index','county', 'y']].head(10)

| | index | county | y |
|---|---|---|---|
| 0 | 12136 | cnty0009 | 21.728182 |
| 1 | 16812 | cnty0018 | -191.266129 |
| 2 | 10072 | cnty0019 | 18.93894 |
| 3 | 5850 | cnty0015 | 57.200191 |
| 4 | 4320 | cnty0008 | -154.245728 |
| 5 | 4429 | cnty0019 | -46.112324 |
| 6 | 11744 | cnty0016 | -48.300835 |
| 7 | 12106 | cnty0008 | 247.827332 |
| 8 | 4230 | cnty0009 | -219.337921 |
| 9 | 16858 | cnty0003 | -271.940826 |


### ONNX scoring test

In [10]:
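# Note: despite the "Sklearn" prefix in its name, this driver loads and scores ONNX models.
class SklearnBenchmarkDriver(BenchmarkDriver):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def _retrieve_this_model(self, model_fp: str):
        # create an ONNX Runtime inference session from the model file
        sess = rt.InferenceSession(model_fp)
        return sess

    def _score_this_model(self, model, record: np.ndarray):
        # use the first declared input and output of the ONNX graph
        input_name = model.get_inputs()[0].name
        label_name = model.get_outputs()[0].name
        prediction = model.run([label_name], {input_name: record})[0]
        # return the scalar prediction for the single scored record
        return prediction[0][0]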

In [11]:
%%time
onnx_driver = SklearnBenchmarkDriver(
    model_object_type='onnx',
    models_dir=MODELS_DIR,
    performance_fp=PERFORMANCE_DATA,
    test_scenario='county-level'
)

# iterate over each row and collect run-time performance statistics
for idx, row in test_df.iterrows():
    one_record = pd.DataFrame(row).T
    onnx_driver.score_one_record(row['county'], row['index'],
            one_record.drop(['index', 'county', 'y'], axis='columns').astype(np.float32).to_numpy())


CPU times: user 56.2 s, sys: 1.87 s, total: 58.1 s
Wall time: 53 s
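The scoring cell converts each row into a one-row `float32` array before passing it to the driver. The sketch below isolates that conversion on a small made-up frame and shows a more direct alternative that skips the intermediate one-row `DataFrame`. The helper `to_model_input` and the demo values are hypothetical; the column names follow the test data.

```python
import numpy as np
import pandas as pd

# Small stand-in for test_df with two feature columns instead of twenty.
demo_df = pd.DataFrame({
    "index": [12136, 16812],
    "county": ["cnty0009", "cnty0018"],
    "X_00": [0.52, -1.72],
    "X_01": [-0.66, 0.15],
    "y": [21.7, -191.3],
})
feature_cols = [c for c in demo_df.columns if c not in ("index", "county", "y")]

def to_model_input(row, feature_cols):
    """Return the row's features as a float32 array of shape (1, n_features)."""
    return row[feature_cols].to_numpy(dtype=np.float32).reshape(1, -1)

inputs = [to_model_input(row, feature_cols) for _, row in demo_df.iterrows()]

# The approach used in the notebook cell, for comparison.
first_row = demo_df.iloc[0]
via_dataframe = (
    pd.DataFrame(first_row).T
    .drop(["index", "county", "y"], axis="columns")
    .astype(np.float32)
    .to_numpy()
)
```

Both approaches produce the same array; the direct version simply avoids building a temporary `DataFrame` for every record.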


In [12]:
# display collected performance metrics
onnx_metrics_df = pd.read_csv(PERFORMANCE_DATA)
onnx_metrics_df['model_prediction_time_ms'] = onnx_metrics_df['model_load_time_ms'] + onnx_metrics_df['model_score_time_ms']
onnx_metrics_df.head(10)

| | county_id | record_id | test_scenario | model_load_time_ms | model_score_time_ms | model_process_rss_mb | model_prediction_time_ms |
|---|---|---|---|---|---|---|---|
| 0 | cnty0009 | 12136 | county-level | 585.380886 | 0.207013 | 353.285156 | 585.587899 |
| 1 | cnty0018 | 16812 | county-level | 487.222308 | 0.168164 | 365.585938 | 487.390472 |
| 2 | cnty0019 | 10072 | county-level | 510.826965 | 0.175447 | 369.53125 | 511.002412 |
| 3 | cnty0015 | 5850 | county-level | 495.326703 | 0.190891 | 376.660156 | 495.517594 |
| 4 | cnty0008 | 4320 | county-level | 486.042901 | 0.203304 | 370.144531 | 486.246205 |
| 5 | cnty0019 | 4429 | county-level | 487.024016 | 0.19115 | 375.929688 | 487.215166 |
| 6 | cnty0016 | 11744 | county-level | 519.378651 | 0.182273 | 376.160156 | 519.560924 |
| 7 | cnty0008 | 12106 | county-level | 472.21341 | 0.182652 | 377.0625 | 472.396062 |
| 8 | cnty0009 | 4230 | county-level | 486.411338 | 0.205934 | 370.003906 | 486.617272 |
| 9 | cnty0003 | 16858 | county-level | 501.234722 | 0.158876 | 369.238281 | 501.393598 |


In [13]:
metrics_of_interest = ['model_load_time_ms', 'model_score_time_ms', 'model_process_rss_mb', 'model_prediction_time_ms']
onnx_metrics_df[metrics_of_interest].describe(percentiles=[.25, .5, .75, .9]).T

| | count | mean | std | min | 25% | 50% | 75% | 90% | max |
|---|---|---|---|---|---|---|---|---|---|
| model_load_time_ms | 100.0 | 489.703945 | 16.81481 | 464.796114 | 478.754264 | 486.500518 | 496.362193 | 507.374053 | 585.380886 |
| model_score_time_ms | 100.0 | 0.185971 | 0.01568 | 0.141758 | 0.176183 | 0.185152 | 0.19484 | 0.204683 | 0.23604 |
| model_process_rss_mb | 100.0 | 374.036641 | 5.085973 | 353.285156 | 370.241211 | 374.796875 | 377.667969 | 378.825391 | 386.0 |
| model_prediction_time_ms | 100.0 | 489.889916 | 16.817014 | 465.000514 | 478.935512 | 486.673017 | 496.563544 | 507.559025 | 585.587899 |
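In this run the load step accounts for nearly all of the per-record prediction time. The short calculation below makes that explicit using the mean values copied from the summary table above; as noted at the top of the notebook, these numbers come from synthetic data and are specific to this environment.

```python
# Mean values copied from the summary table (milliseconds).
mean_load_ms = 489.703945
mean_score_ms = 0.185971
mean_prediction_ms = 489.889916

# Fraction of the total prediction time spent loading the model.
load_share = mean_load_ms / mean_prediction_ms

# How many times longer the load step takes than the score step.
load_to_score_ratio = mean_load_ms / mean_score_ms

print(f"load share of prediction time: {load_share:.2%}")
print(f"load time / score time: {load_to_score_ratio:,.0f}x")
```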
