# Reporting memory size


## Metrics reported in this notebook are from synthetic data and **have not** been calibrated to representative dataset or model sizes.


## Notebook run-time environment
* **Hardware:** MacBook Pro (Intel, 2019), 16 GB RAM, 1 TB SSD
* **OS:** macOS 11.6.1
* **Docker:** Docker Desktop 4.2.0 (Mac)
* **Docker Image:** Base image: `jupyter/datascience-notebook:lab-3.2.5` with ONNX packages added

## Key software versions

In [1]:
!python --version

Python 3.9.7


In [2]:
!conda list -n onnx_sandbox | grep "\(onnx\|scikit\|numpy\|pandas\)"

# packages in environment at /opt/conda/envs/onnx_sandbox:
numpy                     1.21.2           py39h20f2e39_0    defaults
numpy-base                1.21.2           py39h79a1101_0    defaults
onnx                      1.10.2           py39h8b1bc1a_2    conda-forge
onnxconverter-common      1.8.1              pyhd8ed1ab_0    conda-forge
onnxruntime               1.10.0           py39h15e0acf_2    conda-forge
pandas                    1.3.5            py39h8c16a72_0    defaults
scikit-learn              1.0.1            py39h51133e4_0    defaults
skl2onnx                  1.10.3             pyhd8ed1ab_0    conda-forge


## Import required libraries

In [3]:
import os
import pandas as pd
import numpy as np
import onnxruntime as rt
import pickle
import gc

## Set up configuration for analysis

In [4]:
# required to allow import of project-specific utility functions
os.chdir('..')

In [5]:
# import project-specific utility functions
from utils.utils import load_config, actualsize_mb, actualsize, deep_getsizeof

In [6]:
# get configuration parameters
config = load_config('./config.yaml')
config

{'data_dir': '/Users/jim/Desktop/onnx_sandbox/data',
 'models_dir': '/Users/jim/Desktop/onnx_sandbox/models',
 'number_records': 100000,
 'number_features': 20,
 'number_informative': 14,
 'number_trees': 500,
 'fraction_for_test': 0.2,
 'number_counties': 20,
 'random_seed': 123}

In [7]:
COUNTY_ID = 'cnty0000'
MODELS_DIR = config['models_dir']

## Single model load test

In [11]:
%%time

with open(os.path.join(MODELS_DIR, 'benchmark', COUNTY_ID+'.pkl'), 'rb') as f:
    rf_pkl_model = pickle.load(f)

print(f'type: {type(rf_pkl_model)}') 
print(f'actualsize: {actualsize(rf_pkl_model):,} bytes, deep_getsizeof: {deep_getsizeof(rf_pkl_model, set()):,} bytes')


type: <class 'sklearn.ensemble._forest.RandomForestRegressor'>
actualsize: 57,661,964 bytes, deep_getsizeof: 48 bytes
CPU times: user 345 ms, sys: 20.3 ms, total: 366 ms
Wall time: 365 ms
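
The two helpers in the cell above disagree sharply: `actualsize` reports about 57.7 MB, while `deep_getsizeof` reports 48 bytes. The source of neither helper is shown in this notebook, so the following is only a sketch of one plausible cause: `sys.getsizeof` is shallow, and 48 bytes appears consistent with the shallow size of a plain class instance on 64-bit CPython 3.9. The `deep_size` function and the `Holder` class are hypothetical illustrations, not the project's `utils` code.

```python
import sys


def deep_size(obj, seen=None):
    """Hypothetical helper: sum sys.getsizeof over objects reachable from obj."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        # Count each object once, even if it is referenced several times.
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)  # shallow size of this object only
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    if hasattr(obj, '__dict__'):
        # Follow instance attributes; without this step only the shell is counted.
        size += deep_size(vars(obj), seen)
    return size


class Holder:
    """Stand-in (assumption) for a model object that keeps its data in attributes."""

    def __init__(self, n):
        self.values = list(range(n))


holder = Holder(10_000)
print(f'shallow: {sys.getsizeof(holder):,} bytes')
print(f'deep:    {deep_size(holder):,} bytes')
```

This sketch does not look inside NumPy arrays or other extension types beyond what their own `__sizeof__` reports, so it would need extending before being applied to a fitted scikit-learn estimator.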


## Repeated model load test - Delete model object after load

In [9]:
%%time
del rf_pkl_model
for i in range(20):
    with open(os.path.join(MODELS_DIR, 'testbed', COUNTY_ID+'.pkl'), 'rb') as f:
        rf_pkl_model = pickle.load(f)
    print(f'trial {i+1}, memory size: {actualsize(rf_pkl_model):,} bytes')
    del rf_pkl_model

trial 1, memory size: 56,928,411 bytes
trial 2, memory size: 56,928,411 bytes
trial 3, memory size: 56,928,411 bytes
trial 4, memory size: 56,928,718 bytes
trial 5, memory size: 56,928,718 bytes
trial 6, memory size: 56,928,718 bytes
trial 7, memory size: 56,928,718 bytes
trial 8, memory size: 56,928,718 bytes
trial 9, memory size: 56,928,718 bytes
trial 10, memory size: 56,928,718 bytes
trial 11, memory size: 56,928,718 bytes
trial 12, memory size: 56,928,718 bytes
trial 13, memory size: 56,928,718 bytes
trial 14, memory size: 56,928,718 bytes
trial 15, memory size: 56,928,718 bytes
trial 16, memory size: 56,928,718 bytes
trial 17, memory size: 56,928,718 bytes
trial 18, memory size: 56,928,718 bytes
trial 19, memory size: 56,928,718 bytes
trial 20, memory size: 56,928,718 bytes
CPU times: user 5.65 s, sys: 197 ms, total: 5.84 s
Wall time: 5.83 s
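
The trials above report the size of the loaded object, not how much memory the interpreter holds during or after each load. As a complementary check, the standard-library `tracemalloc` module can report the bytes Python allocated while a loaded object is alive. This is a minimal sketch under stated assumptions: the payload is a stand-in list of floats rather than the pickled random forest, and `measure_load` is a hypothetical helper, not part of the project's `utils`.

```python
import gc
import pickle
import tracemalloc

# Stand-in payload (assumption): a pickled list of floats, not the real model file.
payload = pickle.dumps([float(i) for i in range(200_000)])


def measure_load(data):
    """Return (current, peak) traced bytes while one unpickled object is alive."""
    gc.collect()                 # start from a clean slate
    tracemalloc.start()          # begin tracing Python memory allocations
    obj = pickle.loads(data)
    current, peak = tracemalloc.get_traced_memory()
    del obj                      # drop the only reference, as in the cell above
    tracemalloc.stop()
    return current, peak


results = [measure_load(payload) for _ in range(5)]
for i, (current, peak) in enumerate(results, start=1):
    print(f'trial {i}, traced current: {current:,} bytes, peak: {peak:,} bytes')
```

If the traced figures stay flat across trials, repeated loads are probably not accumulating Python-level allocations. Tracing adds overhead, so timings taken with `tracemalloc` enabled are not comparable to the `%%time` results in this notebook.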


## Repeated model load test - Delete model object after load & garbage collect

In [10]:
%%time
print(f'gc thresholds: {gc.get_threshold()}, gc counts: {gc.get_count()}')
gc.collect()
for i in range(20):
    with open(os.path.join(MODELS_DIR, 'testbed', COUNTY_ID+'.pkl'), 'rb') as f:
        rf_pkl_model = pickle.load(f)
    print(f'trial {i+1}, memory size: {actualsize(rf_pkl_model):,} bytes')
    del rf_pkl_model
    gc.collect()
print(f'gc thresholds: {gc.get_threshold()}, gc counts: {gc.get_count()}')

gc thresholds: (700, 10, 10), gc counts: (255, 2, 1)
trial 1, memory size: 56,932,774 bytes
trial 2, memory size: 56,932,774 bytes
trial 3, memory size: 56,932,774 bytes
trial 4, memory size: 56,932,774 bytes
trial 5, memory size: 56,932,774 bytes
trial 6, memory size: 56,932,774 bytes
trial 7, memory size: 56,932,774 bytes
trial 8, memory size: 56,932,774 bytes
trial 9, memory size: 56,932,774 bytes
trial 10, memory size: 56,932,774 bytes
trial 11, memory size: 56,932,774 bytes
trial 12, memory size: 56,932,774 bytes
trial 13, memory size: 56,932,774 bytes
trial 14, memory size: 56,932,774 bytes
trial 15, memory size: 56,932,774 bytes
trial 16, memory size: 56,932,774 bytes
trial 17, memory size: 56,935,198 bytes
trial 18, memory size: 56,935,198 bytes
trial 19, memory size: 56,935,198 bytes
trial 20, memory size: 56,935,198 bytes
gc thresholds: (700, 10, 10), gc counts: (27, 0, 0)
CPU times: user 6.61 s, sys: 227 ms, total: 6.84 s
Wall time: 6.85 s
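
The `gc thresholds` and `gc counts` lines above come from CPython's cyclic garbage collector, which only matters for objects caught in reference cycles. Objects without cycles are freed by reference counting as soon as `del` removes the last reference. The sketch below illustrates the difference on a toy case. The `Node` class and `make_cycle` function are hypothetical and unrelated to the model objects loaded in this notebook.

```python
import gc


class Node:
    """Toy object (assumption) used only to build a reference cycle."""

    def __init__(self):
        self.ref = None


def make_cycle():
    # a and b point at each other, so reference counting alone cannot free them.
    a, b = Node(), Node()
    a.ref, b.ref = b, a


gc.collect()   # clear any pre-existing garbage
gc.disable()   # keep automatic collections from interfering with the demo
for _ in range(100):
    make_cycle()

print(f'gc thresholds: {gc.get_threshold()}, gc counts: {gc.get_count()}')
unreachable = gc.collect()   # returns the number of unreachable objects found
gc.enable()
print(f'unreachable objects collected: {unreachable}')
print(f'gc counts after collect: {gc.get_count()}')
```

An explicit `gc.collect()` after `del` helps only when the deleted object takes part in cycles like these.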
