Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Convert Models and Tune Performance with OLive Python SDK

This notebook demonstrates how to use the OLive Python SDK to convert a model from another framework to ONNX, and then tune the performance of the converted ONNX model.

# 0. Prerequisites

1) Make sure you have [Docker](https://www.docker.com/get-started) installed and running. 

2) Install the Python dependencies

In [1]:
import sys
!{sys.executable} -m pip install wget netron docker pandas onnx





3) Download the OLive Python SDK

In [2]:
import os
import wget

url = "https://raw.githubusercontent.com/microsoft/OLive/master/utils/"
sdk_files = ["onnxpipeline.py", "convert_test_data.py", "config.py"]
sdk_dir = "./python_sdk"
if not os.path.exists(sdk_dir):
    os.makedirs(sdk_dir)

for filename in sdk_files:
    target_file = os.path.join(sdk_dir, filename)
    if os.path.exists(target_file):
        os.remove(target_file)
    wget.download(url + filename, target_file)
    print("Downloaded", filename)

Downloaded onnxpipeline.py
Downloaded convert_test_data.py
Downloaded config.py


4) Pull the latest `mcr.microsoft.com/onnxruntime/onnx-converter` and `mcr.microsoft.com/onnxruntime/perf-tuning` Docker images from the Microsoft Container Registry (MCR). This may take several minutes.

In [3]:
# pull onnx-converter and perf-tuning docker images from mcr
!docker pull mcr.microsoft.com/onnxruntime/onnx-converter
!docker pull mcr.microsoft.com/onnxruntime/perf-tuning

Using default tag: latest
latest: Pulling from onnxruntime/onnx-converter
Digest: sha256:37479e01a7c4cd2e77c012f0fc3bb4e89e1b45b72c0d4bb22621286c5c02aa26
Status: Image is up to date for mcr.microsoft.com/onnxruntime/onnx-converter:latest
mcr.microsoft.com/onnxruntime/onnx-converter:latest
Using default tag: latest
latest: Pulling from onnxruntime/perf-tuning
Digest: sha256:8003f5ecd2e11c2fdad31610294982404954570c4ced0caed7b9b6f268e9388f
Status: Image is up to date for mcr.microsoft.com/onnxruntime/perf-tuning:latest
mcr.microsoft.com/onnxruntime/perf-tuning:latest


Now you're ready to use OLive to convert models and tune performance.

# 1. Convert Model to ONNX

Convert the model of your choice to ONNX. The Python scripts run the OLive `onnx-converter` Docker image in the backend. The converter first converts the model, then tries to generate input test data if none is provided, and finally runs the converted model alongside the original model to check correctness. For more information on the `onnx-converter` Docker image, please see [onnx-converter README.md](https://github.com/microsoft/OLive/blob/master/docker-images/onnx-converter/README.md).

### Prepare model and test data files

First, prepare your model and test data files. Supported model frameworks are CNTK, Core ML, Keras, scikit-learn, TensorFlow, and PyTorch.

For `onnx-converter`, test data is used to quickly verify the correctness of the converted model. We strongly recommend that you provide your own test input and output files. However, if no input files are provided, OLive will generate random dummy inputs for you if possible. Only one test data set is needed, but feel free to add more, since they will also be available to the next `perf-tuning` step.

You can put your test data in one of the following locations:

  1) Your input folder with your model from another framework.
  
  2) Your output folder created in advance to hold your converted ONNX model.
  
  3) Any other location, in which case you need to specify the path with the `--test_data_path` parameter to `onnx-converter`.
  
The best practice is option **2)**: keep your input model file(s) in the input folder and put the (optional) test data in the output folder. Putting the test data sets in the output folder rather than the input folder avoids copying files in the backend. The folder structure looks like this:

    - your_input_folder
       - model_file(s)
    - your_output_folder_to_hold_onnx_file
       - test_data_set_0
           - input_0.pb
           - ...
           - output_0.pb
           - ...
       - test_data_set_1
           - ...
       ...
       - (your .onnx file after running "onnx-converter")
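To sanity-check a layout like the one above before running the converter, a small standard-library helper can list the test data sets it would find. This is a sketch, not part of OLive: the function name `find_test_data_sets` is hypothetical, and it assumes the `test_data_*` folder and `input_<i>.pb` / `output_<i>.pb` file naming shown in the tree.

```python
import os
import re
from typing import Dict, List


def _pb_index(filename: str) -> int:
    # "input_12.pb" -> 12, so files sort numerically rather than lexically
    return int(filename.split("_", 1)[1].split(".", 1)[0])


def find_test_data_sets(output_dir: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each test_data_* folder in output_dir to its input/output .pb files."""
    sets: Dict[str, Dict[str, List[str]]] = {}
    for entry in sorted(os.listdir(output_dir)):
        folder = os.path.join(output_dir, entry)
        # Only folders following the test_data_* convention are considered.
        if not (entry.startswith("test_data_") and os.path.isdir(folder)):
            continue
        files = os.listdir(folder)
        sets[entry] = {
            "inputs": sorted((f for f in files if re.fullmatch(r"input_\d+\.pb", f)), key=_pb_index),
            "outputs": sorted((f for f in files if re.fullmatch(r"output_\d+\.pb", f)), key=_pb_index),
        }
    return sets
```

For example, `find_test_data_sets("your_output_folder_to_hold_onnx_file")` returns one entry per `test_data_set_*` folder, which makes a missing or misnamed `.pb` file easy to spot.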



#### [OPTIONAL] Convert Test Data to ONNX pb 

Test data is expected as ONNX .pb files. If you're more familiar with pickle files, we provide a convenience script to convert your pickle data to .pb. Dump your input data to a single pickle file as a dict in the following format:

    {
        "input_name_0": input_data_0,
        "input_name_1": input_data_1, 
        ...
    }
    
or, if dumping output data:

    {
        "output_name_0": output_data_0,
        "output_name_1": output_data_1, 
        ...
    }
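One way to produce such a pickle file is sketched below. It assumes the values are NumPy arrays (the format above does not specify a type), and `dump_test_data` is a hypothetical helper, not part of the OLive SDK.

```python
import pickle

import numpy as np


def dump_test_data(data: dict, pickle_path: str) -> None:
    """Write a {tensor_name: numpy.ndarray} dict to a single pickle file."""
    # Assumption: values are NumPy arrays so they can later become ONNX tensors.
    for name, value in data.items():
        if not isinstance(value, np.ndarray):
            raise TypeError(f"{name!r} must be a numpy.ndarray, got {type(value).__name__}")
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f)
```

For example, `dump_test_data({"input_name_0": np.zeros((1, 1, 28, 28), dtype=np.float32)}, "inputs.pkl")`, where the input name and the MNIST-sized shape are placeholders for your model's real input names and shapes.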

Then run [convert_test_data.py](https://github.com/microsoft/OLive/blob/master/utils/convert_test_data.py) to convert your pickle file to .pb files.

The script reads your pickle file and dumps the data to a folder named "test_data_set_0" by default. Note that the ONNX naming convention for test data folders is "test_data_*", so make sure to pass `--output_folder` a folder name starting with `test_data_`.

With `--is_input=True`, the data is written to `input_*.pb` files. Set `--is_input` to false to generate output data instead, which is written to `output_*.pb` files.

In [4]:
!{sys.executable} ./python_sdk/convert_test_data.py <your_input_pickle_file> --output_folder <output_folder> --is_input=True

The system cannot find the file specified.


In this tutorial, we'll use the [MNIST model from the ONNX model zoo](https://github.com/onnx/models/tree/master/vision/classification/mnist) as an example to demonstrate the OLive pipeline. The code below downloads the model.

In [5]:
import os
import tarfile
from urllib.request import urlretrieve

# Download the model and extract it to the desired directory. Modify these lines to point to your local model directory.
model_url = "https://onnxzoo.blob.core.windows.net/models/opset_8/mnist/mnist.tar.gz"
model_dir = "test_models"
absolute_model_dir = os.path.abspath(model_dir)
if not os.path.exists(model_dir):
    print("Creating model directory ", model_dir)
    os.makedirs(model_dir)

file_tmp = urlretrieve(model_url, filename=None)[0]

# Use a context manager so the archive is closed after extraction
with tarfile.open(file_tmp) as tar:
    tar.extractall(model_dir)
print("Model successfully downloaded and extracted in ", model_dir)

Creating model directory  test_models
Model successfully downloaded and extracted in  test_models


### Initiate onnxpipeline

Once the model files are stored in a directory, pass the directory to `onnxpipeline.Pipeline` to start the model conversion. Its parameters are listed below.

    (1) local_directory: string
        Required. The path of the local directory to mount into the Docker container. All operations are executed from this path.

    (2) mount_path: string
        Optional. The path where local_directory is mounted inside the Docker container. Default is "/mnt/model".

    (3) print_logs: boolean
        Optional. Whether to print the logs from the Docker container. Default is True.

    (4) convert_directory: string
        Optional. The directory for the converted model. Default is "test/".

    (5) convert_name: string
        Optional. The file name of the converted model. Default is "model.onnx".
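Because `local_directory` is mounted at `mount_path`, every path the converter reports is the local relative path re-rooted under the mount point. The helpers below are hypothetical (not part of `onnxpipeline`) and simply reproduce the paths shown by `pipeline.config()` and the conversion logs, such as `test/model.onnx` becoming `/mnt/model/test/model.onnx`.

```python
import posixpath


def container_path(relative_path: str, mount_path: str = "/mnt/model") -> str:
    """Translate a path relative to local_directory into its in-container location."""
    # Windows separators are normalised to '/', since the container is Linux.
    return posixpath.join(mount_path, relative_path.replace("\\", "/"))


def converted_model_path(convert_directory: str = "test", convert_name: str = "model.onnx",
                         mount_path: str = "/mnt/model") -> str:
    """Where the converted model ends up inside the container, given the defaults above."""
    return container_path(posixpath.join(convert_directory, convert_name), mount_path)
```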



In [6]:
sys.path.append('./python_sdk')
import onnxpipeline

# Initiate ONNX pipeline with your model path
pipeline = onnxpipeline.Pipeline(os.path.join(model_dir, 'mnist'))

# Check the configs
pipeline.config()

-----------config----------------
           Container information: <docker.client.DockerClient object at 0x00000241A77A6DA0>
 Local directory path for volume: C:\Users\ziyl.NORTHAMERICA\OLive\notebook/test_models\mnist
Volume directory path in dockers: /mnt/model
                     Result path: result
        Converted directory path: test
        Converted model filename: model.onnx
            Converted model path: test/model.onnx
        Print logs in the docker: True


### Convert Model From Various Frameworks to ONNX Format

OLive is capable of converting models from most major frameworks to ONNX. Different frameworks may require different inputs. Check the commented code below to see what parameters are required for each supported model framework.
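The framework-specific arguments used in the commented `convert_model` calls in the next cell can be summarized as a pre-flight check. This is a hypothetical helper read off those examples, not an authoritative list of what `onnx-converter` requires.

```python
# Hypothetical table derived from the commented examples; not an official spec.
REQUIRED_ARGS = {
    "onnx": {"model"},
    "pytorch": {"model", "model_input_shapes"},
    "cntk": {"model"},
    "keras": {"model"},
    "scikit-learn": {"model", "initial_types"},
    "caffe": {"model"},
    # A TensorFlow SavedModel takes the mounted directory, so no "model" is needed.
    "tensorflow": set(),
}


def missing_convert_args(model_type: str, **kwargs) -> set:
    """Return the example arguments that are absent (or None) for model_type."""
    if model_type not in REQUIRED_ARGS:
        raise ValueError(f"unknown model_type {model_type!r}")
    required = set(REQUIRED_ARGS[model_type])
    # TensorFlow graph/checkpoint inputs also name their input and output tensors.
    if model_type == "tensorflow" and kwargs.get("model") is not None:
        required |= {"model_input_names", "model_output_names"}
    return required - {k for k, v in kwargs.items() if v is not None}
```

For example, `missing_convert_args("pytorch", model="saved_model.pb")` reports that `model_input_shapes` is still missing before any Docker container is started.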

In [7]:

model = pipeline.convert_model(model='model.onnx', model_type='onnx')

# test tensorflow savedmodel (no need to specify "model=" as it takes a directory as input)
# model = pipeline.convert_model(model_type='tensorflow')

# test tensorflow savedgraph and checkpoints
# model = pipeline.convert_model(model_type='tensorflow', model='saved_model.pb/meta', 
#                                model_input_names="input_name_1,input_name_2...", 
#                                model_output_names="output_name_1,output_name_2...")

# test pytorch
# model = pipeline.convert_model(model_type='pytorch', model='saved_model.pb', model_input_shapes='(1,3,224,224)')

# test cntk
# model = pipeline.convert_model(model_type='cntk', model='ResNet50_ImageNet_Caffe.model')

# test keras
# model = pipeline.convert_model(model_type='keras', model='keras_Average_ImageNet.keras')

# test sklearn
# model = pipeline.convert_model(model_type='scikit-learn', model='sklearn_svc.joblib', initial_types=("float_input", "FloatTensorType([1,4])"))

# test caffe
# model = pipeline.convert_model(model_type='caffe', model='bvlc_alexnet.caffemodel')




-------------
Model Conversion

Input model is already ONNX model. Skipping conversion.

-------------
MODEL INPUT GENERATION(if needed)

Test data .pb files found under /mnt/model/test_data_set_0. 
copying /mnt/model/test_data_set_0 to /mnt/model/test/test_data_set_0
Test data .pb files found under /mnt/model/test_data_set_1. 
copying /mnt/model/test_data_set_1 to /mnt/model/test/test_data_set_1
Test data .pb files found under /mnt/model/test_data_set_2. 
copying /mnt/model/test_data_set_2 to /mnt/model/test/test_data_set_2
Test data .pb files already exist. Skipping dummy input generation. 

-------------
MODEL CORRECTNESS VERIFICATION

Check the ONNX model for validity 
The ONNX model is valid.

The original model is already onnx. Skipping correctness test. 

-------------
MODEL CONVERSION SUMMARY (.json file generated at /mnt/model/test/output.json )

{'conversion_status': 'SUCCESS',
 'correctness_verified': 'SKIPPED',
 'error_message': '',
 'inpu
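The summary above is also written to `output.json` in the converted directory (`test/` under your local model directory). Assuming that file holds the same keys as the printed dict, a minimal check could look like this; `check_conversion` is a hypothetical helper.

```python
import json


def check_conversion(output_json_path: str) -> dict:
    """Load the conversion summary and raise if the conversion did not succeed."""
    with open(output_json_path) as f:
        summary = json.load(f)
    # Keys follow the printed summary; error_message is surfaced on failure.
    if summary.get("conversion_status") != "SUCCESS":
        raise RuntimeError(f"conversion failed: {summary.get('error_message', '')}")
    return summary
```

For this notebook, the local path would be `os.path.join(model_dir, "mnist", "test", "output.json")`.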

# 2. Performance Tuning

Now that you have an ONNX model and its test data, you can run performance tuning with the OLive Python SDK's `pipeline.perf_tuning`, passing your desired options. The available options are listed below.

    (1) model: string

        Required. The path of the model to tune.

    (2) result: string

        Optional. The path of the result.

    (3) config: string (choices=["Debug", "MinSizeRel", "Release", "RelWithDebInfo"])

        Optional. Configuration to run. Default is "RelWithDebInfo".

    (4) test_mode: string (choices=["duration", "times"])

        Optional. Specifies the test mode. Value could be 'duration' or 'times'. Default is "times".

    (5) execution_provider: string (choices=["cpu", "cuda", "mkldnn"])

        Optional. Specifies the execution provider: 'cpu', 'cuda', or 'mkldnn'. Default is ''.

    (6) repeated_times: integer

        Optional. Specifies the repeated times if running in 'times' test mode. Default:20.

    (7) duration_times: integer

        Optional. Specifies the seconds to run for 'duration' mode. Default:10.

    (8) parallel: boolean

        Optional. Use the parallel executor instead of the default sequential executor.

    (9) intra_op_num_threads: integer

        Optional. Sets the number of threads used to parallelize execution within nodes. A value of 0 lets ORT pick a default. Must be >= 0.

    (10) inter_op_num_threads: integer

        Optional. Sets the number of threads used to parallelize execution of the graph (across nodes). A value of 0 lets ORT pick a default. Must be >= 0.

    (11) top_n: integer

        Optional. Show percentiles for top n runs. Default:5.

    (12) runtime: boolean

        Optional. Use this boolean flag to enable GPU if you have one.

    (13) input_json: string

        Optional. Use JSON file as input parameters.
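Before launching a long tuning run, it can help to catch typos in the options listed above. The sketch below is a hypothetical client-side check (not part of `onnxpipeline`) that enforces only the documented choices and the `>= 0` thread-count constraints.

```python
# Hypothetical check built from the option list above; perf_tuning itself may differ.
CHOICES = {
    "config": {"Debug", "MinSizeRel", "Release", "RelWithDebInfo"},
    "test_mode": {"duration", "times"},
    "execution_provider": {"", "cpu", "cuda", "mkldnn"},  # '' is the documented default
}
NON_NEGATIVE = ("intra_op_num_threads", "inter_op_num_threads")


def validate_perf_options(**options) -> dict:
    """Raise ValueError for values outside the documented choices; return options unchanged."""
    for key, allowed in CHOICES.items():
        if key in options and options[key] not in allowed:
            raise ValueError(f"{key}={options[key]!r} is not one of {sorted(allowed)}")
    for key in NON_NEGATIVE:
        value = options.get(key)
        if value is not None and (not isinstance(value, int) or value < 0):
            raise ValueError(f"{key} must be an integer >= 0, got {value!r}")
    return options
```

Assuming the options are passed as keyword arguments with the names listed above, usage would be `pipeline.perf_tuning(**validate_perf_options(model=model, test_mode="times", repeated_times=20))`.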

In [8]:
result = pipeline.perf_tuning(model=model)
#result = pipeline.perf_tuning()   # is ok, too

Setting intra_op_num_threads to 1
Session creation time cost:0.0043349 s
Total inference time cost:0.0011741 s
Total inference requests:20
Average inference time cost:0.058705 ms
Total inference run time:0.0011951 s

Setting intra_op_num_threads to 1
Session creation time cost:0.0044639 s
Total inference time cost:0.0079206 s
Total inference requests:20
Average inference time cost:0.39603 ms
Total inference run time:0.0079622 s

Setting intra_op_num_threads to 1
Session creation time cost:0.0047126 s
Total inference time cost:0.0087747 s
Total inference requests:20
Average inference time cost:0.438735 ms
Total inference run time:0.0088293 s

Setting intra_op_num_threads to 1
Session creation time cost:0.0107289 s
Total inference time cost:0.0020869 s
Total inference requests:20
Average inference time cost:0.104345 ms
Total inference run time:0.0021279 s

Setting intra_op_num_threads to 1
Session creation time cost:0.0050273 s
Total inference time cost:0.0101048 s

Setting intra_op_num_threads to 1
Session creation time cost:0.0065263 s
Total inference time cost:0.0184397 s
Total inference requests:200
Average inference time cost:0.0921985 ms
Total inference run time:0.0187502 s

Setting intra_op_num_threads to 1
Session creation time cost:0.0059525 s
Total inference time cost:0.0110159 s
Total inference requests:200
Average inference time cost:0.0550795 ms
Total inference run time:0.011183 s

Setting intra_op_num_threads to 1
Session creation time cost:0.0047322 s
Total inference time cost:0.0141941 s
Total inference requests:200
Average inference time cost:0.0709705 ms
Total inference run time:0.0144059 s

Setting intra_op_num_threads to 0
Session creation time cost:0.0041639 s
Total inference time cost:0.0158238 s
Total inference requests:200
Average inference time cost:0.079119 ms
Total inference run time:0.0160525 s

Setting intra_op_num_threads to 1
[00:02:04] /home/ziyl/onnxruntime/cmake/external/tvm/src/pass/arg_binde
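In each run of the log, the average is the total inference time divided by the number of requests; for the first run, 0.0011741 s / 20 = 0.000058705 s = 0.058705 ms. The sketch below is a hypothetical parser that assumes the log lines keep the exact wording shown above. It extracts these numbers from one run and recomputes the average, which is also a quick way to spot truncated runs, such as the one that reports only a total time.

```python
import re

PATTERNS = {
    "total_s": r"Total inference time cost:([\d.]+) s",
    "requests": r"Total inference requests:(\d+)",
    "average_ms": r"Average inference time cost:([\d.]+) ms",
}


def parse_run(log_block: str) -> dict:
    """Extract the timing numbers from one perf-tuning run's log text."""
    values = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, log_block)
        if match is None:
            raise ValueError(f"missing {key!r} in log block")  # e.g. a truncated run
        values[key] = int(match.group(1)) if key == "requests" else float(match.group(1))
    # Average (ms) = total time (s) / number of requests * 1000
    values["recomputed_ms"] = values["total_s"] / values["requests"] * 1000
    return values
```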

In [9]:
pipeline.print_performance(result)
#pipeline.print_result() # is ok, too

mklml_2_OMP_threads_OMP_WAIT_POLICY_passive 0.055078999999999996 ms
mklml_2_OMP_threads_OMP_WAIT_POLICY_active 0.05899 ms
mklml_OMP_WAIT_POLICY_active 0.070385 ms
dnnl_OMP_WAIT_POLICY_active 0.07097 ms
dnnl 0.086905 ms
dnnl_2_OMP_threads_OMP_WAIT_POLICY_active 0.18291 ms
nuphar 0.07813999999999999 ms
nuphar_2_OMP_threads_OMP_WAIT_POLICY_active 0.07799 ms
nuphar_OMP_WAIT_POLICY_passive 0.092655 ms
cpu 0.079119 ms
cpu_openmp_2_OMP_threads_OMP_WAIT_POLICY_active 0.092198 ms
cpu_openmp_OMP_WAIT_POLICY_active 0.10434500000000001 ms
cpu_openmp_2_OMP_threads_OMP_WAIT_POLICY_passive 0.39603 ms
ngraph_2_OMP_threads_OMP_WAIT_POLICY_active error
ngraph_2_OMP_threads_OMP_WAIT_POLICY_passive error
ngraph error
ngraph_OMP_WAIT_POLICY_active error
ngraph_OMP_WAIT_POLICY_passive error
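If you capture the text printed by `print_performance`, a few lines of Python can rank the configurations and separate the ones that failed (here, every `ngraph` variant). Long decimals such as `0.055078999999999996` are simply the floating-point representation of roughly 0.055079 ms. `rank_results` is a hypothetical helper that assumes the `name value ms` / `name error` line format shown above.

```python
def rank_results(lines):
    """Split 'name value ms' / 'name error' lines into sorted timings and failures."""
    timings, failures = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[-1] == "error":
            failures.append(parts[0])
        elif len(parts) == 3 and parts[2] == "ms":
            timings.append((parts[0], float(parts[1])))
    timings.sort(key=lambda item: item[1])  # fastest configuration first
    return timings, failures
```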



# 3. Result Visualization

After the performance test, the results are saved in a result directory.

This library uses `pandas.read_json` to visualize the JSON files (the `orient` argument can be changed).

"latency.json" contains the raw results, ordered by average latency.

Use `.latency` to get the raw latency JSON and `.profiling` to get the raw profiling JSON for the top 5 runs.
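A similar view can be built directly with pandas. This sketch assumes `latency.json` is a JSON array of records with at least `name` and `avg` fields, matching the columns of the latency table in this section; if the file is laid out differently, adjust `orient`. `load_latency` is a hypothetical helper, not the library's `get_result`.

```python
from io import StringIO

import pandas as pd


def load_latency(json_text: str, top: int = 5) -> pd.DataFrame:
    """Return the `top` fastest records from latency JSON text, ordered by avg."""
    # Assumed layout: [{"name": ..., "avg": ..., "p90": ..., ...}, ...]
    df = pd.read_json(StringIO(json_text), orient="records")
    return df.sort_values("avg").head(top).reset_index(drop=True)
```

Assuming the file sits directly in the result directory, you could call `load_latency(open(os.path.join(result, "latency.json")).read())`.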

In [10]:
r = pipeline.get_result(result)
#r.latency
#r.profiling
print(result)

C:\Users\ziyl.NORTHAMERICA\OLive\notebook/test_models\mnist/result


### Print latency.json

Prints the parameters of the top 5 configurations. Use the `top` parameter to show more results.

In [11]:
r.prints()

|   | name | avg (ms) | p90 (ms) | p95 (ms) | cpu_usage | gpu_usage | memory_util | code_snippet |
|---|---|---|---|---|---|---|---|---|
| 0 | mklml_2_OMP_threads_OMP_WAIT_POLICY_passive | 0.055079 | 0.0748 | 0.0802 | 0.96812 | 0 | 0.15598 | OrderedDict([('execution_provider', 'mklml'), ... |
| 1 | dnnl_OMP_WAIT_POLICY_active | 0.07097 | 0.0934 | 0.1266 | 0.96048 | 0 | 0.15672 | OrderedDict([('execution_provider', 'dnnl'), (... |
| 2 | nuphar | 0.07814 | 0.1143 | 0.1291 | 0.96938 | 0 | 0.15771 | OrderedDict([('execution_provider', 'nuphar'),... |
| 3 | cpu | 0.079119 | 0.0991 | 0.1245 | 0.94437 | 0 | 0.15657 | OrderedDict([('execution_provider', 'cpu'), ('... |
| 4 | cpu_openmp_2_OMP_threads_OMP_WAIT_POLICY_active | 0.092198 | 0.1232 | 0.1347 | 0.97112 | 0 | 0.15656 | OrderedDict([('execution_provider', 'cpu_openm... |


### Print profiling.json

Profiling JSON is only provided for the top 5 configurations; select one by its index in the result. The file name is profile_[name].json.
    
    (1) index: integer
        Required. The index of the profiling file among the top 5.
    (2) top: integer
        The number of top ops to show.
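ONNX Runtime writes profiles as a JSON array of trace events, in which per-operator events carry `"cat": "Node"`, a `dur` field (typically in microseconds) and an `args` dict with fields such as `op_name` and `provider`, as in the table below. A hypothetical helper that reads a `profile_[name].json` file and lists the longest node events could look like this.

```python
import json
from typing import List, Tuple


def top_node_events(profile_path: str, top: int = 5) -> List[Tuple[str, int, str]]:
    """Return (name, dur, op_name) for the `top` longest "Node" events."""
    with open(profile_path) as f:
        events = json.load(f)
    # Session-level events are skipped; only per-operator events are ranked.
    nodes = [e for e in events if e.get("cat") == "Node"]
    nodes.sort(key=lambda e: e.get("dur", 0), reverse=True)
    return [(e["name"], e["dur"], e.get("args", {}).get("op_name", "")) for e in nodes[:top]]
```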


In [12]:
r.print_profiling(index=4, top=5)

|   | cat | pid | tid | dur | ts | ph | name | args |
|---|---|---|---|---|---|---|---|---|
| 0 | Node | 93 | 93 | 960 | 14548 | X | ReLU32_Output_0_nchwc_6 | {'provider': 'CPUExecutionProvider', 'op_name'... |
| 1 | Node | 93 | 93 | 491 | 15563 | X | ReLU114_Output_0_nchwc_11 | {'provider': 'CPUExecutionProvider', 'op_name'... |
| 2 | Node | 93 | 93 | 440 | 16177 | X | ReLU32_Output_0_nchwc_6 | {'provider': 'CPUExecutionProvider', 'op_name'... |
| 3 | Node | 93 | 93 | 416 | 17889 | X | ReLU32_Output_0_nchwc_6 | {'provider': 'CPUExecutionProvider', 'op_name'... |
| 4 | Node | 93 | 93 | 414 | 18464 | X | ReLU32_Output_0_nchwc_6 | {'provider': 'CPUExecutionProvider', 'op_name'... |


### Get code snippets

In [13]:
print(r.get_code(ep='cpu'))


import onnxruntime as ort
so = rt.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.execution_mode = rt.ExecutionMode.ORT_SEQUENTIAL


session = rt.Session("/mnt/model/test/model.onnx", so, providers=["CPUExecutionProvider"])



### Netron

In [15]:
# Only works when the notebook runs on a local server
import netron
netron.start("test_models/mnist/test/model.onnx", browse=False) # 'model.onnx'
from IPython.display import IFrame
IFrame('http://localhost:8080', width="100%", height=1000)

Serving 'test_models/mnist/test/model.onnx' at http://localhost:8080


In [16]:
netron.stop()


Stopping http://localhost:8080
