Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Convert Models and Tune Performance with OLive Python SDK

This notebook demonstrates how to use the OLive Python SDK to convert a model from another framework to ONNX, and then tune the performance of the converted ONNX model.

## 0. Prerequisites

1) Make sure you have [Docker](https://www.docker.com/get-started) installed and running. 

2) Install the Python dependencies.

In [65]:
import sys
!{sys.executable} -m pip install wget netron docker pandas onnx







3) Download the OLive Python SDK.

In [51]:
import os
import wget

url = "https://raw.githubusercontent.com/microsoft/OLive/master/utils/"
sdk_files = ["onnxpipeline.py", "convert_test_data.py", "config.py"]
sdk_dir = "./python_sdk"
if not os.path.exists(sdk_dir):
    os.makedirs(sdk_dir)

for filename in sdk_files:
    target_file = os.path.join(sdk_dir, filename)
    if os.path.exists(target_file):
        os.remove(target_file)
    wget.download(url + filename, target_file)
    print("Downloaded", filename)

100% [..............................................................................] 19457 / 19457Downloaded onnxpipeline.py
100% [................................................................................] 2421 / 2421Downloaded convert_test_data.py
100% [..................................................................................] 709 / 709Downloaded config.py


4) Pull the latest `mcr.microsoft.com/onnxruntime/onnx-converter` and `mcr.microsoft.com/onnxruntime/perf-tuning` Docker images from MCR (Microsoft Container Registry). This may take several minutes.

In [52]:
# pull onnx-converter and perf-tuning docker images from mcr
!docker pull mcr.microsoft.com/onnxruntime/onnx-converter
!docker pull mcr.microsoft.com/onnxruntime/perf-tuning

Using default tag: latest
latest: Pulling from onnxruntime/onnx-converter
Digest: sha256:f279a9a704c14a1e32c0f798dfd0d2a483c7e56c08626cc47af8250987726ab2
Status: Image is up to date for mcr.microsoft.com/onnxruntime/onnx-converter:latest
mcr.microsoft.com/onnxruntime/onnx-converter:latest
Using default tag: latest
latest: Pulling from onnxruntime/perf-tuning
Digest: sha256:8003f5ecd2e11c2fdad31610294982404954570c4ced0caed7b9b6f268e9388f
Status: Image is up to date for mcr.microsoft.com/onnxruntime/perf-tuning:latest
mcr.microsoft.com/onnxruntime/perf-tuning:latest


Now you're ready to use OLive to convert a model and tune its performance.

## 1. Convert Model To ONNX

Convert the model of your choice to ONNX. The Python SDK runs the OLive `onnx-converter` Docker image in the background. It first converts the model, then tries to generate input test data if none is provided, and finally runs the converted model alongside the original model to check correctness. For more information on the `onnx-converter` Docker image, see the [onnx-converter README.md](https://github.com/microsoft/OLive/blob/master/docker-images/onnx-converter/README.md).

### Prepare model and test data files

Put your input model file(s) and, optionally, your test data in one folder with the following structure:

    - your_folder
       - test_data_set_0
           - input_0.pb
           - ...
           - output_0.pb
           - ...
       - model_file(s)
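
Before starting a conversion, it can help to confirm that a folder matches this layout. The following is a minimal sketch using only the standard library; `describe_model_folder` is a hypothetical helper, not part of the OLive SDK, and the placeholder files it creates are empty.

```python
from pathlib import Path
import tempfile


def describe_model_folder(folder):
    """Summarize a model folder laid out as described above.

    Returns the files found at the top level and, for each test_data_set_*
    directory, the sorted input_*.pb and output_*.pb file names.
    """
    root = Path(folder)
    summary = {"model_files": [], "test_data_sets": {}}
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and entry.name.startswith("test_data_set_"):
            summary["test_data_sets"][entry.name] = {
                "inputs": sorted(p.name for p in entry.glob("input_*.pb")),
                "outputs": sorted(p.name for p in entry.glob("output_*.pb")),
            }
        elif entry.is_file():
            summary["model_files"].append(entry.name)
    return summary


# Build a throwaway folder with empty placeholder files to show the layout.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "test_data_set_0"
    data_dir.mkdir()
    (data_dir / "input_0.pb").touch()
    (data_dir / "output_0.pb").touch()
    (Path(tmp) / "model.onnx").touch()
    layout = describe_model_folder(tmp)

print(layout)
```

A folder with no `test_data_set_*` entry is still usable, since test data is optional.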


Supported model frameworks are: cntk, coreml, keras, scikit-learn, tensorflow, and pytorch.

As an example, this tutorial uses the [MNIST model from the ONNX model zoo](https://github.com/onnx/models/tree/master/vision/classification/mnist).

In [57]:
import os
import tarfile
from urllib.request import urlretrieve

# Download the model and extract it into the desired directory.
# Modify these lines to point to your local model directory.
model_url = "https://onnxzoo.blob.core.windows.net/models/opset_8/mnist/mnist.tar.gz"
model_dir = "test_models"
os.makedirs(model_dir, exist_ok=True)

# With filename=None, urlretrieve saves to a temporary file and returns its path first
file_tmp = urlretrieve(model_url, filename=None)[0]

# Extract the archive; the context manager closes the file afterwards
with tarfile.open(file_tmp) as tar:
    tar.extractall(model_dir)

As for the test data, we strongly recommend that you provide your own test input and output files. However, if no input files are provided, OLive will generate random dummy inputs where possible.

ONNX .pb files are expected. If you're more familiar with pickle files, dump your input data to a single pickle file in the following dict format:

    {
        "input_name_0": input_data_0,
        "input_name_1": input_data_1, 
        ...
    }

or, if dumping output data:

    {
        "output_name_0": output_data_0,
        "output_name_1": output_data_1, 
        ...
    }
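
As an illustration, the sketch below writes such a dict to a pickle file and reads it back. The nested lists are placeholders only: the value type the conversion script expects is not stated here, and real data would typically be tensors such as NumPy arrays (an assumption). The file name `inputs.pickle` is likewise arbitrary.

```python
import os
import pickle
import tempfile

# Placeholder data: nested lists stand in for real tensors (assumed to be
# NumPy arrays in practice). Keys must match the model's input names.
inputs = {
    "input_name_0": [[0.0, 1.0], [2.0, 3.0]],
    "input_name_1": [[4.0, 5.0]],
}

pickle_path = os.path.join(tempfile.mkdtemp(), "inputs.pickle")
with open(pickle_path, "wb") as f:
    pickle.dump(inputs, f)

# Read the file back to confirm it holds a single dict keyed by input name.
with open(pickle_path, "rb") as f:
    restored = pickle.load(f)

print(sorted(restored))
```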

Then use the `convert_test_data.py` script downloaded to `./python_sdk` in the prerequisites to convert your pickle file to .pb files.

In [None]:
!{sys.executable} ./python_sdk/convert_test_data.py <your_input_pickle_file> --output_folder <output_folder> --is_input=True

### Initiate onnxpipeline

After the model files are stored in a directory, pass that directory to `onnxpipeline.Pipeline` to start the model conversion. The parameters are listed below.

    (1) local_directory: string
        Required. The path of the local directory that will be mounted into the Docker container. All operations are executed from this path.

    (2) mount_path: string
        Optional. The path inside the Docker container where local_directory is mounted. Default is "/mnt/model".

    (3) print_logs: boolean
        Optional. Whether to print the logs from the Docker container. Default is True.

    (4) convert_directory: string
        Optional. The directory, relative to local_directory, where the converted model is written. Default is "test".

    (5) convert_name: string
        Optional. The file name of the converted model. Default is "model.onnx".
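
To make the relationship between these parameters concrete, here is a small sketch of how the defaults appear to combine into the converted model's location. Both helpers are hypothetical (they are not part of `onnxpipeline`), and the assumption is that the converted model lands in `convert_directory` under the mounted folder.

```python
import os
import posixpath


def container_model_path(mount_path="/mnt/model",
                         convert_directory="test",
                         convert_name="model.onnx"):
    """Hypothetical helper: converted model path as seen inside the container."""
    # Containers use POSIX paths regardless of the host operating system.
    return posixpath.join(mount_path, convert_directory, convert_name)


def host_model_path(local_directory,
                    convert_directory="test",
                    convert_name="model.onnx"):
    """Hypothetical helper: the same file as seen from the host machine."""
    return os.path.join(local_directory, convert_directory, convert_name)


print(container_model_path())  # /mnt/model/test/model.onnx
print(host_model_path(os.path.join("test_models", "mnist")))
```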



In [59]:
sys.path.append('./python_sdk')
import onnxpipeline

# Initiate ONNX pipeline with your model path
pipeline = onnxpipeline.Pipeline(os.path.join(model_dir, 'mnist'))

# Check the configs
pipeline.config()

-----------config----------------
           Container information: <docker.client.DockerClient object at 0x000001E4EB9331D0>
 Local directory path for volume: C:\Users\ziyl.NORTHAMERICA\OLive\notebook/test_models\mnist
Volume directory path in dockers: /mnt/model
                     Result path: result
        Converted directory path: test
        Converted model filename: model.onnx
            Converted model path: test/model.onnx
        Print logs in the docker: True


### Convert Model From Various Frameworks to ONNX Format

OLive is capable of converting models from most major frameworks to ONNX. Different frameworks may require different inputs. Check the commented code below to see what parameters are required for each supported model framework.

In [60]:

model = pipeline.convert_model(model='model.onnx', model_type='onnx')

# test tensorflow savedmodel (no need to specify "model=" as it takes a directory as input)
# model = pipeline.convert_model(model_type='tensorflow')

# test tensorflow savedgraph and checkpoints
# model = pipeline.convert_model(model_type='tensorflow', model='saved_model.pb/meta', 
#                                model_input_names="input_name_1,input_name_2...", 
#                                model_output_names="output_name_1,output_name_2...")

# test pytorch
# model = pipeline.convert_model(model_type='pytorch', model='saved_model.pb', model_input_shapes='(1,3,224,224)')

# test cntk
# model = pipeline.convert_model(model_type='cntk', model='ResNet50_ImageNet_Caffe.model')

# test keras
# model = pipeline.convert_model(model_type='keras', model='keras_Average_ImageNet.keras')

# test sklearn
# model = pipeline.convert_model(model_type='scikit-learn', model='sklearn_svc.joblib', initial_types=("float_input", "FloatTensorType([1,4])"))

# test caffe
# model = pipeline.convert_model(model_type='caffe', model='bvlc_alexnet.caffemodel')




-------------

Model Conversion



Input model is already ONNX model. Skipping conversion.



-------------

MODEL INPUT GENERATION(if needed)



/mnt/model/test/model.onnx inputs: 

input name: Input3, shape: [1, 1, 28, 28], type: tensor(float)

Randomized input .pb file generated at  /mnt/model/test/test_data_set_0



-------------

MODEL CORRECTNESS VERIFICATION





Check the ONNX model for validity 

The ONNX model is valid.



The original model is already onnx. Skipping correctness test. 



-------------

MODEL CONVERSION SUMMARY (.json file generated at /mnt/model/test/output.json )



{'conversion_status': 'SUCCESS',

 'correctness_verified': 'SKIPPED',

 'error_message': '',

 'input_folder': '/mnt/model/test/test_data_set_0',

 'output_onnx_path': '/mnt/model/test/model.onnx'}
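
The summary can also be checked programmatically. The sketch below assumes that `output.json` contains the same keys as the summary printed above; `summarize_conversion` is a hypothetical helper, and only the status values that appear in this notebook (`SUCCESS` and `SKIPPED`) are treated specially.

```python
import json

# The summary printed above, written as JSON text. In practice this would be
# read from the output.json file inside the converted-model directory.
summary_text = """
{"conversion_status": "SUCCESS",
 "correctness_verified": "SKIPPED",
 "error_message": "",
 "input_folder": "/mnt/model/test/test_data_set_0",
 "output_onnx_path": "/mnt/model/test/model.onnx"}
"""


def summarize_conversion(summary):
    """Turn the conversion summary dict into a one-line status message."""
    if summary["conversion_status"] != "SUCCESS":
        return "conversion failed: " + summary["error_message"]
    if summary["correctness_verified"] == "SKIPPED":
        return "converted, correctness check skipped"
    return "converted, correctness_verified=" + summary["correctness_verified"]


message = summarize_conversion(json.loads(summary_text))
print(message)
```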



## 2. Performance Tuning

Now that you have an ONNX model and its test data, you can run performance tuning with the OLive Python SDK method `pipeline.perf_tuning` and your desired options. The complete list of available options is given below.

    (1) model: string
        Required. The path of the model to tune.

    (2) result: string
        Optional. The path where the results are written.

    (3) config: string (choices=["Debug", "MinSizeRel", "Release", "RelWithDebInfo"])
        Optional. The build configuration to run. Default is "RelWithDebInfo".

    (4) test_mode: string (choices=["duration", "times"])
        Optional. Specifies the test mode. Default is "times".

    (5) execution_provider: string (choices=["cpu", "cuda", "mkldnn"])
        Optional. Specifies the execution provider. Default is "".

    (6) repeated_times: integer
        Optional. Specifies the number of repetitions when running in "times" test mode. Default is 20.

    (7) duration_times: integer
        Optional. Specifies the number of seconds to run in "duration" test mode. Default is 10.

    (8) parallel: boolean
        Optional. Use the parallel executor. By default the sequential executor is used.

    (9) intra_op_num_threads: integer
        Optional. Sets the number of threads used to parallelize execution within nodes. A value of 0 means ORT will pick a default. Must be >= 0.

    (10) inter_op_num_threads: integer
        Optional. Sets the number of threads used to parallelize execution of the graph (across nodes). A value of 0 means ORT will pick a default. Must be >= 0.

    (11) top_n: integer
        Optional. Show percentiles for the top n runs. Default is 5.

    (12) runtime: boolean
        Optional. Set this flag to enable the GPU if you have one.

    (13) input_json: string
        Optional. Path to a JSON file that supplies the input parameters.
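
Because each tuning run starts a container, it can be worth catching a misspelled option before launching one. The sketch below checks a set of options against the choices and constraints listed above; `check_perf_options` is a hypothetical helper and is not part of the SDK.

```python
# Allowed values copied from the option list above.
CHOICES = {
    "config": {"Debug", "MinSizeRel", "Release", "RelWithDebInfo"},
    "test_mode": {"duration", "times"},
    "execution_provider": {"cpu", "cuda", "mkldnn"},
}
NON_NEGATIVE_INTS = ("intra_op_num_threads", "inter_op_num_threads")


def check_perf_options(**options):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    for name, allowed in CHOICES.items():
        if name in options and options[name] not in allowed:
            problems.append(
                f"{name}={options[name]!r} is not one of {sorted(allowed)}")
    for name in NON_NEGATIVE_INTS:
        if name in options and options[name] < 0:
            problems.append(f"{name} must be >= 0")
    return problems


options = {"test_mode": "times", "repeated_times": 50, "intra_op_num_threads": 0}
print(check_perf_options(**options))         # no problems for valid options
print(check_perf_options(test_mode="time"))  # reports the misspelled mode
```

If the returned list is empty, the same dict could then be passed on as `pipeline.perf_tuning(model=model, **options)`, assuming the method accepts these options as keyword arguments.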

In [61]:
result = pipeline.perf_tuning(model=model)
#result = pipeline.perf_tuning()   # is ok, too

Setting intra_op_num_threads to 1

Session creation time cost:0.00315954 s

Total inference time cost:0.000848683 s

Total inference requests:20

Average inference time cost:0.0424342 ms

Total inference run time:0.000865692 s

Setting intra_op_num_threads to 1

Session creation time cost:0.00349797 s

Total inference time cost:0.00677852 s

Total inference requests:20

Average inference time cost:0.338926 ms

Total inference run time:0.00681363 s

Setting intra_op_num_threads to 1

Session creation time cost:0.00441003 s

Total inference time cost:0.000592256 s

Total inference requests:20

Average inference time cost:0.0296128 ms

Total inference run time:0.000607534 s

Setting intra_op_num_threads to 1

Session creation time cost:0.00282999 s

Total inference time cost:0.000880639 s

Total inference requests:20

Average inference time cost:0.044032 ms

Total inference run time:0.000896675 s

Setting intra_op_num_threads to 1

Session creation time cost:0.00280605 s

Total inference 

Setting intra_op_num_threads to 1

Session creation time cost:0.00294927 s

Total inference time cost:0.00769274 s

Total inference requests:200

Average inference time cost:0.0384637 ms

Total inference run time:0.00783142 s

Setting intra_op_num_threads to 1

Session creation time cost:0.00296032 s

Total inference time cost:0.00783238 s

Total inference requests:200

Average inference time cost:0.0391619 ms

Total inference run time:0.0079417 s

Setting intra_op_num_threads to 1

Session creation time cost:0.00314448 s

Total inference time cost:0.00724946 s

Total inference requests:200

Average inference time cost:0.0362473 ms

Total inference run time:0.00737046 s

Setting intra_op_num_threads to 0

Session creation time cost:0.00352852 s

Total inference time cost:0.00757413 s

Total inference requests:200

Average inference time cost:0.0378706 ms

Total inference run time:0.00769708 s

Setting intra_op_num_threads to 1

[16:35:24] /home/ziyl/onnxruntime/cmake/external/tvm/src/p

In [62]:
pipeline.print_performance(result)
#pipeline.print_result() # is ok, too

dnnl_OMP_WAIT_POLICY_active 0.036247 ms

dnnl 0.039872 ms

dnnl_2_OMP_threads_OMP_WAIT_POLICY_active 0.047617999999999994 ms

cpu 0.037870999999999995 ms

cpu_openmp 0.038464000000000005 ms

cpu_openmp_2_OMP_threads_OMP_WAIT_POLICY_active 0.042434 ms

cpu_openmp_OMP_WAIT_POLICY_active 0.044032 ms

mklml_OMP_WAIT_POLICY_active 0.039162 ms

mklml_2_OMP_threads_OMP_WAIT_POLICY_passive 0.038269000000000004 ms

mklml_2_OMP_threads_OMP_WAIT_POLICY_active 0.04024 ms

nuphar_OMP_WAIT_POLICY_passive 0.051186 ms

nuphar_2_OMP_threads_OMP_WAIT_POLICY_active 0.053251 ms

nuphar 0.055098 ms

ngraph_2_OMP_threads_OMP_WAIT_POLICY_active error

ngraph_2_OMP_threads_OMP_WAIT_POLICY_passive error

ngraph error

ngraph_OMP_WAIT_POLICY_active error

ngraph_OMP_WAIT_POLICY_passive error
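
A report like the one above is easy to post-process, for example to find the fastest configuration and to list the ones that failed. The sketch below works on a few lines pasted from the output; `parse_report` is a hypothetical helper that assumes each line is either `<name> <latency> ms` or `<name> error`.

```python
# A few lines copied from the report above.
report = """\
dnnl_OMP_WAIT_POLICY_active 0.036247 ms
cpu 0.037870999999999995 ms
nuphar 0.055098 ms
ngraph error
"""


def parse_report(text):
    """Split report lines into {name: latency_ms} and a list of failed names."""
    latencies, failed = {}, []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] == "ms":
            latencies[parts[0]] = float(parts[1])
        elif len(parts) == 2 and parts[1] == "error":
            failed.append(parts[0])
    return latencies, failed


latencies, failed = parse_report(report)
fastest = min(latencies, key=latencies.get)
print(fastest, latencies[fastest], failed)
```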



## 3. Result Visualization

After the performance test, the results are written to a result directory.

This library uses `pandas.read_json` to visualize the JSON files (the `orient` argument can be changed).

`latency.json` contains the raw results, ordered by average latency.

Use `.latency` to obtain the original latency JSON, and `.profiling` to obtain the original profiling JSON for the top 5 results.

In [63]:
r = pipeline.get_result(result)
#r.latency
#r.profiling
print(result)

C:\Users\ziyl.NORTHAMERICA\OLive\notebook/test_models\mnist/result


### Print latency.json

`prints()` shows the parameters of the 5 best-performing configurations. Use the parameter `top` to show more results.

In [64]:
r.prints()

| Unnamed: 0 | name | avg | p90 | p95 | cpu_usage | gpu_usage | memory_util | code_snippet |
|---|---|---|---|---|---|---|---|---|
| 0 | dnnl_OMP_WAIT_POLICY_active | 0.036247 | 0.04734 | 0.05655 | 0.92775 | 0 | 0.22406 | `OrderedDict([('execution_provider', 'dnnl'), (...` |
| 1 | cpu | 0.037871 | 0.05427 | 0.06619 | 0.9315 | 0 | 0.22399 | `OrderedDict([('execution_provider', 'cpu'), ('...` |
| 2 | cpu_openmp | 0.038464 | 0.04973 | 0.05948 | 0.93689 | 0 | 0.22284 | `OrderedDict([('execution_provider', 'cpu_openm...` |
| 3 | mklml_OMP_WAIT_POLICY_active | 0.039162 | 0.0455 | 0.05308 | 0.87462 | 0 | 0.22381 | `OrderedDict([('execution_provider', 'mklml'), ...` |
| 4 | nuphar_OMP_WAIT_POLICY_passive | 0.051186 | 0.06522 | 0.08461 | 0.96937 | 0 | 0.22517 | `OrderedDict([('execution_provider', 'nuphar'),...` |


### Print profiling.json

Profiling JSON is available only for the 5 best-performing configurations; select one by its index in the result. The file name is profile_[name].json.

    (1) index: integer
        Required. The index of the profiling file among the top 5 results.

    (2) top: integer
        The number of top ops to show.


In [9]:
r.print_profiling(index=4, top=5)

| Unnamed: 0 | cat | pid | tid | dur | ts | ph | name | args |
|---|---|---|---|---|---|---|---|---|
| 0 | Node | 12569 | 12569 | 1404 | 34769 | X | | `{'provider': 'CPUExecutionProvider', 'op_name'...` |
| 1 | Node | 12569 | 12569 | 272 | 33080 | X | distill_conv1_1_nchwc_9 | `{'provider': 'CPUExecutionProvider', 'op_name'...` |
| 2 | Node | 12569 | 12569 | 230 | 40891 | X | distill_conv2/bn_1_nchwc_14 | `{'provider': 'CPUExecutionProvider', 'op_name'...` |
| 3 | Node | 12569 | 12569 | 215 | 56671 | X | distill_conv2/bn_1_nchwc_14 | `{'provider': 'CPUExecutionProvider', 'op_name'...` |
| 4 | Node | 12569 | 12569 | 205 | 49507 | X | distill_conv2/bn_1_nchwc_14 | `{'provider': 'CPUExecutionProvider', 'op_name'...` |
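
The same node can appear several times in the profiling output, so summing durations per node gives a clearer picture of where time goes. The sketch below does this for the rows shown above, using only the columns it needs; `total_duration_by_node` is a hypothetical helper, and `dur` is assumed to be in microseconds, as in the Chrome trace event format.

```python
from collections import defaultdict

# Rows copied from the profiling output above (only the columns used here).
events = [
    {"cat": "Node", "name": "", "dur": 1404},
    {"cat": "Node", "name": "distill_conv1_1_nchwc_9", "dur": 272},
    {"cat": "Node", "name": "distill_conv2/bn_1_nchwc_14", "dur": 230},
    {"cat": "Node", "name": "distill_conv2/bn_1_nchwc_14", "dur": 215},
    {"cat": "Node", "name": "distill_conv2/bn_1_nchwc_14", "dur": 205},
]


def total_duration_by_node(rows):
    """Sum `dur` per node name and return (name, total) pairs, largest first."""
    totals = defaultdict(int)
    for row in rows:
        if row["cat"] == "Node":
            # Rows without a name are grouped under a placeholder label.
            totals[row["name"] or "<unnamed>"] += row["dur"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = total_duration_by_node(events)
for name, total in ranking:
    print(f"{name}: {total}")
```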


### Get code snippets

In [10]:
print(r.get_code(ep='cpu'))


import onnxruntime as ort
so = rt.SessionOptions()
so.set_graph_optimization_level(99)
so.enable_sequential_execution = False
so.session_thread_pool_size(0)
session = rt.Session("/mnt/model/test/model.onnx", so)
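
The generated snippet mixes the aliases `ort` and `rt` and uses older option names, so it may not run as printed against a recent ONNX Runtime release. The following is a sketch of what appears to be the equivalent setup with the current `onnxruntime` Python API. The mapping of each old setting to a new one is an assumption based on the names, and `make_cpu_session` is a hypothetical helper.

```python
def make_cpu_session(model_path):
    """Create a CPU inference session with settings mirroring the snippet above."""
    # Imported lazily so the function can be defined without onnxruntime installed.
    import onnxruntime as ort

    so = ort.SessionOptions()
    # Level 99 in the generated snippet: enable all graph optimizations.
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    # enable_sequential_execution = False: use the parallel execution mode.
    so.execution_mode = ort.ExecutionMode.ORT_PARALLEL
    # Thread pool size 0: let ONNX Runtime choose the number of intra-op threads.
    so.intra_op_num_threads = 0
    return ort.InferenceSession(model_path, sess_options=so,
                                providers=["CPUExecutionProvider"])
```

On the host, `model_path` would be the local path of the converted model rather than the container path `/mnt/model/test/model.onnx`.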



### netron

In [11]:
# Only works when the notebook runs on a local server
import netron
netron.start("model/test/model.onnx", browse=False) # 'model.onnx'
from IPython.display import IFrame
IFrame('http://localhost:8080', width="100%", height=1000)

Serving 'model/test/model.onnx' at http://localhost:8080


In [12]:
netron.stop()


Stopping http://localhost:8080
