Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# ONNX Pipeline

This repository shows how to deploy and use the ONNX pipeline with Docker, including model conversion, input generation, and performance testing.

# Prerequisites

Pull the Docker images from Azure. This should take several minutes.

In [1]:
# !sh build.sh # For Linux
!build.sh # For Windows

Install the onnxpipeline SDK

In [2]:
import onnxpipeline

# Initiate ONNX pipeline with local directory "model"
pipeline = onnxpipeline.Pipeline('model')

# onnx
#pipeline = onnxpipeline.Pipeline('onnx')

# tensorflow
#pipeline = onnxpipeline.Pipeline('mnist/model')

# pytorch
#pipeline = onnxpipeline.Pipeline('pytorch')

# cntk 
#pipeline = onnxpipeline.Pipeline('cntk')

# keras
#pipeline = onnxpipeline.Pipeline('KerasToONNX')

# sklearn
#pipeline = onnxpipeline.Pipeline('sklearn')

# caffe
#pipeline = onnxpipeline.Pipeline('caffe')

# current directory
#pipeline = onnxpipeline.Pipeline()

# test mxnet fail
#pipeline = onnxpipeline.Pipeline('mxnet')


## Run parameters

(1) local_directory: string

    Required. The path of the local directory that will be mounted into the Docker container. All operations are executed from this path.

(2) mount_path: string

    Optional. The path where local_directory is mounted inside the Docker container. Default is "/mnt/model".

(3) print_logs: boolean

    Optional. Whether to print the logs from the Docker container. Default is True.
    
(4) convert_directory: string

    Optional. The directory path for the converted model. Default is test/.

(5) convert_name: string

    Optional. The file name of the converted model. Default is model.onnx.
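As a rough illustration of how these defaults fit together, the sketch below joins mount_path, convert_directory, and convert_name into the path the converted model would have inside the container. The helper `container_model_path` is hypothetical and not part of onnxpipeline; it only mirrors the paths that appear in the config and conversion output elsewhere in this notebook.

```python
import posixpath


def container_model_path(mount_path="/mnt/model",
                         convert_directory="test/",
                         convert_name="model.onnx"):
    """Hypothetical helper: path of the converted model inside the container."""
    # The container runs Linux, so join with POSIX separators even on a Windows host.
    return posixpath.join(mount_path, convert_directory, convert_name)


print(container_model_path())
```

With the documented defaults this gives /mnt/model/test/model.onnx, which matches the output_onnx_path reported in the conversion summary.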
    


# Config information for ONNX pipeline

In [3]:
pipeline.config()

-----------config----------------
           Container information: <docker.client.DockerClient object at 0x000001B3542EB748>
 Local directory path for volume: E:\onnx-pipeline\notebook/model
Volume directory path in dockers: /mnt/model
                     Result path: result
        Converted directory path: test
        Converted model filename: model.onnx
            Converted model path: test/model.onnx
        Print logs in the docker: True


# Convert model to ONNX

This image converts models from major frameworks to ONNX. Supported frameworks are caffe, cntk, coreml, keras, libsvm, mxnet, scikit-learn, tensorflow, and pytorch.


You can run the docker image with customized parameters.

In [4]:

model = pipeline.convert_model(model='mnist.onnx', model_type='onnx')

# test tensorflow
#model = pipeline.convert_model(model_type='tensorflow')

# test pytorch
#model = pipeline.convert_model(model_type='pytorch', model='saved_model.pb', model_input_shapes='(1,3,224,224)')

# test cntk
#model = pipeline.convert_model(model_type='cntk', model='ResNet50_ImageNet_Caffe.model')

# test keras
#model = pipeline.convert_model(model_type='keras', model='keras_Average_ImageNet.keras')

# test sklearn
#model = pipeline.convert_model(model_type='scikit-learn', model='sklearn_svc.joblib', initial_types=("float_input", "FloatTensorType([1,4])"))

# test caffe
#model = pipeline.convert_model(model_type='caffe', model='bvlc_alexnet.caffemodel')

# test mxnet
#model = pipeline.convert_model(model_type='mxnet', model='resnet.json', model_params='resnet.params', model_input_shapes='(1,3,224,224)')



-------------

Model Conversion



Input model is already ONNX model. Skipping conversion.



-------------

MODEL INPUT GENERATION(if needed)



Input.pb already exists. Skipping dummy input generation. 



-------------

MODEL CORRECTNESS VERIFICATION





Check the ONNX model for validity 

The ONNX model is valid.



The original model is already onnx. Skipping correctness test. 



-------------

MODEL CONVERSION SUMMARY (.json file generated at /mnt/model/test/output.json )



{'conversion_status': 'SUCCESS',

 'correctness_verified': 'SKIPPED',

 'error_message': '',

 'input_folder': '/mnt/model/test/test_data_set_0',

 'output_onnx_path': '/mnt/model/test/model.onnx'}



## Run parameters

(1) model: string

    Required. The path of the model that needs to be converted.
    
    **IMPORTANT** Only model paths under the mounted directory (the one given when initializing Pipeline()) are supported.

(2) output_onnx_path: string

    Required. The path to store the converted ONNX model. It should end with ".onnx", e.g. output.onnx.

(3) model_type: string

    Required. The name of the original model framework. Available types are caffe, cntk, coreml, keras, libsvm, mxnet, scikit-learn, tensorflow, and pytorch.

(4) model_inputs_names: string

    (tensorflow) Optional. The model's input names. Required for tensorflow frozen models and checkpoints.

(5) model_outputs_names: string

    (tensorflow) Optional. The model's output names. Required for tensorflow frozen models and checkpoints.

(6) model_params: string 

    (mxnet) Optional. The params of the model if needed.

(7) model_input_shapes: list of tuples

    (pytorch, mxnet) Optional. The input shape(s) of the model, with the dimensions of each shape separated by ',', e.g. '(1,3,224,224)'.

(8) target_opset: string

    Optional. Specifies the opset for ONNX, for example, 7 for ONNX 1.2, and 8 for ONNX 1.3. Defaults to '7'.
    
(9) caffe_model_prototxt: string

    (caffe) Optional. The filename of the deploy prototxt for the caffe model.

(10) initial_types: tuple (string, string)

    (scikit-learn) Optional. A tuple of two strings. The first is the input name and the second is the tensor type with its shape, e.g. ('float_input', 'FloatTensorType([1,4])').

(11) input_json: string

    Optional. Use JSON file as input parameters.
    
    **IMPORTANT** Only paths under the mounted directory (the one given when initializing Pipeline()) are supported.
    
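The restriction that model and input_json must live under the mounted directory can be checked before calling convert_model. The following is a minimal sketch; `is_under_mount` is a hypothetical helper, not part of onnxpipeline, and it assumes that relative paths are interpreted relative to the mounted directory.

```python
from pathlib import PurePosixPath


def is_under_mount(model_path, local_directory):
    """Hypothetical check: True if model_path lies inside local_directory."""
    model = PurePosixPath(model_path)
    root = PurePosixPath(local_directory)
    # Assumption: a relative path is taken relative to the mounted directory.
    if not model.is_absolute():
        model = root / model
    # PurePosixPath does not resolve "..", so reject paths that could climb out.
    if ".." in model.parts:
        return False
    return model == root or root in model.parents


print(is_under_mount("mnist.onnx", "model"))
print(is_under_mount("/tmp/other/mnist.onnx", "/home/user/model"))
```

The check is purely lexical: it does not touch the file system and does not follow symbolic links.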


# Performance test tool

You can run perf_tuning using the command python perf_tuning.py [Your model path] [Output path on the docker]. It accepts the same arguments as the onnxruntime_perf_test tool, e.g. -m for mode and -e to specify the execution provider. By default it tries all available providers.

In [5]:
result = pipeline.perf_tuning(model=model)
#result = pipeline.perf_tuning()   # is ok, too

Setting thread pool size to 1

Total time cost:0.0523975

Total iterations:20

Average time cost:2.61987 ms

Setting thread pool size to 1

Total time cost:0.0191098

Total iterations:20

Average time cost:0.95549 ms

Setting thread pool size to 0

Total time cost:0.0180704

Total iterations:20

Average time cost:0.90352 ms

Setting thread pool size to 0

Total time cost:0.0534911

Total iterations:20

Average time cost:2.67455 ms

Setting thread pool size to 0

Total time cost:0.0211078

Total iterations:20

Average time cost:1.05539 ms

Setting thread pool size to 0

Total time cost:0.0505198

Total iterations:20

Average time cost:2.52599 ms

Setting thread pool size to 0

Total time cost:0.0175324

Total iterations:20

Average time cost:0.87662 ms

Setting thread pool size to 0

Total time cost:0.0521924

Total iterations:20

Average time cost:2.60962 ms

Setting thread pool size to 0

Total time cost:0.0198271

Total iterations:20

Average time cost:0.991355 ms

Setting thread poo

cpu_openmp_1_threads_OMP_WAIT_POLICY_passive 1.0827479999999998

Setting thread pool size to 1

Total time cost:0.141779

Total iterations:200

Average time cost:0.708894 ms

Setting thread pool size to 0

Total time cost:0.146149

Total iterations:200

Average time cost:0.730747 ms

Setting thread pool size to 1

Total time cost:0.192491

Total iterations:200

Average time cost:0.962453 ms





/perf_tuning/bin/RelWithDebInfo/all_eps/onnxruntime_perf_tuning -e mkldnn -x 1 -o 3 -m times -r 200 /mnt/model/test/model.onnx /mnt/model/result/0dc072e1-8cd3-4e2d-9c52-c6ec3bc122a6

mkldnn_1_threads 0.7088939999999999



OMP_WAIT_POLICY=active

/perf_tuning/bin/RelWithDebInfo/all_eps/onnxruntime_perf_tuning -e mkldnn -o 3 -m times -r 200 /mnt/model/test/model.onnx /mnt/model/result/d231523e-0633-43d1-84bf-e6d9b3c684d9

mkldnn_openmp_OMP_WAIT_POLICY_active 0.730747



/perf_tuning/bin/RelWithDebInfo/all_eps/onnxruntime_perf_tuning -x 1 -o 3 -m times -r 200 /mnt/model/test/model.onnx /mnt/model/re
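The raw log above repeats the same three lines for each run. As a sketch of how those lines could be read programmatically, the snippet below extracts each run with a regular expression and recomputes the average from the total and the iteration count. The numbers are consistent with the total being reported in seconds, but that is an inference from the log, not something the tool states. `parse_runs` and `SAMPLE_LOG` are names made up for this example.

```python
import re

# Two runs copied from the log output above.
SAMPLE_LOG = """\
Setting thread pool size to 1
Total time cost:0.0523975
Total iterations:20
Average time cost:2.61987 ms
Setting thread pool size to 1
Total time cost:0.0191098
Total iterations:20
Average time cost:0.95549 ms
"""

# \s+ also tolerates the blank lines that separate entries in the raw output.
PATTERN = re.compile(
    r"Total time cost:(?P<total>[\d.]+)\s+"
    r"Total iterations:(?P<iters>\d+)\s+"
    r"Average time cost:(?P<avg>[\d.]+) ms"
)


def parse_runs(log_text):
    """Return one dict per run found in the log text."""
    runs = []
    for match in PATTERN.finditer(log_text):
        total_s = float(match.group("total"))
        iterations = int(match.group("iters"))
        runs.append({
            "total_s": total_s,
            "iterations": iterations,
            "reported_avg_ms": float(match.group("avg")),
            # Assumes the total is in seconds; see the note above.
            "computed_avg_ms": total_s / iterations * 1000.0,
        })
    return runs


runs = parse_runs(SAMPLE_LOG)
for run in runs:
    print(run)
```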

## Run parameters

(1) model: string

    Required. The path of the model to run the performance test on.
    
(2) result: string

    Optional. The path of the result.
    
(3) config: string (choices=["Debug", "MinSizeRel", "Release", "RelWithDebInfo"])

    Optional. Configuration to run. Default is "RelWithDebInfo".
    
(4) mode: string (choices=["duration", "times"])

    Optional. Specifies the test mode. Value could be 'duration' or 'times'. Default is "times".

(5) execution_provider: string (choices=["cpu", "cuda", "mkldnn"])

    Optional. Specifies the provider: 'cpu', 'cuda', or 'mkldnn'. Default is ''.
    
(6) repeated_times: integer

    Optional. Specifies the number of repetitions when running in 'times' test mode. Default is 20.
    
(7) duration_times: integer

    Optional. Specifies the number of seconds to run in 'duration' mode. Default is 10.
    
(8) parallel: boolean

    Optional. Use the parallel executor. The default (without -x) is the sequential executor.
    
(9) threadpool_size: integer

    Optional. Threadpool size if parallel executor (--parallel) is enabled. Default is the number of cores.
    
(10) num_threads: integer

    Optional. OMP_NUM_THREADS value.
    
(11) top_n: integer

    Optional. Show percentiles for top n runs. Default is 5.
    
(12) runtime: boolean

    Optional. Use this boolean flag to enable GPU if you have one.
    
(13) input_json: string

    Optional. Use JSON file as input parameters.
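To make the choices and defaults in this list concrete, here is a small sketch that validates a few of the options and collects them in a dictionary. `make_perf_params` is a hypothetical helper, not part of onnxpipeline. Serializing the result as JSON is only meant to suggest what an input_json file might look like: the assumption that its keys mirror the parameter names is not confirmed by this notebook.

```python
import json

# Allowed values copied from the parameter list above.
CONFIGS = ("Debug", "MinSizeRel", "Release", "RelWithDebInfo")
MODES = ("duration", "times")
PROVIDERS = ("", "cpu", "cuda", "mkldnn")  # '' is the documented default


def make_perf_params(model, config="RelWithDebInfo", mode="times",
                     execution_provider="", repeated_times=20,
                     duration_times=10, top_n=5):
    """Hypothetical helper: validate perf_tuning options and return them as a dict."""
    if config not in CONFIGS:
        raise ValueError(f"config must be one of {CONFIGS}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if execution_provider not in PROVIDERS:
        raise ValueError(f"execution_provider must be one of {PROVIDERS}")
    params = {
        "model": model,
        "config": config,
        "mode": mode,
        "execution_provider": execution_provider,
        "top_n": top_n,
    }
    # Only the option that matches the chosen mode is relevant.
    if mode == "times":
        params["repeated_times"] = repeated_times
    else:
        params["duration_times"] = duration_times
    return params


params = make_perf_params("test/model.onnx", repeated_times=200)
# Assumed layout for an input_json file: keys mirror the parameter names.
print(json.dumps(params, indent=2))
```

Validating on the client side fails fast, before a container is started; whether the SDK performs similar checks itself is not shown here.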
   

In [6]:
pipeline.print_performance(result)
#pipeline.print_result() # is ok, too

mkldnn_1_threads 0.7088939999999999 ms

mkldnn 0.67115 ms

mkldnn_parallel_1_threads 3.48409 ms

mkldnn_openmp_OMP_WAIT_POLICY_active 0.730747 ms

mkldnn_openmp 0.645965 ms

mkldnn_openmp_1_threads 0.7412500000000001 ms

mklml 0.9007419999999999 ms

mklml_1_threads 0.9554900000000001 ms

mklml_parallel_1_threads 2.619875 ms

cpu_1_threads 0.962453 ms

cpu 0.8616550000000001 ms

cpu_parallel_1_threads 3.01262 ms

cpu_openmp_1_threads_OMP_WAIT_POLICY_passive 1.0827479999999998 ms

cpu_openmp_OMP_WAIT_POLICY_active 0.95086 ms

cpu_openmp_1_threads 0.991355 ms

ngraph_parallel_1_threads error

ngraph_1_threads error

ngraph error
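The printed summary is plain text with one configuration per line, so it is easy to rank. The sketch below parses a few of the lines shown above, separates failed configurations from successful ones, and sorts by latency. `parse_performance` is a name invented for this example and works on text only; it does not use the result object returned by the SDK.

```python
# A subset of the summary lines printed above.
SAMPLE = """\
mkldnn_1_threads 0.7088939999999999 ms
mkldnn 0.67115 ms
mkldnn_openmp 0.645965 ms
cpu 0.8616550000000001 ms
ngraph error
"""


def parse_performance(text):
    """Split summary lines into {name: latency_ms} and a list of failed names."""
    latencies, failed = {}, []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, value = parts[0], parts[1]
        if value == "error":
            failed.append(name)
        else:
            latencies[name] = float(value)
    return latencies, failed


latencies, failed = parse_performance(SAMPLE)
ranked = sorted(latencies.items(), key=lambda item: item[1])
print("fastest:", ranked[0])
print("failed:", failed)
```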



# Result Visualization

After the performance test, a directory containing the results is created.

This library uses pandas.read_json to visualize the JSON files. (orient is changeable.)

"latency.json" contains the raw results ordered by average time.

Use .latency to obtain the original latency JSON; use .profiling to obtain the original top 5 profiling JSON.

In [7]:
r = pipeline.get_result(result)
#r.latency
#r.profiling
print(result)

E:\onnx-pipeline\notebook/model/result


# Print latency.json

Shows the parameters of the top 5 results by performance. Use the parameter "top" to show more results.

In [8]:
r.prints()

| Unnamed: 0 | name | avg | p90 | p95 | cpu_usage | gpu_usage | memory_util | code_snippet |
|---|---|---|---|---|---|---|---|---|
| 0 | mkldnn_1_threads | 0.708894 | 0.9633 | 1.0281 | 0.94483 | 0 | 0.29085 | OrderedDict([('execution_provider', 'mkldnn'),... |
| 1 | mkldnn_openmp_OMP_WAIT_POLICY_active | 0.730747 | 0.9531 | 1.0109 | 0.95975 | 0 | 0.29121 | OrderedDict([('execution_provider', 'mkldnn_op... |
| 2 | mklml | 0.900742 | 1.06 | 1.1543 | 0.91482 | 0 | 0.28926 | OrderedDict([('execution_provider', 'mklml'), ... |
| 3 | cpu_1_threads | 0.962453 | 1.13 | 1.3482 | 0.89764 | 0 | 0.29135 | OrderedDict([('execution_provider', 'cpu'), ('... |
| 4 | cpu_openmp_1_threads_OMP_WAIT_POLICY_passive | 1.082748 | 1.5606 | 1.6307 | 0.94206 | 0 | 0.29095 | OrderedDict([('execution_provider', 'cpu_openm... |


# Print profiling.json

Profiling JSON is only available for the top 5 results; select one by its index in the result. The file name is profile_[name].json.

(1) index: integer

    Required. The index of one of the top 5 profiling files.

(2) top: integer

    The number of top ops to show.


In [9]:
r.print_profiling(index=4, top=5)

| Unnamed: 0 | cat | pid | tid | dur | ts | ph | name | args |
|---|---|---|---|---|---|---|---|---|
| 0 | Node | 12367 | 12367 | 1815 | 63814 | X | sequential/dense/MatMul | {'provider': 'CPUExecutionProvider', 'op_name'... |
| 1 | Node | 12367 | 12367 | 652 | 73670 | X | sequential/dense/MatMul | {'provider': 'CPUExecutionProvider', 'op_name'... |
| 2 | Node | 12367 | 12367 | 614 | 74898 | X | sequential/dense/MatMul | {'provider': 'CPUExecutionProvider', 'op_name'... |
| 3 | Node | 12367 | 12367 | 608 | 80834 | X | sequential/dense/MatMul | {'provider': 'CPUExecutionProvider', 'op_name'... |
| 4 | Node | 12367 | 12367 | 604 | 82160 | X | sequential/dense/MatMul | {'provider': 'CPUExecutionProvider', 'op_name'... |
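The rows above look like the result of keeping only "Node" events and sorting them by dur in descending order. The sketch below reproduces that selection on a plain list of dictionaries. It is an illustration of the idea, not the implementation behind print_profiling; `top_ops` is a hypothetical name, and the "Session" event is made up to show the filtering.

```python
# Events shaped like the rows above, reduced to the fields used here.
events = [
    {"cat": "Node", "name": "sequential/dense/MatMul", "dur": 614, "ts": 74898},
    {"cat": "Node", "name": "sequential/dense/MatMul", "dur": 1815, "ts": 63814},
    {"cat": "Node", "name": "sequential/dense/MatMul", "dur": 604, "ts": 82160},
    {"cat": "Node", "name": "sequential/dense/MatMul", "dur": 652, "ts": 73670},
    {"cat": "Node", "name": "sequential/dense/MatMul", "dur": 608, "ts": 80834},
    # Made-up non-Node event, included only to show that it is filtered out.
    {"cat": "Session", "name": "model_run", "dur": 5000, "ts": 60000},
]


def top_ops(events, top=5):
    """Return the `top` longest-running Node events, longest first."""
    nodes = [event for event in events if event.get("cat") == "Node"]
    return sorted(nodes, key=lambda event: event["dur"], reverse=True)[:top]


for event in top_ops(events, top=3):
    print(event["name"], event["dur"])
```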


# Get code snippets

In [10]:
print(r.get_code(ep='cpu'))


import onnxruntime as ort
so = rt.SessionOptions()
so.set_graph_optimization_level(3)
so.enable_sequential_execution = False
so.session_thread_pool_size(0)
session = rt.Session("/mnt/model/test/model.onnx", so)



# Netron

In [11]:
# Only works when the notebook runs on a local server
import netron
netron.start("model/test/model.onnx", browse=False) # 'model.onnx'
from IPython.display import IFrame
IFrame('http://localhost:8080', width="100%", height=1000)

Serving 'model/test/model.onnx' at http://localhost:8080


In [20]:
netron.stop()


Stopping http://localhost:8080
