Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Convert Models and Tune Performance with OLive Docker Images

This notebook demonstrates how to use the OLive Docker images `onnx-converter` and `perf-tuning` to convert a model from another framework to ONNX and then tune the performance of the converted ONNX model.

## 0. Prerequisites

1) Make sure you have [Docker](https://www.docker.com/get-started) installed and running. 

2) Be familiar with basic Docker terminology and commands, such as volumes.

3) Pull the latest `onnx-converter` and `perf-tuning` Docker images from the Microsoft Container Registry (MCR). This may take several minutes.

In [1]:
# pull onnx-converter and perf-tuning docker images from mcr
!docker pull mcr.microsoft.com/onnxruntime/onnx-converter
!docker pull mcr.microsoft.com/onnxruntime/perf-tuning

Using default tag: latest
latest: Pulling from onnxruntime/onnx-converter
Digest: sha256:37479e01a7c4cd2e77c012f0fc3bb4e89e1b45b72c0d4bb22621286c5c02aa26
Status: Image is up to date for mcr.microsoft.com/onnxruntime/onnx-converter:latest
mcr.microsoft.com/onnxruntime/onnx-converter:latest
Using default tag: latest
latest: Pulling from onnxruntime/perf-tuning
Digest: sha256:8003f5ecd2e11c2fdad31610294982404954570c4ced0caed7b9b6f268e9388f
Status: Image is up to date for mcr.microsoft.com/onnxruntime/perf-tuning:latest
mcr.microsoft.com/onnxruntime/perf-tuning:latest


4) Install dependencies to run this notebook.

In [2]:
import sys
!{sys.executable} -m pip install wget netron onnx




## 1. Convert Model To ONNX

The first step in OLive is to convert your model to ONNX using the `onnx-converter` Docker image. The image first converts the model to ONNX, then tries to generate input test data if none is provided, and finally runs the converted model alongside the original model to check correctness.

### Prepare model and test data files

First, prepare your model and test data files. Supported model frameworks are cntk, coreml, keras, scikit-learn, tensorflow, and pytorch.

For `onnx-converter`, test data is used to quickly verify the correctness of the converted model. We strongly recommend providing your own test input and output files. However, if no input files are provided, OLive will generate random dummy inputs for you where possible. Only one test data set is needed, but feel free to add more; they will also be available to the subsequent `perf-tuning` step.

You can put your test data in one of the following locations:

  1) The input folder that holds your model from the other framework.
  
  2) An output folder, created in advance, to hold your converted ONNX model.
  
  3) Any other location, in which case you need to pass its path to `onnx-converter` with the `--test_data_path` parameter.
  
The best practice is to put your input model file(s) in the input folder and your test data (optional) in location **2)**. Putting the test_data_set folders in the output folder instead of the input folder avoids copying files in the backend. The folder structure will look like this:

    - your_input_folder
       - model_file(s)
    - your_output_folder_to_hold_onnx_file
       - test_data_set_0
           - input_0.pb
           - ...
           - output_0.pb
           - ...
       - test_data_set_1
           - ...
       ...
       - (your .onnx file after running "onnx-converter")
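Before running `onnx-converter`, you may want to check that your output folder follows this layout. The helper below is a minimal sketch using only the standard library; `find_test_data_sets` is a hypothetical name, not part of OLive, and it assumes the `test_data_*` folder and `input_*.pb`/`output_*.pb` file naming described in this notebook. It only checks file names, not the contents of the .pb files.

```python
from pathlib import Path


def find_test_data_sets(folder):
    """Return {test data folder name: (input .pb names, output .pb names)}.

    Hypothetical helper: checks names only, not the .pb contents.
    """
    result = {}
    for data_set in sorted(Path(folder).glob("test_data_*")):
        if not data_set.is_dir():
            continue
        inputs = sorted(p.name for p in data_set.glob("input_*.pb"))
        outputs = sorted(p.name for p in data_set.glob("output_*.pb"))
        result[data_set.name] = (inputs, outputs)
    return result


# Example: report any data set without inputs (the folder name is a placeholder)
for name, (inputs, outputs) in find_test_data_sets("your_output_folder_to_hold_onnx_file").items():
    if not inputs:
        print(name, "has no input_*.pb files")
```

A missing folder simply yields an empty result, so the check is safe to run before the folder is populated.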



#### [OPTIONAL] Convert Test Data to ONNX pb 

ONNX .pb files are expected for test data. If you're more familiar with pickle files, we provide a convenient script to convert your pickle data to .pb files. Dump your input data to a single pickle file in the following dict format:

    {
        "input_name_0": input_data_0,
        "input_name_1": input_data_1, 
        ...
    }
    
or, if dumping output data:

    {
        "output_name_0": output_data_0,
        "output_name_1": output_data_1, 
        ...
    }

Then use [convert_test_data.py](https://github.com/microsoft/OLive/blob/master/utils/convert_test_data.py) to convert your pickle file to .pb files.
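For example, the following sketch dumps one MNIST-style input to a pickle file in the dict format above. It assumes the model input is named `Input3` with shape `[1, 1, 28, 28]` and type float32, and that the values are NumPy arrays; check your own model (for example with Netron) for its actual input names, shapes, and types. The file name `input.pkl` is arbitrary.

```python
import pickle

import numpy as np

# Assumed input name and shape for the MNIST model; replace with your model's inputs
inputs = {
    "Input3": np.random.rand(1, 1, 28, 28).astype(np.float32),
}

# Dump the whole dict to a single pickle file
with open("input.pkl", "wb") as f:
    pickle.dump(inputs, f)
```

An output pickle is built the same way, keyed by output names instead of input names.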


In [3]:
import os
import wget

# Download convert_test_data.py
url = "https://raw.githubusercontent.com/microsoft/OLive/master/utils/convert_test_data.py"
script_dir = "scripts"
if not os.path.exists(script_dir):
    os.makedirs(script_dir)

script_file = os.path.join(script_dir, 'convert_test_data.py')
if os.path.exists(script_file):
    os.remove(script_file)
wget.download(url, script_file)
print("Downloaded", script_file)

Downloaded scripts\convert_test_data.py


Run convert_test_data.py to convert your pickle file. The script reads your pickle file and, by default, dumps the data to a folder named "test_data_set_0". Note that the ONNX naming convention for test data folders is "test_data_*", so make sure to pass `--output_folder` a folder name starting with `test_data_`.

With `--is_input=True`, the data is written to `input_*.pb` files. To generate output .pb files instead, set `--is_input=False`; the data is then written to `output_*.pb` files.
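If you convert both an input and an output pickle, building the command in Python lets you check the `test_data_` naming rule before the script runs. This is a minimal sketch: `convert_test_data_cmd` is a hypothetical helper, and it uses only the arguments shown in this notebook (`--output_folder`, `--is_input`). Running the script through `sys.executable` avoids relying on the file being directly executable, which does not work on Windows.

```python
import os
import sys


def convert_test_data_cmd(pickle_file, output_folder, is_input=True,
                          script="scripts/convert_test_data.py"):
    """Build the argument list for convert_test_data.py (hypothetical helper)."""
    # ONNX expects test data folders to be named test_data_*
    folder_name = os.path.basename(os.path.normpath(output_folder))
    if not folder_name.startswith("test_data_"):
        raise ValueError("output_folder must start with 'test_data_': " + output_folder)
    return [
        sys.executable, script, pickle_file,
        "--output_folder", output_folder,
        "--is_input=" + str(bool(is_input)),
    ]


# Usage (pickle file names are placeholders):
# import subprocess
# subprocess.run(convert_test_data_cmd("input.pkl", "test_data_set_0"), check=True)
# subprocess.run(convert_test_data_cmd("output.pkl", "test_data_set_0", is_input=False), check=True)
```

Writing inputs and outputs into the same `test_data_set_0` folder gives the `input_*.pb`/`output_*.pb` pairs the layout above expects.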

In [None]:
!{sys.executable} scripts/convert_test_data.py <your_input_pickle_file> --output_folder <output_folder (e.g. test_data_set_0)> --is_input=True

Now you're ready to use OLive to convert your model and tune its performance. In this tutorial, we use the [MNIST model from the ONNX model zoo](https://github.com/onnx/models/tree/master/vision/classification/mnist) as an example to demonstrate the OLive pipeline. The code below downloads the model.

In [5]:
import os
import tarfile
from urllib.request import urlretrieve

# Download the model and extract it to the desired directory. Modify these lines to point to your local model directory
model_url = "https://onnxzoo.blob.core.windows.net/models/opset_8/mnist/mnist.tar.gz"
model_dir = "test_models"
absolute_model_dir = os.path.abspath(model_dir)
if not os.path.exists(model_dir):
    print("Creating model directory ", model_dir)
    os.makedirs(model_dir)

# Download the archive to a temporary file, then extract it into model_dir
file_tmp = urlretrieve(model_url, filename=None)[0]
with tarfile.open(file_tmp) as tar:
    tar.extractall(model_dir)
print("Model successfully downloaded and extracted in ", model_dir)

Model successfully downloaded and extracted in  test_models


### Run `onnx-converter` Docker Image

For the Docker image to access your model and test data, you need to mount a local directory into the container with the Docker parameter `-v`. With `-v <your_local_directory>:/mnt`, all files under `<your_local_directory>` are shared to the `/mnt` folder in the running Docker container. Note that `your_local_directory` must be an absolute path. The concept and usage are explained in detail [here](https://docs.docker.com/engine/reference/run/#volume-shared-filesystems).
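The mapping between local paths and container paths can be sketched as follows. `to_container_path` is a hypothetical helper, and it assumes the `-v <your_local_directory>:/mnt` mount described above. It raises `ValueError` when a file is not under the shared directory, which means the container would not be able to see it.

```python
from pathlib import PurePath


def to_container_path(local_path, shared_dir, mount_point="/mnt"):
    """Map a file under shared_dir to its path inside the container (hypothetical helper)."""
    # relative_to raises ValueError if local_path is outside shared_dir
    relative = PurePath(local_path).relative_to(PurePath(shared_dir))
    # Container paths always use forward slashes, even when the host is Windows
    return mount_point.rstrip("/") + "/" + relative.as_posix()


print(to_container_path("/home/user/test_models/mnist/model.onnx", "/home/user/test_models"))
# prints /mnt/mnist/model.onnx
```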

In [6]:
# Get the absolute path of the local directory to share with Docker
shared_dir = os.path.abspath(model_dir)
# Your input model, relative to shared_dir
input_model = "mnist/model.onnx"
# Specify the output folder and converted model name (relative to shared_dir)
output_onnx_path = "output/model.onnx"
# The MNIST model is already ONNX. For other models, change model_type to
# tensorflow, pytorch, cntk, coreml, keras, or scikit-learn
model_type = "onnx"
print("Folder shared with docker is ", shared_dir)
print("Converted ONNX model will be stored at ", os.path.join(shared_dir, output_onnx_path))

Folder shared with docker is  C:\Users\ziyl.NORTHAMERICA\OLive\notebook\test_models
Converted ONNX model will be stored at  C:\Users\ziyl.NORTHAMERICA\OLive\notebook\test_models\output/model.onnx


Different model frameworks require different `onnx-converter` parameters. For details on the parameters your model framework needs, see the onnx-converter [README.md](https://github.com/microsoft/OLive/blob/master/docker-images/onnx-converter/README.md).

You can also add a `--test_data_path` parameter to specify your own test data folder (the parent folder of your "test_data_\*" folders) if your test data is in neither the directory of your input model nor the directory of your `--output_onnx_path`.
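As an alternative to the `!docker run` shell line, the same command can be built as an argument list, which avoids shell-quoting problems when the shared directory contains spaces. This is a minimal sketch: `onnx_converter_cmd` is a hypothetical helper that uses only the flags shown in this notebook, and it assumes `--test_data_path`, like the other paths, is given as a path inside the container.

```python
def onnx_converter_cmd(shared_dir, input_model, output_onnx_path, model_type,
                       test_data_path=None,
                       image="mcr.microsoft.com/onnxruntime/onnx-converter"):
    """Build a `docker run` argument list for onnx-converter (hypothetical helper).

    input_model, output_onnx_path and test_data_path are relative to shared_dir,
    which is mounted at /mnt inside the container.
    """
    cmd = [
        "docker", "run", "-v", f"{shared_dir}:/mnt", image,
        "--model", f"/mnt/{input_model}",
        "--output_onnx_path", f"/mnt/{output_onnx_path}",
        "--model_type", model_type,
    ]
    if test_data_path is not None:
        cmd += ["--test_data_path", f"/mnt/{test_data_path}"]
    return cmd


# Usage, reusing the variables defined above:
# import subprocess
# subprocess.run(onnx_converter_cmd(shared_dir, input_model, output_onnx_path, model_type), check=True)
```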

In [7]:
!docker run -v {shared_dir}:/mnt \
    mcr.microsoft.com/onnxruntime/onnx-converter \
    --model /mnt/{input_model} \
    --output_onnx_path /mnt/{output_onnx_path} \
    --model_type {model_type}


-------------
Model Conversion

Input model is already ONNX model. Skipping conversion.

-------------
MODEL INPUT GENERATION(if needed)

Test data .pb files found under /mnt/mnist/test_data_set_0. 




copying /mnt/mnist/test_data_set_0 to /mnt/output/test_data_set_0
Test data .pb files found under /mnt/mnist/test_data_set_1. 
copying /mnt/mnist/test_data_set_1 to /mnt/output/test_data_set_1
Test data .pb files found under /mnt/mnist/test_data_set_2. 
copying /mnt/mnist/test_data_set_2 to /mnt/output/test_data_set_2
Test data .pb files already exist. Skipping dummy input generation. 

-------------
MODEL CORRECTNESS VERIFICATION


Check the ONNX model for validity 
The ONNX model is valid.

The original model is already onnx. Skipping correctness test. 

-------------
MODEL CONVERSION SUMMARY (.json file generated at /mnt/output/output.json )

{'conversion_status': 'SUCCESS',
 'correctness_verified': 'SKIPPED',
 'error_message': '',
 'input_folder': '/mnt/output/test_data_set_0',
 'output_onnx_path': '/mnt/output/model.onnx'}
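The summary above is also written to `output.json` in the output folder (`/mnt/output/output.json` inside the container, so `output/output.json` under `shared_dir` locally). A minimal sketch for checking it from Python, assuming the file holds the same keys as the printed summary (`conversion_status`, `error_message`, ...); `load_conversion_summary` is a hypothetical helper.

```python
import json
import os


def load_conversion_summary(output_dir):
    """Load onnx-converter's output.json and raise if conversion did not succeed.

    Assumes the JSON has the keys shown in the printed summary.
    """
    with open(os.path.join(output_dir, "output.json")) as f:
        summary = json.load(f)
    if summary.get("conversion_status") != "SUCCESS":
        raise RuntimeError("Conversion failed: " + (summary.get("error_message") or ""))
    return summary


# Usage, reusing the variables defined above:
# summary = load_conversion_summary(os.path.join(shared_dir, "output"))
# print(summary["correctness_verified"])
```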


### Visualize the converted ONNX model

Using the Netron model visualization tool, we can inspect the newly converted ONNX model.

In [12]:
import netron
netron.start(os.path.join(shared_dir, output_onnx_path), browse=False)
from IPython.display import IFrame
IFrame('http://localhost:8080', width="100%", height=1000)

Serving 'C:\Users\ziyl.NORTHAMERICA\OLive\notebook\test_models\output/model.onnx' at http://localhost:8080


In [13]:
# Stop Netron server
netron.stop()


Stopping http://localhost:8080


Your model has now been converted successfully. The model file and its test data are stored in the folder of `output_onnx_path`, and they are ready to serve as inputs to the following `perf-tuning` step.

## 2. Performance Tuning

The `perf-tuning` Docker image finds the best settings for running an ONNX model in ONNX Runtime. It sweeps combinations of thread counts, environment variables, and execution providers to find the best performance. The top 3 results are reported for each available execution provider.
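To make the idea of a sweep concrete, here is an illustration only, not `perf-tuning`'s actual logic: every combination of execution provider and environment settings becomes one benchmark run, and the fastest few are kept. The `OMP_NUM_THREADS` and `OMP_WAIT_POLICY` variables appear in the tuning log later in this notebook; the helper names and value lists are assumptions.

```python
import itertools


def sweep_configs(execution_providers, omp_num_threads, wait_policies):
    """Enumerate (EP, environment variables) combinations, one per benchmark run."""
    configs = []
    for ep, threads, policy in itertools.product(execution_providers, omp_num_threads, wait_policies):
        env = {"OMP_NUM_THREADS": str(threads), "OMP_WAIT_POLICY": policy}
        configs.append((ep, env))
    return configs


def top_k(latencies_ms, k=3):
    """Return the k fastest (config name, latency) pairs."""
    return sorted(latencies_ms.items(), key=lambda item: item[1])[:k]


# 2 EPs x 2 thread counts x 2 wait policies = 8 runs
configs = sweep_configs(["cpu", "dnnl"], [1, 2], ["active", "passive"])
print(len(configs))
```

The number of runs grows multiplicatively with each swept dimension, which is why restricting the EPs with `-e` can shorten tuning noticeably.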

### Prepare inputs

Store your ONNX model file and its test data in the same folder, using the following structure:

    -- local_directory_to_your_models
        -- ModelDir
            -- model.onnx
            -- test_data_set_0
                -- input_0.pb
                -- output_0.pb
                -- ...
            -- test_data_set_1            
                -- ...
            -- ...
            
Test data is required in this step. If you followed the conversion step in this notebook, the `onnx-converter` output folder already has this structure and can be used directly as input to `perf-tuning`.

### Run `perf-tuning` Docker Image


A few things to note when running the `perf-tuning` Docker image: 

 - Currently supported execution providers (EPs) are cpu, cpu_openmp, dnnl, mklml, cuda, tensorrt, ngraph, and nuphar. Add `-e cpu,dnnl,...` (no spaces between EPs) to select the execution providers to tune. By default, all available EPs are tuned.
 - EPs such as CUDA and TensorRT require a GPU. To use them, make sure your local machine has a GPU, and add `--gpus all` to the Docker command (BEFORE `mcr.microsoft.com/onnxruntime/perf-tuning`) so the container can use it. Otherwise, GPU-based execution providers will be skipped.
 - Like `onnx-converter`, `perf-tuning` needs your local directories to be shared with the Docker container using the `-v` option.
 - Other available `perf-tuning` options are documented [here](https://github.com/microsoft/OLive/tree/master/docker-images/perf-tuning).
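The ordering rules in the notes above can be encoded in a small command builder. This is a minimal sketch: `perf_tuning_cmd` is a hypothetical helper that uses only the options mentioned in this notebook (`-v`, `--gpus all`, `-e`, `--model`, `--result`). It places `--gpus all` before the image name, because that is a Docker option, and joins the EPs with commas and no spaces.

```python
def perf_tuning_cmd(shared_dir, onnx_model, result_dir, eps=None, use_gpus=False,
                    image="mcr.microsoft.com/onnxruntime/perf-tuning"):
    """Build a `docker run` argument list for perf-tuning (hypothetical helper)."""
    cmd = ["docker", "run"]
    if use_gpus:
        # --gpus is a Docker option, so it must come BEFORE the image name
        cmd += ["--gpus", "all"]
    cmd += ["-v", f"{shared_dir}:/mnt", image,
            "--model", f"/mnt/{onnx_model}",
            "--result", f"/mnt/{result_dir}"]
    if eps:
        # perf-tuning expects EPs comma-separated with no spaces, e.g. "cpu,dnnl"
        cmd += ["-e", ",".join(ep.strip() for ep in eps)]
    return cmd


# Usage, reusing the variables defined below:
# import subprocess
# subprocess.run(perf_tuning_cmd(shared_dir, onnx_model, result_dir, eps=["cpu", "dnnl"]), check=True)
```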


In [10]:
# Define some variables
onnx_model = output_onnx_path        # reuse the output of the "convert" step here; adjust as necessary
result_dir = "result"                # output folder to hold your results

In [11]:
!docker run -v {shared_dir}:/mnt \
    mcr.microsoft.com/onnxruntime/perf-tuning \
    --model /mnt/{onnx_model} \
    --result /mnt/{result_dir}

Setting intra_op_num_threads to 1
Session creation time cost:0.0081595 s
Total inference time cost:0.0034581 s
Total inference requests:20
Average inference time cost:0.172905 ms
Total inference run time:0.003502 s
Setting intra_op_num_threads to 1
Session creation time cost:0.0045487 s
Total inference time cost:0.007971 s
Total inference requests:20
Average inference time cost:0.39855 ms
Total inference run time:0.0080087 s
Setting intra_op_num_threads to 1
Session creation time cost:0.0050911 s
Total inference time cost:0.0017617 s
Total inference requests:20
Average inference time cost:0.088085 ms
Total inference run time:0.0017969 s
Setting intra_op_num_threads to 1
Session creation time cost:0.0046958 s
Total inference time cost:0.0013464 s
Total inference requests:20
Average inference time cost:0.06732 ms
Total inference run time:0.001369 s
Setting intra_op_num_threads to 1
Session creation time cost:0.0041199 s
Total inference time cost:0.0091266 s
Total inference requests:20
Av

/home/ziyl/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:107 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /home/ziyl/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:101 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=32722 ; hostname=f923670d0d86 ; expr=cudaSetDevice(device_id_); 


/home/ziyl/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:107 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /home/ziyl/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc:101 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 35: CUDA driver version is insuf

mklml_OMP_WAIT_POLICY_passive 0.059179999999999996

OMP_WAIT_POLICY=active
OMP_NUM_THREADS=2
/perf_tuning/bin/RelWithDebInfo/all_eps/onnxruntime_perf_test -e dnnl -x 1 -o 99 -m times -r 20 /mnt/output/model.onnx /mnt/result/f0ace92b-7170-4bc4-ba7b-ead0dc018af2
dnnl_2_OMP_threads_OMP_WAIT_POLICY_active 0.08124

OMP_WAIT_POLICY=passive
OMP_NUM_THREADS=2
/perf_tuning/bin/RelWithDebInfo/all_eps/onnxruntime_perf_test -e dnnl -x 1 -o 99 -m times -r 20 /mnt/output/model.onnx /mnt/result/48a4edcf-6eef-45f0-b22a-88b4dd176f12
dnnl_2_OMP_threads_OMP_WAIT_POLICY_passive 0.48917000000000005

/perf_tuning/bin/RelWithDebInfo/all_eps/onnxruntime_perf_test -e dnnl -x 1 -o 99 -m times -r 20 /mnt/output/model.onnx /mnt/result/286f89f6-8760-47ee-823d-fdcd0c332e4f
dnnl 0.0628

OMP_WAIT_POLICY=active
/perf_tuning/bin/RelWithDebInfo/all_eps/onnxruntime_perf_test -e dnnl -x 1 -o 99 -m times -r 20 /mnt/output/model.onnx /mnt/result/36651ee5-fea9-4212-89ee-2944709e636d
dnnl_OMP_WAIT_POLICY_active 0.056555

OMP_
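The log above is long and, in this export, partly truncated. To compare runs programmatically, you could capture the container output (for example with `subprocess.run(..., capture_output=True, text=True)`) and extract the average latencies. A minimal sketch: `average_latencies_ms` is a hypothetical helper, and its regular expression assumes the `Average inference time cost:<value> ms` line format shown in the log.

```python
import re

# Matches lines such as "Average inference time cost:0.172905 ms"
AVG_PATTERN = re.compile(r"Average inference time cost:\s*([0-9.]+)\s*ms")


def average_latencies_ms(log_text):
    """Return every average latency (in ms) reported in a perf-tuning log, in order."""
    return [float(m.group(1)) for m in AVG_PATTERN.finditer(log_text)]


sample = "Total inference requests:20\nAverage inference time cost:0.172905 ms\n"
print(average_latencies_ms(sample))
```

Truncated lines such as a bare `Av` are simply skipped, so a partial log still yields the complete entries.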

### Check your results

Besides the output printed by `docker run`, several files are stored in the result folder you specified:

 - `latencies.txt` has a brief summary of the best results and settings for each execution provider.
 - `latencies.json` has more detailed information on the best combinations, such as Python code snippets to reproduce the performance results and p90 and p95 latency numbers.
 - `profile_[ep].json` files are profiling files for the best setting of each execution provider.
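To locate these files from the notebook, a minimal sketch: `profile_files_by_ep` is a hypothetical helper that assumes the `profile_[ep].json` naming described above and maps each execution provider name to its profiling file.

```python
from pathlib import Path


def profile_files_by_ep(result_dir):
    """Map each execution provider name to its profile_[ep].json file (hypothetical helper)."""
    prefix = "profile_"
    return {
        path.stem[len(prefix):]: path
        for path in sorted(Path(result_dir).glob(prefix + "*.json"))
    }


# Usage, reusing the variables defined above:
# result_path = Path(shared_dir) / result_dir
# print((result_path / "latencies.txt").read_text())
# for ep, path in profile_files_by_ep(result_path).items():
#     print(ep, path)
```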