This repository shows how to deploy and use OLive from the command line.
Download the OLive package here and install it with:
```
pip install onnxruntime_olive-0.4.0-py3-none-any.whl
```
The ONNX Runtime package can be installed with one of the following commands.

For CPU:
```
pip install --extra-index-url https://olivewheels.azureedge.net/test onnxruntime_openvino_dnnl==1.11.0
```

For GPU:
```
pip install --extra-index-url https://olivewheels.azureedge.net/test onnxruntime_gpu_tensorrt==1.11.0
```
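After installation, you can check which execution providers the installed ONNX Runtime build exposes. A minimal sketch using the standard `onnxruntime` API (the provider names in the comment are illustrative, not guaranteed output):

```python
# Quick sanity check of the installed ONNX Runtime build.
import onnxruntime as ort

# Lists the execution providers compiled into this package, e.g.
# CPUExecutionProvider/DnnlExecutionProvider for the CPU build or
# CUDAExecutionProvider/TensorrtExecutionProvider for the GPU build.
print(ort.get_available_providers())
print(ort.__version__)
```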
Users can call `olive convert` or `olive optimize` with the related arguments. OLive will manage Python package installation, such as ONNX Runtime, TensorFlow, or PyTorch.

Users can also run `olive convert` or `olive optimize` with the `use_docker` or `use_conda` option if Docker or Conda is installed. In that case, OLive runs the service in a container or in a new conda environment.

For `olive optimize`, users can set `--use_gpu` to run the optimization service with the ONNX Runtime GPU package.
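The same commands can also be scripted. A minimal sketch that shells out to the OLive CLI from Python; it assumes `olive` is on the PATH and that the Docker option mentioned above is spelled `--use_docker` (adjust if the actual flag differs):

```python
# Hypothetical wrapper around the OLive CLI.
import subprocess

cmd = [
    "olive", "convert",
    "--model_path", "test.pb",
    "--model_framework", "tensorflow",
    "--use_docker",  # assumed flag spelling; runs the service in a container
]
subprocess.run(cmd, check=True)
```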
OLive conversion converts models from the PyTorch and TensorFlow frameworks to ONNX, and tests the converted models' correctness.
Here are arguments for OLive Conversion:
Argument | Detail | Example |
---|---|---|
model_path | (required) model path for conversion | test.pb |
model_framework | (required) model original framework | tensorflow |
model_root_path | (optional) root path of the model source, used only for PyTorch models | D:\model\src |
input_names | (optional) comma-separated list of names of input nodes of model | title_lengths:0,title_encoder:0,ratings:0 |
output_names | (optional) comma-separated list of names of output nodes of model | output_identity:0,loss_identity:0 |
input_shapes | (optional) list of shapes of each input node. The order of the input shapes should be the same as the order of input names | [[1,7],[1,7],[1,7]] |
output_shapes | (optional) list of shapes of each output node. The order of the output shapes should be the same as the order of output names | [[1,64],[1,64]] |
input_types | (optional) comma-separated list of types of input nodes. The order of the input types should be the same as the order of input names | float32,float32,float32 |
output_types | (optional) comma-separated list of types of output nodes. The order of the output types should be the same as the order of output names | float32,float32 |
onnx_opset | (optional) target opset version for conversion | 11 |
onnx_model_path | (optional) ONNX model path as conversion output | test.onnx |
sample_input_data_path | (optional) path to sample input data | sample_input_data.npz |
framework_version | (optional) original framework version | 1.14 |
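The `sample_input_data_path` argument expects a NumPy `.npz` archive. A minimal sketch of how such a file might be produced for the three example inputs in the table above; the array names, shapes, and types are assumptions and should match your model's `input_names`, `input_shapes`, and `input_types`:

```python
# Sketch: build sample_input_data.npz for conversion and correctness testing.
import numpy as np

# Keys are assumed to match the model's input node names.
sample_inputs = {
    "title_lengths:0": np.zeros([1, 7], dtype=np.float32),
    "title_encoder:0": np.zeros([1, 7], dtype=np.float32),
    "ratings:0": np.zeros([1, 7], dtype=np.float32),
}
np.savez("sample_input_data.npz", **sample_inputs)
```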
There are two ways to call `olive convert` from the command line.
The test model and sample input data can be downloaded here: model, sample test data.

- With all inline arguments. For example:

  ```
  olive convert --model_path full_doran_frozen.pb --model_framework tensorflow --framework_version 1.13 --input_names title_lengths:0,title_encoder:0,ratings:0,query_lengths:0,passage_lengths:0,features:0,encoder:0,decoder:0,Placeholder:0 --output_names output_identity:0,loss_identity:0 --sample_input_data_path doran.npz
  ```

- With a conversion config file containing all needed arguments:

  ```
  {
      "model_path": "full_doran_frozen.pb",
      "input_names": ["title_lengths:0", "title_encoder:0", "ratings:0", "query_lengths:0", "passage_lengths:0", "features:0", "encoder:0", "decoder:0", "Placeholder:0"],
      "output_names": ["output_identity:0", "loss_identity:0"],
      "sample_input_data_path": "doran.npz"
  }
  ```

  ```
  olive convert --conversion_config cvt_config.json --model_framework tensorflow --framework_version 1.13
  ```

  where the cvt_config.json file includes the related configurations.

Then the ONNX model will be saved at `onnx_model_path`, by default `res.onnx`.
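A quick way to sanity-check the converted model is to load it with ONNX Runtime and feed it the same sample data used for conversion. A minimal sketch, assuming the default output `res.onnx`, the `doran.npz` sample file, and that the `.npz` keys match the model's input names:

```python
# Sketch: run the converted ONNX model once with ONNX Runtime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("res.onnx", providers=["CPUExecutionProvider"])

# Assumes the .npz keys line up with the ONNX model's input names.
sample = np.load("doran.npz")
feed = {inp.name: sample[inp.name] for inp in sess.get_inputs()}

outputs = sess.run(None, feed)  # None -> return every model output
for out, value in zip(sess.get_outputs(), outputs):
    print(out.name, value.shape)
```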
OLive optimization tunes different execution providers, inference session options, and environment variable options for the ONNX model with ONNX Runtime, then selects and outputs the option combinations with the best latency or throughput.
Here are arguments for OLive optimization:
Argument | Detail | Example |
---|---|---|
model_path | (required) model path for optimization | PytorchBertSquad.onnx |
throughput_tuning_enabled | (optional) whether to tune the model for optimal throughput | |
max_latency_percentile | (required for throughput tuning) latency percentile to constrain during throughput tuning | 0.95 |
max_latency_ms | (required for throughput tuning) maximum latency at that percentile, in milliseconds | 50 |
dynamic_batching_size | (for throughput tuning) maximum batch size for dynamic batching | 1 |
threads_num | (for throughput tuning) number of threads for throughput optimization | 4 |
min_duration_sec | (for throughput tuning) minimum duration of each run, in seconds | 10 |
openmp_enabled | (optional) whether the onnxruntime package is built with OpenMP | |
result_path | (optional) result directory for OLive optimization | olive_opt_result |
input_names | (optional) comma-separated list of names of input nodes of the model. Required for ONNX models with dynamic input sizes | input_ids,input_mask,segment_ids |
output_names | (optional) comma-separated list of names of output nodes of the model | scores |
input_shapes | (optional) list of shapes of each input node. The order of the input shapes should be the same as the order of input names. Required for ONNX models with dynamic input sizes | [[1,7],[1,7],[1,7]] |
providers_list | (optional) execution providers used for performance tuning | cpu,dnnl |
trt_fp16_enabled | (optional) whether to enable FP16 mode for TensorRT | |
quantization_enabled | (optional) whether to enable quantization optimization | |
transformer_enabled | (optional) whether to enable transformer optimization | |
transformer_args | (optional) ONNX Runtime transformer optimizer arguments | "--model_type bert" |
sample_input_data_path | (optional) path to sample input data | sample_input_data.npz |
concurrency_num | (optional) tuning process concurrency number | 2 |
kmp_affinity | (optional) bind OpenMP* threads to physical processing units | respect,none |
omp_max_active_levels | (optional) maximum number of nested active parallel regions | 1 |
inter_thread_num_list | (optional) list of inter-op thread numbers for performance tuning | 1,2,4 |
intra_thread_num_list | (optional) list of intra-op thread numbers for performance tuning | 1,2,4 |
execution_mode_list | (optional) list of execution modes for performance tuning | parallel,sequential |
ort_opt_level_list | (optional) list of ONNX Runtime optimization levels for performance tuning | all |
omp_wait_policy_list | (optional) list of OpenMP wait policies for performance tuning | active |
warmup_num | (optional) number of warmup runs for latency measurement | 20 |
test_num | (optional) number of test runs for latency measurement | 200 |
There are two ways to call `olive optimize` from the command line.
The test model can be downloaded here.

To optimize ONNX model latency:

- With all inline arguments. For example:

  ```
  olive optimize --model_path TFBertForQuestionAnswering.onnx --input_names attention_mask,input_ids,token_type_ids --input_shapes [[1,7],[1,7],[1,7]] --quantization_enabled
  ```

- With an optimization config file containing all needed arguments:

  ```
  {
      "model_path": "TFBertForQuestionAnswering.onnx",
      "input_names": ["attention_mask", "input_ids", "token_type_ids"],
      "input_shapes": [[1,7],[1,7],[1,7]],
      "quantization_enabled": true,
      "providers_list": ["cpu", "dnnl"],
      "omp_max_active_levels": ["1"],
      "kmp_affinity": ["respect,none"],
      "concurrency_num": 2
  }
  ```

  ```
  olive optimize --optimization_config opt_config.json
  ```

  where the opt_config.json file includes the related configurations.
To optimize ONNX model throughput:
- With all inline arguments. For example:

  ```
  olive optimize --model_path TFBertForQuestionAnswering.onnx --input_names attention_mask,input_ids,token_type_ids --input_shapes [[-1,7],[-1,7],[-1,7]] --throughput_tuning_enabled --max_latency_percentile 0.95 --max_latency_ms 100 --threads_num 1 --dynamic_batching_size 4 --min_duration_sec 10
  ```

- With an optimization config file containing all needed arguments:

  ```
  {
      "model_path": "TFBertForQuestionAnswering.onnx",
      "input_names": ["attention_mask", "input_ids", "token_type_ids"],
      "input_shapes": [[-1,7],[-1,7],[-1,7]],
      "throughput_tuning_enabled": true,
      "max_latency_percentile": 0.95,
      "max_latency_ms": 100,
      "threads_num": 1,
      "dynamic_batching_size": 4,
      "min_duration_sec": 10
  }
  ```

  ```
  olive optimize --optimization_config opt_config.json
  ```

  where the opt_config.json file includes the related configurations.
Then the result JSON file and optimized model will be stored in `result_path`, by default `olive_opt_result`.
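To inspect the tuning output programmatically, you can load the result JSON from the result directory. A minimal sketch; it only assumes that one or more `.json` files end up under `result_path` (the default `olive_opt_result` is used here), since the exact filenames are not documented above:

```python
# Sketch: print every JSON result file that OLive wrote to the result directory.
import json
from pathlib import Path

result_dir = Path("olive_opt_result")  # matches the default result_path
for result_file in sorted(result_dir.glob("*.json")):
    print(result_file.name)
    print(json.dumps(json.loads(result_file.read_text()), indent=2))
```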