Inference speed ONNX vs. PB #849

@solarflarefx

Description

Describe the bug
I am experimenting with a few models and platforms, and I am seeing variation in inference speed depending on the model format. In both cases I run inference on the GPU and insert a Python timer before and after the sess.run call.

In one case, inference with the ONNX model is 60-70 ms slower than with the .pb model.
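For reference, the timing loop looks roughly like the sketch below (the model path, input shape, and iteration count are placeholders; the warm-up call is there so one-time CUDA/cuDNN initialization is not counted in the average):

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")                    # converted model (placeholder path)
inp_name = sess.get_inputs()[0].name                         # avoids hard-coding the tensor names
out_name = sess.get_outputs()[0].name
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)    # placeholder input shape

sess.run([out_name], {inp_name: dummy})                      # warm-up run

start = time.perf_counter()
for _ in range(100):
    sess.run([out_name], {inp_name: dummy})
print("mean ONNX Runtime latency: %.1f ms" % ((time.perf_counter() - start) / 100 * 1000))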

System information

  • OS Platform: Windows 10
  • ONNX Runtime installed from (source or binary):
  • ONNX Runtime version: 1.2.0, gpu version
  • Python version: 3.7
  • Visual Studio version (if applicable): 2019
  • CUDA/cuDNN version: 10.1
  • Tool to convert pb to onnx: tf2onnx 1.5.5
  • Version of onnx: 1.6.0
  • Version of TensorFlow: 1.14.0, gpu version

Additional context
I used the following command to create an ONNX model from a TensorFlow frozen graph (.pb):

python -m tf2onnx.convert --graphdef frozengraph.pb --output model.onnx --opset 11 --inputs input_layer:0 --outputs output_layer:0
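For completeness, here is roughly how I load the converted model (a sketch; setting the graph optimization level explicitly is an experiment on my side, and I am not sure whether other session options matter here):

import onnxruntime as ort

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL  # enable all graph-level optimizations
sess = ort.InferenceSession("model.onnx", so)

# Sanity check that the CUDA execution provider is active;
# if only CPUExecutionProvider is listed, inference is falling back to CPU.
print(sess.get_providers())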

Additional Questions

  • Are there additional optimization options I need to specify, either at conversion time or when creating the session (beyond the graph optimization level in the loading sketch above)?
  • When I view the two graphs in Netron, the ONNX model appears to contain many Transpose nodes throughout. I suspect these could be the source of the slowdown (see the inspection sketch after this list).
  • Possibly related to the transposes, the layer dimensions listed in the ONNX model are reversed compared to those in the .pb model. Is this an inherent difference between the two formats (e.g. NHWC vs. NCHW layout)?
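To quantify the Transpose observation programmatically rather than just eyeballing it in Netron, a small sketch using the onnx Python package (model.onnx is the converted file from above):

import onnx

model = onnx.load("model.onnx")

# Count the Transpose nodes the converter inserted
n_transpose = sum(1 for node in model.graph.node if node.op_type == "Transpose")
print("Transpose nodes:", n_transpose)

# Print the declared input shapes to compare against the .pb graph
for vi in model.graph.input:
    dims = [d.dim_value if d.dim_value else d.dim_param for d in vi.type.tensor_type.shape.dim]
    print(vi.name, dims)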
