Inference speed ONNX vs. PB #849

@solarflarefx

Description

Describe the bug
I am experimenting with a few models and platforms, and I am seeing variation in inference speed depending on the model format. In both cases I run inference on the GPU and insert a Python timer before and after the sess.run call.

In one case, inference with the ONNX model is 60-70 ms slower than with the .pb model.
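For reference, the timing loop looks roughly like the sketch below (the model path, input shape, and iteration count are placeholders; the warm-up call is there so one-time CUDA/cuDNN initialization is not counted in the average):

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")                    # converted model (placeholder path)
inp_name = sess.get_inputs()[0].name                         # avoids hard-coding the tensor names
out_name = sess.get_outputs()[0].name
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)    # placeholder input shape

sess.run([out_name], {inp_name: dummy})                      # warm-up run

start = time.perf_counter()
for _ in range(100):
    sess.run([out_name], {inp_name: dummy})
print("mean ONNX Runtime latency: %.1f ms" % ((time.perf_counter() - start) / 100 * 1000))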

System information

  • OS Platform: Windows 10
  • ONNX Runtime installed from (source or binary):
  • ONNX Runtime version: 1.2.0, gpu version
  • Python version: 3.7
  • Visual Studio version (if applicable): 2019
  • CUDA/cuDNN version: 10.1
  • Tool to convert pb to onnx: tf2onnx 1.5.5
  • Version of onnx: 1.6.0
  • Version of TensorFlow: 1.14.0, gpu version

Additional context
I used the following command to create an ONNX model from a TensorFlow frozen graph (.pb):

python -m tf2onnx.convert --graphdef frozengraph.pb --output model.onnx --opset 11 --inputs input_layer:0 --outputs output_layer:0
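For completeness, here is roughly how I load the converted model (a sketch; setting the graph optimization level explicitly is an experiment on my side, and I am not sure whether other session options matter here):

import onnxruntime as ort

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL  # enable all graph-level optimizations
sess = ort.InferenceSession("model.onnx", so)

# Sanity check that the CUDA execution provider is active;
# if only CPUExecutionProvider is listed, inference is falling back to CPU.
print(sess.get_providers())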

Additional Questions

  • Are there additional optimization options I need to specify, either at conversion time or when creating the session (beyond the graph optimization level in the loading sketch above)?
  • When I view the two graphs in Netron, the ONNX model appears to contain many Transpose nodes throughout. I suspect these could be the source of the slowdown (see the inspection sketch after this list).
  • Possibly related to the transposes, the layer dimensions listed in the ONNX model are reversed compared to those in the .pb model. Is this an inherent difference between the two formats (e.g. NHWC vs. NCHW layout)?
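To quantify the Transpose observation programmatically rather than just eyeballing it in Netron, a small sketch using the onnx Python package (model.onnx is the converted file from above):

import onnx

model = onnx.load("model.onnx")

# Count the Transpose nodes the converter inserted
n_transpose = sum(1 for node in model.graph.node if node.op_type == "Transpose")
print("Transpose nodes:", n_transpose)

# Print the declared input shapes to compare against the .pb graph
for vi in model.graph.input:
    dims = [d.dim_value if d.dim_value else d.dim_param for d in vi.type.tensor_type.shape.dim]
    print(vi.name, dims)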
