Describe the bug
I am experimenting with a few models and platforms, and I am seeing variation in inference speed depending on the model format. In both cases I perform inference on the GPU and wrap the sess.run call with a Python timer.
In one case, the ONNX model runs inference 60-70 ms slower than the .pb model.
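For reference, a minimal sketch of the timing approach described above (the input shape and tensor names here are placeholders that would need to match the actual model):

```python
import time

import numpy as np
import onnxruntime as ort

# Load the converted model; the GPU package should pick up the CUDA
# execution provider automatically.
sess = ort.InferenceSession("model.onnx")

# Placeholder input -- the shape must match the model's actual input.
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)

# Time only the sess.run call.
start = time.perf_counter()
outputs = sess.run(["output_layer:0"], {"input_layer:0": dummy})
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Inference took {elapsed_ms:.1f} ms")
```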
System information
- OS Platform: Windows 10
- ONNX Runtime installed from (source or binary):
- ONNX Runtime version: 1.2.0, GPU build
- Python version: 3.7
- Visual Studio version (if applicable): 2019
- CUDA/cuDNN version: 10.1
- Tool to convert pb to onnx: tf2onnx 1.5.5
- Version of onnx: 1.6.0
- Version of TensorFlow: 1.14.0, GPU build
Additional context
I used the following command to create an ONNX model from a TensorFlow frozen-graph model (.pb):
python -m tf2onnx.convert --graphdef frozengraph.pb --output model.onnx --opset 11 --inputs input_layer:0 --outputs output_layer:0
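To sanity-check what the converter produced (relevant to the dimension question below), the converted model's declared input/output shapes can be read back through onnxruntime; a small sketch, assuming the same model.onnx and tensor names as above:

```python
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")

# Print the declared I/O names, shapes, and types to compare against
# what Netron and the original .pb graph report.
for inp in sess.get_inputs():
    print("input: ", inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print("output:", out.name, out.shape, out.type)
```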
Additional Questions
- Are there additional optimization commands or settings I would need to specify? (See the sketch after this list.)
- When I view the model graphs in Netron, the ONNX model appears to insert many Transpose nodes throughout the graph. I suspect these could be the source of the slowdown.
- Possibly related to the transposes, the dimensions listed for layers in the ONNX model are reversed compared to those in the .pb model. Is this an inherent difference between the two formats?
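Regarding the first question, a sketch of explicitly setting the graph optimization level through SessionOptions; whether this changes anything here is an open question, since ORT_ENABLE_ALL may already be the default:

```python
import onnxruntime as ort

# Request all graph-level optimizations before creating the session.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

sess = ort.InferenceSession("model.onnx", opts)

# Also confirm the CUDA execution provider is actually being used.
print(sess.get_providers())
```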