
Fine tuned Keras VGGNet16 shows no performance advantages. #638

Open
INF800 opened this issue Oct 20, 2020 · 5 comments

Comments

@INF800

INF800 commented Oct 20, 2020

[Image: inference-time comparison of the raw Keras VGG16 model and the ONNX Runtime version]

This is a comparison of the raw Keras VGG16 model's inference time and the same model on ONNX Runtime. Why don't I see any performance advantages?

There is only an extremely small improvement.

Replicate the results by running this notebook on a Colab CPU.

@Narasimha1997

Whenever measuring the performance of AI models, please note the following:

  1. More CPU cores will not make the model faster unless the framework supports concurrent execution of layers. On a CPU-only machine, you can improve overall inference throughput by loading more model instances across multiple cores (see the sketch at the end of this comment). A single model can use at most 100% of one core; even if you have 15 remaining cores, they will not be used.
  2. GPUs, on the other hand, can make model inference faster because they can parallelize layer operations and matrix multiplications across CUDA cores. The more CUDA cores you have, the faster the inference will be. This holds irrespective of the DL framework, since they all use cuDNN bindings. GPUs can also execute batches of inputs at once because of the nature of GPU hardware design.

So the rule of thumb => CPU : Concurrency :: GPU : Batching

All these optimizations will obviously not make a single model faster on CPU, because a single model's utilisation will never exploit multiple cores.
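
A minimal sketch of the CPU-concurrency approach described above (not from the original comment): it assumes a converted model file named `vgg16.onnx` and a dummy 1x224x224x3 float input; adjust the path and shape to the actual model.

```python
# Hypothetical sketch: one onnxruntime session per worker process, so several
# inferences run concurrently on a CPU-only machine (roughly one core each).
# The model path and input shape are assumptions for illustration only.
import numpy as np
from multiprocessing import Pool

MODEL_PATH = "vgg16.onnx"  # assumed path to the converted model
_session = None            # per-process session, created in the initializer


def _init_worker():
    global _session
    import onnxruntime as ort
    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 1  # limit each worker to roughly one core
    _session = ort.InferenceSession(MODEL_PATH, sess_options=opts)


def _run_inference(batch):
    input_name = _session.get_inputs()[0].name
    return _session.run(None, {input_name: batch})[0]


if __name__ == "__main__":
    # Four dummy inputs, one per worker (Keras VGG16 expects NHWC 224x224x3).
    batches = [np.random.rand(1, 224, 224, 3).astype(np.float32) for _ in range(4)]
    with Pool(processes=4, initializer=_init_worker) as pool:
        outputs = pool.map(_run_inference, batches)
    print(f"ran {len(outputs)} inferences across 4 worker processes")
```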

@INF800
Author

INF800 commented Oct 21, 2020

Hey @Narasimha1997, I do not understand why ONNX does not make models faster. Huggingface uses ONNX to run large pretrained networks on CPU. So, can't I replicate the same using keras-onnx? Or do I have to use ONNX models converted from PyTorch models?
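
For reference, a rough sketch of the workflow the question refers to: converting the Keras VGG16 to ONNX with keras2onnx and running it on CPU with onnxruntime. The output path and dummy input are illustrative assumptions, and exact APIs may vary across keras2onnx / TensorFlow versions.

```python
# Hypothetical sketch: convert Keras VGG16 with keras2onnx, then run it on CPU
# with onnxruntime. File name and dummy input are assumptions.
import numpy as np
import onnxruntime as ort
import keras2onnx
from tensorflow.keras.applications import VGG16

keras_model = VGG16(weights="imagenet")
onnx_model = keras2onnx.convert_keras(keras_model, keras_model.name)
keras2onnx.save_model(onnx_model, "vgg16.onnx")

sess = ort.InferenceSession("vgg16.onnx")
x = np.random.rand(1, 224, 224, 3).astype(np.float32)
preds = sess.run(None, {sess.get_inputs()[0].name: x})[0]
print(preds.shape)  # expected (1, 1000) class scores
```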

@jiafatom
Collaborator

When you use onnxruntime to evaluate performance (say, 100 runs), please skip the first few runs (for example, 10) of the evaluation. This matters especially for the first run: onnxruntime needs to do some extra work, so it costs much more time than usual.
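
A minimal timing sketch of that advice (assuming a converted `vgg16.onnx` and a dummy input): the first few runs are discarded as warm-up before averaging.

```python
# Hypothetical benchmark sketch: discard warm-up runs before timing, as
# suggested above. Model path and input shape are illustrative assumptions.
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("vgg16.onnx")
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 224, 224, 3).astype(np.float32)

WARMUP, RUNS = 10, 100
for _ in range(WARMUP):          # warm-up runs, not timed
    sess.run(None, {input_name: x})

start = time.perf_counter()
for _ in range(RUNS):            # timed runs
    sess.run(None, {input_name: x})
elapsed = time.perf_counter() - start
print(f"mean latency over {RUNS} runs: {elapsed / RUNS * 1000:.2f} ms")
```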

@INF800
Author

INF800 commented Oct 23, 2020

Hey @jiafatom, the results were smashing for a LeNet-type architecture (up to 177 times faster) using your method, but VGGNet shows NO improvement. I have updated the notebook.

@jiafatom
Collaborator

For this perf issue, I feel that the converter already does its job well, and this is an onnxruntime issue. You may need to reach out to the onnxruntime repo and post the question there.
