
Googlenet running very slow #271

Closed

vibhuagrawal14 opened this issue Oct 3, 2018 · 19 comments

@vibhuagrawal14 commented Oct 3, 2018

An implementation of GoogLeNet that takes about 0.05 seconds to classify an image in MATLAB takes about 0.95 seconds in onnx-tf. As mentioned in issue #254, I have tried strict=False, but there was no change in performance.

To Reproduce

Attaching code here:

import numpy as np
import scipy.io as sio
import cv2
import onnx
from onnx_tf.backend import prepare

# Load the ONNX model and convert it to a TensorFlow representation.
model = onnx.load(r'D:\Vibhu\googlenet9.onnx')  # raw string avoids backslash escapes
tf_rep = prepare(model, strict=False)

# Load the image from the .mat file and preprocess it.
mat_contents = sio.loadmat('WDS7PSPCF0S11.IM0_131.mat')
img = mat_contents['img']
img = cv2.resize(img, (224, 224))
img = np.moveaxis(img, -1, 0)           # HWC -> CHW, as the ONNX model expects
tf_rep.run(img[np.newaxis, :, :, :])    # add batch dimension and run

.mat file here: https://drive.google.com/open?id=1P5LpxosVRwbz4bGe4oBZflPYDzB9RD0z

onnx model file here: https://drive.google.com/open?id=1XTFLNQgg7gPeWYBxUa_dmTHrjvvlpdLF

Running get_version.py from the util folder gives an error:

ModuleNotFoundError: No module named 'onnx'

  • Python version: 3.6.5

Using pip freeze for versions:

  • ONNX version: 1.3.0
  • ONNX-TF version: 1.2.0
  • Tensorflow version: 1.11.0
  • Tensorflow-gpu version: 1.11.0

Am I missing anything?

@fumihwh (Collaborator) commented Oct 3, 2018

Does your time (0.95 s) include the init step tf_rep = prepare(model, strict=False)?

@vibhuagrawal14 (Author)

No, the time of 0.95 seconds is only for the statement tf_rep.run(img[np.newaxis,:,:,:]).

@tjingrant (Collaborator)

It's mostly TensorFlow trying to set up the GPU context. Below is the GPU trace from nvvp.

If we keep the same TensorFlow session and do 100 consecutive runs of your GoogLeNet, the total time comes to 2.23 s for 100 calls, averaging 22.3 ms each.

[Screenshot: GPU trace from nvvp]
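
For reference, a minimal sketch of that kind of measurement. The attribute names (graph, inputs, outputs, tensor_dict) are assumptions based on onnx-tf 1.2.0's TensorflowRep; verify them against your installed version:

import time
import numpy as np
import tensorflow as tf

# Dummy input matching the model's reported input shape float[1,5,224,224].
img = np.random.rand(1, 5, 224, 224).astype(np.float32)

# Resolve input/output tensors once, outside the timing loop.
# (The model here has a single input, so the same array feeds it.)
feed = {tf_rep.tensor_dict[name]: img for name in tf_rep.inputs}
fetches = [tf_rep.tensor_dict[name] for name in tf_rep.outputs]

# One session for all 100 calls, so the GPU context is set up only once.
with tf.Session(graph=tf_rep.graph) as sess:
    start = time.time()
    for _ in range(100):
        sess.run(fetches, feed_dict=feed)
    print('avg per call: {:.1f} ms'.format((time.time() - start) * 1000 / 100))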

@vibhuagrawal14 (Author)

So, how should I proceed? Sorry, I don't have much background in the technicalities of TensorFlow. What I need to do is classify about 350 images. How do I keep a single TensorFlow session while looping over the image set? Thanks.

@tjingrant (Collaborator)

Please bear with my ignorance, but why is 0.95 s not good enough? 350 × 0.95 s is barely 6 minutes.

If you really want, try increasing the batch size dimension when you export the model from MATLAB, and then batch the images together into one large mini-batch of size 350.
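
For instance, a sketch of that batching idea, assuming images is a list of 350 preprocessed arrays (hypothetical name) of identical CHW shape, and that the exported model's batch dimension accepts 350:

import numpy as np

batch = np.stack(images, axis=0)    # (350, C, 224, 224): one big mini-batch
outputs = tf_rep.run(batch)         # single forward pass for all 350 images
predictions = np.argmax(outputs[0], axis=1)  # assuming outputs[0] holds the class scores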

@vibhuagrawal14 (Author)

6 minutes for 350 images would be really bad for my specific application. The classification task isn't too demanding computationally, and I would like to keep the time taken to a minimum. I will try what you suggested.

Still, I would really like to know how I can do consecutive runs of GoogLeNet with the same TensorFlow session. Thanks!

@tjingrant (Collaborator)

It's not possible with the current onnx_tf API; I basically modified tf_rep to test my hypothesis that most of the time is wasted in GPU setup.

To allow a persistent TF session across multiple runs, we would need to update our API. This is not very difficult, but it may take some time for us to think through the consequences and implications (e.g., how to test the new API is also a bit of a headache, since our current test suite is already huge...).

@vibhuagrawal14 (Author)

Thank you for your input. Would it be possible to share the modified API you created to test your hypothesis? Even if it isn't fully tested, it would be of great help to me, and I can work on it further.

@fumihwh (Collaborator) commented Oct 4, 2018

@vibhuagrawal14
The current idea is to let tf_rep hold the session. For example, add tf_rep.sess = tf.Session(graph=tf_rep.graph) at backend.py L163, and in run, use self.sess instead of tf.Session().
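
A minimal sketch of that idea applied from user code instead of patching backend.py. The attribute names (graph, inputs, outputs, tensor_dict) are assumptions based on onnx-tf 1.2.0, and run_persistent is a hypothetical helper:

import numpy as np
import tensorflow as tf

# Create the session once and keep it on the rep, as suggested above.
tf_rep.sess = tf.Session(graph=tf_rep.graph)

def run_persistent(rep, img):
    # Same work as run(), but reusing rep.sess instead of opening a
    # fresh tf.Session() (and a fresh GPU context) on every call.
    feed = {rep.tensor_dict[name]: img for name in rep.inputs}
    fetches = [rep.tensor_dict[name] for name in rep.outputs]
    return rep.sess.run(fetches, feed_dict=feed)

for img in images:                  # images: your preprocessed CHW arrays
    out = run_persistent(tf_rep, img[np.newaxis])
tf_rep.sess.close()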

@fumihwh (Collaborator) commented Oct 4, 2018

@vibhuagrawal14 Try the referenced PR.

@fumihwh (Collaborator) commented Oct 10, 2018

@vibhuagrawal14 Any updates?

@vibhuagrawal14 (Author)

Hey, sorry for the late reply. I tried the changes you suggested and performance improved considerably: I am down from ~300 seconds to ~50 seconds. This is still 10x slower than what I get in MATLAB (4-5 seconds), but I think I can manage. Do you have any more ideas for further improving performance?

@fumihwh (Collaborator) commented Oct 23, 2018

@vibhuagrawal14
Currently you use batch size 1 (float[1,5,224,224]), which is inefficient for TF. If you can increase the batch size, performance should be better.

ref: https://arxiv.org/pdf/1605.07678.pdf, Figure 3: Inference time vs. batch size.

@vibhuagrawal14 (Author)

Quick update: running the network from the exported graph (using tf_rep.export_graph) is very efficient. It takes about 7 seconds.
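
For anyone trying the same route, a sketch of the export-and-reload approach. The .pb path is arbitrary and the tensor names are assumptions; list graph.get_operations() to find the real ones:

import numpy as np
import tensorflow as tf

tf_rep.export_graph('googlenet.pb')     # serialize the converted graph once

graph_def = tf.GraphDef()
with open('googlenet.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Tensor names assumed from the rep's input/output name lists;
# ':0' selects the first output of each op.
inp = graph.get_tensor_by_name(tf_rep.inputs[0] + ':0')
out = graph.get_tensor_by_name(tf_rep.outputs[0] + ':0')

with tf.Session(graph=graph) as sess:   # one session for the whole loop
    for img in images:                  # images: preprocessed CHW arrays
        result = sess.run(out, feed_dict={inp: img[np.newaxis]})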

@anuar12 commented Mar 8, 2019

@vibhuagrawal14 Hey, I am facing an issue where the model is loaded every time I call run(), which I think is the same as yours.
Can you specify what exactly you did to make it work? Is it just making sure the tf.Session is reused?

@vibhuagrawal14 (Author)

@anuar12 As mentioned by @fumihwh above, just add tf_rep.sess = tf.Session(graph=tf_rep.graph) at backend.py L163, and in run, use self.sess instead of tf.Session(). That did the trick for me.

@batrlatom
Maybe off topic, but you could export the ONNX model to a TensorFlow .pb file and load it as you would a normal model. I was able to convert a PyTorch model and run it in TF as fast as the native one.

@azraelkuan (Contributor)

@batrlatom can you share your code?

# tf_graph, config, the *_tensor names, inputs, conditions and hparams
# all come from my own setup.
with tf.Session(graph=tf_graph, config=config) as sess:
    for _ in range(100):
        s = time.time()
        output = sess.run(output_tensor, feed_dict={
            input_tensor: inputs,
            condition_tensor: conditions
        })
        print('real time: {}'.format(
            output.shape[-1] / (hparams.sample_rate * (time.time() - s))))

This runs about 4 times slower than PyTorch...
