
Googlenet running very slow #271

Closed

vibhuagrawal14 opened this issue Oct 3, 2018 · 19 comments

@vibhuagrawal14 commented Oct 3, 2018

An implementation of GoogLeNet that takes about 0.05 seconds to classify an image in MATLAB takes about 0.95 seconds in onnx-tf. As mentioned in issue #254, I have tried strict=False, but there was no change in performance.

To Reproduce

Attaching code here:

import numpy as np
import scipy.io as sio
import cv2
import onnx
from onnx_tf.backend import prepare

# Load the ONNX model and convert it to a TensorFlow representation.
model = onnx.load(r'D:\Vibhu\googlenet9.onnx')  # raw string avoids backslash escapes
tf_rep = prepare(model, strict=False)

# Load the image from the .mat file and preprocess it.
mat_contents = sio.loadmat('WDS7PSPCF0S11.IM0_131.mat')
img = mat_contents['img']
img = cv2.resize(img, (224, 224))
img = np.moveaxis(img, -1, 0)           # HWC -> CHW, as the ONNX model expects
tf_rep.run(img[np.newaxis, :, :, :])    # add batch dimension and run

.mat file here: https://drive.google.com/open?id=1P5LpxosVRwbz4bGe4oBZflPYDzB9RD0z

onnx model file here: https://drive.google.com/open?id=1XTFLNQgg7gPeWYBxUa_dmTHrjvvlpdLF

Running get_version.py from the util folder gives an error:

ModuleNotFoundError: No module named 'onnx'

  • Python version: 3.6.5

Using pip freeze for versions:

  • ONNX version: 1.3.0
  • ONNX-TF version: 1.2.0
  • Tensorflow version: 1.11.0
  • Tensorflow-gpu version: 1.11.0

Am I missing anything?

@fumihwh (Collaborator) commented Oct 3, 2018

Does your time (0.95 s) include the init step tf_rep = prepare(model, strict=False)?

@vibhuagrawal14 (Author)

No, the time of 0.95 seconds is only for the statement tf_rep.run(img[np.newaxis,:,:,:]).

@tjingrant (Collaborator)

It's mostly TensorFlow trying to set up the GPU context. Below is the GPU trace from nvvp.

If we keep the same TensorFlow session and do 100 consecutive runs of your GoogLeNet, the total time comes to 2.23 s for 100 calls, averaging 22.3 ms each.

[Screenshot: GPU trace from nvvp]
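
For reference, a minimal sketch of that kind of measurement. The attribute names (graph, inputs, outputs, tensor_dict) are assumptions based on onnx-tf 1.2.0's TensorflowRep; verify them against your installed version:

import time
import numpy as np
import tensorflow as tf

# Dummy input matching the model's reported input shape float[1,5,224,224].
img = np.random.rand(1, 5, 224, 224).astype(np.float32)

# Resolve input/output tensors once, outside the timing loop.
# (The model here has a single input, so the same array feeds it.)
feed = {tf_rep.tensor_dict[name]: img for name in tf_rep.inputs}
fetches = [tf_rep.tensor_dict[name] for name in tf_rep.outputs]

# One session for all 100 calls, so the GPU context is set up only once.
with tf.Session(graph=tf_rep.graph) as sess:
    start = time.time()
    for _ in range(100):
        sess.run(fetches, feed_dict=feed)
    print('avg per call: {:.1f} ms'.format((time.time() - start) * 1000 / 100))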

@vibhuagrawal14 (Author)

So, how should I proceed? Sorry, I don't have much background in the technicalities of TensorFlow. What I need to do is classify about 350 images. How do I keep a single TensorFlow session while looping over the image set? Thanks.

@tjingrant (Collaborator)

Please bear with my ignorance, but why is 0.95 s not good enough? 350 × 0.95 s is barely 6 minutes.

If you really want, try increasing the batch size dimension when you export the model from MATLAB, and then batch the images together into one large mini-batch of size 350.
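
For instance, a sketch of that batching idea, assuming images is a list of 350 preprocessed arrays (hypothetical name) of identical CHW shape, and that the exported model's batch dimension accepts 350:

import numpy as np

batch = np.stack(images, axis=0)    # (350, C, 224, 224): one big mini-batch
outputs = tf_rep.run(batch)         # single forward pass for all 350 images
predictions = np.argmax(outputs[0], axis=1)  # assuming outputs[0] holds the class scores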

@vibhuagrawal14 (Author)

6 minutes for 350 images would be really bad for my specific application. The classification task isn't too demanding computationally, and I would like to keep the time taken to a minimum. I will try what you suggested.

Still, I would really like to know how I can do consecutive runs of GoogLeNet with the same TensorFlow session. Thanks!

@tjingrant (Collaborator)

It's not possible with the current onnx_tf API; I basically modified tf_rep to test my hypothesis that most of the time is wasted in GPU setup.

To allow a persistent TF session across multiple runs, we would need to update our API. This is not very difficult, but it may take some time for us to think through the consequences and implications (e.g., how to test the new API is also a bit of a headache, since our current test suite is already huge...).

@vibhuagrawal14 (Author)

Thank you for your input. Would it be possible to share the modified API you created to test your hypothesis? Even if it isn't fully tested, it would be of great help to me, and I can work on it further.

@fumihwh (Collaborator) commented Oct 4, 2018

@vibhuagrawal14
The current idea is to let tf_rep hold the session. For example, add tf_rep.sess = tf.Session(graph=tf_rep.graph) at backend.py L163, and in run, use self.sess instead of tf.Session().
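
A minimal sketch of that idea applied from user code instead of patching backend.py. The attribute names (graph, inputs, outputs, tensor_dict) are assumptions based on onnx-tf 1.2.0, and run_persistent is a hypothetical helper:

import numpy as np
import tensorflow as tf

# Create the session once and keep it on the rep, as suggested above.
tf_rep.sess = tf.Session(graph=tf_rep.graph)

def run_persistent(rep, img):
    # Same work as run(), but reusing rep.sess instead of opening a
    # fresh tf.Session() (and a fresh GPU context) on every call.
    feed = {rep.tensor_dict[name]: img for name in rep.inputs}
    fetches = [rep.tensor_dict[name] for name in rep.outputs]
    return rep.sess.run(fetches, feed_dict=feed)

for img in images:                  # images: your preprocessed CHW arrays
    out = run_persistent(tf_rep, img[np.newaxis])
tf_rep.sess.close()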

@fumihwh (Collaborator) commented Oct 4, 2018

@vibhuagrawal14 Try the referenced PR.

@fumihwh (Collaborator) commented Oct 10, 2018

@vibhuagrawal14 Any updates?

@vibhuagrawal14 (Author)

Hey, sorry for the late reply. I tried the changes you suggested and performance improved considerably: I am down from ~300 seconds to ~50 seconds. This is still 10x slower than what I get in MATLAB (4-5 seconds), but I think I can manage. Do you have any more ideas for further improving performance?

@fumihwh (Collaborator) commented Oct 23, 2018

@vibhuagrawal14
Currently you use batch size 1 (float[1,5,224,224]), which is inefficient for TF. If you can increase the batch size, performance should be better.

ref: https://arxiv.org/pdf/1605.07678.pdf, Figure 3: Inference time vs. batch size.

@vibhuagrawal14 (Author)

Quick update: running the network from the exported graph (using tf_rep.export_graph) is very efficient. It takes about 7 seconds.
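
For anyone trying the same route, a sketch of the export-and-reload approach. The .pb path is arbitrary and the tensor names are assumptions; list graph.get_operations() to find the real ones:

import numpy as np
import tensorflow as tf

tf_rep.export_graph('googlenet.pb')     # serialize the converted graph once

graph_def = tf.GraphDef()
with open('googlenet.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Tensor names assumed from the rep's input/output name lists;
# ':0' selects the first output of each op.
inp = graph.get_tensor_by_name(tf_rep.inputs[0] + ':0')
out = graph.get_tensor_by_name(tf_rep.outputs[0] + ':0')

with tf.Session(graph=graph) as sess:   # one session for the whole loop
    for img in images:                  # images: preprocessed CHW arrays
        result = sess.run(out, feed_dict={inp: img[np.newaxis]})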

@anuar12 commented Mar 8, 2019

@vibhuagrawal14 Hey, I am facing an issue where the model is loaded every time I call run(), which I think is the same as yours.
Can you specify what exactly you did to make it work? Is it just making sure the tf.Session is reused?

@vibhuagrawal14 (Author)

@anuar12 As mentioned by @fumihwh above, just add tf_rep.sess = tf.Session(graph=tf_rep.graph) at backend.py L163, and in run, use self.sess instead of tf.Session(). That did the trick for me.

@batrlatom
Maybe off topic, but you could export the ONNX model to a TensorFlow .pb file and load it as you would a normal model. I was able to convert a PyTorch model and run it in TF as fast as the native one.

@azraelkuan (Contributor)

@batrlatom can you share your code?

# tf_graph, config, the *_tensor names, inputs, conditions and hparams
# all come from my own setup.
with tf.Session(graph=tf_graph, config=config) as sess:
    for _ in range(100):
        s = time.time()
        output = sess.run(output_tensor, feed_dict={
            input_tensor: inputs,
            condition_tensor: conditions
        })
        print('real time: {}'.format(
            output.shape[-1] / (hparams.sample_rate * (time.time() - s))))

This runs about 4 times slower than PyTorch...
