Analyse (and enhance) performances on multiple predictions #67

bitnick10 · 2019-05-24T11:04:38Z

I have a demo model with 39 inputs. It takes 0.5 s to predict 10000 samples using Keras. With onnx-go it takes 5 s.

	for i := 0; i < 10000; i++ {
		model.SetInput(0, input)
		err = backend.Run()
		model.GetOutputTensors()
	}

Am I making a mistake here?

owulveryck · 2019-05-24T11:27:42Z

Hello,

TL;DR: no, you are not; onnx-go is not (yet) optimized for performance.

Longer answer:
onnx-go is not optimized for performance (yet). In your case, the slowdown may come from the way the Gorgonia backend itself is implemented: every call to backend.Run() does a lot of work under the hood (for example, a gorgonia.TapeMachine is created each time). We are aware of this, and hiding all of it inside the Run() method should allow performance tuning without breaking the API. For example, there could be a reset method to recycle some elements, or a worker pool of machines to execute the code. These are just ideas.
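To make the recycling idea concrete, here is a minimal sketch of a pool that hands out reusable machines and resets them between runs. The machine and pool types are hypothetical stand-ins written for this illustration; they are not Gorgonia or onnx-go APIs, and the real TapeMachine may need a different reset strategy.

```go
package main

import (
	"fmt"
	"sync"
)

// machine is a hypothetical stand-in for an execution engine that is
// expensive to build; it is NOT the real gorgonia.TapeMachine.
type machine struct {
	scratch []float32 // per-run working memory, kept between runs
}

// run pretends to execute the graph on one input value.
func (m *machine) run(input float32) float32 {
	m.scratch = append(m.scratch, input*2)
	return m.scratch[len(m.scratch)-1]
}

// reset clears per-run state but keeps the allocated memory.
func (m *machine) reset() { m.scratch = m.scratch[:0] }

// pool recycles machines and only builds a new one when none is free.
type pool struct {
	mu    sync.Mutex
	free  []*machine
	built int // number of machines constructed so far
}

// get returns a recycled machine if available, otherwise a new one.
func (p *pool) get() *machine {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.free); n > 0 {
		m := p.free[n-1]
		p.free = p.free[:n-1]
		return m
	}
	p.built++
	return &machine{}
}

// put resets the machine and makes it available again.
func (p *pool) put(m *machine) {
	m.reset()
	p.mu.Lock()
	p.free = append(p.free, m)
	p.mu.Unlock()
}

// predict runs every input through a machine borrowed from the pool.
func predict(p *pool, inputs []float32) []float32 {
	out := make([]float32, len(inputs))
	for i, in := range inputs {
		m := p.get()
		out[i] = m.run(in)
		p.put(m)
	}
	return out
}

func main() {
	p := &pool{}
	out := predict(p, []float32{1, 2, 3})
	fmt.Println(out, "machines built:", p.built) // [2 4 6] machines built: 1
}
```

With sequential calls a single machine is built and reused for every prediction, which is the saving the reset idea is after.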

Anyway, this is pure speculation. To improve things we need to measure performance and find where the bottleneck is. Your example could be a perfect starting point for analysis and measurements.
Maybe we can turn your example into a _test.go file and run some benchmarks and analysis.
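As a rough idea of what such a benchmark could look like, the sketch below times repeated Run() calls with the standard testing package. The runner interface and fakeBackend are assumptions made for this example; a real benchmark would load the ONNX model and use the actual backend instead.

```go
package main

import (
	"fmt"
	"testing"
)

// runner is a hypothetical interface covering anything with a Run() error
// method; it is defined here only for the sketch.
type runner interface {
	Run() error
}

// fakeBackend stands in for a real backend so the sketch runs on its own.
type fakeBackend struct {
	calls int
}

// Run counts invocations instead of executing a graph.
func (f *fakeBackend) Run() error {
	f.calls++
	return nil
}

// benchmarkRun calls Run() b.N times and reports allocations.
// In a _test.go file this body would live in func BenchmarkRun(b *testing.B).
func benchmarkRun(b *testing.B, r runner) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if err := r.Run(); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	backend := &fakeBackend{}
	// testing.Benchmark lets the benchmark run outside of "go test".
	res := testing.Benchmark(func(b *testing.B) { benchmarkRun(b, backend) })
	fmt.Println(res)
}
```

Placed in a _test.go file, the same loop can be run with go test -bench and combined with the -cpuprofile and -memprofile flags to locate the bottleneck.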

Do you mind sharing your complete example with us?

On top of that, we can also try to run your tests concurrently via goroutines to see how it behaves. Concurrency is also a goal, as @blackrez and I would like to be able to run onnx-go inside a web service.
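A concurrent test could be sketched roughly as follows: a fixed number of goroutines each own their own predictor and pull input indices from a channel. The predictor type and the sum function are hypothetical placeholders; whether one onnx-go model can be shared between goroutines is not stated in this thread, so the sketch assumes one instance per goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// predictor is a hypothetical per-goroutine model instance.
type predictor func(input []float32) float32

// sum is a toy predictor used in place of a real model.
func sum(input []float32) float32 {
	var s float32
	for _, v := range input {
		s += v
	}
	return s
}

// predictAll fans the inputs out to the given number of worker goroutines
// and returns the results in input order.
func predictAll(newPredictor func() predictor, inputs [][]float32, workers int) []float32 {
	if workers < 1 {
		workers = 1
	}
	out := make([]float32, len(inputs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p := newPredictor() // one instance per goroutine, nothing shared
			for i := range jobs {
				out[i] = p(inputs[i]) // each index is written by exactly one goroutine
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	inputs := [][]float32{{1, 2}, {3, 4}, {5, 6}}
	fmt.Println(predictAll(func() predictor { return sum }, inputs, 2)) // [3 7 11]
}
```

Giving each goroutine its own instance trades memory for safety; a shared instance would only be an option if the backend turns out to be safe for concurrent use.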

bitnick10 · 2019-05-24T11:56:28Z

@owulveryck, it's OK to share the model and data. It's only a study project. Can you give me an email address or something?

owulveryck · 2019-05-24T19:54:58Z

Is it huge? Can you copy/paste the Python code here?

bitnick10 · 2019-05-24T22:21:23Z

I wrote a simple one:

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import keras.engine.saving
import keras.models
import math
import numpy as np
import stopwatch
import onnxmltools
data_size = 100000
x_train = np.array(np.random.rand(data_size,39),dtype='float32')
y_train = np.zeros(shape=(data_size,2),dtype='float32')

model = Sequential()
model.add(Dense(units=22,input_shape=(39,), activation='tanh'))
model.add(Dense(units=22, activation='tanh'))
model.add(Dense(units=2, activation='tanh'))
sgd = SGD(lr=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy',optimizer=sgd,)

sw = stopwatch.Stopwatch()
sws=[]
for i in range(2):
    sw = stopwatch.Stopwatch()
    sw.reset();
    sw.stop();
    sws.append(sw)

for i in range(100):
    sws[0].start()
    model.train_on_batch(x_train, y_train)
    sws[0].stop()
    sws[1].start()
    y_predict=model.predict(x_train,batch_size=len(x_train))
    sws[1].stop()
    print("{0} {1:.3f} {2:.3f} ".format(i,sws[0].duration/(i+1),sws[1].duration/(i+1)))

onnx_model = onnxmltools.convert_keras(model, target_opset=7)
onnxmltools.save_model(onnx_model,"model.onnx")

Hope this helps.

owulveryck · 2019-05-28T09:01:03Z

A basic performance analysis has started in issue #68.

owulveryck · 2019-05-29T19:59:44Z

I ran a very simple test on my machine without actually checking the results; Gorgonia is winning against TensorFlow (if my test is OK):

Python

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import keras.engine.saving
import keras.models
import math
import numpy as np
import time
model = Sequential()
import onnxmltools
data_size = 100000
x_train = np.array(np.random.rand(data_size,39),dtype='float32')
y_train = np.zeros(shape=(data_size,2),dtype='float32')

model.add(Dense(units=22,input_shape=(39,), activation='tanh'))
model.add(Dense(units=22, activation='tanh'))
model.add(Dense(units=2, activation='tanh'))
sgd = SGD(lr=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy',optimizer=sgd,)

model.train_on_batch(x_train, y_train)
total = 0
for i in range(100):
    x_test = np.array(np.random.rand(data_size,39),dtype='float32')

    start = time.time()
    y_predict=model.predict(x_test)
    end = time.time()
    total += (end - start)
    print(total / (i+1))

onnx_model = onnxmltools.convert_keras(model, target_opset=7)
onnxmltools.save_model(onnx_model,"model.onnx")

The prediction takes 870 ms on average.

Go

This code should run something similar:

        datasize := 100000
        backend := gorgonnx.NewGraph()
        model := onnx.NewModel(backend)
        b, err := ioutil.ReadFile(os.Args[1])
        if err != nil {
                log.Fatal(err)
        }
        err = model.UnmarshalBinary(b)
        if err != nil {
                log.Fatal(err)
        }
        var d time.Duration
        for i := 0; i < 100; i++ {
                input := tensor.New(tensor.WithShape(datasize, 39), tensor.Of(tensor.Float32), tensor.WithBacking(tensor.Random(tensor.Float32, datasize*39)))
                model.SetInput(0, input)
                t := time.Now()
                err = backend.Run()
                if err != nil {
                        log.Fatal(err)
                }
                d += time.Since(t)
                fmt.Println(time.Duration(float64(d) / float64(i+1)))
        }

This gives 320 ms on average (and around 200 ms with a patch that I will commit to the tensor package soon).

bitnick10 · 2019-05-29T20:37:43Z

Try this Keras predict call; it's faster:
y_predict=model.predict(x_train,batch_size=len(x_train))

owulveryck · 2019-05-30T07:17:42Z

A patch has been committed to the tensor package; it really changes performance and memory consumption.
Can you update your Go workspace and give it another try?

bitnick10 · 2019-05-30T08:13:20Z

onnx-go improved. My ratio is now 0.07 s (Keras) vs 0.3 s (onnx-go).

owulveryck · 2019-05-30T09:56:36Z

@bitnick10, would you be kind enough to provide the Python code and the Go code you are actually executing to get your results? This would allow us to run the exact same test as you do.

Maybe a gist.github.com can do the job.

Thanks

bitnick10 · 2019-05-30T11:15:25Z

I used my Keras code and your Go code. The one line that differs from your Keras code is y_predict=model.predict(x_train,batch_size=len(x_train)); adding batch_size=len(x_train) makes Keras a lot faster. My Keras settings are:

 {
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

owulveryck · 2019-05-30T12:18:01Z

You're right, setting the batch_size improves the performance of Keras and makes it more efficient.
One thing I've noticed is that the tensor.Random(tensor.Float32, datasize*39) call in the Go file generates numbers that are not between 0 and 1; this makes the computation of the Tanh expensive.
I've tried with a fixed small value, and I am down to about 100 ms on average for Go, while I get around 50 ms for the Python version.

I did some profiling, and the next move will be to enhance the broadcasting (see issue #68).

Or maybe we can try to generate a model that does not "compact the tensor" and use broadcasting for testing. But I don't know if it's possible.

owulveryck · 2019-06-09T14:02:33Z

This PR from the tensor package gives good results.
I am closing this issue. A new issue can be raised to work on performance again.

owulveryck added the "Gorgonia / Gorgonnx" label (This issue is related to the Gorgonia backend) May 24, 2019

owulveryck changed the title ~~predict slow~~ Analyse (and enhance) performances on multiple predictions May 24, 2019

blackrez self-assigned this May 26, 2019

owulveryck mentioned this issue May 29, 2019

Benchmarks #69

Merged

owulveryck closed this as completed Jun 9, 2019
