This repository has been archived by the owner on May 31, 2024. It is now read-only.

Analyse (and enhance) performances on multiple predictions #67

Closed
bitnick10 opened this issue May 24, 2019 · 13 comments
Labels
Gorgonia / Gorgonnx This issue is related to the Gorgonia backend

Comments

@bitnick10

I have a demo model with 39 inputs. It takes 0.5s to predict 10,000 samples with Keras; with onnx-go it takes 5s.

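	for i := 0; i < 10000; i++ {
		model.SetInput(0, input)
		if err = backend.Run(); err != nil {
			log.Fatal(err)
		}
		// The outputs are fetched but not used in this loop.
		if _, err = model.GetOutputTensors(); err != nil {
			log.Fatal(err)
		}
	}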

Am I making a mistake here?

@owulveryck
Owner

Hello,

TL;DR: no, you are not; onnx-go is not (yet) optimized for performance.

Longer answer:
onnx-go is not optimized for performance (yet). In your case, the slowness may come from the way the Gorgonia backend itself is implemented. Launching backend.Run() initiates a lot of work under the hood (for example, a gorgonia.TapeMachine is created on each call). We are aware of this, and hiding all of it within the Run() method should allow performance tuning without breaking the API (for example, there could be a reset method to recycle some elements, or a worker pool of machines to execute the code; these are just ideas).
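To illustrate the reset-and-recycle idea, here is a minimal sketch in plain Go. The `machine` type, `newMachine`, and `Reset` are hypothetical stand-ins for an executor that is expensive to build (such as a tape machine); they are not the Gorgonia or onnx-go API. The sketch only counts how many executors get built when one is created per call versus built once and reset between calls.

```go
package main

import "fmt"

// machine is a HYPOTHETICAL stand-in for an executor that is costly to
// build (e.g. one that allocates buffers for every node of a graph).
// It is not the Gorgonia API.
type machine struct {
	buf []float32 // scratch memory allocated once at construction
}

// builds counts how many machines have been constructed.
var builds int

func newMachine(size int) *machine {
	builds++
	return &machine{buf: make([]float32, size)}
}

// Reset clears per-run state so the same machine can be reused.
func (m *machine) Reset() {
	for i := range m.buf {
		m.buf[i] = 0
	}
}

// Run pretends to evaluate a graph: it adds 2*x to every buffer cell
// and returns the sum of the buffer.
func (m *machine) Run(x float32) float32 {
	var sum float32
	for i := range m.buf {
		m.buf[i] += 2 * x
		sum += m.buf[i]
	}
	return sum
}

// runFresh builds a new machine for every call, like creating a new
// executor inside each Run().
func runFresh(n, size int) float32 {
	var last float32
	for i := 0; i < n; i++ {
		last = newMachine(size).Run(1)
	}
	return last
}

// runRecycled builds one machine and resets it between calls.
func runRecycled(n, size int) float32 {
	m := newMachine(size)
	var last float32
	for i := 0; i < n; i++ {
		m.Reset()
		last = m.Run(1)
	}
	return last
}

func main() {
	builds = 0
	a := runFresh(10000, 8)
	fmt.Println("fresh:", builds, "builds, result", a)

	builds = 0
	b := runRecycled(10000, 8)
	fmt.Println("recycled:", builds, "builds, result", b)
}
```

Both paths return the same result; `runRecycled` simply pays the construction cost once instead of 10,000 times. Whether a reset is actually cheap for a real tape machine is an open question this sketch does not answer.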

Anyway, this is pure speculation: to improve things, we first need to measure performance and find where the bottleneck is. Your example could be a perfect starting point for that analysis.
Maybe we can turn your example into a _test.go file and run some benchmarks and profiling.

Do you mind sharing your complete example with us?

On top of that, we can also try to run your tests concurrently in several goroutines to see how it behaves. Concurrency is also a goal, as @blackrez and I would like to be able to run onnx-go inside a web service.
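A minimal sketch of running predictions concurrently with a pool of goroutines. The `predictor` type and its `Predict` method are hypothetical stand-ins for one onnx-go model plus backend; the sketch assumes a single backend instance should not be `Run()` from several goroutines at once, so each worker owns its own.

```go
package main

import (
	"fmt"
	"sync"
)

// predictor is a HYPOTHETICAL stand-in for one model+backend pair.
type predictor struct {
	scale float32
}

// Predict returns scale * sum(x); a placeholder for a real inference call.
func (p *predictor) Predict(x []float32) float32 {
	var s float32
	for _, v := range x {
		s += v
	}
	return s * p.scale
}

// predictAll spreads the inputs over `workers` goroutines and returns
// the results in input order.
func predictAll(inputs [][]float32, workers int) []float32 {
	results := make([]float32, len(inputs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p := &predictor{scale: 2} // one private predictor per worker
			for i := range jobs {
				// Each index is written by exactly one goroutine.
				results[i] = p.Predict(inputs[i])
			}
		}()
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	inputs := make([][]float32, 8)
	for i := range inputs {
		inputs[i] = []float32{float32(i), 1}
	}
	fmt.Println(predictAll(inputs, 4))
}
```

Giving each worker its own predictor trades memory (one copy of the weights per worker) for not having to lock around `Run()`.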

@owulveryck owulveryck added the Gorgonia / Gorgonnx This issue is related to the Gorgonia backend label May 24, 2019
@owulveryck owulveryck changed the title predict slow Analyse (and enhance) performances on multiple predictions May 24, 2019
@bitnick10
Author

@owulveryck, it's OK to share the model and data; it's only a study. Can you give me an email address or something?

@owulveryck
Owner

Is it huge? Can you copy/paste the Python code here?

@bitnick10
Author

bitnick10 commented May 24, 2019

I wrote a simple one:

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import keras.engine.saving
import keras.models
import math
import numpy as np
import stopwatch
import onnxmltools
data_size = 100000
x_train = np.array(np.random.rand(data_size,39),dtype='float32')
y_train = np.zeros(shape=(data_size,2),dtype='float32')

model = Sequential()
model.add(Dense(units=22,input_shape=(39,), activation='tanh'))
model.add(Dense(units=22, activation='tanh'))
model.add(Dense(units=2, activation='tanh'))
sgd = SGD(lr=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy',optimizer=sgd,)

# One stopwatch accumulates training time, the other prediction time.
sws = []
for i in range(2):
    sw = stopwatch.Stopwatch()
    sw.reset()
    sw.stop()
    sws.append(sw)

for i in range(100):
    sws[0].start()
    model.train_on_batch(x_train, y_train)
    sws[0].stop()
    sws[1].start()
    y_predict=model.predict(x_train,batch_size=len(x_train))
    sws[1].stop()
    print("{0} {1:.3f} {2:.3f} ".format(i,sws[0].duration/(i+1),sws[1].duration/(i+1)))

onnx_model = onnxmltools.convert_keras(model, target_opset=7)
onnxmltools.save_model(onnx_model,"model.onnx")

Hope this helps.

@blackrez blackrez self-assigned this May 26, 2019
@owulveryck
Owner

A basic performance analysis has started in issue #68.

@owulveryck owulveryck mentioned this issue May 29, 2019
@owulveryck
Owner

I made a very simple test on my machine, without actually checking the results; Gorgonia is beating TensorFlow (if my test is OK):

Python

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import keras.engine.saving
import keras.models
import math
import numpy as np
import time
import onnxmltools
data_size = 100000
x_train = np.array(np.random.rand(data_size,39),dtype='float32')
y_train = np.zeros(shape=(data_size,2),dtype='float32')

model = Sequential()
model.add(Dense(units=22,input_shape=(39,), activation='tanh'))
model.add(Dense(units=22, activation='tanh'))
model.add(Dense(units=2, activation='tanh'))
sgd = SGD(lr=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy',optimizer=sgd,)

model.train_on_batch(x_train, y_train)
total = 0
for i in range(100):
    x_test = np.array(np.random.rand(data_size,39),dtype='float32')

    start = time.time()
    y_predict=model.predict(x_test)
    end = time.time()
    total += (end - start)
    print(total / (i+1))

onnx_model = onnxmltools.convert_keras(model, target_opset=7)
onnxmltools.save_model(onnx_model,"model.onnx")

The prediction takes 870ms on average.

Go

This code should do something similar:

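package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"time"

	onnx "github.com/owulveryck/onnx-go"
	"github.com/owulveryck/onnx-go/backend/x/gorgonnx"
	"gorgonia.org/tensor"
)

func main() {
	datasize := 100000
	backend := gorgonnx.NewGraph()
	model := onnx.NewModel(backend)
	// Load the ONNX file given as the first command-line argument.
	b, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	if err = model.UnmarshalBinary(b); err != nil {
		log.Fatal(err)
	}
	var d time.Duration
	for i := 0; i < 100; i++ {
		// A new random (datasize, 39) float32 input on each iteration.
		input := tensor.New(tensor.WithShape(datasize, 39), tensor.Of(tensor.Float32), tensor.WithBacking(tensor.Random(tensor.Float32, datasize*39)))
		if err = model.SetInput(0, input); err != nil {
			log.Fatal(err)
		}
		// Only backend.Run() is timed.
		t := time.Now()
		if err = backend.Run(); err != nil {
			log.Fatal(err)
		}
		d += time.Since(t)
		// Print the running average of the Run() durations.
		fmt.Println(time.Duration(float64(d) / float64(i+1)))
	}
}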

It gives 320ms on average (and around 200ms with a patch that I will commit to the tensor package soon).

@bitnick10
Author

bitnick10 commented May 29, 2019

Try this Keras predict; it's faster:
y_predict=model.predict(x_train,batch_size=len(x_train))
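Keras' `Model.predict` uses `batch_size=32` when none is given, so predicting 100,000 rows runs 3,125 separate batches, whereas `batch_size=len(x_train)` runs a single one. The hypothetical Go helper below only counts batches to make that arithmetic explicit; it does not model Keras internals.

```go
package main

import "fmt"

// forEachBatch calls fn once per batch of at most `size` rows out of `rows`
// (the last batch may be smaller) and returns the number of calls,
// i.e. ceil(rows/size). HYPOTHETICAL helper for illustration.
func forEachBatch(rows, size int, fn func(start, end int)) int {
	calls := 0
	for start := 0; start < rows; start += size {
		end := start + size
		if end > rows {
			end = rows
		}
		fn(start, end)
		calls++
	}
	return calls
}

func main() {
	noop := func(start, end int) {}
	fmt.Println(forEachBatch(100000, 32, noop))     // Keras' default predict batch size
	fmt.Println(forEachBatch(100000, 100000, noop)) // batch_size=len(x_train)
}
```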

@owulveryck
Owner

A patch has been committed to the tensor package; it significantly improves performance and memory consumption.
Can you update your Go workspace and try again?

@bitnick10
Author

onnx-go has improved. My ratio is now 0.07s (Keras) vs 0.3s (onnx-go).

@owulveryck
Owner

@bitnick10, would you be kind enough to provide the Python code and the Go code you are actually running to get your results? This would allow us to run exactly the same test as you.

Maybe a gist on gist.github.com would do the job.

Thanks

@bitnick10
Author

bitnick10 commented May 30, 2019

I used my own Keras code and your Go code. The one change is this line, which your Keras code lacks: y_predict=model.predict(x_train,batch_size=len(x_train)). Adding batch_size=len(x_train) makes Keras a lot faster. My Keras settings (~/.keras/keras.json) are:

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

@owulveryck
Owner

You're right: setting the batch_size improves the performance of Keras.
One thing I've noticed is that tensor.Random(tensor.Float32, datasize*39) in the Go file generates numbers that are not between 0 and 1 (unlike np.random.rand in the Python script); this makes the Tanh computation expensive.
I've tried with a small fixed value, and I get down to about 100ms on average for Go, versus around 50ms for the Python version.
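One way to make the Go test feed the same input range as `np.random.rand` (uniform in [0, 1)) is to fill the backing slice with `math/rand`'s `Float32` and pass it to `tensor.WithBacking` instead of `tensor.Random(tensor.Float32, datasize*39)`. `uniformBacking` is a hypothetical helper; a minimal sketch:

```go
package main

import (
	"fmt"
	"math/rand"
)

// uniformBacking returns n float32 values drawn uniformly from [0, 1),
// like numpy's np.random.rand. HYPOTHETICAL helper; the seed makes
// runs reproducible.
func uniformBacking(n int, seed int64) []float32 {
	r := rand.New(rand.NewSource(seed))
	v := make([]float32, n)
	for i := range v {
		v[i] = r.Float32() // uniform in [0.0, 1.0)
	}
	return v
}

func main() {
	datasize := 100000
	backing := uniformBacking(datasize*39, 1)
	fmt.Println(len(backing), backing[:3])
	// With gorgonia.org/tensor, this slice could be passed as
	// tensor.WithBacking(backing) when building the input tensor.
}
```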

I did some profiling, and the next step will be to improve broadcasting (see issue #68):

[Profiling graph: small_numbers]

Or maybe, for testing, we can try to generate a model that does not "compact the tensor" and uses broadcasting. But I don't know if that's possible.
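As a rough illustration of why broadcasting matters for a Dense layer: adding a bias of shape (22) to activations of shape (N, 22) can either materialize an N×22 copy of the bias first or read the bias directly for each row. The sketch below assumes the costly part is that materialized copy; this is an assumption for illustration, not a conclusion drawn from the profiling in this thread. Both function names are hypothetical.

```go
package main

import "fmt"

// addBiasMaterialized broadcasts bias by first building a full rows×cols
// copy of it, then adding element-wise (one extra rows×cols allocation).
func addBiasMaterialized(x []float32, rows, cols int, bias []float32) []float32 {
	tiled := make([]float32, rows*cols)
	for r := 0; r < rows; r++ {
		copy(tiled[r*cols:(r+1)*cols], bias)
	}
	out := make([]float32, rows*cols)
	for i := range x {
		out[i] = x[i] + tiled[i]
	}
	return out
}

// addBiasBroadcast reads bias[j] directly for every row, with no tiled copy.
func addBiasBroadcast(x []float32, rows, cols int, bias []float32) []float32 {
	out := make([]float32, rows*cols)
	for r := 0; r < rows; r++ {
		row := x[r*cols : (r+1)*cols]
		for j, v := range row {
			out[r*cols+j] = v + bias[j]
		}
	}
	return out
}

func main() {
	x := []float32{1, 2, 3, 4, 5, 6} // 2 rows × 3 cols, row-major
	bias := []float32{10, 20, 30}
	fmt.Println(addBiasMaterialized(x, 2, 3, bias))
	fmt.Println(addBiasBroadcast(x, 2, 3, bias))
}
```

Both functions produce the same result; the second avoids allocating and filling the N×22 temporary.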

@owulveryck
Owner

This PR from the tensor package gives good results, so I am closing this issue. A new issue can be raised to work on performance again.
