
Bulk Prediction for multi input in C# #20148

Closed
roushrsh opened this issue Mar 29, 2024 · 5 comments
Labels
ep:CUDA (issues related to the CUDA execution provider)
platform:windows (issues related to the Windows platform)

Comments

@roushrsh

roushrsh commented Mar 29, 2024

Describe the issue

Hardware: 4090 GPU, 13900K CPU. How can I predict in bulk using InferenceSession.Run in C#?

Python appears to run an order of magnitude faster, since in TensorFlow I can bulk predict with model(input) or model.predict(input).

See the reproduction below, thanks.

To reproduce

Hi,

My model takes in two 1D array images and five features.

var inputsB = new List<NamedOnnxValue> {
    namedInputValue1B, namedInputValue2B, namedInputValue4, namedInputValue5,
    namedInputValue6, namedInputValue7, namedInputValue8
};
listTopredOn.Add(inputsB);

I set up:

int gpuDeviceId = 0; // The GPU device ID to execute on
using var gpuSessionOptions = SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId);

var globalSessionX = new InferenceSession(ONNX_MODEL_PATH, gpuSessionOptions);

// Then I go through all of them and predict one at a time:

for (int i = 0; i < listTopredOn.Count; i++)
{
    var ff1 = globalSessionX.Run(listTopredOn[i])[0].AsTensor<float>();
    var resultValue1 = ff1.GetValue(0);
    var prediction11 = Convert.ToDouble(resultValue1);
}

This takes much longer on the GPU (4090) than on my CPU, whereas the same prediction in Python is over an order of magnitude faster (yes, this is after an initial warm-up prediction so the model is set up). I imagine this is because it predicts one sample at a time, and each call incurs CPU-to-GPU transfers, which are extremely slow when repeated.

How can I change my input or code so the ONNX session does a bulk prediction in C#?
I believe if I had just one image input of shape (1, 512, 512, 3) I could simply change the first dimension to get (1500, 512, 512, 3), but I'm not sure what to do in my case.

Thanks so much!

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.17.1

ONNX Runtime API

C#

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

11

@github-actions github-actions bot added the ep:CUDA and platform:windows labels Mar 29, 2024
@yuslepukhin
Member

The input/output dimensions of your model dictate what you can process.
This does not change with the API you choose: the model remains the same, and so does the implementation behind the API.

You also need to stop leaking memory in the loop. Please read https://onnxruntime.ai/docs/tutorials/csharp/basic_csharp.html
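
For reference, a minimal sketch of the disposal pattern the linked tutorial describes, reusing the variable names from the snippet above (Run returns a disposable collection, so the native output buffers need to be released each iteration):

// requires: using System.Linq; (for First())
for (int i = 0; i < listTopredOn.Count; i++)
{
    // Run returns an IDisposableReadOnlyCollection; without the using,
    // every iteration leaks the native memory backing the outputs
    using var results = globalSessionX.Run(listTopredOn[i]);
    var ff1 = results.First().AsTensor<float>();
    var prediction = Convert.ToDouble(ff1.GetValue(0));
}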

@roushrsh
Author

I guess I'm asking how to predict in batches, rather than one at a time in a loop. Thanks.

@roushrsh
Author

I currently save the model, which is built as:

model4 = keras.models.Model(inputs=[spectra1, spectra2, ratioAProtS, ratioBProtS, isoS, hitA, hitB, sizeS], outputs=output)

like so:

import tf2onnx
import tensorflow as tf
import onnx

input_signature = [
    tf.TensorSpec((1, 1001), tf.float32),
    tf.TensorSpec((1, 1001), tf.float32),
    tf.TensorSpec((1, 1001), tf.float32),
    tf.TensorSpec((1,), tf.float32),
    tf.TensorSpec((1,), tf.float32),
    tf.TensorSpec((1,), tf.float32),
    tf.TensorSpec((1,), tf.float32),
    tf.TensorSpec((1,), tf.float32),
    tf.TensorSpec((1,), tf.float32),
]

onnx_model, _ = tf2onnx.convert.from_keras(model4, input_signature, opset=18)
onnx.save(onnx_model, "../../Desktop/testModel.onnx")

How could I change it for multi-batch prediction? Thanks.

@yuslepukhin
Member

yuslepukhin commented Mar 29, 2024

There are no special capabilities for bulk predictions. If your model is constructed so that it accepts the shapes in your example, then you simply feed that data. C# is just a thin layer on top of the native library.

There is no difference between Python and C# in terms of capabilities.

@roushrsh
Author

Beautiful, @yuslepukhin. Thanks, I figured it out.

For future readers, what yuslepukhin says is key: there is no difference between the APIs.

You need a mix of these two threads:

#9867
onnx/onnx#2182

You just need to change the batch dimension to N (dynamic) and then provide whatever batch you need.

Got it working in Python; I should be able to do it in C# now. Will report back if not, but otherwise this is closed.
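
For anyone landing here later, a minimal sketch of what the batched call can look like in C#, assuming the model was re-exported with a dynamic batch dimension (e.g. tf.TensorSpec((None, 1001), tf.float32) in the input signature above). The input name "spectra1", the samples collection, and the Spectrum1 field are illustrative placeholders, not from this thread:

// requires: using System.Linq; using Microsoft.ML.OnnxRuntime; using Microsoft.ML.OnnxRuntime.Tensors;

int n = samples.Count; // number of samples to predict in one call

// Stack the n single-sample arrays into one (n, 1001) tensor per input,
// instead of n separate (1, 1001) tensors
var spectra1 = new DenseTensor<float>(new[] { n, 1001 });
for (int i = 0; i < n; i++)
    for (int j = 0; j < 1001; j++)
        spectra1[i, j] = samples[i].Spectrum1[j];

var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("spectra1", spectra1),
    // ...build and add the remaining inputs the same way, each with leading dimension n...
};

// One Run call for the whole batch: a single CPU-to-GPU round trip instead of n
using var results = globalSessionX.Run(inputs);
var predictions = results.First().AsTensor<float>(); // shape (n, 1)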
