Description
Describe the issue
I am running phi3-mini-int4 using the standard onnxruntime C# API, and it is about 2x slower than when I use the GenAI code. I am using the DirectML C# managed API, testing with sequence_length=1 each iteration and with bound inputs and outputs. Essentially I am just calling this in a loop (without changing the input between iterations, for testing purposes), yet it is still not as fast as GenAI:
session.RunWithBinding(runOptions, binding);
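For reference, the loop described above looks roughly like the following sketch. The model path, input/output names, and shapes are placeholders, not the actual values from my app:

```csharp
using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var options = new SessionOptions();
options.AppendExecutionProvider_DML(0);   // from the Microsoft.ML.OnnxRuntime.DirectML package
options.EnableMemoryPattern = false;      // memory patterns are not supported by the DML EP

using var session = new InferenceSession("model.onnx", options);  // placeholder path
using var runOptions = new RunOptions();
using var binding = session.CreateIoBinding();

// sequence_length = 1, batch = 1 (placeholder input name and token id)
var inputIds = new DenseTensor<long>(new long[] { 0 }, new[] { 1, 1 });
using var inputValue = FixedBufferOnnxValue.CreateFromTensor(inputIds);
binding.BindInput("input_ids", inputValue);

// Pre-allocated output buffer (placeholder output name and vocab size)
var logits = new DenseTensor<float>(new[] { 1, 1, 32000 });
using var outputValue = FixedBufferOnnxValue.CreateFromTensor(logits);
binding.BindOutput("logits", outputValue);

var sw = Stopwatch.StartNew();
const int iterations = 100;
for (int i = 0; i < iterations; i++)
    session.RunWithBinding(runOptions, binding);
sw.Stop();
Console.WriteLine($"avg per call: {sw.Elapsed.TotalMilliseconds / iterations:F2} ms");
```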
So in that sense I can say well done for making genai so fast. 🙂
On the other hand, I wonder if you can share the settings or source code for things like sessionOptions and so on. GenAI is good, but I really need the full capability of the onnxruntime API. Since I believe GenAI is built on top of onnxruntime, it would be helpful to see the relevant source code so I can make my app as fast as the GenAI code while using the onnxruntime API directly.
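In case it helps frame the question, this is a sketch of the session settings I understand are commonly recommended for the DirectML EP. These are my assumptions, not GenAI's actual configuration (its source is in the microsoft/onnxruntime-genai repository):

```csharp
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();
options.AppendExecutionProvider_DML(0);                  // DirectML on adapter 0
options.EnableMemoryPattern = false;                     // memory patterns are unsupported on DML
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;    // DML requires sequential execution
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
```

I suspect a large part of the remaining gap is that GenAI keeps tensors such as the KV cache bound on the GPU across iterations, avoiding CPU↔GPU copies each call, but that is my guess rather than something I have confirmed in its source.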
I am using the managed onnxruntime library from NuGet (1.19.1), and it is using the DirectML.dll that was installed with GenAI.
Thanks for any help you can give.
To reproduce
Run a phi-3 model using the GenAI code, then run the same model using the onnxruntime C# API and compare the per-token latency.
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.1
ONNX Runtime API
C#
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes