[Performance] Why does genai run 2x as fast as vanilla managed onnxruntime?  #21847

@elephantpanda

Describe the issue

I am running phi3-mini-int4 through the usual onnxruntime C# API, and it is 2x slower than when I run it through the genai code. I am using the managed C# API with the DirectML execution provider, testing with sequence_length=1 on each iteration and with bound inputs and outputs. For the test I just call the line below in a loop, without changing the input between iterations, and it is still not as fast as genai:
session.RunWithBinding(runOptions, binding);
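
For anyone reproducing the comparison, the loop above could be timed with something like the sketch below. `LoopBenchmark` and `MeanMillisecondsPerCall` are hypothetical helper names, not part of onnxruntime or genai; the sketch only relies on `System.Diagnostics.Stopwatch` and takes the call under test as an `Action`.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not an onnxruntime or genai API): times a repeated call.
public static class LoopBenchmark
{
    // Runs `step` `warmup` times untimed, then `iterations` times under a
    // stopwatch, and returns the mean wall-clock milliseconds per timed call.
    public static double MeanMillisecondsPerCall(Action step, int warmup, int iterations)
    {
        if (step == null) throw new ArgumentNullException(nameof(step));
        if (warmup < 0) throw new ArgumentOutOfRangeException(nameof(warmup));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // Warmup calls keep one-time setup costs out of the measurement.
        for (int i = 0; i < warmup; i++) step();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) step();
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

With an existing `session`, `runOptions`, and `binding`, it could be called as `LoopBenchmark.MeanMillisecondsPerCall(() => session.RunWithBinding(runOptions, binding), warmup: 5, iterations: 100)`; the warmup and iteration counts here are arbitrary.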

So in that sense I can say well done for making genai so fast. 🙂

On the other hand, could you share the settings or source code that genai uses, for things like sessionOptions? GenAI is good, but I really need the full capability of the onnxruntime API. Since I believe GenAI is built on top of onnxruntime, seeing its source code would help me make my app, which uses the onnxruntime API directly, as fast as the GenAI code.

I am using the managed onnxruntime library, version 1.19.1 from NuGet, and it is using the DirectML.dll that was installed with genai.

Thanks for any help you can give.

To reproduce

Run a phi-3 model using the genai code, then run the same model using the onnxruntime C# API.

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.19.1

ONNX Runtime API

C#

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

Metadata

Assignees

No one assigned

    Labels

    .NET (pull requests that update .net code), api:CSharp (issues related to the C# API), ep:DML (issues related to the DirectML execution provider), performance (issues related to performance regressions), stale (issues that have not been addressed in a while; categorized by a bot)
