Description
Describe the issue
I am running phi3-mini-int4 using the standard onnxruntime C# API, and it is about 2x slower than when I use the GenAI code. I am using the DirectML C# managed API, testing with sequence_length=1 each iteration and with bound inputs and outputs. Essentially I am just calling this in a loop (without changing the input between iterations, for testing purposes), yet it is still not as fast as GenAI:
session.RunWithBinding(runOptions, binding);
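For reference, the loop described above looks roughly like the following sketch. The model path, input/output names, and shapes are placeholders, not the actual values from my app:

```csharp
using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var options = new SessionOptions();
options.AppendExecutionProvider_DML(0);   // from the Microsoft.ML.OnnxRuntime.DirectML package
options.EnableMemoryPattern = false;      // memory patterns are not supported by the DML EP

using var session = new InferenceSession("model.onnx", options);  // placeholder path
using var runOptions = new RunOptions();
using var binding = session.CreateIoBinding();

// sequence_length = 1, batch = 1 (placeholder input name and token id)
var inputIds = new DenseTensor<long>(new long[] { 0 }, new[] { 1, 1 });
using var inputValue = FixedBufferOnnxValue.CreateFromTensor(inputIds);
binding.BindInput("input_ids", inputValue);

// Pre-allocated output buffer (placeholder output name and vocab size)
var logits = new DenseTensor<float>(new[] { 1, 1, 32000 });
using var outputValue = FixedBufferOnnxValue.CreateFromTensor(logits);
binding.BindOutput("logits", outputValue);

var sw = Stopwatch.StartNew();
const int iterations = 100;
for (int i = 0; i < iterations; i++)
    session.RunWithBinding(runOptions, binding);
sw.Stop();
Console.WriteLine($"avg per call: {sw.Elapsed.TotalMilliseconds / iterations:F2} ms");
```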
So in that sense I can say well done for making genai so fast. 🙂
On the other hand, I wonder if you can share the settings or source code for things like sessionOptions and so on. GenAI is good, but I really need the full capability of the onnxruntime API. Since I believe GenAI is built on top of onnxruntime, it would be helpful to see the relevant source code so I can make my app as fast as the GenAI code while using the onnxruntime API directly.
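In case it helps frame the question, this is a sketch of the session settings I understand are commonly recommended for the DirectML EP. These are my assumptions, not GenAI's actual configuration (its source is in the microsoft/onnxruntime-genai repository):

```csharp
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();
options.AppendExecutionProvider_DML(0);                  // DirectML on adapter 0
options.EnableMemoryPattern = false;                     // memory patterns are unsupported on DML
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;    // DML requires sequential execution
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
```

I suspect a large part of the remaining gap is that GenAI keeps tensors such as the KV cache bound on the GPU across iterations, avoiding CPU↔GPU copies each call, but that is my guess rather than something I have confirmed in its source.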
I am using the managed onnxruntime library from NuGet (1.19.1), and it is using the DirectML.dll that was installed with GenAI.
Thanks for any help you can give.
To reproduce
Run a phi-3 model using the GenAI code, then run the same model using the onnxruntime C# API and compare the per-token latency.
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.19.1
ONNX Runtime API
C#
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes