
Same model succeeded in Python but failed in C# with DML EP #13429

Closed
yqzhishen opened this issue Oct 25, 2022 · 7 comments
Labels: api:CSharp issues related to the C# API · ep:DML issues related to the DirectML execution provider


yqzhishen commented Oct 25, 2022

Describe the issue

The same ONNX model runs in Python but fails in C# with the DirectML execution provider, with the following error message:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException : [ErrorCode:RuntimeException] Non-zero status code returned while running ReduceMean node. Name:'ReduceMean_0' Status Message: 
   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\NativeApiStatus.shared.cs:line 31
   at Microsoft.ML.OnnxRuntime.InferenceSession.RunImpl(RunOptions options, IntPtr[] inputNames, IntPtr[] inputValues, IntPtr[] outputNames, DisposableList`1 cleanupList) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.shared.cs:line 694
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs, IReadOnlyCollection`1 outputNames, RunOptions options) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.shared.cs:line 226
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs, IReadOnlyCollection`1 outputNames) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.shared.cs:line 208
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.shared.cs:line 197
   at Crepe.Onnx.Tests.Tests.TestInfer() in E:\OpenVPI\Crepe.Onnx\Crepe.Onnx.Tests\Tests.cs:line 46

The error message above appears every time I run this model in C# on DirectML. No errors are produced if I use the default CPU or CUDA providers, or if I run the model in Python.

Expected behaviors

The model should run without errors in both Python and C#.

Device

The model fails on both an Intel(R) UHD Graphics 630 and a GTX 1050 Ti Max-Q in C#, and runs just fine on both devices in Python.

Model

tiny_model.zip

To reproduce

The following Python program runs without errors:

import numpy as np
import onnxruntime as ort


# Build a (512, 1024) float32 input and fill it with ones
# (the random initialization is immediately overwritten).
frames = np.random.random((512, 1024)).astype(np.float32)
frames.fill(1.)

# The DML EP requires memory patterns disabled and sequential execution.
options = ort.SessionOptions()
options.enable_mem_pattern = False
options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL

session = ort.InferenceSession('tiny.onnx', sess_options=options, providers=['DmlExecutionProvider'])
session.run(None, {'frames': frames})

But the following C# program produces the above error:

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Build a 512 x 1024 tensor and fill it with ones.
var frames = new DenseTensor<float>(new[] {512, 1024});
frames.Fill(1f);

// Same session configuration as in the Python program.
var options = new SessionOptions();
options.EnableMemoryPattern = false;
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;
options.AppendExecutionProvider_DML();

var input = new[]
{
    NamedOnnxValue.CreateFromTensor("frames", frames)
};
using (var session = new InferenceSession("Assets/tiny.onnx", options))
using (var output = session.Run(input)) // throws OnnxRuntimeException here
{
    output.ToArray()[0].AsTensor<float>().ToDenseTensor();
}

Urgency

None, but it's quite weird that ONNX Runtime behaves differently in the two languages.

Platform

Windows

OS Version

Windows 10 21H2 19044.2130

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.12.1

ONNX Runtime API

C#

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

1.9.0

@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Oct 25, 2022
@yqzhishen yqzhishen changed the title Same model succeeded in Python but failed in C# on DirectML Same model succeeded in Python but failed in C# with DML EP Oct 25, 2022
@github-actions github-actions bot added the ep:DML issues related to the DirectML execution provider label Oct 25, 2022
@yuslepukhin yuslepukhin self-assigned this Oct 26, 2022

fdwr commented Oct 27, 2022

Dmitri's initial investigation indicates a packaging problem rather than a C# binding or DirectML issue. Evidently when building the C# program via Visual Studio, the build step is not unpacking the DLL from the Nuget package (and since the program is not finding DirectML.dll in the application directory, it falls back to the older system version, which didn't support those parameters). Manually copying the DirectML.dll from inside the Nuget package alongside the application made it work again, confirming that was the issue. Stay tuned...
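A quick way to verify which copy of DirectML.dll a process actually loaded is to enumerate its modules after the session is created. This is a minimal sketch for illustration, not part of the original report:

using System;
using System.Diagnostics;

// After creating the InferenceSession with the DML EP, print where
// DirectML.dll was loaded from: the application directory (the NuGet
// copy) or System32 (the older system component).
foreach (ProcessModule module in Process.GetCurrentProcess().Modules)
{
    if (module.ModuleName.Equals("DirectML.dll", StringComparison.OrdinalIgnoreCase))
    {
        Console.WriteLine(module.FileName);
    }
}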

@yqzhishen (Author)

> Dmitri's initial investigation indicates a packaging problem rather than a C# binding or DirectML issue. Evidently when building the C# program via Visual Studio, the build step is not unpacking the DLL from the Nuget package (and since the program is not finding DirectML.dll in the application directory, it falls back to the older system version, which didn't support those parameters). Manually copying the DirectML.dll from inside the Nuget package alongside the application made it work again, confirming that was the issue. Stay tuned...

Thanks for the explanation, but manually copying DirectML.dll does not seem to work on my machine. That said, I'm not sure whether I copied the right DLLs into the application directory. Here is the program I built for testing this issue, targeting .NET Framework 4.8:
Onnx.Tests.DirectML.zip
Run it with Onnx.Tests.DirectML.exe <device_id> to reproduce the error on a specific DirectML device.
I hope this helps with debugging.

@yuslepukhin (Member)

Stay tuned for specific recommendations.

@yuslepukhin (Member)

Long story short: there is a bug in the Microsoft.AI.DirectML package. In projects that target Any CPU, the DirectML DLLs are not copied alongside the binary you are building.

Install and restore the three packages shown in the image below.

[image: the three NuGet packages installed in the project]

The current workaround is to target a specific architecture; the DirectML DLLs then appear automatically where they should be. Otherwise, you have to extract the DLLs manually.

I am using VS 2022. Go to your binary's Project properties/Build page and select a Platform Target other than Any CPU; I selected x64. In that case the DirectML DLLs are copied automatically.

[image: the Platform Target setting in Visual Studio]
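For reference, the same setting can also be made directly in the project file. A minimal sketch (standard MSBuild property; the rest of the .csproj is assumed):

<!-- In the .csproj of the project that references the DirectML packages:
     pin the platform so the native DLLs are copied to the output folder. -->
<PropertyGroup>
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>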

The original program you provided when reporting the issue runs as intended.

DirectML now loads from the directory where your binary resides. The rules for where the system looks for dependencies can be complicated, but the directory containing the binary is searched first.

[image: DirectML.dll loaded from the application directory]

@yuslepukhin yuslepukhin added api:CSharp issues related to the C# API and removed ep:CUDA issues related to the CUDA execution provider labels Oct 27, 2022
@yuslepukhin (Member)

A correction is in order. Microsoft.AI.DirectML is a native package, so the Any CPU managed configuration does not apply to it; the native Platform Target still has to be set. DirectML is consumed by onnxruntime, which is an x64 binary in this example, so the Platform Target must be set accordingly.


yuslepukhin commented Oct 27, 2022

The fact that DirectML.dll ships as a system component makes it too easy for applications to load the wrong library at runtime. Therefore, one must make sure that the correct library is used during development and runtime.

@yqzhishen (Author)

> A correction is in order. Microsoft.AI.DirectML is a native package, so the Any CPU managed configuration does not apply to it; the native Platform Target still has to be set. DirectML is consumed by onnxruntime, which is an x64 binary in this example, so the Platform Target must be set accordingly.

Something really weird: when I use NUnit to run the program in a unit test, it fails even though I have selected the x64 target platform. However, when I run it directly in a console application, or in a class library referenced by a console application, your solution works just fine. Maybe it has something to do with NUnit that I haven't figured out. Anyway, it won't continue to bother me in my future development. Thanks a lot for your help.
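(A plausible explanation, not confirmed in this thread: NUnit tests execute inside the test runner's host process, so native DLL resolution may start from the runner's directory rather than the test project's output folder. Below is a minimal sketch to check where the tests actually execute from and whether the NuGet DirectML.dll sits beside the test assembly; the fixture and test names are illustrative:)

using System;
using System.IO;
using NUnit.Framework;

[TestFixture]
public class DirectMlLoadTests
{
    [Test]
    public void DirectMlDllIsBesideTestAssembly()
    {
        // Where the test assembly (and its copied native DLLs) actually live.
        var baseDir = AppContext.BaseDirectory;
        Console.WriteLine($"Test base directory: {baseDir}");

        // If this fails, the NuGet DirectML.dll was not copied next to the
        // test binaries, and the loader may fall back to the System32 copy.
        Assert.That(File.Exists(Path.Combine(baseDir, "DirectML.dll")),
            "DirectML.dll not found beside the test assembly");
    }
}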
