
Same model succeeded in Python but failed in C# with DML EP #13429

Closed
yqzhishen opened this issue Oct 25, 2022 · 7 comments
Labels: api:CSharp issues related to the C# API · ep:DML issues related to the DirectML execution provider


yqzhishen commented Oct 25, 2022

Describe the issue

The same ONNX model runs in Python but fails in C# with the DirectML execution provider, with the following error message:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException : [ErrorCode:RuntimeException] Non-zero status code returned while running ReduceMean node. Name:'ReduceMean_0' Status Message: 
   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\NativeApiStatus.shared.cs:line 31
   at Microsoft.ML.OnnxRuntime.InferenceSession.RunImpl(RunOptions options, IntPtr[] inputNames, IntPtr[] inputValues, IntPtr[] outputNames, DisposableList`1 cleanupList) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.shared.cs:line 694
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs, IReadOnlyCollection`1 outputNames, RunOptions options) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.shared.cs:line 226
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs, IReadOnlyCollection`1 outputNames) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.shared.cs:line 208
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs) in D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.shared.cs:line 197
   at Crepe.Onnx.Tests.Tests.TestInfer() in E:\OpenVPI\Crepe.Onnx\Crepe.Onnx.Tests\Tests.cs:line 46

The error message above appears every time I run this model in C# on DirectML. No errors are produced if I use the default CPU or CUDA providers, or if I run the model in Python.

Expected behaviors

The model should run without errors in both Python and C#.

Device

The model fails on both an Intel(R) UHD Graphics 630 and a GTX 1050 Ti Max-Q in C#, and runs just fine on both devices in Python.

Model

tiny_model.zip

To reproduce

The following Python program runs without errors:

import numpy as np
import onnxruntime as ort


# Build a (512, 1024) float32 input and fill it with ones
# (the random initialization is immediately overwritten).
frames = np.random.random((512, 1024)).astype(np.float32)
frames.fill(1.)

# The DML EP requires memory patterns disabled and sequential execution.
options = ort.SessionOptions()
options.enable_mem_pattern = False
options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL

session = ort.InferenceSession('tiny.onnx', sess_options=options, providers=['DmlExecutionProvider'])
session.run(None, {'frames': frames})

But the following C# program produces the above error:

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Build a 512 x 1024 tensor and fill it with ones.
var frames = new DenseTensor<float>(new[] {512, 1024});
frames.Fill(1f);

// Same session configuration as in the Python program.
var options = new SessionOptions();
options.EnableMemoryPattern = false;
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;
options.AppendExecutionProvider_DML();

var input = new[]
{
    NamedOnnxValue.CreateFromTensor("frames", frames)
};
using (var session = new InferenceSession("Assets/tiny.onnx", options))
using (var output = session.Run(input)) // throws OnnxRuntimeException here
{
    output.ToArray()[0].AsTensor<float>().ToDenseTensor();
}

Urgency

None, but it's quite weird that ONNX Runtime behaves differently in the two languages.

Platform

Windows

OS Version

Windows 10 21H2 19044.2130

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.12.1

ONNX Runtime API

C#

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

1.9.0

@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Oct 25, 2022
@yqzhishen yqzhishen changed the title Same model succeeded in Python but failed in C# on DirectML Same model succeeded in Python but failed in C# with DML EP Oct 25, 2022
@github-actions github-actions bot added the ep:DML issues related to the DirectML execution provider label Oct 25, 2022
@yuslepukhin yuslepukhin self-assigned this Oct 26, 2022

fdwr commented Oct 27, 2022

Dmitri's initial investigation indicates a packaging problem rather than a C# binding or DirectML issue. Evidently when building the C# program via Visual Studio, the build step is not unpacking the DLL from the Nuget package (and since the program is not finding DirectML.dll in the application directory, it falls back to the older system version, which didn't support those parameters). Manually copying the DirectML.dll from inside the Nuget package alongside the application made it work again, confirming that was the issue. Stay tuned...
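A quick way to verify which copy of DirectML.dll a process actually loaded is to enumerate its modules after the session is created. This is a minimal sketch for illustration, not part of the original report:

using System;
using System.Diagnostics;

// After creating the InferenceSession with the DML EP, print where
// DirectML.dll was loaded from: the application directory (the NuGet
// copy) or System32 (the older system component).
foreach (ProcessModule module in Process.GetCurrentProcess().Modules)
{
    if (module.ModuleName.Equals("DirectML.dll", StringComparison.OrdinalIgnoreCase))
    {
        Console.WriteLine(module.FileName);
    }
}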

@yqzhishen (Author)

> Dmitri's initial investigation indicates a packaging problem rather than a C# binding or DirectML issue. Evidently when building the C# program via Visual Studio, the build step is not unpacking the DLL from the Nuget package (and since the program is not finding DirectML.dll in the application directory, it falls back to the older system version, which didn't support those parameters). Manually copying the DirectML.dll from inside the Nuget package alongside the application made it work again, confirming that was the issue. Stay tuned...

Thanks for the explanation, but manually copying DirectML.dll does not seem to work on my machine. That said, I'm not sure whether I copied the right DLLs into the application directory. Here is the program I built for testing this issue, targeting .NET Framework 4.8:
Onnx.Tests.DirectML.zip
Run it with Onnx.Tests.DirectML.exe <device_id> to reproduce the error on a specific DirectML device.
I hope this helps with debugging.

@yuslepukhin (Member)

Stay tuned for specific recommendations.

@yuslepukhin (Member)

Long story short: there is a bug in the Microsoft.AI.DirectML package. In projects that target Any CPU, the DirectML DLLs are not copied alongside the binary you are building.

Install and restore the three packages shown in the image below.

[image: the three NuGet packages installed in the project]

The current workaround is to target a specific architecture; the DirectML DLLs then appear automatically where they should be. Otherwise, you have to extract the DLLs manually.

I am using VS 2022. Go to your binary's Project properties/Build page and select a Platform Target other than Any CPU; I selected x64. In that case the DirectML DLLs are copied automatically.

[image: the Platform Target setting in Visual Studio]
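For reference, the same setting can also be made directly in the project file. A minimal sketch (standard MSBuild property; the rest of the .csproj is assumed):

<!-- In the .csproj of the project that references the DirectML packages:
     pin the platform so the native DLLs are copied to the output folder. -->
<PropertyGroup>
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>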

The original program you provided when reporting the issue runs as intended.

DirectML now loads from the directory where your binary resides. The rules for where the system looks for dependencies can be complicated, but the directory containing the binary is searched first.

[image: DirectML.dll loaded from the application directory]

@yuslepukhin yuslepukhin added api:CSharp issues related to the C# API and removed ep:CUDA issues related to the CUDA execution provider labels Oct 27, 2022
@yuslepukhin (Member)

A correction is in order. Microsoft.AI.DirectML is a native package, so the Any CPU managed configuration does not apply to it; the native Platform Target still has to be set. DirectML is consumed by onnxruntime, which is an x64 binary in this example, so the Platform Target must be set accordingly.


yuslepukhin commented Oct 27, 2022

The fact that DirectML.dll ships as a system component makes it too easy for applications to load the wrong library at runtime. Therefore, one must make sure that the correct library is used during development and runtime.

@yqzhishen (Author)

> A correction is in order. Microsoft.AI.DirectML is a native package, so the Any CPU managed configuration does not apply to it; the native Platform Target still has to be set. DirectML is consumed by onnxruntime, which is an x64 binary in this example, so the Platform Target must be set accordingly.

Something really weird: when I use NUnit to run the program in a unit test, it fails even though I have selected the x64 target platform. However, when I run it directly in a console application, or in a class library referenced by a console application, your solution works just fine. Maybe it has something to do with NUnit that I haven't figured out. Anyway, it won't continue to bother me in my future development. Thanks a lot for your help.
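(A plausible explanation, not confirmed in this thread: NUnit tests execute inside the test runner's host process, so native DLL resolution may start from the runner's directory rather than the test project's output folder. Below is a minimal sketch to check where the tests actually execute from and whether the NuGet DirectML.dll sits beside the test assembly; the fixture and test names are illustrative:)

using System;
using System.IO;
using NUnit.Framework;

[TestFixture]
public class DirectMlLoadTests
{
    [Test]
    public void DirectMlDllIsBesideTestAssembly()
    {
        // Where the test assembly (and its copied native DLLs) actually live.
        var baseDir = AppContext.BaseDirectory;
        Console.WriteLine($"Test base directory: {baseDir}");

        // If this fails, the NuGet DirectML.dll was not copied next to the
        // test binaries, and the loader may fall back to the System32 copy.
        Assert.That(File.Exists(Path.Combine(baseDir, "DirectML.dll")),
            "DirectML.dll not found beside the test assembly");
    }
}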
