
Added DML and CUDA provider support in onnxruntime-node #16050

Merged: 20 commits into microsoft:main on Aug 25, 2023

Conversation

dakenf (Contributor) commented May 23, 2023

Description

I've added changes to support CUDA and DML (DML only on Windows; on other platforms it will throw an error).

Motivation and Context

This fixes feature request #14127, which is tracked in #14529.

I was working on a StableDiffusion implementation for Node.js, and it is very slow on CPU, so GPU support is essential.

Here is a working demo with a patched and precompiled version: https://github.com/dakenf/stable-diffusion-nodejs

dakenf (Contributor, Author) commented May 23, 2023

@microsoft-github-policy-service agree

fs-eire (Contributor) commented May 23, 2023

@snnn do the existing tasks in Zip-*-Packaging-Pipeline generate DLLs for CUDA and DML?

dakenf (Contributor, Author) commented May 23, 2023

@fs-eire it does not require CUDA libs. It will just throw a JS error ("onnxruntime_providers_cuda.dll(so) failed to load") if they are not installed on the user's system when the "cuda" execution provider is used. But it does need a bundled DirectML.dll, because the one included with Windows is outdated. And DirectML.dll is quite small compared to the CUDA provider (11 MB vs. about 400 MB for CUDA), so I think having it work out of the box is worthwhile.

And to use CUDA, the user will need to install CUDA and cuDNN anyway.

snnn (Member) commented May 23, 2023

> @snnn do the existing tasks in Zip-*-Packaging-Pipeline generate DLLs for CUDA and DML?

Yes

fs-eire (Contributor) commented May 25, 2023

I think the code part should be good. My concern is that we should also update our CI so that the published npm package onnxruntime-node contains the necessary DLLs for the CUDA/DML EPs on Windows.

In the current onnxruntime-node package, we have these files under /bin/napi-v3/win32/x64/:

  • onnxruntime.dll
  • onnxruntime_binding.node
  • onnxruntime_providers_shared.dll

In my understanding, we need to add the following files as well:

  • onnxruntime_providers_cuda.dll
  • DirectML.dll

and I am not sure whether we need a different version of onnxruntime.dll/onnxruntime_providers_shared.dll (is it the same for CPU/GPU?)

snnn (Member) commented May 25, 2023

The DML EP is not a pluggable EP; it is built into onnxruntime.dll. And it is not compatible with the CUDA EP.

fs-eire (Contributor) commented May 25, 2023

I didn't know that the DML EP is not compatible with the CUDA EP. Does this mean there is no way to build with both DML and CUDA support? If so, do we need to prepare two different onnxruntime.dll files under different folders to support DML and CUDA?

dakenf (Contributor, Author) commented May 26, 2023

After your conversation I've run some more builds/tests on Windows and WSL2 (and realized that on Windows it did not even build correctly because of include paths, sorry).

So here are key things:

  1. onnxruntime.dll(so) must be built with CUDA on Linux and with CUDA+DML on Windows before bundling with the NPM package.
  2. Only DirectML.dll needs to be bundled with the npm package (libonnxruntime_providers_shared.dll/so and libonnxruntime_providers_cuda.dll/so can be safely omitted). The CPU provider will work without any errors.
  3. onnxruntime must be linked statically on Windows unless a workaround is found (explained below). Update: found a workaround for it.

Now the explanation

  1. If you don't build with CUDA, it won't support it. Same with DML. It can be built with both of them on Windows.
  2. Lib dependencies table:

| Use case | Requirements |
| --- | --- |
| Windows/Linux CPU provider | Nothing changed; works as expected |
| Windows DML provider | A fresh version of DirectML.dll is required (which will be bundled). It is very hard for the user to update it themselves (you cannot write into System32) |
| Linux CUDA provider | By default it will throw an error "libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory", so the user must download the onnxruntime-linux-gpu libraries and put them somewhere like /usr/local/. The user would need to install about 2 GB of CUDA/cuDNN libs anyway, so it should not be a big deal. The CUDA provider lib is about 400 MB for each platform, and bundling two versions (win and linux) would cost 800 MB |
| Windows CUDA provider | I think it will be a very rare case, but same as above: CUDA/cuDNN must be installed first. Also, the onnxruntime-windows-gpu libs should be put in PATH or node_modules\onnxruntime-node\bin\napi-v3\win32\x64 |

There are options to improve the user experience:
a. Check whether the CUDA libs are missing and show a user-friendly message with instructions (a rough sketch follows at the end of this comment)
b. Create a script like npx onnxruntime-node download-cuda-provider that will download the one for the user's architecture
c. Keep the script from point b (e.g. for Docker builds, so it would not need to download) and also check/download when an InferenceSession with the 'cuda' provider is created

  3. The biggest issue, which I hoped was fixed in the latest code. For some reason, dynamically linked onnxruntime always loads DirectML.dll from System32 (see "GPU - DirectML" setting is broken in the latest release occ-ai/obs-backgroundremoval#272 (comment)). In summary, we cannot ask the user to create a .manifest file for node.js to override DLL search paths. I've also tried SetDllDirectory and loading the lib directly from the Node.js module dir before initialization, but it did not help. So I propose a somewhat fragile method: link it statically (you can see the manual linking in CMakeLists.txt lines 56-91).
     It could be more robust if the runtime were built without the "--build_shared_lib" flag (so extra dependencies would not need to be linked), but that would require a separate build pipeline for the Windows node.js binding. However, if it's already being built in a separate one, then just the flag and the linked libs could be changed.

Let me know if you have a workaround for point 3, if you want me to remove static linking for some testing, or if you have any other feedback. Thanks.

P.S. In the worst case it can work with only the CUDA provider until the issue with DirectML.dll loading is resolved, but having DML will simplify the user experience on Windows by an order of magnitude.
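
For illustration, option (a) above could look roughly like the sketch below. This is a hypothetical sketch, not code from this PR: the probed file names are the standard ORT provider library names, and the helper names and message text are made up.

```cpp
// Hypothetical sketch for option (a): probe whether the CUDA provider library can
// be loaded and print a friendly hint if it cannot. Not part of this PR.
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
static bool CudaProviderLoadable() {
  HMODULE handle = LoadLibraryA("onnxruntime_providers_cuda.dll");
  if (handle == nullptr) return false;
  FreeLibrary(handle);
  return true;
}
#else
#include <dlfcn.h>
static bool CudaProviderLoadable() {
  void* handle = dlopen("libonnxruntime_providers_cuda.so", RTLD_LAZY);
  if (handle == nullptr) return false;
  dlclose(handle);
  return true;
}
#endif

// Called before creating an InferenceSession with the 'cuda' execution provider.
static void WarnIfCudaUnavailable() {
  if (!CudaProviderLoadable()) {
    std::fprintf(stderr,
                 "CUDA execution provider is unavailable. Install CUDA/cuDNN and the "
                 "onnxruntime GPU libraries, or fall back to the 'cpu' provider.\n");
  }
}
```

A load probe like this also covers the case where the provider library exists but its CUDA/cuDNN dependencies are missing, since the load fails either way.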

fdwr (Contributor) commented May 26, 2023

@dakenf:

> For some reason, dynamically linked onnxruntime always loads DirectML.dll from System32

🤨 As a quick sanity check, I tried this minimal example with ORT 1.15.0 and DML 1.12.0, and it's definitely loading both onnxruntime.dll and the redist DirectML.dll (from the build folder rather than the older System32 version).

Though, I have all my DLLs and the .exe in the same directory, and maybe the issue here is that the .exe and the plugin .dlls exist in different directories? Maybe LoadLibrary appears to favor the System32 path only because it fails to find DirectML.dll in the .exe path, and the system DirectML.dll comes later in the search path. This makes me wonder what most other current customers using DML are doing, because this is the first I've heard of this challenge (unless everybody else just puts all the DLLs into the .exe path too, because they have complete control over their distribution and binary paths) 🤔.


dakenf (Contributor, Author) commented May 26, 2023

> 🤨 As a quick sanity check, I tried this minimal example with ORT 1.15.0 and DML 1.12.0, and it's definitely loading both onnxruntime.dll and the redist DirectML.dll (from the build folder rather than the older System32 version).

Yeah, I guess the problem is that the Node.js interpreter lives somewhere in Program Files while the node binding lib is loaded from "node_modules/onnxruntime-node/.." in the target JS project directory. It's the same issue as the OBS plugin I linked. I've tried LoadLibrary and LoadLibraryEx with the exact path, but it did not help :(

UPD: the node binding loads all libraries itself, since they are linked during the build, but I thought that if I loaded it manually before making any calls to ONNX Runtime, it would force it to use the already loaded one.

snnn (Member) commented May 26, 2023

When Node.js calls LoadLibraryEx, does it specify LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR?

dakenf (Contributor, Author) commented May 26, 2023

@snnn

> When Node.js calls LoadLibraryEx, does it specify LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR?

It does not use LoadLibraryEx to load; I just tried loading it manually to see if it helps.

@fdwr I've found a workaround: link DirectML with the node binding AND call it with some random data before initializing ONNX Runtime:

```cpp
#ifdef _WIN32
  // this will load and call DirectML.dll to enforce using version from the binding directory
  const IID MY_IID = { 0x12345678, 0x1234, 0x1234, { 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0 } };
  DMLCreateDevice1(nullptr, DML_CREATE_DEVICE_FLAG_NONE, DML_FEATURE_LEVEL_4_0, MY_IID, nullptr);
#endif
  Ort::InitApi();
```

That way it loads DirectML.dll from the node binding folder, and the runtime uses the already loaded library even when linked dynamically. I had tried just linking it before, but since no functions were used, it did not work. Thanks for your time :)

fdwr (Contributor) commented May 26, 2023

> @fdwr I've found a workaround: link DirectML with the node binding AND call it with some random data before initializing ONNX Runtime

@dakenf: Does calling LoadLibrary beforehand and holding the module handle not achieve the same? I can see why that workaround works (well, not entirely, because I haven't looked at how DELAYLOAD works under the hood), but passing a bogus GUID 🤨 and a nullptr D3D device will cause debug spew (not that an end user will see it, but any developer will see "A null D3D12 device was provided to DMLCreateDevice, which is invalid" in the debug output).

dakenf (Contributor, Author) commented May 26, 2023

@fdwr

Yes, you are right. It did not work before because I assumed

```cpp
HMODULE hModule = GetModuleHandle(NULL);
GetModuleFileName(hModule, buffer, MAX_PATH);
```

would return the current module path. But it was returning the path to the Node.js interpreter, as it should by design, so it was loading the library from System32.

Instead I should have used something like

```cpp
GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                      GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                  (LPCSTR)&ExportFunction, &hm);
```

I guess sometimes it's better to stop and come back with a fresh look.
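
Putting the two pieces together, the eventual approach looks roughly like the sketch below: resolve the directory of the binding module via GetModuleHandleEx with the from-address flag, then preload the DirectML.dll that sits next to it before ORT initialization. This is a minimal sketch under those assumptions, not the exact contents of directml_load_helper.cc, and error handling is reduced to early returns.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Find the module that contains this function (the .node binding, not node.exe)
// and preload the DirectML.dll sitting next to it, so that later loads by name
// resolve to this copy instead of the System32 one.
static void PreloadBundledDirectML() {
  HMODULE binding_module = nullptr;
  if (!GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                              GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                          reinterpret_cast<LPCSTR>(&PreloadBundledDirectML),
                          &binding_module)) {
    return;
  }

  char module_path[MAX_PATH];
  DWORD length = GetModuleFileNameA(binding_module, module_path, MAX_PATH);
  if (length == 0 || length >= MAX_PATH) {
    return;
  }

  // Swap the binding's file name for DirectML.dll in the same directory.
  std::string dml_path(module_path, length);
  dml_path = dml_path.substr(0, dml_path.find_last_of("\\/") + 1) + "DirectML.dll";

  // Holding the returned handle keeps this copy pinned in the process.
  LoadLibraryExA(dml_path.c_str(), nullptr, 0);
}
#endif
```

Calling something like this before Ort::InitApi() would make the DMLCreateDevice1 trick above unnecessary.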

fdwr (Contributor) left a comment:

That sounds like a more robust approach. Added some comments, but still deferring to Yulong and Guenther for actual approval, since I do not own/know this code.

(Review comments on js/node/src/directml_load_helper.cc; resolved)
fdwr (Contributor) left a comment:

Thanks Arthur. Minor comments - easier than my previous one. (Still deferring to owners like Yulong and Guenther for actual sign-off.)

(Further review comments on js/node/src/directml_load_helper.cc; resolved)
dakenf (Contributor, Author) commented Jun 1, 2023

@fdwr can you also help me resolve this warning? It does not affect anything; it just prints to the console when the DML provider is used:

```
[W:onnxruntime:, session_state.cc:1169 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
[W:onnxruntime:, session_state.cc:1171 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
```

Should I change options for the ORT memory allocator when using the DML provider, or just ignore it?

fdwr (Contributor) commented Jun 1, 2023

> @fdwr can you also help me resolve this warning? It does not affect anything; it just prints to the console when the DML provider is used

What opset is your .onnx model? That warning means there is some operator which the DML EP doesn't support, or a newer version of an existing operator. It's a perf concern, because some operators are falling back to the CPU (incurring GPU<->CPU synchronization), but the model will still run. So I wouldn't worry about it for your change, but it would be useful to run `onnxruntime_perf_test.exe -v -I -e dml -r 1 yourmodel.onnx` to see which ops they are (it's in the console output spew).

dakenf (Contributor, Author) commented Jun 1, 2023

> What opset is your .onnx model?

I've used the Python converter for StableDiffusion: https://huggingface.co/aislamov/stable-diffusion-2-1-base-onnx

perf_test with the -v flag returned quite a lot of nodes (I haven't included the full output; it runs to many pages).

(Review comment on js/node/CMakeLists.txt; resolved)
Fixed parsing sessionOptions when model passed as a Buffer
dakenf (Contributor, Author) commented Aug 25, 2023

I've also fixed an error that prevented sessionOptions from being parsed when the model was created with 4 arguments.

snnn (Member) commented Aug 25, 2023

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

snnn (Member) commented Aug 25, 2023

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows ARM64 QNN CI Pipeline

azure-pipelines commented:

Azure Pipelines successfully started running 7 pipeline(s).

azure-pipelines commented:

Azure Pipelines successfully started running 9 pipeline(s).

snnn merged commit c262879 into microsoft:main on Aug 25, 2023
66 checks passed
dakenf (Contributor, Author) commented Aug 26, 2023

Thanks everyone

BTW, @fdwr, DML gives quite meaningful error messages:

```
Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention' Status Message: C:\Users\me\projects\my\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2448)\onnxruntime.dll!00007FFFA64C38FC: (caller: 00007FFFA64C4A44) Exception(3) tid(8164) 80070057 The parameter is incorrect.
```

Anyway, still better than CUDA:

```
Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention' Status Message: packed QKV format is not implemented for current GPU. Please disable it in fusion options.
```

Actually, I'm joking, since these are from different test cases and most likely my test cases are not correct.

fdwr (Contributor) commented Aug 26, 2023

> DML gives quite meaningful error messages

@dakenf: 😉 Not really, I agree, but the error messages are much more informative in the ORT debug build if you enable the Direct3D debug layer (Start / Run / dxcpl.exe), e.g.:


I'm not sure how it works within ORT Node, but for a C++ process, you can see the diagnostic output in Visual Studio's output subwindow.

dakenf (Contributor, Author) commented Aug 26, 2023

Well, I guess there's a reasonable explanation. When I got the "Windows Internals" book about 10 years ago with an MS Action Pack (or some other free promotion, I don't remember exactly), I had quite a lot of "ah, that's why" moments.

In ORT node you open a cmd and call node something.js, and something.js imports onnxruntime-node/dist/index.js, which loads the DLL and creates a JS bridge between the C++ and JS code of the library. So most likely there's a way to debug it that I'm not going to explore :D I've had quite enough with Xcode while finding out why 64-bit wasm with threads failed to run.

snnn added a commit that referenced this pull request Sep 7, 2023

### Description
The yaml file changes made in #16050 do not really work. Currently the
pipeline is failing with error:
```
Error: Not found SourceFolder: C:\a\_work\5\b\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib
```

So, I will revert the yaml changes first to bring the pipeline back.
Some people are waiting for our nightly packages.

Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=351104&view=results

### Motivation and Context
MountainAndMorning commented:

Is CUDA support added to onnxruntime-node? I am using onnxruntime-node v1.6.0 in Electron. It seems that the CUDA and DirectML providers don't work.

TareHimself commented:

Same here

MountainAndMorning commented:

Since the npm package has not been updated yet, onnxruntime-node still only supports CPU. Is there any progress on the npm package update? @dakenf

gokaybiz commented:

Still waiting for release!!

wesbos commented Jan 25, 2024

Looks like none of the platforms have had a release yet. Any idea when this will happen?

fs-eire (Contributor) commented Jan 25, 2024

> Looks like none of the platforms have had a release yet. Any idea when this will happen?

I have finished testing DML on Windows and am still pending on testing CUDA on Linux.

Once I get a dev package published I will update it here.

kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
…rosoft#17441)

xenova commented May 6, 2024

Any updates? :) @fs-eire

fs-eire (Contributor) commented May 6, 2024

> Any updates? :) @fs-eire

Oh, I missed this thread. It should be working now as of version 1.17.3-rev.1. Please give it a try and let me know if you run into any issues.

siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request May 9, 2024