
Contribution: JLL for support for additional platforms #12

Open · stemann opened this issue Jan 29, 2022 · 17 comments

stemann commented Jan 29, 2022

I've worked on getting a BinaryBuilder set-up for building onnxruntime for all BB-supported platforms: https://github.com/IHPSystems/onnxruntime_jll_builder

The current master only builds for CPU.

I still need to figure out how to deploy both the CPU-only and the CUDA-dependent libraries (e.g. as two JLLs), but there is a WIP branch: https://github.com/IHPSystems/onnxruntime_jll_builder/tree/feature/cuda

My main aim has been to get ONNX Runtime with TensorRT support on Nvidia Jetson (aarch64), but an automated deployment of those binaries will likely require some form of re-packaging of other binaries - which is why the build_tarballs.jl script is in its own repo and not in Yggdrasil (yet).

jw3126 (Owner) commented Jan 30, 2022

Wow, cool, thanks for the info! Once GPU + Yggdrasil is ready, I would love to use that as a source of onnxruntime binaries.

stemann (Author) commented Jan 31, 2022

Excellent :-)

WIP: TensorRT in Yggdrasil JuliaPackaging/Yggdrasil#4347

Since Pkg does not (yet) support conditional dependencies, I am thinking it might be better to have separate JLLs for each execution provider, and only download them on demand (using lazy artifacts and/or Requires.jl) - a rough sketch follows the list below. E.g.:

  • CUDA
  • TensorRT
  • oneDNN
  • MiGraphX
  • CoreML
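
A minimal sketch of the on-demand idea, assuming Requires.jl and a hypothetical per-EP JLL called ONNXRuntimeCUDA_jll (no such package exists yet); this would live at the top level of the user-facing package's module:

using Requires

function __init__()
    # React only when the user loads CUDA.jl in the session (the UUID is CUDA.jl's).
    @require CUDA="052768ef-5323-5732-b1bb-66c8b64840ba" begin
        # Pull in the (hypothetical) JLL carrying the CUDA EP libraries only now.
        @eval using ONNXRuntimeCUDA_jll
    end
end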

jw3126 (Owner) commented Feb 1, 2022

Awesome! I also think download on demand is the way to go. If this only adds JLL packages as dependencies, I think I would go without Requires.jl and just use lazy artifacts.
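
For reference, a rough sketch of the lazy-artifact route (the artifact name onnxruntime_cuda and its layout are made up here): the wrapper package would mark the entry lazy = true in its Artifacts.toml, so nothing is downloaded until the path is first requested:

using Artifacts, LazyArtifacts

function cuda_ep_libdir()
    # Triggers the download of the "onnxruntime_cuda" artifact on first call only.
    return joinpath(artifact"onnxruntime_cuda", "lib")
end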

jw3126 (Owner) commented Mar 22, 2022

@stemann thanks again for tackling this - Yggdrasil + JLL would be much cleaner than my current approach. Is there any progress on the CUDA onnxruntime?

stemann (Author) commented Mar 23, 2022

There is - though I have had to split my effort between ONNXRuntime and Torch, so the pace has definitely slowed down.

The good news is that TensorRT is now registered and available to be used as a dependency - and I’ve just managed to battle CMake into finding CUDA in the BB cross-compilation environment with CUDA_full. So it shouldn’t be too hard to get ONNXRuntime building with CUDA now, cf. Yggdrasil#4554.
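
Roughly - and this is not the actual recipe, the version numbers, paths and flags below are illustrative assumptions - the trick is to add CUDA_full_jll as a build dependency and point CMake's CUDA detection at it from the build script:

using BinaryBuilder, Pkg

dependencies = [
    BuildDependency(PackageSpec(name = "CUDA_full_jll", version = v"11.4.4")),
]

script = raw"""
cd $WORKSPACE/srcdir/onnxruntime
cmake -B build -S cmake \
    -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TARGET_TOOLCHAIN} \
    -DCUDAToolkit_ROOT=${prefix}/cuda \
    -Donnxruntime_USE_CUDA=ON
cmake --build build --parallel ${nproc}
"""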

jw3126 (Owner) commented Mar 23, 2022

> I’ve just managed to battle CMake into finding CUDA in the BB cross-compilation environment with CUDA_full.

Awesome! For me, builds involving CUDA are such a mix of pain and incomprehensible magic. Thanks a lot for working your way through this - you're doing the community a great favor! I really appreciate it!

stemann (Author) commented Mar 23, 2022

> > I’ve just managed to battle CMake into finding CUDA in the BB cross-compilation environment with CUDA_full.
>
> Awesome! For me, builds involving CUDA are such a mix of pain and incomprehensible magic. Thanks a lot for working your way through this - you're doing the community a great favor! I really appreciate it!

You're welcome :-) CUDA really is a bit of a nightmare. Let's just hope it works in the end :-) I have not yet had time to actually test the JLLs from Julia.

I argued that it should be no big feat to run some neural networks with ONNXRuntime from Julia - with TensorRT - on Jetson boards ... so I'd better make it happen :-)

stemann (Author) commented Mar 23, 2022

BTW: @vchuravy argued in JuliaPackaging/Yggdrasil#4477 (comment) that it would be better to go with platform variants than to separate e.g. Torch (CPU-only) and Torch with CUDA into separate packages.

I'm following that approach for JuliaPackaging/Yggdrasil#4554 now, so it should probably be done for ONNXRuntime as well. One could imagine a separate ONNXRuntimeTraining JLL with the training stuff (dependent on Torch).

jw3126 (Owner) commented Mar 23, 2022

> BTW: @vchuravy argued in JuliaPackaging/Yggdrasil#4477 (comment) that it would be better to go with platform variants than to separate e.g. Torch (CPU-only) and Torch with CUDA into separate packages.

Would this mean that it is impossible to have both a CPU and a GPU net in a single Julia session?

stemann (Author) commented Mar 23, 2022

> > BTW: @vchuravy argued in JuliaPackaging/Yggdrasil#4477 (comment) that it would be better to go with platform variants than to separate e.g. Torch (CPU-only) and Torch with CUDA into separate packages.
>
> Would this mean that it is impossible to have both a CPU and a GPU net in a single Julia session?

Good point. No, I don't think so - it would just be like using the "onnxruntime-gpu" binary that you have now: the CUDA-platform variant would include both CPU and CUDA, the ROCm-platform variant would include CPU and ROCm, etc.
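
For reference, this is the usage pattern that should keep working unchanged with such a variant (paths are placeholders):

import ONNXRunTime as ORT

# Both sessions can coexist: the CUDA-platform artifact ships the GPU-enabled library,
# which also supports the CPU execution provider.
cpu_model = ORT.load_inference("model.onnx"; execution_provider = :cpu)
gpu_model = ORT.load_inference("model.onnx"; execution_provider = :cuda)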

jw3126 (Owner) commented Mar 23, 2022

OK, got it, thanks!

stemann (Author) commented Sep 10, 2022

Cf. #19 for a WIP usage of the JLL - with platform selection based on platform augmentation (e.g. for CUDA).

stemann (Author) commented Oct 23, 2023

@jw3126 @GunnarFarneback Any suggestions for how to handle execution providers in a JLL world, i.e. artifact selection? E.g. from a high-level/user perspective?

I'm sorry that the following got a "bit" vague...

One objective is, of course, cooperation with the CUDA.jl stack, so in the context of CUDA, we should expect the user to use the CUDA_Runtime_jll preferences to define the CUDA version in a LocalPreferences.toml:

[CUDA_Runtime_jll]
version = "11.8"

Then there are two options:

  1. Assume that if CUDA can load, the user wants the CUDA/CUDNN/TensorRT artifact and fetch it - this would be the platform selection implemented for CUDA platforms (used by CUDNN) and defined in https://github.com/JuliaPackaging/Yggdrasil/blob/master/platforms/cuda.jl#L13-L80
  2. Or, in a more complicated world, the user should still have the option to specify which artifact to fetch, e.g. by specifying an ONNXRuntime_jll platform preference in TOML:
[CUDA_Runtime_jll]
version = "11.8"

[ONNXRuntime_jll]
platform = "cuda"

where the user could choose to get a "cpu" artifact (the basic onnxruntime main library), an AMD ROCm artifact, or another "cpu" artifact like XNNPACK or Intel oneDNN (alias DNNL). I.e., the user should have the option to get another artifact/library even though they also have a functional CUDA set-up.
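
A very rough sketch of what option 2's selection hook could look like (this only assumes the general shape of a JLL augment_platform! hook; the "onnxruntime_platform" tag and the preference lookup are made up for illustration):

using Base.BinaryPlatforms, Preferences

function augment_platform!(platform::Platform)
    # Skip if the tag was already set elsewhere.
    haskey(platform, "onnxruntime_platform") && return platform
    preferred = @load_preference("platform", "cpu")   # "cpu", "cuda", "rocm", ...
    platform["onnxruntime_platform"] = preferred
    return platform
end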

Complication: Most EPs are built into the main onnxruntime library - the only exceptions are TensorRT, Intel oneDNN, Intel OpenVINO, and CANN, which are available as shared libraries: https://onnxruntime.ai/docs/build/eps.html#execution-provider-shared-libraries.
This means that it would make sense to provide most EPs through various platform variants of the ONNXRuntime_jll artifact(s) - i.e. with the ONNXRuntime_jll artifact being synonymous with the main onnxruntime library - and with some definition of "platform" (cuda might be one platform, rocm another - and the concept gets quite vague once one considers "XNNPACK" or "oneDNN" "platforms"...).

TensorRT is likely a special case when it comes to the shared library EPs: It is probably safe to assume that if the user has selected the CUDA-platform artifact, then the user won't mind getting the TensorRT library as well.

jw3126 (Owner) commented Oct 24, 2023

I think the high-level user experience should be like this:

pkg> add ONNXRunTime

julia> import ONNXRunTime as ORT

julia> ORT.load_inference(path, execution_provider=:cuda)
Error: # tell the user to add `CUDA_Runtime_jll` and optionally set preferences for that. 

pkg> add CUDA_Runtime_jll

julia> ORT.load_inference(path, execution_provider=:cuda)
# Now it works

Personally, I don't need other EPs than CUDA and CPU. If we want to support more EPs, that is fine by me, as long as there is somebody who takes responsibility for maintaining that EP.

So I think we should go for the simplest solution that supports CPU + CUDA + whatever you personally need and feel like maintaining.

stemann (Author) commented Oct 24, 2023

I agree wrt. the scope - the aim is not to support more than CPU and CUDA/CUDNN and TensorRT at this point.

But even with just CPU+CUDA, the user experience with the JLL would be a little different: The ONNXRuntime_jll CUDA artifact depends on CUDNN_jll and TensorRT_jll, and hence CUDA_Runtime_jll, so the user should automatically get CUDA_Runtime_jll if the artifact/platform selection for ONNXRuntime_jll returns the CUDA artifact.

So my question was more along: How should the artifact/platform selection work?

GunnarFarneback (Collaborator) commented

I don't have a clear view of what the possibilities are.

Having to add an additional package to make an execution provider available is fine. Having to set preferences is acceptable but a bit of a pain, in that you either have to restart Julia or prepare TOML files before starting. Having to load large artifacts that you won't use would be highly annoying.

For my work use cases being able to run on either CPU or GPU is important, and better optimizations through TensorRT are highly interesting. Additional execution providers would be mildly interesting, in particular DirectML.

stemann (Author) commented Oct 25, 2023

Users should definitely not be forced to download or load unneeded artifacts.

Though I still assume no one using the CUDA EP would object to also getting the TensorRT EP...?

The obvious solution for the shared-library EPs is to put them in separate JLL packages, e.g. ONNXRuntimeProviderOpenVINO_jll (with each shared-library EP JLL depending on the main ONNXRuntime_jll).
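
As a sketch of how that might look from the user's side (ONNXRuntimeProviderOpenVINO_jll and an :openvino provider symbol are hypothetical - neither exists today):

import ONNXRunTime as ORT

# Hypothetical: the shared-library EP JLL depends on the main ONNXRuntime_jll and ships
# the provider shared library, so loading it is enough to make the EP discoverable.
using ONNXRuntimeProviderOpenVINO_jll

model = ORT.load_inference("model.onnx"; execution_provider = :openvino)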

For the EPs that are built into the main onnxruntime library, the picture is murkier: either there would be separate, mutually exclusive JLLs, e.g. ONNXRuntime_jll and ONNXRuntimeProviderDirectML_jll (which could not both be loaded at the same time), or some slightly bizarre overload of the "platform" concept would have to be used, e.g. having both an x86_64-w64-mingw32 artifact for ONNXRuntime_jll and an "x86_64-w64-mingw32-DirectML" artifact... On the other hand, there are already MPI platforms and LLVM platforms...

I think I favor the separate-JLLs-for-separate-EPs approach - and maybe at some point onnxruntime upstream will also move more EPs into shared libraries...(?)
