[Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader #8013
Conversation
One remaining todo with this PR is to enable OpenVINO, DNNL and TensorRT in the Java testing pipeline (or enable Java in the relevant testing pipelines there). Is there a preference which way round that happens?
I think it's easier to add it to the provider pipelines (since provider availability can sometimes be mutually exclusive) than to try to make them all work in one Java pipeline. For example, this isn't a shared provider yet, but I don't expect to be able to have CUDA and ROCm at the same time. Thank you for cleaning up my original shared library stuff, this definitely makes it more efficient.
Ok, I've added Java tests for TensorRT, OpenVINO and DNNL. I'm not sure I've got the pipeline changes right, and I can't easily run them, so I'd appreciate it if someone could look over them and kick off the pipelines.
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline
/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline
/azp run Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 4 pipeline(s).
Azure Pipelines successfully started running 5 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
Looks like the TensorRT test was skipped by Gradle; I'll double-check how the flag is being passed in. OpenVINO failed because it was looking for Java 8. I'm not sure what's setting JAVA_HOME.
The OpenVINO build log says:
So it should be able to find Java in the PATH. Is setting JAVA_HOME still needed?
Oh, that's strange.
I think you need to update or remove this: https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/github/linux/run_build.sh#L64
@Craigacp I think you need to update this file as well: https://github.com/microsoft/onnxruntime/blob/master/cmake/onnxruntime_java_unittests.cmake You may use this as a reference: https://github.com/microsoft/onnxruntime/blob/master/cmake/onnxruntime_java.cmake#L162
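For context, the kind of cmake change being suggested is along these lines: a sketch (variable and list names here are assumed, not necessarily what the PR ended up using) of forwarding the EP availability flags from the cmake configuration into the Java unit-test invocation, mirroring what onnxruntime_java.cmake does for the main build:

```cmake
# Sketch only: collect -D system properties for gradle based on which
# execution providers were enabled in the cmake configuration.
set(ORT_JAVA_TEST_ARGS "")
if (onnxruntime_USE_CUDA)
  list(APPEND ORT_JAVA_TEST_ARGS "-DUSE_CUDA=1")
endif()
if (onnxruntime_USE_DNNL)
  list(APPEND ORT_JAVA_TEST_ARGS "-DUSE_DNNL=1")
endif()
if (onnxruntime_USE_TENSORRT)
  list(APPEND ORT_JAVA_TEST_ARGS "-DUSE_TENSORRT=1")
endif()
if (onnxruntime_USE_OPENVINO)
  list(APPEND ORT_JAVA_TEST_ARGS "-DUSE_OPENVINO=1")
endif()
```

These arguments would then be appended to the gradle command line that runs the tests, so the Java side can detect which providers to exercise.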
I've already got the EP flags in there as I also removed the
Force-pushed from 7c4b003 to 2d4cc94.
I've updated this PR to remove the merge conflicts, could someone start the CI pipelines?
Could someone please start the CI pipelines? @snnn @RyanUnderhill
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
Looks like I put the OpenVINO argument outside the quotes when it should have been inside them. Could you restart that pipeline please?
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
Force-pushed from 641f39d to 5462429.
I fixed the test passthrough and checked it against DNNL, which is the easiest one for me to run. I'd missed that the build.gradle needs to pass the system properties through to the test runner. I still don't understand why the OpenVINO test is failing. Is there someone on your side who understands how that's set up who could have a look at the error? They might understand why the filesystem doesn't appear to be writeable.
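The passthrough fix mentioned above is the kind of thing sketched below: a build.gradle fragment (property names are assumed, not necessarily the ones this PR uses) that forwards provider flags from the Gradle JVM to the test JVM, which otherwise starts with a clean set of system properties:

```groovy
test {
    // -D flags given on the gradle command line are not visible to the
    // forked test JVM unless they are explicitly forwarded.
    ['USE_CUDA', 'USE_TENSORRT', 'USE_DNNL', 'USE_OPENVINO'].each { prop ->
        def value = System.getProperty(prop)
        if (value != null) {
            systemProperty prop, value
        }
    }
}
```

Tests can then check these properties (e.g. via JUnit assumptions) to decide whether a given provider's tests should run.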
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
Well, at least TensorRT is happy now.
I will look into the OpenVINO failure. Please give me a few days.
Thanks. If it's going to be a complicated fix then I'll turn off the OpenVINO test in Java so we can get this merged. I'm planning to update the Java provider interfaces so you can supply the various parameter structs to the CUDA provider etc., and this change blocks most of that work as it modifies the same files, so I'd appreciate getting this merged early next week if we can.
Good suggestion. Let's merge this PR first.
Please help turn off the OpenVINO test for now.
Done.
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
The Linux GPU CI Pipeline is failing for an unknown reason that I'm still investigating.
@@ -47,4 +48,4 @@ ARG BUILD_USER=onnxruntimedev
WORKDIR /home/$BUILD_USER
RUN adduser --gecos 'onnxruntime Build User' --disabled-password $BUILD_USER --uid $BUILD_UID
RUN adduser $BUILD_USER video
I think this is why the OpenVINO build failed. It created two users! The one used for running the CI build doesn't have the necessary permissions, and the one that has permissions wasn't used for anything.
Ah, ok. Well, that sounds like a problem. To re-enable the Java tests, I think all that's necessary is to add --build_java to the end of the docker run command in linux-openvino-ci-pipeline.yml, inside the double quotes.
Description:
Refactors the native library loading in Java to allow CUDA to be loaded on demand, fixing #7044, then expands the shared provider library loading to DNNL, OpenVINO and TensorRT, fixing #6553.
Added a flag to the native library loading so users can supply a directory which contains all the native libraries, fixing #8003. This is also the only way to make the shared library providers load from a different place than the jar, as specifying individual library paths conflicts with the way the ONNX Runtime native code loads the shared library providers.
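The directory-override behaviour could be sketched roughly as follows. This is a minimal illustration, not ONNX Runtime's actual loader: the class name and the `onnxruntime.native.path` property name are assumptions for the example.

```java
import java.io.File;

// Hypothetical sketch of directory-based native library loading: if the
// user supplies a directory containing all the native libraries, load
// from there; otherwise fall back to the default lookup.
public final class NativeLoader {
    // Illustrative property name; the real flag may differ.
    private static final String LIB_DIR_PROPERTY = "onnxruntime.native.path";

    /** Builds the absolute path to a named library inside the supplied directory. */
    static String resolvePath(String dir, String libraryName) {
        // mapLibraryName turns "onnxruntime" into e.g. "libonnxruntime.so" on Linux.
        return new File(dir, System.mapLibraryName(libraryName)).getAbsolutePath();
    }

    /** Loads from the supplied directory if set, otherwise uses the default search path. */
    static void load(String libraryName) {
        String dir = System.getProperty(LIB_DIR_PROPERTY);
        if (dir != null) {
            // System.load requires an absolute path to the shared library.
            System.load(resolvePath(dir, libraryName));
        } else {
            // Default path: java.library.path lookup (or extraction from the jar).
            System.loadLibrary(libraryName);
        }
    }
}
```

Loading everything from one directory sidesteps the conflict described above, because the shared provider libraries are then found by the native code in the same place as the main library.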
I also slightly refactored the Java cmake bits, and added the --console=plain flag to the gradle executions to stop gradle writing over cmake's output.
Motivation and Context