[Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader #8013

Craigacp · 2021-06-10T00:05:57Z

Description:

Refactors the native library loading in Java to allow CUDA to be loaded on demand, fixing #7044. It then extends the shared provider library loading to DNNL, OpenVINO and TensorRT, fixing #6553.
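
As a rough illustration of what "loaded on demand" means here, the sketch below defers loading a provider library until it is first requested and then remembers the outcome. The class name, the provider keys and the injected loader callback are all hypothetical and are not taken from the ONNX Runtime sources; a real loader would call into native library loading rather than a lambda.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch: loads each provider library at most once, on first use. */
final class LazyProviderLoader {
    // Remembers the outcome (success or failure) of the first load attempt per provider.
    private final Map<String, Boolean> attempted = new ConcurrentHashMap<>();
    // Assumed callback that performs the actual load and reports whether it succeeded.
    private final Predicate<String> loadLibrary;

    LazyProviderLoader(Predicate<String> loadLibrary) {
        this.loadLibrary = loadLibrary;
    }

    /** Returns true if the provider library is available, loading it on the first request only. */
    boolean ensureLoaded(String provider) {
        return attempted.computeIfAbsent(provider, loadLibrary::test);
    }
}
```

With this shape, a program that never asks for a given provider never triggers its load, and repeated requests for the same provider reuse the cached result instead of loading again.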

Added a flag to the native library loading to allow users to supply a directory which contains all the native libraries, fixing #8003. This is also the only way to make the shared library providers load from a different place than the jar, as the individual library path specification conflicts with the way that the ONNX Runtime native code loads the shared library providers.
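
A minimal sketch of the directory-based lookup described above, assuming the directory is passed as a JVM system property. The class name and the property name `example.native.dir` are made up for illustration; the actual flag name should be taken from the project's Java documentation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical sketch: prefer a user-supplied directory over the copy packaged in the jar. */
final class NativeDirectoryResolver {
    // Illustrative property name only, not the real flag.
    static final String DIR_PROPERTY = "example.native.dir";

    /** Returns the library file in the directory, or empty if no directory is given or the file is missing. */
    static Optional<Path> resolve(String directory, String libraryName) {
        if (directory == null || directory.isEmpty()) {
            return Optional.empty(); // caller falls back to extracting from the jar
        }
        // mapLibraryName adds the platform-specific prefix and suffix to the bare name.
        Path candidate = Paths.get(directory, System.mapLibraryName(libraryName));
        return Files.isRegularFile(candidate) ? Optional.of(candidate) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> lib = resolve(System.getProperty(DIR_PROPERTY), "onnxruntime");
        // System.load needs an absolute path, so the candidate is made absolute first.
        lib.ifPresent(p -> System.out.println("would call System.load on " + p.toAbsolutePath()));
    }
}
```

Keeping every native library in one directory is what makes this compatible with the constraint above: the provider libraries are loaded by the native code rather than by Java, so they cannot each be pointed at a separate path.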

I also slightly refactored the Java cmake bits, and added the --console=plain flag to the gradle executions to stop gradle writing over cmake's output.

Motivation and Context

Why is this change required? What problem does it solve?
Re-enables DNNL, OpenVINO and TensorRT in Java by allowing them to be packaged in the jar and dynamically loaded in the same way CUDA is.

If it fixes an open issue, please link to the issue here.
Fixes "[java] load shared library providers" #6553. Fixes "Dynamic GPU support" #7044. Fixes "re-download the jni library everytime restart the program" #8003.

Craigacp · 2021-06-10T00:08:52Z

One remaining todo with this PR is to enable OpenVINO, DNNL and TensorRT in the Java testing pipeline (or enable Java in the relevant testing pipelines there). Is there a preference which way round that happens?

Craigacp · 2021-06-16T15:34:51Z

@RyanUnderhill @yuslepukhin

RyanUnderhill · 2021-06-18T07:57:50Z

> One remaining todo with this PR is to enable OpenVINO, DNNL and TensorRT in the Java testing pipeline (or enable Java in the relevant testing pipelines there). Is there a preference which way round that happens?

I think it's easier to add it to the provider pipelines (since provider availability can sometimes be mutually exclusive) vs trying to make them all work in one Java pipeline. For example, this isn't a shared provider yet, but I don't expect to be able to have CUDA and ROCM at the same time.

Thank you for cleaning up my original shared library stuff, this definitely makes it more efficient.

Craigacp · 2021-06-19T01:59:00Z

Ok, I've added Java tests for TensorRT, OpenVINO and DNNL, and I added --build_java to the DNNL and OpenVINO pipelines (it was already present on the TensorRT one, though until this PR that wouldn't have done anything different to the CPU version of the Java code).

I'm not sure I've got the pipeline changes right, and I can't easily run them, so I'd appreciate it if someone could look over them and kick off the pipelines.

snnn · 2021-06-21T18:04:02Z

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline

snnn · 2021-06-21T18:04:13Z

/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline

snnn · 2021-06-21T18:04:24Z

/azp run Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

azure-pipelines · 2021-06-21T18:04:24Z

Azure Pipelines successfully started running 4 pipeline(s).

azure-pipelines · 2021-06-21T18:04:49Z

Azure Pipelines successfully started running 5 pipeline(s).

azure-pipelines · 2021-06-21T18:04:53Z

Azure Pipelines successfully started running 8 pipeline(s).

Craigacp · 2021-06-21T21:45:54Z

Looks like the TensorRT test was skipped by gradle, I'll double check how it's passing in the flag. OpenVINO failed because it was looking for Java 8. I'm not sure what's setting JAVA_HOME in that docker file. Any ideas?

snnn · 2021-06-21T22:01:22Z

> Looks like the TensorRT test was skipped by gradle, I'll double check how it's passing in the flag. OpenVINO failed because it was looking for Java 8. I'm not sure what's setting JAVA_HOME in that docker file. Any ideas?

The OpenVINO build log says:

update-alternatives: using /usr/lib/jvm/java-11-openjdk-amd64/bin/java to provide /usr/bin/java (java) in auto mode

So it should be able to find java in PATH. Is setting JAVA_HOME still needed?

snnn · 2021-06-21T22:02:12Z

ERROR: JAVA_HOME is set to an invalid directory: /usr/lib/jvm/java-8-openjdk-amd64

Oh, that's strange.

snnn · 2021-06-21T22:03:56Z

I think you need to update or remove this:

https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/github/linux/run_build.sh#L64

snnn · 2021-06-21T22:05:48Z

@Craigacp I think you need to update this file as well:

https://github.com/microsoft/onnxruntime/blob/master/cmake/onnxruntime_java_unittests.cmake

You may use this GRADLE_ARGS flag:

https://github.com/microsoft/onnxruntime/blob/master/cmake/onnxruntime_java.cmake#L162

Craigacp · 2021-06-22T01:06:06Z

> @Craigacp I think you need to update this file as well:
>
> https://github.com/microsoft/onnxruntime/blob/master/cmake/onnxruntime_java_unittests.cmake
>
> You may use this GRADLE_ARGS flag:
>
> https://github.com/microsoft/onnxruntime/blob/master/cmake/onnxruntime_java.cmake#L162

I've already got the ep flags in there as GRADLE_TEST_EP_FLAGS. Weirdly it seems to have run the CUDA test in the TensorRT pipeline, but not the TensorRT test. I've updated the Java test cmake file to make it print the flags, hopefully that will give more insight.

I also removed the JAVA_HOME setting from run_build.sh. Could you kick off the TensorRT, OpenVINO and DNNL pipelines to check it out?

Craigacp · 2021-07-09T13:41:57Z

I've updated this PR to remove the merge conflicts, could someone start the CI pipelines?

Craigacp · 2021-07-16T18:58:28Z

Could someone please start the CI pipelines? @snnn @RyanUnderhill

edgchen1 · 2021-07-16T20:18:43Z

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

edgchen1 · 2021-07-16T20:19:11Z

/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline

azure-pipelines · 2021-07-16T20:19:28Z

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines · 2021-07-16T20:19:48Z

Azure Pipelines successfully started running 8 pipeline(s).

Craigacp · 2021-07-16T20:49:13Z

Looks like I put the argument outside the quote when it should have been inside the quote for OpenVINO. Could you restart that please?

snnn · 2021-07-16T23:17:53Z

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

snnn · 2021-07-16T23:18:00Z

/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline

azure-pipelines · 2021-07-16T23:18:36Z

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines · 2021-07-16T23:18:36Z

Azure Pipelines successfully started running 8 pipeline(s).

Craigacp · 2021-07-19T20:53:34Z

I fixed the test passthrough and checked it against DNNL which is the easiest for me to run. I'd missed that the build.gradle needs to pass through the system properties to the test runner, and that onnxruntime_java_unittests.cmake is only used on Windows.
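
Gradle runs tests in a forked JVM, so a property given to the build with -D is not visible to the tests unless the build forwards it. The sketch below shows only the test-side half of that: deciding which provider tests to run from whatever properties reached the JVM. The class name, the flag names and the value "1" are assumptions for illustration and may not match the real build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical sketch: decide which provider tests to run from JVM system properties. */
final class ProviderTestFlags {
    // Assumed flag names; the real build may use different ones.
    private static final String[] FLAGS = {"USE_CUDA", "USE_DNNL", "USE_OPENVINO", "USE_TENSORRT"};

    /** Returns the flags set to "1" in the given properties, in declaration order. */
    static List<String> enabled(Properties props) {
        List<String> result = new ArrayList<>();
        for (String flag : FLAGS) {
            if ("1".equals(props.getProperty(flag))) {
                result.add(flag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The list is empty when the build did not forward any flags to this JVM.
        System.out.println("Enabled provider tests: " + enabled(System.getProperties()));
    }
}
```

If the flags never reach the test JVM, every provider test is silently skipped, which matches the symptom reported earlier in this thread where the TensorRT test did not run.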

I still don't understand why the OpenVINO test is failing. Is there someone on your side who understands how that's set up and could have a look at the error? They might understand why the filesystem doesn't appear to be writeable.

snnn · 2021-07-20T00:40:50Z

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

snnn · 2021-07-20T00:41:05Z

/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline

azure-pipelines · 2021-07-20T00:41:34Z

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines · 2021-07-20T00:41:42Z

Azure Pipelines successfully started running 8 pipeline(s).

Craigacp · 2021-07-20T02:47:22Z

Well at least TensorRT is happy now - 7: InferenceTest > testTensorRT() PASSED.

snnn · 2021-07-20T18:44:14Z

I will look into the OpenVINO failure. Please give me a few days.

Craigacp · 2021-07-20T19:35:44Z

Thanks. If it's going to be a complicated fix then I'll turn off the OpenVINO test in Java so we can get this merged. I'm planning to update the Java provider interfaces so you can supply the various parameter structs to the CUDA provider etc. This PR blocks most of that work as it modifies the same files, so I'd appreciate getting it merged early next week if we can.

snnn · 2021-07-20T19:46:11Z

Good suggestion. Let's merge this PR first.

snnn · 2021-07-20T23:24:32Z

Please help turn off the OpenVINO test for now.

Craigacp · 2021-07-21T01:05:38Z

Done.

snnn · 2021-07-21T03:00:16Z

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Windows WebAssembly CI Pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

snnn · 2021-07-21T03:00:26Z

/azp run Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline

azure-pipelines · 2021-07-21T03:02:29Z

Azure Pipelines successfully started running 9 pipeline(s).

azure-pipelines · 2021-07-21T03:02:30Z

Azure Pipelines successfully started running 8 pipeline(s).

snnn · 2021-07-21T03:40:49Z

Linux GPU CI Pipeline failure is due to an unknown reason that I'm still investigating.

snnn · 2021-07-21T05:32:09Z

tools/ci_build/github/linux/docker/Dockerfile.ubuntu_openvino

@@ -47,4 +48,4 @@ ARG BUILD_USER=onnxruntimedev
 WORKDIR /home/$BUILD_USER
 RUN adduser --gecos 'onnxruntime Build User' --disabled-password $BUILD_USER --uid $BUILD_UID
 RUN adduser $BUILD_USER video


I think this is why the OpenVINO build failed. It created two users! The one used for running the CI build doesn't have the necessary permissions, and the one that has the permissions wasn't used for anything.

Craigacp · 2021-07-21

Ah ok. Well that sounds like a problem. To re-enable the Java tests I think all that's necessary is to add --build_java to the end of the docker run command in linux-openvino-ci-pipeline.yml, inside the double quotes.

Craigacp requested a review from a team as a code owner June 10, 2021 00:05

Craigacp mentioned this pull request Jun 10, 2021

Dynamic GPU support #7044

Closed

Craigacp force-pushed the java-shared-library-fix branch from 7c4b003 to 2d4cc94 Compare July 9, 2021 13:41

Craigacp added 5 commits July 19, 2021 14:57

Adding Java to the OpenVINO and DNNL pipelines.

527bce4

Fixes for the test pipelines.

ecda5fd

build_java should be inside the quotes in the OpenVINO docker build

348fd8b

Changing how the java provider arguments are collected.

20aa786

Fixing build flag passthrough to the Java tests.

5462429

Craigacp force-pushed the java-shared-library-fix branch from 641f39d to 5462429 Compare July 19, 2021 19:34

snnn self-assigned this Jul 20, 2021

Disabling OpenVINO Java test.

f6da012

snnn approved these changes Jul 21, 2021

snnn reviewed Jul 21, 2021

snnn merged commit 55b26b6 into microsoft:master Jul 21, 2021

Craigacp deleted the java-shared-library-fix branch July 21, 2021 14:15

snnn mentioned this pull request Aug 26, 2021

while OpenVINO as execution provider with onnxruntime in java and python get error Failed to load library libonnxruntime_providers_openvino.so #8851

Open

guoyu-wang mentioned this pull request Aug 27, 2021

Fix Android java API failure #8865

Merged
