feat(ml)!: cuda and openvino acceleration #5619
Conversation
Does this acceleration apply to darktable generation with OpenCL and thumbnails? For large collections it could be awesome. Thanks a lot for this!
Got some issues with this build.
Thanks for testing! It looks like I forgot to change the maxDistance being sent for facial recognition requests. Are you just running on CPU, or are you trying to use an acceleration device? Could you try disabling facial recognition for now, just to see that the other jobs work?
I'm using CUDA, see below. I had to revert to the release build (1.90.2).
That issue should be fixed now.
Rebased on #5667, so that should be merged first.
I did some testing with this and both CUDA and OpenVINO work correctly and actually use the GPU.
LGTM. There are a few places where you still need to replace `hwaccel.yml`: notably the `prepare-release.yml` file, as well as links in the docs to the release artifact.
LGTM! Really like the overhaul of the docker hwaccel, so much more consistent and streamlined now. btw, thanks for adding the documentation for ARMNN :-)
I tested the feat/ml-tensorrt branch yesterday: I let the ML container run on my computer (Arch Linux, GTX 1060) and adjusted the URL for the machine learning server accordingly. @mertalev: Everything worked so far! The face detection step was way faster than on my server CPU (Xeon E3-1220v3). However, the facial recognition part still seemed to be running on the server. Is this not done by the ML container?
Really happy to hear that! Yes, the clustering is all done on CPU. The face detection outputs are stored in Postgres and queried with a special vector search index. The ML service is designed to be very independent - it doesn't integrate with Postgres, Redis, etc. and has no knowledge about what earlier model outputs were. All of that is orchestrated by immich-microservices.
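For intuition, the kind of nearest-neighbor lookup a vector search index enables looks roughly like the sketch below. The table and column names are hypothetical, not Immich's actual schema, and it assumes Postgres with the pgvector extension, whose `<=>` operator computes cosine distance:

```python
# Hypothetical sketch of a nearest-neighbor query over face embeddings.
# Table/column names are invented for illustration; Immich's real schema
# and index differ. Assumes Postgres with the pgvector extension installed.
import psycopg2

conn = psycopg2.connect("dbname=immich")
with conn.cursor() as cur:
    query_embedding = [0.1] * 512  # placeholder vector
    cur.execute(
        """
        SELECT "assetId", embedding <=> %s::vector AS distance
        FROM face_embeddings
        ORDER BY distance
        LIMIT 5;
        """,
        (str(query_embedding),),
    )
    neighbors = cur.fetchall()
```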
I think we can merge this after splitting the documentation into a separate PR that will get merged after the next release. Please feel free to press the green button after doing so! Thank you so much!
Description
Potentially breaking change: `hwaccel.yml` is renamed to `hwaccel.transcoding.yml` and the way it's used in the docker-compose is changed. Existing docker-compose / `hwaccel.yml` setups will continue to work, but if a user who used the `hwaccel.yml` file updates their `docker-compose.yml`, they will need to change to the new format for it to work (or keep the older `extends` section). A sketch of the new layout follows the limitations list below.

This PR adds hardware acceleration support for Nvidia and Intel devices through CUDA and OpenVINO. It uses prebuilt onnxruntime packages for these APIs and updates the ML Dockerfile to conditionally target a device based on the `DEVICE` build arg. There is a check at runtime to detect the available execution providers and set them accordingly (see the second sketch below).

Current limitations:
- onnxruntime-openvino doesn't currently support Python 3.11, so targeting an OpenVINO build will also target 3.10
- The CUDA image is massive, but it can't be helped
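As a rough illustration of the new format, a service definition now pulls in a hardware profile via `extends`. This is a minimal sketch; the service name shown is an assumption to check against the shipped `hwaccel.transcoding.yml`:

```yaml
# Minimal sketch of the new extends-based wiring in docker-compose.yml.
# The service name "nvenc" is illustrative; check hwaccel.transcoding.yml
# for the profiles that actually ship.
services:
  immich-microservices:
    extends:
      file: hwaccel.transcoding.yml
      service: nvenc  # pick the profile matching your hardware
    # ...the rest of the service definition is unchanged
```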
Edit: I'm removing TensorRT support as it's slow to load and uses much more RAM than normal CUDA.
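To make the runtime check concrete, here is a minimal sketch of provider selection with onnxruntime; the preference order and model path are illustrative, not the PR's exact logic:

```python
import onnxruntime as ort

# onnxruntime reports the execution providers compiled into this build.
available = ort.get_available_providers()

# Prefer an accelerated provider when present, falling back to CPU.
# This ordering is illustrative, not Immich's exact logic.
preference = ["CUDAExecutionProvider", "OpenVINOExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preference if p in available]

# "model.onnx" is a placeholder path for illustration.
session = ort.InferenceSession("model.onnx", providers=providers)
print("Using providers:", session.get_providers())
```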
How has this been tested?
I ran the CPU, CUDA and OpenVINO variants, confirmed successful responses for each task when querying with Postman, and confirmed the CUDA and OpenVINO variants were running on GPU. For OpenVINO, I tested on Linux, but also included the WSL2 configuration recommended by Intel for the OpenVINO image it uses.
While testing, I also ended up increasing test coverage from 72% to 80%.
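For a quick smoke test outside Postman, something like the following can check that the service is up; the port and `/ping` route here are assumptions based on a default setup, not a documented contract:

```python
# Hypothetical smoke test against the ML service. The default port (3003)
# and the /ping health route are assumptions; adjust for your deployment.
import requests

resp = requests.get("http://localhost:3003/ping", timeout=5)
print(resp.status_code, resp.text)  # expect 200 if the service is healthy
```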