
feat(ml): CUDA acceleration and ONNX compilation #2574

Closed · wants to merge 11 commits

Conversation

@mertalev (Contributor) commented May 26, 2023

Following the great work from #2563, I compiled the models to ONNX and cleaned up the Dockerfile and Python dependencies. I haven't done any serious profiling yet, but it seems to do around 20-25 images/s on a 3080 for image classification and CLIP.

Some things to note:

  • The first launch will be slower than successive ones. This is because each model is first compiled into ONNX and saved into the cache directory, at which point it can be loaded very quickly (see the sketch after this list).
  • The image size is optimized by using a non-CUDA base image and instead relying on the built-in CUDA libraries packaged with torch. This means that torch must be imported before onnxruntime when using CUDA.
  • CPU and CUDA can be targeted by setting the device build arg to cpu and cuda, respectively.
  • This PR also removes the object detection model and endpoint per the discussion in [Feature] ML improvements #2524. The default image classification model is also now significantly better.
  • The score threshold for classification is now an environment variable (ML_MIN_TAG_SCORE).
  • The score threshold for facial recognition is now an environment variable (ML_MIN_FACE_SCORE).
  • The lack of batching is a performance handicap since it means more overhead. GPU utilization fluctuates instead of staying consistently high, especially for facial recognition.
  • The image sizes are about 3.2 GB for the CPU version and about 8.5 GB for the CUDA version.
  • VRAM utilization is still unoptimized. Ideally, one model would be loaded at a time and unloaded from the GPU when the job is complete.
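
As a rough illustration of the compile-and-cache flow described above (not the PR's actual code; `CACHE_DIR`, its default, and `load_session` are hypothetical names):

```python
import os

# torch must be imported before onnxruntime so that torch's bundled CUDA libraries are found first
import torch
import onnxruntime as ort

CACHE_DIR = os.getenv("MACHINE_LEARNING_CACHE_FOLDER", "/cache")  # hypothetical env var and default


def load_session(model: torch.nn.Module, name: str, dummy_input: torch.Tensor) -> ort.InferenceSession:
    """Export the model to ONNX on first launch; later launches load the cached file directly."""
    onnx_path = os.path.join(CACHE_DIR, f"{name}.onnx")
    if not os.path.exists(onnx_path):
        # First launch: compile the PyTorch model to ONNX and save it to the cache directory.
        torch.onnx.export(model, dummy_input, onnx_path)
    providers = (
        ["CUDAExecutionProvider", "CPUExecutionProvider"]
        if torch.cuda.is_available()
        else ["CPUExecutionProvider"]
    )
    return ort.InferenceSession(onnx_path, providers=providers)
```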

Edit: Now with automatic model unloading! The default is to unload a model if it hasn't been used in 300 seconds, and this number can be overridden with the ML_MODEL_TTL env variable. Along with this is a flag to disable eager startup (ML_EAGER_STARTUP) so models are only loaded when needed.
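
A minimal sketch of what the TTL-based unloading could look like, assuming the ML_MODEL_TTL and ML_EAGER_STARTUP variables mentioned above; the class and its defaults are illustrative, not the PR's implementation:

```python
import asyncio
import os
import time

MODEL_TTL = int(os.getenv("ML_MODEL_TTL", "300"))  # idle seconds before a model is unloaded
EAGER_STARTUP = os.getenv("ML_EAGER_STARTUP", "true") == "true"  # assumed default


class CachedModel:
    """Illustrative wrapper: lazily loads a model and drops it after MODEL_TTL idle seconds."""

    def __init__(self, load_fn) -> None:
        self._load_fn = load_fn
        self._model = None
        self._last_used = 0.0
        if EAGER_STARTUP:
            self._model = self._load_fn()

    def get(self):
        if self._model is None:
            self._model = self._load_fn()  # lazy load when eager startup is disabled
        self._last_used = time.monotonic()
        return self._model

    async def expire_loop(self) -> None:
        # Periodically drop the reference so (V)RAM can be reclaimed once the job queue goes quiet.
        while True:
            await asyncio.sleep(MODEL_TTL)
            if self._model is not None and time.monotonic() - self._last_used > MODEL_TTL:
                self._model = None
```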


@alextran1502 (Contributor) commented:

When the PR is ready, please feel free to request review

@alextran1502 (Contributor) commented:

I believe this PR would supersede #2563, correct?

@mertalev (Contributor, Author) commented:

Sure thing! I'm working on unloading models automatically, should be ready very soon.

I believe this PR would supersede #2563, correct?

Basically yes, since the scope of this PR is wider and makes other improvements.

@mertalev (Contributor, Author) commented:

Hi @alextran1502, everything seems to work!

@bo0tzz (Member) commented May 31, 2023

You renamed a bunch of env vars, which is a backwards incompatible change. Can you revert that part?

@mertalev (Contributor, Author) commented:

The names were getting a bit long 😅, but yes I can revert that.

@jrasm91 (Contributor) commented May 31, 2023

Man, I think the direction this is going is great, but the number of things changing here is a lot, and many of them aren't even related to CUDA support, which is what I thought this PR was for.

Specifically, it looks like we've:

  • Added types to the python code
  • Worked on model unloading
  • Changed machine learning url endpoints
  • Changed the environment variable names
  • Changed how models are instantiated, presumably to enable HW support
  • Changed how logging works (not using the immich available LOG_LEVEL env variable either)
  • Added lots of comments, which I'm usually not a fan of, since that often means the code isn't very well written
  • Added new env variables to control the face score threshold
  • Defaulted the logger to use colors, which probably is not ideal. Does it support the NO_COLOR env?
  • Deleted the detect objects job.
  • Changed the machine learning interface from thumbnailPath to imagePath

I'm not necessarily against any one of these changes, but every single one could also be a separate PR, which would be easier to review, test, and revert if necessary. Put them all together, and it just ends up being a lot.

@mertalev (Contributor, Author) commented May 31, 2023

Hmm, you do have a point that it's a lot of changes. I might have gotten a bit overzealous. Since this PR makes major changes to how models are loaded and run, I wanted to make it easier to maintain and understand why something is done. But some of the changes aren't necessarily related to CUDA acceleration, just things I noticed and decided to change.

I can shrink the PR a bit. For the things you listed, I'll share my thought process and bucket them into things that are directly within scope and things that could be extracted into a separate PR (or just removed).

Within scope:

Changed how models are instantiated, presumably to enable HW support

  • Namesake of the PR.

Worked on model unloading

  • Not critical, but it's effectively a memory leak to never unload the models. With GPU support added and VRAM being a limited resource, it's a good time to address it.

Could be removed:

Added types to the python code

  • This is to make it more explicit what the models are expected to work with and output, but this can be moved to a separate PR, possibly synchronizing it with the API generator.

Changed machine learning url endpoints

  • This change is because sentence-transformers is removed as a dependency, but it's inconsequential to keep the endpoints as-is.

Changed how logging works (not using the immich available LOG_LEVEL env variable either)
Defaulted the logger to use colors, which probably is not ideal. Does it support the NO_COLOR env?

  • I actually didn't know about the LOG_LEVEL and NO_COLOR env variables; it makes sense to use them. This can be refined and made into a PR later (see the sketch below).
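
For reference, honoring both variables with Python's standard logging could look roughly like this (a sketch, not immich's actual setup; the format strings are placeholders):

```python
import logging
import os

# LOG_LEVEL follows immich's existing convention; NO_COLOR follows https://no-color.org
# (any non-empty value disables colored output).
level = os.getenv("LOG_LEVEL", "INFO").upper()
use_color = not os.environ.get("NO_COLOR")

fmt = "\x1b[36m%(levelname)s\x1b[0m %(message)s" if use_color else "%(levelname)s %(message)s"
logging.basicConfig(level=level, format=fmt)
```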

Added lots of comments, which I'm usually not a fan of, since that often means the code isn't very well written

  • Haha, there are definitely some aspects of the code I'd like to improve, but overall the comments give more context about not just what the code does but why it's doing it. The docstrings are more of a formality, so I could remove those. I could also add documentation to a README.md or somewhere else instead.

Added new env variables to control the face score threshold

  • I'm not a fan of hardcoded values, but you're totally right that this has nothing to do with the PR topic.

Changed the machine learning interface from thumbnailPath to imagePath

  • From the perspective of the server, it doesn't care if an image is a thumbnail or not as long as it's an image. Again though, you're right that this isn't necessarily on-topic.

Debatable:

Deleted the detect objects job.

  • This one is tricky. I could re-add the object detection model, but it seems the direction we're going is to remove it anyway so I'd rather not spend time on getting it to the same standard as the other endpoints.

Let me know what you think and which changes you think I should keep for this PR, and which are good but should be part of a different PR.

@jrasm91 (Contributor) commented May 31, 2023

I totally get getting carried away. I almost always end up refactoring, cleaning up, and renaming things as I add new features. I'm a big fan of splitting them into separate PRs as much as possible.

Ideally (1), I would recommend making separate PRs for the logging changes, the cache / model unloading changes, the thumbnail path change, deleting the object detection job, and the new env variables. You can rebase this branch after they're merged and add the HW support last.

If that sounds like too much work, then (2) we can just limit the scope of this PR and merge it mostly as is. I'd prefer (1) and think those PRs could be tested and merged pretty quickly. Otherwise, with (2), we will probably have a bit of back and forth until everything is sorted out and ready to be merged.

@mertalev (Contributor, Author) commented:

I'm fine with (1) since those changes are easy to extract out into separate PRs.

@samip5 (Contributor) commented Jun 11, 2023

Could something similar be done for non-CUDA hardware, or does this work with AMD GPUs too?

@mertalev (Contributor, Author) commented:

This PR only supports CUDA, so no AMD support yet. But ONNX Runtime supports AMD's ROCm, so it can be added in another PR without too many changes.
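
ONNX Runtime selects its hardware backend through execution providers, so an AMD build would mainly need the ROCm-enabled onnxruntime package and a different provider list. A sketch under those assumptions (the device strings and helper name are illustrative; ROCMExecutionProvider is ONNX Runtime's provider name):

```python
import onnxruntime as ort


def providers_for(device: str) -> list[str]:
    # Always list CPUExecutionProvider last as a fallback.
    if device == "cuda":
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    if device == "rocm":
        return ["ROCMExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


# "model.onnx" stands in for a previously exported model file.
session = ort.InferenceSession("model.onnx", providers=providers_for("rocm"))
```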

@mertalev (Contributor, Author) commented:

Closing this as it's very stale. This addition can be reintroduced with a new PR that's much smaller.

@mertalev closed this Jul 19, 2023
@MrModest commented Jan 3, 2024

@mertalev if you created new PRs, could you please link them here, so people can follow the progress of the feature? :)

@mertalev (Contributor, Author) commented Jan 3, 2024

Hi! The current PR for it is #5619.
