feat(ml): CUDA acceleration and ONNX compilation #2574
Conversation
When the PR is ready, please feel free to request review.
I believe this PR would supersede #2563, correct?
Sure thing! I'm working on unloading models automatically, should be ready very soon.
Basically yes, since the scope of this PR is wider and makes other improvements.
Hi @alextran1502, everything seems to work!
You renamed a bunch of env vars, which is a backwards-incompatible change. Can you revert that part?
The names were getting a bit long 😅, but yes I can revert that. |
Man, I think the direction this is going is great, but the number of things changing is a lot, many of which aren't even related to CUDA support, which is what I thought this PR was for. Specifically, it looks like we've:
I'm not necessarily against any one of these changes, but every single one could also be a separate PR, which would be easy to review, test, and revert if necessary. Putting them all together, it just ends up being a lot.
Hmm, you do have a point that it's a lot of changes. I might have gotten a bit overzealous. Since this PR makes major changes to how models are loaded and run, I wanted to make it easier to maintain and understand why something is done. But some of the changes aren't necessarily related to CUDA acceleration, just things I noticed and decided to change. I can shrink the PR a bit. For the things you listed, I'll share my thought process on each and bucket them into things that are more directly within scope and things that could be extracted into a separate PR (or just removed). Within scope:
Could be removed:
Haha there are definitely some aspects of the code I'd like to improve, but overall the comments are to give more context about not just what the code does but why it's doing it. The docstrings are more of a formality so I could remove those. I could also add documentation to a README.md or somewhere else instead.
Debatable:
Let me know what you think and which changes you think I should keep for this PR, and which are good but should be part of a different PR.
I totally get getting carried away. I almost always end up refactoring, cleaning up, and renaming things as I add new features. I'm a big fan of splitting them into separate PRs as much as possible. Ideally, I would recommend making separate PRs for the logging changes, cache / model unloading changes, thumbnail path change, delete object detection job, and new env variables. You can rebase this branch after they're merged and add the HW support last. If that sounds like too much work, I guess we can just limit the scope of this PR and merge it mostly as is. I'd prefer (1) and think the PRs could be tested and merged pretty quickly. Otherwise with (2) we will probably have a bit of back and forth until everything is sorted out and ready to be merged.
I'm fine with (1) since those changes are easy to extract out into separate PRs. |
Could something similar potentially be done for non-CUDA cards, or does this work with AMD GPUs too?
This PR only supports CUDA, so no AMD support yet. But ONNX Runtime supports AMD's ROCm, so it can be added in another PR without too many changes. |
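For context, ONNX Runtime picks its hardware backend from an ordered list of execution providers, so adding ROCm support is largely a matter of passing a different provider list. A minimal sketch of how a provider list could be derived from a device setting; the `pick_providers` helper and its fallback behavior are assumptions for illustration, though the provider names are real ONNX Runtime identifiers:

```python
def pick_providers(device: str) -> list[str]:
    """Map a device string to an ordered ONNX Runtime provider list.

    Hypothetical helper: ONNX Runtime tries providers in order, so the
    CPU provider is always appended as a fallback.
    """
    providers = {
        "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
        "rocm": ["ROCMExecutionProvider", "CPUExecutionProvider"],
        "cpu": ["CPUExecutionProvider"],
    }
    # Unknown devices fall back to CPU-only execution.
    return providers.get(device.lower(), ["CPUExecutionProvider"])
```

The resulting list would be passed to `onnxruntime.InferenceSession(model_path, providers=...)`, which is how ORT is told which backends to try.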
Closing this as it's very stale. This addition can be reintroduced with a new PR that's much smaller. |
@mertalev if you created new PRs, could you please link them here, so people can follow the progress of the feature? :) |
Hi! The current PR for it is #5619. |
Following the great work from #2563, I compiled the models to ONNX and cleaned up the Dockerfile and Python dependencies. I haven't done any serious profiling yet, but it seems to do around 20-25 images/s on a 3080 for image classification and CLIP.
Some things to note:
- … `torch`. This means that `torch` must be imported before `onnxruntime` when using CUDA.
- … `device` build arg to `cpu` and `cuda`, respectively.
- … (`ML_MIN_TAG_SCORE`).
- … (`ML_MIN_FACE_SCORE`).

Edit: Now with automatic model unloading! The default is to unload a model if it hasn't been used in 300 seconds, and this number can be overridden with the `ML_MODEL_TTL` env variable. Along with this is a flag to disable eager startup (`ML_EAGER_STARTUP`) so models are only loaded when needed.
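The TTL-based unloading described in the edit above can be sketched as a small cache that drops its model reference once it has sat idle past the TTL. This is a hypothetical illustration rather than the PR's actual implementation; `loader` stands in for whatever creates the ONNX Runtime session:

```python
import threading
import time


class ModelCache:
    """Unload a model that hasn't been used within `ttl` seconds.

    Minimal sketch: `loader` is a hypothetical zero-arg callable that
    loads the model; real code would wrap an ONNX Runtime session.
    """

    def __init__(self, loader, ttl: float = 300.0):
        self.loader = loader
        self.ttl = ttl
        self._model = None
        self._last_used = 0.0
        self._lock = threading.Lock()

    def get(self):
        # Lazily load on first use and refresh the idle timestamp.
        with self._lock:
            if self._model is None:
                self._model = self.loader()
            self._last_used = time.monotonic()
            return self._model

    def evict_if_idle(self) -> bool:
        # Call periodically from a background thread or scheduler.
        with self._lock:
            idle = time.monotonic() - self._last_used
            if self._model is not None and idle > self.ttl:
                self._model = None  # drop the reference so it can be GC'd
                return True
            return False
```

In a real service, `ML_MODEL_TTL` would feed the `ttl` argument, and disabling eager startup simply means nothing calls `get()` until the first request arrives.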