enable more ONNX optimizations #241
Running this ahead of time breaks the additional network blending (#264), but the memory difference is incredible:
Using the optimization script without the
After copying over a working text encoder, it produces a model that is the same size on disk:
but uses substantially less memory during inference:
Using the
Trying to run a normal fp32 model fails with:
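For reference, a minimal sketch of the kind of ahead-of-time pass described above, assuming a recent onnxruntime whose transformers optimizer knows the `unet` model type; the paths here are placeholders, not the project's actual layout:

```python
# Hedged sketch: run the ORT transformers optimizer over the UNet once,
# ahead of time, and save the fused graph back to disk. Paths are
# placeholders; model_type="unet" assumes a recent onnxruntime release.
from onnxruntime.transformers.optimizer import optimize_model

opt = optimize_model(
    "unet/model.onnx",   # hypothetical input path
    model_type="unet",   # enables the Stable Diffusion UNet fusions
    opt_level=0,         # apply only the Python-side fusions here;
                         # leave ORT's graph-level passes to the runtime
    use_gpu=False,
)
# External data format keeps the >2 GB UNet weights out of the protobuf.
opt.save_model_to_file("unet-optimized/model.onnx", use_external_data_format=True)
```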
My take on memory:
I do use some parts of ORT transformers for optimisations, but limited to things that are not targeted at specific hardware (the full optimisation in ORT Transformers is CUDA-specific). I would have to recheck ORT transformers to see whether more generic ONNX optimisations, rather than CUDA optimisations, have been added; if not, I can't adopt them. There's just too little interest in CUDA-specific ONNX for Stable Diffusion.
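For illustration, one way to keep only the hardware-neutral fusions is through `FusionOptions`. This is a sketch under the assumption that the flags below exist in the installed onnxruntime, since the attribute set has changed between releases:

```python
# Hedged sketch: disable the fusions that emit CUDA-only contrib ops
# (NhwcConv, GroupNorm, packed attention) so the optimized UNet still
# loads under the CPU execution provider. Attribute names assume a
# recent onnxruntime and may differ in other versions.
from onnxruntime.transformers.fusion_options import FusionOptions
from onnxruntime.transformers.optimizer import optimize_model

options = FusionOptions("unet")
options.enable_nhwc_conv = False    # NhwcConv has no CPU kernel
options.enable_group_norm = False   # GroupNorm contrib op targets CUDA
options.enable_packed_qkv = False   # packed attention is CUDA-only
options.enable_packed_kv = False

opt = optimize_model(
    "unet/model.onnx",              # placeholder path
    model_type="unet",
    optimization_options=options,
    use_gpu=False,
)
opt.save_model_to_file("unet-cpu/model.onnx", use_external_data_format=True)
```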
Neat, thanks. I'm curious to see how low I can get it, and CPU offloading is the next thing I need to work on. Some of the optimizations in ORT are CUDA-specific and/or incompatible with the CPU provider, notably fp16, but they've also been adding a lot of node folding and graph optimization passes recently, and I think those are generic. If I'm reading https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/optimizer.py right, https://huggingface.co/docs/optimum/v1.7.1/en/onnxruntime/usage_guides/optimization has a lot of the node folding as well, and only becomes CUDA-specific at
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/README.md#model-optimizer
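On the generic side, onnxruntime's own graph-level passes (constant folding, redundant node elimination) are hardware-independent at the basic level and can also be run ahead of time and serialized. A sketch, with a placeholder output path:

```python
# Sketch: serialize ORT's hardware-independent graph optimizations ahead
# of time. ORT_ENABLE_BASIC keeps the saved model portable; the extended
# and "all" levels can bake in execution-provider-specific fused ops.
import onnxruntime as ort

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
so.optimized_model_filepath = "unet-folded/model.onnx"  # placeholder output

# Building the session runs the passes and writes the optimized graph.
ort.InferenceSession("unet/model.onnx", so, providers=["CPUExecutionProvider"])
```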