Support serialization and deserialization of `diffusers` modules #252
@sayakpaul yes, that would be tremendously helpful if you could submit a pull request. I had started something myself, but was stuck because: […]
The second point is very important when you reload the quantized model on a smaller device, and this is how I thought I could load them first individually using […]
@dacorvo thanks for welcoming the idea. A good first step would be to have saving and loading supported for the […]
This workflow should be relatively easy to integrate. But one step at a time. I will work on the […] Anything you would like to add here before I start the PR?
@sayakpaul it is still unclear to me how you will avoid the submodels being loaded in full precision at least once on the device before being requantized when using […]
With `transformer`-based models becoming the de facto choice for the diffusion community, I think it makes sense to provide support for saving and loading the quantized models in `diffusers` through `optimum.quanto`.

I did a quick PoC. Load the original model:
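A minimal sketch of that step; the model class and checkpoint are illustrative, any `diffusers` transformer would do:

```python
import torch
from diffusers import PixArtTransformer2DModel

# Illustrative checkpoint; any transformer-based diffusers denoiser loads the same way.
model = PixArtTransformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    subfolder="transformer",
    torch_dtype=torch.float16,
)
```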
Then quantize and freeze with FP8:
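With `optimum.quanto` this is the usual quantize/freeze pair, using `qfloat8` for FP8 weights:

```python
from optimum.quanto import freeze, qfloat8, quantize

# Swap the weights for FP8 quantized versions, then freeze to drop the
# full-precision originals.
quantize(model, weights=qfloat8)
freeze(model)
```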
Finally serialize:
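Following the pattern quanto already documents for `transformers` models, the state dict can go into a `safetensors` file with the quantization map saved alongside it (file names here are arbitrary):

```python
import json

from optimum.quanto import quantization_map
from safetensors.torch import save_file

# Quantized weights plus a map recording how each module was quantized,
# which is needed to reconstruct the model later.
save_file(model.state_dict(), "transformer-fp8.safetensors")
with open("quantization_map.json", "w") as f:
    json.dump(quantization_map(model), f)
```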
Loading logic shouldn't vary too much from what is here already:
optimum-quanto/optimum/quanto/models/transformers_models.py, line 143 at 95c079f
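For illustration, a hedged sketch of what the `diffusers` counterpart might look like, mirroring that `transformers` loading path: build the model on the meta device so full-precision weights are never materialized, then `requantize` it from the saved state dict and map (the config-loading calls lean on `diffusers`' `ConfigMixin` and are assumptions, not a final API):

```python
import json

import torch
from diffusers import PixArtTransformer2DModel
from optimum.quanto import requantize
from safetensors.torch import load_file

state_dict = load_file("transformer-fp8.safetensors")
with open("quantization_map.json") as f:
    qmap = json.load(f)

# Instantiate an empty shell on the meta device so no full-precision
# weights are ever allocated on the target device.
config = PixArtTransformer2DModel.load_config(
    "PixArt-alpha/PixArt-XL-2-1024-MS", subfolder="transformer"
)
with torch.device("meta"):
    model = PixArtTransformer2DModel.from_config(config)

# Re-create the quantized modules and load the serialized weights.
requantize(model, state_dict, qmap, device=torch.device("cuda"))
```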
If there's interest, I can open a PR soon.
Pinging @SunMarc in case he has any comments.