Recent additions to diffusers introduced `BitsAndBytesConfig` and `TorchAoConfig` options that can be passed as `quantization_config` when loading model components with `from_pretrained`.
For example:

```py
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

quantization_config = BitsAndBytesConfig(load_in_4bit=True)  # e.g. 4-bit weights; any BnB options work here
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
```
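For completeness, the TorchAO path follows the same pattern (a minimal sketch; `"int8wo"` is the int8 weight-only quant type, swap in whichever torchao quant type applies):

```py
from diffusers import SD3Transformer2DModel, TorchAoConfig

quantization_config = TorchAoConfig("int8wo")  # int8 weight-only, as one example
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
```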
The ask is to also support Hugging Face's own Optimum Quanto.
Right now it is possible to use Quanto, but only as post-load on-demand quantization; there is no option to use it like BnB or TorchAO, where quantization is applied automatically during the load itself.
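For reference, this is roughly what the current post-load workflow looks like with `optimum.quanto`'s `quantize`/`freeze` API (a minimal sketch; int8 weights chosen just for illustration):

```py
from diffusers import SD3Transformer2DModel
from optimum.quanto import freeze, qint8, quantize

# load in full precision first, then quantize in place afterwards
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer")
quantize(transformer, weights=qint8)  # mark modules for weight quantization
freeze(transformer)  # materialize the quantized weights
```

The request is for the same load-time pattern as BnB/TorchAO, i.e. something along these lines (`QuantoConfig` is a hypothetical name for the proposed config class, not an existing diffusers API):

```py
# hypothetical API, mirroring BitsAndBytesConfig/TorchAoConfig usage
quantization_config = QuantoConfig(weights="int8")
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
```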
@yiyixuxu @sayakpaul @DN6 @asomoza