Recent additions to diffusers introduced `BitsAndBytesConfig` and `TorchAoConfig` options that can be passed as `quantization_config` when loading model components with `from_pretrained`.
For example:

```py
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

quantization_config = BitsAndBytesConfig(load_in_4bit=True)  # e.g. 4-bit weights; any BnB options work here
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
```
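For completeness, the TorchAO path follows the same pattern (a minimal sketch; `"int8wo"` is the int8 weight-only quant type, swap in whichever torchao quant type applies):

```py
from diffusers import SD3Transformer2DModel, TorchAoConfig

quantization_config = TorchAoConfig("int8wo")  # int8 weight-only, as one example
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
```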
The ask is to also support Hugging Face's own Optimum Quanto.
Right now it is possible to use Quanto, but only as post-load on-demand quantization; there is no option to use it like BnB or TorchAO, where quantization is applied automatically during the load itself.
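For reference, this is roughly what the current post-load workflow looks like with `optimum.quanto`'s `quantize`/`freeze` API (a minimal sketch; int8 weights chosen just for illustration):

```py
from diffusers import SD3Transformer2DModel
from optimum.quanto import freeze, qint8, quantize

# load in full precision first, then quantize in place afterwards
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer")
quantize(transformer, weights=qint8)  # mark modules for weight quantization
freeze(transformer)  # materialize the quantized weights
```

The request is for the same load-time pattern as BnB/TorchAO, i.e. something along these lines (`QuantoConfig` is a hypothetical name for the proposed config class, not an existing diffusers API):

```py
# hypothetical API, mirroring BitsAndBytesConfig/TorchAoConfig usage
quantization_config = QuantoConfig(weights="int8")
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
```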
@yiyixuxu @sayakpaul @DN6 @asomoza