Support serialization and deserialization of `diffusers` modules #252
@sayakpaul yes, that would be tremendously helpful if you could submit a pull request. I had started something myself, but was stuck because: […]
The second point is very important when you reload the quantized model on a smaller device, and this is how I thought I could load them first individually using […]
@dacorvo thanks for welcoming the idea. A good first step would be to have saving and loading supported for the […]
This workflow should be relatively easy to integrate. But one step at a time. I will work on the […] Anything you would like to add here before I start the PR?
@sayakpaul it is still unclear to me how you will avoid the submodels being loaded in full precision at least once on the device before being requantized when using […]
With `transformer`-based models becoming the de facto choice for the diffusion community, I think it makes sense to provide support for saving and loading the quantized models in `diffusers` through `optimum.quanto`.

I did a quick PoC. Load the original model:
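A minimal sketch of that step; the model class and checkpoint are illustrative, any `diffusers` transformer would do:

```python
import torch
from diffusers import PixArtTransformer2DModel

# Illustrative checkpoint; any transformer-based diffusers denoiser loads the same way.
model = PixArtTransformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    subfolder="transformer",
    torch_dtype=torch.float16,
)
```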
Then quantize and freeze with FP8:
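With `optimum.quanto` this is the usual quantize/freeze pair, using `qfloat8` for FP8 weights:

```python
from optimum.quanto import freeze, qfloat8, quantize

# Swap the weights for FP8 quantized versions, then freeze to drop the
# full-precision originals.
quantize(model, weights=qfloat8)
freeze(model)
```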
Finally serialize:
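Following the pattern quanto already documents for `transformers` models, the state dict can go into a `safetensors` file with the quantization map saved alongside it (file names here are arbitrary):

```python
import json

from optimum.quanto import quantization_map
from safetensors.torch import save_file

# Quantized weights plus a map recording how each module was quantized,
# which is needed to reconstruct the model later.
save_file(model.state_dict(), "transformer-fp8.safetensors")
with open("quantization_map.json", "w") as f:
    json.dump(quantization_map(model), f)
```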
Loading logic shouldn't vary too much from what is here already:
optimum-quanto/optimum/quanto/models/transformers_models.py, line 143 at 95c079f
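For illustration, a hedged sketch of what the `diffusers` counterpart might look like, mirroring that `transformers` loading path: build the model on the meta device so full-precision weights are never materialized, then `requantize` it from the saved state dict and map (the config-loading calls lean on `diffusers`' `ConfigMixin` and are assumptions, not a final API):

```python
import json

import torch
from diffusers import PixArtTransformer2DModel
from optimum.quanto import requantize
from safetensors.torch import load_file

state_dict = load_file("transformer-fp8.safetensors")
with open("quantization_map.json") as f:
    qmap = json.load(f)

# Instantiate an empty shell on the meta device so no full-precision
# weights are ever allocated on the target device.
config = PixArtTransformer2DModel.load_config(
    "PixArt-alpha/PixArt-XL-2-1024-MS", subfolder="transformer"
)
with torch.device("meta"):
    model = PixArtTransformer2DModel.from_config(config)

# Re-create the quantized modules and load the serialized weights.
requantize(model, state_dict, qmap, device=torch.device("cuda"))
```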
If there's interest, I can open a PR soon.
Pinging @SunMarc in case he has any comments.