Support serialization and deserialization of diffusers modules #252

Closed
sayakpaul opened this issue Jul 23, 2024 · 3 comments · Fixed by #255

Comments

@sayakpaul
Member

With transformer-based models becoming the de facto choice in the diffusion community, I think it makes sense to support saving and loading quantized diffusers models through optimum.quanto.

I did a quick PoC. Load the original model:

from diffusers import PixArtTransformer2DModel

model = PixArtTransformer2DModel.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", subfolder="transformer")

Then quantize and freeze with FP8:

from optimum.quanto import quantize, qfloat8, freeze

quantize(model, qfloat8)
freeze(model)

Finally, serialize:

import os
import json
from optimum.quanto.quantize import quantization_map

save_directory = "."
model.save_pretrained(save_directory)
# Save quantization map to be able to reload the model
qmap_name = os.path.join(save_directory, "diffusers.json")
qmap = quantization_map(model)
with open(qmap_name, "w", encoding="utf8") as f:
    json.dump(qmap, f, indent=4)

Loading logic shouldn't vary too much from what is here already:

def from_pretrained(cls, model_name_or_path: Union[str, os.PathLike]):

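For reference, here is a minimal reload sketch. It assumes optimum.quanto's requantize() and accelerate's init_empty_weights() are used, and that save_pretrained() above wrote the default diffusion_pytorch_model.safetensors file; the file names and device are only illustrative:

import json
import os

import torch
from accelerate import init_empty_weights
from diffusers import PixArtTransformer2DModel
from optimum.quanto import requantize
from safetensors.torch import load_file

save_directory = "."

# Build the model skeleton on the meta device so no full-precision weights
# are materialized.
with init_empty_weights():
    model = PixArtTransformer2DModel.from_config(
        PixArtTransformer2DModel.load_config(save_directory)
    )

state_dict = load_file(os.path.join(save_directory, "diffusion_pytorch_model.safetensors"))
with open(os.path.join(save_directory, "diffusers.json"), encoding="utf8") as f:
    qmap = json.load(f)

# Recreate the quantized modules from the map, then load the serialized weights.
requantize(model, state_dict, qmap, device=torch.device("cuda"))
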
If there's interest, I can open a PR soon.

Pinging @SunMarc in case he has any comments.

@dacorvo
Collaborator

dacorvo commented Jul 23, 2024

@sayakpaul yes, that would be tremendously helpful if you could submit a pull request.

I had started something myself, but was stuck because:

  • not all pipeline models were transformers models,
  • I did not know how to load the pipeline submodels on the meta device when using DiffusionPipeline.from_pretrained.

The second point is very important when you reload the quantized model on a smaller device, and this is how QuantizedTransformerModel works.

I thought I could load them first individually using QuantizedTransformerModel then pass them to the pipeline on init, but maybe you have a better idea.
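
For illustration, that idea could look roughly like this. This is only a sketch: load_quantized_transformer is a hypothetical placeholder for whatever reload helper ends up existing (for example the requantize-based snippet above), and the pipeline/model ids are just examples:

import torch
from diffusers import PixArtAlphaPipeline

# Hypothetical helper: reload the quantized transformer on its own first.
transformer = load_quantized_transformer("path/to/quantized/transformer")

# Passing the component explicitly keeps from_pretrained from loading the
# full-precision transformer from the hub.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    transformer=transformer,
    torch_dtype=torch.float16,
)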

@sayakpaul
Member Author

@dacorvo thanks for welcoming the idea.

A DiffusionPipeline is not an nn.Module. It consists of multiple models that are nn.Modules. We have our own ModelMixin, which is similar to PreTrainedModel in transformers but not identical.

A good first step would be to support saving and loading for the ModelMixin class of diffusers. Once this is done, we can start thinking about how to do the same at the pipeline level. The workflow for that would be:

  • First have the quantized variants of the individual models of a pipeline.
  • Initialize the appropriate pipeline with the quantized models (StableDiffusionPipeline, for example).
  • Save the pipeline with save_pretrained(), which will save its components, i.e., the models and the scheduler (not an nn.Module).
  • When calling from_pretrained() at the pipeline level, we then just have to detect, for each model, whether it's quantized and, if so, call the quanto integration and delegate accordingly.

This workflow should be relatively easy to integrate.
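
A rough sketch of the saving side of that workflow (the model id is only an example, and it assumes pipeline save_pretrained() can serialize the quanto-frozen components, which is exactly what the PR needs to enable):

import torch
from diffusers import StableDiffusionPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Quantize only the nn.Module components; the scheduler is left untouched.
for component in (pipe.unet, pipe.text_encoder):
    quantize(component, qfloat8)
    freeze(component)

# save_pretrained() serializes every component; the quantized ones would also
# need their quantization maps saved, as in the PoC above.
pipe.save_pretrained("sd15-qfloat8")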

But one step at a time. I will work on the ModelMixin PR and submit it here for your review.

Anything you would like to add here before I start the PR?

@dacorvo
Collaborator

dacorvo commented Jul 23, 2024

@sayakpaul it is still unclear to me how you will avoid the submodels being loaded in full precision on the device at least once before being requantized when using from_pretrained, but I trust you on this: you're the diffusers expert.

This issue was closed.