
Checkpoint conversion script from Diffusers => Stable Diffusion (CompVis) #701

Merged: 8 commits into huggingface:main on Oct 4, 2022

Conversation

@jachiam (Contributor) commented on Oct 2, 2022

I added a script that converts from the Diffusers save format to the Stable Diffusion checkpoint format.

Some notes: it only handles the UNet, the VAE, and the Text Encoder. Nothing else. No optimization state is preserved.
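For readers skimming the thread, here is a rough sketch of the overall shape such a conversion can take; this is not the actual script. The per-layer key renames that do the real work are stubbed out, and the paths and helper names are hypothetical. The CompVis prefixes shown (model.diffusion_model, first_stage_model, cond_stage_model.transformer) are the ones Stable Diffusion v1 checkpoints use.

```python
# Rough sketch only: load the three Diffusers submodule weight files, remap
# their key names to the CompVis layout, and save everything under a single
# "state_dict" as CompVis-style checkpoints expect.
import torch

def convert_unet_keys(sd):
    # Stub: the real script performs detailed per-layer key renames here.
    return sd

def convert_vae_keys(sd):
    # Stub: the real script performs detailed per-layer key renames here.
    return sd

model_path = "path/to/diffusers/model"  # hypothetical input directory
unet = torch.load(f"{model_path}/unet/diffusion_pytorch_model.bin", map_location="cpu")
vae = torch.load(f"{model_path}/vae/diffusion_pytorch_model.bin", map_location="cpu")
text_enc = torch.load(f"{model_path}/text_encoder/pytorch_model.bin", map_location="cpu")

# Each submodule is nested under its characteristic CompVis prefix.
state_dict = {}
state_dict.update({"model.diffusion_model." + k: v for k, v in convert_unet_keys(unet).items()})
state_dict.update({"first_stage_model." + k: v for k, v in convert_vae_keys(vae).items()})
state_dict.update({"cond_stage_model.transformer." + k: v for k, v in text_enc.items()})

torch.save({"state_dict": state_dict}, "model.ckpt")
```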

I think it works, though I'm not ten million percent sure. It works locally in the sense that, after a conversion, every key in the new checkpoint's state_dict matches a corresponding key in a Stable Diffusion model; I can load the checkpoint, and an image generated with it looks basically right. I have not tested this beyond generating two images.
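For concreteness, the key-matching check described above can be done with a short snippet along these lines (the file names are hypothetical; any reference Stable Diffusion checkpoint would do):

```python
import torch

# Compare the converted checkpoint's keys against a reference SD checkpoint.
converted = torch.load("model.ckpt", map_location="cpu")["state_dict"]
reference = torch.load("sd-v1-4.ckpt", map_location="cpu")["state_dict"]

missing = set(reference) - set(converted)
extra = set(converted) - set(reference)
print(f"missing keys: {sorted(missing)}")
print(f"unexpected keys: {sorted(extra)}")

# Shapes should match too, not just key names.
for k in set(reference) & set(converted):
    assert converted[k].shape == reference[k].shape, k
```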

This would solve Issue #672.

Looking for advice on what would constitute a stronger test here and make this a passable script.

@HuggingFaceDocBuilderDev commented on Oct 2, 2022

The documentation is not available anymore as the PR was closed or merged.

@jachiam (Contributor, Author) commented on Oct 2, 2022

A few folks in the Stable Diffusion discord are also reporting that this works for them.

@jachiam changed the title from "Conversion script" to "Checkpoint conversion script from Diffusers => Stable Diffusion (CompVis)" on Oct 2, 2022
@jachiam (Contributor, Author) commented on Oct 2, 2022

Hm, also worth noting: it only handles the Stable Diffusion v1-4 architecture and would not be generally applicable to other architectures.

@hopibel commented on Oct 2, 2022

> Looking for advice on what would constitute a stronger test here

The simplest test would be to do a side-by-side comparison of images from an unconverted CompVis model vs the same model converted to diffusers format and back again using your script, all using the same prompt and seed. They should generate identical images.
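A sketch of that test using the diffusers pipeline, assuming both the original and round-tripped weights are available as Diffusers-format directories (paths and prompt are placeholders); with a fixed seed, the two images should match pixel-for-pixel:

```python
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

def generate(model_dir, prompt, seed):
    # Load the pipeline from a local Diffusers-format directory and sample
    # with a seeded generator so runs are reproducible.
    pipe = StableDiffusionPipeline.from_pretrained(model_dir)
    generator = torch.Generator("cpu").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    return np.asarray(image)

a = generate("original-model", "a photo of an astronaut riding a horse", 42)
b = generate("round-tripped-model", "a photo of an astronaut riding a horse", 42)
print("identical:", np.array_equal(a, b))
```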

@jachiam (Contributor, Author) commented on Oct 3, 2022

Would we consider it acceptable if I just took a Diffusers model, converted it to CompVis format, made an image with the CompVis format, converted back to Diffusers format, converted again to CompVis format, and made a comparison image in CompVis format? TBH, I mostly did this to avoid having to download an original CompVis-format checkpoint file: an hour into the first 1.5-hour download, the internet cut out and the download had to restart. (Yes, I spent ~8 hours coding a script to avoid a 1.5-hour download.)

@hopibel commented on Oct 3, 2022

Should be fine imo. The goal is just to show the conversion is correct by showing it's reversible without affecting the output.

@patil-suraj (Contributor) left a review comment

Thanks a lot for the PR! From a quick look it looks good; I will play around a bit and then we can merge.

setup.py: review comment (resolved)
@patrickvonplaten (Contributor) left a review comment

Cool, looks good to me!

@patrickvonplaten (Contributor) commented

Thanks @jachiam!

@patrickvonplaten merged commit 4ff4d4d into huggingface:main on Oct 4, 2022
prathikr pushed a commit to prathikr/diffusers that referenced this pull request on Oct 26, 2022

Checkpoint conversion script from Diffusers => Stable Diffusion (CompVis) (huggingface#701)

* Conversion script

* ran black

* ran isort

* remove unused import

* map location so everything gets loaded onto CPU before conversion

* ran black again

* Update setup.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
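For context on the map-location commit above: torch.load by default restores tensors onto the device they were saved from, so pinning everything to the CPU lets the conversion run on machines without a GPU. In code:

```python
import torch

# Load all tensors onto the CPU regardless of the device they were saved
# from, so the conversion does not require a GPU.
state_dict = torch.load("diffusion_pytorch_model.bin", map_location="cpu")
```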
@tedliosu commented on Sep 3, 2023

Thank you so much for the work you've put into making this script. Unfortunately, it no longer works, because the weights under the text_encoder, unet, and vae directories are now saved in Hugging Face's new safetensors file format instead of PyTorch's traditional .bin file format. Also, Stable Diffusion frontends like AUTOMATIC1111's stable-diffusion-webui now expect a YAML config file for every custom model, like the one here, so this script will also have to generate at least one additional file for every custom model produced with DreamBooth, if I'm not mistaken.

Please let us know when you may get around to updating this script so that it works with safetensors files too, instead of just the traditional PyTorch .bin files.

@tedliosu commented

> Thank you so much for the work you've put into making this script, but unfortunately now the script no longer works […] Please let us know when you may get around to updating this script so that it works with safetensors files too […]

My utmost apologies; I did not see this commit, which has updated the script so that it works with safetensors files as well.
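For anyone landing here before finding that commit: the newer safetensors-format submodule weights can be read with the safetensors library instead of torch.load, roughly as follows (the file path is an example):

```python
from safetensors.torch import load_file

# load_file returns an ordinary dict of name -> torch.Tensor, ready for the
# same key remapping the script already performs on .bin state dicts.
unet_state_dict = load_file("unet/diffusion_pytorch_model.safetensors", device="cpu")
```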
