
Convert diffusers model weights to the original CompVis ckpt format #672

Closed
apolinario opened this issue Sep 29, 2022 · 21 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@apolinario
Contributor

There is a conversion script that converts a CompVis ckpt to diffusers available here, but the other way around does not exist yet.

As some Stable Diffusion UIs and utilities are built on top of the CompVis codebase, it would be useful to be able to use diffusers models with those tools as well. It would be great to have a community contribution on this!
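For anyone picking this up, the outer shape of the target file is known: a CompVis checkpoint is a torch.save'd dict whose "state_dict" key holds the UNet, VAE, and text-encoder weights under the prefixes model.diffusion_model., first_stage_model., and cond_stage_model.transformer.. A minimal sketch of that outer structure (helper names are hypothetical; a complete script must also remap individual parameter names inside each sub-model, not just add prefixes):

```python
# Sketch of the target layout for a diffusers -> CompVis conversion.
# Helper names are hypothetical; a real script must also rename
# individual attention/resnet parameters, not just prefix keys.

def prefix_keys(state_dict, prefix):
    """Return a copy of state_dict with `prefix` prepended to every key."""
    return {prefix + key: value for key, value in state_dict.items()}

def build_compvis_state_dict(unet_sd, vae_sd, text_encoder_sd):
    """Assemble the three diffusers sub-model state dicts into the
    single flat dict that CompVis .ckpt files store under "state_dict"."""
    sd = {}
    sd.update(prefix_keys(unet_sd, "model.diffusion_model."))
    sd.update(prefix_keys(vae_sd, "first_stage_model."))
    sd.update(prefix_keys(text_encoder_sd, "cond_stage_model.transformer."))
    return {"state_dict": sd}
```

The assembled dict would then be written with torch.save to produce the .ckpt file.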

@apolinario apolinario added enhancement New feature or request good first issue Good for newcomers labels Sep 29, 2022
@rashmimarganiatgithub
Contributor

@patrickvonplaten, can I take this or is someone working on this?

@apolinario
Contributor Author

Hi @rashmimarganiatgithub, feel free to take this! Currently no one is working on it as far as we are aware!

@devilismyfriend

> @patrickvonplaten, can I take this or is someone working on this?

Please do :) A lot of folks want this, since Dreambooth works better on diffusers but the various existing web UIs don't use diffusers.

@kabachuha
Contributor

kabachuha commented Sep 30, 2022

Just a reminder for any aspiring devs: there is already a .ckpt -> diffusers .bin script in this repo: https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py

Someone will just have to reverse all the operations

Upd2: oops, I was tired and missed the initial comments, sorry

Upd: Reference for another huggingface model .bin -> .ckpt converter
https://github.com/jsksxs360/bin2ckpt/blob/main/convert.py

@AmericanPresidentJimmyCarter
Contributor

> Upd: Reference for another huggingface model .bin -> .ckpt converter

That looks like it is for a TensorFlow checkpoint, not PyTorch?

@jachiam
Contributor

jachiam commented Oct 2, 2022

https://gist.github.com/jachiam/8a5c0b607e38fcc585168b90c686eb05

I think this works

I got it to do a conversion on my machine and then generated a reasonable image with the converted weights

@jachiam
Contributor

jachiam commented Oct 2, 2022

I have now opened a PR to incorporate the conversion script into Diffusers

@Magicalore

> https://gist.github.com/jachiam/8a5c0b607e38fcc585168b90c686eb05
> I think this works
> I got it to do a conversion on my machine and then generated a reasonable image with the converted weights

Hey, can you actually use this on your PC?

@jachiam
Contributor

jachiam commented Oct 3, 2022

> https://gist.github.com/jachiam/8a5c0b607e38fcc585168b90c686eb05
> I think this works
> I got it to do a conversion on my machine and then generated a reasonable image with the converted weights
>
> Hey can you actually use this on your PC?

Yes.

@Magicalore

> https://gist.github.com/jachiam/8a5c0b607e38fcc585168b90c686eb05
> I think this works
> I got it to do a conversion on my machine and then generated a reasonable image with the converted weights
>
> Hey can you actually use this on your PC?

> Yes.

Sorry, my question should have been: HOW do you use this on your own PC? I have downloaded a trained model from Hugging Face (plenty of folders inside) and I would like to convert that model into a .ckpt file. How can I do this? Thanks

@roiniti

roiniti commented Oct 3, 2022

> Sorry my questions should have been: HOW do you use this on your own PC? I have downloaded a trained model from hugging face (plenty of folders inside) and I would like to convert that model into a ckpt file, how can I do this? Thanks

Download the script, install PyTorch, and run:

```
python .\convert_diffusers_to_sd.py --model_path "path to the folder with folders" --checkpoint_path "path to the output file"
```

model_path is the folder containing the logs, tokenizer, text_encoder, ... subfolders, and you need to give the output file the .ckpt extension (or just rename it later). For example:

```
python .\convert_diffusers_to_sd.py --model_path .\stable-diffusion-with-diffusers --checkpoint_path .\stable-diffusion.ckpt
```

@MistApproach

Thanks for all the contributions on this topic so far!

I was able to convert my trained model to .ckpt using the above script.
However, when trying to load this model with stable-diffusion-webui I get this error:

```
Loading weights [e02601f3] from /home/xyz/build/git/stable-diffusion-webui/models/Stable-diffusion/model.ckpt
Traceback (most recent call last):
  File "/home/xyz/build/git/stable-diffusion-webui/webui.py", line 77, in <module>
    shared.sd_model = modules.sd_models.load_model()
  File "/home/xyz/build/git/stable-diffusion-webui/modules/sd_models.py", line 147, in load_model
    load_model_weights(sd_model, checkpoint_info.filename, checkpoint_info.hash)
  File "/home/xyz/build/git/stable-diffusion-webui/modules/sd_models.py", line 129, in load_model_weights
    model.load_state_dict(sd, strict=False)
  File "/home/xyz/build/git/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
	size mismatch for cond_stage_model.transformer.text_model.embeddings.token_embedding.weight: copying a param with shape torch.Size([49409, 768]) from checkpoint, the shape in current model is torch.Size([49408, 768]).
```

@jachiam
Contributor

jachiam commented Oct 4, 2022

@MistApproach, would I be right in guessing you did textual inversion for your model, or something like that? It looks like you have one more text embedding than stable-diffusion-webui expects. I don't have a fix ready to go, but if it's a persistent issue and you can walk me through how you made your model, I might understand it and be able to fix it.

@MistApproach

MistApproach commented Oct 4, 2022

@jachiam, you are absolutely right - I took the vanilla SD model and trained a new token using diffusers' textual_inversion.py. While this works out of the box using StableDiffusionPipeline, I now need to glue my model to a front-end. Having no prior experience in ML, I find the gluing process a bit overwhelming.
I would appreciate any pointers on the best way to achieve what I'm trying to do.

EDIT: I did some more digging and apparently it is not stable-diffusion-webui being picky. The resulting .ckpt throws the very same error with the txt2img.py script from the official CompVis repo...

@NielsRogge

NielsRogge commented Oct 4, 2022

@MistApproach, the reason you're getting the size mismatch is that the textual inversion method simply adds one additional token to CLIP's text embedding layer. The default embedding matrix consists of 49408 text tokens for which the model learns an embedding (each embedding being a vector of 768 numbers).

So to make sure the checkpoint works in the original CompVis repo, you'll need to update the size of the text encoder in the CompVis repository, something along the lines of model.text_encoder.resize_token_embeddings(49409) (with model being CLIP). After that, you can load the weights. Let me know if this helps.
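To illustrate the mismatch in plain Python (a toy model of the behavior, not the Hugging Face API; sizes shrunk from 49408×768 for readability): resizing appends one freshly initialized row to the embedding matrix, so a checkpoint saved afterwards carries one more embedding row than a vanilla model expects.

```python
import random

def resize_embeddings(matrix, new_rows, dim):
    """Toy version of resize_token_embeddings: keep existing rows and
    append randomly initialized rows until the matrix has new_rows rows."""
    out = [row[:] for row in matrix[:new_rows]]
    while len(out) < new_rows:
        out.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return out

vocab, dim = 8, 4                      # stand-ins for 49408 and 768
embeddings = [[0.0] * dim for _ in range(vocab)]
embeddings = resize_embeddings(embeddings, vocab + 1, dim)  # textual inversion adds one token
print(len(embeddings))                 # 9: one row more than the vanilla model expects
```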

Closing this issue as the request has been resolved.

@MistApproach

@NielsRogge, thanks for the explanation, appreciate it! Here's what I did with my trained model:

```python
model = StableDiffusionPipeline.from_pretrained(model_path)
model.text_encoder.resize_token_embeddings(49409)
model.save_pretrained(output_path)
```

Then I converted the resulting diffusers model to a checkpoint using @jachiam's script. Unfortunately, both CompVis and stable-diffusion-webui still throw the very same error.

@CrazyBoyM

@MistApproach, hi, have you solved the problem now?

@MistApproach

@CrazyBoyM, unfortunately, no. Because of this I have dropped the idea of using stable-diffusion-webui for my front-end and wrote a simple app using Gradio with diffusers' StableDiffusionPipeline under the hood.

@bbecausereasonss

Can you please explain how to use that original ckpt -> diffusers script? I don't understand the run syntax.

@patrickvonplaten
Contributor

Does the following issue maybe help in terms of what command should be run?
#1154

@FurkanGozukara

> So to make sure the checkpoint works in the original CompVis repos, you'll need to update the size of the text encoder in the CompVis repository, something along the lines of model.text_encoder.resize_token_embeddings(49409) (with model being CLIP). After that, you can load the weights. Let me know if this helps.

I tested this and am getting this error: #1877
