
Adding additional input channels to model after initialization / Converting text2img to mask inpainting to resume training #1619

Closed
BenjaminIrwin opened this issue Dec 8, 2022 · 16 comments
Labels
stale Issues that haven't received updates

Comments

@BenjaminIrwin

I have scoured the docs for an answer to this, to no avail. Is it possible to add additional input channels to a model after initializing it with .from_pretrained?

For example (taken from your Dreambooth example):

    unet = UNet2DConditionModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="unet",
        revision=args.revision,
    )

In the code above, if I now wanted to introduce additional input channels to unet and zero-initialize the weights, would this be possible? If so, how would I do this?

Thank you in advance.

@patrickvonplaten
Contributor

Hey @BenjaminIrwin,

This is actually quite easily doable. You just need to pass a config parameter that changes the number of input channels to the required size. E.g., let's say you want to fine-tune SD 1.4 to do inpainting. All you need to do is run the following code:

from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet", in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True)

This will initialize the model with the pretrained weights, except for the input conv weight, which now has a different shape:

Some weights of UNet2DConditionModel were not initialized from the model checkpoint at CompVis/stable-diffusion-v1-4 and are newly initialized because the shapes did not match:
- conv_in.weight: found shape torch.Size([320, 4, 3, 3]) in the checkpoint and torch.Size([320, 9, 3, 3]) in the model instantiated

and is thus randomly initialized. The other weights are transferred from the pretrained checkpoint.

Make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. First, you cannot use the super-fast low-CPU-memory loading, as it doesn't check weights for shape mismatches, so make sure to disable it. Second, if you don't pass ignore_mismatched_sizes=True, an error will be thrown.
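To sanity-check the result, a quick sketch (assuming the unet loaded in the snippet above):

print(unet.config.in_channels)    # 9
print(unet.conv_in.weight.shape)  # torch.Size([320, 9, 3, 3])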

@BenjaminIrwin
Author

Thanks very much. This is great.

@github-actions

github-actions bot commented Jan 8, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 8, 2023
@patrickvonplaten patrickvonplaten changed the title Adding additional input channels to model after initialization Adding additional input channels to model after initialization / Converting text2img to mask inpainting to resume training Feb 8, 2023
@patrickvonplaten
Contributor

For future readers:

The code snippet above can be used to transform a text2image unet to an inpainting unet as asked here: #2280

@hamin

hamin commented Feb 15, 2023

@patrickvonplaten this is exactly the issue i was looking for!

So I forked a popular Hugging Face Space to create a custom Dreambooth model, training it against a couple of new concepts: https://huggingface.co/spaces/multimodalart/dreambooth-training

It's great! I've used it a few times and generated a few v1.5 based custom models!

I thought I could use a custom trained model based on SD v1.5 and it would work with the inpainting pipeline out of the box... oh how wrong I was :)

I've tried to change my fork to use SD v1.5-inpainting as the base model, but had no luck debugging the workspace.

And then I saw this issue, which, if I'm not mistaken... should allow me to use my regular SD v1.5 model with the inpainting pipeline? Am I mistaken here?

I used your suggestion

import torch
from diffusers import StableDiffusionInpaintPipeline, UNet2DConditionModel

model_path = "mycustommodel"

unet = UNet2DConditionModel.from_pretrained(
    model_path, torch_dtype=torch.float16, subfolder="unet",
    in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True,
)

# scheduler was set up earlier in my script
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    model_path, scheduler=scheduler, torch_dtype=torch.float16, unet=unet, safety_checker=None,
)


# For reference, this is how I set up inference (some M1 MacBook-specific bits here)
pipe = pipe.to("mps")

# Fixed random seed for reproducibility
gen = torch.Generator(device="cpu")
seed = 52362
gen.manual_seed(seed)

negative_prompt = ""
num_samples = 1
guidance_scale = 7.5
num_inference_steps = 25
height = 512
width = 512

# prompt, init_image and mask_image are defined earlier
images = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    generator=gen,
    height=height,
    width=width,
    negative_prompt=negative_prompt,
    num_images_per_prompt=num_samples,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
).images

So I do that, and finally I'm not getting the same UNet error I was getting before your suggestion. However, the inference isn't quite working: the generated image looks like pure noise.
[attached output image: pure noise]

@patrickvonplaten Am I completely on the wrong path here? Is my only real option to train a new custom model with SD v1.5-inpainting as the base model?

Thanks in advance!

@patrickvonplaten
Contributor

You need to fine-tune the text-to-image model so that it learns how to do inpainting. The architecture is slightly different.
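Concretely, the inpainting UNet is trained on a 9-channel input built from the noisy latents, the downsampled mask, and the latents of the masked image. A minimal sketch of one training step, assuming the 9-channel unet from the snippet above and SD 1.x shapes (the tensors here are random placeholders for a real data batch):

import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")

# Placeholder batch, SD 1.x shapes at 512px (batch size 2)
latents = torch.randn(2, 4, 64, 64)              # VAE latents of the target images
mask = torch.rand(2, 1, 64, 64).round()          # inpainting mask, resized to latent resolution
masked_latents = torch.randn(2, 4, 64, 64)       # VAE latents of the masked images
encoder_hidden_states = torch.randn(2, 77, 768)  # text-encoder output
timesteps = torch.randint(0, 1000, (2,))

noise = torch.randn_like(latents)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

# 9 input channels = 4 noisy latents + 1 mask + 4 masked-image latents
model_input = torch.cat([noisy_latents, mask, masked_latents], dim=1)

noise_pred = unet(model_input, timesteps, encoder_hidden_states).sample
loss = F.mse_loss(noise_pred.float(), noise.float())
loss.backward()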

@hamin

hamin commented Feb 16, 2023

@patrickvonplaten Not sure how to do that... Interestingly enough, I ended up using the popular SD GUI https://github.com/AUTOMATIC1111/stable-diffusion-webui

It has a nice UI to merge models. So I merged v1-5 inpainting, my custom model and v1-5-pruned.

Found out about it here https://www.reddit.com/r/StableDiffusion/comments/zyi24j/how_to_turn_any_model_into_an_inpainting_model/

That worked for me. @patrickvonplaten is your code approach essentially doing the same? Still learning a lot about diffusion models so apologies. And of course thank you for all of your work!
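For context, the "add difference" merge described in that Reddit post boils down to simple weight arithmetic, merged = inpainting + (custom - base), applied key by key. A rough sketch with hypothetical checkpoint file names (not the webui's actual code):

import torch

sd_inpaint = torch.load("v1-5-inpainting.ckpt", map_location="cpu")["state_dict"]
sd_custom = torch.load("my-custom-model.ckpt", map_location="cpu")["state_dict"]
sd_base = torch.load("v1-5-pruned.ckpt", map_location="cpu")["state_dict"]

merged = {}
for key, w_inpaint in sd_inpaint.items():
    if key in sd_custom and key in sd_base and sd_custom[key].shape == w_inpaint.shape:
        # Graft the custom model's learned "difference" onto the inpainting weights
        merged[key] = w_inpaint + (sd_custom[key] - sd_base[key])
    else:
        # Keys with mismatched shapes (e.g. the 9-channel conv_in) keep the inpainting weights
        merged[key] = w_inpaint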

@satwiksunnam19

@hamin @patrickvonplaten I've been trying to do the same thing as @hamin. Still, after looking at these issues, converting a model to an inpainting model through Python scripts seems like the better approach. @patrickvonplaten, you mentioned that "You need to fine-tune the text-to-image model so that it learns how to do inpainting; the architecture is slightly different". Can you explain how to do that, or is there any script available on the internet to do so?

I hope you'll get back to me soon,
Regards,
SS

@patrickvonplaten
Contributor

Opened a PR to improve error handling for the above case btw: #2847

@pcuenca
Member

pcuenca commented Mar 28, 2023

[quoting @patrickvonplaten's answer above]

This is great. Maybe this snippet could have a place in the documentation somewhere /cc @yiyixuxu what do you think?

@rob-hen

rob-hen commented Apr 18, 2023

@patrickvonplaten I was looking for this answer in the documentation. It would be great to have it more prominent in the main docs.

@manxiaoyu

RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[4, 4, 64, 64] to have 9 channels, but got 4 channels instead

@TimAlexander

TimAlexander commented Jan 6, 2024

@patrickvonplaten

[quoting @patrickvonplaten's answer above about conv_in.weight being newly (randomly) initialized]

As you said, they are randomly initialized, meaning the learned kernels in the conv_in layer are gone. Shouldn't it be possible to take the kernels of the pre-trained model and zero-initialize the other ones? E.g. like this:

import torch

# Load the pretrained conv_in weights from a file; shape [320, 4, 3, 3]
conv_in_weights_pretrained = torch.load("conv_in_weights_pretrained.pt")
# Copy the pretrained kernels into the first 4 input channels of conv_in
unet.conv_in.weight.data[:, :4, :, :] = conv_in_weights_pretrained.data
# Zero-initialize the remaining channels
unet.conv_in.weight.data[:, 4:, :, :] = 0

Thanks
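For reference, a self-contained sketch of this idea that takes the pretrained kernels straight from the original 4-channel checkpoint instead of a saved file (assuming SD 1.4):

import torch
from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"

# Original 4-channel UNet, loaded only to recover its pretrained conv_in kernels
unet_orig = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

# Resized 9-channel UNet; its conv_in is randomly initialized at this point
unet = UNet2DConditionModel.from_pretrained(
    model_id, subfolder="unet", in_channels=9,
    low_cpu_mem_usage=False, ignore_mismatched_sizes=True,
)

with torch.no_grad():
    # Keep the pretrained kernels for the first 4 (latent) channels...
    unet.conv_in.weight[:, :4] = unet_orig.conv_in.weight
    # ...and zero-initialize the 5 new (mask + masked-image latent) channels
    unet.conv_in.weight[:, 4:] = 0.0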

@AIPopcorn

[quoting the answer and discussion above]

Hi all,

I am getting the following error. Can someone help, please?

File "C:\SDComfyUI\ComfyUI_windows_portable\src\diffusers\src\diffusers\models\modeling_utils.py", line 154, in load_model_dict_into_meta
raise ValueError(
ValueError: Cannot load C:\Users\xxx.cache\huggingface\hub\models--stabilityai--stable-cascade\snapshots\f2a84281d6f8db3c757195dd0c9a38dbdea90bb4\decoder because embedding.1.weight expected shape tensor(..., device='meta', size=(320, 64, 1, 1)), but got torch.Size([320, 16, 1, 1]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: #1619 (comment) as an example.

@peki12345

[quoting the answer above and the same error]

I got the same error.

@Sturmkater

low_cpu_mem_usage=False and ignore_mismatched_sizes=True

Sorry, but how do I pass them in?

[quoting @patrickvonplaten's answer above]

Where do I have to run this code, and how do I pass those two parameters?
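For anyone with the same question: both flags are keyword arguments to from_pretrained, so they are passed in the Python script that loads the model (a GUI such as ComfyUI may not expose them). A minimal sketch, reusing the model id from the answer above:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # or a path to your own model
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)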
