
Adding additional input channels to model after initialization / Converting text2img to mask inpainting to resume training #1619

Closed
BenjaminIrwin opened this issue Dec 8, 2022 · 16 comments
Labels
stale Issues that haven't received updates

Comments

@BenjaminIrwin

I have scoured the docs for an answer to this, to no avail. Is it possible to add additional input channels to a model after initializing it with .from_pretrained?

For example (taken from your Dreambooth example):

    unet = UNet2DConditionModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="unet",
        revision=args.revision,
    )

In the code above, if I now wanted to introduce additional input channels to unet and zero-initialize the weights, would this be possible? If so, how would I do this?

Thank you in advance.

@patrickvonplaten
Contributor

Hey @BenjaminIrwin,

This is actually quite easily doable. You just need to pass a config parameter that changes the number of input channels to the required size. E.g., let's say you want to fine-tune SD 1.4 to do inpainting. All you need to do is run the following code:

from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet", in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True)

This will initialize the model with the pretrained weights, except for the input conv weight, which now has a different shape:

Some weights of UNet2DConditionModel were not initialized from the model checkpoint at CompVis/stable-diffusion-v1-4 and are newly initialized because the shapes did not match:
- conv_in.weight: found shape torch.Size([320, 4, 3, 3]) in the checkpoint and torch.Size([320, 9, 3, 3]) in the model instantiated

and is thus randomly initialized. The other weights are transferred from the pretrained checkpoint.

Make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. First, you cannot use the super-fast low-CPU-memory loading, as it doesn't check weights for shape mismatches, so make sure to disable it. Second, if you don't pass ignore_mismatched_sizes=True, an error will be thrown.
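To sanity-check the result, a quick sketch (assuming the unet loaded in the snippet above):

print(unet.config.in_channels)    # 9
print(unet.conv_in.weight.shape)  # torch.Size([320, 9, 3, 3])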

@BenjaminIrwin
Author

Thanks very much. This is great.

@github-actions

github-actions bot commented Jan 8, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 8, 2023
@patrickvonplaten patrickvonplaten changed the title Adding additional input channels to model after initialization Adding additional input channels to model after initialization / Converting text2img to mask inpainting to resume training Feb 8, 2023
@patrickvonplaten
Contributor

For future readers:

The code snippet above can be used to transform a text2image unet to an inpainting unet as asked here: #2280

@hamin

hamin commented Feb 15, 2023

@patrickvonplaten this is exactly the issue i was looking for!

So I forked a popular Hugging Face Space to create a custom Dreambooth model, training it against a couple of new concepts: https://huggingface.co/spaces/multimodalart/dreambooth-training

It's great! I've used it a few times and generated a few v1.5 based custom models!

I thought I could use a custom trained model based on SD v1.5 and it would work with the inpainting pipeline out of the box... oh how wrong I was :)

I've tried to change my fork to use SD v1.5-inpainting as the base model, but had no luck debugging the workspace.

And then I saw this issue, which, if I'm not mistaken... should allow me to use my regular SD v1.5 model with the inpainting pipeline? Am I mistaken here?

I used your suggestion

import torch
from diffusers import StableDiffusionInpaintPipeline, UNet2DConditionModel

model_path = "mycustommodel"

unet = UNet2DConditionModel.from_pretrained(
    model_path, torch_dtype=torch.float16, subfolder="unet",
    in_channels=9, low_cpu_mem_usage=False, ignore_mismatched_sizes=True,
)

# scheduler was set up earlier in my script
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    model_path, scheduler=scheduler, torch_dtype=torch.float16, unet=unet, safety_checker=None,
)


# For reference, this is how I set up inference (some M1 MacBook-specific bits here)
pipe = pipe.to("mps")

# Fixed random seed for reproducibility
gen = torch.Generator(device="cpu")
seed = 52362
gen.manual_seed(seed)

negative_prompt = ""
num_samples = 1
guidance_scale = 7.5
num_inference_steps = 25
height = 512
width = 512

# prompt, init_image and mask_image are defined earlier
images = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    generator=gen,
    height=height,
    width=width,
    negative_prompt=negative_prompt,
    num_images_per_prompt=num_samples,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
).images

So I do that, and finally I'm not getting the same UNet error I was getting before your suggestion. However, the inference isn't quite working: the generated image looks like pure noise.
[attached output image: pure noise]

@patrickvonplaten Am I completely on the wrong path here? Is my only real option to train a new custom model with SD v1.5-inpainting as the base model?

Thanks in advance!

@patrickvonplaten
Contributor

You need to fine-tune the text-to-image model so that it learns how to do inpainting. The architecture is slightly different.
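Concretely, the inpainting UNet is trained on a 9-channel input built from the noisy latents, the downsampled mask, and the latents of the masked image. A minimal sketch of one training step, assuming the 9-channel unet from the snippet above and SD 1.x shapes (the tensors here are random placeholders for a real data batch):

import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")

# Placeholder batch, SD 1.x shapes at 512px (batch size 2)
latents = torch.randn(2, 4, 64, 64)              # VAE latents of the target images
mask = torch.rand(2, 1, 64, 64).round()          # inpainting mask, resized to latent resolution
masked_latents = torch.randn(2, 4, 64, 64)       # VAE latents of the masked images
encoder_hidden_states = torch.randn(2, 77, 768)  # text-encoder output
timesteps = torch.randint(0, 1000, (2,))

noise = torch.randn_like(latents)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

# 9 input channels = 4 noisy latents + 1 mask + 4 masked-image latents
model_input = torch.cat([noisy_latents, mask, masked_latents], dim=1)

noise_pred = unet(model_input, timesteps, encoder_hidden_states).sample
loss = F.mse_loss(noise_pred.float(), noise.float())
loss.backward()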

@hamin

hamin commented Feb 16, 2023

@patrickvonplaten Not sure how to do that... Interestingly enough, I ended up using the popular SD GUI https://github.com/AUTOMATIC1111/stable-diffusion-webui

It has a nice UI to merge models. So I merged v1-5 inpainting, my custom model and v1-5-pruned.

Found out about it here https://www.reddit.com/r/StableDiffusion/comments/zyi24j/how_to_turn_any_model_into_an_inpainting_model/

That worked for me. @patrickvonplaten is your code approach essentially doing the same? Still learning a lot about diffusion models so apologies. And of course thank you for all of your work!
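For context, the "add difference" merge described in that Reddit post boils down to simple weight arithmetic, merged = inpainting + (custom - base), applied key by key. A rough sketch with hypothetical checkpoint file names (not the webui's actual code):

import torch

sd_inpaint = torch.load("v1-5-inpainting.ckpt", map_location="cpu")["state_dict"]
sd_custom = torch.load("my-custom-model.ckpt", map_location="cpu")["state_dict"]
sd_base = torch.load("v1-5-pruned.ckpt", map_location="cpu")["state_dict"]

merged = {}
for key, w_inpaint in sd_inpaint.items():
    if key in sd_custom and key in sd_base and sd_custom[key].shape == w_inpaint.shape:
        # Graft the custom model's learned "difference" onto the inpainting weights
        merged[key] = w_inpaint + (sd_custom[key] - sd_base[key])
    else:
        # Keys with mismatched shapes (e.g. the 9-channel conv_in) keep the inpainting weights
        merged[key] = w_inpaint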

@satwiksunnam19

@hamin @patrickvonplaten I've been trying to do the same thing as @hamin. Still, after looking at these issues, converting a model to an inpainting model through Python scripts seems like the better approach. @patrickvonplaten, you mentioned that "You need to fine-tune the text-to-image model so that it learns how to do inpainting; the architecture is slightly different". Can you explain how to do that, or is there any script available on the internet to do so?

I hope you'll get back to me soon,
Regards,
SS

@patrickvonplaten
Contributor

Opened a PR to improve error handling for the above case btw: #2847

@pcuenca
Member

pcuenca commented Mar 28, 2023

[quoting @patrickvonplaten's answer above]

This is great. Maybe this snippet could have a place in the documentation somewhere /cc @yiyixuxu what do you think?

@rob-hen

rob-hen commented Apr 18, 2023

@patrickvonplaten I was looking for this answer in the documentation. It would be great to have it more prominent in the main docs.

@manxiaoyu

RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[4, 4, 64, 64] to have 9 channels, but got 4 channels instead

@TimAlexander

TimAlexander commented Jan 6, 2024

@patrickvonplaten

[quoting @patrickvonplaten's answer above about conv_in.weight being newly (randomly) initialized]

As you said, they are randomly initialized, meaning the learned kernels in the conv_in layer are gone. Shouldn't it be possible to take the kernels of the pre-trained model and zero-initialize the other ones? E.g. like this:

import torch

# Load the pretrained conv_in weights from a file; shape [320, 4, 3, 3]
conv_in_weights_pretrained = torch.load("conv_in_weights_pretrained.pt")
# Copy the pretrained kernels into the first 4 input channels of conv_in
unet.conv_in.weight.data[:, :4, :, :] = conv_in_weights_pretrained.data
# Zero-initialize the remaining channels
unet.conv_in.weight.data[:, 4:, :, :] = 0

Thanks
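For reference, a self-contained sketch of this idea that takes the pretrained kernels straight from the original 4-channel checkpoint instead of a saved file (assuming SD 1.4):

import torch
from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"

# Original 4-channel UNet, loaded only to recover its pretrained conv_in kernels
unet_orig = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

# Resized 9-channel UNet; its conv_in is randomly initialized at this point
unet = UNet2DConditionModel.from_pretrained(
    model_id, subfolder="unet", in_channels=9,
    low_cpu_mem_usage=False, ignore_mismatched_sizes=True,
)

with torch.no_grad():
    # Keep the pretrained kernels for the first 4 (latent) channels...
    unet.conv_in.weight[:, :4] = unet_orig.conv_in.weight
    # ...and zero-initialize the 5 new (mask + masked-image latent) channels
    unet.conv_in.weight[:, 4:] = 0.0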

@AIPopcorn

[quoting the answer and discussion above]

Hi all,

I am getting the following error. Can someone help, please?

File "C:\SDComfyUI\ComfyUI_windows_portable\src\diffusers\src\diffusers\models\modeling_utils.py", line 154, in load_model_dict_into_meta
raise ValueError(
ValueError: Cannot load C:\Users\xxx.cache\huggingface\hub\models--stabilityai--stable-cascade\snapshots\f2a84281d6f8db3c757195dd0c9a38dbdea90bb4\decoder because embedding.1.weight expected shape tensor(..., device='meta', size=(320, 64, 1, 1)), but got torch.Size([320, 16, 1, 1]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: #1619 (comment) as an example.

@peki12345

[quoting the answer above and the same error]

I got the same error.

@Sturmkater

low_cpu_mem_usage=False and ignore_mismatched_sizes=True

Sorry, but how do I pass them in?

[quoting @patrickvonplaten's answer above]

Where do I have to run this code, and how do I pass those two parameters?
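For anyone with the same question: both flags are keyword arguments to from_pretrained, so they are passed in the Python script that loads the model (a GUI such as ComfyUI may not expose them). A minimal sketch, reusing the model id from the answer above:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # or a path to your own model
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)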
