
StableDiffusionControlNetInpaintPipeline - does not work with 4 channel Unets #6964

@cryptexis

Description

Describe the bug

Hi,
I tried to run the example from https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet (posted below). However, when a model whose UNet expects 4 input channels is passed, the pipeline breaks with the following error:

RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead

I'm raising this because the source code already appears to handle the 4- vs. 9-channel UNet case:
https://github.com/huggingface/diffusers/blob/v0.26.2-patch/src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py#L1532
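
For context, the linked section branches on the UNet's configured input-channel count before calling the UNet. The runnable toy below is my paraphrase of that logic, not the exact source:

import torch

# a 9-channel (inpainting) UNet receives noisy latents (4) + mask (1) + masked-image
# latents (4) concatenated along the channel dimension; a 4-channel UNet receives
# the latents as-is
latents = torch.randn(2, 4, 64, 64)
mask = torch.randn(2, 1, 64, 64)
masked_image_latents = torch.randn(2, 4, 64, 64)

num_channels_unet = 9  # the pipeline reads this from pipe.unet.config.in_channels
if num_channels_unet == 9:
    latent_model_input = torch.cat([latents, mask, masked_image_latents], dim=1)
else:
    latent_model_input = latents

print(latent_model_input.shape)  # torch.Size([2, 9, 64, 64])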

Reproduction

# !pip install diffusers transformers accelerate opencv-python
import cv2
import numpy as np
import torch
from PIL import Image

from diffusers import (
    ControlNetModel,
    DPMSolverMultistepScheduler,
    StableDiffusionControlNetInpaintPipeline,
)
from diffusers.utils import load_image


init_image = load_image(
    "https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy.png"
)
init_image = init_image.resize((512, 512))

generator = torch.Generator(device="cpu").manual_seed(1)

mask_image = load_image(
    "https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy_mask.png"
)
mask_image = mask_image.resize((512, 512))


def make_canny_condition(image):
    # run Canny edge detection and stack the single channel into a 3-channel image
    image = np.array(image)
    image = cv2.Canny(image, 100, 200)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    image = Image.fromarray(image)
    return image


control_image = make_canny_condition(init_image)

# load the Canny ControlNet in fp16
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
# load the inpainting checkpoint from a local single-file .safetensors and attach the ControlNet
pipe = StableDiffusionControlNetInpaintPipeline.from_single_file(
    "./models/majicmixRealistic_v7-inpainting.safetensors",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    requires_safety_checker=False,
)

# use DPM-Solver++ (SDE) with Karras sigmas instead of the default scheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, algorithm_type='sde-dpmsolver++', use_karras_sigmas=True
)
pipe.enable_model_cpu_offload()


# generate image
image = pipe(
    "a handsome man with ray-ban sunglasses",
    num_inference_steps=20,
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
).images[0]
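
As an extra diagnostic (my own addition, not part of the original example), comparing what the pipeline's channel check sees against the weights that were actually loaded may help localize the mismatch:

# what the pipeline's 4-vs-9-channel branch reads
print(pipe.unet.config.in_channels)
# what the loaded conv_in weights actually expect: (out_ch, in_ch, kH, kW)
print(pipe.unet.conv_in.weight.shape)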

Logs

No response

System Info

  • diffusers version: 0.26.2
  • Platform: Linux-5.15.0-1039-aws-x86_64-with-glibc2.31
  • Python version: 3.11.7
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Huggingface_hub version: 0.20.3
  • Transformers version: 4.37.2
  • Accelerate version: 0.21.0
  • xFormers version: 0.0.23.post1
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@yiyixuxu @DN6 @sayakpaul @patrickvonplaten
