
StableDiffusionControlNetInpaintPipeline - does not work with 4 channel Unets #6964

@cryptexis

Description

Describe the bug

Hi,
I tried to run the example from https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet (posted below). However, when a model whose UNet expects 4 input channels is passed, the pipeline breaks with the following error:

RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead

I'm raising this because the source code already appears to handle the 4- vs. 9-channel UNet case:
https://github.com/huggingface/diffusers/blob/v0.26.2-patch/src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py#L1532
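
For context, the linked section branches on the UNet's configured input-channel count before calling the UNet. The runnable toy below is my paraphrase of that logic, not the exact source:

import torch

# a 9-channel (inpainting) UNet receives noisy latents (4) + mask (1) + masked-image
# latents (4) concatenated along the channel dimension; a 4-channel UNet receives
# the latents as-is
latents = torch.randn(2, 4, 64, 64)
mask = torch.randn(2, 1, 64, 64)
masked_image_latents = torch.randn(2, 4, 64, 64)

num_channels_unet = 9  # the pipeline reads this from pipe.unet.config.in_channels
if num_channels_unet == 9:
    latent_model_input = torch.cat([latents, mask, masked_image_latents], dim=1)
else:
    latent_model_input = latents

print(latent_model_input.shape)  # torch.Size([2, 9, 64, 64])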

Reproduction

# !pip install diffusers transformers accelerate opencv-python
import cv2
import numpy as np
import torch
from PIL import Image

from diffusers import (
    ControlNetModel,
    DPMSolverMultistepScheduler,
    StableDiffusionControlNetInpaintPipeline,
)
from diffusers.utils import load_image


init_image = load_image(
    "https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy.png"
)
init_image = init_image.resize((512, 512))

generator = torch.Generator(device="cpu").manual_seed(1)

mask_image = load_image(
    "https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy_mask.png"
)
mask_image = mask_image.resize((512, 512))


def make_canny_condition(image):
    # run Canny edge detection and stack the single channel into a 3-channel image
    image = np.array(image)
    image = cv2.Canny(image, 100, 200)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    image = Image.fromarray(image)
    return image


control_image = make_canny_condition(init_image)

# load the Canny ControlNet in fp16
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
# load the inpainting checkpoint from a local single-file .safetensors and attach the ControlNet
pipe = StableDiffusionControlNetInpaintPipeline.from_single_file(
    "./models/majicmixRealistic_v7-inpainting.safetensors",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    requires_safety_checker=False,
)

# use DPM-Solver++ (SDE) with Karras sigmas instead of the default scheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, algorithm_type='sde-dpmsolver++', use_karras_sigmas=True
)
pipe.enable_model_cpu_offload()


# generate image
image = pipe(
    "a handsome man with ray-ban sunglasses",
    num_inference_steps=20,
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
).images[0]
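
As an extra diagnostic (my own addition, not part of the original example), comparing what the pipeline's channel check sees against the weights that were actually loaded may help localize the mismatch:

# what the pipeline's 4-vs-9-channel branch reads
print(pipe.unet.config.in_channels)
# what the loaded conv_in weights actually expect: (out_ch, in_ch, kH, kW)
print(pipe.unet.conv_in.weight.shape)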

Logs

No response

System Info

  • diffusers version: 0.26.2
  • Platform: Linux-5.15.0-1039-aws-x86_64-with-glibc2.31
  • Python version: 3.11.7
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Huggingface_hub version: 0.20.3
  • Transformers version: 4.37.2
  • Accelerate version: 0.21.0
  • xFormers version: 0.0.23.post1
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@yiyixuxu @DN6 @sayakpaul @patrickvonplaten
