IP-Adapter masking

Continuing the discussion from [this review comment](https://github.com/huggingface/diffusers/pull/7226#discussion_r1513786107) about assigning a separate input image to each IP-Adapter without passing a mask, which currently fails with the error shown after the reproduction script. @sayakpaul suspects the failure occurs because the images need to have exactly the same resolution. cc @yiyixuxu

Code to reproduce:

```py
from diffusers.utils import load_image
from diffusers import AutoPipelineForText2Image
from PIL import Image
import torch

face_image1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")
face_image2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png")
ip_images = [[face_image1], [face_image2]]

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus-face_sdxl_vit-h.safetensors"] * 2)
pipeline.set_ip_adapter_scale([0.7] * 2)
generator = torch.Generator(device="cpu").manual_seed(0)
num_images = 1

image = pipeline(
    prompt="2 girls",
    ip_adapter_image=ip_images,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=20,
    num_images_per_prompt=num_images,
    generator=generator,
).images[0]
image
```

Error:

```
RuntimeError: mat1 and mat2 shapes cannot be multiplied (514x1664 and 1280x1280)
```
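One way to read the shapes in this message, offered as a hypothesis and not a confirmed diagnosis: the mismatch may come from the image encoder, not from the image resolution. The sketch below is plain Python with no diffusers calls. The constants are assumptions taken from the public CLIP configs (hidden size 1280 for ViT-H, 1664 for ViT-bigG, and 257 tokens per 224 px image at patch size 14), and `diagnose_matmul` is a hypothetical helper written only for this illustration.

```python
# Assumed constants (from the public CLIP model configs, not from this issue).
VIT_H_HIDDEN = 1280      # hidden size of the ViT-H image encoder
VIT_BIGG_HIDDEN = 1664   # hidden size of the ViT-bigG image encoder
TOKENS_PER_IMAGE = 257   # 16 x 16 patches + 1 class token

KNOWN_ENCODERS = {VIT_H_HIDDEN: "ViT-H", VIT_BIGG_HIDDEN: "ViT-bigG"}


def diagnose_matmul(mat1_shape, mat2_shape):
    """Hypothetical helper: interpret the two shapes from the RuntimeError."""
    rows, features = mat1_shape
    expected_features = mat2_shape[0]
    # Rows that divide evenly by 257 suggest stacked per-image token sequences.
    token_groups = rows // TOKENS_PER_IMAGE if rows % TOKENS_PER_IMAGE == 0 else None
    return {
        # A matmul needs mat1's column count to equal mat2's row count.
        "compatible": features == expected_features,
        "encoder_output": KNOWN_ENCODERS.get(features, "unknown"),
        "layer_expects": KNOWN_ENCODERS.get(expected_features, "unknown"),
        "token_groups": token_groups,
    }


# Shapes copied from the error message above.
report = diagnose_matmul((514, 1664), (1280, 1280))
print(report)
```

Under these assumptions, the features reaching the projection layer look like ViT-bigG output (1664), while the `vit-h` adapter weights expect ViT-H input (1280). If that is the cause, loading the ViT-H encoder explicitly with `CLIPVisionModelWithProjection.from_pretrained("h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16)` and passing it to the pipeline as `image_encoder=` may be worth trying before resizing the input images.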
