StableDiffusion3Pipeline with InstantX/SD3.5-Large-IP-Adapter #11627

### Describe the bug

The example comes from the Stable Diffusion 3 documentation:
https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3#image-prompting-with-ip-adapters

Using `InstantX/SD3.5-Large-IP-Adapter` with `StableDiffusion3Pipeline` fails because the loading code expects an `ip-adapter.safetensors` file, which is not available.

Specifying `weight_name="ip-adapter.bin"` resolves the missing-file issue, but the pipeline call then fails with the following runtime error:

`RuntimeError: The size of tensor a (333) must match the size of tensor b (4096) at non-singleton dimension 1`
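For the missing-file part of the problem, a small helper can check which weight file a repository actually ships before calling `load_ip_adapter`. This is only a sketch: `pick_weight_name`, `list_adapter_files`, and the preference order are hypothetical names and choices made for illustration, not part of diffusers. The only library call used is `huggingface_hub.list_repo_files`.

```python
from typing import Iterable, List, Optional, Sequence

# Hypothetical preference order (an assumption for this sketch, not diffusers behavior):
# try the safetensors file first, then fall back to the .bin file.
PREFERRED_WEIGHT_NAMES = ("ip-adapter.safetensors", "ip-adapter.bin")


def pick_weight_name(
    repo_files: Iterable[str],
    preferred: Sequence[str] = PREFERRED_WEIGHT_NAMES,
) -> Optional[str]:
    """Return the first preferred weight file present in repo_files, or None."""
    available = set(repo_files)
    for name in preferred:
        if name in available:
            return name
    return None


def list_adapter_files(repo_id: str) -> List[str]:
    """List the files of a Hub repository (needs network access)."""
    # Imported lazily so pick_weight_name works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return list(list_repo_files(repo_id))
```

With network access, something like `pick_weight_name(list_adapter_files("InstantX/SD3.5-Large-IP-Adapter"))` could supply the `weight_name` argument instead of hard-coding it. This only addresses the file lookup, not the runtime error.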


### Reproduction

```python
import torch
from PIL import Image

from diffusers import StableDiffusion3Pipeline
from transformers import SiglipVisionModel, SiglipImageProcessor

image_encoder_id = "google/siglip-so400m-patch14-384"
ip_adapter_id = "InstantX/SD3.5-Large-IP-Adapter"

# Image preprocessor and vision encoder used for the IP-Adapter reference image
feature_extractor = SiglipImageProcessor.from_pretrained(
    image_encoder_id,
    torch_dtype=torch.float16
)
image_encoder = SiglipVisionModel.from_pretrained(
    image_encoder_id,
    torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.float16,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
).to("cuda")

# The default ip-adapter.safetensors is not found, so the .bin file is named explicitly
pipe.load_ip_adapter(ip_adapter_id, weight_name="ip-adapter.bin")

pipe.set_ip_adapter_scale(0.6)

ref_img = Image.open("image.jpg").convert("RGB")

# This call raises the RuntimeError shown in the logs
image = pipe(
    width=1024,
    height=1024,
    prompt="a cat",
    negative_prompt="lowres, low quality, worst quality",
    num_inference_steps=24,
    guidance_scale=5.0,
    ip_adapter_image=ref_img
).images[0]

image.save("result.jpg")
```

### Logs

```shell
  File "/home/appuser/.local/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 1060, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/.local/lib/python3.11/site-packages/diffusers/models/transformers/transformer_sd3.py", line 396, in forward
    encoder_hidden_states, hidden_states = block(
                                           ^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/.local/lib/python3.11/site-packages/diffusers/models/attention.py", line 244, in forward
    encoder_hidden_states = encoder_hidden_states + context_attn_output
                            ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (333) must match the size of tensor b (4096) at non-singleton dimension 1
```
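The two sizes in the error may be explainable with some arithmetic. The sketch below assumes values that are not stated in this report: a VAE downscale factor of 8, a transformer patch size of 2, 77 CLIP tokens, and a T5 sequence length of 256. The helper names are hypothetical. Under those assumptions, 333 would match the text-token sequence and 4096 the image-token sequence at 1024x1024.

```python
def image_token_count(
    height: int,
    width: int,
    vae_scale_factor: int = 8,  # assumed VAE downscale factor
    patch_size: int = 2,        # assumed transformer patch size
) -> int:
    """Number of image tokens the transformer sees for a given output size."""
    latent_h = height // vae_scale_factor
    latent_w = width // vae_scale_factor
    return (latent_h // patch_size) * (latent_w // patch_size)


def text_token_count(clip_tokens: int = 77, t5_tokens: int = 256) -> int:
    """Text sequence length if CLIP and T5 embeddings are concatenated along the sequence axis."""
    return clip_tokens + t5_tokens


print(image_token_count(1024, 1024))  # 4096
print(text_token_count())             # 333
```

If this reading is right, the failing line `encoder_hidden_states + context_attn_output` is adding a text-length tensor (333) to an image-length tensor (4096), which would suggest the attention processor installed by the IP-Adapter returns its outputs in a different form or order than the block expects. That is a guess from the shapes alone, not a confirmed diagnosis.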

### System Info

base image:
pytorch/pytorch:2.6.0-cuda12.6-cudnn9-runtime

diffusers == 0.33.1
transformers == 4.51.3

### Who can help?

@yiyixuxu 
@sayakpaul 
@stevhliu 
