Closed
Labels: bug
Description
Describe the bug
The example comes from the Stable Diffusion 3 documentation:
https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3#image-prompting-with-ip-adapters
Using `InstantX/SD3.5-Large-IP-Adapter` with `StableDiffusion3Pipeline` as shown there fails, because `load_ip_adapter` expects an `ip-adapter.safetensors` file that the repository does not provide.
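If you would rather not hard-code the filename, one option is to choose `weight_name` from the files the repository actually contains. The helper below is a hypothetical sketch, not part of diffusers; the file list it expects could come from `huggingface_hub.list_repo_files(repo_id)`. It only addresses the missing `ip-adapter.safetensors` file and does nothing about the runtime error reported in this issue.

```python
# Hypothetical helper (not part of diffusers): pick a weight_name for
# pipe.load_ip_adapter(...) from the files an IP-Adapter repo contains.
# The file list could be obtained with huggingface_hub.list_repo_files(repo_id).
from typing import Iterable, Optional

# Prefer safetensors when present, fall back to the pickled .bin checkpoint
PREFERRED_NAMES = ("ip-adapter.safetensors", "ip-adapter.bin")


def choose_weight_name(repo_files: Iterable[str]) -> Optional[str]:
    files = set(repo_files)
    for name in PREFERRED_NAMES:
        if name in files:
            return name
    # Neither known filename exists; the caller must decide what to do
    return None


# Hypothetical listing for a repo that only ships the .bin weights
print(choose_weight_name(["README.md", "ip-adapter.bin"]))
```

The result would then be passed as `pipe.load_ip_adapter(ip_adapter_id, weight_name=...)`.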
Passing `weight_name="ip-adapter.bin"` gets past the loading step, but generation then fails with the following runtime error:
RuntimeError: The size of tensor a (333) must match the size of tensor b (4096) at non-singleton dimension 1
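The two sizes in this error appear to match SD3's two token streams, assuming the pipeline defaults: 77 CLIP tokens plus `max_sequence_length=256` T5 tokens for the text stream, and a VAE downsampling factor of 8 with transformer patch size 2 for the image stream. The constant and function names below are illustrative, not diffusers API. In the traceback, tensor a is `encoder_hidden_states` (text length) and tensor b is `context_attn_output`, which has the image length; this suggests the attention output for the context branch came back with the wrong sequence length.

```python
# Illustrative arithmetic (names are hypothetical, values are assumed SD3 defaults)
CLIP_TOKENS = 77              # CLIP tokenizer max length
T5_MAX_SEQUENCE_LENGTH = 256  # pipeline default max_sequence_length
VAE_SCALE_FACTOR = 8          # pixel -> latent downsampling
PATCH_SIZE = 2                # transformer patchification


def text_tokens(clip_tokens: int = CLIP_TOKENS,
                t5_tokens: int = T5_MAX_SEQUENCE_LENGTH) -> int:
    # CLIP and T5 embeddings are concatenated along the sequence dimension
    return clip_tokens + t5_tokens


def image_tokens(height: int, width: int,
                 vae_scale: int = VAE_SCALE_FACTOR,
                 patch: int = PATCH_SIZE) -> int:
    # Latent grid size, then one token per patch x patch block
    latent_h, latent_w = height // vae_scale, width // vae_scale
    return (latent_h // patch) * (latent_w // patch)


print(text_tokens())             # 333
print(image_tokens(1024, 1024))  # 4096
```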
Reproduction
```python
import torch
from PIL import Image
from diffusers import StableDiffusion3Pipeline
from transformers import SiglipVisionModel, SiglipImageProcessor

image_encoder_id = "google/siglip-so400m-patch14-384"
ip_adapter_id = "InstantX/SD3.5-Large-IP-Adapter"

feature_extractor = SiglipImageProcessor.from_pretrained(
    image_encoder_id,
    torch_dtype=torch.float16
)
image_encoder = SiglipVisionModel.from_pretrained(
    image_encoder_id,
    torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.float16,
    feature_extractor=feature_extractor,
    image_encoder=image_encoder,
).to("cuda")

# The repo has no ip-adapter.safetensors, so the .bin file is named explicitly
pipe.load_ip_adapter(ip_adapter_id, weight_name="ip-adapter.bin")
pipe.set_ip_adapter_scale(0.6)

ref_img = Image.open("image.jpg").convert('RGB')
# This call raises the RuntimeError shown in the logs below
image = pipe(
    width=1024,
    height=1024,
    prompt="a cat",
    negative_prompt="lowres, low quality, worst quality",
    num_inference_steps=24,
    guidance_scale=5.0,
    ip_adapter_image=ref_img
).images[0]
image.save("result.jpg")
```

Logs
```
  File "/home/appuser/.local/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 1060, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/.local/lib/python3.11/site-packages/diffusers/models/transformers/transformer_sd3.py", line 396, in forward
    encoder_hidden_states, hidden_states = block(
                                           ^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/.local/lib/python3.11/site-packages/diffusers/models/attention.py", line 244, in forward
    encoder_hidden_states = encoder_hidden_states + context_attn_output
                            ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (333) must match the size of tensor b (4096) at non-singleton dimension 1
```

System Info
- base image: `pytorch/pytorch:2.6.0-cuda12.6-cudnn9-runtime`
- diffusers == 0.33.1
- transformers == 4.51.3