StableDiffusion3Pipeline RuntimeError: c10::Half != c10::BFloat16

### Describe the bug

I tried to use Stable Diffusion 3.5 medium to generate images and copied their official script.

When I tried to run it, I got an error: 
` RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16. `

When I changed `torch_dtype=torch.bfloat16` in the script to `torch_dtype=torch.float16`, the script ran correctly.

This is very strange, because there are no torch.Tensor inputs in the script.

### Reproduction

```
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=40,
    guidance_scale=4.5,
).images[0]
image.save("capybara.png")
```

### Logs

```shell
Traceback (most recent call last):
  File "/path/to/project/try.py", line 7, in <module>
    image = pipe(
            ^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 969, in __call__
    ) = self.encode_prompt(
        ^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 437, in encode_prompt
    prompt_embed, pooled_prompt_embed = self._get_clip_prompt_embeds(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 325, in _get_clip_prompt_embeds
    prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 1490, in forward
    text_embeds = self.text_projection(pooled_output)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/project/.venv/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 117, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16
```

### System Info

```
- 🤗 Diffusers version: 0.32.2
- Platform: Linux-6.11.0-19-generic-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.11.11
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.29.3
- Transformers version: 4.49.0
- Accelerate version: 1.5.2
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.5.3
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 3090, 24576 MiB
NVIDIA GeForce RTX 3090, 24576 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
```

### Who can help?

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

StableDiffusion3Pipeline RuntimeError: c10::Half != c10::BFloat16 #11117

Describe the bug

Reproduction

Logs

System Info

Who can help?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

StableDiffusion3Pipeline RuntimeError: c10::Half != c10::BFloat16 #11117

Description

Describe the bug

Reproduction

Logs

System Info

Who can help?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions