[mps] training / inferencing deepfloyd must be done in float32 #7789

### Describe the bug

_**I'm opening this mainly to keep track of the problem: I'm not sure whether I've done something wrong or whether it's caused by the current issues in PyTorch's MPS support.**_

I'm getting bad results when running inference on Apple MPS with DeepFloyd IF stages I and II in Diffusers. The CPU output for the same prompt is shown for comparison:

# Stage I (400M)

## CPU (float32)

![image](https://github.com/huggingface/diffusers/assets/59658056/7980246a-9d24-47d7-8b8c-fb84b074e650)

## MPS (float32)

![image](https://github.com/huggingface/diffusers/assets/59658056/635a28d2-b5c9-402b-9d12-2f5b3f0380dc)


# Stage II (450M)

## CPU (float32)

![image](https://github.com/huggingface/diffusers/assets/59658056/660f3a35-7ed7-4813-a75a-6498e146723b)

## MPS (float32)

![image](https://github.com/huggingface/diffusers/assets/59658056/492d0c8b-ce70-47e0-9118-9e628085af70)

### Reproduction

```py
import os

import torch
from diffusers import DiffusionPipeline
from diffusers.pipelines import IFSuperResolutionPipeline

prompts = {
    'jester': 'a stunning portrait of a jester at the twisted carnival'
}

deepfloyd_lora_path = "ptx0/deepcinema"
deepfloyd_base_model_path = "DeepFloyd/IF-I-M-v1.0"
deepfloyd_stage_two_path = "DeepFloyd/IF-II-M-v1.0"

width = 96
height = 64

# Toggle between the MPS run and the CPU reference run on Apple Silicon.
use_mps = True
if torch.cuda.is_available():
    torch_device = "cuda"
elif torch.backends.mps.is_available():
    torch_device = "mps" if use_mps else "cpu"
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    torch_device = "xpu"
else:
    torch_device = "cpu"

# Stage I base pipeline, everything kept in float32.
pipe = DiffusionPipeline.from_pretrained(deepfloyd_base_model_path, watermarker=None, safety_checker=None, local_files_only=True).to(device=torch_device, dtype=torch.float32)
# Note: reusing pipe.components means both pipelines share the same UNet object,
# so the LoRA loaded below is likely applied to `pipe` as well.
lora_pipe = DiffusionPipeline.from_pretrained(deepfloyd_base_model_path, **pipe.components, local_files_only=True).to(device=torch_device, dtype=torch.float32)
lora_pipe.load_lora_weights(deepfloyd_lora_path, weight_name="pytorch_lora_weights.safetensors")
lora_pipe.scheduler = pipe.scheduler.__class__.from_config(pipe.scheduler.config, variance_type="fixed_small")

# Stage II super-resolution pipeline (4x upscale of the stage I output).
stage2_pipe = IFSuperResolutionPipeline.from_pretrained(deepfloyd_stage_two_path, watermarker=None, safety_checker=None, local_files_only=True).to(device=torch_device, dtype=torch.float32)

for shortname, prompt in prompts.items():
    output_dir = f"outputs/{shortname}"
    if os.path.exists(output_dir):
        continue  # skip prompts that already have outputs
    os.makedirs(output_dir, exist_ok=True)

    # Stage I: base model, then base model + LoRA, with the same seed.
    torch.manual_seed(42)
    image_base = pipe(prompt=prompt, width=width, height=height, guidance_scale=5.5, num_inference_steps=30).images[0]
    image_base.save(os.path.join(output_dir, "base.png"))

    torch.manual_seed(42)
    image_lora = lora_pipe(prompt=prompt, width=width, height=height, guidance_scale=5.5, num_inference_steps=30).images[0]
    image_lora.save(os.path.join(output_dir, "base_lora.png"))

    # Stage II: upscale both stage I images.
    torch.manual_seed(84)
    image_base_2 = stage2_pipe(prompt=prompt, image=image_base, guidance_scale=5.5, num_inference_steps=30, width=width * 4, height=height * 4).images[0]
    image_base_2.save(os.path.join(output_dir, "base_stage2.png"))
    torch.manual_seed(84)
    image_lora_2 = stage2_pipe(prompt=prompt, image=image_lora, guidance_scale=5.5, num_inference_steps=30, width=width * 4, height=height * 4).images[0]
    image_lora_2.save(os.path.join(output_dir, "lora_stage2.png"))
```

For this test, I manually switched the device used when MPS is available between `'mps'` and `'cpu'`.
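Seeding with `torch.manual_seed` most likely gives the CPU and MPS runs different starting noise, since each device uses its own random number generator. One way to separate RNG differences from genuine numerical divergence is to draw the noise once on the CPU, copy it to both devices, and compare individual operations. The following is a minimal sketch assuming only PyTorch; `shared_noise`, `max_abs_diff`, and `compare_on_devices` are hypothetical helpers, not part of Diffusers or PyTorch.

```py
import torch


def shared_noise(shape, seed, device, dtype=torch.float32):
    """Draw Gaussian noise with a seeded CPU generator, then copy it to `device`.

    Because the numbers always come from the CPU generator, every device
    receives bit-identical starting noise for the same seed.
    """
    generator = torch.Generator(device="cpu").manual_seed(seed)
    noise = torch.randn(shape, generator=generator, dtype=dtype)
    return noise.to(device)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference, computed on the CPU in float32."""
    return (a.detach().float().cpu() - b.detach().float().cpu()).abs().max().item()


def compare_on_devices(fn, x, reference="cpu", candidate="mps"):
    """Run `fn` on copies of `x` on two devices and return the max absolute drift.

    Returns None when the candidate device is MPS and MPS is not available.
    """
    if candidate == "mps" and not torch.backends.mps.is_available():
        return None
    ref_out = fn(x.to(reference))
    cand_out = fn(x.to(candidate))
    return max_abs_diff(ref_out, cand_out)


if __name__ == "__main__":
    # Same shape as a stage I sample in the reproduction: 3 channels, 64 x 96.
    x = shared_noise((1, 3, 64, 96), seed=42, device="cpu")
    drift = compare_on_devices(lambda t: torch.nn.functional.group_norm(t, num_groups=3), x)
    if drift is None:
        print("MPS is not available on this machine")
    else:
        print(f"max |CPU - MPS| for group_norm: {drift:.3e}")
```

For the pipelines themselves, the analogous change would be passing `generator=torch.Generator("cpu").manual_seed(42)` to each call instead of calling `torch.manual_seed(42)`; as far as I know, Diffusers then samples the initial noise on the CPU and moves it to the pipeline's device, so both runs start from the same noise.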

### Logs

_No response_

### System Info

- `diffusers` version: 0.27.2
- Platform: macOS-14.4.1-arm64-arm-64bit
- Python version: 3.10.14
- PyTorch version (GPU?): 2.4.0.dev20240421 (False)
- Huggingface_hub version: 0.22.2
- Transformers version: 4.40.0.dev0
- Accelerate version: 0.26.1

### Who can help?

_No response_
