
deepfloyd stage 2 crashes with tensor size mismatch when input image size is not divisible by 8 #7842

@bghira

Description

Describe the bug

DeepFloyd's upstream code supports 8px-aligned inputs for stage II, and I believe the Diffusers implementation is based on it. However, for certain sizes the `hidden_states` and the residual (skip-connection) `res_hidden_states` end up with different spatial shapes in the UNet, as the logs below show (width 42 vs. 43).

I'm not sure whether this is fundamental to the model. If it is, we should work out the conditions under which the problem occurs and raise an error telling the user that the requested resolution is incompatible.
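One possible shape for such an error is a pre-flight check on the target `height` and `width`. This is a minimal sketch, not Diffusers code: `check_stage2_resolution` is a hypothetical helper, and `num_downsamples=4` is an assumption inferred from the log below (the 256 px height shrinks to 16 rows at the deepest level, i.e. four halvings). It also assumes each downsample floors odd sizes and each upsample doubles exactly; under those assumptions a dimension survives the round trip only if it is a multiple of `2 ** num_downsamples`. Note that 344 is a multiple of 8 but not of 16.

```python
def check_stage2_resolution(height: int, width: int, num_downsamples: int = 4) -> None:
    """Hypothetical pre-flight check for stage II target sizes.

    Assumes the UNet halves H and W `num_downsamples` times with floor
    division and doubles them on the way back up. A size survives that
    round trip only if it is divisible by 2 ** num_downsamples.
    """
    multiple = 2 ** num_downsamples
    # Collect every dimension that would lose a row/column to flooring.
    bad = {
        name: value
        for name, value in (("height", height), ("width", width))
        if value % multiple != 0
    }
    if bad:
        details = ", ".join(f"{name}={value}" for name, value in bad.items())
        raise ValueError(
            f"Incompatible stage II resolution ({details}): "
            f"height and width must be multiples of {multiple}."
        )


# 352 x 256 passes; the 344 x 256 size from the reproduction is rejected.
check_stage2_resolution(256, 352)
try:
    check_stage2_resolution(256, 344)
except ValueError as err:
    print(err)
```

Whether 16 is the right multiple for every stage II checkpoint would need to be confirmed against each UNet's config.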

Reproduction

from diffusers import IFSuperResolutionPipeline
import torch
from PIL import Image
import numpy as np

torch.manual_seed(42)

# Configuration for initial image and desired output
initial_width = 86  # One-fourth of the 344 px target width
initial_height = 64  # Adjusted height to be one-fourth of 256

# Pick the best available device (torch.xpu only exists in newer PyTorch builds)
torch_device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available()
    else "cpu"
)

# Create a dummy image (86x64)
dummy_image = torch.rand((3, initial_height, initial_width), dtype=torch.float32)  # Random noise image
dummy_image = (dummy_image * 255).to(torch.uint8)  # Convert to 8-bit format
dummy_pil_image = Image.fromarray(dummy_image.numpy().transpose(1, 2, 0))  # Convert to PIL image for compatibility
dummy_pil_image.save("dummy_input.png")  # Save the initial dummy image

# Load your stage 2 pipeline
stage2_pipe = IFSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-M-v1.0",
    watermarker=None,
    safety_checker=None,
    local_files_only=False,
).to(device=torch_device, dtype=torch.bfloat16)

# Upscale the dummy image using stage 2 of the pipeline
upscaled_image = stage2_pipe(
    prompt="A simple upscaled image", 
    image=dummy_pil_image, 
    guidance_scale=5.5, 
    num_inference_steps=20, 
    width=344, 
    height=256
).images[0]

upscaled_image.save("upscaled_dummy_output.png")

Logs

0%|          | 0/20 [00:00<?, ?it/s]

hidden_states.shape: torch.Size([2, 768, 16, 21])
res_hidden_states.shape: torch.Size([2, 768, 16, 21])
hidden_states.shape: torch.Size([2, 768, 16, 21])
res_hidden_states.shape: torch.Size([2, 768, 16, 21])
hidden_states.shape: torch.Size([2, 768, 16, 21])
res_hidden_states.shape: torch.Size([2, 768, 16, 21])
hidden_states.shape: torch.Size([2, 768, 32, 42])
res_hidden_states.shape: torch.Size([2, 768, 32, 43])
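The last pair of shapes can be reproduced with plain integer arithmetic. This is a minimal sketch under stated assumptions, not the Diffusers UNet: it assumes the UNet input is at the 256 x 344 target size, that each downsample halves height and width with floor division (suggested by 43 becoming 21 in the log), and that each upsample doubles exactly. `skip_connection_mismatches` is a hypothetical helper.

```python
def skip_connection_mismatches(height: int, width: int, num_downsamples: int = 4):
    """Return (level, skip_shape, upsampled_shape) for every level where
    doubling the lower-resolution feature map does not recover the shape
    of the skip connection it is concatenated with.

    Level 0 is the full-resolution input; level k is downsampled k times.
    """
    # Spatial sizes on the way down, assuming floor division at each step.
    sizes = [(height, width)]
    for _ in range(num_downsamples):
        h, w = sizes[-1]
        sizes.append((h // 2, w // 2))

    mismatches = []
    for level in range(num_downsamples):
        skip = sizes[level]
        lower_h, lower_w = sizes[level + 1]
        upsampled = (lower_h * 2, lower_w * 2)  # exact 2x upsample
        if upsampled != skip:
            mismatches.append((level, skip, upsampled))
    return mismatches


print(skip_connection_mismatches(256, 344))  # [(3, (32, 43), (32, 42))]
print(skip_connection_mismatches(256, 352))  # []
```

Under these assumptions, 344 divides cleanly down to 43 columns, the next halving floors 43 to 21, and upsampling 21 gives 42, which matches the 42 vs. 43 mismatch logged above.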

System Info

  • diffusers version: 0.27.2
  • Platform: macOS-14.4.1-arm64-arm-64bit
  • Python version: 3.10.14
  • PyTorch version (GPU?): 2.4.0.dev20240421 (False)
  • Huggingface_hub version: 0.22.2
  • Transformers version: 4.40.0.dev0
  • Accelerate version: 0.26.1

Who can help?

@DN6 @yiyixuxu

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working), stale (Issues that haven't received updates)
