Skip to content

non power of 2 sized images with stable cascade fail #7644

@Teriks

Description

@Teriks

Describe the bug

I am not sure if this is a bug, but the documentation does not mention a limitation to power of 2 that I can find.

Generating a non power of 2 sized image fails.

Reproduction

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

device = "cuda"
num_images_per_prompt = 1

prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", variant='bf16',
                                                   torch_dtype=torch.bfloat16)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", variant='bf16',
                                                       torch_dtype=torch.float16)

prompt = "Anthropomorphic cat dressed as a pilot"
negative_prompt = ""

prior.enable_model_cpu_offload()
decoder.enable_model_cpu_offload()

prior_output = prior(
    prompt=prompt,
    height=1024,
    width=680,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=num_images_per_prompt,
    num_inference_steps=20
)

decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.half(),
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10
).images

Logs

Traceback (most recent call last):
  File "REDACT\test.py", line 28, in <module>
    decoder_output = decoder(
                     ^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\pipelines\stable_cascade\pipeline_stable_cascade.py", line 443, in __call__
    predicted_latents = self.decoder(
                        ^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 605, in forward
    x = self._up_decode(level_outputs, timestep_ratio_embed, clip)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 553, in _up_decode
    x = block(x, skip)
        ^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 74, in forward
    x = self.norm(self.depthwise(x))
                  ^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same

Process finished with exit code 1

System Info

diffusers 0.27.2
torch 2.2.2+cu121

Who can help?

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingstaleIssues that haven't received updates

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions