
VAE Tiling not supported with SD3 for non power of 2 images? #8788

@Teriks

Describe the bug

VAE tiling works for SD3 when the image dimensions are powers of 2, but fails for every other alignment.

The mentioned issues with VAE tiling are due to vae/config.json having:

"use_post_quant_conv": false,
"use_quant_conv": false

These flags leave self.quant_conv and self.post_quant_conv set to None, so the call in AutoencoderKL.tiled_encode:

tile = self.quant_conv(tile)

and the corresponding call in AutoencoderKL.tiled_decode:

tile = self.post_quant_conv(tile)

fail because None is not callable.

Perhaps at the moment the model is simply not entirely compatible with the tiling in AutoencoderKL, as the state dict does not contain the keys quant_conv.weight, quant_conv.bias, post_quant_conv.weight, post_quant_conv.bias.

Is this intended?
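
For reference, a minimal sketch of the kind of guard the tiled paths would presumably need, mirroring how the non-tiled encode/decode already respect these config flags (this is an assumption about a possible fix, not the current diffusers code):

# Sketch only: apply the optional conv layers only when the config enables them.
# In tiled_encode:
if self.config.use_quant_conv:
    tile = self.quant_conv(tile)

# In tiled_decode:
if self.config.use_post_quant_conv:
    tile = self.post_quant_conv(tile)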

Reproduction

import diffusers
import PIL.Image
import os

os.environ['HF_TOKEN'] = 'your token'

cn = diffusers.SD3ControlNetModel.from_pretrained('InstantX/SD3-Controlnet-Canny')

pipe = diffusers.StableDiffusion3ControlNetPipeline.from_pretrained(
    'stabilityai/stable-diffusion-3-medium-diffusers',
    controlnet=cn)

pipe.enable_sequential_cpu_offload()

pipe.vae.enable_tiling()

width = 1376
height = 920

# aligned by 16, but alignment by 64 also fails
output_size = (width-(width % 16), height-(height % 16))

not_pow_2 = PIL.Image.new('RGB', output_size)

args = {
    'guidance_scale': 8.0,
    'num_inference_steps': 30,
    'width': output_size[0],
    'height': output_size[1],
    'control_image': not_pow_2,
    'prompt': 'test prompt'
}

pipe(**args)
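
A possible temporary workaround (untested sketch, assuming identity layers are equivalent here): since the non-tiled path skips these convs entirely when the config flags are false, replacing the missing layers with identity modules before enabling tiling avoids calling None:

import torch.nn as nn

# Hypothetical workaround: substitute no-op layers so tiled_encode / tiled_decode
# have something callable. This should only be equivalent because
# use_quant_conv / use_post_quant_conv are false for this VAE, meaning the
# non-tiled path would not apply these convs anyway.
if pipe.vae.quant_conv is None:
    pipe.vae.quant_conv = nn.Identity()
if pipe.vae.post_quant_conv is None:
    pipe.vae.post_quant_conv = nn.Identity()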

Logs

REDACT\venv\Lib\site-packages\diffusers\models\attention_processor.py:1584: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  hidden_states = F.scaled_dot_product_attention(
Traceback (most recent call last):
  File "REDACT\test.py", line 35, in <module>
    pipe(**args)
  File "REDACT\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\pipelines\controlnet_sd3\pipeline_stable_diffusion_3_controlnet.py", line 912, in __call__
    control_image = self.vae.encode(control_image).latent_dist.sample()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl.py", line 258, in encode
    return self.tiled_encode(x, return_dict=return_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl.py", line 363, in tiled_encode
    tile = self.quant_conv(tile)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable

System Info

Windows

diffusers 0.29.2

Who can help?

@yiyixuxu @sayakpaul @DN6 @asomoza
