feat: enable VAE tiling for decoding large images #2920

keturn · 2023-03-11T05:09:06Z

This turns on the VAE Tiling option added in diffusers 0.14. Should help with #2672.

I did succeed in generating a larger image with it than I could otherwise. But with 12 GB, maybe mine isn't the target environment for this feature, because that size is ridiculously far outside the range where the images make any sense.

A few potential disadvantages:

It decides whether to tile based on whether the tensor is big enough to tile, not on whether it needs to tile to fit the memory budget.
It doesn't announce when it kicks in.
There is no progress indicator.
Tile size and overlap are not configurable.

tiled VAE fails with IndexError huggingface/diffusers#2646
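To make the first and last points concrete, here is a minimal sketch of how a size-triggered tiler lays out its tiles. The tile size of 64 latent pixels and the 25% overlap are assumptions, believed to match the diffusers defaults for the Stable Diffusion VAE at the time. The function names are hypothetical and are not the diffusers API.

```python
from typing import List


def should_tile(latent_h: int, latent_w: int, tile_size: int = 64) -> bool:
    """Size-based trigger: tile whenever either latent dimension exceeds one tile."""
    return latent_h > tile_size or latent_w > tile_size


def tile_starts(length: int, tile_size: int = 64, overlap: float = 0.25) -> List[int]:
    """Start offsets of the tiles along one axis; neighbours share `overlap` of a tile."""
    stride = int(tile_size * (1 - overlap))
    return list(range(0, length, stride))


# A 1536x1280 image is 192x160 in latent space (downscale factor of 8).
latent_w, latent_h = 1536 // 8, 1280 // 8
print(should_tile(latent_h, latent_w))  # True
print(tile_starts(latent_w))            # [0, 48, 96, 144]
print(tile_starts(latent_h))            # [0, 48, 96, 144]
```

Note that nothing in `should_tile` looks at available memory, which is the first disadvantage listed above: under these assumptions anything larger than 512x512 pixels gets tiled, whether or not an untiled decode would have fit.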

JPPhoto · 2023-03-11T12:49:32Z

What kind of impact does tiling have on image quality? Can you show a difference image?

JPPhoto · 2023-03-11T20:57:23Z

I would love to see this as a CLI option that could be shoved into invokeai.init, since, like xformers, it does produce different output. See the diffusers PR for an explanation.

lstein · 2023-03-11T23:14:52Z

Does this need a configurable option to turn on and off?

I'm beginning to think that we have so many memory-conserving options (xformers, gpu_offload, and this) that we should consolidate them all under a --memory_options argument that takes some sort of list of options to activate.

keturn · 2023-03-11T23:36:47Z

Did a comparison for this image:

Generation Parameters

Stable Diffusion 1.5 + the MSE-finetuned VAE.

{
  "model": "stable diffusion",
  "model_weights": "diffusers-1.5+mse",
  "model_hash": "3623e578a0462b0237943d07449d084984faa20fa3361089e28a781d8b3b4fb6",
  "app_id": "invoke-ai/InvokeAI",
  "app_version": "3.0.0+a0",
  "image": {
    "prompt": "chinatown New year festival dance,\na magnificent Chinese dragon. \nphotographed for the Boston Globe, Visura, Dodho.\ncinematic lighting,\ndynamic pose,\nCanon EOS R 85mm f/1.8,\n4k UHD HDR,\nvolumetric lighting, atmospheric scatter. [bland blurred unfocused shitty rushed crappy low-budget over-saturated out-of-gamut]",
    "steps": 50,
    "cfg_scale": 21,
    "threshold": 0,
    "perlin": 0,
    "height": 1280,
    "width": 1536,
    "seed": 2669601462,
    "hires_fix": false,
    "seamless": false,
    "type": "txt2img",
    "postprocessing": null,
    "sampler": "ddim",
    "variations": []
  }
}

Comparison Details

With tiled VAE:

Diff of the value channel (green is lighter, blue is darker):

Color distance (CAM16):

Those diff images are pretty cool in themselves.
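For anyone who wants to reproduce a value-channel diff like the one described above, here is a minimal sketch using only the standard library. The thread does not say which tool produced the original diff images, so this is one plausible approach and not the method actually used. It does not cover the CAM16 color distance. The function names are hypothetical.

```python
import colorsys


def value_diff_pixel(before, after):
    """Map the change in HSV value between two RGB pixels (0-255) to a color.

    Lighter in `after` -> green, darker -> blue, unchanged -> black.
    """
    v_before = colorsys.rgb_to_hsv(*(c / 255 for c in before))[2]
    v_after = colorsys.rgb_to_hsv(*(c / 255 for c in after))[2]
    delta = v_after - v_before
    magnitude = round(abs(delta) * 255)
    if delta > 0:
        return (0, magnitude, 0)
    return (0, 0, magnitude)


def value_diff(image_a, image_b):
    """Apply value_diff_pixel to two equally sized lists of RGB tuples."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return [value_diff_pixel(a, b) for a, b in zip(image_a, image_b)]
```

Here `image_a` would be the untiled decode and `image_b` the tiled one, each flattened to a list of `(r, g, b)` tuples.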

We see changes both in the high-frequency details and in broader features like the saturation of the red on the dragons.

It's a significant enough change that I'd be wary of setting it on by default all the time. On the other hand, squeezing such an esoteric option into the UI to set per-image seems like a lot to ask. It serves a niche case, where you don't have quite enough memory to decode larger image sizes, but you do have enough memory to get them through the U-net, and you do have enough memory to decode default-sized tiles.

Seems like a very tricky thing to wrap UX around unless we are very good at predicting memory costs and availability.
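The niche case above can be written down as a predicate over three memory estimates. This is only a sketch: the `MemoryEstimate` type and `tiling_helps` function are hypothetical, and producing trustworthy estimates is exactly the hard part mentioned above.

```python
from dataclasses import dataclass


@dataclass
class MemoryEstimate:
    """Hypothetical peak-memory estimates, in bytes, for one generation."""
    unet: int         # denoising the latents at the requested size
    full_decode: int  # decoding the whole latent in one pass
    tile_decode: int  # decoding a single default-sized tile


def tiling_helps(est: MemoryEstimate, available: int) -> bool:
    """True only when tiling is both necessary and sufficient."""
    return (
        est.unet <= available             # the latents can be generated at all
        and est.full_decode > available   # an untiled decode would run out of memory
        and est.tile_decode <= available  # but a single tile still fits
    )
```

With made-up numbers for illustration: on a 12 GiB budget, estimates of 10 GiB for the U-net, 14 GiB for a full decode, and 3 GiB per tile would return `True`. Lowering the full-decode estimate to 8 GiB would return `False`, since tiling would change the output for no benefit.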

keturn · 2023-03-12T03:03:44Z

oops, found a bug:

tiled VAE fails with IndexError huggingface/diffusers#2646

JPPhoto · 2023-03-12T05:02:58Z

> Does this need a configurable option to turn on and off?

Yes, IMO, especially since the output is different. Ideally it would be a per-invocation option.

> I'm beginning to think that we have so many memory-conserving options (xformers, gpu_offload, and this) that we should consolidate them all under a --memory_options argument that takes some sort of list of options to activate.

I think that's fine as long as it acts by turning on the other options and doesn't replace each of them individually. The last thing I want is to use xformers when it doesn't reliably reproduce the same image from the same parameters, but I'd be willing to deal with some VAE tiling discrepancies if that option lets me generate some larger things from time to time.
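One way to get both behaviors is an umbrella flag that can only switch options on, alongside the individual flags. The sketch below uses `argparse`. The flag names are taken from this discussion for illustration and are not necessarily InvokeAI's real command-line options.

```python
import argparse

# Hypothetical option names, borrowed from the discussion above.
MEMORY_OPTIONS = ("xformers", "gpu_offload", "vae_tiling")


def build_parser():
    parser = argparse.ArgumentParser()
    # Each option keeps its own switch.
    for name in MEMORY_OPTIONS:
        parser.add_argument(f"--{name}", action="store_true")
    # The umbrella flag only turns options on; it never turns one off.
    parser.add_argument("--memory_options", nargs="+", choices=MEMORY_OPTIONS, default=[])
    return parser


def resolve(argv):
    """Return {option: enabled} after merging both ways of enabling an option."""
    args = build_parser().parse_args(argv)
    return {
        name: getattr(args, name) or name in args.memory_options
        for name in MEMORY_OPTIONS
    }
```

For example, `resolve(["--memory_options", "vae_tiling"])` enables tiling and leaves xformers off, which is the combination asked for above.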

psychedelicious · 2023-03-15T04:05:55Z

Regarding the UX for this, we are getting to a place where it is very difficult to present everything that is relevant to the user at once.

What we may need is a UI element that indicates when non-default options are selected. Things like seamless, static seed and tiled VAE decode would be non-default options. A badge over the icon indicates the number of things that are not default, and hovering over it displays a popover listing what is different.

Maybe it is on the invoke button, so as you go to click it, you are reminded of what you have set.

Each item in the popover could have a reset button to reset it to default.
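The data behind such a badge is just a diff of the current settings against their defaults. Here is a minimal sketch of that logic. The setting names and the `DEFAULTS` table are hypothetical, and the actual UI would not be written in Python.

```python
# Hypothetical settings and their defaults; a seed of None means "random".
DEFAULTS = {"seamless": False, "seed": None, "tiled_vae_decode": False}


def non_default_options(current, defaults=DEFAULTS):
    """Return {name: (default, current)} for every setting that differs from its default."""
    return {
        name: (defaults[name], value)
        for name, value in current.items()
        if name in defaults and value != defaults[name]
    }


def reset(current, name, defaults=DEFAULTS):
    """Return a copy of the settings with one option restored to its default."""
    return {**current, name: defaults[name]}


settings = {"seamless": False, "seed": 2669601462, "tiled_vae_decode": True}
changed = non_default_options(settings)
print(len(changed))     # badge count: 2
print(sorted(changed))  # popover entries: ['seed', 'tiled_vae_decode']
```

The per-item reset button then maps to `reset(settings, name)`, after which the badge count is recomputed.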

psychedelicious · 2023-07-18T12:10:29Z

We now have tiled decode as a toggle on the LatentsToImage node.

feat: enable VAE tiling for decoding large images

84c3d4b

keturn added the enhancement New feature or request label Mar 11, 2023

Merge branch 'main' into feat/vae_tiling

09005df

This was referenced Mar 18, 2023

deps: upgrade to PyTorch 2.0 (replaces xformers) #2962

Closed

[enhancement]: OOM error during VAE decode #2672

Closed

psychedelicious closed this Jul 18, 2023

keturn deleted the feat/vae_tiling branch August 21, 2023 23:48

psychedelicious mentioned this pull request Apr 5, 2024

[bug]: Tiled decoding ruins the image #6144

Open

1 task
