feat: enable VAE tiling for decoding large images #2920

Closed
wants to merge 2 commits into from

keturn
Contributor

@keturn keturn commented Mar 11, 2023

This turns on the VAE Tiling option added in diffusers 0.14. Should help with #2672.
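
For readers unfamiliar with the option, here is a minimal sketch of what turning it on could look like from calling code. `enable_vae_tiling()` is the pipeline method diffusers 0.14 added for Stable Diffusion pipelines; the helper name `enable_vae_tiling_if_supported` and its fallback behavior are assumptions for illustration, not the code in this PR.

```python
def enable_vae_tiling_if_supported(pipe) -> bool:
    """Turn on tiled VAE decoding when the pipeline offers it.

    Returns True if tiling was enabled, False if the pipeline object
    (for example one from diffusers < 0.14) has no such method.
    This helper is hypothetical; only `enable_vae_tiling` is a diffusers name.
    """
    enable = getattr(pipe, "enable_vae_tiling", None)
    if callable(enable):
        enable()
        return True
    return False


# Usage with diffusers (not run here; needs the library and model weights,
# and `model_id` stands in for whichever checkpoint you load):
#   from diffusers import StableDiffusionPipeline
#   pipe = StableDiffusionPipeline.from_pretrained(model_id)
#   enable_vae_tiling_if_supported(pipe)
```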

I did succeed in generating a larger image with it than I could otherwise, but with 12 GB maybe this isn't the target environment for this feature, because the result is ridiculously beyond the size where the images make any sense.

[image: 5k × 2k image]

A few potential disadvantages:

  • it decides whether to tile based on whether the tensor is big enough that it can tile, instead of whether it needs to tile to fit the memory budget.
  • it doesn't advertise when it kicks in.
  • no progress indicator.
  • tile size and overlap are not configurable.
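
To make the last three points concrete, here is a hypothetical sketch of a tile plan in which tile size and overlap are parameters and every tile carries its index and the total, so a caller could log when tiling starts and show progress. This is not how diffusers lays out its tiles; the function names and the default values (64 and 16, in latent pixels) are assumptions.

```python
from typing import Iterator, List, Tuple


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles of size `tile` that cover `length` pixels.

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    placed flush with the far edge so no partial tile is needed.
    """
    if length <= tile:
        return [0]  # one tile covers everything; the caller clamps its size
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def iter_tiles(
    height: int, width: int, tile: int = 64, overlap: int = 16
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (index, total, y, x) for each tile, in row-major order."""
    rows = tile_origins(height, tile, overlap)
    cols = tile_origins(width, tile, overlap)
    total = len(rows) * len(cols)
    index = 0
    for y in rows:
        for x in cols:
            index += 1
            yield index, total, y, x


if __name__ == "__main__":
    # A caller can announce that tiling is active and report progress.
    for index, total, y, x in iter_tiles(160, 192):
        print(f"decoding tile {index}/{total} at latent offset ({y}, {x})")
```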

Related:

@keturn keturn added the enhancement New feature or request label Mar 11, 2023
@JPPhoto
Contributor

JPPhoto commented Mar 11, 2023

What kind of impact does tiling have on image quality? Can you show a difference image?

@JPPhoto
Contributor

JPPhoto commented Mar 11, 2023

I would love to see this as a CLI option that could be put into invokeai.init, since (like xformers) it does produce different output. See the diffusers PR for an explanation.

@lstein
Collaborator

lstein commented Mar 11, 2023

Does this need a configurable option to turn on and off?

I'm beginning to think that we have so many memory-conserving options (xformers, gpu_offload, and this) that we should consolidate them all under a --memory_options argument that takes some sort of list of options to activate.

@keturn
Contributor Author

keturn commented Mar 11, 2023

Did a comparison for this image:

[image: original image]

Generation Parameters

Stable Diffusion 1.5 + the MSE-finetuned VAE.
{
  "model": "stable diffusion",
  "model_weights": "diffusers-1.5+mse",
  "model_hash": "3623e578a0462b0237943d07449d084984faa20fa3361089e28a781d8b3b4fb6",
  "app_id": "invoke-ai/InvokeAI",
  "app_version": "3.0.0+a0",
  "image": {
    "prompt": "chinatown New year festival dance,\na magnificent Chinese dragon. \nphotographed for the Boston Globe, Visura, Dodho.\ncinematic lighting,\ndynamic pose,\nCanon EOS R 85mm f/1.8,\n4k UHD HDR,\nvolumetric lighting, atmospheric scatter. [bland blurred unfocused shitty rushed crappy low-budget over-saturated out-of-gamut]",
    "steps": 50,
    "cfg_scale": 21,
    "threshold": 0,
    "perlin": 0,
    "height": 1280,
    "width": 1536,
    "seed": 2669601462,
    "hires_fix": false,
    "seamless": false,
    "type": "txt2img",
    "postprocessing": null,
    "sampler": "ddim",
    "variations": []
  }
}
Comparison Details

With tiled VAE:
[image: tiled VAE]

Diff of the value channel (green is lighter, blue is darker):
[image: value channel]

Color distance (CAM16):
[image: color distance]

Those diff images are pretty cool in themselves.

We see changes both in the high-frequency details and in broader features like the saturation of the red on the dragons.
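
For anyone who wants to reproduce a value-channel diff like the one above, here is one possible sketch. It is not the script used for the images in this thread. It assumes 8-bit RGB pixels, takes "value" to be the HSV value (the largest of R, G, B), and assumes "lighter" means lighter in the tiled decode than in the original. All function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def value_channel(rgb: Pixel) -> int:
    """HSV value of an RGB pixel: the largest of its three components."""
    return max(rgb)


def value_diff_pixel(original: Pixel, tiled: Pixel) -> Pixel:
    """Colour-code the change in value: green if lighter, blue if darker."""
    delta = value_channel(tiled) - value_channel(original)
    if delta > 0:
        return (0, delta, 0)
    return (0, 0, -delta)


def value_diff_image(original: Image, tiled: Image) -> Image:
    """Apply value_diff_pixel to two images of identical dimensions."""
    if len(original) != len(tiled) or any(
        len(a) != len(b) for a, b in zip(original, tiled)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [value_diff_pixel(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(original, tiled)
    ]
```

In practice an array library would be much faster than nested lists; the pure-Python form is only meant to show the mapping.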

It's a significant enough change that I'd be wary of setting it on by default all the time. On the other hand, squeezing such an esoteric option into the UI to set per-image seems like a lot to ask. It serves a niche case, where you don't have quite enough memory to decode larger image sizes, but you do have enough memory to get them through the U-net, and you do have enough memory to decode default-sized tiles.

Seems like a very tricky thing to wrap UX around unless we are very good at predicting memory costs and availability.
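
The niche described above can be written down as a predicate. This is a sketch under the assumption that byte estimates for each stage are somehow available. Producing those estimates is exactly the hard part, and nothing here attempts it. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEstimate:
    """Hypothetical peak-memory estimates, in bytes, for one image size."""

    unet_bytes: int         # denoising the latents through the U-net
    full_decode_bytes: int  # decoding the whole latent in one pass
    tile_decode_bytes: int  # decoding a single default-sized tile


def tiling_is_worthwhile(est: MemoryEstimate, available_bytes: int) -> bool:
    """True only in the niche case where tiling is what makes the job fit."""
    return (
        est.unet_bytes <= available_bytes            # latents can be generated
        and est.full_decode_bytes > available_bytes  # one-pass decode would not fit
        and est.tile_decode_bytes <= available_bytes # a tile at a time would fit
    )
```

If the one-pass decode already fits, the predicate is false and the untiled output (which differs from the tiled one) is preserved.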

@keturn
Contributor Author

keturn commented Mar 12, 2023

@JPPhoto
Contributor

JPPhoto commented Mar 12, 2023

> Does this need a configurable option to turn on and off?

Yes, IMO, especially since the output is different. Ideally it would be a per-invocation option.

> I'm beginning to think that we have so many memory-conserving options (xformers, gpu_offload, and this) that we should consolidate them all under a --memory_options argument that takes some sort of list of options to activate.

I think that's fine as long as it acts as a way of turning on the other options and doesn't replace the individual ones. The last thing I want to do is use xformers when it doesn't reliably reproduce the same image from the same parameters, but I'd be willing to deal with some VAE tiling discrepancies if that option lets me generate some larger things from time to time.
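
One way the two positions above could coexist, sketched with `argparse`: a `--memory_options` list that only switches options on, alongside individual flags that keep working. The flag names and the `resolve_memory_options` helper are hypothetical and do not reflect InvokeAI's actual CLI.

```python
import argparse
from typing import Optional, Sequence, Set

# Hypothetical option names, taken from the discussion above.
MEMORY_OPTIONS = ("xformers", "gpu_offload", "vae_tiling")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="memory option sketch")
    # The consolidated list: zero or more names after a single flag.
    parser.add_argument(
        "--memory_options",
        nargs="+",
        choices=MEMORY_OPTIONS,
        default=[],
        help="memory-conserving options to activate",
    )
    # The individual switches stay available and are not replaced.
    for name in MEMORY_OPTIONS:
        parser.add_argument(f"--{name}", action="store_true")
    return parser


def resolve_memory_options(argv: Optional[Sequence[str]] = None) -> Set[str]:
    """Union of the list argument and the individual flags."""
    args = build_parser().parse_args(argv)
    enabled = set(args.memory_options)
    enabled.update(name for name in MEMORY_OPTIONS if getattr(args, name))
    return enabled
```

With this shape, `--memory_options vae_tiling` turns on tiling without touching xformers, which matches the request that the list only adds options.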

@psychedelicious
Collaborator

Regarding the UX for this, we are getting to a place where it is very difficult to present everything that is relevant to the user at once.

What we may need is a UI element that indicates non-default options are selected. Things like seamless, static seed and tiled vae decode would be non-default options. A badge over the icon indicates the number of things that are not default, and hovering over it displays a popover listing what is different.

Maybe it is on the invoke button, so as you go to click it, you are reminded of what you have set.

Each item in the popover could have a reset button to reset it to default.
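
The data behind such a badge is small. Here is a hypothetical sketch of the state logic only (no UI code): the option names, the default values, and both function names are assumptions rather than anything from the InvokeAI frontend.

```python
from typing import Any, Dict

# Hypothetical defaults; seed=None stands for "pick a random seed each time".
DEFAULTS: Dict[str, Any] = {
    "seamless": False,
    "seed": None,
    "tiled_vae_decode": False,
}


def non_default_options(
    current: Dict[str, Any], defaults: Dict[str, Any]
) -> Dict[str, Any]:
    """Options whose current value differs from the default.

    The badge would show len() of the result; the popover would list its items.
    """
    return {
        name: value
        for name, value in current.items()
        if name in defaults and value != defaults[name]
    }


def reset_option(
    current: Dict[str, Any], defaults: Dict[str, Any], name: str
) -> Dict[str, Any]:
    """Return a copy of `current` with one option restored to its default."""
    updated = dict(current)
    updated[name] = defaults[name]
    return updated
```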

@psychedelicious
Collaborator

We now have tiled decode as a toggle on the LatentsToImage node.

@keturn keturn deleted the feat/vae_tiling branch August 21, 2023 23:48