feat: enable VAE tiling for decoding large images #2920
What kind of impact does tiling have on image quality? Can you show a difference image?
I would love to see this as a CLI option that could be shoved into
Does this need a configurable option to turn on and off? I'm beginning to think that we have so many memory-conserving options (xformers, gpu_offload, and this) that we should consolidate them all under a
Did a comparison for this image:
Generation Parameters
Stable Diffusion 1.5 + the MSE-finetuned VAE.
{
"model": "stable diffusion",
"model_weights": "diffusers-1.5+mse",
"model_hash": "3623e578a0462b0237943d07449d084984faa20fa3361089e28a781d8b3b4fb6",
"app_id": "invoke-ai/InvokeAI",
"app_version": "3.0.0+a0",
"image": {
"prompt": "chinatown New year festival dance,\na magnificent Chinese dragon. \nphotographed for the Boston Globe, Visura, Dodho.\ncinematic lighting,\ndynamic pose,\nCanon EOS R 85mm f/1.8,\n4k UHD HDR,\nvolumetric lighting, atmospheric scatter. [bland blurred unfocused shitty rushed crappy low-budget over-saturated out-of-gamut]",
"steps": 50,
"cfg_scale": 21,
"threshold": 0,
"perlin": 0,
"height": 1280,
"width": 1536,
"seed": 2669601462,
"hires_fix": false,
"seamless": false,
"type": "txt2img",
"postprocessing": null,
"sampler": "ddim",
"variations": []
}
}
Those diff images are pretty cool in themselves. We see changes both in the high-frequency details and in broader features like the saturation of the red on the dragons. It's a significant enough change that I'd be wary of turning it on by default all the time. On the other hand, squeezing such an esoteric option into the UI to set per-image seems like a lot to ask. It serves a niche case, where you don't have quite enough memory to decode larger image sizes, but you do have enough memory to get them through the U-net, and you do have enough memory to decode default-sized tiles. Seems like a very tricky thing to wrap UX around unless we are very good at predicting memory costs and availability.
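For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of a difference image between a tiled and a non-tiled decode. It assumes both outputs are already loaded as same-sized 8-bit grayscale grids (lists of rows of ints). The function names and the `gain` parameter are made up for illustration, and this is not necessarily how the diff images in this thread were produced.

```python
def difference_image(a, b, gain=1):
    """Per-pixel absolute difference of two same-sized 8-bit grayscale images.

    `a` and `b` are lists of rows of ints in 0..255. `gain` (an illustrative
    parameter) amplifies small differences so they are easier to see; the
    result is clipped to 255.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return [
        [min(255, abs(pa - pb) * gain) for pa, pb in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


def mean_abs_difference(a, b):
    """Average absolute per-pixel difference: one number summarising the change."""
    diff = difference_image(a, b)
    count = sum(len(row) for row in diff)
    if count == 0:
        return 0.0
    return sum(sum(row) for row in diff) / count
```

A color image could be handled by running the same function once per channel.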
oops, found a bug:
Yes, IMO, especially since the output is different. Ideally it would be a per-invocation option.
I think that's fine as long as it acts as a way of turning on the other options and doesn't replace each of them individually. The last thing I want to do is use xformers when it doesn't reliably reproduce the same image when using the same parameters, but I'd be willing to deal with some VAE tiling discrepancies if that option lets me generate some larger things from time to time.
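One way the "umbrella option that turns on the others without replacing them" idea could look, as a rough sketch only. The `MemoryOptions` class, the `low_memory` flag, and the `vae_tiling` field name are hypothetical and not part of the InvokeAI configuration; `xformers` and `gpu_offload` are just the option names mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryOptions:
    """Hypothetical settings object: one umbrella flag plus per-option overrides."""

    low_memory: bool = False            # umbrella switch (assumed name)
    xformers: Optional[bool] = None     # None means "follow the umbrella switch"
    gpu_offload: Optional[bool] = None
    vae_tiling: Optional[bool] = None

    def resolved(self) -> dict:
        """Return the effective on/off value of each individual option."""
        names = ("xformers", "gpu_offload", "vae_tiling")
        return {
            name: self.low_memory if getattr(self, name) is None else getattr(self, name)
            for name in names
        }
```

With this shape, `MemoryOptions(low_memory=True, xformers=False)` keeps xformers off for reproducibility while still enabling offload and tiled decoding.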
Regarding the UX for this, we are getting to a place where it is very difficult to present everything that is relevant to the user at once. What we may need is a UI element that indicates non-default options are selected. Things like Maybe it is on the invoke button, so as you go to click it, you are reminded of what you have set. Each item in the popover could have a reset button to reset it to default.
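The data behind such an indicator could be as simple as a diff of the current settings against their defaults. This is a sketch under assumptions: the `DEFAULTS` table and both function names are invented for illustration, `vae_tiling` is a hypothetical key, and `seamless` and `hires_fix` are borrowed from the generation parameters posted earlier in the thread.

```python
# Hypothetical defaults table, for illustration only.
DEFAULTS = {"vae_tiling": False, "seamless": False, "hires_fix": False}


def non_default_options(settings, defaults=DEFAULTS):
    """Return only the settings whose value differs from the default.

    The result is what a popover on the invoke button could list.
    """
    return {
        name: value
        for name, value in settings.items()
        if name in defaults and value != defaults[name]
    }


def reset_option(settings, name, defaults=DEFAULTS):
    """Return a copy of `settings` with one option back at its default."""
    updated = dict(settings)
    updated[name] = defaults[name]
    return updated
```

An empty result from `non_default_options` would mean the indicator stays hidden.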
We now have tiled decode as a toggle on the LatentsToImage node.
This turns on the VAE Tiling option added in diffusers 0.14. Should help with #2672.
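Roughly what this amounts to from calling code, as a sketch rather than the code in this PR. It assumes `pipeline` is a diffusers Stable Diffusion pipeline whose `vae` attribute is an `AutoencoderKL` from diffusers 0.14 or later, which provides `enable_tiling()` and `disable_tiling()`. The helper name `set_vae_tiling` is made up.

```python
def set_vae_tiling(pipeline, enabled=True):
    """Switch tiled VAE decoding on or off for a pipeline.

    Assumes `pipeline.vae` offers `enable_tiling()` and `disable_tiling()`,
    as diffusers' AutoencoderKL does from 0.14 onward. Tiled decoding lowers
    peak memory when decoding large latents, at the cost of slightly
    different output.
    """
    vae = pipeline.vae
    if enabled:
        vae.enable_tiling()
    else:
        vae.disable_tiling()
    return pipeline
```

Because the output differs between the two modes, callers probably want to pass `enabled` explicitly rather than rely on a default.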
I did succeed in generating an image with it larger than I could otherwise, but with 12 GB, maybe this isn't the target environment for this feature, cuz this is ridiculously outside the size where the images make any sense.
A few potential disadvantages:
Related: