OOM #9

Closed
zcdliuwei opened this issue May 17, 2023 · 4 comments

Comments

@zcdliuwei

Amazing project!!

I used a 1024 * 800 image and executed the following command:
python sr_val_ddpm_text_T_vqganfin_oldcanvas.py --ckpt ckpt/stablesr_000117.ckpt --vqgan_ckpt ckpt/vqgan_cfw_00011.ckpt --init-img inputs/test_example/ --outdir output --ddpm_steps 200 --dec_w 0.5

With the default settings I expected an output at roughly 4K resolution, but I got:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 55.62 GiB (GPU 0; 79.19 GiB total capacity; 31.25 GiB already allocated; 14.34 GiB free; 39.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

xformers is installed correctly:
[screenshot of the xformers installation]

And this is my test image:
[attached image: lowres]

@IceClear (Owner) commented May 17, 2023

Hi, thanks for your interest.
Large-resolution outputs require a lot of GPU memory because sr_val_ddpm_text_T_vqganfin_oldcanvas.py decodes the whole latent code at once for the final output. This avoids border artifacts, but 32 GB of memory can only handle up to about 2K resolution.
For your case, just switch to sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py, which uses a chop operation to generate the image part by part to save memory, though it may introduce border artifacts. We actually used that script to generate the SR result for this image.
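
For reference, an invocation might look like this (a sketch only; I am assuming the tile script accepts the same flags you used above, so please check its argument list for any tile-specific options):
python sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py --ckpt ckpt/stablesr_000117.ckpt --vqgan_ckpt ckpt/vqgan_cfw_00011.ckpt --init-img inputs/test_example/ --outdir output --ddpm_steps 200 --dec_w 0.5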

@zcdliuwei (Author)

Yes, when I switched to sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py I got the expected output. It looks great, with almost no visible boundary artifacts!

Although I don't have many test cases, I believe this script can yield reasonable super-resolution results in the vast majority of cases, with almost no boundary artifacts.

The only remaining problem is that the inference time is very long: the 4K example above took more than an hour in total. Is there any room to optimize the inference time, or am I using your script incorrectly?
Looking forward to your reply.

@IceClear (Owner) commented May 18, 2023

The inference time can be very long for large resolutions, and we sometimes observe boundary artifacts; it depends on the content.
Currently we have not added any inference acceleration.
I tried DDIM at one point, but it led to weird results, so I gave it up. Other acceleration techniques may still work, though.
In your case, I think you can decrease the number of sampling steps. The default is 200, but reducing it to 50 or even 20 can sometimes still give relatively good results, though some details tend to be a little blurry compared with the default setting. This is expected for diffusion models.
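
For example, keeping everything else from your original command and only lowering --ddpm_steps (a sketch; the remaining flags are copied from your run above):
python sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py --ckpt ckpt/stablesr_000117.ckpt --vqgan_ckpt ckpt/vqgan_cfw_00011.ckpt --init-img inputs/test_example/ --outdir output --ddpm_steps 50 --dec_w 0.5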

BTW, another thing you can try, if you are interested, is adding multi-GPU support. Although the batch size is 1, since we divide the image into multiple tiles, it is still possible to process them separately. Just make sure they use the same seed. A minimal conceptual sketch of that idea follows below.
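
The sketch is not the repository's code: process_tile is a hypothetical placeholder for the actual per-tile diffusion sampling and decoding, and tile overlap/blending is ignored. It only illustrates pinning each worker to one GPU while every worker uses the same seed.

# Conceptual sketch only: split the low-res image into tiles, give each worker
# one GPU, and keep a single shared seed, as suggested above.
# process_tile is a hypothetical placeholder, NOT StableSR's actual API.
import threading
import torch

SEED = 42  # the same seed for every worker

def process_tile(tile: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Placeholder for diffusion sampling + VQGAN decoding of one tile;
    # here it only moves the tile to the device and back.
    return tile.to(device).cpu()

def worker(gpu_id: int, my_tiles, results: dict) -> None:
    torch.manual_seed(SEED)  # keep the RNG configuration identical everywhere
    device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu")
    for idx, tile in my_tiles:
        results[idx] = process_tile(tile, device)

def main() -> None:
    num_workers = max(torch.cuda.device_count(), 1)
    image = torch.randn(3, 800, 1024)  # stand-in for the 1024x800 low-res input
    tile_size = 256
    tiles = []
    # Split into non-overlapping tiles (a real tiling scheme would overlap them).
    for y in range(0, image.shape[1], tile_size):
        for x in range(0, image.shape[2], tile_size):
            tiles.append((len(tiles), image[:, y:y + tile_size, x:x + tile_size]))
    results = {}
    # Round-robin the tiles over the available GPUs; batch size per step stays 1.
    threads = [
        threading.Thread(target=worker, args=(g, tiles[g::num_workers], results))
        for g in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"processed {len(results)} tiles on {num_workers} worker(s)")

if __name__ == "__main__":
    main()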

@zcdliuwei (Author)

I will try decreasing the sampling steps. That will undoubtedly speed up inference, but since I want to preserve the final SR quality, it may not be my preferred solution. As for distributed inference, I currently only have a single A100 server.

Of the SR methods I have seen, this is the one that supports arbitrary input sizes and upscale factors, handles both in-the-wild and AIGC images, and gives nearly the best results.

Thanks again for your amazing work. I will keep following this issue.
