OOM #9

Closed
zcdliuwei opened this issue May 17, 2023 · 4 comments

Comments

@zcdliuwei

Amazing project!!

I used a 1024 * 800 image and executed the following command:
python sr_val_ddpm_text_T_vqganfin_oldcanvas.py --ckpt ckpt/stablesr_000117.ckpt --vqgan_ckpt ckpt/vqgan_cfw_00011.ckpt --init-img inputs/test_example/ --outdir output --ddpm_steps 200 --dec_w 0.5

With the default settings I expected an output at roughly 4K resolution, but I got:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 55.62 GiB (GPU 0; 79.19 GiB total capacity; 31.25 GiB already allocated; 14.34 GiB free; 39.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

xformers is installed correctly:
[screenshot of the xformers installation]

And this is my test image:
[attached image: lowres]

@IceClear (Owner) commented May 17, 2023

Hi, thanks for your interest.
Large-resolution outputs require a lot of GPU memory because sr_val_ddpm_text_T_vqganfin_oldcanvas.py decodes the whole latent code at once for the final output. This avoids border artifacts, but 32 GB of memory can only handle up to about 2K resolution.
For your case, just switch to sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py, which uses a chop operation to generate the image part by part to save memory, though it may introduce border artifacts. We actually used that script to generate the SR result for this image.
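
For reference, an invocation might look like this (a sketch only; I am assuming the tile script accepts the same flags you used above, so please check its argument list for any tile-specific options):
python sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py --ckpt ckpt/stablesr_000117.ckpt --vqgan_ckpt ckpt/vqgan_cfw_00011.ckpt --init-img inputs/test_example/ --outdir output --ddpm_steps 200 --dec_w 0.5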

@zcdliuwei (Author)

Yes, when I switched to sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py I got the expected output. It looks great, with almost no visible boundary artifacts!

Although I don't have many test cases, I believe this script can yield reasonable super-resolution results in the vast majority of cases, with almost no boundary artifacts.

The only remaining problem is that the inference time is very long: the 4K example above took more than an hour in total. Is there any room to optimize the inference time, or am I using your script incorrectly?
Looking forward to your reply.

@IceClear (Owner) commented May 18, 2023

The inference time can be very long for large resolutions, and we sometimes observe boundary artifacts; it depends on the content.
Currently we have not added any inference acceleration.
I tried DDIM at one point, but it led to weird results, so I gave it up. Other acceleration techniques may still work, though.
In your case, I think you can decrease the number of sampling steps. The default is 200, but reducing it to 50 or even 20 can sometimes still give relatively good results, though some details tend to be a little blurry compared with the default setting. This is expected for diffusion models.
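
For example, keeping everything else from your original command and only lowering --ddpm_steps (a sketch; the remaining flags are copied from your run above):
python sr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py --ckpt ckpt/stablesr_000117.ckpt --vqgan_ckpt ckpt/vqgan_cfw_00011.ckpt --init-img inputs/test_example/ --outdir output --ddpm_steps 50 --dec_w 0.5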

BTW, another thing you can try, if you are interested, is adding multi-GPU support. Although the batch size is 1, since we divide the image into multiple tiles, it is still possible to process them separately. Just make sure they use the same seed. A minimal conceptual sketch of that idea follows below.
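
The sketch is not the repository's code: process_tile is a hypothetical placeholder for the actual per-tile diffusion sampling and decoding, and tile overlap/blending is ignored. It only illustrates pinning each worker to one GPU while every worker uses the same seed.

# Conceptual sketch only: split the low-res image into tiles, give each worker
# one GPU, and keep a single shared seed, as suggested above.
# process_tile is a hypothetical placeholder, NOT StableSR's actual API.
import threading
import torch

SEED = 42  # the same seed for every worker

def process_tile(tile: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Placeholder for diffusion sampling + VQGAN decoding of one tile;
    # here it only moves the tile to the device and back.
    return tile.to(device).cpu()

def worker(gpu_id: int, my_tiles, results: dict) -> None:
    torch.manual_seed(SEED)  # keep the RNG configuration identical everywhere
    device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu")
    for idx, tile in my_tiles:
        results[idx] = process_tile(tile, device)

def main() -> None:
    num_workers = max(torch.cuda.device_count(), 1)
    image = torch.randn(3, 800, 1024)  # stand-in for the 1024x800 low-res input
    tile_size = 256
    tiles = []
    # Split into non-overlapping tiles (a real tiling scheme would overlap them).
    for y in range(0, image.shape[1], tile_size):
        for x in range(0, image.shape[2], tile_size):
            tiles.append((len(tiles), image[:, y:y + tile_size, x:x + tile_size]))
    results = {}
    # Round-robin the tiles over the available GPUs; batch size per step stays 1.
    threads = [
        threading.Thread(target=worker, args=(g, tiles[g::num_workers], results))
        for g in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"processed {len(results)} tiles on {num_workers} worker(s)")

if __name__ == "__main__":
    main()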

@zcdliuwei (Author)

I will try decreasing the sampling steps. That will undoubtedly speed up inference, but since I want to preserve the final SR quality, it may not be my preferred solution. As for distributed inference, I currently only have a single A100 server.

Of the SR methods I have seen, this is the one that supports arbitrary input sizes and upscale factors, handles both in-the-wild and AIGC images, and gives nearly the best results.

Thanks again for your amazing work. I will keep following this issue.
