Feature Refinement to Improve High Resolution Image Inpainting #112

ankuPRK · 2022-04-27T16:03:39Z

We are a team of researchers at Geomagical Labs (geomagical.com), a subsidiary of IKEA. We work on pioneering Mixed Reality apps which allow customers to scan photorealistic models of their indoor spaces and re-imagine them with virtual furniture.

In this PR we propose an additional refinement step for LaMa to improve high-resolution inpainting results. We observed that when inpainting large regions at high resolution, LaMa struggles at structure completion. However, at low resolutions, LaMa can infill the same missing region much better. To address this we added an additional refinement step that uses the structure from low resolution predictions to guide higher resolution predictions.

Our approach can work on any inpainting network, and does not require any additional training or network modification.

How to run refinement

To run refinement, simply pass refine=True in the evaluation step as:

    python3 bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output

Evaluation

Here's a few example comparisons, with each triplet showing the masked image, inpainting with LaMa, and inpainting with LaMa using refinement:

Comparison of unrefined and refined images on all test images (kindly shared by you) is available here: https://drive.google.com/drive/folders/15LEa9k_7-dUKb2CPUDuw7e6Zk28KCtzz?usp=sharing

We also performed some numerical evaluation on 1024x1024 size images sampled from [1], using the thin, medium, and thick masks. Results indicate that LaMa+refinement outperforms all the recent inpainting baselines on high resultion inpainting:

Method	FID (thin)	LPIPS (thin)	FID (medium)	LPIPS (medium)	FID (thick)	LPIPS (thick)
AOTGAN [3]	17.387	0.133	34.667	0.144	54.015	0.184
LatentDiffusion [4]	18.505	0.141	31.445	0.149	38.743	0.172
MAT [6]	16.284	0.137	27.829	0.135	38.120	0.157
ZITS [5]	15.696	0.125	23.500	0.121	31.777	0.140
LaMa-Fourier [2]	14.780	0.124	22.584	0.120	29.351	0.140
Big-LaMa [2]	13.143	0.114	21.169	0.116	29.022	0.140
Big-LaMa+refinement (ours)	13.193	0.112	19.864	0.115	26.401	0.135

Table 1. Performance comparison of various recent inpainting approaches on 1k 1024x1024 size images

Video

We have also created a video to explain the technical details of our approach:
https://www.youtube.com/watch?v=gEukhOheWgE

References

[1]
Unsplash Dataset. https://unsplash.com/data, 2020

[2]
Suvorov, R., Logacheva, E., Mashikhin, A., Remizova, A., Ashukha, A., Silvestrov, A., Kong, N., Goka, H., Park, K. and Lempitsky, V., 2022. Resolution-robust Large Mask Inpainting with Fourier Convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2149-2159)

[3]
Zeng, Y., Fu, J., Chao, H. and Guo, B., 2022. Aggregated contextual transformations for high-resolution image inpainting. IEEE Transactions on Visualization and Computer Graphics.

[4]
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. and Ommer, B., 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv preprint arXiv:2112.10752.

[5]
Dong, Q., Cao, C. and Fu, Y., 2022. Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding. arXiv preprint arXiv:2203.00867.

[6]
Li, W., Lin, Z., Zhou, K., Qi, L., Wang, Y. and Jia, J., 2022. MAT: Mask-Aware Transformer for Large Hole Image Inpainting. arXiv preprint arXiv:2203.15270.

windj007 · 2022-04-27T16:07:50Z

Wow! That's extremely cool! Can't wait to try!

senya-ashukha · 2022-04-27T16:13:43Z

Wow folks! that is really impressive!

windj007 · 2022-04-27T16:39:57Z

Just curious, have you tried your approach with other methods, e.g. MAT or ZITS? Is the improvement for them is like that for LaMa? I'm also amazed by the fact that such an optimization technique does not introduce high-frequency artifacts... How do you think, why is it so?

ankuPRK · 2022-04-27T19:06:19Z

Hi, we haven't tried it with other methods due to our limited bandwidth. We started with LaMa, and then during evaluation we found that without refinement, LaMa still generalizes better than the newer methods on high resolution inpainting. So we didn't have a very strong motivation to try this on the other methods. We'd love to see how it works on them though.

When applying the refinement, we're basically asking the network to find a high-resolution featuremap that produces an output that, when downscaled, looks like the low-resolution output. We hypothesize that when this featuremap is found, it contains high-level encoded information learned from training about the contents in that region. For example, optimization may adjust a feature that was "kind-of-brick-like" to become "very-red-brick-like." The optimized high level features then gets decoded into low and high frequency brick-like textures.

So, it's possible that these latent features of the downscaler control low and high frequency details jointly. We also tried to optimize the latent features of the upscaler, but it didn't work well, and produced cloud-like artifacts. So it probably has something to do with large and overlapping receptive fields of pixels of the featuremap.

windj007 · 2022-04-28T10:52:39Z

Thank you for the clarification!

that without refinement, LaMa still generalizes better

Yeah, but in lower resolution these methods are stronger than LaMa - so they might probably benefit more from your method.
Maybe the difference between effects of your method on different architectures would highlight the inherent inductive biases in them - and help build a better new archtecture.

these latent features of the downscaler control low and high frequency details jointly.

Yes, lo-freq and high-freq details do not seem to be disentangled in the features.

windj007

I'd kindly suggest simplifying the usage and making default config values more sensible for broader set of environments (e.g. single-gpu). Anyway, this is a great contribution!

windj007 · 2022-04-28T11:31:23Z

bin/predict.py

+                    batch['mask'] = (batch['mask'] > 0) * 1
+                    batch = model(batch)                    
+                    cur_res = batch[predict_config.out_key][0].permute(1, 2, 0).detach().cpu().numpy()
+                    unpad_to_size = batch.get('unpad_to_size', None)


L84-87 should be outside if-else - they need to be executed regardless refinement

L84-87 already get addressed inside the refiner. Refiner works on unpadded images (it does the necesssary padding internally and then unpads the output appropriately). We can:

add an assertion to check unpad_to_size is not None

enable refiner to just run on padded image, if unpad_to_size is None.

Ok, I see. I'd move padding-unpadding from the refiner to predict.py - so both parts of the code are simplified and no logic duplication is introduced. What do you think, is it possible and does it make sense?

We can let the refiner get padded input. But refiner still needs some padding in place. Because -

Suppose your input image is a square of size 1000. Then the original image isn't padded because 1000%8==0, but in the refiner, once we downscale the image, it's size becomes 500, and 500%8!=0. So we have to pad it to make it 504x504.

So we can't get rid of lines 301 and 302 in refinement.py, but we can:

let the padded image to be input to refiner, so that we take L84-87 outside the if-else.

refiner then doesn't check unpad_to_size argument at all.

Padding would happen in the refiner to ensure downscaled image size is divisible by 8.

refiner still needs some padding in place

I see, thank you for the clarification! Let's just leave that piece of the as is - and add a comment about "padding-unpadding is handled within refiner"

image size is divisible by 8.

Padding size depends on depth of the generator and thus needs to be configurable

Gotcha, will add the comment. Yeah the padding size of the refiner is not exactly 8, but exactly equal to dataset.pad_out_to_modulo in the predict config. I'll add a comment there in the PR

windj007 · 2022-04-28T11:33:09Z

configs/prediction/default.yaml

+
+refine: False # refiner will only run if this is True
+refiner:
+  gpu_ids: 0,1 # the GPU ids of the machine to use. If only single GPU, use: "0,"


I'd suggest using only 0 by default - or even introduce "None" default (so refiner would rely on the parent device setting). That would make this work by default in more environments without any modifications by default.

Actually the refiner needs around 24GB GPU to process 1.8 megapixel images (~1200x1500). Since most people have two 12GB GPUs instead, we decided to split the model onto two GPUs, that's why the default config setting.
Do you suggest to still make it None by default?

Hm, right, I have not thought about memory consumption. It seems that the most of the consumption comes from storing activations for backward... And you're splitting res-blocks between GPUs to distribute that memory - not to speedup inference - because GPUs are called sequentially.

I have a couple ideas how to overcome that without complex logic or requirement to have two GPUs:

Set param.requires_grad_(False) for all parameters in the generator - that will lead to storing only activations, not gradients for parameters.

Use activation checkpointing - it does something very similar to what you're doing - it splits a nn.Sequential in multiple chunks and runs each chunk with torch.no_grad - so only activations between chunks have to be stored. That will slow the optimization down, but maybe not severely.

torch.cuda.amp - optimize in fp16 instead of fp32. In case of refinement there is no adversarial training, so there should not be stability issues due to reduced precision (but I'm not sure)

Thanks for the ideas! param.requires_grad_ is already set to False, since we freeze the model here: https://github.com/geomagical/lama-with-refiner/blob/24a20f804390c6ab969c28abbe999c940f8d6a56/bin/predict.py#L58
I also manually verified the requires_grad for all the params of the model, they were False.

We were already looking at activation checkpointing, will focus on it more now that you have also mentioned it. Will try the third idea also.

torch.cuda.amp isn't working because pytorch doesn't seem to support Half dtype for torch.fft.rfftn. PFA link to the relevant issues in Pytorch repo:

pytorch/pytorch#70664
pytorch/pytorch#71680

I also manually verified the requires_grad for all the params of the model

Great, thank you!

torch.cuda.amp isn't working because pytorch doesn't seem to support Half

Sure, I've forgot that I've already tried half and failed because of that... We could wrap rfftn/irfftn with conversion to and from .float(), but I'm not sure there wouldn't be other issues..

Hello @windj007, sorry for coming back after 2 months! We picked up the experiments, our findings:

We were able to perform the optimization in mixed precision. I haven't benchmarked it quantitatively, but qualitative results look good. However, for 1024x1024 images, it only reduces the memory from 21-22GB -> 17-18GB, so it is still not sufficient to fit on a single 12GB GPU

We also tried to play with checkpointing. Performing it naively throws RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn, which we bypass by setting use_reentrant=False. However, this setting has some memory leak problem, which causes the GPU consumption to increase at each training loop, eventually leading to OOM error. We plan to raise this issue on the Pytorch repo.

Can you please share your code for mixed-precision?

Sure, it's in amp_float16 branch of our fork:

https://github.com/geomagical/lama-with-refiner/tree/amp_float16

You can get this code by:

git clone git@github.com:geomagical/lama-with-refiner.git git checkout amp_float16

Also, I've changed the config file of the refiner to run on a single GPU. But yeah feel free to play around with config parameters or anything :)

Link to the config file in the code: https://github.com/geomagical/lama-with-refiner/blob/amp_float16/configs/prediction/default.yaml

Thank you so much 🙏

ankuPRK · 2022-05-03T15:50:22Z

Thanks for your feedback, appreciate it! We are very much interested in addressing your concerns until you're comfortable enough to merge this PR into your code. The usage complication is primarily because we try to fit the refinement on multiple devices. Refinement requires at least 24GB GPU to run on images of sizes like 1024x1024.

ankuPRK · 2022-05-14T06:27:20Z

configs/prediction/default.yaml

+refine: False # refiner will only run if this is True
+refiner:
+  gpu_ids: 0,1 # the GPU ids of the machine to use. If only single GPU, use: "0,"
+  modulo: ${dataset.pad_out_to_modulo}


@windj007 refiner padding is defined here

senya-ashukha · 2022-07-23T17:41:07Z

Hi folks! Your results are amazing, so I've added them to the main page! Fee free to share twitter post and paper!

ankuPRK · 2022-07-23T18:15:27Z

@senya-ashukha that works too! :)

ankuPRK · 2022-07-25T18:54:48Z

Hello,

Thanks for your shout out on the main README. Let us know if you would like us to close this PR.

senya-ashukha · 2022-07-28T22:37:05Z

Hi @ankuPRK, @windj007 is on long vacation. I've merged it, but @windj007 may discuss it further as he returns.

ankuPRK · 2022-07-28T22:54:09Z

Sounds good, thanks a lot! Happy to follow up on any further review comments regarding code/formatting/ideas or anything :)

mhashas · 2022-08-11T17:26:14Z

is there anyway i can make this work on a gpu of 11GB? or run it on CPU?

ankuPRK · 2022-08-15T20:06:22Z

@mhashas You can try using our mixed precision branch: #112 (comment)

We can probably have a CPU version, but it will be super slow. If you are okay with a slow CPU version, I think replacing the device with torch.device('cpu') in predict.py and refinement.py should work.

Feature Refinement to Improve High Resolution Image Inpainting

202112213501021 · 2024-03-30T09:02:13Z

Can I separate out the Feature Refinement to Improve High Resolution Image Inpainting technique， just for a single image?

runtime feature refinement initial commit

24a20f8

windj007 requested changes Apr 28, 2022

View reviewed changes

ankuPRK and others added 2 commits May 13, 2022 15:43

Merge remote-tracking branch 'origin' into refinement

809ba78

Merge branch 'saic-mdal:main' into refinement

c1d0f16

ankuPRK commented May 14, 2022

View reviewed changes

ankuPRK and others added 2 commits May 14, 2022 12:15

add a comment about padding inside refiner

3f571e6

Merge branch 'saic-mdal:main' into refinement

e4b276b

Merge branch 'saic-mdal:main' into refinement

63d7816

senya-ashukha merged commit bd69ec3 into advimman:main Jul 28, 2022

Sanster mentioned this pull request Aug 1, 2022

Does Lama Cleaner support the new feature refinement? Sanster/IOPaint#57

Closed

HeunSeungLim pushed a commit to HeunSeungLim/lama_HL that referenced this pull request Nov 14, 2022

Merge pull request advimman#112 from geomagical/refinement

58acfcd

Feature Refinement to Improve High Resolution Image Inpainting

rentainhe mentioned this pull request May 15, 2023

[Feature Request] Combing GroundingDINO and SAM for more automatic inpaint Sanster/IOPaint#306

Open

SAgiKPJH mentioned this pull request Jan 4, 2024

Improve LaMa 적용 SagiK-Repository/Docker_AI_Image_RemovePart_Service#8

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Refinement to Improve High Resolution Image Inpainting #112

Feature Refinement to Improve High Resolution Image Inpainting #112

ankuPRK commented Apr 27, 2022 •

edited

windj007 commented Apr 27, 2022

senya-ashukha commented Apr 27, 2022

windj007 commented Apr 27, 2022

ankuPRK commented Apr 27, 2022 •

edited

windj007 commented Apr 28, 2022

windj007 left a comment

windj007 Apr 28, 2022

ankuPRK May 3, 2022

windj007 May 6, 2022

ankuPRK May 9, 2022

windj007 May 13, 2022

ankuPRK May 14, 2022

windj007 Apr 28, 2022

ankuPRK May 3, 2022

windj007 May 6, 2022

ankuPRK May 9, 2022 •

edited

ankuPRK May 13, 2022

windj007 May 13, 2022

ankuPRK Jul 22, 2022

FBehrad Aug 2, 2022

ankuPRK Aug 2, 2022 •

edited

FBehrad Aug 3, 2022

ankuPRK commented May 3, 2022

ankuPRK May 14, 2022

senya-ashukha commented Jul 23, 2022

ankuPRK commented Jul 23, 2022

ankuPRK commented Jul 25, 2022 •

edited

senya-ashukha commented Jul 28, 2022

ankuPRK commented Jul 28, 2022

mhashas commented Aug 11, 2022

ankuPRK commented Aug 15, 2022

202112213501021 commented Mar 30, 2024

Feature Refinement to Improve High Resolution Image Inpainting #112

Feature Refinement to Improve High Resolution Image Inpainting #112

Conversation

ankuPRK commented Apr 27, 2022 • edited

How to run refinement

Evaluation

Video

References

windj007 commented Apr 27, 2022

senya-ashukha commented Apr 27, 2022

windj007 commented Apr 27, 2022

ankuPRK commented Apr 27, 2022 • edited

windj007 commented Apr 28, 2022

windj007 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ankuPRK May 9, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ankuPRK Aug 2, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ankuPRK commented May 3, 2022

Choose a reason for hiding this comment

senya-ashukha commented Jul 23, 2022

ankuPRK commented Jul 23, 2022

ankuPRK commented Jul 25, 2022 • edited

senya-ashukha commented Jul 28, 2022

ankuPRK commented Jul 28, 2022

mhashas commented Aug 11, 2022

ankuPRK commented Aug 15, 2022

202112213501021 commented Mar 30, 2024

ankuPRK commented Apr 27, 2022 •

edited

ankuPRK commented Apr 27, 2022 •

edited

ankuPRK May 9, 2022 •

edited

ankuPRK Aug 2, 2022 •

edited

ankuPRK commented Jul 25, 2022 •

edited