adding auto1111 features to inpainting pipeline #6072

Merged: merged 6 commits into main from inpaint-mask on Dec 26, 2023

Conversation

@yiyixuxu (Collaborator) commented Dec 6, 2023

This PR adds two auto1111 features to the inpainting pipeline, aiming to improve the "visible mask border" issue discussed here #5808

new feature 1: mask_blur option

I simply added this to the image processor; the user can blur the mask before passing it to the pipeline:

mask_blurred = pipe_inpaint.mask_processor.blur(mask, blur_factor=32)
out = pipe_inpaint(..., mask_image=mask_blurred, ...)

new feature 2: padding_mask_crop

This feature corresponds to the inpaint area = "only masked" option in auto1111.
(screenshot: the auto1111 "Inpaint area" setting)
It is quite interesting! When this is selected, it will:
- crop out a rectangle area around the mask (based on a padding parameter user specified)
- crop out the same area from the image
- upscale both image and mask to full resolution and inpaint with them
- resize the output back down and overlay it over the original image (see the sketch below)
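For readers less familiar with the auto1111 behavior, here is a rough, hand-rolled sketch of these steps using plain PIL and numpy. It is illustrative only: the pipeline implements this internally via padding_mask_crop, and the helper name and the fixed 512x512 inference size below are assumptions.

import numpy as np
from PIL import Image

def cropped_inpaint_sketch(pipe, prompt, image, mask, pad=32, infer_size=(512, 512)):
    # 1. rectangle around the mask, expanded by `pad` pixels and clamped to the image
    ys, xs = np.nonzero(np.array(mask.convert("L")))
    x1, y1 = max(int(xs.min()) - pad, 0), max(int(ys.min()) - pad, 0)
    x2, y2 = min(int(xs.max()) + pad, image.width), min(int(ys.max()) + pad, image.height)

    # 2. crop the same region from both the image and the mask
    image_crop, mask_crop = image.crop((x1, y1, x2, y2)), mask.crop((x1, y1, x2, y2))

    # 3. upscale both crops to the inference resolution and inpaint
    out = pipe(prompt, image=image_crop.resize(infer_size), mask_image=mask_crop.resize(infer_size)).images[0]

    # 4. resize the result back to the crop size and overlay it on the original image
    result = image.copy()
    result.paste(out.resize((x2 - x1, y2 - y1)), (x1, y1))
    return result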

notes on the "masked_content = fill, original, latent nothing, latent noise" option

I think what we have right now is:

  • when strength < 1.0, it is similar to masked_content = original in auto1111
  • when strength = 1.0, it is similar to masked_content = latent_noise in auto1111

so when you want to generate content based on the original image, set a strength value < 1.0; if you want to generate something completely new, use strength = 1.0.

auto1111 uses different methods to fill the masked area, but we achieve a similar outcome through the special behavior of the strength value when it is set to 1.0. IMO we don't need to add the other two options for now; the source code for the fill method literally has a note that says it is "not super effective."
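To make that concrete, a minimal hedged example; it reuses the pipe1, base, mask, and generator objects defined in the testing script below.

# keep the masked-area generation close to the original content
out_similar = pipe1('boat', image=base, mask_image=mask, strength=0.75, generator=generator).images[0]

# generate completely new content in the masked area
out_new = pipe1('boat', image=base, mask_image=mask, strength=1.0, generator=generator).images[0]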

testing

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

model = "runwayml/stable-diffusion-v1-5"
blur_factor = 33
seed = 0

base = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")

# create inpaint pipeline
pipe1 = AutoPipelineForInpainting.from_pretrained(model, torch_dtype=torch.float16).to('cuda')

# this is the baseline: no mask blur, no inpaint_full_res
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask, strength=0.75, generator=generator).images[0]
inpaint.save('out_base.png')

# create blurred mask
mask_blurred = pipe1.mask_processor.blur(mask, blur_factor=blur_factor)
mask_blurred.save('mask_blurred.png')

# with mask blur
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask_blurred, strength=0.75, generator=generator).images[0]
inpaint.save('out_mask_blur.png')

# with both mask_blur and inpaint_full_res
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask_blurred, strength=0.75, generator=generator, padding_mask_crop=32).images[0]
inpaint.save('out_mask_blur_full_res.png')
(images: mask | mask_blurred)

Output with seed 0: baseline | mask_blur | mask_blur + padding_mask_crop=32 (images)

Output with seed 33: baseline | mask_blur | mask_blur + padding_mask_crop=32 (images)

@@ -861,6 +969,35 @@ def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32
assert emb.shape == (w.shape[0], embedding_dim)
return emb

def apply_overlay(
Collaborator Author

will refactor this method so that we can use it to fix the unmasked area when not using inpaint_full_res too!

Contributor

Sounds good! Let's maybe make it private for now, knowing that we'll change the naming soon.

Suggested change
def apply_overlay(
def _apply_overlay(

@yiyixuxu (Collaborator Author) Dec 7, 2023

It is ready to use as a public pipeline method now!

I tested with this example, using the apply_overlay() method to keep the unmasked area fixed (in this case, it is the bottle!)

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch

init_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/g_rgb.png")
image_mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/g_mask.png")

inpaint_pipe = AutoPipelineForInpainting.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            torch_dtype=torch.float16,
        )
inpaint_pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(3)
output = inpaint_pipe(
    image=init_image,
    mask_image=image_mask,
    prompt="a bottle emerging from ripples in a lake surrounded by plants and flowers",
    negative_prompt="blurry, bad quality, painting, cgi, malformation",
    guidance_scale=7.,
    num_inference_steps=40,
    generator=generator,
    )
output.images[0].save("out_no_overlay.png")

repainted_image = inpaint_pipe.apply_overlay(image_mask, init_image, output.images[0])
repainted_image.save("out_overlay.png")
(result images: no overlay | overlay)

Contributor

Couldn't we add this method to the image_processor instead? I can see that self.mask_processor.resize(...) is used, but could we also just use self.image_processor.resize(...) here?

The apply_overlay method seems very much related to the image processor to me. I think writing

image = self.image_processor.apply_overlay(mask, init_image, image, crop_coords)

further below is cleaner here

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +250 to +251
height: int,
width: int,
Contributor

Suggested change
height: int,
width: int,
height: int,
width: int,

Was it incorrect to have height and width be optional before? Did it throw an error when they weren't passed?

Ok to remove Optional[int] = None if one had to pass them anyways - otherwise it'd be backwards breaking

Collaborator Author

It was incorrect before and would throw an error when they were not passed.

This is the function in the current codebase

    def resize(
        self,
        image: Union[PIL.Image.Image, np.ndarray, torch.Tensor],
        height: Optional[int] = None,
        width: Optional[int] = None,
    ) -> Union[PIL.Image.Image, np.ndarray, torch.Tensor]:
        """
        Resize image.

        Args:
            image (`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`):
                The image input, can be a PIL image, numpy array or pytorch tensor.
            height (`int`, *optional*, defaults to `None`):
                The height to resize to.
            width (`int`, *optional*, defaults to `None`):
                The width to resize to.

        Returns:
            `PIL.Image.Image`, `np.ndarray` or `torch.Tensor`:
                The resized image.
        """
        if isinstance(image, PIL.Image.Image):
            image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
        elif isinstance(image, torch.Tensor):
            image = torch.nn.functional.interpolate(
                image,
                size=(height, width),
            )
        elif isinstance(image, np.ndarray):
            image = self.numpy_to_pt(image)
            image = torch.nn.functional.interpolate(
                image,
                size=(height, width),
            )
            image = self.pt_to_numpy(image)
        return image

Comment on lines -306 to -308
if self.config.do_resize:
height, width = self.get_default_height_width(image[0], height, width)
image = [self.resize(i, height, width) for i in image]
Contributor

Can you explain that change a bit? Why should it be done before do_convert_rgb and do_convert_to_grayscale? Or is the order irrelevant here as both will give the same results?

Collaborator Author

I did not know this before, but resize with resample = PIL.Image.LANCZOS will turn a grayscale image into RGB....

mask = [i.resize((width, height), resample=PIL.Image.LANCZOS) for i in mask]

Moving the resize before converting ensures the processed image is the desired type. As a reference, this was actually the order in the deprecated prepare_mask_and_masked_image

if isinstance(image, list) and isinstance(image[0], PIL.Image.Image):
    # resize all images w.r.t passed height and width
    image = [i.resize((width, height), resample=PIL.Image.LANCZOS) for i in image]
    image = [np.array(i.convert("RGB"))[None, :] for i in image]
    image = np.concatenate(image, axis=0)

@@ -37,6 +38,94 @@
logger = logging.get_logger(__name__) # pylint: disable=invalid-name


def get_crop_region(mask_image: PIL.Image.Image, width: int, height: int, pad=0):
Contributor

Should we maybe move this also into image_processor and make it a function of the image processor object?

@yiyixuxu (Collaborator Author) Dec 7, 2023

ok to move it there, but I'm in favor of not doing so: this is a super specific function that I don't think will be used for any other purpose... it also needs the input image to be a "mask" for it to work

Contributor

Ok to use # Copied from for now, but we'll already have to use it for all our inpaint pipelines (SD, SDXL, SD + ControlNet, ...)

Collaborator Author

good point, I will move to the image processor then 😅

@patrickvonplaten (Contributor) left a comment

Overall I'm very happy with the API.

# create blurred mask
mask_blurred = pipe1.mask_processor.blur(mask, blur_factor=blur_factor)

# with mask blur
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask_blurred, strength=0.75, generator=generator).images[0]

Think going for a blur function on the processor and passing it to mask_image is the way to go here!

Left some questions / nits. Also, can we maybe change the argument inpaint_full_res to something more specific? E.g. we already know that it's inpainting because people use the inpainting pipeline. What's happening with full_res is essentially that the model inpaints a much larger area and the non-masked context is heavily reduced, no?

Would something like `padding_mask_crop: Optional[int] = None` make more sense maybe? This way we also only need one argument instead of two. We'd def need a good docstring then!

@iamwavecut

padding_mask_crop

Well, maybe, it should read as mask_crop_padding?

@kadirnar (Contributor)

Hi @yiyixuxu,

I reviewed the code and tested it with the SDXL Inpaint + ControlNet pipeline, and the results are not good. Will you write the code for SDXL, or can I create a new pull request?

@yiyixuxu (Collaborator Author)

Hi @kadirnar
Can you clarify what you mean by "results are not good"?

@kadirnar (Contributor)

Image, mask, control_image (depth):

(image)

custom parameters (sdxl-turbo):

output_inpaint = sdxl_pipeline(
    num_inference_steps=6,
    height=1024,
    width=1024,
    strength=1.0,
    controlnet_conditioning_scale=0.0,
    guidance_scale=0.0,
    padding_mask_crop=32
    ).images

mask-blur=32:

(image)

controlnet_conditioning_scale=0.5:
The background is bad.

(image)

strength=0.7 and controlnet_conditioning_scale=0.0:

(image)

@yiyixuxu (Collaborator Author)

@kadirnar
Thanks. I don't think padding_mask_crop works with "outpainting", i.e. generating a new background, which is what you are doing here.

padding_mask_crop is not "mask_blur" in auto1111; it is the "inpaint area = only masked" option. You can still probably use mask blur here with this API:

mask_blurred = pipe_inpaint.mask_processor.blur(mask, blur_factor=32)
out = pipe_inpaint(..., mask_image=mask_blurred, ...)

strength = 0.7 failing to generate the new background is expected; you need to use 1.0 here, since you do not want the generation to be similar to the original image you provided.

@kadirnar (Contributor)

Thank you. I want to increase the controlnet_conditioning_scale parameter, but when I do this the results are bad. Also, why is noise applied to the entire picture? What can I do to apply it only to the mask?

@DN6 (Collaborator) left a comment

Changes look good to me. Could we add a test for mask blur?

@kadirnar (Contributor)

Will this pull request be merged?

@ManuelZ commented Dec 22, 2023

Will it be possible to use this PR with StableDiffusionControlNetInpaintPipeline?

@patrickvonplaten (Contributor)

@yiyixuxu let's try to get this PR in, no?

yiyixuxu and others added 2 commits December 26, 2023 09:41
…sion_inpaint.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@yiyixuxu yiyixuxu merged commit f0a588b into main Dec 26, 2023
16 checks passed
@yiyixuxu yiyixuxu deleted the inpaint-mask branch December 26, 2023 20:31
@patrickvonplaten (Contributor)

Very nice! Good job on getting this one in

donhardman pushed a commit to donhardman/diffusers that referenced this pull request Dec 29, 2023
* add inpaint_full_res

* fix

* update

* move get_crop_region to image processor

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move apply_overlay to image processor

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
antoine-scenario pushed a commit to antoine-scenario/diffusers that referenced this pull request Jan 2, 2024
@vladmandic (Contributor)

        image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)

        if padding_mask_crop is not None:
            image = [self.image_processor.apply_overlay(mask_image, original_image, i, crops_coords) for i in image]

this only works if output_type='pil'; otherwise it's a mismatch - you can't apply the overlay to an image that may be a np array or a tensor (if output_type=latent)

File "/home/vlado/.local/lib/python3.11/site-packages/diffusers/image_processor.py", line 627, in apply_overlay
    width, height = image.width, image.height
AttributeError: 'Tensor' object has no attribute 'width'

@yiyixuxu (Collaborator Author) commented Jan 5, 2024

@vladmandic,

would it be okay to allow this feature only for `output_type="pil"`? (by checking the inputs and throwing an error otherwise)

The resize methods used for this are quite specific, so I'm not sure how well they would work on latents. I will maybe play around to see how it works if it is important to enable this feature for other output types.

cc @patrickvonplaten here too
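For reference, a minimal sketch of the kind of input check being proposed here; the helper name is hypothetical, and this is not necessarily how it ended up being implemented in diffusers.

def _check_padding_mask_crop(padding_mask_crop, output_type):
    # only allow the crop-and-overlay feature for PIL outputs, since apply_overlay
    # needs PIL images; raise early for np / pt / latent outputs
    if padding_mask_crop is not None and output_type != "pil":
        raise ValueError(
            f"`padding_mask_crop` currently only supports `output_type='pil'`, but `output_type={output_type!r}` was passed."
        )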

@vladmandic (Contributor) commented Jan 5, 2024

To start with, yes, absolutely. But in principle it should work on numpy arrays and latents (just divide the dimensions by 8) as well, using cv2 for the resize, since it doesn't care about channel content.
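A hedged sketch of that idea, assuming numpy images shaped (H, W, C) in float32 and latents shaped (C, H/8, W/8); cv2 is not what the pipeline currently uses, and the function name is illustrative.

import cv2
import numpy as np

def resize_for_overlay(array: np.ndarray, width: int, height: int, is_latent: bool = False) -> np.ndarray:
    # latents live at 1/8 of the pixel resolution, so divide the target size by 8
    if is_latent:
        width, height = width // 8, height // 8
        array = array.transpose(1, 2, 0)  # (C, H, W) -> (H, W, C) for cv2
    resized = cv2.resize(array, (width, height), interpolation=cv2.INTER_LINEAR)
    return resized.transpose(2, 0, 1) if is_latent else resized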

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
@manurare commented Jun 8, 2024

Hi,

Thanks for this. I have several questions about the problem of inpainting with a soft/blurred mask.

First of all, the provided example does not use a pure inpainting model but model = "runwayml/stable-diffusion-v1-5". Why is this, exactly? In the pipeline's __call__ method, the noisy latents are linearly interpolated with the noise via the mask only if the unet has 4 input channels. This means that if we use "runwayml/stable-diffusion-inpainting", that condition is never met. Is this not required for pure inpainting models because of their extra conditioning on the mask + masked image?
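For reference, the interpolation referred to above is roughly the following step, applied at every denoising iteration when the unet has 4 input channels; this is a simplified sketch with illustrative names, not the exact diffusers code.

import torch

def blend_latents(latents: torch.Tensor, init_latents_proper: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # init_latents_proper: the original image latents, re-noised to the current timestep
    # mask: 1 where content should be generated, 0 where the original image is kept
    return (1 - mask) * init_latents_proper + mask * latents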

Also, even though we blur the mask, we effectively end up with a binary mask because of this line. So what is the point of blurring the mask? I was expecting the soft mask values to serve as a mechanism to create a smooth transition between the original content and the inpainted content. Is it possible to do inpainting with a non-binary mask?
