adding auto1111 features to inpainting pipeline #6072

Merged: merged 6 commits into main from inpaint-mask on Dec 26, 2023

Conversation

@yiyixuxu (Collaborator) commented Dec 6, 2023

This PR adds two auto1111 features to the inpainting pipeline, aiming to improve the "visible mask border" issue discussed here #5808

new feature 1: mask_blur option

I simply added this to the image processor; the user can blur the mask before passing it to the pipeline:

mask_blurred = pipe_inpaint.mask_processor.blur(mask, blur_factor=32)
out = pipe_inpaint(..., mask_image=mask_blurred, ...)

new feature 2: padding_mask_crop

This feature corresponds to the inpaint area = "only masked" option in auto1111.
(screenshot: the auto1111 "Inpaint area" setting)
It is quite interesting! When this is selected, it will:
- crop out a rectangle area around the mask (based on a padding parameter user specified)
- crop out the same area from the image
- upscale both image and mask to full resolution and inpaint with them
- resize the output back down and overlay it over the original image (see the sketch below)
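For readers less familiar with the auto1111 behavior, here is a rough, hand-rolled sketch of these steps using plain PIL and numpy. It is illustrative only: the pipeline implements this internally via padding_mask_crop, and the helper name and the fixed 512x512 inference size below are assumptions.

import numpy as np
from PIL import Image

def cropped_inpaint_sketch(pipe, prompt, image, mask, pad=32, infer_size=(512, 512)):
    # 1. rectangle around the mask, expanded by `pad` pixels and clamped to the image
    ys, xs = np.nonzero(np.array(mask.convert("L")))
    x1, y1 = max(int(xs.min()) - pad, 0), max(int(ys.min()) - pad, 0)
    x2, y2 = min(int(xs.max()) + pad, image.width), min(int(ys.max()) + pad, image.height)

    # 2. crop the same region from both the image and the mask
    image_crop, mask_crop = image.crop((x1, y1, x2, y2)), mask.crop((x1, y1, x2, y2))

    # 3. upscale both crops to the inference resolution and inpaint
    out = pipe(prompt, image=image_crop.resize(infer_size), mask_image=mask_crop.resize(infer_size)).images[0]

    # 4. resize the result back to the crop size and overlay it on the original image
    result = image.copy()
    result.paste(out.resize((x2 - x1, y2 - y1)), (x1, y1))
    return result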

notes on the "masked_content = fill, original, latent nothing, latent noise" option

I think what we have right now is:

  • when strength < 1.0, it is similar to masked_content = original in auto1111
  • when strength = 1.0, it is similar to masked_content = latent_noise in auto1111

so when you want to generate content based on the original image, set a strength value < 1.0; if you want to generate something completely new, use strength = 1.0.

auto1111 uses different methods to fill the masked area, but we achieve a similar outcome through the special behavior of the strength value when it is set to 1.0. IMO we don't need to add the other two options for now; the source code for the fill method literally has a note that says it is "not super effective."
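To make that concrete, a minimal hedged example; it reuses the pipe1, base, mask, and generator objects defined in the testing script below.

# keep the masked-area generation close to the original content
out_similar = pipe1('boat', image=base, mask_image=mask, strength=0.75, generator=generator).images[0]

# generate completely new content in the masked area
out_new = pipe1('boat', image=base, mask_image=mask, strength=1.0, generator=generator).images[0]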

testing

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from PIL import Image

model = "runwayml/stable-diffusion-v1-5"
blur_factor = 33
seed = 0

base = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/seashore_mask.png")

# create inpaint pipeline
pipe1 = AutoPipelineForInpainting.from_pretrained(model, torch_dtype=torch.float16).to('cuda')

# this is the baseline: no mask blur, no inpaint_full_res
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask, strength=0.75, generator=generator).images[0]
inpaint.save('out_base.png')

# create blurred mask
mask_blurred = pipe1.mask_processor.blur(mask, blur_factor=blur_factor)
mask_blurred.save('mask_blurred.png')

# with mask blur
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask_blurred, strength=0.75, generator=generator).images[0]
inpaint.save('out_mask_blur.png')

# with both mask_blur and inpaint_full_res
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask_blurred, strength=0.75, generator=generator, padding_mask_crop=32).images[0]
inpaint.save('out_mask_blur_full_res.png')
(images: mask | mask_blurred)

Output with seed 0: baseline | mask_blur | mask_blur + padding_mask_crop=32 (images)

Output with seed 33: baseline | mask_blur | mask_blur + padding_mask_crop=32 (images)

@@ -861,6 +969,35 @@ def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32
assert emb.shape == (w.shape[0], embedding_dim)
return emb

def apply_overlay(
Collaborator Author

will refactor this method so that we can use it to fix the unmasked area when not using inpaint_full_res too!

Contributor

Sounds good! Let's maybe make it private for now, knowing that we'll change the naming soon.

Suggested change
def apply_overlay(
def _apply_overlay(

@yiyixuxu (Collaborator Author) Dec 7, 2023

It is ready to use as a public pipeline method now!

I tested with this example, using the apply_overlay() method to keep the unmasked area fixed (in this case, it is the bottle!)

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch

init_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/g_rgb.png")
image_mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/g_mask.png")

inpaint_pipe = AutoPipelineForInpainting.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            torch_dtype=torch.float16,
        )
inpaint_pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(3)
output = inpaint_pipe(
    image=init_image,
    mask_image=image_mask,
    prompt="a bottle emerging from ripples in a lake surrounded by plants and flowers",
    negative_prompt="blurry, bad quality, painting, cgi, malformation",
    guidance_scale=7.,
    num_inference_steps=40,
    generator=generator,
    )
output.images[0].save("out_no_overlay.png")

repainted_image = inpaint_pipe.apply_overlay(image_mask, init_image, output.images[0])
repainted_image.save("out_overlay.png")
(result images: no overlay | overlay)

Contributor

Couldn't we add this method to the image_processor instead? I can see that self.mask_processor.resize(...) is used, but could we also just use self.image_processor.resize(...) here?

The apply_overlay method seems very much related to the image processor to me. I think writing

image = self.image_processor.apply_overlay(mask, init_image, image, crop_coords)

further below is cleaner here

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +250 to +251
height: int,
width: int,
Contributor

Suggested change
height: int,
width: int,
height: int,
width: int,

Was it incorrect to have height and width be optional before? Did it throw an error when they weren't passed?

Ok to remove Optional[int] = None if one had to pass them anyways - otherwise it'd be backwards breaking

Collaborator Author

It was incorrect before and would throw an error when they were not passed.

This is the function in the current codebase

    def resize(
        self,
        image: Union[PIL.Image.Image, np.ndarray, torch.Tensor],
        height: Optional[int] = None,
        width: Optional[int] = None,
    ) -> Union[PIL.Image.Image, np.ndarray, torch.Tensor]:
        """
        Resize image.

        Args:
            image (`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`):
                The image input, can be a PIL image, numpy array or pytorch tensor.
            height (`int`, *optional*, defaults to `None`):
                The height to resize to.
            width (`int`, *optional*, defaults to `None`):
                The width to resize to.

        Returns:
            `PIL.Image.Image`, `np.ndarray` or `torch.Tensor`:
                The resized image.
        """
        if isinstance(image, PIL.Image.Image):
            image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
        elif isinstance(image, torch.Tensor):
            image = torch.nn.functional.interpolate(
                image,
                size=(height, width),
            )
        elif isinstance(image, np.ndarray):
            image = self.numpy_to_pt(image)
            image = torch.nn.functional.interpolate(
                image,
                size=(height, width),
            )
            image = self.pt_to_numpy(image)
        return image

Comment on lines -306 to -308
if self.config.do_resize:
height, width = self.get_default_height_width(image[0], height, width)
image = [self.resize(i, height, width) for i in image]
Contributor

Can you explain that change a bit? Why should it be done before do_convert_rgb and do_convert_to_grayscale? Or is the order irrelevant here as both will give the same results?

Collaborator Author

I did not know this before, but resize with resample = PIL.Image.LANCZOS will turn a grayscale image into RGB....

mask = [i.resize((width, height), resample=PIL.Image.LANCZOS) for i in mask]

Moving the resize before converting ensures the processed image is the desired type. As a reference, this was actually the order in the deprecated prepare_mask_and_masked_image

if isinstance(image, list) and isinstance(image[0], PIL.Image.Image):
    # resize all images w.r.t passed height and width
    image = [i.resize((width, height), resample=PIL.Image.LANCZOS) for i in image]
    image = [np.array(i.convert("RGB"))[None, :] for i in image]
    image = np.concatenate(image, axis=0)

@@ -37,6 +38,94 @@
logger = logging.get_logger(__name__) # pylint: disable=invalid-name


def get_crop_region(mask_image: PIL.Image.Image, width: int, height: int, pad=0):
Contributor

Should we maybe move this also into image_processor and make it a function of the image processor object?

@yiyixuxu (Collaborator Author) Dec 7, 2023

ok to move it there, but I'm in favor of not doing so: this is a super specific function that I don't think will be used for any other purpose... it also needs the input image to be a "mask" for it to work

Contributor

Ok to use # Copied from for now, but we'll already have to use it for all our inpaint pipelines (SD, SDXL, SD + ControlNet, ...)

Collaborator Author

good point, I will move to the image processor then 😅

@patrickvonplaten (Contributor) left a comment

Overall I'm very happy with the API.

# create blurred mask
mask_blurred = pipe1.mask_processor.blur(mask, blur_factor=blur_factor)

# with mask blur
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask_blurred, strength=0.75, generator=generator).images[0]

Think going for a blur function on the processor and passing it to mask_image is the way to go here!

Left some questions / nits. Also, can we maybe change the argument inpaint_full_res to something more specific? E.g. we already know that it's inpainting because people use the inpainting pipeline. What's happening with full_res is essentially that the model inpaints a much larger area and the non-masked context is heavily reduced, no?

Would something like `padding_mask_crop: Optional[int] = None` make more sense maybe? This way we also only need one argument instead of two. We'd def need a good docstring then!

@iamwavecut

padding_mask_crop

Well, maybe, it should read as mask_crop_padding?

@kadirnar (Contributor)

Hi @yiyixuxu,

I reviewed the code and tested it with the SDXL Inpaint + ControlNet pipeline, and the results are not good. Will you write the code for SDXL, or can I create a new pull request?

@yiyixuxu (Collaborator Author)

Hi @kadirnar
Can you clarify what you mean by "results are not good"?

@kadirnar (Contributor)

Image, mask, control_image (depth):

(image)

custom parameters (sdxl-turbo):

output_inpaint = sdxl_pipeline(
    num_inference_steps=6,
    height=1024,
    width=1024,
    strength=1.0,
    controlnet_conditioning_scale=0.0,
    guidance_scale=0.0,
    padding_mask_crop=32
    ).images

mask-blur=32:

(image)

controlnet_conditioning_scale=0.5:
The background is bad.

(image)

strength=0.7 and controlnet_conditioning_scale=0.0:

(image)

@yiyixuxu (Collaborator Author)

@kadirnar
Thanks. I don't think padding_mask_crop works with "outpainting", i.e. generating a new background, which is what you are doing here.

padding_mask_crop is not "mask_blur" in auto1111; it is the "inpaint area = only masked" option. You can still probably use mask blur here with this API:

mask_blurred = pipe_inpaint.mask_processor.blur(mask, blur_factor=32)
out = pipe_inpaint(..., mask_image=mask_blurred, ...)

strength = 0.7 failing to generate the new background is expected; you need to use 1.0 here, since you do not want the generation to be similar to the original image you provided.

@kadirnar (Contributor)

Thank you. I want to increase the controlnet_conditioning_scale parameter, but when I do this the results are bad. Also, why is noise applied to the entire picture? What can I do to apply it only to the mask?

@DN6 (Collaborator) left a comment

Changes look good to me. Could we add a test for mask blur?

@kadirnar (Contributor)

Will this pull request be merged?

@ManuelZ commented Dec 22, 2023

Will it be possible to use this PR with StableDiffusionControlNetInpaintPipeline?

@patrickvonplaten (Contributor)

@yiyixuxu let's try to get this PR in, no?

yiyixuxu and others added 2 commits December 26, 2023 09:41
…sion_inpaint.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@yiyixuxu yiyixuxu merged commit f0a588b into main Dec 26, 2023
16 checks passed
@yiyixuxu yiyixuxu deleted the inpaint-mask branch December 26, 2023 20:31
@patrickvonplaten (Contributor)

Very nice! Good job on getting this one in

donhardman pushed a commit to donhardman/diffusers that referenced this pull request Dec 29, 2023
* add inpaint_full_res

* fix

* update

* move get_crop_region to image processor

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move apply_overlay to image processor

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
antoine-scenario pushed a commit to antoine-scenario/diffusers that referenced this pull request Jan 2, 2024
@vladmandic (Contributor)

        image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)

        if padding_mask_crop is not None:
            image = [self.image_processor.apply_overlay(mask_image, original_image, i, crops_coords) for i in image]

this only works if output_type='pil'; otherwise it's a mismatch - you can't apply the overlay to an image that may be a np array or a tensor (if output_type=latent)

File "/home/vlado/.local/lib/python3.11/site-packages/diffusers/image_processor.py", line 627, in apply_overlay
    width, height = image.width, image.height
AttributeError: 'Tensor' object has no attribute 'width'

@yiyixuxu (Collaborator Author) commented Jan 5, 2024

@vladmandic,

would it be okay to allow this feature only for `output_type="pil"`? (by checking the inputs and throwing an error otherwise)

The resize methods used for this are quite specific, so I'm not sure how well they would work on latents. I will maybe play around to see how it works if it is important to enable this feature for other output types.

cc @patrickvonplaten here too
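For reference, a minimal sketch of the kind of input check being proposed here; the helper name is hypothetical, and this is not necessarily how it ended up being implemented in diffusers.

def _check_padding_mask_crop(padding_mask_crop, output_type):
    # only allow the crop-and-overlay feature for PIL outputs, since apply_overlay
    # needs PIL images; raise early for np / pt / latent outputs
    if padding_mask_crop is not None and output_type != "pil":
        raise ValueError(
            f"`padding_mask_crop` currently only supports `output_type='pil'`, but `output_type={output_type!r}` was passed."
        )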

@vladmandic (Contributor) commented Jan 5, 2024

To start with, yes, absolutely. But in principle it should work on numpy arrays and latents (just divide the dimensions by 8) as well, using cv2 for the resize, since it doesn't care about channel content.
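A hedged sketch of that idea, assuming numpy images shaped (H, W, C) in float32 and latents shaped (C, H/8, W/8); cv2 is not what the pipeline currently uses, and the function name is illustrative.

import cv2
import numpy as np

def resize_for_overlay(array: np.ndarray, width: int, height: int, is_latent: bool = False) -> np.ndarray:
    # latents live at 1/8 of the pixel resolution, so divide the target size by 8
    if is_latent:
        width, height = width // 8, height // 8
        array = array.transpose(1, 2, 0)  # (C, H, W) -> (H, W, C) for cv2
    resized = cv2.resize(array, (width, height), interpolation=cv2.INTER_LINEAR)
    return resized.transpose(2, 0, 1) if is_latent else resized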

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
@manurare commented Jun 8, 2024

Hi,

Thanks for this. I have several questions about the problem of inpainting with a soft/blurred mask.

First of all, the provided example does not use a pure inpainting model but model = "runwayml/stable-diffusion-v1-5". Why is this, exactly? In the pipeline's __call__ method, the noisy latents are linearly interpolated with the noise via the mask only if the unet has 4 input channels. This means that if we use "runwayml/stable-diffusion-inpainting", that condition is never met. Is this not required for pure inpainting models because of their extra conditioning on the mask + masked image?
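For reference, the interpolation referred to above is roughly the following step, applied at every denoising iteration when the unet has 4 input channels; this is a simplified sketch with illustrative names, not the exact diffusers code.

import torch

def blend_latents(latents: torch.Tensor, init_latents_proper: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # init_latents_proper: the original image latents, re-noised to the current timestep
    # mask: 1 where content should be generated, 0 where the original image is kept
    return (1 - mask) * init_latents_proper + mask * latents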

Also, even though we blur the mask, we effectively end up with a binary mask because of this line. So what is the point of blurring the mask? I was expecting the soft mask values to serve as a mechanism to create a smooth transition between the original content and the inpainted content. Is it possible to do inpainting with a non-binary mask?
