adding auto1111 features to inpainting pipeline #6072
Conversation
@@ -861,6 +969,35 @@ def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32
        assert emb.shape == (w.shape[0], embedding_dim)
        return emb

    def apply_overlay(
will refactor this method so that we can use it to fix the unmasked area when not using inpaint_full_res too!
Sounds good! Let's maybe make it private for now, knowing that we'll change the naming soon.
def apply_overlay(
def _apply_overlay(
It is ready to use as a public pipeline method now! I tested with this example, using the apply_overlay() method to keep the unmasked area fixed (in this case, it is the bottle!):
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch

# load the example image and mask
init_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/g_rgb.png")
image_mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/g_mask.png")

inpaint_pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
inpaint_pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(3)
output = inpaint_pipe(
    image=init_image,
    mask_image=image_mask,
    prompt="a bottle emerging from ripples in a lake surrounded by plants and flowers",
    negative_prompt="blurry, bad quality, painting, cgi, malformation",
    guidance_scale=7.,
    num_inference_steps=40,
    generator=generator,
)
output.images[0].save("out_no_overlay.png")

# paste the unmasked area of the original image back over the pipeline output
repainted_image = inpaint_pipe.apply_overlay(image_mask, init_image, output.images[0])
repainted_image.save("out_overlay.png")
no overlay | overlay
---|---
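For readers skimming the thread, here is a rough conceptual sketch of what an overlay step like this does; it is illustrative only and skips the crop-coordinate handling and resizing details the real method supports:

import PIL.Image

def apply_overlay_sketch(
    mask: PIL.Image.Image, init_image: PIL.Image.Image, image: PIL.Image.Image
) -> PIL.Image.Image:
    # keep the unmasked area from the original image and take the masked area
    # from the pipeline output; the mask acts as the paste alpha
    width, height = init_image.size
    image = image.resize((width, height))
    mask = mask.convert("L").resize((width, height))
    result = init_image.copy()
    result.paste(image, (0, 0), mask)
    return result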
Couldn't we add this method to the image_processor instead? I can see that self.mask_processor.resize(...) is used, but could we also just use self.image_processor.resize(...) here?

The apply_overlay method seems very much related to the image processor to me. I think writing

image = self.image_processor.apply_overlay(mask, init_image, image, crop_coords)

further below is cleaner here.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
height: int,
width: int,
height: int,
width: int,
Was it incorrect to have height and width be optional before? Did it throw an error when they weren't passed? Ok to remove Optional[int] = None if one had to pass them anyways - otherwise it'd be backwards breaking.
It was incorrect before and would throw an error when they were not passed. This is the function in the current codebase:

diffusers/src/diffusers/image_processor.py, line 212 in bf7f9b4
def resize(
self,
image: Union[PIL.Image.Image, np.ndarray, torch.Tensor],
height: Optional[int] = None,
width: Optional[int] = None,
) -> Union[PIL.Image.Image, np.ndarray, torch.Tensor]:
"""
Resize image.
Args:
image (`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`):
The image input, can be a PIL image, numpy array or pytorch tensor.
height (`int`, *optional*, defaults to `None`):
The height to resize to.
width (`int`, *optional*, defaults to `None`):
The width to resize to.
Returns:
`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`:
The resized image.
"""
if isinstance(image, PIL.Image.Image):
image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
elif isinstance(image, torch.Tensor):
image = torch.nn.functional.interpolate(
image,
size=(height, width),
)
elif isinstance(image, np.ndarray):
image = self.numpy_to_pt(image)
image = torch.nn.functional.interpolate(
image,
size=(height, width),
)
image = self.pt_to_numpy(image)
return image
if self.config.do_resize:
    height, width = self.get_default_height_width(image[0], height, width)
    image = [self.resize(i, height, width) for i in image]
Can you explain that change a bit? Why should it be done before do_convert_rgb and do_convert_to_grayscale? Or is the order irrelevant here as both will give the same results?
I did not know this before, but resize with resample = PIL.Image.LANCZOS will turn a grayscale image into RGB....

diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py, line 143 in bf7f9b4

mask = [i.resize((width, height), resample=PIL.Image.LANCZOS) for i in mask]
Moving the resize before converting ensures the processed image is the desired type. As a reference, this was actually the order in the deprecated prepare_mask_and_masked_image:

if isinstance(image, list) and isinstance(image[0], PIL.Image.Image):
    # resize all images w.r.t. the passed height and width
    image = [i.resize((width, height), resample=PIL.Image.LANCZOS) for i in image]
    image = [np.array(i.convert("RGB"))[None, :] for i in image]
    image = np.concatenate(image, axis=0)
@@ -37,6 +38,94 @@
logger = logging.get_logger(__name__)  # pylint: disable=invalid-name


def get_crop_region(mask_image: PIL.Image.Image, width: int, height: int, pad=0):
Should we maybe move this also into image_processor and make it a function of the image processor object?
ok to move it there, but I'm in favor of not doing so: this is a super specific function that I don't think will be used for any other purpose... it also needs the image to be a "mask" for it to work
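For context, a rough sketch of what a crop-region helper like this computes. This is a simplified illustration only (the helper name, padding, and clamping details are assumptions; the actual function may also expand the region, for example to match the processing aspect ratio, which is not shown here):

import numpy as np
import PIL.Image

def get_crop_region_sketch(mask_image: PIL.Image.Image, width: int, height: int, pad: int = 0):
    # simplified: bounding box of the nonzero (masked) pixels, expanded by `pad`
    # and clamped to the image bounds; returns (x1, y1, x2, y2)
    mask = np.array(mask_image.convert("L"))
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return 0, 0, width, height  # empty mask: fall back to the full image
    x1 = max(int(xs.min()) - pad, 0)
    y1 = max(int(ys.min()) - pad, 0)
    x2 = min(int(xs.max()) + pad + 1, width)
    y2 = min(int(ys.max()) + pad + 1, height)
    return x1, y1, x2, y2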
Ok to use # Copied from for now, but we'll already have to use it for all our inpaint pipelines (SD, SDXL, SD + ControlNet, ...)
good point, I will move it to the image processor then 😅
Overall I'm very happy with the API.

# create blurred mask
mask_blurred = pipe1.mask_processor.blur(mask, blur_factor=blur_factor)

# with mask blur
generator = torch.Generator(device='cuda').manual_seed(seed)
inpaint = pipe1('boat', image=base, mask_image=mask_blurred, strength=0.75, generator=generator).images[0]

Think going for a blur function on the processor and passing the result to mask_image is the way to go here!

Left some questions / nits. Also, can we maybe change the argument inpaint_full_res to something more specific? E.g. we already know that it's inpaint because people use the inpainting pipeline. What's happening with full_res is essentially that the model inpaints a much larger area and the non-masked context is heavily reduced, no?

Would something like padding_mask_crop: Optional[int] = None make more sense maybe? This way we also only need one argument instead of two. We'd def need a good docstring then!
Well, maybe, it should read as
Hi @yiyixuxu, I reviewed the code and tested it with the SDXL Inpaint + ControlNet pipeline, and the results are not good. Will you write the code for SDXL? Or can I create a new pull request?
Hi @kadirnar
Image, mask, control_image (depth): custom parameters (sdxl-turbo):

output_inpaint = sdxl_pipeline(
    num_inference_steps=6,
    height=1024,
    width=1024,
    strength=1.0,
    controlnet_conditioning_scale=0.0,
    guidance_scale=0.0,
    padding_mask_crop=32,
).images
@kadirnar padding_mask_crop is not "mask_blur" in auto1111, it is the "inpaint area = only masked" option. You can still probably use mask blur here with this API:

mask_blurred = pipe_inpaint.mask_processor.blur(mask, blur_factor=32)
out = pipe_inpaint(..., mask_image=mask_blurred, ...)
Thank you. I want to increase the controlnet_conditioning_scale parameter, but when I do the results are bad. And why is noise applied to the entire picture? What can I do to apply it only to the mask?
Changes look good to me. Could we add a test for mask blur?
Will this pull request be merged?
Will it be possible to use this PR with
@yiyixuxu let's try to get this PR in, no?
…sion_inpaint.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Very nice! Good job on getting this one in
* add inpaint_full_res
* fix
* update
* move get_crop_region to image processor
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* move apply_overlay to image processor
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)
if padding_mask_crop is not None:
    image = [self.image_processor.apply_overlay(mask_image, original_image, i, crops_coords) for i in image]

this only works if output_type="pil"
would it be okay to allow this feature only for output_type="pil"? (by checking the inputs and throwing an error otherwise) The resize methods they use for this are quite specific, so I'm not sure how well it would work on latents. I will maybe play around to see how it works if it is important to enable this feature for other output types. cc @patrickvonplaten here too
To start with, yes, absolutely.
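A minimal sketch of the kind of input check this implies (placement and wording are illustrative, not necessarily the exact code merged in this PR):

# e.g. inside check_inputs(): padding_mask_crop requires PIL output
if padding_mask_crop is not None and output_type != "pil":
    raise ValueError(
        f"The output type should be 'pil' when padding_mask_crop is used, but is {output_type}."
    )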
Hi, thanks for this. I have several questions about the problem of inpainting with a soft/blurred mask. First of all, the provided example does not use a pure inpainting model but
Also, although the mask is blurred, we effectively end up with a binary mask because of this line. Therefore, what is the point of blurring a mask? I was expecting the soft mask values to serve as a mechanism to create a smooth transition between the original content and the inpainted content. Is it possible to do inpainting with a non-binary mask?
This PR adds two auto1111 features to the inpainting pipeline, aiming to improve the "visible mask border" issue discussed in #5808.
new feature 1: mask_blur option

I simply added this to the image processor; the user can blur the mask before passing it to the pipeline.
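For illustration, a minimal usage sketch of this workflow (the model id, image URLs, prompt, and blur_factor value are just example choices, not part of this PR):

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/g_rgb.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/g_mask.png")

# blur the mask with the image processor before passing it to the pipeline
mask_blurred = pipe.mask_processor.blur(mask, blur_factor=12)

image = pipe(
    prompt="a bottle emerging from ripples in a lake",
    image=init_image,
    mask_image=mask_blurred,
).images[0]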
new feature 2: padding_mask_crop

This feature corresponds to the inpaint area = "only masked" option in auto1111. It is quite interesting! When this is selected, it will (see the sketch after this list):
- crop out a rectangular area around the mask (based on a padding parameter the user specified)
- crop out the same area from the image
- upscale both image and mask to full resolution and inpaint with them
- size them back and overlay the output over the original image
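Continuing the usage sketch above (same pipe, init_image, and mask; the value 32 is just an example), enabling this behavior only requires passing the new argument:

# inpaint only a padded box around the mask, then paste the result back over the original
image = pipe(
    prompt="a bottle emerging from ripples in a lake",
    image=init_image,
    mask_image=mask,
    padding_mask_crop=32,  # padding, in pixels, around the mask's bounding box
).images[0]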
notes on the "masked_content = fill, original, latent nothing, latent noise" option

I think what we have right now is:
- masked_content = original in auto1111
- masked_content = latent_noise in auto1111

so when you want to generate content based on the original image, set a strength value < 1.0; if you want to generate something completely new, use strength = 1.

auto1111 uses different methods to fill the masked area, but we achieve a similar outcome with the special behavior of the strength value when it is set to 1. IMO we don't need to add the other two options for now. The source code for the fill method literally had a note that says: "It is not super effective."

testing

baseline | mask_blur | mask_blur + padding_mask_crop=32

Output with seed 33

baseline | mask_blur | mask_blur + padding_mask_crop=32