
[Stable Diffusion Inpainting] Allow standard text-to-img checkpoints to be useable for SD inpainting #3533

Merged
merged 13 commits into main on May 26, 2023

Conversation

patrickvonplaten
Contributor

@patrickvonplaten patrickvonplaten commented May 23, 2023

This PR allows using StableDiffusionControlNetInpaintPipeline and StableDiffusionInpaintPipeline with both inpainting models and normal text-to-image models. This serves two purposes:

1.)

We can completely remove **StableDiffusionInpaintLegacyPipeline**. People still want to use one and the same set of model weights for text2img, img2img, and inpainting, so we need to support the following scenario:

```python
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionImg2ImgPipeline,
    StableDiffusionInpaintPipeline,
)

text2img = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)
inpaint = StableDiffusionInpaintPipeline(**text2img.components)
```
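To make the weight sharing explicit, here is a minimal pure-Python sketch of the `components` pattern: the property exposes a pipeline's loaded modules as a dict, so sibling pipelines can be constructed without loading the weights twice. `TinyPipeline` and its attributes are hypothetical stand-ins, not the real diffusers API.

```python
# Minimal sketch of the `components` pattern. `TinyPipeline` is a hypothetical
# stand-in, not the real diffusers DiffusionPipeline.
class TinyPipeline:
    def __init__(self, unet, vae, scheduler):
        self.unet, self.vae, self.scheduler = unet, vae, scheduler

    @property
    def components(self):
        # Expose the loaded modules as a dict keyed by constructor argument.
        return {"unet": self.unet, "vae": self.vae, "scheduler": self.scheduler}


text2img = TinyPipeline(unet=object(), vae=object(), scheduler=object())
inpaint = TinyPipeline(**text2img.components)

# Both pipelines now hold the very same module objects in memory.
print(inpaint.unet is text2img.unet)  # True
```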

It's also stated in our official docs, here:
https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components.example

With this PR, the standard inpainting pipeline is augmented to allow inpainting with sd-v1-5 (the text2img checkpoint, not the dedicated inpainting one), which I think removes some barriers. It was also very much a requested feature.

2.)

We need this PR to fully support the ControlNet inpainting model: https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint

This checkpoint works very well with the new inpaint pipeline:

```python
# !pip install transformers accelerate
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel, DDIMScheduler
from diffusers.utils import load_image
import numpy as np
import torch

init_image = load_image(
    "https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy.png"
)
init_image = init_image.resize((512, 512))

generator = torch.Generator(device="cpu").manual_seed(1)

mask_image = load_image(
    "https://huggingface.co/datasets/diffusers/test-arrays/resolve/main/stable_diffusion_inpaint/boy_mask.png"
)
mask_image = mask_image.resize((512, 512))


def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0

    # Compare height *and* width, not just the first dimension.
    assert image.shape[:2] == image_mask.shape[:2], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # set as masked pixel
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image


control_image = make_inpaint_condition(init_image, mask_image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

# speed up the diffusion process with a faster scheduler and memory optimization
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# generate image
image = pipe(
    "a handsome man with ray-ban sunglasses",
    num_inference_steps=20,
    generator=generator,
    guidance_scale=9.0,
    eta=1.0,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
).images[0]
```
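As a quick sanity check of the conditioning convention used by `make_inpaint_condition` above (masked pixels are set to -1.0 so the ControlNet can tell them apart from valid content in [0, 1]), the same logic can be replicated in pure Python on a flat pixel list. `mark_masked` is an illustrative helper, not part of diffusers.

```python
def mark_masked(pixels, mask, threshold=0.5):
    """Replicates `image[image_mask > 0.5] = -1.0` on flat lists of floats."""
    assert len(pixels) == len(mask), "image and mask must have the same size"
    return [-1.0 if m > threshold else p for p, m in zip(pixels, mask)]


image = [0.0, 0.25, 0.5, 1.0]  # normalized pixel values in [0, 1]
mask = [0.0, 1.0, 0.4, 0.9]    # mask values; > 0.5 means "inpaint here"
print(mark_masked(image, mask))  # [0.0, -1.0, 0.5, -1.0]
```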

[image: inpainting input]

becomes:

[image: inpainted output]

I also played around with SAM to generate masks and then used our SAM ControlNet model: https://colab.research.google.com/drive/1D6mBtne_m-3E9R-cl_ZCRY87Fq1OjkyI?usp=sharing

cc @sayakpaul, this could be useful for a diffusers tool, but I had very limited success here tbh: the model doesn't seem to work super well. It seems we either need a ControlNet checkpoint trained purely for inpainting or have to use the 9-channel inpainting models.
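For reference, the 9-channel figure comes from what a dedicated inpainting UNet concatenates at its input. A quick sketch of the channel arithmetic, assuming the standard SD latent layout (this is background knowledge, not read from this PR's code):

```python
# Why dedicated SD inpainting checkpoints have a 9-channel UNet input:
# the noisy latents, the downsampled mask, and the VAE-encoded masked
# image are concatenated along the channel dimension.
latent_channels = 4               # noisy latents
mask_channels = 1                 # binary mask, downsampled to latent size
masked_image_latent_channels = 4  # VAE encoding of the masked image

inpaint_unet_in_channels = (
    latent_channels + mask_channels + masked_image_latent_channels
)
print(inpaint_unet_in_channels)  # 9

# A plain text2img UNet only expects the 4 latent channels, which is why
# this PR has to special-case which inputs get concatenated.
text2img_unet_in_channels = latent_channels
print(text2img_unet_in_channels)  # 4
```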

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 23, 2023

The documentation is not available anymore as the PR was closed or merged.

@wangdong2023

nice, was looking for this functionality yesterday.

```diff
@@ -137,6 +136,13 @@ def __init__(
 ):
     super().__init__()

     deprecation_message = (
```
patrickvonplaten (Contributor, Author) commented on the diff:

Let's fully deprecate this class. The "normal" inpaint pipeline now has all the features that are needed.

@patrickvonplaten
Contributor Author

patrickvonplaten commented May 25, 2023

Will update README of: https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint once this PR is merged

@williamberman (Contributor) left a comment:

lgtm!

@patrickvonplaten patrickvonplaten merged commit d114d80 into main May 26, 2023
9 checks passed
@patrickvonplaten patrickvonplaten deleted the add_default_sd_to_inpaint branch May 26, 2023 08:47
rupertmenneer pushed a commit to rupertmenneer/diffusers that referenced this pull request May 26, 2023
…e_sigma when pure noise

updated this commit w.r.t the latest merge here: huggingface#3533
patrickvonplaten added a commit that referenced this pull request May 30, 2023
* Throw error if strength adjusted num_inference_steps < 1

* Added new fast test to check ValueError raised when num_inference_steps < 1

when strength adjusts the num_inference_steps then the inpainting pipeline should fail

* fix #3487 initial latents are now only scaled by init_noise_sigma when pure noise

updated this commit w.r.t the latest merge here: #3533

* fix

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
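The strength/num_inference_steps failure mode the commit above guards against can be sketched in pure Python. `check_steps` is illustrative, not the exact diffusers code; the real pipelines compute this when slicing the scheduler's timesteps.

```python
def check_steps(num_inference_steps, strength):
    # img2img/inpaint only run the last `num_inference_steps * strength`
    # steps of the schedule; if that rounds down to zero, nothing denoises
    # and the pipeline should fail loudly instead of returning noise.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    if init_timestep < 1:
        raise ValueError(
            f"num_inference_steps={num_inference_steps} with strength={strength} "
            "leaves no denoising steps to run; increase one of the two."
        )
    return init_timestep


print(check_steps(20, 0.5))  # 10 steps actually run
# check_steps(20, 0.04) raises ValueError: int(20 * 0.04) = 0 steps
```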
@rb-synth

Looking at the example, the control image is just the masked original. How can I use e.g., normalbae with inpainting?

@ksai2324

I tried using the pipeline with lllyasviel/sd-controlnet-canny, but it didn't work: it just produced random noise in the masked part. I'm still trying to understand why.
@rb-synth: did you manage to run it with normalbae?

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
…to be useable for SD inpainting (huggingface#3533)

* Add default to inpaint

* Make sure controlnet also works with normal sd for inpaint

* Add tests

* improve

* Correct encode images function

* Correct inpaint controlnet

* Improve text2img inpaint

* make style

* up

* up

* up

* up

* fix more