
Require more information regarding inpainting #5

Open · ghpkishore opened this issue Feb 24, 2023 · 11 comments

ghpkishore commented Feb 24, 2023

Hi!

@haofanwang

I am trying to understand how to perform inpainting with ControlNet, which you mentioned in the third part of the README. I get extremely poor results compared to normal inpainting with the same model, so I suspect something is not quite right in the code. I tried the Canny edge detection annotator.

My process is structured as follows: I take an input image and a mask, which is what a typical inpainting pipeline requires. Then, using the annotators, I create a Canny image of a reference image, which I pass as the control hint.

However, the output neither preserves the fidelity of the original input nor follows the Canny edges properly.

Code is below:

import torch
import numpy as np
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image
from annotator.util import resize_image, HWC3
from annotator.canny import CannyDetector

def getCannyImage(input_image, image_resolution, low_threshold, high_threshold):
    input_image = np.array(input_image)
    apply_canny = CannyDetector()
    with torch.no_grad():
        img = resize_image(HWC3(input_image), image_resolution)
        detected_map = apply_canny(img, low_threshold, high_threshold)
        detected_map = HWC3(detected_map)
    # NOTE: this inverts the edge map; the original ControlNet gradio demo
    # only uses 255 - detected_map for display and feeds detected_map itself
    # to the model.
    return Image.fromarray(255 - detected_map)

pipe_control = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "models/control_sd15_canny", torch_dtype=torch.float16).to("cuda")
pipe_inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "models/stable-diffusion-inpainting", torch_dtype=torch.float16).to("cuda")

# Reuse the inpainting UNet inside the ControlNet pipeline.
pipe_control.unet = pipe_inpaint.unet
pipe_control.unet.in_channels = 4

image = load_image("./inputImages/input.png")
mask = load_image("./inputImages/mask.png")
canny_input = load_image("./inputImages/reference.png")

# Canny edge detection parameters
image_resolution, low_threshold, high_threshold = 512, 100, 200
control_image = getCannyImage(canny_input, image_resolution, low_threshold, high_threshold)

image = pipe_control(prompt="Woman performing Yoga on a tennis court",
                     negative_prompt="lowres, bad anatomy, worst quality, low quality",
                     controlnet_hint=control_image,
                     image=image,
                     mask_image=mask,
                     num_inference_steps=100).images[0]

image.save("inpaint_canny.jpg")

The reference and input images are below; the mask is essentially a mask of the input image:
reference
input

Please let me know if I am missing something.


haofanwang commented Feb 24, 2023

It just looks fine to me. @ghpkishore

(1) To make sure everything goes well, could you first try our example with segmentation and check that you get the same result as ours?

(2) Could you post your control_image and mask here? The mask should be binary.
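
For reference, a minimal sketch of forcing a mask to binary before passing it in (assuming a single-channel image; the 127 threshold and the output file name are placeholders, not part of the original code):

import numpy as np
from PIL import Image

# Load the mask as grayscale and snap every pixel to pure black or white.
mask = np.array(Image.open("./inputImages/mask.png").convert("L"))
binary = np.where(mask > 127, 255, 0).astype(np.uint8)
Image.fromarray(binary).save("./inputImages/mask_binary.png")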


ghpkishore commented Feb 24, 2023

Hi @haofanwang, I did check whether the mask was binary, and it initially wasn't. I then fixed it to ensure that it is, but even after that it does not work correctly. I am adding all five images here: input, mask, reference image for Canny, Canny output, and final output.

(1) To make sure everything goes well, could you first try our example with segmentation and check that you get the same result as ours? — Yes, I tried your example and it worked correctly.

I do not know whether it fails because I am inpainting over a much larger area. Normal inpainting works, though.

Tennis_reference
Tennis_input
Tennis_canny
Tennis_output
Tennis_mask

haofanwang commented Feb 24, 2023

I will try your images to check what's going wrong.


haofanwang commented Feb 24, 2023

I can reproduce it, but it seems to be related to the base model. Even if I directly use the official demo, you can see the face is still distorted. So don't worry, you are already on the right track. By the way, which normal inpainting model do you use? Is it also a Stable Diffusion model? If so, you can use it via our script. @ghpkishore

Screenshot 2023-02-24 19 12 28


ghpkishore commented Feb 24, 2023

Thanks @haofanwang. Is there any way I can fix it, then? I should be able to use the SD 2.1 inpainting model if need be, right? Also, regarding binarization of the mask, I do not think it is necessary, as your code already handles it inside the prepare_mask_and_masked_image function.
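
If it helps, the thresholding in that function works roughly like the following (a paraphrased sketch, not the exact diffusers source):

import torch

def binarize_like_diffusers(mask: torch.Tensor) -> torch.Tensor:
    # Approximation of the step inside prepare_mask_and_masked_image:
    # mask values in [0, 1] are snapped to exactly 0 or 1 at a 0.5 cutoff,
    # so a non-binary input mask gets binarized before being used.
    mask = mask.clone()
    mask[mask < 0.5] = 0
    mask[mask >= 0.5] = 1
    return mask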

The model I was using is the runwayml inpainting model, the one you showcased above.

haofanwang commented Feb 24, 2023

I'm not sure whether the naming is consistent between SD 1.5 and SD 2.1; if so, yes. You can give it a try. Just report here if it fails and I can support it once I have time, or it would be much appreciated if you could help with it. @ghpkishore

ghpkishore commented Feb 24, 2023

@haofanwang It doesn't work directly with the SD 2.1 inpainting model; I need to figure out why. The error I got was:
"mat1 and mat2 shapes cannot be multiplied (154x768 and 1024x320)"

I will try with different sets of control nets for inpainting with 1.5 and then move on to 2.1.

Also, is it possible to tack multiple different condition nets together? I know that this is under open discussion for the ControlNet library; however, since it is possible with T2I-Adapter, I want to figure out whether inpainting is possible with it. It seems like it would be a file similar to the ControlNet and inpainting integration.


haofanwang commented Feb 24, 2023

For SD 2.1, it may not be compatible with existing ControlNet models that were trained on SD 1.5.
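
That would also explain the shapes in the error: SD 1.5 text embeddings are 768-dimensional while SD 2.1 uses 1024. A quick sanity check, assuming diffusers-style model folders (the SD 2.1 path is a placeholder):

from diffusers import UNet2DConditionModel

# The cross-attention width must match between the ControlNet branch and the
# UNet it plugs into: SD 1.5 models report cross_attention_dim == 768, SD 2.1
# models report 1024, which lines up with the 768-vs-1024 matmul error above.
unet_15 = UNet2DConditionModel.from_pretrained("models/control_sd15_canny", subfolder="unet")
unet_21 = UNet2DConditionModel.from_pretrained("models/stable-diffusion-2-inpainting", subfolder="unet")
print(unet_15.config.cross_attention_dim)  # expected: 768
print(unet_21.config.cross_attention_dim)  # expected: 1024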

For multi-control, I haven't supported it yet. I know the diffusers team is working on it, and I don't want to make this project too heavy. But I will take a look at it.

ghpkishore commented Feb 24, 2023

Oh wow, thanks for letting me know. I had just started looking into where the difference comes from. I thought that if I could identify which part of the code produces the matmul error, I would be able to fix it. I didn't know it could be due to a difference in the training models.


ghpkishore commented Feb 24, 2023

@haofanwang I do not think the model is at fault. I really think something is wrong with my Canny edge implementation.

I tried the segmentation model and it seemed to work, so something seems off with Canny edge and my code. I will work on it again.


haofanwang commented Feb 24, 2023

To verify, you can just use the web demo to generate a Canny edge image, check whether it is the same as yours, and see whether that Canny image solves your problem. Don't dive straight into the code; that is really maddening. @ghpkishore
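
For example, a minimal comparison sketch (the file names are placeholders for the demo's edge map and the one produced by getCannyImage above):

import numpy as np
from PIL import Image

demo = np.array(Image.open("canny_from_demo.png").convert("L"))
mine = np.array(Image.open("canny_from_my_code.png").convert("L"))

# With the same input and thresholds the two maps should be nearly identical.
# A large mean difference, or one map being the inverse of the other,
# points at the preprocessing rather than the model.
print(np.abs(demo.astype(int) - mine.astype(int)).mean())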

Also, as mentioned in #10, the conversion may lead to unexpected results for reasons that are still unknown.
