
Image inpainting #147

Open
krrishdholakia opened this issue Jul 31, 2022 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@krrishdholakia

Hi,

Two quick questions about this:

  • Is there a Colab notebook or guiding doc on leveraging this model for image inpainting?
  • Given a source person image and a t-shirt image, how can I use a guided text prompt (e.g. "show the person wearing this t-shirt") to generate such an image?
@krrishdholakia
Author

Did some further research:

  • If I have the cloth mask, cloth image, human image, parsed human image, and human pose, how can I concatenate these together to present an input image to the diffuser model, have it generate an output, and then match that against the expected output?

  • Ideally, I could just concatenate the cloth image + human image and check the output against the expected one.

Open to thoughts / ways of doing this.

@anton-l anton-l self-assigned this Aug 9, 2022
@anton-l anton-l added the enhancement New feature or request label Aug 9, 2022
@krrishdholakia
Author

Hi @anton-l,

Just wanted to circle back on this. I'm not sure how I could concatenate the two images and pass that + output through the diffusion model. Curious if you might have any ideas on how to approach this?

cc: @patrickvonplaten, @patil-suraj

@anton-l
Member

anton-l commented Aug 13, 2022

Hi @krrishdholakia! By setting in_channels and out_channels in the UNet configuration you can adapt it to concatenated inputs and outputs, e.g. in_channels=6 for two concatenated input images.
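A minimal sketch of what that concatenation could look like, using numpy arrays as stand-ins for the actual tensors (the shapes and image contents here are hypothetical; a diffusers UNet configured with in_channels=6 would then accept the stacked input):

```python
import numpy as np

# Two hypothetical 3-channel images in (C, H, W) layout:
# the cloth image and the person image.
cloth = np.random.rand(3, 64, 64).astype(np.float32)
person = np.random.rand(3, 64, 64).astype(np.float32)

# Stack along the channel axis to build a single 6-channel
# input for a UNet configured with in_channels=6.
unet_input = np.concatenate([cloth, person], axis=0)
print(unet_input.shape)  # (6, 64, 64)
```

With torch tensors the same stacking would be `torch.cat([cloth, person], dim=0)` (or `dim=1` for batched inputs).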

@krrishdholakia
Author

krrishdholakia commented Aug 14, 2022

@anton-l How would you calculate loss at the interim stages for this, since you want it to generate a target image (i.e. the person wearing the clothing) that is different from the concatenated inputs (clothing item + source person image)?

    # Predict the noise residual
    noise_pred = model(noisy_images, timesteps)["sample"]
    loss = F.mse_loss(noise_pred, noise)
    accelerator.backward(loss)

@krrishdholakia
Author

hey @anton-l just wanted to follow up on this

cc: @patil-suraj @patrickvonplaten

@anton-l
Member

anton-l commented Aug 29, 2022

@krrishdholakia the idea would be to feed the concatenated clothing + person images (6 channels), and have 6 channels as output as well (since the number of channels needs to match to compute the residuals). Then the first (or last) 3 channels of the output would be your predicted clothed person, and the other 3 channels can be discarded (not used for the loss calculation). This is similar to how super-resolution is done with diffusion models.
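Sketching that loss computation in numpy (shapes and slicing are illustrative; in the training loop above it would be `F.mse_loss(noise_pred[:, :3], noise)` on torch tensors):

```python
import numpy as np

# Hypothetical 6-channel model output in (batch, C, H, W) layout.
# Only the first 3 channels are the prediction for the clothed-person
# image; the remaining 3 channels are discarded.
noise_pred = np.random.rand(2, 6, 64, 64).astype(np.float32)

# 3-channel target noise for the clothed-person image.
noise = np.random.rand(2, 3, 64, 64).astype(np.float32)

# MSE loss over the first 3 output channels only.
loss = np.mean((noise_pred[:, :3] - noise) ** 2)
```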

@patil-suraj
Contributor

Hey @krrishdholakia, not quite what you're looking for, but we now have an in-painting example with Stable Diffusion here: https://github.com/huggingface/diffusers/tree/main/examples/inference#in-painting-using-stable-diffusion

PhaneeshB pushed a commit to nod-ai/diffusers that referenced this issue Mar 1, 2023
* Add SharkDownloader for user

* Change tank_url to gs://shark_tank