
Example of inpaint doesn't work for Stable Diffusion 2.1 #60

Closed
Emulator000 opened this issue May 3, 2023 · 2 comments · Fixed by #63

@Emulator000

I'm trying the same example with the 2.1 configuration. I downloaded the appropriate CLIP, UNET and VAE weights and converted them correctly, but it does not seem to work.

Command:

cargo run --example stable-diffusion-inpaint --features clap -- --sd-version="v2-1" --prompt "Face of a yellow cat, high resolution, sitting on a park bench." --input-image="temp/dog.png" --mask-image="temp/dog_mask.png" --width=512 --height=512

This is the output that I get with the dog/cat example, which works perfectly with Stable Diffusion 1.5:
[image: inpainting output for Stable Diffusion 2.1 - the masked area is not filled correctly]

It seems that the inpainted area isn't populated correctly.

Is there any possible reason for that? Does the example/code need to be adapted with some additional steps for the 2.1 version?
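(For context, the dedicated inpainting checkpoints such as stabilityai/stable-diffusion-2-inpainting take a 9-channel UNet input rather than the usual 4: the noisy latents, the downscaled mask, and the VAE-encoded masked image are concatenated along the channel dimension. A minimal tch-rs-style sketch with hypothetical shapes and variable names, just to illustrate the layout, not the example's actual code:)

```rust
use tch::{Device, Kind, Tensor};

fn main() {
    // Hypothetical shapes for a 512x512 image: latents live on a 64x64 grid (512 / 8).
    let opts = (Kind::Float, Device::Cpu);
    let latents = Tensor::zeros(&[1, 4, 64, 64], opts); // noisy latents
    let mask = Tensor::zeros(&[1, 1, 64, 64], opts); // downscaled inpainting mask
    let masked_image_latents = Tensor::zeros(&[1, 4, 64, 64], opts); // VAE-encoded masked image
    // The inpainting UNet expects all three concatenated along the channel dimension.
    let unet_input = Tensor::cat(&[&latents, &mask, &masked_image_latents], 1);
    assert_eq!(unet_input.size(), vec![1, 9, 64, 64]);
}
```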

@LaurentMazare (Owner)

No clue what is going on here. I also tried it and got the same results using the weights from stabilityai/stable-diffusion-2-inpainting, and I also tried the native resolution of 768x768 without luck. Comparing the json config files for the different versions, I haven't noticed anything that would obviously require some adaptation. I guess at this point the simplest approach would be to run the inpainting process on both the Python and Rust sides and see at which layer things start to diverge.

@LaurentMazare (Owner)

Ah, it seems that one difference is that the scheduler for stable-diffusion 2.1 uses a prediction type of v_prediction for normal generation but a prediction type of epsilon for inpainting (whereas stable-diffusion 1.5 uses epsilon for both). I've just merged PR #63, which should hopefully help with this - at least on a single generated image it looks better now.
[image: sd_final - regenerated inpainting output after the fix]
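(For reference, the fix only changes how the scheduler interprets the UNet output when recovering the predicted clean latent. A minimal, self-contained Rust sketch of the two prediction types; the names here are illustrative, not the actual diffusers-rs API:)

```rust
// Illustrative only: how the predicted clean sample x0 is recovered from the
// model output under the two prediction types. `alpha_bar_t` is the cumulative
// product of (1 - beta) up to timestep t, as in the DDPM/DDIM papers.

#[derive(Clone, Copy)]
enum PredictionType {
    Epsilon,     // model predicts the added noise eps
    VPrediction, // model predicts the "velocity" v = sqrt(a)*eps - sqrt(1-a)*x0
}

/// Recover the predicted clean sample x0 from the model output.
fn predicted_x0(model_out: f64, x_t: f64, alpha_bar_t: f64, kind: PredictionType) -> f64 {
    let (sa, sb) = (alpha_bar_t.sqrt(), (1.0 - alpha_bar_t).sqrt());
    match kind {
        // x_t = sqrt(a)*x0 + sqrt(1-a)*eps  =>  x0 = (x_t - sqrt(1-a)*eps) / sqrt(a)
        PredictionType::Epsilon => (x_t - sb * model_out) / sa,
        // v = sqrt(a)*eps - sqrt(1-a)*x0    =>  x0 = sqrt(a)*x_t - sqrt(1-a)*v
        PredictionType::VPrediction => sa * x_t - sb * model_out,
    }
}

fn main() {
    // Toy check: with x0 = 1.0, eps = 0.5, alpha_bar = 0.8, both paths recover x0.
    let (x0, eps, a) = (1.0_f64, 0.5_f64, 0.8_f64);
    let x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * eps;
    let v = a.sqrt() * eps - (1.0 - a).sqrt() * x0;
    println!("{:.4}", predicted_x0(eps, x_t, a, PredictionType::Epsilon)); // ~1.0
    println!("{:.4}", predicted_x0(v, x_t, a, PredictionType::VPrediction)); // ~1.0
}
```

(Feeding an epsilon-style model output through the v_prediction branch, or vice versa, produces a badly off x0 at every step, which would explain the unfilled masked region.)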

There is still a scheduler inconsistency, as we use DDIM rather than PNDM; using a proper DPM solver would likely help here too, but hopefully this doesn't make much of a difference.
Please give the current github tip a spin if you can and let us know how it goes.
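(For completeness, a sketch of how a deterministic DDIM step, eta = 0, would consume that prediction; `predicted_x0` and `PredictionType` refer to the sketch above, and `a_t`/`a_prev` are hypothetical names for the cumulative alpha products at the current and previous timesteps. PNDM and DPM-Solver differ in how they combine model outputs across steps, not in this x0/eps decomposition:)

```rust
// Deterministic DDIM step (eta = 0), reusing `predicted_x0` and
// `PredictionType` from the previous sketch.
fn ddim_step(x_t: f64, model_out: f64, a_t: f64, a_prev: f64, kind: PredictionType) -> f64 {
    // Recover the predicted clean sample, honoring the prediction type.
    let x0 = predicted_x0(model_out, x_t, a_t, kind);
    // Re-derive the noise estimate from x0 so both prediction types share one path.
    let eps = (x_t - a_t.sqrt() * x0) / (1.0 - a_t).sqrt();
    // x_{t-1} = sqrt(a_prev) * x0 + sqrt(1 - a_prev) * eps
    a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * eps
}
```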
