Support runwayML custom inpainting model #1243

Merged: 20 commits merged into development on Oct 27, 2022
Conversation

@lstein (Collaborator) commented Oct 25, 2022

Inpaint using the runwayML custom inpainting model

This is still a work in progress but seems functional. It supports inpainting, txt2img and img2img on the ddim, plms and k* samplers.

Installation

To test this, get the file sd-v1-5-inpainting.ckpt from https://huggingface.co/runwayml/stable-diffusion-inpainting and place it at models/ldm/stable-diffusion-v1/sd-v1-5-inpainting.ckpt

Usage

Launch invoke.py with --model inpainting-1.5 and proceed as usual. All the usual arguments and settings should work (but they haven't been systematically tested).
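
For example (a minimal sketch; the scripts/invoke.py path and the image/mask file names are hypothetical, while the switches are the same ones used elsewhere in this thread):

python scripts/invoke.py --model inpainting-1.5
"man with a crow on shoulder" -I photo-of-man.png -M shoulder_mask.png -s 50 -C 7.5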

Caveats

  1. The inpainting model takes about 800 MB more memory than the standard 1.5 model, so it will not work on 4 GB cards.
  2. I think performance is a bit slower as well, but I have not benchmarked it.
  3. The inpainting model is temperamental. It wants you to describe the entire scene, not just the masked area to replace. So if you want to replace the parrot on a man's shoulder with a crow, the prompt "crow" may fail. Try "man with a crow on shoulder" instead. The symptom of a failed inpainting is that the masked area is erased and replaced with background.
  4. In img2img mode, the inpainting model really does not like to change the image much compared to standard 1.4 or 1.5. High guidance (CFG) scales, strengths, and step counts are needed. This seems to be a feature of the model, but I can't be sure.

@lstein marked this pull request as draft October 25, 2022 14:56
- The plms sampler now works with custom inpainting model
- Quashed bug that was causing generation on normal models to fail (oops!)
- Can now generate non-square images with custom inpainting model

Credits for advice and assistance during porting:

@Any-Winter-4079 (http://github.com/any-winter-4079)
@db3000 (Danny Beer http://github.com/db3000)
@Any-Winter-4079 (Contributor) commented:

I'll test after class. Thanks!

@Any-Winter-4079 (Contributor) commented Oct 25, 2022

There's a lot to test, but starting with basic txt2img.
The 1.5 inpainting model seems to be able to generate coherent images with txt2img. Nice! The output is different, but it's a different model trained for more steps, so that's somewhat expected.

Inpainting 1.5

"an anime girl" -s 50 -S 3031912972 -W 512 -H 512 -C 7.5 --fnformat {prefix}.{seed}.png -A plms
Screenshot 2022-10-25 at 19 50 06

1.4

Now, for regular 1.4, I see images have changed.
!switch stable-diffusion-1.4
"an anime girl" -s 50 -S 3031912972 -W 512 -H 512 -C 7.5 --fnformat {prefix}.{seed}.png -A plms
image

DDIM
"an anime girl" -s 50 -S 3031912972 -W 512 -H 512 -C 7.5 --fnformat {prefix}.{seed}.png -A ddim
image

So this will be the first thing I investigate.

@lstein (Collaborator, Author) commented Oct 25, 2022

@Any-Winter-4079 The changes that you are seeing in the 1.4 model might actually be due to an unrelated PR I worked on a couple of days ago and merged last night (I'll have to look it up). Appallingly enough, it turned out that when you surround the prompt with quotation marks ("), the quotes were being passed to the generation engine. So "an anime girl" with quotes and an anime girl without them could give different results!

You might want to try the comparison without quotation marks in the prompts.
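
A minimal illustration (not the actual InvokeAI parser) of how surrounding quotes can leak into the prompt when a command line is split naively, versus shlex-style splitting, which strips them:

```python
import shlex

line = '"an anime girl" -s 50 -C 7.5'

# Naive splitting keeps the surrounding quotes, so the text conditioning
# literally contains the '"' characters:
naive_prompt = line.split(' -')[0]   # -> '"an anime girl"'

# shlex-style splitting strips the quotes, matching the unquoted prompt:
prompt = shlex.split(line)[0]        # -> 'an anime girl'
```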

@lstein (Collaborator, Author) commented Oct 25, 2022

This model is very intriguing. Even in straight img2img mode it is great at making targeted changes. For example, I can change the pattern of a subject's clothing from leopard to zebra print without changing their posture, face, or background. On the other hand, I can't make big changes, such as changing their posture or the overall style of the image, even with high step counts and CFG values.

Outpainting, which I tested with the outcrop restoration module, works quite well with this model. Better than 1.4 by far.

@Any-Winter-4079 (Contributor) commented:

It seems to produce the same result with and without "" for me.

@lstein (Collaborator, Author) commented Oct 25, 2022

So you're seeing differences using the 1.4 model between the PR and the pre-PR code base? I'll check it out on my own end. There were a bunch of fiddly changes, but I hope I didn't inadvertently change the noise-generation part of the code, which would most likely cause this.

@Any-Winter-4079 (Contributor) commented Oct 25, 2022

So you're seeing differences using the 1.4 model between the PR and the pre-PR code base?

Yes, but I pulled other changes (not just this PR).
You can check using mps_noise, so you can recreate my own images.
Old prompts and results on Mac: https://github.com/invoke-ai/InvokeAI/blob/development/docs/help/SAMPLER_CONVERGENCE.md


Let me know if you can reproduce it, so we know it's not something on my end.

@Any-Winter-4079 (Contributor) commented Oct 25, 2022

About 1.5-inpainting:
txt2img seems to work. From the limited experiments I've done, I'd say regular 1.5 is better than 1.5-inpainting (also, 1.4 and 1.5 are related)
image

img2img works. I'm still not sure what my conclusions are. 1.5 seems to do pretty poorly, so 1.5-inpainting is an improvement, but they all seem to have issues.
image

img2img with clipseg
image

@lstein (Collaborator, Author) commented Oct 25, 2022

There are so many variations of parameters that it is hard to do an apples-to-apples comparison.

One conclusion that I've reached is that the strength option (-f) only makes things worse for the inpainting model. I am thinking of ignoring its value and using 1.0. I think it made sense to have this for the non-inpainting models because they are blindly drawing on top of an image encoded in latent space, and strength controls how much modification is allowed. The inpainting model, by contrast, "understands" that it is replacing part of the image.
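
A minimal sketch of the behaviour described above (a hypothetical helper, not the actual InvokeAI code):

```python
def effective_strength(requested: float, is_inpainting_model: bool) -> float:
    """Pick the img2img strength to apply for a generation request.

    Standard models draw blindly on top of the latent-encoded image, so a
    partial strength limits how much they may change it.  The runwayML
    inpainting model conditions on the unmasked pixels directly, so a
    partial strength only degrades the result; use the full 1.0 instead.
    """
    return 1.0 if is_inpainting_model else requested
```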

@Any-Winter-4079 (Contributor) commented Oct 25, 2022

There are so many variations of parameters that it is hard to do an apples-to-apples comparison.

The above results are using the same parameters, including seeds, strength, source images, masks, etc. Everything is the same except for the model.

I'll try with strength 0.99 and keep testing.


Results with 1.5-inpainting:

img2img with clipseg:

Strength (-f) is ignored with clipseg. Using -f0.99 and -f0.01.
mirkerr.png
mirkerr

"blonde hair" -W512 -H512 -C7.5 -S3031912972 -I mirkerr.png -tm "hair" -f0.01
Screenshot 2022-10-26 at 00 26 54

"blonde hair" -W512 -H512 -C7.5 -S3031912972 -I mirkerr.png -tm "hair" -f0.99
Screenshot 2022-10-26 at 00 28 33

img2img without clipseg:
Mask:
mirkerr_mask

For starters, removing the hair seems to work much better than using clipseg (the hair is more blond!). But other than that, the strength value is ignored, as the results are the same.

One conclusion that I've reached is that the strength option (-f) only makes things worse for the inpainting model. I am thinking of ignoring its value and using 1.0.

Does -f affect the final result then, @lstein ? I would've thought we ignore -f given the results.
"miranda kerr with blonde hair" -W512 -H512 -C7.5 -S3031912972 -I mirkerr.png -M mirkerr_mask.png -f0.01 (but also -f0.99 and -f0.75)
image

@Any-Winter-4079 (Contributor) commented Oct 25, 2022

Oh, by the way: when using inpaint_st.py the other day, I could load a Dreambooth .ckpt as the main model (in my case, to define the style) and use 1.5 inpainting on top. So it's like using 2 models at the same time.

I'm not sure how to do this now that 1.5 inpainting is the main model.
I mean, I would've sworn I was using both simultaneously. I'll check again just to be sure.
...

I'll check again just to be sure.

Yeah, I'm not sure how I could have done it. It was probably a Dreambooth .ckpt + regular img2img on that model.

@lstein (Collaborator, Author) commented Oct 26, 2022

I've done img2img slightly wrong. I'm going to remove some code that over-constrains the image. This should give us more variability.

The strength parameter is inappropriate for this model and will be disabled. Sorry for the confusion, but it’s taken me some time to realize how the pieces fit together.

@Any-Winter-4079 (Contributor) commented Oct 26, 2022

1.5-inpaint after 906dafe
Original:
Screenshot 2022-10-26 at 12 44 24

Using mask:
"miranda kerr with blonde hair" -W512 -H512 -C7.5 -S3031912972 -I mirkerr.png -M mirkerr_mask.png
Screenshot 2022-10-26 at 12 36 22
vs. yesterday
Screenshot 2022-10-26 at 12 39 22

Using clipseg:
"blonde hair" -W512 -H512 -C7.5 -S3031912972 -I mirkerr.png -tm "hair"
Screenshot 2022-10-26 at 12 39 52
vs. yesterday
Screenshot 2022-10-26 at 12 40 40

By the way, I don't see the clipseg (-tm) argument in the output:
"blonde hair" -s 50 -S 3031912972 -W 512 -H 512 -C 7.5 --fnformat {prefix}.{seed}.png -I mirkerr.png -A k_lms -f 0.75

@lstein (Collaborator, Author) commented Oct 26, 2022

This PR will break both --hires and --embiggen, as they reimplement some low-level image generation steps that don't work with the new model. If you try to use these switches they will be ignored.

I will fix these before marking the PR as ready for merging.

@lstein (Collaborator, Author) commented Oct 26, 2022

Not much of a difference between yesterday and today.

What happens to your test image when you raise -C modestly to anything between 10.0 and 15.0?

@Any-Winter-4079 (Contributor) commented Oct 26, 2022

Using 1.5 inpaint and 906dafe
-C 7.5
Screenshot 2022-10-26 at 12 36 22
-C 15
Screenshot 2022-10-26 at 13 06 36

@Any-Winter-4079 (Contributor) commented Oct 26, 2022

I barely see any change between yesterday and today (left side of image, hair is a bit different)
image
For the Miranda Kerr example, I prefer yesterday's (it stays closer to the original image in hair length, etc.), but it's hard to draw meaningful conclusions. I'll test a bit more.

@Any-Winter-4079 (Contributor) commented Oct 26, 2022

With the latest version, single word prompts seem to work.
"macaw" -s 50 -S 3096140878 -W 512 -H 512 -C 7.5 --fnformat {prefix}.{seed}.png -I 197632255-563dc05b-58cf-498b-88c2-8e2ae274b2a4.png -A k_lms -M 197632285-8d7f0f15-0c2d-4adb-8bb5-b9e4914a1de3.png -f 0.1
Screenshot 2022-10-26 at 13 18 26
Also using the same seed we used in inpaint_st.py (3).
Screenshot 2022-10-26 at 13 23 25

vs. what happened when we ran inpaint_st.py (painted background)


With yesterday's version
"macaw" -s 50 -S 3096140878 -W 512 -H 512 -C 7.5 --fnformat {prefix}.{seed}.png -I 197632255-563dc05b-58cf-498b-88c2-8e2ae274b2a4.png -A k_lms -M 197632285-8d7f0f15-0c2d-4adb-8bb5-b9e4914a1de3.png -f 0.1
Screenshot 2022-10-26 at 13 30 21
And using the same seed we used in inpaint_st.py (3).
Screenshot 2022-10-26 at 13 31 50

I'm confused now. With yesterday's version, sometimes it works, sometimes it doesn't. So I guess with today's version it did work here, but of course we can't guarantee it always works.

@Any-Winter-4079 (Contributor) commented Oct 26, 2022

I'm doing a small experiment with 20 images comparing yesterday's vs. today's code version, using single-word prompts and img2img.

Update:
Results
"macaw" -s 10 -W 512 -H 512 -C 7.5 --fnformat {prefix}.{seed}.png -I 197632255-563dc05b-58cf-498b-88c2-8e2ae274b2a4.png -A k_lms -M 197632285-8d7f0f15-0c2d-4adb-8bb5-b9e4914a1de3.png -n 20
Used 10 steps to speed it up, but we can see the macaws forming.

Yesterday's

image

Today's

image

It looks like it happens to both code versions (about 10% of the time).

This PR will break both --hires and --embiggen, as they reimplement some low-level image generation steps that don't work with the new model. If you try to use these switches they will be ignored.

About this, I've never even set up embiggen so I couldn't tell. This might be a good excuse to do so.

- change default model back to 1.4
- remove --fnformat from canonicalized dream prompt arguments
  (not needed for image reproducibility)
- add -tm to canonicalized dream prompt arguments
  (definitely needed for image reproducibility)
lstein added a commit that referenced this pull request Oct 27, 2022
This was a difficult merge because both PR #1108 and #1243 made
changes to obscure parts of the diffusion code.

- prompt weighting, merging and cross-attention working
  - cross-attention does not work with runwayML inpainting
    model, but weighting and merging are tested and working
- CLI command parsing code rewritten in order to get embedded
  quotes right
- --hires now works with runwayML inpainting
- --embiggen does not work with runwayML and will give an error
- Added an --invert option to invert masks applied to inpainting
- Updated documentation
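
A hedged usage sketch of the new --invert option (the image and mask file names here are hypothetical); it inverts the supplied mask before inpainting:

"man with a crow on shoulder" -I photo-of-man.png -M shoulder_mask.png --invert
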
@lstein merged commit 9b71597 into development Oct 27, 2022
@lstein deleted the inpaint-model branch October 27, 2022 06:06