Dreambooth support #995

Closed · Any-Winter-4079 opened this issue Oct 8, 2022 · 13 comments
Labels: dreambooth, enhancement

Comments

@Any-Winter-4079 (Contributor)

This issue is to discuss Dreambooth support: whether to fully integrate it in this repo (training and inference), or to train via a third party (for example, Colab) and do inference in this repo.
Discussion and comparison with regular Textual Inversion is also encouraged.

Any-Winter-4079 added the enhancement label on Oct 8, 2022
@hipsterusername (Member)

+1

Will also note there have been discussions of making it easy to generate (or import) new concepts from the WebUI. This should support both textual inversion & dreambooth, and plans include having a "library" of these for ongoing use.

I think, given the purpose and intent of this repo, full integration should be the aim.

@Any-Winter-4079 (Contributor, Author)

Interesting comparison:
https://www.reddit.com/r/StableDiffusion/comments/xjlv19/comparison_of_dreambooth_and_textual_inversion/

What I've noticed:

Textual inversion:

- Excels at style transfer, e.g. "elephant in the style of Marsey".
- May benefit from more images: my run with 74 images performed better than the one with 3.
- Best results (both in terms of style transfer and character preservation) at ~25,000 steps.

DreamBooth (model download):

- Far, far better for my use case. The character is more editable and the composition improves. It doesn't match the art style quite as well, though.
- 3 images worked better than 72.
- Works extremely well with cross-attention prompt2prompt (the "img2img alternative test" script in automatic1111's UI).
- 1,000 steps (~30 min on an A6000) is sufficient for good results.
- Worth mentioning: it's usable with deforum for animations.

Combining the two doesn't seem to work, unfortunately. The next step might be either to directly fine-tune the network itself and apply one of these techniques afterwards, or possibly to train the classifier.

@Any-Winter-4079 (Contributor, Author)

My best success at Textual Inversion was with 3-5 images, so I'll try with many more.
Also, it seems Dreambooth may better preserve the character.
Results from post above:
Dreambooth
[screenshot]
Textual Inversion
[screenshot]

@tildebyte (Contributor)

If you're me (or like me), and you're wondering about the difference between the two: AFAIU (big caveat), TI can't do things like taking a fine-tuned (Dreambooth) model which is capable of creating these

[image]

and creating this

[image]

which is just utterly amazing.

@Any-Winter-4079 (Contributor, Author)

@tildebyte I tried this Colab (the one in the reddit post you shared), but on the free tier you have to pass --use_8bit_adam and forgo full precision, which seemingly affects quality (I trained with images of myself and it performed worse than TI).

It was a few days ago, so I may try again, just to see if they've introduced any other improvements.
Did you have any success with that Colab?

@tildebyte (Contributor) commented Oct 9, 2022

Colab doesn't have a free tier anymore (AFAIU); I haven't touched it since the change.

Oh, nvm. There's still a free tier, but they've added a "pay-as-you-go" tier.

TL;DR - I haven't done anything in Colab since SD dropped 😁

@Any-Winter-4079 (Contributor, Author)

I think there is potential in TI, but I want to get Dreambooth to work and compare them.

@tildebyte Here are some TI results, for comparison with what you refer to in #995 (comment)
Original images: #517 (comment). I used 5 of the 7 images to train.
After training TI with this repo (changing to num_vectors_per_token: 6 in v1-inference.yaml), these are some of the results I can achieve.
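For anyone wanting to reproduce this, the training run looked roughly like the textual-inversion command from this repo's docs at the time (a sketch; the run name and data path below are placeholders):

```bash
# Sketch of a textual-inversion training run with this repo's main.py
# (per the repo's textual-inversion docs at the time; the run name -n
# and --data_root path are placeholders).
# Note: num_vectors_per_token in the config must match what you use at
# inference, as mentioned above.
python main.py \
  --base ./configs/stable-diffusion/v1-finetune.yaml \
  -t \
  --actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
  -n my_character \
  --gpus 0, \
  --data_root ./path/to/training_images
```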

Workflow 1

Use txt2img playing with prompt weighting (N)

txt2img
"a painting of * :N on the beach in the style of van gogh" -s 50 -S 989419747 -W 512 -H 512 -C 7.5 -A k_lms for several N values.
[image]

Workflow 2

Obtain a concept with txt2img and swap it with * using img2img playing with different -f (strength) values

txt2img
"close-up of a man low poly" -s 50 -S 3319463269 -W 512 -H 512 -C 7.5 -A k_lms
[screenshot]
img2img
"close-up of * low poly" -I outputs/preflight/000310.3319463269.png -S3319463269 for several -f values
[image]

txt2img
"Funky pop african man face figurine, product studio shot, on a white background, diffused lighting, centered" -s 50 -S 3231549968 -W 512 -H 512 -C 7.5 -A k_lms
[screenshot]
inspired by https://publicprompts.art/funky-pop/
img2img
"Funky pop * face figurine, product studio shot, on a white background, diffused lighting, centered" -I outputs/preflight/000384.3231549968.png -S3231549968 for several -f values
[image]
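If you want to sweep -f values without retyping the prompt, one option (assuming the CLI's --from_file batch mode; paths and seed reuse the first example above) is a sketch like:

```bash
# Sketch: generate a prompt file sweeping img2img strength (-f) values,
# then feed it to the CLI in batch mode. Assumes dream.py's --from_file
# option; the path and seed come from the example above.
for f in 0.3 0.4 0.5 0.6 0.7; do
  echo "\"close-up of * low poly\" -I outputs/preflight/000310.3319463269.png -S 3319463269 -f ${f}"
done > strength_sweep.txt
python scripts/dream.py --from_file strength_sweep.txt
```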

(By the way, the training images in #517 (comment) were personally created by the author of the comment, and the character will appear in some project -which I find pretty cool- so you can use them to test TI, but don't share them as a training set.)

I've also managed to train a couple of other concepts (hamburger, dog cartoon, and myself), but the results weren't as good. For people, I wonder if applying some 'beautify' filter to the training set would help, to make the face look closer to a 3D character's -very smooth, so the model doesn't have to learn patterns inside the face (e.g. cheek colors, smiles, wrinkles) and can focus on learning the overall face shape.

@Any-Winter-4079 (Contributor, Author)

All in all, I tend to prefer the "txt2img then img2img" workflow.

@Any-Winter-4079 (Contributor, Author) commented Oct 9, 2022

@tildebyte But yeah, the word out there seems to be that Dreambooth is better or easier to use. For example, this is a quick attempt (using txt2img && img2img) at something akin to your pictures above, and you can see how the style is starting to get lost and colors start to appear.
[image]
(To be fair, specifying a male face in txt2img would probably have helped.)

It may still be possible to do these things with Textual Inversion, but it may require a more complex workflow, whereas Dreambooth may be easier to use.

Still, if someone has had success with Dreambooth on free Colab, you are encouraged to share it here! The more info we have (which Colab, obviously, but also number of training images, lighting, closeness...), the better for us to implement it in the repo and document it.

Update:
Quick example with a male face (same problem):
[image]

@bbecausereasonss

Dreambooth is INCREDIBLE. No contest.

@Any-Winter-4079 (Contributor, Author) commented Oct 17, 2022

Update: I've tried https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb a bunch of times and I can't get it to work well. It does run, and the result has some resemblance, but that's all.

The last attempt was with 50 training images and 150 regularization images, using --max_train_steps=2000, plus --use_8bit_adam and --gradient_checkpointing to fit the free Colab.
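For reference, the invocation was roughly along these lines (a sketch of the diffusers train_dreambooth.py call that the notebook wraps; the model name, prompts, and directories below are placeholders):

```bash
# Sketch of the DreamBooth training call (the ShivamShrirao notebook wraps
# diffusers' train_dreambooth.py). Model name, prompts, and paths are
# placeholders; the step count and memory-saving flags are the ones
# mentioned above for the free Colab.
# instance_data_dir: the 50 training images; class_data_dir: the 150 reg images.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="./instance_images" \
  --class_data_dir="./reg_images" \
  --output_dir="./dreambooth_out" \
  --instance_prompt="photo of sks person" \
  --class_prompt="photo of a person" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --num_class_images=150 \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --max_train_steps=2000 \
  --use_8bit_adam \
  --gradient_checkpointing \
  --mixed_precision="fp16"
```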

Millu closed this as not planned on Nov 9, 2023