
Add training example for DreamBooth. #554

Merged
merged 35 commits into from
Sep 27, 2022

Conversation

Victarry
Contributor

Add DreamBooth training example.

One question is how to specify the identifier [V] of input prompt to bind with the concept of subject.
The original paper says using random sampling of rare-tokens to generate the identifier. Should we include this logic in the training script?
Currently I use sks as in https://github.com/XavierXiao/Dreambooth-Stable-Diffusion.
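For readers following along: the identifier is just a rare token string spliced into the training prompts. A minimal sketch of how the instance and class prompts are typically formed for DreamBooth (the function name here is illustrative, not the script's actual API):

```python
def build_prompts(identifier: str, class_name: str):
    """Build the DreamBooth prompt pair for a given identifier and class.

    The instance prompt binds the rare identifier to the subject; the class
    prompt (no identifier) is used to generate prior-preservation images.
    """
    instance_prompt = f"a photo of {identifier} {class_name}"
    class_prompt = f"a photo of {class_name}"
    return instance_prompt, class_prompt

# With the "sks" identifier used in this PR:
print(build_prompts("sks", "dog"))
# ('a photo of sks dog', 'a photo of dog')
```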

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 18, 2022

The documentation is not available anymore as the PR was closed or merged.

@ghpkishore

ghpkishore commented Sep 19, 2022

Hey @Victarry, I had asked the author of the paper and he replied: "The special token we create is different from Gal et al. — we create a rare token and then finetune the model instead of the text embedding."

Regarding incorporating that logic into this script, my guess is that it's not required. This is the response @patrickvonplaten gave when I asked a similar question:

According to our philosophy: https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples we don't want to provide "one-script-fits-it-all" examples, but rather relatively simple scripts that one can easily tweak. In your case I'd highly recommend going into the example code to make the model trainable as well.

Contributor

@patil-suraj left a comment


Very cool, @Victarry!
Great start — the script is looking good, but we need to address a few things before merging.
I left some comments below. More specifically:

  • We need to handle the class image generation in multi-gpu setting. I can help with this.
  • Wrap the text_encoder and vae in torch.no_grad as we don't train them.
  • Check if concatenating the batch for prior preservation loss causes issues in low memory GPUs. I can help here.
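The second bullet can be sketched like this (toy stand-ins for the frozen models; the real script wraps the actual CLIP text encoder and VAE):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the frozen text_encoder / vae and the trainable unet.
text_encoder = nn.Linear(8, 8)
vae = nn.Linear(8, 8)
trainable_unet = nn.Linear(8, 8)

x = torch.randn(2, 8)

# Frozen models run under no_grad: no activations are kept for backward,
# which saves memory and guarantees their weights receive no gradients.
with torch.no_grad():
    emb = text_encoder(x)
    latents = vae(x)

# Only the trainable model runs with grad tracking enabled.
out = trainable_unet(emb + latents)
out.sum().backward()

assert text_encoder.weight.grad is None       # frozen: no gradient
assert trainable_unet.weight.grad is not None  # trainable: gradient flows
```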

Apart from this, we can add a helper script to do rare token detection, that will be useful. I will look into it.

Let me know if you have any questions, thanks a lot!
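On the third bullet, prior preservation concatenates the instance and class batches into one forward pass, then splits the prediction to weight the two losses separately. A toy sketch of that loss computation (random tensors stand in for the UNet's noise predictions; not the script's exact code):

```python
import torch
import torch.nn.functional as F

prior_loss_weight = 1.0  # matches the --prior_loss_weight value used in this thread

# Stand-ins for the UNet's predictions on a concatenated batch:
# first half = instance images, second half = class (prior) images.
model_pred = torch.randn(4, 3, 8, 8)
target = torch.randn(4, 3, 8, 8)

# Split the concatenated batch back into its two halves.
pred_instance, pred_prior = model_pred.chunk(2, dim=0)
target_instance, target_prior = target.chunk(2, dim=0)

# The instance loss binds the identifier to the subject; the prior loss
# keeps the class distribution from drifting (the paper's
# prior-preservation term).
instance_loss = F.mse_loss(pred_instance, target_instance)
prior_loss = F.mse_loss(pred_prior, target_prior)
loss = instance_loss + prior_loss_weight * prior_loss
```

Concatenating the two halves keeps it to one forward pass per step, at the cost of a larger effective batch — which is exactly why it can be an issue on low-memory GPUs.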

Contributor

@patil-suraj left a comment


Very cool, the PR looks good! Will run it on both single and multi-gpu to verify and then it should be good to merge. Thanks a lot for working on this.

@patil-suraj patil-suraj merged commit 3b747de into huggingface:main Sep 27, 2022
@ShivamShrirao

ShivamShrirao commented Sep 27, 2022

Wow, using the 8-bit Adam optimizer from bitsandbytes along with xformers reduces the memory usage to 12.5 GB.
Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb
Code: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/
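The optimizer swap is essentially a one-line change in the training script. A hedged sketch of the selection logic (bitsandbytes' `AdamW8bit` is a drop-in for torch's `AdamW`; the fallback keeps the script usable without the library, and the helper name here is illustrative):

```python
import torch

def create_optimizer(params, lr: float = 5e-6, use_8bit_adam: bool = True):
    """Return bitsandbytes' 8-bit AdamW when requested and available,
    falling back to the standard torch AdamW otherwise."""
    if use_8bit_adam:
        try:
            import bitsandbytes as bnb
            return bnb.optim.AdamW8bit(params, lr=lr)
        except ImportError:
            print("bitsandbytes not installed; falling back to torch AdamW")
    return torch.optim.AdamW(params, lr=lr)

model = torch.nn.Linear(4, 4)
opt = create_optimizer(model.parameters())
```

The 8-bit optimizer stores its state (momentum/variance) in quantized form, which is where most of the memory saving comes from.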

@chavinlo
Contributor

chavinlo commented Sep 27, 2022

Wow, using the 8-bit Adam optimizer from bitsandbytes along with xformers reduces the memory usage to 12.5 GB. Code: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/

I can confirm it even runs on the Colab free tier, T4 GPU.
nvidia-smi (before training): [screenshot]
training, note the peak in VRAM: [screenshot]

Edit: it failed with an AttributeError on enable_gradient_checkpointing (likely due to the diffusers version it gets executed with), but at least training works:

Traceback (most recent call last):
  File "train_dreambooth.py", line 606, in <module>
    main()
  File "train_dreambooth.py", line 408, in main
    unet.enable_gradient_checkpointing()
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1208, in __getattr__
    type(self).__name__, name))
AttributeError: 'UNet2DConditionModel' object has no attribute 'enable_gradient_checkpointing'
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=dog', '--class_data_dir=dog', '--output_dir=model', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=a photo of sks dog', '--class_prompt=a photo of dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=2', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=800']' returned non-zero exit status 1.
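The AttributeError above just means the installed diffusers predates gradient-checkpointing support on UNet2DConditionModel; upgrading diffusers is the real fix. A defensive guard avoids the hard crash, sketched here with a stub object standing in for the model:

```python
class StubUNet:
    """Stand-in for a UNet2DConditionModel from an older diffusers release
    that lacks enable_gradient_checkpointing()."""
    pass

unet = StubUNet()

# Guard the call so the script warns instead of crashing with AttributeError
# when the method is missing on the installed diffusers version.
if hasattr(unet, "enable_gradient_checkpointing"):
    unet.enable_gradient_checkpointing()
else:
    print("enable_gradient_checkpointing unavailable; please upgrade diffusers")
```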

@Thomas-MMJ

Hmm, I'd have thought you'd mention my pull request on this...

ShivamShrirao#1

@ShivamShrirao

@Thomas-MMJ oh hey, sorry I didn't see your pull request. I had it in mind to try it, just needed to sleep lol.

@vakker

vakker commented Sep 27, 2022

This is really great; I played around with it a bit.
However, it produces quite low-quality results with the default settings.
E.g. I tried to reproduce the results from the paper with this dog:
[reference photo of the dog]

I used all 5 reference images from here.

The output for "A sleeping sks dog" looks like:
[montage of samples]

What made it work well for you?

Edit: better image montage

@rjadr

rjadr commented Sep 28, 2022

Same here. Basically, only "A photo of sks" delivers reasonable results. Any attempt to enhance quality, e.g. "A photo of sks, trending on artstation", already changes the semantics so much that sks is unidentifiable.

@n00mkrad

Same here. Basically, only "A photo of sks" delivers reasonable results. Any attempt to enhance quality, e.g. "A photo of sks, trending on artstation", already changes the semantics so much that sks is unidentifiable.

Same problem, my trained subject does not appear in 80% of results if I alter the prompt just a tiny bit.

@vakker

vakker commented Sep 28, 2022

For the record, the correct prompt should be A photo of sks dog.
For me, the results for that are (20 random samples; the black pics are flagged as NSFW):

[montage of samples]

@1blackbar

Sounds like overfitting, which DreamBooth was made to combat.

@patil-suraj
Contributor

Hi! If you see any issue in the script, please open an issue; for general discussions like this, feel free to join the Discord: https://discord.gg/G7tWnz98XR

@jslegers

jslegers commented Sep 29, 2022

Same problem, my trained subject does not appear in 80% of results if I alter the prompt just a tiny bit.

Same here as well...

I uploaded between 50 and 80 pics of myself, with 800 training steps.

A very basic prompt like photo of sks guy or Detailed portrait of sks guy produces somewhat reasonable results. It's kind of hit-and-miss, with some results looking a lot like me, some not at all, and most somewhere in between. When I use anything beyond such basic prompts, however, the results don't look even remotely like me a single time.

It seems there's some missing link between the CompVis/stable-diffusion-v1-4 model that was used to do the training and the model that was produced...

@patil-suraj
Contributor

Hi @jslegers and @n00mkrad, could you please open an issue? Would be happy to take a look.

@jslegers

jslegers commented Oct 2, 2022

Hi @jslegers and @n00mkrad, could you please open an issue? Would be happy to take a look.

The issue may just be a poor choice of parameters.

Are you aware of any best practices regarding the number of training steps, the number of class images generated, the choice of class name & class prompt, the choice of concept name, etc.? I suspect my issues are more a matter of this than an issue with the actual code...

@Duemellon

Are there any webui for Dreambooth yet? I mean, to use it to train.

@n00mkrad

n00mkrad commented Oct 2, 2022

Are there any webui for Dreambooth yet? I mean, to use it to train.

Why would you want a GUI for training?

@jd-3d

jd-3d commented Oct 2, 2022

Are there any webui for Dreambooth yet? I mean, to use it to train.

Why would you want a GUI for training?

I certainly would love one, as I hate using the command prompt and there are a lot of steps to using DreamBooth. The GUI could even handle resizing of input photos to make things easier, and it could manage all the custom trained models.

@jhsu888

jhsu888 commented Oct 2, 2022

Just wanted to say I've had a chance to try Shivam's Colab notebook, and it worked great! I took 7 photos of myself around the house, from different angles and in different lighting; I even changed my shirt. I was really surprised how fast the training was, ~10 min on a V100.

For anyone having trouble, I would suggest changing the token to your own name, or something that evokes what your subject is; I simply used my "firstnamelastname" as the token. And don't forget to change the name of the destination folder as well. I think some of the problems people are having come from using the default "sks", which is actually a term for a type of rifle.

I had tried some initial tests using another set of random letters as the token, because I thought I would want something totally unique, but I feel like using my name actually gave the model more context to draw from other faces associated with my name and fill in the blanks.

I also didn't use the class at all, even though JoePenna's version says to use it, and I thought my results were very strong. Everything else was default for me.

My big request is having a .ckpt output so I can use the model in other notebooks like Deforum and WarpFusion. I've heard rumor it's being worked on, so I just want to add my support for the idea.

Another bonus would be a pruning function to compress the model further so it takes up less storage space; I have no idea if something like that is already being implemented. JoePenna's version can be compressed to 2 GB, but I've heard his notebook takes more like ~1 hr to train. It would be great to have the best of both worlds.

Thanks for developing this, it's pretty amazing!

@jslegers

jslegers commented Oct 3, 2022

My big request is having a .ckpt output so I can use the model in other notebooks like Deforum and WarpFusion. I've heard rumor it's being worked on, so I just want to add my support for the idea.

The first converter scripts are popping up already. See AUTOMATIC1111/stable-diffusion-webui#1429 (comment)

I haven't tried any yet, but they look promising...

@jhsu888

jhsu888 commented Oct 3, 2022

My big request is having a .ckpt output so I can use the model in other notebooks like Deforum and WarpFusion. I've heard rumor it's being worked on, so I just want to add my support for the idea.

The first converter scripts are popping up already. See AUTOMATIC1111/stable-diffusion-webui#1429 (comment)

I haven't tried any yet, but they look promising...

Awesome, I'll check them out. Thanks!

@Duemellon

Are there any webui for Dreambooth yet? I mean, to use it to train.

Why would you want a GUI for training?

#1 - Command-line entries are archaic. We moved past those for the typical user in the late '80s. Syntax errors, bad prompt "grammar", etc., just inhibit wide use.

#2 - There are a lot of features that can be automated that way. As mentioned by jd-3d, there are image conversions, data validations, runtime estimations, and batching you can do that way.

#3 - The more people who can use this, the more gets created, the quicker everything gets cataloged, and the sooner it's all at your fingertips.

It's a win all around.

@affableroots

To users claiming bad results: I wonder if the DreamBooth example is actually flawed.
I wrote up some findings in #712; maybe someone brighter than me can comment?

@jslegers

jslegers commented Oct 4, 2022

I've been testing ShivamShrirao's fork for several days now, with my own Google Colab notebook to add some sprinkles on top.

For the initial training I just used my full name "johnslegers" as the concept and "man" as the class. Then I tried to retrain the output model with different input pics, different class pics & different prompt settings, to test whether it's possible to finetune an already finetuned model further for the same concept.

Using this strategy, I've managed to generate some pretty decent renders of my younger self...

Some of the renders generated

[images]

Actual photos of me used as input for the training process

[images]

@patil-suraj
Contributor

Thanks a lot for sharing this @jslegers !

@vakker

vakker commented Oct 18, 2022

There's this site for running dreambooth: http://fine-tune-sd.com/

prathikr pushed a commit to prathikr/diffusers that referenced this pull request Oct 26, 2022
* Add training example for DreamBooth.

* Fix bugs.

* Update readme and default hyperparameters.

* Reformatting code with black.

* Update for multi-gpu trianing.

* Apply suggestions from code review

* improgve sampling

* fix autocast

* improve sampling more

* fix saving

* actuallu fix saving

* fix saving

* improve dataset

* fix collate fun

* fix collate_fn

* fix collate fn

* fix key name

* fix dataset

* fix collate fn

* concat batch in collate fn

* add grad ckpt

* add option for 8bit adam

* do two forward passes for prior preservation

* Revert "do two forward passes for prior preservation"

This reverts commit 661ca46.

* add option for prior_loss_weight

* add option for clip grad norm

* add more comments

* update readme

* update readme

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add docstr for dataset

* update the saving logic

* Update examples/dreambooth/README.md

* remove unused imports

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@rmac85

rmac85 commented Nov 25, 2022

The inference cell on this Colab is broken:

/usr/local/lib/python3.7/dist-packages/diffusers/utils/deprecation_utils.py:35: FutureWarning: The configuration file of this scheduler: DDIMScheduler {
"_class_name": "DDIMScheduler",
"_diffusers_version": "0.9.0.dev0",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": false,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"set_alpha_to_one": false,
"steps_offset": 0,
"trained_betas": null
}
is outdated. steps_offset should be set to 1 instead of 0. Please make sure to update the config accordingly as leaving steps_offset might led to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
warnings.warn(warning + message, FutureWarning)
/usr/local/lib/python3.7/dist-packages/diffusers/utils/deprecation_utils.py:35: FutureWarning: The configuration file of the unet has set the default sample_size to smaller than 64 which seems highly unlikely .If you're checkpoint is a fine-tuned version of any of the following:

  • CompVis/stable-diffusion-v1-4
  • CompVis/stable-diffusion-v1-3
  • CompVis/stable-diffusion-v1-2
  • CompVis/stable-diffusion-v1-1
  • runwayml/stable-diffusion-v1-5
  • runwayml/stable-diffusion-inpainting
    you should change 'sample_size' to 64 in the configuration file. Please make sure to update the config accordingly as leaving sample_size=32 in the config might lead to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the unet/config.json file
    warnings.warn(warning + message, FutureWarning)

AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
      7
      8 scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
----> 9 pipe = StableDiffusionPipeline.from_pretrained(model_path, scheduler=scheduler, safety_checker=None, torch_dtype=torch.float16).to("cuda")
     10
     11 g_cuda = None

2 frames
/usr/local/lib/python3.7/dist-packages/diffusers/pipeline_utils.py in register_modules(self, **kwargs)
    147                 register_dict = {name: (None, None)}
    148             else:
--> 149                 library = module.__module__.split(".")[0]
    150
    151                 # check if the module is a pipeline module

AttributeError: 'list' object has no attribute '__module__'

It was working fine yesterday. I have not yet tried training a new model, but I assume generating samples may cause the same issues.

@patrickvonplaten
Contributor

Hey @rmac85,

Could you please open a new issue? Note that this is a merged PR so we won't look into comments here anymore. If you open a new issue we're more than happy to take a look!


@abhinavsrepository left a comment


DreamBooth is a deep learning-based tool that can be used to personalize existing text-to-image models. It works by fine-tuning a text-to-image model on a few images of a specific subject. This allows the model to learn the unique characteristics of the subject and generate more personalized and realistic images of it.
