
Conversation

patil-suraj
Contributor

@patil-suraj patil-suraj commented Sep 5, 2022

This PR adds fine-tuning script for StableDiffusion

@patil-suraj patil-suraj changed the title from "[WIP] stabel diffusion finetuning" to "[WIP] stable diffusion fine-tuning" Sep 5, 2022
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 5, 2022

The documentation is not available anymore as the PR was closed or merged.

@minimaxir

Is it a good idea to fine-tune the entire unet? It seems like there is a risk of catastrophic forgetting, particularly if the input data is small and/or uniform. (This was an issue when I fine-tuned ruDALL-E on Pokemon: training for too long made it ignore any specificity in the prompts.)

It might be worth it to allow parametric freezing of embeddings and/or early layers of the unet as CLI args down the line, depending on how well this finetuning works.
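
For anyone who wants to experiment with that before a CLI flag exists, here is a minimal sketch of manual freezing against diffusers' UNet2DConditionModel. The submodule names (conv_in, time_embedding, down_blocks) match the current diffusers implementation, and num_frozen_blocks is just an illustrative value, not an argument of this script:

import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

# Freeze the input conv, the time embedding and the first few down blocks;
# only the remaining parameters will receive gradients during fine-tuning.
num_frozen_blocks = 2  # illustrative value
for module in (unet.conv_in, unet.time_embedding, *unet.down_blocks[:num_frozen_blocks]):
    module.requires_grad_(False)

# Build the optimizer only from the parameters that are still trainable.
trainable_params = [p for p in unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)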

@patil-suraj
Contributor Author

patil-suraj commented Sep 7, 2022

Hey @minimaxir! That's a good point. We are doing some initial runs now, and we could indeed allow parametric freezing of certain layers if we run into these issues.
Also, some community members have tried fine-tuning the full model and observed that the model can still retain its ‘general’ knowledge. cc @justinpinkney

@justinpinkney

@minimaxir After fine-tuning for too long, catastrophic forgetting definitely does happen, but it takes long enough that you can represent the target domain well before forgetting everything. I also think borrowing some of the training tricks from DreamBooth is probably a good way to try to retain the generality.
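
For context, the main DreamBooth trick alluded to here is prior preservation: in addition to the target-domain images, each batch also contains "class" images sampled from the original, un-finetuned model, and the loss on those regularizes the network against forgetting. A rough sketch of the combined loss, assuming the batch is stacked as [instance examples, class examples] (the function and argument names are illustrative, not code from this PR):

import torch
import torch.nn.functional as F

def prior_preservation_loss(model_pred, target, prior_weight=1.0):
    # Split the batch back into the instance half and the prior ("class") half.
    instance_pred, prior_pred = torch.chunk(model_pred, 2, dim=0)
    instance_target, prior_target = torch.chunk(target, 2, dim=0)

    instance_loss = F.mse_loss(instance_pred.float(), instance_target.float(), reduction="mean")
    prior_loss = F.mse_loss(prior_pred.float(), prior_target.float(), reduction="mean")
    return instance_loss + prior_weight * prior_loss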

@ezhang7423

Hi, could we get a sample command that we could use to try this script out?

@minimaxir

@justinpinkney thanks for the info! Will be curious to see how things pan out (and of course thanks to the well-commented code it's pretty easy for the user to manually freeze layers if needed)

@ghost

ghost commented Sep 11, 2022

I am eager to do some fine-tuning, but I guess this is still non-functional/in early stages? I am only able to produce black images, even after bypassing the safety filters. I will have to do a lot of catching up before I can meaningfully contribute, but I'll keep an eye on progress here!

if global_step >= args.max_train_steps:
    break

# Create the pipeline using the trained modules and save it.

Does the default configuration save a checkpoint on any type of interval in case of interruptions?

I looked through the script and I don't see any config for it, but I'm not familiar with the accelerate training library.

Contributor Author


It's not added yet, but we will add support for it in a follow-up PR.
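
Until that follow-up lands, a rough sketch of what interval checkpointing could look like with accelerate; checkpointing_steps and the call site are assumptions, not something this script currently defines:

import os
from accelerate import Accelerator

def maybe_save_checkpoint(accelerator: Accelerator, output_dir: str,
                          global_step: int, checkpointing_steps: int = 500):
    # Save the full training state (models, optimizer, RNG) every N optimizer steps.
    if global_step % checkpointing_steps != 0:
        return
    if accelerator.is_main_process:
        save_path = os.path.join(output_dir, f"checkpoint-{global_step}")
        accelerator.save_state(save_path)

Called right after the optimizer step / global_step increment, this would let an interrupted run resume later via accelerator.load_state(save_path).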

@patrickvonplaten
Contributor

This is good to merge for me. @anton-l @pcuenca do you want to take a final look?

Member

@pcuenca pcuenca left a comment


Looks good to me, I just have some questions about the types of datasets we support.

@patil-suraj patil-suraj merged commit 66a5279 into main Oct 11, 2022
@patil-suraj patil-suraj deleted the finetune-txt2img branch October 11, 2022 17:03
prathikr pushed a commit to prathikr/diffusers that referenced this pull request Oct 26, 2022
* begin text2image script

* loading the datasets, preprocessing & transforms

* handle input features correctly

* add gradient checkpointing support

* fix output names

* run unet in train mode not text encoder

* use no_grad instead of freezing params

* default max steps None

* pad to longest

* don't pad when tokenizing

* fix encode on multi gpu

* fix stupid bug

* add random flip

* add ema

* fix ema

* put ema on cpu

* improve EMA model

* contiguous_format

* don't wrap vae and text encoder in accelerate

* remove no_grad

* use randn_like

* fix resize

* improve few things

* log epoch loss

* set log level

* don't log each step

* remove max_length from collate

* style

* add report_to option

* make scale_lr false by default

* add grad clipping

* add an option to use 8bit adam

* fix logging in multi-gpu, log every step

* more comments

* remove eval for now

* address review comments

* add requirements file

* begin readme

* begin readme

* fix typo

* fix push to hub

* populate readme

* update readme

* remove use_auth_token from the script

* address some review comments

* better mixed precision support

* remove redundant to

* create ema model early

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* better description for train_data_dir

* add diffusers in requirements

* update dataset_name_mapping

* update readme

* add inference example

Co-authored-by: anton-l <anton@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
@j-min

j-min commented Nov 30, 2022

Hi, thanks for adding the fine-tuning script!
Could you please advise how I should edit the script to jointly fine-tune both the unet and the text encoder? I'm not familiar with the accelerate package.
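
Not an official answer, but roughly the changes involved are: keep the text encoder in train mode instead of freezing it, give its parameters to the optimizer, and let accelerator.prepare wrap it alongside the unet so gradients sync across GPUs. A minimal sketch, where unet, text_encoder, args, accelerator, train_dataloader and lr_scheduler are the objects the script already builds:

import itertools
import torch

# Train the text encoder together with the unet instead of freezing it.
unet.train()
text_encoder.train()

optimizer = torch.optim.AdamW(
    itertools.chain(unet.parameters(), text_encoder.parameters()),
    lr=args.learning_rate,
)

# Let accelerate wrap both models so they are handled correctly on multi-GPU.
unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    unet, text_encoder, optimizer, train_dataloader, lr_scheduler
)

You would also need to remove any no_grad / requires_grad_(False) the script applies to the text encoder; a lower learning rate for the text encoder is often used as well.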

@tengshaofeng

tengshaofeng commented Feb 6, 2023

@patil-suraj @HuggingFaceDocBuilderDev @minimaxir @justinpinkney @ezhang7423, thanks for your ideas; I have learned a lot from them. I still have a question: how many steps should I fine-tune the stable diffusion model on 10,000 images? Should I watch the loss and stop training once it stops decreasing? But I don't think a smaller loss necessarily means better results.
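
One practical answer (general advice, not specific to this script): the denoising MSE is noisy and correlates only loosely with sample quality, so rather than watching the loss it is common to periodically generate images from a few fixed validation prompts and stop when the in-domain results stop improving or the generic ones start degrading. A hedged sketch of that check, with a placeholder checkpoint path:

import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline saved from the current fine-tuning checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/saved-checkpoint", torch_dtype=torch.float16
).to("cuda")

validation_prompts = [
    "a photo of a cat",               # generic prompt: checks general knowledge is retained
    "a prompt from your own domain",  # in-domain prompt: checks the new style/content is learned
]

for i, prompt in enumerate(validation_prompts):
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save(f"validation_{i}.png")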

@aihao2000
Contributor

@patil-suraj I'm not sure why model.train() needs to be called at the beginning of each epoch.
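
A general PyTorch note rather than anything specific to this script: model.train() only switches layers such as dropout and normalization into training behaviour, and it matters whenever something in the loop (validation, EMA evaluation, sample generation) has called .eval() in the meantime; calling it again at the top of each epoch is a cheap way to guarantee the model is back in the right mode. A tiny illustration:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))

for epoch in range(2):
    model.train()                           # make sure dropout is active for training
    out_train = model(torch.randn(4, 8))    # dropout randomly zeroes activations here

    model.eval()                            # e.g. switching to eval mode for validation/sampling
    with torch.no_grad():
        out_eval = model(torch.randn(4, 8)) # dropout is a no-op in eval mode
    # Without model.train() at the top of the loop, the next epoch would
    # silently keep running with dropout disabled.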
