
[Examples] Add support for Min-SNR weighting strategy for better convergence #2899

Merged: 27 commits merged into main on Apr 6, 2023

Conversation

@sayakpaul (Member) commented on Mar 30, 2023

[1] introduces the Min-SNR weighting strategy to rebalance the loss when training diffusion models for faster convergence. The authors attribute the slow convergence of diffusion models to conflicting optimization directions across the different timesteps of the noise schedule. So, they introduce a simple way to rebalance the losses of the individual samples.

This PR follows [1] and [2] to incorporate the Min-SNR weighting strategy into the text-to-image fine-tuning script. I believe this rebalancing can be incorporated into other examples too, wherever we fine-tune the diffusion model (i.e., the UNet).

My experimentation results are available here: https://wandb.ai/sayakpaul/text2image-finetune-minsnr

Overall, this strategy yields a noticeably smoother training-loss curve: https://wandb.ai/sayakpaul/text2image-finetune-minsnr/reports/train_loss-23-04-04-08-49-34---VmlldzozOTY3ODQ2

The number of training samples can certainly make the effect of this strategy less pronounced. But overall, I think it's nice to have, as it directly targets the training instabilities of diffusion models.

@patil-suraj if we're okay with this change, I will add a section on it in the README and also in https://huggingface.co/docs/diffusers/training/text2image. Let me know.

References

[1] Paper reference: https://arxiv.org/abs/2303.09556
[2] Code: https://github.com/TiankaiHang/Min-SNR-Diffusion-Training

@HuggingFaceDocBuilderDev commented on Mar 30, 2023

The documentation is not available anymore as the PR was closed or merged.

@patrickvonplaten (Contributor) left a comment:

@patil-suraj can you take a look here?

@sayakpaul (Member Author) replied:

> @patil-suraj can you take a look here?

Actually, I would suggest not to. I am still gathering evidence to see if the PR is worth merging or even reviewing. After I am done, I will update here. Till then, don't worry about it.

@sayakpaul sayakpaul marked this pull request as ready for review April 4, 2023 03:14
return fn


def log_validation(vae, text_encoder, tokenizer, unet, args, accelerator, weight_dtype, epoch):
Contributor:

is this related to this PR title or does it just add logging?

Member Author:

The PR could have done without this utility, but it was important to include here because otherwise it was difficult to validate the effectiveness of the method.

FWIW, I am not a fan of adding unrelated changes to a PR, but this one seemed important.

Contributor:

Ok for me!

@patrickvonplaten (Contributor) commented:

It would be great to stick to the more imperative coding style that we prefer in libraries like diffusers and transformers (https://stackoverflow.com/questions/21895525/python-programming-functional-vs-imperative-code).

I don't think it's very easy to follow when a function returns another function, etc.; this is a bit too JAX/TF-like for me.

@sayakpaul (Member Author) replied:

@patrickvonplaten the latest changes should have addressed all your concerns. PTAL.

@patil-suraj (Contributor) left a comment:

Thanks a lot for adding support for this! The loss curve does look smoother than without Min-SNR weighting. I just left a small comment about v-prediction.

It would also be cool to add this to train_unconditional.py; it would be easy to verify it there since we train from scratch.

examples/text_to_image/train_text_to_image.py (outdated; resolved)
Comment on lines +838 to +846
mse_loss_weights = (
torch.stack([snr, args.snr_gamma * torch.ones_like(timesteps)], dim=1).min(dim=1)[0] / snr
)
# We first calculate the original loss. Then we mean over the non-batch dimensions and
# rebalance the sample-wise losses with their respective loss weights.
# Finally, we take the mean of the rebalanced loss.
loss = F.mse_loss(model_pred.float(), target.float(), reduction="none")
loss = loss.mean(dim=list(range(1, len(loss.shape)))) * mse_loss_weights
loss = loss.mean()
Contributor:

Nice!

Comment on lines +834 to +835
# Compute loss-weights as per Section 3.4 of https://arxiv.org/abs/2303.09556.
# Since we predict the noise instead of x_0, the original formulation is slightly changed.
Contributor:

some models (sd2.1 and above) use v-prediction, does this formulation also work with v-prediction?


@sayakpaul (Member Author) replied:

> It would also be cool to add this to train_unconditional.py; it would be easy to verify it there since we train from scratch.

Will run an experiment and add a PR for that.

Thanks for approving. Now, I will:

add a section on it in the README and also in https://huggingface.co/docs/diffusers/training/text2image.

@sayakpaul (Member Author) commented:

@patil-suraj, when you get a moment, could you review the changes introduced in 96e7254? All of them are related to documentation.

I think then we can merge. @patrickvonplaten maybe you also want to take a look.

}


def log_validation(vae, text_encoder, tokenizer, unet, args, accelerator, weight_dtype, epoch):
Member Author:

Added this method as part of the PR as well. It handles EMA offload and unload properly to ensure that inference is done with the EMA'd checkpoints.

@patil-suraj (Contributor) left a comment:

Thanks a lot for adding the doc. Looks good!

docs/source/en/training/text2image.mdx (outdated; resolved)

We support training with the Min-SNR weighting strategy proposed in [Efficient Diffusion Training via Min-SNR Weighting Strategy](https://arxiv.org/abs/2303.09556) which helps to achieve faster convergence
by rebalancing the loss. In order to use it, one needs to set the `--snr_gamma` argument. The recommended
value when using it is 5.0.
Contributor:

Out of curiosity, where is this value proposed? Is there a rule of thumb when choosing a value for this?

Member Author:

It's reported in the paper. A gamma of 5.0 consistently led to better results in the experiments presented by the authors.


<Tip warning={true}>

Training with Min-SNR weighting strategy is only supported in PyTorch.
Contributor:

For a future PR: it could be cool to add this in JAX as well; it will be useful for the JAX event.

Member Author:

@yiyixuxu could you take a look?

@patrickvonplaten (Contributor) left a comment:

Very cool! Thanks for iterating

@sayakpaul sayakpaul merged commit 2494731 into main Apr 6, 2023
@sayakpaul sayakpaul deleted the feat/better-convergence branch April 6, 2023 13:38
w4ffl35 pushed a commit to w4ffl35/diffusers that referenced this pull request Apr 14, 2023
[Examples] Add support for Min-SNR weighting strategy for better convergence (huggingface#2899)

* improve stable unclip doc.

* feat: support for applying min-snr weighting for faster convergence.

* add: support for validation logging with wandb

* make  not a required arg.

* fix: arg name.

* fix: cli args.

* fix: tracker config.

* fix: loss calculation.

* fix: validation logging.

* fix: unwrap call.

* fix: validation logging.

* fix: internval.

* fix: checkpointing push to hub.

* fix: https://github.com/huggingface/diffusers/commit/c8a2856c6d5e45577bf4c24dee06b1a4a2f5c050#commitcomment-106913193

* fix: norm group test for UNet3D.

* address PR comments.

* remove unneeded code.

* add: entry in the readme and docs.

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
dg845 pushed a commit to dg845/diffusers that referenced this pull request May 6, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024