[WIP] adding Kandinsky training scripts #4890
Conversation
…une_decoder_lora.py
The documentation is not available anymore as the PR was closed or merged.
```diff
-class PriorTransformer(ModelMixin, ConfigMixin):
+class PriorTransformer(ModelMixin, ConfigMixin, UNet2DConditionLoadersMixin):
```
I'm a little bit surprised to see UNet2DConditionLoadersMixin work out of the box for PriorTransformer :)
Just diffusers things 😎
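For context, a rough sketch of what the mixin buys us here: the prior can now save/load attention processors (e.g. LoRA weights) through the same entry points the UNet uses. The weights path below is hypothetical:

```python
from diffusers import PriorTransformer

# With UNet2DConditionLoadersMixin, PriorTransformer picks up load_attn_procs /
# save_attn_procs, the same LoRA entry points the UNet exposes.
prior = PriorTransformer.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", subfolder="prior"
)
prior.load_attn_procs("path/to/prior_lora_weights")  # hypothetical output dir
```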
### Training with xFormers:

You can enable memory efficient attention by [installing xFormers](https://huggingface.co/docs/diffusers/main/en/optimization/xformers) and passing the `--enable_xformers_memory_efficient_attention` argument to the script.

xFormers training is not available for Prior model fine-tuning.

**Note**:

According to [this issue](https://github.com/huggingface/diffusers/issues/2234#issuecomment-1416931212), xFormers `v0.0.16` cannot be used for training on some GPUs. If you observe that problem, please install a development version as indicated in that comment.
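For reference, a minimal sketch of how this flag is typically gated in our training scripts (assumes the script's `args` namespace and a `unet` model in scope):

```python
from diffusers.utils import is_xformers_available

# Only switch on xFormers attention when the library is actually installed.
if args.enable_xformers_memory_efficient_attention:
    if is_xformers_available():
        unet.enable_xformers_memory_efficient_attention()
    else:
        raise ValueError("xformers is not available. Make sure it is installed correctly.")
```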
We have LoRA too. Let's include a section on LoRA as well in the README.
```python
block_id = int(name[len("down_blocks.")])
hidden_size = unet.config.block_out_channels[block_id]

lora_attn_procs[name] = LoRAAttnAddedKVProcessor(
```
Hmm, seems like we don't have a faster implementation of this processor. Based on the usage of the training scripts, I think we can monitor and add one later if needed. WDYT?
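For readers following along, a sketch of how this processor map is assembled across the UNet's attention blocks; `unet` and `args.rank` come from the training script, and the `cross_attention_dim` handling is assumed from our other added-KV LoRA scripts:

```python
from diffusers.models.attention_processor import LoRAAttnAddedKVProcessor

# Build a LoRA processor for every attention module in the decoder UNet,
# sizing each one to the hidden width of the block it lives in.
lora_attn_procs = {}
for name in unet.attn_processors.keys():
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]

    lora_attn_procs[name] = LoRAAttnAddedKVProcessor(
        hidden_size=hidden_size,
        cross_attention_dim=unet.config.cross_attention_dim,  # assumption
        rank=args.rank,  # assumes the script exposes a --rank argument
    )

unet.set_attn_processor(lora_attn_procs)
```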
```python
model_pred = unet(noisy_latents, timesteps, None, added_cond_kwargs=added_cond_kwargs).sample[:, :4]

if args.snr_gamma is None:
    loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
```
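For context, the `snr_gamma` branch applies min-SNR loss weighting. A sketch of the other side of that conditional, following the pattern in the SD text-to-image script (`model_pred`, `target`, `noise_scheduler`, and `timesteps` come from the training step):

```python
import torch
import torch.nn.functional as F
from diffusers.training_utils import compute_snr  # older scripts define this helper inline

if args.snr_gamma is None:
    loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
else:
    # Min-SNR weighting: clamp each timestep's SNR at snr_gamma, then rescale
    # the per-sample MSE by min(snr, gamma) / snr before averaging.
    snr = compute_snr(noise_scheduler, timesteps)
    mse_loss_weights = (
        torch.stack([snr, args.snr_gamma * torch.ones_like(timesteps)], dim=1).min(dim=1)[0] / snr
    )
    loss = F.mse_loss(model_pred.float(), target.float(), reduction="none")
    loss = loss.mean(dim=list(range(1, len(loss.shape)))) * mse_loss_weights
    loss = loss.mean()
```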
IIRC Kandinsky models weren't trained using the epsilon prediction objective. If so, does this still lead to reasonable results?
Actually, I'm not sure what objective was used for decoder training. I looked back at the DALL-E 2 paper, and I think it only mentions that the prior predicts samples directly.
sayakpaul left a comment
Amazing work!
Questions / comments:
- Do we not LoRA fine-tune the text encoders as typically done in the SD world?
- Let's add tests! Pinging @DN6 for a second opinion here.
- Let's port over the README to our training documentation too so that it shows up on https://huggingface.co/docs/diffusers/main/en/index.
stevhliu left a comment
Very nice, thanks for writing this training guide up! 😄
```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
```
Suggested change:

```diff
-pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
+pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16)
```
Actually, here we want to create a combined pipeline, so we'll have to use the decoder checkpoint for that.
Once we have `pipe` as the combined pipeline, we can then access the prior with `pipe.prior_prior`.
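To make this concrete, a quick sketch (the LoRA weights path is hypothetical):

```python
import torch
from diffusers import AutoPipelineForText2Image

# The decoder checkpoint resolves to the combined (prior + decoder) pipeline.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)

# Components of the prior sub-pipeline are exposed under the `prior_` prefix.
pipe.prior_prior.load_attn_procs("path/to/prior_lora_output")  # hypothetical path
```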
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Is it okay to merge now? Maybe @sayakpaul can take a look again.

@sayakpaul about your questions:
Kandinsky does not have a pure text-conditioned diffusion process; it decodes an image embedding instead. It does use a text encoder, but I'm not sure how important it is to fine-tune it. Maybe the community can try it out.
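For anyone reading along, the two-stage flow being described, roughly (the model IDs are the public Kandinsky 2.2 checkpoints):

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

# Stage 1: the prior maps a text prompt to a CLIP image embedding.
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
).to("cuda")
image_embeds, negative_image_embeds = prior("A robot pokemon, 4k photo").to_tuple()

# Stage 2: the decoder is conditioned on the image embedding rather than raw
# text, which is why fine-tuning the text encoder matters less here.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")
image = decoder(
    image_embeds=image_embeds, negative_image_embeds=negative_image_embeds
).images[0]
```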
sayakpaul left a comment
Dope! Thank you so much for seeing this through, Yiyi!
* Add files via upload Co-authored-by: Shahmatov Arseniy <62886550+cene555@users.noreply.github.com> Co-authored-by: yiyixuxu <yixu310@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
🚨🚨🚨 Note: The main author of this PR is @cene555 and the Kandinsky team. In this PR we mainly just make some small modifications to the training script provided by the authors in the original PR so that it's consistent with all our other training scripts. Thanks a million for the contribution @cene555 🚨🚨🚨
Authors of this PR:
Arseniy Shakhmatov
Anton Razzhigaev
Aleksandr Nikolich
Igor Pavlov
Andrey Kuznetsov
Denis Dimitrov
Note: DreamBooth will be included in a separate PR.