
v-prediction training support #1455

Merged
merged 10 commits into from Nov 28, 2022

Conversation

patil-suraj (Contributor)

This PR adds support for v-prediction training in

  • textual-inversion
  • dreambooth
  • text-to-image fine-tuning

This allows fine-tuning the SD2 768x768 model with these scripts.

To enable this, it adds a get_velocity method to the DDPM and DDIM schedulers to compute the target during training. The type of training is detected automatically inside each script using the noise_scheduler.config.prediction_type argument.

Users just have to set the right resolution in the command: 512 for all models except the stable-diffusion-2 768 model, which needs 768.
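For reference, a minimal sketch of the target selection inside the training loop (variable names such as latents and noisy_latents are illustrative; for v-prediction the target is the velocity v = sqrt(alpha_bar_t) * noise - sqrt(1 - alpha_bar_t) * latents, computed by get_velocity):

if noise_scheduler.config.prediction_type == "epsilon":
    target = noise
elif noise_scheduler.config.prediction_type == "v_prediction":
    target = noise_scheduler.get_velocity(latents, noise, timesteps)
else:
    raise ValueError(f"Unknown prediction type {noise_scheduler.config.prediction_type}")

model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")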

@HuggingFaceDocBuilderDev commented Nov 28, 2022

The documentation is not available anymore as the PR was closed or merged.

@pcuenca pcuenca (Member) left a comment

Amazing!

It looks like the DDIM scheduler is never used for training, is that correct? Do we need a get_velocity function for it in that case?


     # Add the prior loss to the instance loss.
     loss = loss + args.prior_loss_weight * prior_loss
 else:
-    loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
+    loss = F.mse_loss(noise_pred.float(), target.float(), reduction="mean")
Member

Shouldn't we rename noise_pred too? (Same comment for the other scripts)

Contributor Author

Good catch, we could call it model_output instead of noise_pred, wdyt?

Member

model_output or pred both sound fine to me, whatever you think is clearer. I think we use model_output in more places though, so that'd be better then.

Comment on lines +38 to +44
parser.add_argument(
    "--revision",
    type=str,
    default=None,
    required=False,
    help="Revision of pretrained model identifier from huggingface.co/models.",
)
Member

👍

Comment on lines +358 to +376
def get_velocity(
    self, sample: torch.FloatTensor, noise: torch.FloatTensor, timesteps: torch.IntTensor
) -> torch.FloatTensor:
    # Make sure alphas_cumprod and timestep have same device and dtype as sample
    self.alphas_cumprod = self.alphas_cumprod.to(device=sample.device, dtype=sample.dtype)
    timesteps = timesteps.to(sample.device)

    sqrt_alpha_prod = self.alphas_cumprod[timesteps] ** 0.5
    sqrt_alpha_prod = sqrt_alpha_prod.flatten()
    while len(sqrt_alpha_prod.shape) < len(sample.shape):
        sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

    sqrt_one_minus_alpha_prod = (1 - self.alphas_cumprod[timesteps]) ** 0.5
    sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
    while len(sqrt_one_minus_alpha_prod.shape) < len(sample.shape):
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

    velocity = sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
    return velocity
Member

This looks very similar to add_noise, doesn't it? Would it make sense to make both implementations rely on a common function?

Contributor Author

Yes, we could refactor it in a follow-up PR, wdyt @patrickvonplaten?

Member

Since we use both add_noise and get_velocity in the same training step for v-prediction, I'm ok with keeping them separate. But longer-term we might benefit from factoring out or condensing the alpha_prod code in both functions (everything above velocity =).

Contributor

Agree! I think it's cleaner to add it directly to add_noise and then make use of self.config.prediction_type - could we do this in this PR maybe?

Contributor

Scratch that, it doesn't work as expected.

Contributor Author

In favour of keeping get_velocity, because for v-prediction we need both the velocity and the noised image, so if we modify add_noise we'll need to return a tuple as output, which will complicate it a bit.

get_velocity is also easier to understand, as the name makes it clear that it's different from add_noise.
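For illustration, a rough sketch of the kind of follow-up refactor discussed above, factoring out the shared alpha_prod code into one helper (the helper name is hypothetical and not part of this PR):

def _broadcast_sqrt_alphas(self, sample, timesteps):
    # Shared by add_noise and get_velocity: sqrt(alpha_prod_t) and sqrt(1 - alpha_prod_t),
    # broadcast to the sample's shape.
    self.alphas_cumprod = self.alphas_cumprod.to(device=sample.device, dtype=sample.dtype)
    timesteps = timesteps.to(sample.device)

    sqrt_alpha_prod = self.alphas_cumprod[timesteps] ** 0.5
    sqrt_alpha_prod = sqrt_alpha_prod.flatten()
    while len(sqrt_alpha_prod.shape) < len(sample.shape):
        sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

    sqrt_one_minus_alpha_prod = (1 - self.alphas_cumprod[timesteps]) ** 0.5
    sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
    while len(sqrt_one_minus_alpha_prod.shape) < len(sample.shape):
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

    return sqrt_alpha_prod, sqrt_one_minus_alpha_prod

def get_velocity(self, sample, noise, timesteps):
    # Velocity target for v-prediction: v = sqrt(alpha_prod_t) * noise - sqrt(1 - alpha_prod_t) * sample
    sqrt_alpha_prod, sqrt_one_minus_alpha_prod = self._broadcast_sqrt_alphas(sample, timesteps)
    return sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample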

Comment on lines +348 to +367
def get_velocity(
    self, sample: torch.FloatTensor, noise: torch.FloatTensor, timesteps: torch.IntTensor
) -> torch.FloatTensor:
    # Make sure alphas_cumprod and timestep have same device and dtype as sample
    self.alphas_cumprod = self.alphas_cumprod.to(device=sample.device, dtype=sample.dtype)
    timesteps = timesteps.to(sample.device)

    sqrt_alpha_prod = self.alphas_cumprod[timesteps] ** 0.5
    sqrt_alpha_prod = sqrt_alpha_prod.flatten()
    while len(sqrt_alpha_prod.shape) < len(sample.shape):
        sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

    sqrt_one_minus_alpha_prod = (1 - self.alphas_cumprod[timesteps]) ** 0.5
    sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
    while len(sqrt_one_minus_alpha_prod.shape) < len(sample.shape):
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

    velocity = sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
    return velocity

Member

Same comment as in DDIM.

@patil-suraj (Contributor Author)

It looks like the DDIM scheduler is never used for training, is that correct? Do we need a get_velocity function for it in that case?

It's not really used for training, but I think it can be. Adding the method just for consistency.

@pcuenca (Member) commented Nov 28, 2022

It's not really used for training, but I think it can be. Adding the method just for consistency.

Cool! We can add support for dpm solver too in a future PR :)

@anton-l anton-l (Member) left a comment

LGTM!


@nlml commented Dec 5, 2022

Hmmm, did this break the script for older stable diffusion models? When I try to run with MODEL_NAME="runwayml/stable-diffusion-v1-5" now, I get the following error:

❱ 557                 if noise_scheduler.config.prediction_type == "epsilon":
  558                     target = noise
  559                 elif noise_scheduler.config.prediction_type == "v_prediction":
  560                     target = noise_scheduler.get_velocity(latents, noise, timesteps)
AttributeError: 'FrozenDict' object has no attribute 'prediction_type'

because the noise_scheduler.config FrozenDict does not contain this key.

@patil-suraj (Contributor Author)

Hey @nlml, prediction_type was only added recently, so you'll need to install diffusers from main to run the example scripts.
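For example, a standard way to install from the main branch (adjust to your environment):

pip install git+https://github.com/huggingface/diffusers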

sliard pushed a commit to sliard/diffusers that referenced this pull request Dec 21, 2022
* add get_velocity

* add v prediction for training

* fix saving

* add revision arg

* fix saving

* save checkpoints dreambooth

* fix saving embeds

* add instruction in readme

* quality

* noise_pred -> model_pred
@pkurz3nd commented Jan 3, 2023

Hello, I tried training using stable-diffusion-2-1 with resolution 768, but I get a NaN loss in the very first iteration, before the optimizer step is performed.
I use this command for training:

!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" \
  --revision="fp16" \
  --instance_data_dir="./data/jonny_depp" \
  --class_data_dir="./data/popular_person" \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="photo of xyz jonny depp" \
  --class_prompt="professional photographic of popular person, high quality image" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-6 \
  --lr_scheduler="polynomial" \
  --lr_warmup_steps=0 \
  --num_class_images=10 \
  --max_train_steps=800 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_checkpointing \
  --sample_batch_size=1

I am using
diffusers==0.11.1
torch==1.13.1+cu117
accelerate==0.12.0

I checked all the model parts; the problem is with the UNet, it outputs an all-NaN tensor. The inputs to the UNet don't contain any NaNs or infs.
Any ideas?

@patil-suraj (Contributor Author)

Hey @pkurz3nd, this is a known issue with the stable-diffusion-2-1 model. It overflows when using fp16 during training. To train this model, either

  • use fp32
  • or use xformers when using fp16 (see the sketch below)
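A rough sketch of both options, assuming the standard diffusers API for memory-efficient attention (whether your training script exposes a flag for it may vary):

# Option 1: train in fp32 by dropping --mixed_precision="fp16" (and --revision="fp16")
# from the launch command, so the weights stay in float32.

# Option 2: keep fp16 but enable xformers memory-efficient attention on the UNet
# inside the training script (requires the xformers package to be installed):
unet.enable_xformers_memory_efficient_attention()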

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* add get_velocity

* add v prediction for training

* fix saving

* add revision arg

* fix saving

* save checkpoints dreambooth

* fix saving embeds

* add instruction in readme

* quality

* noise_pred -> model_pred