
Potentially wrong scheduler in train_text_to_image_sdxl.py? #8281

Closed
christopher-beckham opened this issue May 26, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@christopher-beckham
Contributor

christopher-beckham commented May 26, 2024

Describe the bug

I see that in train_text_to_image_sdxl.py the scheduler being used is DDPMScheduler, which -- as the name suggests -- is the specific diffusion formulation in this paper. However, this contradicts the config of the actual SDXL model, which uses EulerDiscreteScheduler, as you can see here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/model_index.json
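(For reference, the training script constructs its scheduler roughly like this; the model path here is illustrative:)

```python
from diffusers import DDPMScheduler, EulerDiscreteScheduler

# What train_text_to_image_sdxl.py does (roughly): instantiate a DDPMScheduler
# from the checkpoint's scheduler config.
noise_scheduler = DDPMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"
)

# What model_index.json actually declares for the inference pipeline:
inference_scheduler = EulerDiscreteScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"
)
```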

EulerDiscreteScheduler (and other compatible schedulers) can't just be arbitrarily swapped in for DDPMScheduler, because the former computes a noised sample via x_t = x_0 + sigma_t*eps (i.e. it implements the variance exploding SDE), whilst DDPM is a variance preserving SDE, and x_t is computed as follows:
x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
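For concreteness, here is a minimal sketch of the two forward processes (toy code, my own variable names; as far as I can tell this mirrors what the two schedulers' add_noise methods compute):

```python
import torch

def vp_add_noise(x0, eps, alphas_cumprod, t):
    # DDPM / variance-preserving: the signal is scaled down as noise is added,
    # keeping the total variance ~1. This is what DDPMScheduler.add_noise does.
    alpha_bar_t = alphas_cumprod[t]
    return alpha_bar_t.sqrt() * x0 + (1.0 - alpha_bar_t).sqrt() * eps

def ve_add_noise(x0, eps, sigmas, t):
    # Euler / variance-exploding: the signal is left unscaled and the variance
    # grows with sigma_t. This is what EulerDiscreteScheduler.add_noise does.
    return x0 + sigmas[t] * eps
```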

If DDPMScheduler were simply instantiated to pull out the betas and derive sigmas from them, that would be fine (to compute the noisy sample for the exploding SDE), but if we look here we can see that scheduler.add_noise is being used, which means we're computing x_t based on the DDPM formulation.

In fact, in other SDXL-based scripts (e.g. this and this) the variance exploding formulation is used, with one extra caveat: the noisy inputs also get scaled as noisy_model_input / ((sigmas**2 + 1) ** 0.5) before being passed into the UNet (and this scaling is extremely important).
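Roughly, those scripts do something like the following (a sketch; the variable names are mine, and I'm assuming SDXL's scaled_linear beta schedule):

```python
import torch

# Illustrative shapes; in the real scripts these come from the VAE and dataloader.
latents = torch.randn(4, 4, 64, 64)
noise = torch.randn_like(latents)
timesteps = torch.randint(0, 1000, (4,))

# Derive sigmas from the DDPM betas: sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t).
betas = torch.linspace(0.00085**0.5, 0.012**0.5, 1000) ** 2  # scaled_linear schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
sigmas = ((1.0 - alphas_cumprod) / alphas_cumprod) ** 0.5

sigma_t = sigmas[timesteps].view(-1, 1, 1, 1)
noisy_model_input = latents + noise * sigma_t                      # variance exploding
inp_noisy_latents = noisy_model_input / ((sigma_t**2 + 1) ** 0.5)  # the crucial scaling
```

Note that dividing by sqrt(sigma_t**2 + 1) makes the VE-noised input numerically identical to the DDPM-noised one, since sqrt(alpha_bar_t) = 1 / sqrt(sigma_t**2 + 1), which is presumably why both formulations end up training the same model.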

Therefore, my question is: is there a particular reason why DDPMScheduler is used here, or is it just an oversight/bug? I have seen issues raised about this script in the past, including this one: #4827

Thanks.

Reproduction

n/a (asking a fundamental question related to the code)

Logs

No response

System Info

n/a

Who can help?

No response

christopher-beckham added the bug (Something isn't working) label on May 26, 2024
@bghira
Contributor

bghira commented May 27, 2024

the training noise schedule is very different from the inference one. this is expected. the code you've linked to is specific to EDM model training, e.g. Playground v2.5 or perhaps CosXL.

@bghira
Contributor

bghira commented May 27, 2024

also, the Euler scheduler is used in the internal controlnet training script used by Hugging Face's Suraj Patil (cc @patil-suraj) and the word was that the initial samples looked better. but the long term results of training over several hundred thousand iterations are no better or different from using DDPM.

@christopher-beckham
Contributor Author

Thanks @bghira; indeed, it was an oversight on my part: the scheduler specified in the config is intended to be the inference scheduler (not necessarily the training one). I understand that the two can differ, but as far as the training scheduler is concerned, it is counterintuitive to me if a model trained with DDPMScheduler (i.e. one that does mean scaling) is subsequently fine-tuned with a scheduler that doesn't (if you're implying that's the case with the ControlNet script).

Otherwise, if SDXL was trained with DDPMScheduler then I have no issues fine-tuning that model on the same scheduler.

@bghira
Contributor

bghira commented May 27, 2024

i'm also wanting more information on the matter, so i'm glad you started the discussion.

for perhaps a little more insight, it might be worth noting that the DreamShaper series by Lykon has a recent version that requires DPM++ 2M Karras (i think the SDE variant) because he tuned it on that noise schedule.

it's entirely possible that training on DDPM makes the model more broadly compatible with other schedules at inference time.

@leeruibin

I have the same question when trying to fine-tune the CosXL model. As stated for CosXL, it employs a Cosine-Continuous EDM VPred scheduler to tune the model. I followed the instructions of train_dreambooth_lora_sdxl to fine-tune the CosXL model, since there is an official implementation of edm_style_training in diffusers. However, the model's performance becomes terrible after I try to fine-tune it.

I have tried EDMEulerScheduler and DDPMScheduler but neither of them seems to work well. I have no idea whether EDMEulerScheduler is exactly the Cosine-Continuous EDM VPred scheduler mentioned by StabilityAI or not. They only released the model parameters along with a very brief statement.
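For reference, my understanding of the preconditioning that EDM-style training uses is roughly the following sketch (based on Table 1 of the EDM paper; sigma_data = 0.5 is the paper's default, and whether CosXL actually uses it is my assumption):

```python
import torch

def edm_preconditioning(sigma: torch.Tensor, sigma_data: float = 0.5):
    # Coefficients from Karras et al. 2022 ("EDM"), Table 1 (EDM column).
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2) ** 0.5
    c_in = 1.0 / (sigma**2 + sigma_data**2) ** 0.5
    c_noise = 0.25 * sigma.log()
    return c_skip, c_out, c_in, c_noise

# The denoiser is then D(x, sigma) = c_skip * x + c_out * F(c_in * x, c_noise),
# where F is the raw network; the training loss is taken against D's output.
```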

I would be grateful if you could give me some suggestions to make the fine-tuning process work normally. @bghira

@bghira
Contributor

bghira commented May 30, 2024

see this for cosxl demo code

@leeruibin

leeruibin commented May 30, 2024

Thanks for your reply. Actually, I have referred to this code in my training code.

In my setting, I can edit images successfully with instructions like "change the cup to red" before fine-tuning. However, when I try to fine-tune the CosXL_edit model on the fusing/instructpix2pix-1000-samples dataset as a toy experiment, the model becomes weird. The loss decreases normally, but the editing fails and the output images contain many artifacts.

The training code is here. Most of the code is copied from train_dreambooth_lora_sdxl and cosxl, and here are my fine-tuning results. I hope you can give me some advice to fix this problem.

Thanks a lot!
[image: fine-tuning results, showing heavy artifacts in the edited outputs]

@bghira
Contributor

bghira commented May 30, 2024

wow, that is a totally new style of failure mode to me and i'm not sure what would cause that other than scheduler configuration details. i was thinking maybe you're experiencing an issue with the unconditional space. are you using negative prompts at inference time, and does reducing/increasing the step count change anything?

@bghira
Contributor

bghira commented May 30, 2024

also, 50k steps on 1k images seems like quite a lot of training. have you tested earlier checkpoints?

@leeruibin

I also believe this is caused by a wrong setting of the training scheduler. Currently, I follow the setting of train_dreambooth_lora_sdxl and set the training scheduler to EDMEulerScheduler, but I am not sure whether this is the Cosine-Continuous EDM VPred schedule mentioned by StabilityAI.

The inference settings for the fine-tuned model are the same as in the first line: I use 20 inference steps, CFG = 7, and no negative prompt.

The earlier checkpoints also fail. This is the evaluation result after 3K steps. The training batch size is 12.
[image: validation samples after 3K steps]

@christopher-beckham
Contributor Author

I'm a bit out of my depth here (I don't know anything about CosXL), but is there a citation or some reference for this "cosine continuous" schedule? Also, as the name suggests, are you fine-tuning with vpred instead of epsilon prediction?

@leeruibin

Yes, I fine-tune it with vpred.

This is also the problem that bothers me. StabilityAI just released the model and said CosXL is a fine-tuned version of SDXL with a Cosine-Continuous EDM VPred scheduler. So far, they haven't given any details about this model, nor any citations.

I have tried this model and, to my knowledge, it may be the most powerful model for conducting instruction-based editing. There is too little information in the official repo. Here is their model card: CosXL

@christopher-beckham
Contributor Author

christopher-beckham commented May 30, 2024

BTW, we could maybe move this into its own separate discussion or issue, since it pertains to CosXL.

I was originally planning to close this issue since I had my own misunderstanding about what the appropriate scheduler is for SDXL. Now I know that it was trained as a DDPM model, but I still find it odd that the inference scheduler is EulerDiscreteScheduler, because that is actually based on Algorithm 2 of EDM (without the 2nd-order step). Algorithm 2 assumes s(t) = 1 and sigma(t) = t (see Table 1 in their paper), so it's not actually the DDPM formulation, which has its own specific s(t) and sigma(t):

[image: Table 1 of the EDM paper, VP column, giving the DDPM-specific s(t) and sigma(t)]
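To spell out the comparison (my own notation, following Table 1 of the EDM paper; beta_d and beta_min are the VP schedule parameters):

```latex
% General EDM forward process: x(t) ~ N(s(t) x(0), s(t)^2 sigma(t)^2 I), i.e.
x(t) = s(t)\,\bigl(x(0) + \sigma(t)\,\varepsilon\bigr),
\qquad \varepsilon \sim \mathcal{N}(0, I)

% Algorithm 2 / EulerDiscreteScheduler choice (variance exploding):
s(t) = 1, \qquad \sigma(t) = t

% VP / DDPM choice:
s(t) = 1 \Big/ \sqrt{e^{\frac{1}{2}\beta_d t^2 + \beta_{\min} t}}, \qquad
\sigma(t) = \sqrt{e^{\frac{1}{2}\beta_d t^2 + \beta_{\min} t} - 1}
```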

Of course, things seem to work out empirically and the samples are still good, but I want to mention it anyway.

Edit: I forgot about the empirical parts of the EDM paper stating that the choice of s(t)=1 and sigma(t) = t still performed better FID-wise than their actual appropriate values (see Figure 2 and the additional results in the appendix). Ok, so that alleviates my concern. :)
