[Core] add: controlnet support for SDXL #4038

sayakpaul · 2023-07-11T10:50:40Z

This PR adds support for ControlNets with SDXL. The two primary components being added in this PR are:

Training script train_controlnet_sdxl.py.
Pipeline StableDiffusionXLControlNetPipeline (with changes to ControlNetModel to accommodate the pipeline-level changes).

However, there seems to be something weird going on here.

I first started training on a small subset of the dataset (the circles dataset) with the following command:

export MODEL_DIR="stabilityai/stable-diffusion-xl-base-0.9"
export OUTPUT_DIR="controlnet-sdxl-circles"

wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png

wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png

accelerate launch train_controlnet_sdxl.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=fusing/fill50k \
 --mixed_precision="fp16" \
 --resolution=1024 \
 --learning_rate=5e-5 \
 --max_train_samples=500 \
 --max_train_steps=1000 \
 --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
 --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
 --validation_steps=25 \
 --train_batch_size=1 \
 --gradient_accumulation_steps=4 \
 --report_to="wandb" \
 --seed=42 \
 --push_to_hub
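
As a rough sanity check on the flags above, the following sketch works out how much data such a run would see. It assumes a single process and that --max_train_steps counts optimizer updates (that is, after gradient accumulation); neither assumption is stated in the command itself, and the variable names are only illustrative.

```python
# Values copied from the command above.
train_batch_size = 1
gradient_accumulation_steps = 4
max_train_steps = 1000
max_train_samples = 500
validation_steps = 25

# Assumption: a single process (no multi-GPU data parallelism).
num_processes = 1

# Number of samples that contribute to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_processes

# Assumption: max_train_steps counts optimizer updates, not micro-batches.
samples_seen = effective_batch_size * max_train_steps
passes_over_subset = samples_seen / max_train_samples
validation_runs = max_train_steps // validation_steps

print(effective_batch_size, samples_seen, passes_over_subset, validation_runs)
# prints: 4 4000 8.0 40
```

Under these assumptions the 500-sample subset is revisited about eight times and validation runs roughly forty times, which is consistent with this being a quick smoke test rather than a full training run.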

The trained checkpoints seem to only generate black images: https://huggingface.co/fusing/controlnet-sdxl-circles-fixed (only visible to the diffusers team members).

To further debug this, I tried:

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

base_ckpt_id = "stabilityai/stable-diffusion-xl-base-0.9"
controlnet_ckpt_id = "controlnet-sdxl-circles-fixed"

# Load the intermediate ControlNet checkpoint saved at step 500, in fp16.
controlnet = ControlNetModel.from_pretrained(
    controlnet_ckpt_id, subfolder="checkpoint-500/controlnet", torch_dtype=torch.float16
).to("cuda")

# Build the SDXL ControlNet pipeline around the base model, also in fp16.
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    base_ckpt_id, controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Use the same conditioning image and prompt as in the training-time validation.
cond_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png"
)
prompt = "red circle with blue background"

image = pipeline(prompt, image=cond_image).images[0]
image.save("controlnet@ckpt-500.png")

This doesn't generate the expected results (unsurprising, since the number of training steps is quite low), but it doesn't generate all-black images either.

@patrickvonplaten @williamberman could you take a deeper look here?

TODOs

tests
docs
misc changes

HuggingFaceDocBuilderDev · 2023-07-11T10:59:40Z

The documentation is not available anymore as the PR was closed or merged.

gkorepanov · 2023-07-11T11:11:26Z

The trained checkpoints seem to only generate black images:

Hi! Do you mean that all images are black in validation during training, but the images are fine if you manually load the trained checkpoint using an external script?

sayakpaul · 2023-07-11T11:41:35Z

Hi! Do you mean that all images are black in validation during training, but the images are fine if you manually load the trained checkpoint using an external script?

Exactly.

gkorepanov · 2023-07-11T11:55:06Z

Exactly.

I think that might be related to the SDXL VAE producing NaNs in some cases in fp16 mode.

From https://github.com/kohya-ss/sd-scripts/tree/sdxl:

The image generation during training is now available. However, the VAE for SDXL seems to produce NaNs in some cases when using fp16. The images will be black. Currently, the NaNs cannot be avoided even with --no_half_vae option. It works with bf16 or without mixed precision.

Also:
https://huggingface.co/stabilityai/sdxl-vae/discussions/6
https://huggingface.co/madebyollin/sdxl-vae-fp16-fix
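
To illustrate the general mechanism behind fp16 NaNs (not the VAE's actual code path), here is a toy sketch using only the Python standard library. The helper to_fp16 is a hypothetical name; it rounds a Python float to IEEE 754 half precision and saturates to infinity on overflow, which is an assumption about how a tensor cast would behave.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to half precision, saturating to +/-inf on overflow."""
    try:
        # The "e" struct format is IEEE 754 binary16.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; model the cast as +/-inf.
        return math.copysign(math.inf, x)


small = to_fp16(60000.0)  # still representable in fp16
large = to_fp16(1e5)      # exceeds FP16_MAX, becomes inf

# Once an intermediate value is inf, ordinary arithmetic can produce NaN.
difference = large - to_fp16(2e5)  # inf - inf

print(small, large, difference)
```

A single NaN produced this way can propagate through the remaining operations, which would fit the symptom of a fully black decoded image.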

sayakpaul · 2023-07-11T12:01:28Z

Thanks for being willing to help.

I think the issue with VAE is handled. See: https://github.com/huggingface/diffusers/blob/db78a4cb4e3f105cbc7534890f606e25e906e23a/src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl.py#L1118C1-L1133C38.

Also, when I run the manual validation, it's in FP16 only.

gkorepanov · 2023-07-11T12:07:43Z

I think the issue with VAE is handled

Ah, really, it seems so, thanks. BTW, in the code you mentioned there might be a small bug with an unnecessary "not", which was recently fixed in the main branch: #4019

Also, to run your code, I had to add extra StableDiffusionXLControlNetPipeline imports in a few places. I think you forgot to include a few __init__.py files in the PR.

sayakpaul · 2023-07-11T14:50:51Z

@gkorepanov thanks so much for your catches. I incorporated the fixes. Let me run the dummy experiment one more time to check quickly.

laksjdjf · 2023-07-11T15:33:55Z

Is it because autocast is used to generate the validation image?

diffusers/examples/controlnet/train_controlnet_sdxl.py

Lines 123 to 126 in 68f2c38

    
    with torch.autocast("cuda"):
        image = pipeline(
            validation_prompt, validation_image, num_inference_steps=20, generator=generator
        ).images[0]

kohya-ss's problems also seem to have been caused by autocast.
kohya-ss/sd-scripts@814996b#diff-5f48c8e976d43e587007dc13a34100a96621cdc5fbe083ee772e920855648722R3877
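
One way to test this hypothesis would be to make autocast optional around the validation call. The sketch below is only an assumption about how such a toggle could look, not the script's actual code: maybe_autocast and autocast_factory are hypothetical names, and contextlib.nullcontext stands in when autocast is disabled.

```python
import contextlib


def maybe_autocast(enabled, autocast_factory):
    """Return the autocast context if enabled, otherwise a no-op context."""
    return autocast_factory() if enabled else contextlib.nullcontext()


# Intended usage inside the validation loop (requires torch, so shown as a comment):
#
#   with maybe_autocast(False, lambda: torch.autocast("cuda")):
#       image = pipeline(
#           validation_prompt, validation_image, num_inference_steps=20, generator=generator
#       ).images[0]
```

Passing a factory rather than a context object means the autocast context is only created when it is actually wanted.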

patrickvonplaten · 2023-07-11T16:22:13Z

Cool! Let's make sure we have a working controlnet training run before merging this though :-)

sayakpaul · 2023-07-11T16:29:09Z

Cool! Let's make sure we have a working controlnet training run before merging this though :-)

There's a working script in the PR. The issues described in the original post are why I am seeking reviews.

gkorepanov · 2023-07-11T17:50:53Z

There's a working script in the PR. The issues described in the original post are why I am seeking reviews.

After disabling autocast in validation and using torch.float32 when loading the pipeline, the validation looks better (at least the images are not black anymore):

But images seem to be awkward.

gkorepanov · 2023-07-11T18:45:09Z

But images seem to be awkward.

The difference was caused by a different resolution at inference. By default, the ControlNet pipeline takes height/width from the control image.
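
The fallback described here can be modelled with a toy function. This is a sketch of the behaviour as stated in the comment, not the pipeline's real implementation; resolve_output_size is a hypothetical name, and the 512x512 conditioning image size is assumed purely for illustration.

```python
from typing import Optional, Tuple


def resolve_output_size(
    cond_size: Tuple[int, int],
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (width, height), falling back to the control image size when unset."""
    cond_width, cond_height = cond_size  # PIL-style (width, height)
    return (
        width if width is not None else cond_width,
        height if height is not None else cond_height,
    )


# Assumed 512x512 conditioning image, with training done at --resolution=1024.
default_size = resolve_output_size((512, 512))
explicit_size = resolve_output_size((512, 512), height=1024, width=1024)
print(default_size, explicit_size)
```

With such a fallback, either resizing the conditioning image to the training resolution or passing height and width explicitly would make inference match training.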

sayakpaul · 2023-07-12T02:01:53Z

@gkorepanov thanks again for your inputs! Very much appreciated.

Let me run a couple of experiments now.

sayakpaul · 2023-07-12T03:03:47Z

@gkorepanov may I know which GPU model you used for your tests? I am currently using a 40GB A100, and when I try log_validation() in FP32, it OOMs.

sayakpaul · 2023-07-18T12:08:59Z

@patrickvonplaten thanks for all the reviews. A final review and I think we're good to go. Let me know.

adhikjoshi · 2023-07-18T12:31:02Z

Will existing controlnet 1.1 checkpoints work here?

sayakpaul · 2023-07-18T12:32:32Z

Will existing controlnet 1.1 checkpoints work here?

No.

patrickvonplaten

Cool!

sayakpaul · 2023-07-18T12:56:25Z

@gkorepanov shall we start a PR for adding switching support and MultiControlNet too, since switching likely impacts that more?

Let me know :)

sayakpaul · 2023-07-21T04:05:11Z

@williamberman #4188.

* add: controlnet sdxl.
* modifications to controlnet.
* run styling.
* add: __init__.pys
* incorporate huggingface#4019 changes.
* run make fix-copies.
* resize the conditioning images.
* remove autocast.
* run styling.
* disable autocast.
* debugging
* device placement.
* back to autocast.
* remove comment.
* save some memory by reusing the vae and unet in the pipeline.
* apply styling.
* Allow low precision sd xl
* finish
* finish
* changes to accommodate the improved VAE.
* modifications to how we handle vae encoding in the training.
* make style
* make existing controlnet fast tests pass.
* change vae checkpoint cli arg.
* fix: vae pretrained paths.
* fix: steps in get_scheduler().
* debugging.
* debugging./
* fix: weight conversion.
* add: docs.
* add: limited tests./
* add: datasets to the requirements.
* update docstrings and incorporate the usage of watermarking.
* incorporate fix from huggingface#4083
* fix watermarking dependency handling.
* run make-fix-copies.
* Empty-Commit
* Update requirements_sdxl.txt
* remove vae upcasting part.
* Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* run make style
* run make fix-copies.
* disable suppot for multicontrolnet.
* Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* run make fix-copies.
* dtyle/.
* fix-copies.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

zdxpan · 2023-08-29T03:58:29Z

During training the loss becomes NaN and pred_noise contains NaNs, which causes the training to fail. The training target is to predict the added noise, and the MSE always stays close to 1 (~= 1).
Possible reasons:
1. The learning rate is too large.
2. The data contains some NaN values.
3. Is there any other reason?

Also, the validation images from log_validation are always black.

sayakpaul · 2023-08-29T05:11:38Z

Try passing the following as your VAE: madebyollin/sdxl-vae-fp16-fix.

Additionally, you can ask questions on the repositories like the following, which leveraged our training scripts to obtain nice results: https://huggingface.co/thibaud/controlnet-openpose-sdxl-1.0/discussions.
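
For inference, swapping in the suggested VAE could look like the sketch below. It is an illustration rather than the maintainers' recipe: load_pipeline_with_fp16_vae is a hypothetical helper, the checkpoint IDs passed to it are placeholders supplied by the caller, and the heavy imports are kept inside the function so that the block can be defined without torch or diffusers installed.

```python
VAE_ID = "madebyollin/sdxl-vae-fp16-fix"


def load_pipeline_with_fp16_vae(base_ckpt_id, controlnet_ckpt_id, vae_id=VAE_ID):
    """Build an SDXL ControlNet pipeline that uses a separately loaded VAE."""
    # Imported lazily; calling this function requires torch and diffusers.
    import torch
    from diffusers import (
        AutoencoderKL,
        ControlNetModel,
        StableDiffusionXLControlNetPipeline,
    )

    # Load the replacement VAE and the trained ControlNet in fp16.
    vae = AutoencoderKL.from_pretrained(vae_id, torch_dtype=torch.float16)
    controlnet = ControlNetModel.from_pretrained(
        controlnet_ckpt_id, torch_dtype=torch.float16
    )

    # Components passed as keyword arguments override the ones in the base checkpoint.
    return StableDiffusionXLControlNetPipeline.from_pretrained(
        base_ckpt_id, controlnet=controlnet, vae=vae, torch_dtype=torch.float16
    )
```

Calling the helper downloads the checkpoints, so it is best tried on a machine with a GPU and network access.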

patrickvonplaten · 2023-08-29T07:23:24Z

During training the loss becomes NaN and pred_noise contains NaNs, which causes the training to fail. The training target is to predict the added noise, and the MSE always stays close to 1 (~= 1). Possible reasons: 1. The learning rate is too large. 2. The data contains some NaN values. 3. Is there any other reason?

Also, the validation images from log_validation are always black.

@zdxpan please make sure to open a new issue instead of commenting on the PR here.

sayakpaul added 2 commits July 11, 2023 16:05

add: controlnet sdxl.

c6c9f3a

modifications to controlnet.

185b67b

sayakpaul requested review from patrickvonplaten and williamberman July 11, 2023 10:51

run styling.

db78a4c

sayakpaul mentioned this pull request Jul 11, 2023

[SDXL 0.9] Quick question regarding SDXL and example training scripts #4013

Closed

sayakpaul added 3 commits July 11, 2023 20:15

add: __init__.pys

af8273a

incorporate #4019 changes.

c8b00de

run make fix-copies.

68f2c38

sayakpaul added 5 commits July 12, 2023 07:39

resize the conditioning images.

4b86d63

remove autocast.

dade681

run styling.

15d4afd

disable autocast.

5ed7d3e

debugging

64b0e20

sayakpaul added 2 commits July 12, 2023 08:43

device placement.

f482f46

back to autocast.

c2bbb2b

patrickvonplaten mentioned this pull request Jul 18, 2023

bug:convert controlnet model(custom datasets train) to diffusers failure #4101

Closed

patrickvonplaten reviewed Jul 18, 2023

src/diffusers/__init__.py (outdated, resolved)

patrickvonplaten reviewed Jul 18, 2023

src/diffusers/pipelines/__init__.py (outdated, resolved)

sayakpaul and others added 6 commits July 18, 2023 17:25

disable suppot for multicontrolnet.

5d53cb8

Merge branch 'main' into feat/sd-xl-controlnet-2

e430d1a

Apply suggestions from code review

740944a

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

run make fix-copies.

e35cf2b

dtyle/.

d7aecd2

fix-copies.

f129bc4

patrickvonplaten approved these changes Jul 18, 2023

sayakpaul merged commit 3eb498e into main Jul 18, 2023
10 checks passed

sayakpaul deleted the feat/sd-xl-controlnet-2 branch July 18, 2023 12:55

williamberman mentioned this pull request Jul 18, 2023

During multi-gpus training, each card will execute the cache text embedding operation once? #4089

Closed

gkorepanov mentioned this pull request Jul 19, 2023

[WIP] SDXL ControlNet pipeline follow-up fixes #4155

Closed

yutongli mentioned this pull request Jul 21, 2023

controlnet with sdxl infer black images even after rebasing #4038 #4185

Closed

sayakpaul mentioned this pull request Jul 21, 2023

[SDXL ControlNet Training] Follow-up fixes #4188

Merged

sayakpaul mentioned this pull request Jul 22, 2023

the main branch is broken for controlnet training with sdxl #4206

Closed
