
Add support for Multiple ControlNetXSAdapters in SDXL pipeline #12100

Open
naomili0924 wants to merge 2 commits into huggingface:main from naomili0924:unet_multicontrolnets_xs

Conversation


@naomili0924 naomili0924 commented Aug 8, 2025

What does this PR do?

This PR addresses the feature request from an open good-first issue, #8434. It extends the current ControlNet adapter logic to support multiple ControlNet adapters injected into the diffusion model.

Before this change, StableDiffusionXLControlNetXSPipeline loads the UNet base model and supports injection from only a single ControlNet, as shown below.


pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=depth_controlnet, torch_dtype=torch.float16
).to("cuda")

With this change, StableDiffusionXLControlNetXSPipeline can take a new UNet model class named MultiControlUnetConditionModel, which loads weights from multiple ControlNets and injects every ControlNet output into the base model through zero-convolution layers.

pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=[depth_controlnet, canny_controlnet], torch_dtype=torch.float16
).to("cuda")

Because we did not find a test repo to verify this change, we used the following script (not included in this PR) to verify it.

import torch
from diffusers.models.autoencoders import AutoencoderKL
from diffusers.models.controlnets import ControlNetXSAdapter
from diffusers.pipelines.controlnet_xs import StableDiffusionXLControlNetXSPipeline
from PIL import Image
from diffusers.utils import load_image
import cv2
import numpy as np

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
canny_controlnet = ControlNetXSAdapter.from_pretrained(
    "UmerHA/Testing-ConrolNetXS-SDXL-canny", torch_dtype=torch.float16
).to("cuda")
depth_controlnet = ControlNetXSAdapter.from_pretrained(
    "UmerHA/Testing-ConrolNetXS-SDXL-depth", torch_dtype=torch.float16
).to("cuda")

pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=[depth_controlnet, canny_controlnet], torch_dtype=torch.float16
).to("cuda")

# Load your conditioning images
raw_image = load_image("/content/drive/MyDrive/diffusers/src/test.jpg").convert("RGB")

# Generate Canny edge
def get_canny(image):
    image_np = np.array(image.resize((512, 512)))
    image_gray = cv2.cvtColor(image_np, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(image_gray, 100, 200)
    edges = np.stack([edges] * 3, axis=-1)  # Make it 3-channel
    return Image.fromarray(edges)

# Generate a fake depth map (in real cases use a depth estimator)
def get_fake_depth(image):
    image_np = np.array(image.resize((512, 512)))
    gray = cv2.cvtColor(image_np, cv2.COLOR_RGB2GRAY)
    depth = np.stack([gray] * 3, axis=-1)
    return Image.fromarray(depth)

canny_image = get_canny(raw_image)
depth_image = get_fake_depth(raw_image)

# Run inference
prompt = "A flower"
output = pipe(
    prompt=prompt,
    image=[canny_image, depth_image],  # order matches controlnet list
    controlnet_conditioning_scale=[0.2, 0.8],
    num_inference_steps=30,
    generator=torch.manual_seed(0),
)

output.images[0].save("output.png")

Design Details:

Stage 1: Prepare embeddings for base UNet and ControlNets

Each ControlNet has its own controlnet_cond_embedding module and control_to_base_for_conv_in module, which compute the control embedding and add it onto h_base.

With the new change, after this stage we have one h_base (the input to the base UNet) and a list of h_ctrls (the inputs to the ControlNets) with the same length as the list of ControlNets.

(Stage 1 diagram)
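As a rough sketch of this stage, assuming simple stand-in callables for the learned modules (the function and variable names below are illustrative, not the actual diffusers API):

```python
import numpy as np

# Zero convs are modeled as plain scalings here; in the real model they are
# learned 1x1 convolutions initialized to zero.
def make_zero_conv(scale):
    return lambda x: scale * x

def prepare_embeddings(latent, cond_images, cond_embeds, conv_ins):
    """Produce one h_base and one h_ctrl per ControlNet (stage 1)."""
    h_base = latent.copy()
    h_ctrls = []
    for img, embed, b_conv_in in zip(cond_images, cond_embeds, conv_ins):
        h_ctrl = embed(img)                  # controlnet_cond_embedding
        h_base = h_base + b_conv_in(h_ctrl)  # control_to_base_for_conv_in
        h_ctrls.append(h_ctrl)
    return h_base, h_ctrls

latent = np.ones((4, 4))
conds = [np.full((4, 4), 2.0), np.full((4, 4), 3.0)]   # e.g. depth, canny
embeds = [lambda x: 0.5 * x, lambda x: 0.5 * x]
conv_ins = [make_zero_conv(0.1), make_zero_conv(0.1)]
h_base, h_ctrls = prepare_embeddings(latent, conds, embeds, conv_ins)
print(len(h_ctrls), h_base[0, 0])  # 2 controls; 1 + 0.1*1.0 + 0.1*1.5 = 1.25
```

Note that every ControlNet contributes its conv_in projection to the single shared h_base, while each control stream keeps its own h_ctrl.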

Stage 2: Down and Mid UNet and ControlNet blocks

Each ControlNet has its own base_to_control and control_to_base convolution layers, and the number of base_to_control and control_to_base layers matches the number of base UNet layers.

For each layer, we concatenate h_ctrl with b2c(h_base) to form the ControlNet input. We probably need to retrain the b2c layers, because h_base is now a linear combination of the original h_base and the h_ctrl outputs from all ControlNets (previously, h_base contained only the base features plus a single ControlNet's output).

After each ResNet and attention block, we add the weighted linear combination of c2b(h_ctrl) across ControlNets to h_base.

After this stage, we again have one h_base and a list of h_ctrls, as in stage 1.

(Stage 2 diagram)
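The per-layer flow of this stage can be sketched with simple stand-ins (all names, shapes, and callables here are illustrative placeholders, not the actual diffusers modules):

```python
import numpy as np

def down_step(h_base, h_ctrls, b2c, c2b_convs, base_block, ctrl_blocks, scales):
    # concatenate b2c(h_base) into every control stream
    h_ctrls = [np.concatenate([h_c, b2c(h_base)], axis=-1) for h_c in h_ctrls]
    h_base = base_block(h_base)
    h_ctrls = [blk(h_c) for blk, h_c in zip(ctrl_blocks, h_ctrls)]
    # add the weighted control_to_base (zero-conv) projections back to the base
    for scale, conv, h_c in zip(scales, c2b_convs, h_ctrls):
        h_base = h_base + scale * conv(h_c)
    return h_base, h_ctrls

h_base = np.ones((2, 2))
h_ctrls = [np.full((2, 2), 2.0), np.full((2, 2), 3.0)]
b2c = lambda x: x                           # base_to_control (identity stand-in)
c2b_convs = [lambda x: 0.1 * x[:, :2]] * 2  # control_to_base, cropped to base width
base_block = lambda x: x + 1.0              # stand-in for resnet + attention
ctrl_blocks = [lambda x: x] * 2
h_base, h_ctrls = down_step(h_base, h_ctrls, b2c, c2b_convs,
                            base_block, ctrl_blocks, [0.2, 0.8])
print(h_base[0, 0])  # 2 + 0.2*0.1*2 + 0.8*0.1*3 = 2.28
```

The b2c concatenation uses the pre-block h_base, and each ControlNet's c2b contribution is scaled by its conditioning weight before it is summed into the base stream.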

Stage 3: Decoding Stage

In the decoding stage, we use only the control_to_base convolution layers from each ControlNet.

In the following image, the zero-convolution layers from each ControlNet are grouped by layer; zero-convolution layers drawn with the same dashed color belong to the same group, and the residual outputs are connected with lines of the same color.

For each layer, the ControlNet residuals are passed through their zero-convolution layers, weighted, and added to h_base.

After the weighted ControlNet residuals are added, h_base is passed to the ResNet and attention blocks to decode the image. The ResNet+attention blocks from each ControlNet's up blocks are not used here (and sometimes do not even exist).

(Stage 3 diagram)
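A minimal sketch of one decoding step, again with illustrative stand-in callables rather than the real diffusers modules:

```python
import numpy as np

def up_step(h_base, ctrl_residuals, c2b_convs, scales, up_block):
    # only the control_to_base zero convs are used during decoding;
    # each ControlNet's residual is weighted and added before the base block
    for scale, conv, res in zip(scales, c2b_convs, ctrl_residuals):
        h_base = h_base + scale * conv(res)
    return up_block(h_base)  # the ControlNets' own up blocks are never run

h_base = np.ones((2, 2))
residuals = [np.full((2, 2), 1.0), np.full((2, 2), 2.0)]
c2b_convs = [lambda x: 0.5 * x] * 2     # stand-in zero convs
up_block = lambda x: 2.0 * x            # stand-in resnet + attention up block
out = up_step(h_base, residuals, c2b_convs, [0.2, 0.8], up_block)
print(out[0, 0])  # (1 + 0.2*0.5*1 + 0.8*0.5*2) * 2 = 3.8
```

This mirrors the text above: the control streams contribute only residuals here, which is why the ControlNet up blocks can be absent entirely.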

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@naomili0924 naomili0924 changed the title Add support for list of ControlNetXSAdapters in SDXL pipeline Add support for Multiple ControlNetXSAdapters in SDXL pipeline Aug 8, 2025
@DN6
Collaborator

DN6 commented Aug 12, 2025

Hi @naomili0924, thank you for putting the time into this. Unfortunately we had to deprecate ControlNetXS due to low usage (you'll notice that the pipeline inherits from DeprecatedPipelineMixin). We are not actively updating or adding features to it at this time.

@github-actions
Contributor

github-actions Bot commented Jan 9, 2026

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions Bot added the stale Issues that haven't received updates label Jan 9, 2026