fix(controlnet_hunyuan): wire num_layers, pooled_projection_dim, use_style_cond into from_transformer (#13641) #13658

Open
Anai-Guo wants to merge 1 commit into huggingface:main from Anai-Guo:fix-hunyuandit-from-transformer

Summary

Fixes Issue 1 in #13641.

HunyuanDiT2DControlNetModel.from_transformer() was reading config.transformer_num_layers, but HunyuanDiT2DModel stores the layer count under config.num_layers, so the default call path raised AttributeError. Passing transformer_num_layers manually sidestepped that error, but the method still dropped pooled_projection_dim and use_style_cond_and_image_meta_size. A ControlNet built from a non-default transformer therefore silently fell back to the defaults (pooled_projection_dim=1024, use_style_cond_and_image_meta_size=True) and could fail weight loading with size mismatches.
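For readers who want to see the failure mode in isolation, here is a minimal sketch that does not import diffusers. The `SimpleNamespace` config and the `buggy_layer_count` helper are hypothetical stand-ins, and the exact shape of the original lookup is an assumption based on the description above, not a copy of the library source.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a HunyuanDiT2DModel config: the layer count is
# stored under `num_layers`; there is no `transformer_num_layers` attribute.
transformer_config = SimpleNamespace(
    num_layers=4,
    pooled_projection_dim=4,
    use_style_cond_and_image_meta_size=False,
)


def buggy_layer_count(config, transformer_num_layers=None):
    """Assumed shape of the old lookup: fall back to a missing attribute."""
    if transformer_num_layers is not None:
        return transformer_num_layers
    return config.transformer_num_layers  # attribute does not exist


# Default call path: the fallback attribute is missing, so this raises.
try:
    buggy_layer_count(transformer_config)
    raised = False
except AttributeError:
    raised = True

# Passing the count by hand avoids the error, but nothing else from the
# transformer config is carried over in this path.
manual = buggy_layer_count(transformer_config, transformer_num_layers=4)
```

The sketch only covers the attribute lookup; the dropped pooled_projection_dim and use_style_cond_and_image_meta_size values are a separate problem that the manual workaround does not address.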

Fix

  • Read config.num_layers (the actual HunyuanDiT2DModel config attribute) rather than the non-existent transformer_num_layers.
  • Forward pooled_projection_dim and use_style_cond_and_image_meta_size from the source transformer config, so the ControlNet is constructed with the same conditioning embedding shapes and style-cond setting as the transformer it derives from.
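The two changes above can be sketched as a small helper that collects the constructor arguments the ControlNet should inherit. This is an illustration under stated assumptions, not the actual diff: `controlnet_kwargs_from_transformer` is a hypothetical name, and the configs are plain `SimpleNamespace` objects standing in for real model configs.

```python
from types import SimpleNamespace


def controlnet_kwargs_from_transformer(config, transformer_num_layers=None):
    """Hypothetical helper: gather the arguments a ControlNet should inherit
    from the source transformer's config."""
    if transformer_num_layers is None:
        # Read the attribute the transformer config actually has.
        transformer_num_layers = config.num_layers
    return {
        "transformer_num_layers": transformer_num_layers,
        # Forward the conditioning settings instead of relying on defaults.
        "pooled_projection_dim": config.pooled_projection_dim,
        "use_style_cond_and_image_meta_size": config.use_style_cond_and_image_meta_size,
    }


# A non-default transformer config, using the values from this PR's snippet.
small = SimpleNamespace(
    num_layers=4,
    pooled_projection_dim=4,
    use_style_cond_and_image_meta_size=False,
)
small_kwargs = controlnet_kwargs_from_transformer(small)

# A config carrying the default values quoted in the summary; the layer
# count of 40 here is an arbitrary illustrative number.
default_like = SimpleNamespace(
    num_layers=40,
    pooled_projection_dim=1024,
    use_style_cond_and_image_meta_size=True,
)
default_kwargs = controlnet_kwargs_from_transformer(default_like)
```

With the non-default config, the helper returns the transformer's own values rather than 1024 and True; with a config that already carries the default values, the result is the same as it would have been under the old fallback.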

Verification

The reproduction snippet from #13641 now works:

from diffusers import HunyuanDiT2DControlNetModel, HunyuanDiT2DModel

transformer = HunyuanDiT2DModel(
    sample_size=8, num_layers=4, patch_size=2,
    attention_head_dim=4, num_attention_heads=2, in_channels=4,
    cross_attention_dim=8, cross_attention_dim_t5=8,
    pooled_projection_dim=4, hidden_size=8, text_len=4, text_len_t5=4,
    use_style_cond_and_image_meta_size=False,
)

cn = HunyuanDiT2DControlNetModel.from_transformer(transformer)  # no AttributeError
assert cn.config.transformer_num_layers == 4
assert cn.config.pooled_projection_dim == 4
assert cn.config.use_style_cond_and_image_meta_size is False

Existing default-config call sites (e.g. with the public Tencent-Hunyuan/HunyuanDiT-Diffusers checkpoint) are unchanged: the public config uses pooled_projection_dim=1024 and use_style_cond_and_image_meta_size=True, which are exactly the defaults the ControlNet previously fell back to when these arguments were omitted, so forwarding them produces the same values as before.

🤖 Generated with Claude Code

github-actions bot added the models and size/S (PR with diff < 50 LOC) labels on Apr 30, 2026