Conversation

@JerryWu-code
Contributor

What does this PR do?

This PR introduces the Z-Image Series into the diffusers library. Z-Image is a powerful and highly efficient 6B-parameter image generation model that is friendly to consumer-grade hardware, with strong capabilities in photorealistic image generation, accurate rendering of complex text in both Chinese and English, and robust adherence to bilingual instructions. The technical report and the Z-Image-Turbo checkpoint will be released very soon.

Thanks to @yiyixuxu and @apolinario for their support.

Collaborator

@yiyixuxu left a comment

thanks so much for the PR!
I left some feedback. I think the most important change is the attention_backend: we should be able to refactor using dispatch_attention_fn so it works out of the box with both naive and flash_varlen, instead of handling both manually.
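For context, a rough sketch of the suggested refactor is below. The tensor layout and the import path are assumptions for this sketch; check diffusers.models.attention_dispatch for the exact signature.

    import torch
    from diffusers.models.attention_dispatch import dispatch_attention_fn

    # Instead of branching on the backend manually inside the attention processor,
    # a single dispatch call routes to whichever backend is active (native SDPA,
    # flash, flash_varlen, ...). The (batch, seq_len, num_heads, head_dim) layout
    # below is assumed for illustration.
    query = torch.randn(1, 16, 8, 64)
    key = torch.randn(1, 16, 8, 64)
    value = torch.randn(1, 16, 8, 64)

    out = dispatch_attention_fn(query, key, value)  # same layout on the way out
    print(out.shape)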

@yiyixuxu
Collaborator

@bot /style

@github-actions
Contributor

github-actions bot commented Nov 24, 2025

Style fix runs successfully without any file modified.

Review context (excerpted lines under discussion):

    # Hardcoded for now because pytorch does not support tuple/int type hints
    window_size = (-1, -1)
    out, lse, *_ = flash_attn_3_func(
    max_seqlen_q = q.shape[2]
Collaborator

ohh, what's the reason to use _flash_attn_forward instead here?

Contributor Author

@JerryWu-code commented Nov 25, 2025

Sorry, that may be due to a flash-attention 3 version mismatch with yours: the older version ("flash-attn-3==3.0.0b1") only returns out from the default flash_attn_func, so it was not compatible with the lse & *_ unpacking. We will fix this in the next commit.

Contributor Author

Partially fixed flash_attn3 in https://github.com/JerryWu-code/diffusers/blob/8a6cb74e7319433126ab09526288ada496a83523/src/diffusers/models/attention_dispatch.py#L651, but it is not included in this pull request, to keep the merge simple.

Review context (excerpted import lines under discussion):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from einops import rearrange
Collaborator

ohh I missed this, can we remove the einops dependency too?
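For reference, a typical rearrange call can usually be replaced with plain torch reshapes. The pattern below is illustrative only and not necessarily the exact one used in the Z-Image code.

    import torch

    batch, seq_len, num_heads, head_dim = 2, 16, 8, 64
    x = torch.randn(batch, seq_len, num_heads * head_dim)

    # einops: rearrange(x, "b s (h d) -> b h s d", h=num_heads)
    y = x.unflatten(-1, (num_heads, head_dim)).permute(0, 2, 1, 3)

    # einops: rearrange(y, "b h s d -> b s (h d)")
    x_back = y.permute(0, 2, 1, 3).reshape(batch, seq_len, -1)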

Contributor Author

Sure, we will fix that ~

Contributor Author

Fixed in 1dd8f3c.

@JerryWu-code
Contributor Author

The newest branch a74a0c4 should be ready to merge; it is already merged with the main branch now~. Please check whether there is anything further to fix 😊

@ChrisLiu6
Contributor

ChrisLiu6 commented Nov 25, 2025

@yiyixuxu
By the way, while testing the _flash_3 and _flash_varlen_3 backends, we noticed that the current implementation in attention_dispatch.py is incompatible with the latest Flash Attention 3 APIs.

The recent FA3 commit (Dao-AILab/flash-attention@203b9b3) introduced a return_attn_probs argument and changed the default behavior. The functions now return a single output tensor by default (instead of a tuple), which causes the current tuple unpacking logic in diffusers to fail:

    out, lse, *_ = flash_attn_3_varlen_func(
        q=query_packed,
        k=key_packed,
        v=value_packed,
        cu_seqlens_q=cu_seqlens_q,
        cu_seqlens_k=cu_seqlens_k,
        max_seqlen_q=max_seqlen_q,
        max_seqlen_k=max_seqlen_k,
        softmax_scale=scale,
        causal=is_causal,
    )

    out, lse, *_ = flash_attn_3_func(
        q=q,
        k=k,
        v=v,
        softmax_scale=softmax_scale,
        causal=causal,
        qv=qv,
        q_descale=q_descale,
        k_descale=k_descale,
        v_descale=v_descale,
        window_size=window_size,
        attention_chunk=attention_chunk,
        softcap=softcap,
        num_splits=num_splits,
        pack_gqa=pack_gqa,
        deterministic=deterministic,
        sm_margin=sm_margin,
    )

We have implemented a fix that handles this while maintaining backward compatibility:

JerryWu-code@de4c6f1#diff-b027e126a86a26981384b125714e0f3bd9923eaa8322f1ae5f6b53fe3e3481c2
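For readers following along, a minimal sketch of the idea is below; the helper name is hypothetical and this is not the exact code in the linked commit.

    # Normalize the FA3 return value so both the old API (a tuple starting with
    # (out, lse, ...)) and the new API (a single output tensor) unpack cleanly.
    def _normalize_fa3_output(result):
        if isinstance(result, tuple):
            return result[0], result[1]
        return result, None

    # Usage at the existing call sites would look like:
    #   out, lse = _normalize_fa3_output(flash_attn_3_func(q=q, k=k, v=v, ...))
    #   out, lse = _normalize_fa3_output(flash_attn_3_varlen_func(q=query_packed, ...))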

Should we include this fix in the current PR, or would you prefer us to open a separate PR for it?

Collaborator

@yiyixuxu left a comment

thanks!

Collaborator

let's move this folder for now!
we can add tests in a follow-up PR!

Contributor Author

Sure, we will add unit tests for all cases in the next commit ~ 😊

Contributor Author

Another PR for the unit tests is available at #12715. Thanks for reviewing, @yiyixuxu 😊 ~

@yiyixuxu
Collaborator

@ChrisLiu6
oh thanks! let's do this in a separate PR since we are eager to get this one in!

Replying to "Should we include this fix in the current PR, or would you prefer us to open a separate PR for it?":
we also need tests & docs in a follow-up PR too :)

@yiyixuxu
Collaborator

also run make fix-copies and make style

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ChrisLiu6
Contributor

Hi @yiyixuxu, ready for merge 😄

@yiyixuxu merged commit 4088e8a into huggingface:main on Nov 25, 2025 (9 of 11 checks passed).
@tin2tin

tin2tin commented Nov 26, 2025

@JerryWu-code
Super cool. I would like to test it. Is there some code example somewhere on how to use it with Diffusers?

I found it on HF.
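For anyone looking for a starting point, a minimal text-to-image sketch is below. The checkpoint id "Tongyi-MAI/Z-Image-Turbo", the step count, and the guidance value are assumptions taken from the model card rather than this thread; double-check them against the official docs.

    import torch
    from diffusers import DiffusionPipeline

    # "Tongyi-MAI/Z-Image-Turbo" is assumed here; verify the actual repo id on the Hub.
    pipe = DiffusionPipeline.from_pretrained(
        "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    image = pipe(
        prompt="A neon street sign that reads 'Z-Image', rainy night, photorealistic",
        num_inference_steps=8,   # turbo/distilled checkpoints are tuned for few steps
        guidance_scale=1.0,      # assumed: distilled models usually run with little or no CFG
    ).images[0]
    image.save("z_image_turbo.png")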

@tin2tin

tin2tin commented Nov 27, 2025

Another question: it doesn't seem to accept image input in the Diffusers implementation, but does the model accept image input? Someone in the ComfyUI community claims it does, but the HF Space doesn't have this feature.

So, is image input possible or not?

@asomoza
Member

asomoza commented Nov 27, 2025

@tin2tin the edit model will be released soon; you can see it in the GitHub repo. Not sure why someone is claiming that the turbo model accepts image inputs, as it is a pure t2i model.
