Add Support for Z-Image Series #12703
Conversation
yiyixuxu left a comment:
thanks so much for the PR!
I left some feedback. I think the most important change is the attention_backend: we should be able to refactor it using dispatch_attention_fn so that it works out of the box with both the naive and flash_varlen backends, instead of handling both manually.
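A minimal sketch of what that refactor could look like inside the attention processor; the tensor layout, keyword names, and the `self._attention_backend` attribute are assumptions based on how other diffusers processors call `dispatch_attention_fn`, not code from this PR:

```python
# Sketch only: assumes a (batch, seq_len, heads, head_dim) layout and that the
# processor stores its selected backend on `self._attention_backend`.
from diffusers.models.attention_dispatch import dispatch_attention_fn


def run_attention(self, query, key, value, attention_mask=None):
    # dispatch_attention_fn routes to the registered backend (native SDPA,
    # flash_varlen, ...) so the processor no longer branches manually.
    hidden_states = dispatch_attention_fn(
        query,
        key,
        value,
        attn_mask=attention_mask,
        backend=getattr(self, "_attention_backend", None),
    )
    return hidden_states
```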
…, Remove once func in pipeline.
…ryWu-code/z-image # Conflicts: # src/diffusers/models/transformers/transformer_z_image.py
…peat; Add hint for attn processor.
…ace its origin implement; Add DocString in pipeline for that.
@bot /style

Style fix ran successfully without modifying any files.
# Hardcoded for now because pytorch does not support tuple/int type hints
window_size = (-1, -1)
out, lse, *_ = flash_attn_3_func(
max_seqlen_q = q.shape[2]
ohh, what's the reason to use _flash_attn_forward instead here?
Sorry, that may be due to a mismatch between our flash-attention 3 version and yours: the older version ("flash-attn-3==3.0.0b1") only returns `out` from the default `flash_attn_func`, so it is not compatible with the `lse` & `*_` unpacking. We will fix this in the next commit.
Partially fixed flash_attn3 in https://github.com/JerryWu-code/diffusers/blob/8a6cb74e7319433126ab09526288ada496a83523/src/diffusers/models/attention_dispatch.py#L651, but that fix is not included in this pull request, to keep the merge simple.
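For context, a minimal sketch (not the actual diffusers code) of tolerating both flash-attn-3 return conventions discussed above, where older builds return only the output tensor and newer ones return `(out, lse, ...)`; `flash_attn_3_func` is the alias used in the quoted diff:

```python
# Sketch only: arguments beyond q, k, v are intentionally omitted.
result = flash_attn_3_func(q, k, v)

if isinstance(result, tuple):
    # newer flash-attn-3 builds: (out, lse, *extras)
    out, lse = result[0], result[1]
else:
    # older builds (e.g. flash-attn-3==3.0.0b1): only the output tensor
    out, lse = result, None
```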
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
ohh I missed this, can we remove the einops dependency too?
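For illustration, one typical way such an einops call can be dropped; the pattern below is a generic example, not necessarily the exact rearrange used in transformer_z_image.py:

```python
import torch

batch, seq_len, num_heads, head_dim = 2, 16, 8, 64
x = torch.randn(batch, seq_len, num_heads * head_dim)

# einops version (for reference):
#   x = rearrange(x, "b s (h d) -> b s h d", h=num_heads)

# pure-PyTorch equivalent, no extra dependency:
x = x.unflatten(-1, (num_heads, head_dim))  # (batch, seq_len, heads, head_dim)
```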
Sure, we will fix that ~
Fixed in 1dd8f3c.
…rd, replace its origin implement; Add DocString in pipeline for that." This reverts commit fbf26b7.
Force-pushed from 6c0c059 to 71e8049.
…al commit for fa3 compatibility.
… pre-encode as List of torch Tensor.
This newest branch a74a0c4 should be ready to merge; it is already up to date with the main branch. Please check whether there is anything further to fix 😊
@yiyixuxu The recent FA3 commit (Dao-AILab/flash-attention@203b9b3) introduced a

We have implemented a fix that handles this while maintaining backward compatibility: JerryWu-code@de4c6f1#diff-b027e126a86a26981384b125714e0f3bd9923eaa8322f1ae5f6b53fe3e3481c2 Should we include this fix in the current PR, or would you prefer us to open a separate PR for it?
yiyixuxu left a comment:
thanks!
let's move this folder for now!
we can add tests in a follow-up PR!
Sure, we will add unit tests for all cases in the next commit ~ 😊
@ChrisLiu6 this also needs tests & docs in a follow-up PR too :)

also run
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @yiyixuxu, ready for merge 😄
@JerryWu-code I found it on HF.
Another question: the Diffusers implementation doesn't seem to accept image input, but does the model itself accept it? Someone in the ComfyUI community claims it does, but the HF Space doesn't have this feature. So, is image input possible or not?
@tin2tin the edit model is
What does this PR do?
This PR introduces the Z-Image series into the diffusers library. Z-Image is a powerful and highly efficient 6B-parameter image generation model that is friendly to consumer-grade hardware, with strong capabilities in photorealistic image generation, accurate rendering of both complex Chinese and English text, and robust adherence to bilingual instructions. The technical report and the Z-Image-Turbo checkpoint will be released very soon.
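A hypothetical usage sketch for reviewers; the checkpoint id and the step count below are placeholders and are not defined by this PR:

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder checkpoint id; the actual Z-Image-Turbo repo id is not part of this PR.
pipe = DiffusionPipeline.from_pretrained("<org>/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="A street sign reading 'Hangzhou 杭州' on a rainy evening, photorealistic",
    num_inference_steps=8,  # illustrative value for a turbo-style model
).images[0]
image.save("z_image_sample.png")
```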
Thanks to @yiyixuxu and @apolinario for their support.