add z-image support #1020
Conversation
leejet commented Nov 29, 2025
There seems to be an issue with the text rendering (or possibly long prompts). Comparing with rmatif's implementation from #1018: (Vulkan + radv)
To support llama.cpp Qwen3 quants:
I'll update it later, including compatibility with different kinds of LoRA models.
I'm getting fully black images on ROCm (Linux), even on the first image preview. My card:
Same with the other PR; in most cases I get black images after the second step. (CUDA, RTX 2070)
@Green-Sky Could you share the command?
Works nicely on an MI50 with ROCm. Thanks!
Also black at step 3 with lower resolution, without flash attention. That specific quant mix is from https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/blob/main/z_image_turbo-Q3_K_M.gguf.
👉 But it works with …
I only tested up to 1024x1024. Driver Version: 575.51.02
I uploaded a few quantized models to https://huggingface.co/wbruna/Z-Image-Turbo-sdcpp-GGUF/tree/main to make testing the lower quants a bit easier. Even legacy q4 can work very well: this is with 5 steps, a q8_0 LLM, euler + sgm_uniform, hitting ~4 GB peak VRAM with Vulkan + …
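For anyone wanting to reproduce a similar setup, here is a sketch of the corresponding invocation. The model file names are placeholders and the flag spellings are my assumption from sd.cpp's help output, so double-check against `sd --help`:

```sh
./build/bin/sd \
  --diffusion-model models/z_image_turbo-q4_0.gguf \
  --llm models/Qwen3-4B-Q8_0.gguf \
  --vae models/ae.safetensors \
  --sampling-method euler --scheduler sgm_uniform \
  --steps 5 --cfg-scale 1.0 \
  -H 1024 -W 1024 \
  -p "a lovely plump cat"
```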
@Green-Sky This should be fixed. Could you try again?
@wbruna Fixed.
Comparison of Different Quantization Types
GGUF: https://huggingface.co/leejet/Z-Image-Turbo-GGUF/tree/main
I think this PR can be merged now. Thank you everyone!
It seems it has the same problem on ROCm as described in #968.
Since I don’t have a ROCm device to reproduce the issue, it’s difficult for me to locate and fix the problem. |
@leejet, if you could guide me through it, I could try to debug it on my card. I tried to run the …
@leejet I can set up SSH access to my ROCm PC for you; just drop me an e-mail or something.
I can confirm that the issue from #968 is also happening with this model. I also don't know how to help with troubleshooting. The NaN issue is interesting, because it's why I switched from Python/PyTorch to this project for running Qwen: I kept hitting a problem where all values were NaN.
```sh
./build/bin/sd --diffusion-model models/z_image_turbo-Q4_K_M.gguf --vae models/vae/diffusion_pytorch_model.safetensors --llm models/Qwen3-4B-Q8_0.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 -v -H 1024 -W 512 --steps 9 --seed 1061061743296960 --scheduler simple
```

```
[DEBUG] ggml_extend.hpp:1688 - z_image compute buffer size: 704.48 MB(RAM)
```
The model is in RAM, not VRAM, which indicates that the program is running on your CPU, not on the 4090. You should use the CUDA or Vulkan builds if you want to take advantage of the GPU.
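For reference, a sketch of producing a GPU-enabled build. The CMake option names are my assumption from recent sd.cpp READMEs; check the README for the flags your checkout actually supports:

```sh
# CUDA build for NVIDIA GPUs (assumes the CUDA toolkit is installed).
cmake -B build -DSD_CUDA=ON
cmake --build build --config Release

# Alternatively, a Vulkan build also runs on AMD and Intel GPUs.
cmake -B build-vulkan -DSD_VULKAN=ON
cmake --build build-vulkan --config Release
```

With a GPU backend active, that compute-buffer line in the debug log should report VRAM instead of RAM.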
@wbruna You can try changing the default value of the scale parameter in the … If the issue is not caused by precision in the …
This worked for me:

```sh
sd \
  --diffusion-model models/z_image_turbo-Q6_K.gguf \
  --vae ae.safetensors \
  --llm models/Qwen3-4B-Q8_0.gguf \
  --cfg-scale 1.0 -v \
  --offload-to-cpu \
  --diffusion-fa \
  -H 1024 -W 512 \
  -p "a lovely plump cat"
```

But it took ages... I'm on an M1 MacBook Air (16 GB memory). Is there any way to make this quicker?
@leejet , found it!

```diff
diff --git a/z_image.hpp b/z_image.hpp
index b692a14..5d07f6f 100644
--- a/z_image.hpp
+++ b/z_image.hpp
@@ -30,7 +30,7 @@ namespace ZImage {
     JointAttention(int64_t hidden_size, int64_t head_dim, int64_t num_heads, int64_t num_kv_heads, bool qk_norm)
         : head_dim(head_dim), num_heads(num_heads), num_kv_heads(num_kv_heads), qk_norm(qk_norm) {
         blocks["qkv"] = std::make_shared<Linear>(hidden_size, (num_heads + num_kv_heads * 2) * head_dim, false);
-        blocks["out"] = std::make_shared<Linear>(num_heads * head_dim, hidden_size, false);
+        blocks["out"] = std::make_shared<Linear>(num_heads * head_dim, hidden_size, false, false, false, 1.f / 8.f);
         if (qk_norm) {
             blocks["q_norm"] = std::make_shared<RMSNorm>(head_dim);
             blocks["k_norm"] = std::make_shared<RMSNorm>(head_dim);
```

That 1/8 was enough to generate images on my card. Setting force_prec_f32 worked too, while being around 30% slower. I could make a PR, but I'm not sure how to enable it only when needed?
@shakfu Are you sure you have Metal enabled? If you run it with …
People getting black images on ROCm, especially right at the beginning: please give #1034 a try. |
@stduhpf Thanks for your help. I'll run it again and check if Metal is enabled.
Why am I getting this error? … My command line:

```sh
.\build\bin\sd.exe --diffusion-model models/z-image/z_image_turbo-Q8_0.gguf --vae models/flux-extra/diffusion_pytorch_model.safetensors --llm models/z-image/qwen_3_4b.safetensors --preview proj -p "girl"
```

My device is an NVIDIA RTX 3060 with 12 GB of VRAM, on Windows 11.
@RealHacker I believe the latest release has already addressed this issue. For details, please refer to this PR: #1062. You can give it another try. |
@popters Please create an issue describing your environment, such as the commit you are using, the backend, etc., or search for existing issues and provide the relevant information. |
Hello, this is what I'm using.

### This is the log output generated after building with GPU support:

Latest status: switching to the z_image_turbo_bf16.safetensors model does generate images, but every quantized GGUF model I tried still produces a black image.
@popters, please create a separate issue so your problem can be tracked properly.



















