[Bug] Slow on gfx1150 with both Vulkan and ROCm builds #1049

@bitgamma

Git commit

985aedd

Operating System & Version

Arch

GGML backends

Vulkan, HIP

Command-line arguments used

./bin/sd --diffusion-model /data/comfyui/models/diffusion_models/z_image_turbo_bf16.safetensors --vae /data/comfyui/models/vae/ae.safetensors --llm /data/comfyui/models/text_encoders/qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 -v --offload-to-cpu -H 512 -W 512 --rng cpu --steps 5 --rng cpu --seed 1061061743296960 --scheduler simple

Steps to reproduce

Run the command above with either the Vulkan or the ROCm build of sd on a gfx1150 GPU.

What you expected to happen

A little over 2 s/it, as in ComfyUI (which uses ROCm).

What actually happened

12 s/it with the Vulkan build and 14 s/it with the ROCm build.
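
To put the gap in numbers, here is a minimal sketch of the slowdown relative to the ComfyUI figure. The 2.0 s/it baseline is an assumption standing in for "a little over 2 s/it", so the printed ratios are upper bounds rather than measurements.

```python
# Rough slowdown of each sd build relative to the ComfyUI baseline.
# ASSUMPTION: 2.0 s/it stands in for "a little over 2 s/it", so the
# ratios below slightly overstate the real gap.
BASELINE_S_PER_IT = 2.0

# Seconds per iteration as reported in this issue.
observed = {"Vulkan": 12.0, "ROCm": 14.0}


def slowdown(observed_s_per_it: float, baseline_s_per_it: float = BASELINE_S_PER_IT) -> float:
    """Return how many times slower the observed rate is than the baseline."""
    return observed_s_per_it / baseline_s_per_it


for backend, s_per_it in observed.items():
    print(f"{backend}: {s_per_it:.0f} s/it, up to {slowdown(s_per_it):.1f}x slower")
```

With these inputs both builds come out at roughly 6x to 7x slower than ComfyUI.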

Logs / error messages / stack trace

System Info:
SSE3 = 1 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | VSX = 0 |
SDCliParams {
mode: img_gen,
output_path: "output.png",
verbose: true,
color: false,
canny_preprocess: false,
preview_method: none,
preview_interval: 1,
preview_path: "preview.png",
preview_fps: 16,
taesd_preview: false,
preview_noisy: false
}
SDContextParams {
n_threads: 12,
model_path: "",
clip_l_path: "",
clip_g_path: "",
clip_vision_path: "",
t5xxl_path: "",
llm_path: "/data/comfyui/models/text_encoders/qwen_3_4b.safetensors",
llm_vision_path: "",
diffusion_model_path: "/data/comfyui/models/diffusion_models/z_image_turbo_bf16.safetensors",
high_noise_diffusion_model_path: "",
vae_path: "/data/comfyui/models/vae/ae.safetensors",
taesd_path: "",
esrgan_path: "",
control_net_path: "",
embedding_dir: "",
wtype: NONE,
tensor_type_rules: "",
lora_model_dir: "",
photo_maker_path: "",
rng_type: cpu,
sampler_rng_type: NONE,
flow_shift: INF
offload_params_to_cpu: true,
control_net_cpu: false,
clip_on_cpu: false,
vae_on_cpu: false,
diffusion_flash_attn: false,
diffusion_conv_direct: false,
vae_conv_direct: false,
chroma_use_dit_mask: true,
chroma_use_t5_mask: false,
chroma_t5_mask_pad: 1,
prediction: NONE,
lora_apply_mode: auto,
vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
force_sdxl_vae_conv_scale: false
}
SDGenerationParams {
prompt: "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic",
negative_prompt: "",
clip_skip: -1,
width: 512,
height: 512,
batch_count: 1,
init_image_path: "",
end_image_path: "",
mask_image_path: "",
control_image_path: "",
ref_image_paths: [],
control_video_path: "",
auto_resize_ref_image: true,
increase_ref_index: false,
pm_id_images_dir: "",
pm_id_embed_path: "",
pm_style_strength: 20,
skip_layers: [7, 8, 9],
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: simple, sample_method: NONE, sample_steps: 5, eta: 0.00, shifted_timestep: 0),
high_noise_skip_layers: [7, 8, 9],
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
easycache_option: "",
easycache: disabled (threshold=0, start=0, end=0),
moe_boundary: 0.875,
video_frames: 1,
fps: 16,
vace_strength: 1,
strength: 0.75,
control_strength: 0.9,
seed: 1061061743296960,
upscale_repeats: 1,
}
[DEBUG] stable-diffusion.cpp:167 - Using Vulkan backend
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: 0 = AMD Radeon 890M Graphics (RADV GFX1150) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
[INFO ] stable-diffusion.cpp:234 - loading diffusion model from '/data/comfyui/models/diffusion_models/z_image_turbo_bf16.safetensors'
[INFO ] model.cpp:385 - load /data/comfyui/models/diffusion_models/z_image_turbo_bf16.safetensors using safetensors format
[DEBUG] model.cpp:515 - init from '/data/comfyui/models/diffusion_models/z_image_turbo_bf16.safetensors', prefix = 'model.diffusion_model.'
[INFO ] stable-diffusion.cpp:281 - loading llm from '/data/comfyui/models/text_encoders/qwen_3_4b.safetensors'
[INFO ] model.cpp:385 - load /data/comfyui/models/text_encoders/qwen_3_4b.safetensors using safetensors format
[DEBUG] model.cpp:515 - init from '/data/comfyui/models/text_encoders/qwen_3_4b.safetensors', prefix = 'text_encoders.llm.'
[INFO ] stable-diffusion.cpp:295 - loading vae from '/data/comfyui/models/vae/ae.safetensors'
[INFO ] model.cpp:385 - load /data/comfyui/models/vae/ae.safetensors using safetensors format
[DEBUG] model.cpp:515 - init from '/data/comfyui/models/vae/ae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:318 - Version: Z-Image
[INFO ] stable-diffusion.cpp:346 - Weight type stat: f32: 1095
[INFO ] stable-diffusion.cpp:347 - Conditioner weight type stat: f32: 398
[INFO ] stable-diffusion.cpp:348 - Diffusion model weight type stat: f32: 453
[INFO ] stable-diffusion.cpp:349 - VAE weight type stat: f32: 244
[DEBUG] stable-diffusion.cpp:351 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151665
[DEBUG] ggml_extend.hpp:1873 - qwen3 params backend buffer size = 7672.62 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1873 - z_image params backend buffer size = 23479.11 MB(RAM) (453 tensors)
[DEBUG] ggml_extend.hpp:1873 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:683 - loading weights
[DEBUG] model.cpp:1363 - using 12 threads for model loading
[DEBUG] model.cpp:1385 - loading tensors from /data/comfyui/models/diffusion_models/z_image_turbo_bf16.safetensors
|====================> | 453/1095 - 109.84it/s
[DEBUG] model.cpp:1385 - loading tensors from /data/comfyui/models/text_encoders/qwen_3_4b.safetensors
|======================================> | 851/1095 - 116.51it/s
[DEBUG] model.cpp:1385 - loading tensors from /data/comfyui/models/vae/ae.safetensors
|==================================================| 1095/1095 - 145.92it/s
[INFO ] model.cpp:1588 - loading tensors completed, taking 7.50s (process: 0.00s, read: 5.71s, memcpy: 0.00s, convert: 1.05s, copy_to_backend: 0.00s)
[DEBUG] stable-diffusion.cpp:710 - finished loaded file
[INFO ] stable-diffusion.cpp:767 - total params memory size = 31246.31MB (VRAM 31246.31MB, RAM 0.00MB): text_encoders 7672.62MB(VRAM), diffusion_model 23479.11MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:850 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:3146 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3177 - sampling using Euler method
[INFO ] denoiser.hpp:388 - get_sigmas with Simple scheduler
[INFO ] stable-diffusion.cpp:3290 - TXT2IMG
[DEBUG] conditioner.hpp:1701 - parse '<|im_start|>user
A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>user
', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic', 1], ['<|im_end|>
<|im_start|>assistant
', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user
" to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", "ĠThe", "Ġcity", "Ġlights", "Ġare", "Ġa", "Ġchaotic", "Ġblur", "Ġof", "Ġneon", "Ġorange", "Ġand", "Ġcool", "Ġblue", ",", "Ġreflecting", "Ġon", "Ġthe", "Ġwet", "Ġasphalt", ".", "ĠThe", "Ġscene", "Ġev", "okes", "Ġa", "Ġsense", "Ġof", "Ġbeing", "Ġa", "Ġsingle", "Ġcomponent", "Ġin", "Ġa", "Ġvast", "Ġmachine", ".", "ĠSuper", "im", "posed", "Ġover", "Ġthe", "Ġimage", "Ġin", "Ġa", "Ġsleek", ",", "Ġmodern", ",", "Ġslightly", "Ġglitch", "ed", "Ġfont", "Ġis", "Ġthe", "Ġphilosophical", "Ġquote", ":", "Ġ'", "THE", "ĠCITY", "ĠIS", "ĠA", "ĠC", "IR", "CU", "IT", "ĠBOARD", ",", "ĠAND", "ĠI", "ĠAM", "ĠA", "ĠBRO", "KEN", "ĠTRANS", "IST", "OR", ".'", "Ġ--", "Ġmo", "ody", ",", "Ġatmospheric", ",", "Ġprofound", ",", "Ġdark", "Ġacademic", ]
[DEBUG] llm.hpp:259 - split prompt "<|im_end|>
<|im_start|>assistant
" to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
[INFO ] ggml_extend.hpp:1786 - qwen3 offload params (7672.62 MB, 398 tensors) to runtime backend (Vulkan0), taking 1.07s
[DEBUG] ggml_extend.hpp:1688 - qwen3 compute buffer size: 13.34 MB(VRAM)
[DEBUG] conditioner.hpp:1896 - computing condition graph completed, taking 1638 ms
[INFO ] stable-diffusion.cpp:2921 - get_learned_condition completed, taking 1640 ms
[INFO ] stable-diffusion.cpp:3032 - generating image: 1/1 - seed 1061061743296960
[INFO ] ggml_extend.hpp:1786 - z_image offload params (23479.11 MB, 453 tensors) to runtime backend (Vulkan0), taking 2.67s
[DEBUG] ggml_extend.hpp:1688 - z_image compute buffer size: 255.60 MB(VRAM)
|==================================================| 5/5 - 12.12s/it
[INFO ] stable-diffusion.cpp:3074 - sampling completed, taking 60.58s
[INFO ] stable-diffusion.cpp:3085 - generating 1 latent images completed, taking 60.66s
[INFO ] stable-diffusion.cpp:3088 - decoding 1 latents
[INFO ] ggml_extend.hpp:1786 - vae offload params ( 94.57 MB, 138 tensors) to runtime backend (Vulkan0), taking 0.02s
[DEBUG] ggml_extend.hpp:1688 - vae compute buffer size: 2112.25 MB(VRAM)
[DEBUG] stable-diffusion.cpp:2291 - computing vae decode graph completed, taking 9.97s
[INFO ] stable-diffusion.cpp:3098 - latent 1 decoded, taking 10.00s
[INFO ] stable-diffusion.cpp:3102 - decode_first_stage completed, taking 10.00s
[INFO ] stable-diffusion.cpp:3398 - generate_image completed in 72.30s
save result PNG image to 'output.png' (success)
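
The following is a minimal sketch for summarising where the time goes in the Vulkan log above. It pulls the stage timings out of the "completed" lines and reports each as a share of the total. The regular expression and the `stage_seconds` helper are illustrative assumptions based on the line shapes in this particular log, not part of sd itself.

```python
import re

# A few timing lines copied verbatim from the log above.
LOG = """\
[INFO ] stable-diffusion.cpp:2921 - get_learned_condition completed, taking 1640 ms
[INFO ] stable-diffusion.cpp:3074 - sampling completed, taking 60.58s
[INFO ] stable-diffusion.cpp:3102 - decode_first_stage completed, taking 10.00s
[INFO ] stable-diffusion.cpp:3398 - generate_image completed in 72.30s
"""

# Matches "<stage> completed, taking 60.58s", "<stage> completed, taking 1640 ms"
# and "<stage> completed in 72.30s". Assumed from the lines seen in this log only.
PATTERN = re.compile(r" - (\w+) completed(?:, taking| in) ([\d.]+) ?(ms|s)\b")


def stage_seconds(log: str) -> dict[str, float]:
    """Map each stage name to its duration in seconds."""
    out: dict[str, float] = {}
    for line in log.splitlines():
        m = PATTERN.search(line)
        if m:
            name, value, unit = m.groups()
            out[name] = float(value) / 1000.0 if unit == "ms" else float(value)
    return out


times = stage_seconds(LOG)
total = times["generate_image"]
for name, seconds in times.items():
    if name != "generate_image":
        print(f"{name}: {seconds:.2f}s ({100 * seconds / total:.1f}% of {total:.2f}s)")
```

On this log, sampling accounts for about 84% of the 72.30 s run, with the VAE decode taking most of the remainder, so the per-step sampling cost is what separates these builds from the ComfyUI result.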

Additional context / environment details

No response
