fix: remove kv padding from flash attention wrapper by leejet · Pull Request #1453 · leejet/stable-diffusion.cpp

leejet · 2026-04-22T18:00:34Z

Most backends already handle non-256 KV lengths internally or fall back via backend support checks. Avoid generating synthetic padding masks, which can trigger incorrect Vulkan flash attention output for short prompt lengths.

Fix #1431.

daniandtheweb · 2026-04-23T00:08:39Z

I've just tested the changes and this still doesn't fix the issue. The issue is not only happening on short prompts but on any prompt lenght using flash attention on vulkan for Ernie and Anima models.

leejet · 2026-04-23T14:08:43Z

I’ve tried to fix it, and it’s working properly on my device now. @daniandtheweb Could you pull the latest commit and give it another try? Also, don’t forget to sync the ggml submodule.

git submodule sync --recursive
git submodule update --init --recursive --force

leejet · 2026-04-23T14:16:22Z

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\ernie-image-UD-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\ministral-3-3b.safetensors -p "a lovely cat" --cfg-scale 5.0 -v --offload-to-cpu --diffusion-fa

before ggml update

after ggml update

daniandtheweb · 2026-04-23T14:26:49Z

I've done a clean build using this branch and the issue is still there. My current prompt is taken from civitai:

./sd-cli -M img_gen -p "year 2023, year 2024, year 2025, highres,masterpiece, best quality, score_7, score_8, score_9, @miclot, safe, a group of five anime girls and a small dog posing for a selfie in a snowy landscape, the girl in the foreground has long pink hair and purple eyes, wearing a teal beanie with white stripes and a white puffer jacket, smiling widely and making a peace sign with her left hand, the girl to the left has short brown hair and glasses, wearing a maroon beanie and a light blue jacket, raising her right hand in a peace sign, the girl in the center has black hair and brown eyes, wearing a black beanie and a dark jacket, looking at the camera, the girl to the right has short purple hair and purple eyes, wearing a striped scarf and a green jacket, looking at the camera, the girl in the back has blonde hair and green eyes, wearing a white beanie and a yellow jacket, making a peace sign with her left hand, the dog has brown and white fur, sticking its tongue out, the background features a clear blue sky and snow-covered mountains, the scene is bright and sunny with natural lighting, the colors are vibrant with a mix of cool and warm tones, the composition is a close-up shot with the characters filling most of the frame, the focus is on the group's happy expressions and the snowy environment. <lora:anima-turbo-lora-v0.1:1>" -n "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia,watermark, mosaic censoring, bar censor," --sampling-method euler --steps 8 -W 1024 -H 1024 -b 1 --cfg-scale 1 -s 1327417454 --clip-skip -1 --embd-dir /home/daniandtheweb/Workspace/sd.cpp-webui/models/embeddings/ --lora-model-dir /home/daniandtheweb/Workspace/sd.cpp-webui/models/loras/ -t 0 --rng cuda --sampler-rng cuda --lora-apply-mode auto -o /home/daniandtheweb/Workspace/sd.cpp-webui/outputs/txt2img/1765151592.png --diffusion-model /home/daniandtheweb/Workspace/sd.cpp-webui/models/unet/anima-preview3-base.safetensors --vae /home/daniandtheweb/Workspace/sd.cpp-webui/models/vae/qwen_image_vae.safetensors --llm /home/daniandtheweb/Workspace/sd.cpp-webui/models/text_encoders/qwen_3_06b_base.safetensors --scheduler simple --vae-tile-overlap 0.5 --vae-tile-size 32x32 --preview proj --preview-path /home/daniandtheweb/Workspace/sd.cpp-webui/outputs/txt2img/1765151592_preview.png --preview-interval 1 --vae-tiling --fa --vae-conv-direct --mmap --color

Here's the progression of the preview, in case it can help solving the issue:

1 step	2 steps	3 steps	4 steps	5 steps	6 steps	7 steps	8 steps

Without flash attention the resulting image comes out just fine:

The same issue still remains on Ernie on my end.

This has been tested on Linux on a radeon rx 7800xt with both the official mesa drivers and the git ones. I also tried disabling cooperative matrix and int dot acceleration for vulkan but the result is the same, with flash attention the generation breaks down.

leejet · 2026-04-23T14:54:17Z

Does the simplest txt2img pipeline—like the one below—also cause issues on your side?

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\anima-preview.safetensors --vae ..\..\ComfyUI\models\vae\qwen_image_vae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_06b_base.safetensors  -p "a lovely cat holding a sign says 'anima.cpp'" --cfg-scale 6.0 --sampling-method euler -v --offload-to-cpu

daniandtheweb · 2026-04-23T15:50:04Z

Does the simplest txt2img pipeline—like the one below—also cause issues on your side?

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\anima-preview.safetensors --vae ..\..\ComfyUI\models\vae\qwen_image_vae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_06b_base.safetensors  -p "a lovely cat holding a sign says 'anima.cpp'" --cfg-scale 6.0 --sampling-method euler -v --offload-to-cpu

This specific command works as expected.

Here's the simplest reproduction of the issue that I've been able to achieve for now:

This works:

./sd-cli -p "a cat" --sampling-method euler --steps 8 --cfg-scale 1 --diffusion-model /home/daniandtheweb/Workspace/sd.cpp-webui/models/unet/ernie-image-turbo-Q8_0.gguf --vae /home/daniandtheweb/Workspace/sd.cpp-webui/models/vae/flux2-vae.safetensors --llm /home/daniandtheweb/Workspace/sd.cpp-webui/models/text_encoders/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf --fa --vae-conv-direct

This doesn't work:

./sd-cli -p "a cat" --sampling-method euler --steps 8 -W 1024 -H 1024 --cfg-scale 1 --diffusion-model /home/daniandtheweb/Workspace/sd.cpp-webui/models/unet/ernie-image-turbo-Q8_0.gguf --vae /home/daniandtheweb/Workspace/sd.cpp-webui/models/vae/flux2-vae.safetensors --llm /home/daniandtheweb/Workspace/sd.cpp-webui/models/text_encoders/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf --fa --vae-conv-direct

But removing flash attention makes it work also on 1024x1024:

./sd-cli -p "a cat" --sampling-method euler --steps 8 -W 1024 -H 1024 --cfg-scale 1 --diffusion-model /home/daniandtheweb/Workspace/sd.cpp-webui/models/unet/ernie-image-turbo-Q8_0.gguf --vae /home/daniandtheweb/Workspace/sd.cpp-webui/models/vae/flux2-vae.safetensors --llm /home/daniandtheweb/Workspace/sd.cpp-webui/models/text_encoders/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf --vae-conv-direct

fix: remove kv padding from flash attention wrapper

a5dde30

update ggml

a7c56d3

fix: install SPIR-V headers for Vulkan builds

4ef76b6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: remove kv padding from flash attention wrapper#1453

fix: remove kv padding from flash attention wrapper#1453
leejet wants to merge 3 commits intomasterfrom
remove-kv-pad-for-flash-attn

leejet commented Apr 22, 2026

Uh oh!

daniandtheweb commented Apr 23, 2026 •

edited

Loading

Uh oh!

leejet commented Apr 23, 2026

Uh oh!

leejet commented Apr 23, 2026

Uh oh!

daniandtheweb commented Apr 23, 2026 •

edited

Loading

Uh oh!

leejet commented Apr 23, 2026

Uh oh!

daniandtheweb commented Apr 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

leejet commented Apr 22, 2026

Uh oh!

daniandtheweb commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leejet commented Apr 23, 2026

Uh oh!

leejet commented Apr 23, 2026

before ggml update

after ggml update

Uh oh!

daniandtheweb commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leejet commented Apr 23, 2026

Uh oh!

daniandtheweb commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

daniandtheweb commented Apr 23, 2026 •

edited

Loading

daniandtheweb commented Apr 23, 2026 •

edited

Loading

daniandtheweb commented Apr 23, 2026 •

edited

Loading