vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 #17455

jeffbolznv · 2025-11-23T19:31:21Z

Stacked on #17443.

This FA size came from #17012 and was suffering from heavy register spilling. Perf on RTX 4070 with coopmat1/2 disabled:

before

FLASH_ATTN_EXT(hsk=72,hsv=72,nh=16,nr23=[1,1],kv=5776,nb=5776,mask=0,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=f16,permute=[0,1,2,3]):                   6 runs - 167320.33 us/run - 153.73 GFLOP/run - 918.79 GFLOPS

after

FLASH_ATTN_EXT(hsk=72,hsv=72,nh=16,nr23=[1,1],kv=5776,nb=5776,mask=0,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=f16,permute=[0,1,2,3]):                  34 runs - 30001.44 us/run - 153.73 GFLOP/run -   5.12 TFLOPS
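For readers comparing the two runs, here is a minimal sketch that recomputes throughput and the speedup from the figures logged above. The constant and function names are illustrative only and are not part of the benchmark tool. The recomputed values can differ from the logged ones in the last digit because the GFLOP/run figure is itself rounded.

```python
# Figures copied from the two FLASH_ATTN_EXT lines above.
GFLOP_PER_RUN = 153.73   # work per run, in GFLOP
BEFORE_US = 167320.33    # time per run before the change, in microseconds
AFTER_US = 30001.44      # time per run after the change, in microseconds


def gflops(gflop_per_run: float, us_per_run: float) -> float:
    """Return throughput in GFLOPS given work per run and time per run in microseconds."""
    return gflop_per_run / (us_per_run * 1e-6)


before = gflops(GFLOP_PER_RUN, BEFORE_US)
after = gflops(GFLOP_PER_RUN, AFTER_US)
speedup = BEFORE_US / AFTER_US  # same work per run, so the time ratio is the speedup

print(f"before:  {before:.2f} GFLOPS")
print(f"after:   {after / 1000:.2f} TFLOPS")
print(f"speedup: {speedup:.2f}x")
```

The work per run is identical in both lines, so the ratio of the two times is the speedup for this hsk=72, hsv=72 case: a bit over 5.5x.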

jeffbolznv added 2 commits November 23, 2025 12:15

vulkan: more FA details in vk_perf_logger

813d43e

vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16

f6ed9e0

jeffbolznv requested review from 0cc4m and slaren as code owners November 23, 2025 19:31

github-actions bot added the testing, Vulkan, and ggml labels Nov 23, 2025

jeffbolznv mentioned this pull request Nov 23, 2025

Eval bug: Qwen3-VL-8B freezes on image processing tasks #17012

Open

loci-dev mentioned this pull request Nov 23, 2025

UPSTREAM PR #17455: vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 auroralabs-loci/llama.cpp#297

Open

0cc4m approved these changes Nov 25, 2025


0cc4m merged commit d414db0 into ggml-org:master Nov 25, 2025
71 of 74 checks passed
