Tweaking tile geometry for MoE on AMD KHR_coopmat (Vulkan) #22598
accaldwell
started this conversation in
General
I've been testing a change to the MUL_MAT_ID Vulkan kernel that adds a non-square tile for AMD hardware with coopmat support. Currently the M tile (64x64) is used for the workloads I was exploring, and the original L tile (128x128) is disabled. I confirmed that 128x128 performs worse than 64x64 on my hardware, so this seems sensible. However, for some MoE workloads, 128x32 is noticeably faster than 64x64: heavy users of MUL_MAT_ID see 7-10% increases in prompt processing speed on Strix Halo.

The math here is unchanged, so perplexity and generated tokens are identical before and after this change. Token generation speed is also unchanged.
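One possible way to reason about why tile shape matters at equal tile area is padding along the token dimension. The sketch below is only an illustration under assumptions not stated in this post: that the second tile dimension runs along the tokens routed to one expert, that partially filled tiles are padded, and that the sizes used (4096 output rows, 24 tokens) are representative. The function name `tile_utilization` is hypothetical and is not part of llama.cpp.

```python
from math import ceil

def tile_utilization(rows, cols, tile_m, tile_n):
    """Fraction of the padded tile grid that covers real output elements."""
    # Round each dimension up to a whole number of tiles.
    padded_rows = ceil(rows / tile_m) * tile_m
    padded_cols = ceil(cols / tile_n) * tile_n
    return (rows * cols) / (padded_rows * padded_cols)

# Illustrative sizes only (assumptions, not measurements from this post):
# 4096 output rows, 24 tokens routed to a single expert.
rows, tokens = 4096, 24

square = tile_utilization(rows, tokens, 64, 64)   # current M tile
wide = tile_utilization(rows, tokens, 128, 32)    # proposed 128x32 tile

print(f"64x64  utilization: {square:.3f}")
print(f"128x32 utilization: {wide:.3f}")
```

Both tiles cover 4096 elements, but under these assumptions the 128x32 shape wastes less work when an expert receives only a few tokens. Whether this is what actually drives the measured speedup has not been verified here.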
Any thoughts on whether this is worth continuing to work on? The code was written with LLM assistance, which I understand is frowned upon here, but it's also a fairly small change (+40 lines, about half of which are comments). I reviewed the code, did all the benchmarking and other testing, and wrote this post myself.
My change is currently gated to AMD hardware with KHR_coopmat. If anyone is interested in testing, the code is here: https://github.com/accaldwell/llama.cpp/tree/ac/vulkan-mmid-l-tile-amd