Conversation

@jan-service-account
Updates dev branch with latest release (b5145) from ggml-org/llama.cpp

noemotiovon and others added 3 commits April 16, 2025 16:21
…gml-org#12931)

The grouped query attention optimization doesn't require a power of two ratio;
the only thing relying on it was the modulo operation written as a bitwise &.

split_k need not depend on gqa_ratio - enable it any time there's only one
workgroup in the X dimension. The shader gets the split index from the x coord,
and multiple workgroups in the X dimension (pre-split) indicate a larger
FA operation that wouldn't need splitting.
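The new enabling condition described above can be sketched as a single predicate. This is a hypothetical illustration of the dispatch-side logic, not the actual llama.cpp code; the names `workgroups_x` and `use_split_k` are invented for clarity:

```cpp
// Hypothetical sketch of the split_k gating described in the commit message.
// A single workgroup in X means a small flash-attention op where splitting
// the K dimension adds parallelism; multiple X workgroups (pre-split) already
// indicate a larger op with enough parallelism, so no split is needed.
bool use_split_k(int workgroups_x) {
    return workgroups_x == 1;
}
```

Note the condition no longer consults the GQA ratio at all, which is the decoupling the commit message describes.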
@jan-service-account jan-service-account merged commit e9c6088 into dev Apr 17, 2025
9 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-04-17-00-08 branch April 17, 2025 00:18