
Feature Request: Add kv-quant fa kernel variants for head sizes other than 128 #12989

@pl752

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Currently llama.cpp has many variants of the FlashAttention (FA) kernel for head size 128, but only a few for head sizes 64 and 256. As a result, using -ctk != f16 with a model whose head size is not 128 causes the attention op to fall back to the CPU.
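
For context, a minimal sketch of the dispatch behaviour described above, assuming a support check of roughly this shape (the names and the exact coverage below are illustrative, not the actual ggml-cuda code):

```cpp
enum kv_type { KV_F16, KV_Q8_0, KV_Q4_0 };

// Hypothetical support check: a specialized kernel must have been compiled in
// for this exact (head size, K type, V type) combination, otherwise the FA op
// is not offloaded and runs on the CPU instead.
bool fa_kernel_available(int head_size, kv_type type_k, kv_type type_v) {
    if (type_k == KV_F16 && type_v == KV_F16) {
        return true;          // f16 KV cache: kernels exist for the common head sizes
    }
    // Quantized KV cache: head size 128 has many type combinations compiled in,
    // 64 and 256 only a few (simplified to "none" here for illustration).
    return head_size == 128;
}
```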

Motivation

Llama 3.2 1B and Gemma 3 12B use head sizes 64 and 256 respectively, and both appear to be quite popular models for some applications.

Possible Implementation

More kernel templates need to be added. A flag such as GGML_CUDA_FA_ALL_KVQ_HS could also be added to opt into these templates, since I understand that adding them will increase compilation times dramatically.
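
A rough sketch of the idea, assuming the extra head sizes are provided as additional explicit template instantiations gated behind the proposed opt-in compile flag (the template and enum names are placeholders, not the real ggml-cuda symbols):

```cpp
enum kv_type { KV_F16, KV_Q8_0, KV_Q4_0 };

// One templated launcher per (head size, K type, V type) combination; in the
// real backend this would launch a CUDA kernel specialized for that case.
template <int HEAD_SIZE, kv_type TYPE_K, kv_type TYPE_V>
void flash_attn_kvq_case() {
    // kernel launch specialized for HEAD_SIZE and the quantized K/V layouts
}

// Always compiled: the common head size.
template void flash_attn_kvq_case<128, KV_Q8_0, KV_Q8_0>();
template void flash_attn_kvq_case<128, KV_Q4_0, KV_Q4_0>();

#ifdef GGML_CUDA_FA_ALL_KVQ_HS
// Opt-in extras: head size 64 (e.g. Llama 3.2 1B) and 256 (e.g. Gemma 3 12B).
// Keeping these behind the flag leaves default build times unchanged.
template void flash_attn_kvq_case< 64, KV_Q8_0, KV_Q8_0>();
template void flash_attn_kvq_case< 64, KV_Q4_0, KV_Q4_0>();
template void flash_attn_kvq_case<256, KV_Q8_0, KV_Q8_0>();
template void flash_attn_kvq_case<256, KV_Q4_0, KV_Q4_0>();
#endif
```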
