CUDA: enable FA for FP32 KV cache #16546

JohannesGaessler · 2025-10-12T21:22:59Z

Adds CUDA FlashAttention support for an FP32 KV cache. The FP32 data is converted to FP16 before being passed to the kernel, which is inefficient but simple.
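To illustrate what the FP32 to FP16 conversion step involves, here is a minimal host-side C++ sketch. It is not the code from this PR: the names `f32_to_f16_bits` and `convert_f32_to_f16` are hypothetical, and the conversion runs on the CPU with round-to-nearest-even for clarity. On the GPU, an element-wise kernel using the `__float2half` intrinsic from `cuda_fp16.h` would presumably play the same role.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: convert one IEEE 754 binary32 value to binary16 bits,
// rounding to nearest even.
static uint16_t f32_to_f16_bits(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);              // reinterpret the float's bits
    const uint32_t sign = (x >> 16) & 0x8000u;  // sign moved to bit 15
    const uint32_t mag  = x & 0x7FFFFFFFu;      // magnitude without sign
    const uint32_t e    = mag >> 23;            // biased FP32 exponent

    if (mag >  0x7F800000u) return static_cast<uint16_t>(sign | 0x7E00u); // NaN
    if (mag >= 0x47800000u) return static_cast<uint16_t>(sign | 0x7C00u); // >= 65536 or Inf
    if (e < 102)            return static_cast<uint16_t>(sign);           // < 2^-25 rounds to 0

    uint32_t h, rem, half;
    if (e < 113) {
        // Result is an FP16 subnormal: shift the 24-bit significand into place.
        const uint32_t m     = (mag & 0x007FFFFFu) | 0x00800000u;
        const uint32_t shift = 126 - e;         // 14..24
        h    = m >> shift;
        rem  = m & ((1u << shift) - 1u);
        half = 1u << (shift - 1u);
    } else {
        // Result is an FP16 normal: rebias the exponent (127 -> 15), drop 13 mantissa bits.
        h    = ((e - 112u) << 10) | ((mag & 0x007FFFFFu) >> 13);
        rem  = mag & 0x1FFFu;
        half = 0x1000u;
    }
    // Round to nearest, ties to even. A carry out of the mantissa correctly
    // bumps the exponent (and overflows to Inf above 65504).
    if (rem > half || (rem == half && (h & 1u))) ++h;
    return static_cast<uint16_t>(sign | h);
}

// Hypothetical helper: convert a whole FP32 buffer (e.g. K or V data) into a
// temporary FP16 buffer that an FP16-only kernel could consume.
static std::vector<uint16_t> convert_f32_to_f16(const std::vector<float> & src) {
    std::vector<uint16_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = f32_to_f16_bits(src[i]);
    }
    return dst;
}
```

Under these assumptions, the cost of this approach is an extra pass over the K and V data plus a temporary buffer half the size of the FP32 source. FP32 values with magnitude above the FP16 maximum of 65504 also lose their finite value in the conversion.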

CUDA: enable FA for FP32 KV cache

31f2d45

JohannesGaessler mentioned this pull request Oct 12, 2025

metal : FA support F32 K and V #16531

Merged

github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Oct 12, 2025
