
Conversation

@Valentine233 Valentine233 commented Sep 3, 2025

This PR adds support for the FP8 KV cache during the prefill phase and ensures correct functionality.
For performance, we do not see any gain yet because the FP8 conversion-related instructions are not yet available.
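For context, here is a rough sketch of the approach, not the PR's exact code: FP8 Q/K register fragments are widened to half precision before the MMA, since there is no native FP8 path on current hardware. It assumes upstream CUTLASS/cute helpers (`make_fragment_like`, `cute::transform`, `cute::gemm`); the fragment names and the conversion lambda are placeholders.

```cpp
// Illustrative sketch only. Fragment names and the conversion lambda are
// placeholders; the real kernel lives in the files changed by this PR.
auto to_half = [](auto const& v) {
  // Software FP8 -> FP16 widening (the slow part until native instructions exist).
  return static_cast<half_t>(static_cast<float>(v));
};

if constexpr (is_fp8_v<ElementQ> && is_fp8_v<ElementK>) {
  auto tCrQ_fp16 = make_fragment_like<half_t>(tCrQ);   // same layout, FP16 storage
  auto tCrK_fp16 = make_fragment_like<half_t>(tCrK);
  cute::transform(tCrQ, tCrQ_fp16, to_half);            // element-wise convert in registers
  cute::transform(tCrK, tCrK_fp16, to_half);
  cute::gemm(tiled_mma, accum, tCrQ_fp16, tCrK_fp16, frag_src);
} else {
  cute::gemm(tiled_mma, accum, tCrQ, tCrK, frag_src);
}
```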

@Valentine233 Valentine233 marked this pull request as draft September 3, 2025 06:01
@Valentine233 Valentine233 marked this pull request as ready for review September 9, 2025 08:46
@Valentine233
Author

@tdeng5 Hi, I see that you have canceled the CI runs. Could you please explain the reason?

@rolandschulz

Our CI was overloaded. I have just restarted the runs.

@@ -0,0 +1,122 @@
/***************************************************************************************************
* Copyright (c) 2024 - 2025 Codeplay Software Ltd. All rights reserved.

Please revise the copyright to include Intel.

Author

Modified, thanks!

copy(gmem_tiled_copy_k, tKgK(_,_,_,k_tile), tKrK);
cute::gemm(tiled_mma, accum, tCrQ, tCrK, frag_src);
if constexpr (is_fp8_v<ElementQ> && is_fp8_v<ElementK>) {
auto tCrQ_ = make_fragment_like<half_t>(tCrQ);

Could you please use more descriptive variable names? It seems that all FP8-related tensors are suffixed with "_"; is there a reason for that? Could you use a descriptive suffix instead?

Author

Thanks for the suggestion. The suffix has been changed to _fp16 to indicate that the dtype is FP16.

@pengzhao-intel

Any performance data for this PR?

@Valentine233
Author

Any performance data for this PR?

Thanks for the review!
Currently, we see a performance regression with the FP8 KV cache compared to the BF16 one.
This is caused by the slow conversion between FP8 and FP16: the FP8 conversion-related instructions are only available on the Xe3 architecture. We will continue to tune the performance once these instructions are available. For now, only the functionality is ready.
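To make the cost concrete, here is a minimal scalar sketch (not from the PR, function name hypothetical) of what a software FP8 (E4M3) to FP16 conversion has to do per element when no native conversion instruction is available:

```cpp
#include <cstdint>

// Minimal illustrative sketch, not the PR's code: convert one FP8 E4M3FN value
// to FP16 bits purely with integer bit manipulation.
uint16_t fp8_e4m3_to_fp16_bits(uint8_t x) {
  uint16_t sign = static_cast<uint16_t>(x & 0x80) << 8;  // move sign to bit 15
  uint8_t  exp  = (x >> 3) & 0x0F;                       // 4-bit exponent, bias 7
  uint8_t  man  = x & 0x07;                              // 3-bit mantissa

  if (exp == 0x0F && man == 0x07)                        // E4M3FN NaN encoding
    return sign | 0x7E00;                                // a quiet FP16 NaN
  if (exp == 0) {
    if (man == 0) return sign;                           // signed zero
    // Subnormal: value = man * 2^-9; renormalize into an FP16 normal number.
    int shift = 0;
    while ((man & 0x04) == 0) { man <<= 1; ++shift; }
    return sign | static_cast<uint16_t>((8 - shift) << 10)   // FP16 exponent field
                | static_cast<uint16_t>((man & 0x03) << 8);  // remaining mantissa bits
  }
  // Normal: re-bias the exponent (7 -> 15) and widen the mantissa (3 -> 10 bits).
  return sign | static_cast<uint16_t>((exp + 8) << 10)
              | (static_cast<uint16_t>(man) << 7);
}
```

A native conversion instruction replaces all of this with a single operation, which is why the performance tuning is deferred until those instructions can be used.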

@Valentine233
Author

Hi @rolandschulz, could you help re-trigger the canceled CI? Thanks!

@Valentine233
Author

@pengzhao-intel @Antonyvance Could you please help review this again?

@rolandschulz

The BMG machine is offline at the moment.

@Valentine233
Author

Hi @rolandschulz, has the BMG machine come back?

@rolandschulz

No, but we are merging PRs with just PVC passing, so this is only blocked by reviews, not by the BMG CI.

@Valentine233
Author

No, but we are merging PRs with just PVC passing, so this is only blocked by reviews, not by the BMG CI.

@rolandschulz, I saw some CI issues on BMG before, so it would be better to have the BMG CI pass for this PR.
Please help re-trigger the CI when the machine is back. Thanks!

@Valentine233
Author

@pengzhao-intel @Antonyvance Could you please review again?

@Valentine233
Author

Hi @rolandschulz, is the BMG machine back?
