Support FP8 KV cache for prefill #486
base: main
Branch updated from 7dc1e36 to c44ff9e.
@tdeng5 Hi, I see that you have canceled the CI runs. Could you please explain why?
Our CI was overloaded. I have just restarted them.
@@ -0,0 +1,122 @@
/***************************************************************************************************
 * Copyright (c) 2024 - 2025 Codeplay Software Ltd. All rights reserved.
Please revise the copyright notice to Intel.
Modified, thanks!
copy(gmem_tiled_copy_k, tKgK(_,_,_,k_tile), tKrK);
cute::gemm(tiled_mma, accum, tCrQ, tCrK, frag_src);
if constexpr (is_fp8_v<ElementQ> && is_fp8_v<ElementK>) {
  auto tCrQ_ = make_fragment_like<half_t>(tCrQ);
Could you please use descriptive variable names? It seems that all FP8-related tensors are suffixed with "_". Is there a reason for this? If not, could you use a descriptive suffix instead?
Thanks for the suggestion. I have changed the suffix to _fp16 to indicate that the dtype is FP16.
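For context on the pattern in the diff: when both Q and K are FP8, the kernel copies the FP8 register fragments into wider half_t fragments (now suffixed _fp16) before calling cute::gemm. The plain C++ sketch here mimics that compile-time dispatch without CUTLASS/CuTe. It is an illustration under stated assumptions, not the PR's code: fp8_e4m3, is_fp8_v, to_float and dot_qk are hypothetical stand-ins rather than CuTe's API, the FP8 format is assumed to be OCP E4M3, and float replaces half_t because standard C++ before C++23 has no portable 16-bit float type.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical FP8 storage type (assumed OCP E4M3: 1 sign, 4 exponent, 3 mantissa bits, bias 7).
struct fp8_e4m3 { std::uint8_t bits; };

// Hypothetical trait, modeled on the is_fp8_v check in the diff.
template <class T>
inline constexpr bool is_fp8_v = std::is_same_v<T, fp8_e4m3>;

// Decode one E4M3 value to float.
inline float to_float(fp8_e4m3 x) {
  const int exp = (x.bits >> 3) & 0xF;
  const int man = x.bits & 0x7;
  const float sign = (x.bits & 0x80) ? -1.0f : 1.0f;
  if (exp == 0xF && man == 0x7) return std::nanf("");              // E4M3 has NaN but no infinity
  if (exp == 0) return sign * std::ldexp(static_cast<float>(man), -9);  // subnormal: man * 2^-9
  return sign * std::ldexp(1.0f + man / 8.0f, exp - 7);              // normal: 1.mmm * 2^(exp-7)
}
inline float to_float(float x) { return x; }

// Dot product of a Q fragment and a K fragment.
template <class ElementQ, class ElementK, std::size_t N>
float dot_qk(const std::array<ElementQ, N>& q, const std::array<ElementK, N>& k) {
  if constexpr (is_fp8_v<ElementQ> && is_fp8_v<ElementK>) {
    // Upcast both FP8 fragments into wider buffers first; the suffix names the new dtype
    // (the PR uses half_t and _fp16, this sketch uses float and _fp32).
    std::array<float, N> q_fp32{};
    std::array<float, N> k_fp32{};
    for (std::size_t i = 0; i < N; ++i) {
      q_fp32[i] = to_float(q[i]);
      k_fp32[i] = to_float(k[i]);
    }
    return dot_qk(q_fp32, k_fp32);  // re-dispatches to the non-FP8 branch
  } else {
    float acc = 0.0f;
    for (std::size_t i = 0; i < N; ++i) acc += to_float(q[i]) * to_float(k[i]);
    return acc;
  }
}
```

Keeping the upcast behind `if constexpr` means the non-FP8 instantiations compile to the original path with no extra copies.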
Is there any performance data for this PR?
Branch updated from c44ff9e to 5c6d0a9.
Thanks for the review!
Hi @rolandschulz, could you help re-trigger the canceled CI? Thanks!
@pengzhao-intel @Antonyvance Could you please help review again?
The BMG machine is offline at the moment.
Hi @rolandschulz, has the BMG machine come back?
No, but we are merging PRs with only the PVC CI passing, so this PR is blocked only by reviews, not by the BMG CI.
@rolandschulz, I have seen CI issues on BMG before, so I would prefer to have the BMG CI pass for this PR.
@pengzhao-intel @Antonyvance Could you please review again?
Hi @rolandschulz, is the BMG machine back?
This PR adds support for an FP8 KV cache during the prefill phase and ensures correct functionality.
In terms of performance, we do not see any gain, because the FP8 conversion-related instructions are not currently available.
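A rough illustration of why missing conversion instructions matter: without a hardware FP8-to-FP16 convert, each element has to be converted with a short sequence of integer operations, which adds ALU work on top of the MMA. The sketch assumes the OCP E4M3 variant (bias 7, no infinities, S.1111.111 is NaN); the PR does not state which FP8 variant or conversion path it uses, and `e4m3_to_fp16_bits` is a hypothetical name, not a function from the PR or from CUTLASS.

```cpp
#include <cstdint>

// Hypothetical software emulation of an E4M3 -> IEEE FP16 conversion, returning the FP16 bit pattern.
std::uint16_t e4m3_to_fp16_bits(std::uint8_t v) {
  const std::uint32_t sign = (v & 0x80u) << 8;  // sign moves from bit 7 to bit 15
  const std::uint32_t exp = (v >> 3) & 0xFu;    // 4-bit exponent, bias 7
  std::uint32_t man = v & 0x7u;                 // 3-bit mantissa
  if (exp == 0xFu && man == 0x7u) return static_cast<std::uint16_t>(sign | 0x7E00u);  // NaN
  if (exp == 0 && man == 0) return static_cast<std::uint16_t>(sign);                   // +/- zero
  int e = static_cast<int>(exp) - 7;            // unbiased exponent of a normal value
  if (exp == 0) {
    // E4M3 subnormal (man * 2^-9) is a normal number in FP16: normalize it.
    e = -6;
    while ((man & 0x8u) == 0) { man <<= 1; --e; }
    man &= 0x7u;                                // drop the now-implicit leading 1
  }
  // Rebias to FP16 (bias 15) and widen the mantissa from 3 to 10 bits.
  return static_cast<std::uint16_t>(sign | (static_cast<std::uint32_t>(e + 15) << 10) | (man << 7));
}
```

A native conversion instruction would replace this whole branchy sequence with a single operation per element (or per packed vector), which is why FP8 storage alone does not translate into a speedup here.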