Hi! I'm playing with `batch_decode_with_padded_kv_cache` and wanted to try out the FP8 KV cache. I couldn't find good instructions in the docs.
I've tried the following:
```python
import torch
import flashinfer

num_qo_heads = 32
num_kv_heads = 32
batch_size = 16
head_dim = 128
padded_kv_len = 1024

# Query and padded key/value tensors, all cast to fp8 (e4m3)
q = torch.empty(
    batch_size, num_qo_heads, head_dim,
    device=torch.device("cuda"),
    dtype=torch.float8_e4m3fn,
)
k_padded = (
    torch.randn(batch_size, padded_kv_len, num_kv_heads, head_dim)
    .to("cuda:0")
    .to(torch.float8_e4m3fn)
)
v_padded = (
    torch.randn(batch_size, padded_kv_len, num_kv_heads, head_dim)
    .to("cuda:0")
    .to(torch.float8_e4m3fn)
)

o = flashinfer.batch_decode_with_padded_kv_cache(
    q, k_padded, v_padded, "NHD", "NONE"
)
```
But it fails with a `BatchDecodeWithPaddedKVCache kernel launch failed: supported data type` error.
How can I enable FP8 KV cache? Thanks in advance!
Refer to #150.
Commit 66ee066 (feat: pytorch api of fp8 kv-cache, #156), requested in #150, #155, #125.
@YLGH done in #156.
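For anyone landing here later, here is a minimal sketch of what the call might look like after #156. It assumes the kernel accepts a half-precision query together with an fp8 (e4m3) kv-cache; whether `q` must stay in fp16 is an assumption here, so check the API added in #156 for the authoritative dtype combinations.

```python
import torch
import flashinfer

num_qo_heads = 32
num_kv_heads = 32
batch_size = 16
head_dim = 128
padded_kv_len = 1024

# Sketch only: query kept in fp16, kv-cache cast to fp8 (e4m3).
# The fp16-query / fp8-kv pairing is an assumption; see #156
# for the exact supported combinations.
q = torch.randn(
    batch_size, num_qo_heads, head_dim,
    device="cuda", dtype=torch.float16,
)
k_padded = (
    torch.randn(batch_size, padded_kv_len, num_kv_heads, head_dim)
    .to("cuda:0")
    .to(torch.float8_e4m3fn)
)
v_padded = (
    torch.randn(batch_size, padded_kv_len, num_kv_heads, head_dim)
    .to("cuda:0")
    .to(torch.float8_e4m3fn)
)

o = flashinfer.batch_decode_with_padded_kv_cache(
    q, k_padded, v_padded, "NHD", "NONE"
)
```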