@rishi-yadav rishi-yadav commented Sep 22, 2025

| Model | Data Type | Head Sizes | KV Size | Causal Varlen (ms) | Status |
|---|---|---|---|---|---|
| whisper_v3_large | FP16, BF16 | 64, 96, 128, 192 | 512 | 2302–2625 (FP16); 2267–2612 (BF16) | Passed |
| llama3_8b | FP16, BF16 | 64, 96, 128, 192 | 512 | 3609–4155 (FP16); 3614–4135 (BF16) | Passed |
| llama3_405b | FP16, BF16 | 64, 96, 128, 192 | 512 | 6555–7506 (FP16); 6509–7501 (BF16) | Passed |
| qwen2_5_72b | FP16, BF16 | 64, 96, 128, 192 | 512 | 3286–3787 (FP16); 3275–3782 (BF16) | Passed |
| deepseek_r1 | FP16, BF16 | 64, 96, 128, 192 | 512 | 3296–3787 (FP16); 3289–3776 (BF16) | Passed |

@rishi-yadav rishi-yadav marked this pull request as draft September 22, 2025 09:46
@rishi-yadav rishi-yadav force-pushed the rishi_fa_decode_models branch from 1a84f55 to 84f60cb Compare September 22, 2025 11:54
@rishi-yadav rishi-yadav marked this pull request as ready for review September 22, 2025 11:57
@rishi-yadav rishi-yadav force-pushed the rishi_fa_decode_models branch from a0a766b to 86ca910 Compare September 22, 2025 12:11