Contributing: Are these TODOs still on the roadmap? #2608

Bias92 · 2026-02-21T15:32:37Z

Hi, I'm interested in contributing to FlashInfer. I found a few TODOs/FIXMEs in the codebase and wanted to confirm they're still relevant before starting work:

include/flashinfer/norm.cuh — // TODO(kaixih): add support for fp8 quantization if needed
include/flashinfer/attention/persistent.cuh — // FIXME: fix the offset calculation
include/flashinfer/sampling.cuh — // TODO: compare the speed of log2 and log

For context, I have experience with FP8 RMSNorm CUDA kernel development (merged PRs to vLLM) and would be happy to take on any of these if they're still needed.

Would appreciate any guidance on which of these would be most valuable to the project. Thanks!

Bias92 · 2026-02-21T16:38:46Z

I also found a potential correctness issue in RMSNormQuantKernel and FusedAddRMSNormQuantKernel (include/flashinfer/norm.cuh).

Both kernels hardcode the FP8 E4M3 max value for clamping:

output_vec[j] = fmaxf(-448.0f, fminf(output_vec[j], 448.0f));

However, the output type O is dispatched via DISPATCH_DLPACK_DTYPE_TO_CTYPE_FP8 in csrc/norm.cu, which includes both E4M3 (max=448) and E5M2 (max=57344). When the output dtype is E5M2, this clamp saturates every value whose magnitude exceeds 448, so roughly 99% of the representable range is silently lost (448 / 57344 ≈ 0.8% of it remains usable).

I have a fix ready — will submit a PR shortly.
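
For illustration only, here is a minimal host-side sketch of one way to make the clamp bound depend on the output type. It is not necessarily the fix referred to above. The tag types `Fp8E4M3Tag` and `Fp8E5M2Tag`, the trait `Fp8MaxFinite`, and the helper `clamp_to_fp8_range` are hypothetical names; in the kernels the trait would presumably be specialized on the real CUDA types `__nv_fp8_e4m3` and `__nv_fp8_e5m2` instead.

```cpp
#include <cmath>

// Hypothetical stand-ins for the CUDA FP8 types (__nv_fp8_e4m3 and
// __nv_fp8_e5m2 from <cuda_fp8.h>), so the sketch compiles without CUDA.
struct Fp8E4M3Tag {};
struct Fp8E5M2Tag {};

// Hypothetical trait: largest finite magnitude of each FP8 format.
template <typename O>
struct Fp8MaxFinite;

template <>
struct Fp8MaxFinite<Fp8E4M3Tag> {
  static constexpr float value = 448.0f;
};

template <>
struct Fp8MaxFinite<Fp8E5M2Tag> {
  static constexpr float value = 57344.0f;
};

// Clamp x to the finite range of output type O instead of a hardcoded 448.
template <typename O>
inline float clamp_to_fp8_range(float x) {
  constexpr float kMax = Fp8MaxFinite<O>::value;
  return std::fmax(-kMax, std::fmin(x, kMax));
}
```

With this shape, `clamp_to_fp8_range<Fp8E5M2Tag>(1000.0f)` leaves the value untouched, while `clamp_to_fp8_range<Fp8E4M3Tag>(1000.0f)` still saturates at 448. Inside the kernels the hardcoded line would then take `O` as the template argument.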
