
Conversation

@cthi (Contributor) commented Nov 21, 2025

Summary:
We add some improvements for FP4 gemm.

  • Remove the need to pass use_mx; we can infer it from global_scale (see the sketch after this list).
    • As a follow-up, we should improve the assertions on the proper FP4 dtypes, similar to what we have with FP4 group gemm.
  • Add optional output to the API, which is in line with other torch APIs.
  • Small code cleanups
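
A minimal sketch of how the recipe can be inferred once use_mx is gone (the function name and exact signature below are illustrative assumptions, not the actual FBGEMM API): NVFP4 carries a per-tensor global_scale while MXFP4 does not, so the presence of that optional argument is enough to pick the kernel.

```cpp
// Illustrative sketch only: real FBGEMM entry points differ in name/signature.
#include <optional>

#include <ATen/ATen.h>

at::Tensor f4f4bf16_sketch(
    at::Tensor XQ, // packed FP4 activations
    at::Tensor WQ, // packed FP4 weights
    at::Tensor x_scale,
    at::Tensor w_scale,
    std::optional<at::Tensor> global_scale = std::nullopt) {
  // NVFP4 requires a per-tensor global scale and MXFP4 does not, so a separate
  // use_mx flag (which could disagree with the scales passed in) is redundant.
  const bool use_mx = !global_scale.has_value();
  at::Tensor Y =
      at::empty({XQ.size(0), WQ.size(0)}, XQ.options().dtype(at::kBFloat16));
  if (use_mx) {
    // ... dispatch the MXFP4 kernel ...
  } else {
    // ... dispatch the NVFP4 kernel, forwarding *global_scale ...
  }
  return Y;
}
```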

Misc

  • Later we should likely re-evaluate the heuristic for the kernel; right now it's almost identical (and duplicated) for MX and NV FP4, and we are likely instantiating more instances than needed.

Differential Revision: D87655845

meta-cla bot added the cla signed label Nov 21, 2025
meta-codesync bot commented Nov 21, 2025

@cthi has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87655845.

Summary:
Pull Request resolved: pytorch#5163

X-link: https://github.com/facebookresearch/FBGEMM/pull/2162

We add some improvements for FP4 gemm.
- Remove the need to pass `use_mx`; we can infer this from `global_scale`
  - As a follow-up, we should improve the assertions on the proper FP4 dtypes, similar to what we have with [FP4 group gemm](https://www.internalfb.com/code/fbsource/[addad803d330]/fbcode/deeplearning/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16_grouped.cu?lines=388-409).
- Add optional `output` to the API, which is in line with other torch APIs (see the sketch after this list).
- Move the function declaration to `torch_ops.h`, which removes the need for the forward declaration in `Blas.cpp`
- Small code cleanups
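
A hedged sketch of the optional `output` argument, following torch's `out=` convention: the names, dtype, and shape checks below are assumptions for illustration rather than the exact FBGEMM signature.

```cpp
// Illustrative sketch only: not the actual FBGEMM API.
#include <optional>

#include <ATen/ATen.h>

at::Tensor f4f4bf16_sketch(
    at::Tensor XQ,
    at::Tensor WQ,
    at::Tensor x_scale,
    at::Tensor w_scale,
    std::optional<at::Tensor> global_scale = std::nullopt,
    std::optional<at::Tensor> output = std::nullopt) {
  const int64_t M = XQ.size(0);
  const int64_t N = WQ.size(0);
  // Reuse the caller-provided buffer when given; otherwise allocate BF16 output.
  at::Tensor Y = output.has_value()
      ? *output
      : at::empty({M, N}, XQ.options().dtype(at::kBFloat16));
  TORCH_CHECK(Y.size(0) == M && Y.size(1) == N, "output has the wrong shape");
  // ... launch the FP4 GEMM kernel writing into Y ...
  return Y;
}
```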

Misc
- Later we should likely clean up & re-evaluate the heuristic for the kernel; right now it's almost identical (and duplicated) for MX and NV FP4, and we are likely instantiating more instances than needed.

Reviewed By: slayton58

Differential Revision: D87655845
meta-codesync bot commented Nov 24, 2025

This pull request has been merged in 903002a.

