
Conversation

@cthi (Contributor) commented Nov 21, 2025

Summary:
We add some improvements for FP4 gemm.

  • Remove the need to pass use_mx; we can infer it from global_scale (see the sketch after this list).
    • As a follow-up, we should improve the assertions on the proper FP4 dtypes, similar to what we have with FP4 group gemm.
  • Add optional output to the API, which is in line with other torch APIs.
  • Small code cleanups
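
A minimal sketch of how the recipe can be inferred once use_mx is gone (the function name and exact signature below are illustrative assumptions, not the actual FBGEMM API): NVFP4 carries a per-tensor global_scale while MXFP4 does not, so the presence of that optional argument is enough to pick the kernel.

```cpp
// Illustrative sketch only: real FBGEMM entry points differ in name/signature.
#include <optional>

#include <ATen/ATen.h>

at::Tensor f4f4bf16_sketch(
    at::Tensor XQ, // packed FP4 activations
    at::Tensor WQ, // packed FP4 weights
    at::Tensor x_scale,
    at::Tensor w_scale,
    std::optional<at::Tensor> global_scale = std::nullopt) {
  // NVFP4 requires a per-tensor global scale and MXFP4 does not, so a separate
  // use_mx flag (which could disagree with the scales passed in) is redundant.
  const bool use_mx = !global_scale.has_value();
  at::Tensor Y =
      at::empty({XQ.size(0), WQ.size(0)}, XQ.options().dtype(at::kBFloat16));
  if (use_mx) {
    // ... dispatch the MXFP4 kernel ...
  } else {
    // ... dispatch the NVFP4 kernel, forwarding *global_scale ...
  }
  return Y;
}
```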

Misc

  • Later we should likely re-evaluate the heuristic for the kernel; right now it's almost identical (and duplicated) for MX and NV FP4, and we are likely instantiating more instances than needed.

Differential Revision: D87655845

meta-cla bot added the cla signed label Nov 21, 2025
meta-codesync bot commented Nov 21, 2025

@cthi has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87655845.

Summary:
Pull Request resolved: pytorch#5163

X-link: https://github.com/facebookresearch/FBGEMM/pull/2162

We add some improvements for FP4 gemm.
- Remove the need to pass `use_mx`; we can infer this from `global_scale`
  - As a follow-up, we should improve the assertions on the proper FP4 dtypes, similar to what we have with [FP4 group gemm](https://www.internalfb.com/code/fbsource/[addad803d330]/fbcode/deeplearning/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16_grouped.cu?lines=388-409).
- Add optional `output` to the API, which is in line with other torch APIs (see the sketch after this list).
- Move the function declaration to `torch_ops.h`, which removes the need for the forward declaration in `Blas.cpp`
- Small code cleanups
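
A hedged sketch of the optional `output` argument, following torch's `out=` convention: the names, dtype, and shape checks below are assumptions for illustration rather than the exact FBGEMM signature.

```cpp
// Illustrative sketch only: not the actual FBGEMM API.
#include <optional>

#include <ATen/ATen.h>

at::Tensor f4f4bf16_sketch(
    at::Tensor XQ,
    at::Tensor WQ,
    at::Tensor x_scale,
    at::Tensor w_scale,
    std::optional<at::Tensor> global_scale = std::nullopt,
    std::optional<at::Tensor> output = std::nullopt) {
  const int64_t M = XQ.size(0);
  const int64_t N = WQ.size(0);
  // Reuse the caller-provided buffer when given; otherwise allocate BF16 output.
  at::Tensor Y = output.has_value()
      ? *output
      : at::empty({M, N}, XQ.options().dtype(at::kBFloat16));
  TORCH_CHECK(Y.size(0) == M && Y.size(1) == N, "output has the wrong shape");
  // ... launch the FP4 GEMM kernel writing into Y ...
  return Y;
}
```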

Misc
- Later we should likely clean up & re-evaluate the heuristic for the kernel; right now it's almost identical (and duplicated) for MX and NV FP4, and we are likely instantiating more instances than needed.

Reviewed By: slayton58

Differential Revision: D87655845
meta-codesync bot commented Nov 24, 2025

This pull request has been merged in 903002a.

