extend the MX cast benchmark to include casting to mxfp4 #2693

vkuzo · 2025-08-05T18:40:48Z

Summary:

This is important for supporting mxfp4 training and inference; a
benchmark is a good first step to see where we are.

Test Plan:

```bash
(pytorch) [vasiliy@devgpu007.eag6 ~/local/ao (main)]$ python benchmarks/mx_formats/cast_bench.py --mode dim0_mxfp4_floor
M 16384 K 16384 BLOCK_SIZE 32
GPU: NVIDIA H100
torch version: 2.9.0a0+git0142d5f
triton version: 3.3.0
mode: dim0_mxfp4_floor
/data/users/vasiliy/pytorch/torch/backends/cuda/__init__.py:131: UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /data/users/vasiliy/pytorch/aten/src/ATen/Context.cpp:80.)
  return torch._C._get_cublas_allow_tf32()
time_us 848.9919900894165
mem_bw_gbps 800.3341090749714
```
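
For readers who want to sanity-check the reported `mem_bw_gbps`, here is a minimal sketch of how such a figure could be derived from `M`, `K`, `BLOCK_SIZE`, and `time_us`. It is not taken from `cast_bench.py`. The function names are hypothetical, and the byte accounting is an assumption: a bf16 input (2 bytes per element), two fp4 values packed into each output byte, and one 1-byte e8m0 scale per block of 32 elements. Under those assumptions the arithmetic matches the number printed above.

```python
def cast_bytes_moved(m, k, block_size=32, in_bytes_per_elem=2):
    """Bytes read plus bytes written for one high-precision -> mxfp4 cast.

    Assumptions (not confirmed by the PR text): bf16 input, two fp4 values
    packed per output byte, and one 1-byte e8m0 scale per block.
    """
    numel = m * k
    bytes_read = numel * in_bytes_per_elem   # high-precision input
    bytes_data = numel // 2                  # packed 4-bit output elements
    bytes_scales = numel // block_size       # one scale byte per block
    return bytes_read + bytes_data + bytes_scales


def mem_bw_gbps(bytes_moved, time_us):
    """Achieved memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / (time_us / 1e6) / 1e9


# Values copied from the benchmark output above.
bytes_moved = cast_bytes_moved(16384, 16384, block_size=32)
bw = mem_bw_gbps(bytes_moved, 848.9919900894165)
print(f"bytes moved: {bytes_moved}")
print(f"mem_bw_gbps: {bw:.4f}")
```

Comparing this figure against the device's peak memory bandwidth gives a rough sense of how much headroom the cast kernel has left.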

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]

vkuzo · 2025-08-05T18:40:49Z

Stack from ghstack (oldest at bottom):

-> extend the MX cast benchmark to include casting to mxfp4 #2693

pytorch-bot · 2025-08-05T18:40:52Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2693

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'

❌ 1 New Failure

As of commit a2d0bc9 with merge base 18edd01:

NEW FAILURE - The following job has failed:

Run 1xH100 Tests / test (H100, linux.aws.h100, --pre torch torchvision torchaudio --index-url https://download.pytor... / linux-job (gh)
test/dtypes/test_affine_quantized_float.py::TestAffineQuantizedFloat8Compile::test_expected_kernels_on_gpu_granularity1_torch_compile_mode_reduce-overhead

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary:

This is important for supporting mxfp4 training and inference; a benchmark is a good first step to see where we are.

Test Plan:

```bash
(pytorch) [vasiliy@devgpu007.eag6 ~/local/ao (main)]$ python benchmarks/mx_formats/cast_bench.py --mode dim0_mxfp4_floor
M 16384 K 16384 BLOCK_SIZE 32
GPU: NVIDIA H100
torch version: 2.9.0a0+git0142d5f
triton version: 3.3.0
mode: dim0_mxfp4_floor
/data/users/vasiliy/pytorch/torch/backends/cuda/__init__.py:131: UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /data/users/vasiliy/pytorch/aten/src/ATen/Context.cpp:80.)
  return torch._C._get_cublas_allow_tf32()
time_us 848.9919900894165
mem_bw_gbps 800.3341090749714
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 99ca4ba
ghstack-comment-id: 3156206608
Pull Request resolved: #2693

Update [ghstack-poisoned] (a2d0bc9)

meta-cla bot added the "CLA Signed" label Aug 5, 2025

vkuzo added the "topic: not user facing" label Aug 5, 2025

vkuzo requested review from danielvegamyhre and eellison August 5, 2025 18:42

eellison approved these changes Aug 5, 2025


vkuzo merged commit 5d99ce4 into main Aug 6, 2025
53 of 56 checks passed

liangel-02 pushed a commit that referenced this pull request Aug 25, 2025

extend the MX cast benchmark to include casting to mxfp4 (#2693)

488490e

Update [ghstack-poisoned]
