
Conversation

@danielvegamyhre (Contributor) commented Oct 7, 2025

Stacked PRs:

  • [mxfp8 moe training] add triton kernel for mxfp8 quantization along dim0

Summary

  • torch.compile codegen has had on-and-off perf regressions during development of mxfp8 MoE training; it would be nice to have a simple Triton kernel with consistently good perf that we can use instead (a sketch of the idea follows below)
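For context, here is a minimal sketch of the technique, not the actual kernel in this PR: each contiguous group of 32 elements shares one power-of-two scale, chosen by floor-rounding log2 of the group amax so the scaled values fit in fp8 e4m3. All names and launch parameters below are hypothetical, and it assumes a contiguous input whose numel is a multiple of BLOCK.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def mxfp8_quantize_dim0_kernel(
    x_ptr, out_ptr, scale_ptr,
    BLOCK: tl.constexpr,   # elements per program, a multiple of GROUP
    GROUP: tl.constexpr,   # mx block size (32)
):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    x = tl.load(x_ptr + offs).to(tl.float32)

    # Per-group amax over each contiguous run of GROUP elements.
    xg = tl.reshape(x, (BLOCK // GROUP, GROUP))
    amax = tl.max(tl.abs(xg), axis=1)

    # Floor-rounded power-of-two scale; subtracting 8 targets the fp8 e4m3
    # max exponent so the group amax lands in range (eps avoids log2(0)).
    exp = tl.floor(tl.log2(amax + 1e-30)) - 8.0
    scale = tl.exp2(exp)

    # Store the e8m0 scale as a biased (127) uint8 exponent.
    sofs = pid * (BLOCK // GROUP) + tl.arange(0, BLOCK // GROUP)
    tl.store(scale_ptr + sofs, (exp + 127.0).to(tl.uint8))

    # Quantize: divide by the group scale, cast to fp8 e4m3.
    y = (xg / scale[:, None]).to(tl.float8e4nv)
    tl.store(out_ptr + offs, tl.reshape(y, (BLOCK,)))

def mxfp8_quantize_dim0(x: torch.Tensor, group: int = 32, block: int = 1024):
    # Hypothetical host-side wrapper; assumes x is contiguous and
    # x.numel() % block == 0.
    out = torch.empty(x.numel(), dtype=torch.float8_e4m3fn, device=x.device)
    scales = torch.empty(x.numel() // group, dtype=torch.uint8, device=x.device)
    grid = (x.numel() // block,)
    mxfp8_quantize_dim0_kernel[grid](x.reshape(-1), out, scales,
                                     BLOCK=block, GROUP=group)
    return out.view(x.shape), scales
```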

Test plan

  • pytest test/prototype/mx_formats/test_kernels.py

Benchmarks

existing torch.compile/to_mx() benchmark:

(torch) [danvm@devgpu031.atn1 ~/ao/benchmarks (main)]$ CUDA_VISIBLE_DEVICES=7 python mx_formats/cast_bench.py --mode dim0_mxfp8_floor
M 16384 K 16384 BLOCK_SIZE 32
GPU: NVIDIA B200
torch version: 2.10.0.dev20251008+cu128
triton version: 3.5.0
mode: dim0_mxfp8_floor
time_us 156.5759927034378
mem_bw_gbps 5196.805474139168

new triton dim0 mxfp8 kernel (~10% higher peak memory bandwidth utilization):

(torch) [danvm@devgpu031.atn1 ~/ao/benchmarks (danielvegamyhre/stack/76)]$ CUDA_VISIBLE_DEVICES=7 python mx_formats/cast_bench.py --mode dim0_mxfp8_triton_floor
M 16384 K 16384 BLOCK_SIZE 32
GPU: NVIDIA B200
torch version: 2.10.0.dev20251008+cu128
triton version: 3.5.0
mode: dim0_mxfp8_triton_floor
time_us 140.35199582576752
mem_bw_gbps 5797.530496182742
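As a sanity check on these numbers: with a bf16 read (2 B/elt), an fp8 write (1 B/elt), and one e8m0 scale byte per 32-element block, the byte accounting below reproduces both mem_bw_gbps figures, so this is presumably how cast_bench.py counts bytes.

```python
# Reproduce the reported mem_bw_gbps from bytes moved and kernel time.
M = K = 16384
bytes_moved = M * K * (2 + 1 + 1 / 32)  # bf16 read + fp8 write + e8m0 scales

for name, time_us in [("to_mx (compile)", 156.576), ("triton kernel", 140.352)]:
    gbps = bytes_moved / (time_us * 1e-6) / 1e9
    print(f"{name}: {gbps:.1f} GB/s")
# -> 5196.8 and 5797.5 GB/s, matching the outputs above (~11.6% improvement).
```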

stack-info: PR: #3128, branch: danielvegamyhre/stack/75
@danielvegamyhre force-pushed the danielvegamyhre/stack/75 branch from 009a6b8 to c62b0f0 on October 7, 2025 at 17:34

pytorch-bot bot commented Oct 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3128

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit c62b0f0 with merge base cd21d0e:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label on Oct 7, 2025
@danielvegamyhre added the mx, moe, and topic: not user facing labels on Oct 7, 2025