# Update mx_formats README.md #2777
## Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2777. ⏳ No failures, 6 pending as of commit 3f6623f with merge base 49cb18a. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Diff context (new README content under review):

> ℹ️ <em>See the [feature tracker](https://github.com/pytorch/ao/issues/556) and the [performance tracker](https://github.com/pytorch/ao/issues/1768) for upcoming features.</em>
>
> ## Training e2e benchmarks on NVIDIA B200
>
> - Single-node training on 8xB100 GPUs, batch size 1, sequence length 8192, steps 100, `torch.compile`, FSDP2, per-op SAC
**Comment:** Can you specify that this is a power-throttled variant?
**Reply:** It should also say B200 instead of B100, will fix.
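For context on the mxfp8 recipes that appear in the benchmark table quoted below (e.g. `mxfp8_cublas`), here is a minimal sketch of enabling mxfp8 training through torchao. The names (`MXLinearConfig`, `MXGemmKernelChoice`, `quantize_`) are recalled from the mx_formats README this PR edits; treat them as assumptions and verify against the current file.

```python
import torch
from torch import nn
from torchao.quantization import quantize_
# Assumed exports from the mx_formats prototype; check the current torchao version.
from torchao.prototype.mx_formats import MXLinearConfig, MXGemmKernelChoice

# Toy stand-in for the Llama3-8b linear layers that get swapped to mxfp8.
model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096)).cuda().to(torch.bfloat16)

# "mxfp8_cublas" roughly corresponds to fp8 (e4m3) elements with block size 32,
# dispatched to the cuBLAS mx gemm kernel.
config = MXLinearConfig(
    elem_dtype=torch.float8_e4m3fn,
    block_size=32,
    gemm_kernel_choice=MXGemmKernelChoice.CUBLAS,
)
quantize_(model, config)

# The benchmark setup then compiles the converted model.
model = torch.compile(model)
```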
Diff context (benchmark table under review):

| Model     | Scaling                           | Peak Memory (GB) | Median tokens/second | Speedup over baseline |
| --------- | --------------------------------- | ---------------- | -------------------- | --------------------- |
| Llama3-8b | none (bfloat16)                   | 33.71            | 8307.5               | -                     |
| Llama3-8b | float8 tensorwise (f8 all-gather) | 33.38            | 10417.0              | 25.4%                 |
| Llama3-8b | mxfp8_cublas                      | 33.88            | 9969.0               | 20.0%                 |
| Llama3-8b | mxfp8_cublas_rceil                | 33.88            | 9642.0               | 16.1%                 |
**Comment:** This is odd: rceil uses the hardware-accelerated fp32 -> e8m0 casting instruction, so it should be faster than floor, and it was faster than floor when I benchmarked it. Any idea what could be going on here?
**Reply:** I think it only uses the accelerated instruction in the dim1 kernel. Once we make it also use that instruction in dim0, it should beat floor.
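To make the floor vs rceil distinction concrete: both recipes produce a power-of-two (e8m0) per-block scale, but they round the scale exponent differently. Below is a simplified scalar sketch, assuming e4m3 elements and ignoring edge cases such as zero blocks and exponent clamping; the actual torchao kernels operate on whole tensors and may use a different exact formula.

```python
import math

E4M3_MAX = 448.0  # largest magnitude representable in float8_e4m3fn
E4M3_EMAX = 8     # exponent of the largest power of two in the e4m3 range

def e8m0_scale_exponent(block_amax: float, rounding: str) -> int:
    """Pick the shared power-of-two scale exponent for one block (illustrative only)."""
    if rounding == "floor":
        # OCP MX-style choice: floor of log2(amax), shifted by the element emax.
        return math.floor(math.log2(block_amax)) - E4M3_EMAX
    if rounding == "rceil":
        # Divide by the element max, then round *up* to the next power of two,
        # mirroring a round-toward-plus-infinity fp32 -> e8m0 cast.
        return math.ceil(math.log2(block_amax / E4M3_MAX))
    raise ValueError(f"unknown rounding mode: {rounding!r}")

# The two modes can land one exponent apart for the same block amax,
# which is why they can differ slightly in accuracy and in the kernel path taken.
print(e8m0_scale_exponent(500.0, "floor"))  # 0
print(e8m0_scale_exponent(500.0, "rceil"))  # 1
```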
Diff context (benchmark table header and baseline rows):

| Model     | Scaling                           | Peak Memory (GB) | Median tokens/second | Speedup over baseline |
| --------- | --------------------------------- | ---------------- | -------------------- | --------------------- |
| Llama3-8b | none (bfloat16)                   | 33.71            | 8307.5               | -                     |
| Llama3-8b | float8 tensorwise (f8 all-gather) | 33.38            | 10417.0              | 25.4%                 |
**Comment:** Ah, when I benchmarked fp8 tensorwise vs mxfp8 and found they were roughly the same throughput, I didn't use fp8 all-gather; I wonder if that explains this difference.
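For reference, the "float8 tensorwise (f8 all-gather)" row corresponds to enabling the float8 all-gather path, so FSDP2 communicates fp8 weights rather than bf16 ones. Below is a sketch of how that is typically switched on via torchao; the `Float8LinearConfig` and `convert_to_float8_training` names and the `enable_fsdp_float8_all_gather` flag are recalled from the torchao.float8 README and should be verified against the current release.

```python
import torch
from torch import nn
from torchao.float8 import Float8LinearConfig, convert_to_float8_training

# Toy stand-in for the transformer's linear layers.
model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096)).cuda().to(torch.bfloat16)

# Tensorwise float8 training; enable_fsdp_float8_all_gather asks FSDP2 to
# all-gather the already-cast fp8 weights instead of the bf16 originals,
# which is exactly the difference being discussed in this thread.
config = Float8LinearConfig(enable_fsdp_float8_all_gather=True)
convert_to_float8_training(model, config=config)
```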
Diff context (same setup bullet as above):

> - Single-node training on 8xB100 GPUs, batch size 1, sequence length 8192, steps 100, `torch.compile`, FSDP2, per-op SAC
**Comment:** Are we not using torch.compile here? When I ran benchmarks during mxfp8 dim1 cast CUDA kernel development I was getting ~13.5k TPS with FSDP2 only on 8xB200; I can double-check the exact config.
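For completeness, the setup bullet above combines `torch.compile` with FSDP2. A rough sketch of that combination, under assumptions: `fully_shard` is the FSDP2 entry point (its import path has moved between PyTorch releases), and `model.layers` is a hypothetical attribute standing in for the transformer blocks.

```python
import torch
# FSDP2 entry point; on older releases this lives under torch.distributed._composable.fsdp.
from torch.distributed.fsdp import fully_shard

def apply_fsdp2_and_compile(model: torch.nn.Module) -> torch.nn.Module:
    # Shard each transformer block, then the root module (the usual FSDP2 pattern).
    for block in model.layers:  # `layers` is a hypothetical attribute for the blocks
        fully_shard(block)
    fully_shard(model)
    # Compile after sharding; the benchmark bullet lists `torch.compile` explicitly,
    # which is what the question in this thread is checking.
    return torch.compile(model)
```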
**PR description:** Add e2e torchtitan benchmarks on Llama 3 8B for mxfp8 training.