mxtensor: add pre-swizzle support #3200
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3200
Note: Links to docs will display an error until the docs builds have been completed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary:

Adds the ability to pre-swizzle scales for `MXTensor`, and turns it on for the inference workflow.

For activations, this is a no-op for now, but if we write a fused kernel we'll hook into the pre-swizzled path. For weights, this is a performance win in this PR, as we now swizzle ahead of time.

Rough magnitude of the weight pre-swizzling win: on M, K, N == 4096, 4096, 4096, the inference fwd speedup on mxfp8 increases from 1.24x to 1.30x.

The name of `_is_swizzled_scales` is not final, but IMO we should finalize it in a future PR together with `NVFP4Tensor`. For now I'm staying consistent with `NVFP4Tensor`.

Test Plan:

```bash
# correctness
CUDA_VISIBLE_DEVICES=5 pytest test/prototype/mx_formats/ -s

# performance
CUDA_VISIBLE_DEVICES=5 python benchmarks/float8/float8_inference_roofline.py ~/local/tmp/20251017_test.csv --recipe_name mxfp8_cublas --shape_gen_name pow2_extended
# before: https://www.internalfb.com/phabricator/paste/view/P1996942931
# after: https://www.internalfb.com/phabricator/paste/view/P1996941798
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 46b8d23
ghstack-comment-id: 3415966576
Pull-Request: #3200
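For readers skimming the diff: "pre-swizzle" here means rearranging the per-block scales into the 128x4-tile blocked layout that the mxfp8 scaled-matmul kernels consume, and caching that on the tensor so the gemm call doesn't redo it. Below is a minimal sketch of that rearrangement in plain PyTorch; the helper name and the exact tile interleaving are my own illustration, not this PR's code (IIRC the repo's `to_blocked` helper in `torchao/prototype/mx_formats/utils.py` is the source of truth).

```python
# Sketch: pack a (rows, cols) scale matrix into flattened 128x4 tiles, the
# "blocked"/swizzled layout the mxfp8 gemms expect. Helper name is hypothetical.
import torch


def pre_swizzle_scales(scales: torch.Tensor) -> torch.Tensor:
    rows, cols = scales.shape
    n_row_tiles = -(-rows // 128)  # ceil division
    n_col_tiles = -(-cols // 4)

    # Pad up to whole 128x4 tiles.
    padded = scales
    if (rows, cols) != (n_row_tiles * 128, n_col_tiles * 4):
        padded = torch.zeros(
            n_row_tiles * 128, n_col_tiles * 4,
            device=scales.device, dtype=scales.dtype,
        )
        padded[:rows, :cols] = scales

    # Split into 128x4 tiles, then interleave each tile as 32x16.
    tiles = padded.view(n_row_tiles, 128, n_col_tiles, 4).permute(0, 2, 1, 3)
    swizzled = tiles.reshape(-1, 4, 32, 4).transpose(1, 2).reshape(-1, 32, 16)
    return swizzled.flatten()


# Weights: swizzle once at quantization/conversion time so every forward pass
# skips the rearrangement. Activations would pay it per call unless a fused
# quantize kernel emits scales already in this layout.
w_scales = torch.randint(0, 255, (4096, 4096 // 32), dtype=torch.uint8)  # mxfp8, block size 32
w_scales_swizzled = pre_swizzle_scales(w_scales)
```

Doing this once for weights is where the per-gemm win comes from; the activation path keeps its current behavior until a fused kernel can produce scales in this layout directly.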
```python
    torch.float8_e5m2,
    torch.uint8,
), "unsupported"
if elem_dtype in (
```
this code doesn't really have a strong purpose, removing instead of making it handle swizzling
```python
    return mx_tensor


def _swizzle_aware_slice(
```
Extracted this out; the only things that change between call sites are the various shape calculations (fp8 vs fp4 data, 32 vs 16 block size).
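For context, here's a rough sketch of the bookkeeping such a helper has to do when slicing along dim 0 (hypothetical signature, fp8-only; the real `_swizzle_aware_slice` also parameterizes over fp8 vs fp4 data and the 32 vs 16 block size, per the comment above):

```python
# Sketch only: illustrates why slicing swizzled scales needs tile-aligned bounds.
import torch


def swizzle_aware_slice_dim0(
    qdata: torch.Tensor,   # (M, K) fp8 data; fp4 would pack two values per byte
    scale: torch.Tensor,   # unswizzled: (M, K // block_size); swizzled: flat 128x4 tiles
    is_swizzled: bool,
    block_size: int,       # 32 for mxfp8; a 16-element block doubles the scale columns
    start: int,
    end: int,
):
    M, K = qdata.shape
    scale_cols = K // block_size
    qdata_sliced = qdata[start:end]

    if not is_swizzled:
        # Plain layout: scale rows line up 1:1 with data rows.
        scale_sliced = scale.reshape(M, scale_cols)[start:end]
    else:
        # Swizzled layout: scales are stored as flattened 128x4 tiles, so a row
        # slice is only representable if it lands on 128-row tile boundaries.
        assert start % 128 == 0 and end % 128 == 0, "slice must align to 128-row tiles"
        n_col_tiles = -(-scale_cols // 4)           # ceil division
        elems_per_row_band = 128 * n_col_tiles * 4  # one full band of tiles
        scale_sliced = scale.reshape(-1)[
            (start // 128) * elems_per_row_band : (end // 128) * elems_per_row_band
        ]
    return qdata_sliced, scale_sliced
```

Slicing along the last dimension would need the analogous 4-column tile alignment.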
```python
    Output: sliced qdata and scale, does the right thing for unswizzled and swizzled scales
    """

    M, K = x.shape[0], x.shape[1]
```
Nit: I should probably have used a generic (m/n) term like rows, columns.
Looks good, nice refactor