Roofline quantized conv3d/2d layer #3419

Merged: jainapurva merged 17 commits into main from conv_roofline on Dec 24, 2025
Conversation

@jainapurva (Contributor) commented Dec 3, 2025

This pull request extends the float8 inference roofline benchmarking code to support convolution operations (conv2d and conv3d) in addition to linear layers. It introduces new utilities and refactors the workflow to enable roofline modeling and kernel benchmarking for convolutions, including calculation of equivalent GEMM dimensions and measurement of kernel times. The FBGEMM conv kernel is a combination of im2col and an implicit GEMM.
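Under the im2col view above, the equivalent GEMM dimensions follow directly from the conv shapes. A minimal sketch (function and argument names are illustrative, not taken from this PR):

```python
import math

def conv_to_gemm_dims(batch, c_in, c_out, out_spatial, kernel_size):
    """Equivalent GEMM dimensions for a convolution via im2col.

    out_spatial and kernel_size are tuples of spatial extents, e.g.
    (D_out, H_out, W_out) and (kD, kH, kW) for conv3d.
    """
    M = batch * math.prod(out_spatial)   # one GEMM row per output location
    K = c_in * math.prod(kernel_size)    # unrolled input patch length
    N = c_out                            # one GEMM column per output channel
    return M, K, N

# conv3d example: batch 1, 16 -> 32 channels, 8x8x8 output, 3x3x3 kernel
print(conv_to_gemm_dims(1, 16, 32, (8, 8, 8), (3, 3, 3)))  # (512, 432, 32)
```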

pytorch-bot bot commented Dec 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3419

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 5f3abb1 with merge base 095a7e6:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 3, 2025
@jainapurva jainapurva marked this pull request as ready for review December 10, 2025 05:57
# Filter out aten::fill_ and other non-conv operations
filtered_data = {k: v for k, v in data.items() if k in expected_conv_kernels}

assert len(filtered_data) >= 1, f"unexpected data: {data}"
maybe the error message should indicate something about potential incompleteness of the above expected conv kernel list?
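Following this suggestion, the assertion message could point at the list itself. A hedged sketch (the kernel names here are a partial, hypothetical subset of the real list in the PR):

```python
# Hypothetical subset; the authoritative list lives in the PR's utils module.
expected_conv_kernels = {
    "aten::slow_conv_dilated3d",
    "fbgemm::f8f8bf16_conv",
}

def check_profiled_kernels(data):
    # Filter out aten::fill_ and other non-conv operations
    filtered_data = {k: v for k, v in data.items() if k in expected_conv_kernels}
    assert len(filtered_data) >= 1, (
        f"no expected conv kernels found in profiler data: {data}; "
        "note: the expected_conv_kernels list may be incomplete for new backends"
    )
    return filtered_data

print(check_profiled_kernels({"fbgemm::f8f8bf16_conv": 1.5, "aten::fill_": 0.1}))
```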


@jbschlosser jbschlosser left a comment


looks pretty good from what I can tell. Thanks for the hard work here!

"aten::slow_conv_dilated3d",
"torchao::_conv2d_fp8_inner",
"torchao::_conv3d_fp8_inner",
"fbgemm::f8f8bf16_conv",

we just updated this to mslk btw


@jerryzh168 jerryzh168 left a comment


rest looks good

)

# For fp8 conv timing, we need to use fbgemm operator
if recipe_name in ("mxfp4_cutlass", "nvfp4"):

these are not supported, so we don't need to check for them I think; we can add a check that says what the supported recipe_names are
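The commit messages below note that only tensorwise fp8 is supported for conv, so the check suggested here could look like the following sketch (constant and function names are illustrative):

```python
# Assumption from this thread: only tensorwise fp8 is supported for conv.
SUPPORTED_CONV_RECIPES = ("tensorwise",)

def validate_conv_recipe(recipe_name):
    """Fail early with the list of supported recipes instead of
    special-casing unsupported ones like mxfp4_cutlass or nvfp4."""
    if recipe_name not in SUPPORTED_CONV_RECIPES:
        raise NotImplementedError(
            f"recipe {recipe_name!r} is not supported for conv roofline; "
            f"supported recipes: {SUPPORTED_CONV_RECIPES}"
        )
```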

# Try to use fbgemm fp8 conv operator
try:
# Check if fbgemm fp8 conv is available
if not hasattr(torch.ops.fbgemm, "f8f8bf16_conv"):

should use […] to check I think

print(
"Warning: fbgemm.f8f8bf16_conv not available, skipping fp8 conv timing"
)
f8_time_s = 0.0

@jerryzh168 jerryzh168 Dec 23, 2025


nit: I think we can just error out, and talk about how user can turn off the conv benchmarking by setting do_benchmarks to False?

this will also reduce the indentation here
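The fail-fast version suggested here could be sketched as follows (helper name and signature are hypothetical, not from the PR):

```python
def require_fp8_conv_op(op_available, do_benchmarks=True):
    """Raise instead of printing a warning and recording a 0.0 time.

    Erroring out surfaces the missing operator immediately, and the caller
    can opt out of conv benchmarking entirely with do_benchmarks=False.
    """
    if do_benchmarks and not op_available:
        raise RuntimeError(
            "fp8 conv operator is not available; set do_benchmarks=False "
            "to skip conv kernel benchmarking"
        )
```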

)
f8_time_s = 0.0

except Exception as e:

just raise the exception

- Replace hasattr check with _is_mslk_available() utility
- Add recipe validation for conv operations (only tensorwise supported)
- Add early error checks with helpful messages for conv2d and mslk availability
- Remove redundant exception handling in get_conv_times()
- Improve defense in depth with validation at multiple levels
- Re-raise validation errors (NotImplementedError, RuntimeError, ValueError) to fail fast
- Remove unused kernel names from utils.py (torchao::_conv2d_fp8_inner, torchao::_conv3d_fp8_inner)
This commit fixes multiple issues in the conv3d fp8 benchmarking code
to align with the updated mslk operator (from commit 095a7e6):

1. Remove outdated permute operations
   - The mslk operator was updated to support standard PyTorch tensor
     shapes with channels_last_3d memory format
   - Permute operations are no longer needed and were causing errors
   - Tensors now stay in shape (N, C_in, D, H, W) with channels_last_3d
     memory format, matching Float8Tensor implementation

2. Add mslk.conv import
   - Import mslk.conv module to properly register the fp8 conv operator
   - This ensures the operator is available for benchmarking

3. Add kernel_size=1 validation
   - kernel_size=1 creates ambiguous memory layouts (both contiguous
     and channels_last_3d simultaneously)
   - The mslk operator cannot correctly identify channel dimensions
     in this edge case
   - Added validation to reject kernel_size=1 with clear error message
   - Consistent with test suite constraints

4. Remove redundant try-except wrapper
   - Simplified error handling by removing unnecessary exception
     catching around get_conv_times() call
   - Validation errors now propagate directly with clear messages

Test Plan:
- Verified kernel_size=3 configurations run successfully
- Verified kernel_size=1 configurations fail with clear error message
- Aligned with Float8Tensor test suite behavior
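The kernel_size=1 validation described in point 3 could be sketched as follows (function name and signature are illustrative, not from the PR):

```python
def validate_conv3d_kernel_size(kernel_size):
    """Reject kernel_size=1, whose weights have an ambiguous memory layout.

    A (C_out, C_in, 1, 1, 1) weight tensor is simultaneously contiguous and
    channels_last_3d, so the operator cannot infer the channel dimension.
    """
    ks = (kernel_size,) * 3 if isinstance(kernel_size, int) else tuple(kernel_size)
    if all(k == 1 for k in ks):
        raise ValueError(
            "kernel_size=1 is ambiguous between contiguous and "
            "channels_last_3d memory formats; use kernel_size >= 2"
        )
```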
@jainapurva jainapurva added the topic: performance Use this tag if this PR improves the performance of a feature label Dec 24, 2025
Replace the epsilon-based division (1e-20) with explicit conditional
logic for calculating b_fp8_e2e_speedup:

- Returns -1 when benchmarks weren't run (clearer sentinel value)
- Only calculates speedup when both bf16 and fp8 times are valid (> 0)
- More explicit and consistent with other ratio calculations like
  rb_bf16_gemm_ratio and rb_fp8_gemm_ratio

This makes the code clearer and avoids calculating meaningless huge
numbers when b_fp8_e2e_time_s is 0.
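The conditional logic described above amounts to the following (function and argument names are illustrative):

```python
def fp8_e2e_speedup(bf16_time_s, fp8_time_s):
    """Speedup with an explicit sentinel instead of epsilon-based division.

    Returns -1 when benchmarks were not run (either time is not positive),
    rather than dividing by (time + 1e-20) and producing a huge number.
    """
    if bf16_time_s > 0 and fp8_time_s > 0:
        return bf16_time_s / fp8_time_s
    return -1

print(fp8_e2e_speedup(2.0, 1.0))  # 2.0
print(fp8_e2e_speedup(0.0, 0.0))  # -1
```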
@jainapurva jainapurva merged commit 0fd0872 into main Dec 24, 2025
20 of 21 checks passed