[ROCm] remove extra transposes in NHWC convolutions on MIOpen #160435

dnikolaev-amd · 2025-08-12T17:52:45Z

Remove the extra aten::contiguous calls (and the transposes they trigger) for NHWC convolutions on ROCm with MIOpen.

Tests:

nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:

<img width="1255" height="228" alt="image" src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57" />

After:

<img width="874" height="153" alt="image" src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818" />

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd
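As background on why an extra aten::contiguous can show up as a transpose: a tensor with logical shape (N, C, H, W) can be stored either in the default NCHW order or in NHWC (channels_last) order, and the two differ only in their strides. The sketch below is plain Python with hypothetical helper names (they are not PyTorch APIs) and ignores PyTorch's special handling of size-1 dimensions. It computes both stride patterns for the (2, 8, 4, 4) shape used in the test script in this thread and flags when a layout conversion would have to copy data.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]


def contiguous_strides(n: int, c: int, h: int, w: int) -> Shape:
    """Element strides of a default (NCHW-contiguous) tensor of shape (n, c, h, w)."""
    return (c * h * w, h * w, w, 1)


def channels_last_strides(n: int, c: int, h: int, w: int) -> Shape:
    """Element strides of the same logical shape stored in NHWC order."""
    return (h * w * c, 1, w * c, c)


def needs_copy_for_channels_last(shape: Shape, strides: Shape) -> bool:
    """True if the data is not already laid out as NHWC (simplified check)."""
    return tuple(strides) != channels_last_strides(*shape)


shape = (2, 8, 4, 4)
print(contiguous_strides(*shape))     # (128, 16, 4, 1)
print(channels_last_strides(*shape))  # (128, 1, 32, 8)
```

When the input is already in NHWC order, a request for that same layout has nothing to rearrange, which is the situation where a redundant conversion is pure overhead.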

pytorch-bot · 2025-08-12T17:52:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160435

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 1 Unrelated Failure

As of commit 1812855 with merge base ee9f8ba:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / win-vs2022-cuda12.6-py3 / build (gh) (trunk failure)
ninja: build stopped: subcommand failed

This comment was automatically generated by Dr. CI and updates every 15 minutes.

dnikolaev-amd · 2025-08-12T17:53:06Z

A simplified convolution test for collecting a profile, based on nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32:

# File name: test_extra_transposes.py
import os
import torch
import torch.nn as nn

# Enable NHWC convolutions for MIOpen
os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC"] = "1"


def helper(n, c, h, w, out_channels, dtype, kernel_size, groups):
    # Build a channels_last (NHWC) input and convolution filled with small integer values
    input = torch.randint(-3, 3, (n, c, h, w), dtype=dtype, device="cuda").to(
        memory_format=torch.channels_last).requires_grad_()
    conv = nn.Conv2d(c, out_channels, kernel_size, groups=groups).to(
        device="cuda", dtype=dtype, memory_format=torch.channels_last
    )
    for p in conv.parameters():
        p.data = torch.randint_like(p, -3, 3)
    # Run forward and backward so that both passes are profiled
    out = conv(input)
    grad = torch.randint_like(out, -3, 3)
    out.backward(grad)


# Start torch.profiler to capture kernels
prof = torch.profiler.profile()
prof.start()

helper(2, 8, 4, 4, out_channels=8, dtype=torch.float32, kernel_size=3, groups=8)

prof.stop()
# Save the profiling results as a Chrome trace
prof.export_chrome_trace("conv_profile_decode.json")
# Save the profiling stats to a text file
with open("conv_stats_decode.txt", "w") as f:
    print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total", row_limit=-1), file=f)

The difference can be observed with the following commands.
Before the PR:

python test_extra_transposes.py

grep contiguous conv_stats_decode.txt
  aten::contiguous         0.00%       6.501us         0.10%     179.171us      89.585us       0.000us         0.00%       0.000us       0.000us

After the PR (grep prints nothing):

python test_extra_transposes.py

grep contiguous conv_stats_decode.txt

The conv_stats_decode.txt file contains the profile stats.
The conv_profile_decode.json file contains the profile trace and can be visualized at https://ui.perfetto.dev/
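The same check can also be made against the exported trace rather than the text table. The following is a minimal sketch, assuming conv_profile_decode.json follows the Chrome Trace Event layout (a top-level "traceEvents" list whose complete events have "ph" equal to "X" and a "dur" in microseconds). The function names are hypothetical and not part of torch.profiler.

```python
import json
from collections import defaultdict
from typing import Dict, Tuple


def summarize_ops(trace: dict, needle: str) -> Dict[str, Tuple[int, float]]:
    """Map each matching event name to (count, total duration in microseconds)."""
    summary = defaultdict(lambda: [0, 0.0])
    for event in trace.get("traceEvents", []):
        name = event.get("name", "")
        # Only complete ("X") events carry a duration; skip metadata and others.
        if event.get("ph") == "X" and needle in name:
            summary[name][0] += 1
            summary[name][1] += float(event.get("dur", 0))
    return {name: (count, total) for name, (count, total) in summary.items()}


def summarize_trace_file(path: str, needle: str) -> Dict[str, Tuple[int, float]]:
    """Load a trace file from disk and summarize the events matching needle."""
    with open(path) as f:
        return summarize_ops(json.load(f), needle)
```

Calling summarize_trace_file("conv_profile_decode.json", "contiguous") should return an entry for aten::contiguous before the PR and an empty dictionary after it, mirroring the grep output above.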

jeffdaily · 2025-08-13T14:55:38Z

@pytorchbot merge

pytorchmergebot · 2025-08-13T14:58:59Z

Merge failed

Reason: This PR needs a `release notes:` label
If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

dnikolaev-amd · 2025-08-13T15:09:35Z

@pytorchbot label "topic: not user facing"

jeffdaily · 2025-08-13T15:46:10Z

@pytorchbot merge

pytorchmergebot · 2025-08-13T15:48:48Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2025-08-13T17:40:21Z

Merge failed

Reason: 1 job has failed: trunk / win-vs2022-cuda12.6-py3 / build

jeffdaily · 2025-08-13T17:56:28Z

@pytorchbot merge -f "rocm-only change, only CI failure is from merge base but not flagged as such"

pytorchmergebot · 2025-08-13T17:58:05Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Aug 12, 2025

pytorchbot added the open source label Aug 12, 2025

jeffdaily approved these changes Aug 12, 2025

jeffdaily added the ciflow/rocm Trigger "default" config CI on ROCm label Aug 12, 2025

dnikolaev-amd marked this pull request as ready for review August 13, 2025 14:30

dnikolaev-amd requested a review from jithunnair-amd as a code owner August 13, 2025 14:30

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 13, 2025

pytorchmergebot added the merging label Aug 13, 2025

pytorchmergebot removed the merging label Aug 13, 2025

pytorch-bot bot added the topic: not user facing topic category label Aug 13, 2025

pytorchmergebot added the merging label Aug 13, 2025

pytorchmergebot removed the merging label Aug 13, 2025

pytorchmergebot added the merging label Aug 13, 2025

pytorchmergebot closed this in 01584d2 Aug 13, 2025

pytorchmergebot added Merged and removed merging labels Aug 13, 2025
