Skip to content

Conversation

dnikolaev-amd
Copy link
Contributor

@dnikolaev-amd dnikolaev-amd commented Aug 12, 2025

remove aten::contiguous for NHWC convolutions on ROCm

Tests:

  • nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
  • nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
image

After:
image

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

remove aten::contiguous for NHWC convolutions on ROCm

Tests:
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
-
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57"
/>

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818"
/>
Copy link

pytorch-bot bot commented Aug 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160435

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 1 Unrelated Failure

As of commit 1812855 with merge base ee9f8ba (image):

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Aug 12, 2025
@dnikolaev-amd
Copy link
Contributor Author

Simplified convolution test for collecting profile based on nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32:

# file name test_extra_transposes.py
import os
import torch
import torch.nn as nn

#enable NHWC Conv for MIOpen
os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC"] = "1"

def helper(n, c, h, w, out_channels, dtype, kernel_size, groups):
        input = torch.randint(-3, 3, (n, c, h, w), dtype=dtype, device="cuda").to(
            memory_format=torch.channels_last).requires_grad_()
        conv = nn.Conv2d(c, out_channels, kernel_size, groups=groups).to(
            device="cuda", dtype=dtype, memory_format=torch.channels_last
        )
        for p in conv.parameters():
            p.data = torch.randint_like(p, -3, 3)
        out = conv(input)
        grad = torch.randint_like(out, -3, 3)
        out.backward(grad)

# start torch.profiler to capture kernels
prof = torch.profiler.profile()
prof.start()

helper(2, 8, 4, 4, out_channels=8, dtype=torch.float32, kernel_size=3, groups=8)

prof.stop()
#save profiling results
prof.export_chrome_trace(f"conv_profile_decode.json")
#save profiling stats to a text file
with open(f"conv_stats_decode.txt", "w") as f:
    print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total", row_limit=-1), file=f)

The difference can be observed with commands:
Before PR:

python test_extra_transposes.py

grep contiguous conv_stats_decode.txt
  aten::contiguous         0.00%       6.501us         0.10%     179.171us      89.585us       0.000us         0.00%       0.000us       0.000us

After PR (empty output):

python test_extra_transposes.py

grep contiguous conv_stats_decode.txt

conv_stats_decode.txt file contains profile stats
conv_profile_decode.json file contains profile and can be visualized using https://ui.perfetto.dev/

@jeffdaily jeffdaily added the ciflow/rocm Trigger "default" config CI on ROCm label Aug 12, 2025
@dnikolaev-amd dnikolaev-amd marked this pull request as ready for review August 13, 2025 14:30
@jeffdaily
Copy link
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 13, 2025
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@dnikolaev-amd
Copy link
Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Aug 13, 2025
@jeffdaily
Copy link
Collaborator

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2022-cuda12.6-py3 / build

Details for Dev Infra team Raised by workflow job

@jeffdaily
Copy link
Collaborator

@pytorchbot merge -f "rocm-only change, only CI failure is from merge base but not flagged as such"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

chuanhaozhuge pushed a commit that referenced this pull request Aug 14, 2025
remove aten::contiguous for NHWC convolutions on ROCm

Tests:
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57" />

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818" />

Pull Request resolved: #160435
Approved by: https://github.com/jeffdaily
chuanhaozhuge pushed a commit that referenced this pull request Aug 18, 2025
remove aten::contiguous for NHWC convolutions on ROCm

Tests:
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57" />

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818" />

Pull Request resolved: #160435
Approved by: https://github.com/jeffdaily
can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request Aug 22, 2025
…h#160435)

remove aten::contiguous for NHWC convolutions on ROCm

Tests:
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57" />

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818" />

Pull Request resolved: pytorch#160435
Approved by: https://github.com/jeffdaily
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…h#160435)

remove aten::contiguous for NHWC convolutions on ROCm

Tests:
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32
- nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16

Before:
<img width="1255" height="228" alt="image"
src="https://github.com/user-attachments/assets/b125ccab-00c2-4d3a-a341-4583e51d8d57" />

After:
<img width="874" height="153" alt="image"
src="https://github.com/user-attachments/assets/ec200754-3622-488e-8762-bff1c2d22818" />

Pull Request resolved: pytorch#160435
Approved by: https://github.com/jeffdaily
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/rocm Trigger "default" config CI on ROCm ciflow/trunk Trigger trunk jobs on your pull request Merged module: rocm AMD GPU support for Pytorch open source topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants