
Conversation

chunyuan-w
Collaborator

@chunyuan-w chunyuan-w commented Mar 7, 2024

Stack from ghstack (oldest at bottom):

Fixes #120873.
This PR fixes the output stride of Conv in the case of dynamic shapes. The previous logic in Inductor assumed that the output stride of Conv is always channels last, while it is actually contiguous when `dynamic_shapes and is_contiguous_storage_and_layout(x)`.

Static shape

In static shape cases, since the weight is prepacked (`weight_t.is_mkldnn()` will be `true`), we always force the output to be channels last in the Conv kernel, so it is fine for Inductor to assume that the output stride of Conv is always channels last:

https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/aten/src/ATen/native/mkldnn/Conv.cpp#L357-L358

bool use_channels_last =
    weight_t.is_mkldnn() || mkldnn_conv_use_channels_last(input_t, weight_t);
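For illustration only (not part of this PR), the second half of that condition can also be seen in eager mode: assuming a oneDNN-enabled CPU build, a channels-last input already makes the kernel produce a channels-last output, and the prepacked-weight case (`weight_t.is_mkldnn()`) forces channels last unconditionally.

```python
import torch

# Minimal eager-mode sketch (assumes a oneDNN-enabled CPU build):
# a channels-last input makes mkldnn_conv_use_channels_last() return true,
# so the Conv kernel produces a channels-last output.
conv = torch.nn.Conv2d(3, 16, kernel_size=3)
x = torch.randn(8, 3, 28, 28).to(memory_format=torch.channels_last)

y = conv(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```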

Dynamic shape

In dynamic shape cases, we do not prepack the weight for Conv; the Conv kernel then decides the output layout based on the input and weight layouts:

https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/torch/_inductor/fx_passes/mkldnn_fusion.py#L1024-L1025

# For dynamic shape case, we need to pack weight in runtime.
packed_weight_node = args[1]

For an input with `channels = 1`, for example a tensor of size `(s0, 1, 28, 28)` and stride `(784, 784, 28, 1)`, calling `require_stride_order` on `x` with `req_stride_order` in channels-last order does not change the strides of the tensor, since strides of size-1 dimensions are ignored:

https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/torch/_inductor/ir.py#L5451

x = cls.require_stride_order(x, req_stride_order)
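A standalone illustration of that ambiguity (not from the PR): when `channels == 1`, contiguous strides already satisfy the channels-last stride check, so asking for channels last is a no-op.

```python
import torch

# For channels == 1, the contiguous strides (784, 784, 28, 1) also pass the
# channels-last check, because dimensions of size 1 are skipped when
# validating strides.
x = torch.randn(2, 1, 28, 28)
print(x.stride())                                          # (784, 784, 28, 1)
print(x.is_contiguous())                                   # True
print(x.is_contiguous(memory_format=torch.channels_last))  # True as well

# Requesting channels last therefore does not restride the tensor.
x_cl = x.contiguous(memory_format=torch.channels_last)
print(x_cl.stride())                                       # (784, 784, 28, 1)
```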

The Conv kernel, however, considers such a tensor **contiguous** rather than channels last, so the output of the Conv kernel will be in contiguous format:

https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/aten/src/ATen/native/ConvUtils.h#L396-L404

bool can_use_mkldnn_channels_last_2d =
    (input_memory_format == at::MemoryFormat::ChannelsLast) ||
    (weight_memory_format == at::MemoryFormat::ChannelsLast);
bool can_use_mkldnn_channels_last_3d =
    (input_memory_format == at::MemoryFormat::ChannelsLast3d) ||
    (weight_memory_format == at::MemoryFormat::ChannelsLast3d);
return can_use_mkldnn_channels_last_2d || can_use_mkldnn_channels_last_3d;
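A small eager-mode check of that behavior (again assuming the default CPU convolution path): with a `channels == 1` contiguous input and a contiguous weight, neither operand reports channels last, so the kernel returns a contiguous output.

```python
import torch

# With channels == 1 and contiguous strides, neither input nor weight is seen
# as channels last, so the Conv kernel falls back to a contiguous output.
conv = torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
x = torch.randn(8, 1, 28, 28)   # strides (784, 784, 28, 1)

y = conv(x)                     # shape (8, 16, 26, 26)
print(y.stride())               # (10816, 676, 26, 1) -- contiguous
print(y.is_contiguous(memory_format=torch.channels_last))  # False
```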

To align with the behavior of the Conv kernel, we set the output stride in this case to be contiguous instead of channels last.
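A hedged, self-contained sketch of the decision this PR encodes in the Inductor lowering (the real code lives in `torch/_inductor`; the helper names below are made up for illustration):

```python
# Sketch only: not the actual Inductor implementation.
def contiguous_strides(size):
    # Row-major strides, e.g. (N, C, H, W) -> (C*H*W, H*W, W, 1).
    strides = [1] * len(size)
    for i in range(len(size) - 2, -1, -1):
        strides[i] = strides[i + 1] * size[i + 1]
    return tuple(strides)

def channels_last_strides(size):
    # NHWC strides for a 4-d size, e.g. (N, C, H, W) -> (H*W*C, 1, W*C, C).
    n, c, h, w = size
    return (h * w * c, 1, w * c, c)

def conv_output_stride(output_size, dynamic_shapes, x_is_contiguous):
    # The fix: with dynamic shapes and a contiguous input, the kernel returns
    # a contiguous output, so Inductor must expect contiguous strides too.
    if dynamic_shapes and x_is_contiguous:
        return contiguous_strides(output_size)
    # Static shapes (prepacked weight) keep the channels-last assumption.
    return channels_last_strides(output_size)

print(conv_output_stride((8, 16, 26, 26), dynamic_shapes=True, x_is_contiguous=True))
# (10816, 676, 26, 1)
print(conv_output_stride((8, 16, 26, 26), dynamic_shapes=False, x_is_contiguous=True))
# (10816, 1, 416, 16)
```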

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang


pytorch-bot bot commented Mar 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121400

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5df9178 with merge base 953c6c3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

chunyuan-w added a commit that referenced this pull request Mar 7, 2024
ghstack-source-id: 148818d
Pull Request resolved: #121400
@chunyuan-w chunyuan-w marked this pull request as draft March 7, 2024 09:43
@chunyuan-w chunyuan-w changed the title from "Inductor: fix channels last format check" to "Inductor: fix Conv output stride for dynamic shapes" on Mar 8, 2024
@chunyuan-w
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gh/chunyuan-w/2/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/121400)

chunyuan-w added a commit that referenced this pull request Mar 8, 2024
ghstack-source-id: 68913cd
Pull Request resolved: #121400
@chunyuan-w chunyuan-w marked this pull request as ready for review March 11, 2024 01:56
@chunyuan-w chunyuan-w requested a review from jgong5 March 11, 2024 01:58
@chunyuan-w chunyuan-w requested review from desertfire and jansel March 15, 2024 11:08
Contributor

@jansel jansel left a comment


I just kicked off a performance run for this here:
https://github.com/pytorch/pytorch/actions/runs/8299187243

Once that workflow finishes:

  1. Check the performance here (select the branch in the dropdown)
  2. If the perf looks good, please comment and re-request my review.

@chunyuan-w
Collaborator Author

> I just kicked off a performance run for this here: https://github.com/pytorch/pytorch/actions/runs/8299187243
>
> Once that workflow finishes:
>
>   1. Check the performance here (select the branch in the dropdown)
>   2. If the perf looks good, please comment and re-request my review.

Hi @jansel, thanks for triggering the performance run. I selected 291ce86a6c as the base commit (one commit behind the merge base 953c6c37cb of the current PR) and 016788e on gh/chunyuan-w/2/head as the new commit; the comparison results are below. I only found results for cudagraphs and cudagraphs_dynamic under the Mode and Precision settings shown, and there seems to be no obvious performance change. Let me know if I'm reading the performance dashboard correctly.

Mode inference and Precision bfloat16:

(dashboard comparison screenshot)

Mode training and Precision amp:

(dashboard comparison screenshot)

@chunyuan-w chunyuan-w requested a review from jansel March 18, 2024 01:58
Contributor

@jansel jansel left a comment


Thanks! Just wanted to double check that.

@chunyuan-w
Collaborator Author

@pytorchbot rebase

@chunyuan-w chunyuan-w added the "topic: not user facing" and "ciflow/trunk" labels on Mar 18, 2024
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gh/chunyuan-w/2/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/121400)

pytorchmergebot pushed a commit that referenced this pull request Mar 18, 2024
ghstack-source-id: 91d810a
Pull Request resolved: #121400
@chunyuan-w
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here
