
Conversation

chunyuan-w
Collaborator

@chunyuan-w chunyuan-w commented Mar 7, 2024

Stack from ghstack (oldest at bottom):

Fixes #120873.
This PR fixes the output stride of Conv in the case of dynamic shapes. The previous logic in Inductor assumed that the output stride of Conv is always channels last, while it is actually contiguous when `dynamic_shapes and is_contiguous_storage_and_layout(x)`.

Static shape

In static shape cases, since the weight is prepacked (`weight_t.is_mkldnn()` will be `true`), we always force the output to be channels last in the Conv kernel, so it is fine for Inductor to assume that the output stride of Conv is always channels last:

https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/aten/src/ATen/native/mkldnn/Conv.cpp#L357-L358

bool use_channels_last =
    weight_t.is_mkldnn() || mkldnn_conv_use_channels_last(input_t, weight_t);
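For illustration only (not part of this PR), the second half of that condition can also be seen in eager mode: assuming a oneDNN-enabled CPU build, a channels-last input already makes the kernel produce a channels-last output, and the prepacked-weight case (`weight_t.is_mkldnn()`) forces channels last unconditionally.

```python
import torch

# Minimal eager-mode sketch (assumes a oneDNN-enabled CPU build):
# a channels-last input makes mkldnn_conv_use_channels_last() return true,
# so the Conv kernel produces a channels-last output.
conv = torch.nn.Conv2d(3, 16, kernel_size=3)
x = torch.randn(8, 3, 28, 28).to(memory_format=torch.channels_last)

y = conv(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```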

Dynamic shape

In dynamic shape cases, we do not prepack the weight for Conv; the Conv kernel then decides the output layout based on the input and weight layouts:

https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/torch/_inductor/fx_passes/mkldnn_fusion.py#L1024-L1025

# For dynamic shape case, we need to pack weight in runtime.
packed_weight_node = args[1]

For an input with `channels = 1`, for example a tensor of size `(s0, 1, 28, 28)` and stride `(784, 784, 28, 1)`, calling `require_stride_order` on `x` with `req_stride_order` in channels-last order does not change the strides of the tensor, since strides of size-1 dimensions are ignored:

https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/torch/_inductor/ir.py#L5451

x = cls.require_stride_order(x, req_stride_order)
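A standalone illustration of that ambiguity (not from the PR): when `channels == 1`, contiguous strides already satisfy the channels-last stride check, so asking for channels last is a no-op.

```python
import torch

# For channels == 1, the contiguous strides (784, 784, 28, 1) also pass the
# channels-last check, because dimensions of size 1 are skipped when
# validating strides.
x = torch.randn(2, 1, 28, 28)
print(x.stride())                                          # (784, 784, 28, 1)
print(x.is_contiguous())                                   # True
print(x.is_contiguous(memory_format=torch.channels_last))  # True as well

# Requesting channels last therefore does not restride the tensor.
x_cl = x.contiguous(memory_format=torch.channels_last)
print(x_cl.stride())                                       # (784, 784, 28, 1)
```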

The Conv kernel, however, considers such a tensor **contiguous** rather than channels last, so the output of the Conv kernel will be in contiguous format:

https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/aten/src/ATen/native/ConvUtils.h#L396-L404

bool can_use_mkldnn_channels_last_2d =
    (input_memory_format == at::MemoryFormat::ChannelsLast) ||
    (weight_memory_format == at::MemoryFormat::ChannelsLast);
bool can_use_mkldnn_channels_last_3d =
    (input_memory_format == at::MemoryFormat::ChannelsLast3d) ||
    (weight_memory_format == at::MemoryFormat::ChannelsLast3d);
return can_use_mkldnn_channels_last_2d || can_use_mkldnn_channels_last_3d;
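A small eager-mode check of that behavior (again assuming the default CPU convolution path): with a `channels == 1` contiguous input and a contiguous weight, neither operand reports channels last, so the kernel returns a contiguous output.

```python
import torch

# With channels == 1 and contiguous strides, neither input nor weight is seen
# as channels last, so the Conv kernel falls back to a contiguous output.
conv = torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)
x = torch.randn(8, 1, 28, 28)   # strides (784, 784, 28, 1)

y = conv(x)                     # shape (8, 16, 26, 26)
print(y.stride())               # (10816, 676, 26, 1) -- contiguous
print(y.is_contiguous(memory_format=torch.channels_last))  # False
```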

To align with the behavior of the Conv kernel, we set the output stride in this case to be contiguous instead of channels last.
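A hedged, self-contained sketch of the decision this PR encodes in the Inductor lowering (the real code lives in `torch/_inductor`; the helper names below are made up for illustration):

```python
# Sketch only: not the actual Inductor implementation.
def contiguous_strides(size):
    # Row-major strides, e.g. (N, C, H, W) -> (C*H*W, H*W, W, 1).
    strides = [1] * len(size)
    for i in range(len(size) - 2, -1, -1):
        strides[i] = strides[i + 1] * size[i + 1]
    return tuple(strides)

def channels_last_strides(size):
    # NHWC strides for a 4-d size, e.g. (N, C, H, W) -> (H*W*C, 1, W*C, C).
    n, c, h, w = size
    return (h * w * c, 1, w * c, c)

def conv_output_stride(output_size, dynamic_shapes, x_is_contiguous):
    # The fix: with dynamic shapes and a contiguous input, the kernel returns
    # a contiguous output, so Inductor must expect contiguous strides too.
    if dynamic_shapes and x_is_contiguous:
        return contiguous_strides(output_size)
    # Static shapes (prepacked weight) keep the channels-last assumption.
    return channels_last_strides(output_size)

print(conv_output_stride((8, 16, 26, 26), dynamic_shapes=True, x_is_contiguous=True))
# (10816, 676, 26, 1)
print(conv_output_stride((8, 16, 26, 26), dynamic_shapes=False, x_is_contiguous=True))
# (10816, 1, 416, 16)
```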

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang


pytorch-bot bot commented Mar 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121400

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5df9178 with merge base 953c6c3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

chunyuan-w added a commit that referenced this pull request Mar 7, 2024
ghstack-source-id: 148818d
Pull Request resolved: #121400
@chunyuan-w chunyuan-w marked this pull request as draft March 7, 2024 09:43
@chunyuan-w chunyuan-w changed the title from "Inductor: fix channels last format check" to "Inductor: fix Conv output stride for dynamic shapes" on Mar 8, 2024
@chunyuan-w
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gh/chunyuan-w/2/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/121400)

chunyuan-w added a commit that referenced this pull request Mar 8, 2024
ghstack-source-id: 68913cd
Pull Request resolved: #121400
@chunyuan-w chunyuan-w marked this pull request as ready for review March 11, 2024 01:56
@chunyuan-w chunyuan-w requested a review from jgong5 March 11, 2024 01:58
@chunyuan-w chunyuan-w requested review from desertfire and jansel March 15, 2024 11:08
Contributor

@jansel jansel left a comment


I just kicked off a performance run for this here:
https://github.com/pytorch/pytorch/actions/runs/8299187243

Once that workflow finishes:

  1. Check the performance here (select the branch in the dropdown)
  2. If the perf looks good, please comment and re-request my review.

@chunyuan-w
Collaborator Author

> I just kicked off a performance run for this here: https://github.com/pytorch/pytorch/actions/runs/8299187243
>
> Once that workflow finishes:
>
>   1. Check the performance here (select the branch in the dropdown)
>   2. If the perf looks good, please comment and re-request my review.

Hi @jansel, thanks for triggering the performance run. I selected 291ce86a6c as the base commit (one commit behind the merge base 953c6c37cb of the current PR) and 016788e on gh/chunyuan-w/2/head as the new commit; the comparison results are below. I only found results for cudagraphs and cudagraphs_dynamic under the Mode and Precision settings shown, and there seems to be no obvious performance change. Let me know if I'm reading the performance dashboard correctly.

Mode inference and Precision bfloat16:

(dashboard comparison screenshot)

Mode training and Precision amp:

(dashboard comparison screenshot)

@chunyuan-w chunyuan-w requested a review from jansel March 18, 2024 01:58
Contributor

@jansel jansel left a comment


Thanks! Just wanted to double check that.

@chunyuan-w
Collaborator Author

@pytorchbot rebase

@chunyuan-w chunyuan-w added the "topic: not user facing" and "ciflow/trunk" labels on Mar 18, 2024
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gh/chunyuan-w/2/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/121400)

pytorchmergebot pushed a commit that referenced this pull request Mar 18, 2024
ghstack-source-id: 91d810a
Pull Request resolved: #121400
@chunyuan-w
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here
