Inductor: fix Conv output stride for dynamic shapes #121400
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/121400
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 5df9178 with merge base 953c6c3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased
I just kicked off a performance run for this here:
https://github.com/pytorch/pytorch/actions/runs/8299187243
Once that workflow finishes:
- Check the performance here (select the branch in the dropdown)
- If the perf looks good, please comment and re-request my review.
Hi @jansel, thanks for triggering the performance run. I selected 291ce86a6c as the Base commit (which is 1 commit behind the base commit 953c6c37cb of the current PR) and 016788e on the Mode
Thanks! Just wanted to double check that.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Stack from ghstack (oldest at bottom):
Fixes #120873.
Fixes the output stride of Conv in the case of dynamic shapes. The previous logic in Inductor assumed that the output stride of Conv is always channels last, while it is actually contiguous if `dynamic_shapes and is_contiguous_storage_and_layout(x)`.
### Static shape
In static shape cases, since the weight is prepacked (`weight_t.is_mkldnn()` will be `true`), the Conv kernel always forces the output to be channels last, so it is fine for Inductor to assume that the output stride of Conv is always channels last.
https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/aten/src/ATen/native/mkldnn/Conv.cpp#L357-L358
### Dynamic shape
In dynamic shape cases, we don't prepack the weight for Conv, so the Conv kernel decides the output layout based on the input and weight layouts.
https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/torch/_inductor/fx_passes/mkldnn_fusion.py#L1024-L1025
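To make the two cases concrete, here is a minimal Python sketch (not the actual ATen/Inductor code) of how the output layout is chosen; `conv_output_is_channels_last`, the `weight_is_prepacked` flag, and the stride-based check are illustrative stand-ins for the C++ logic linked above.

```python
import torch


def suggests_channels_last(t: torch.Tensor) -> bool:
    # Rough stand-in for ATen's suggest_memory_format(): treat a 4D tensor as
    # channels last only when its strides match NHWC and are not also plain
    # contiguous strides (ambiguous cases fall back to contiguous / NCHW).
    return t.is_contiguous(memory_format=torch.channels_last) and not t.is_contiguous()


def conv_output_is_channels_last(
    inp: torch.Tensor, weight: torch.Tensor, weight_is_prepacked: bool
) -> bool:
    if weight_is_prepacked:
        # Static shapes: the weight is an MKLDNN tensor, so the kernel always
        # produces a channels-last output.
        return True
    # Dynamic shapes: no prepacking, so the output layout follows from the
    # input and weight strides (approximating mkldnn_conv_use_channels_last).
    return suggests_channels_last(inp) or suggests_channels_last(weight)
```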
For an input with `channels = 1`, like a tensor of size `(s0, 1, 28, 28)` and stride `(784, 784, 28, 1)`, the `require_stride_order` call in Inductor (with `req_stride_order` in channels last order) won't change the stride of the tensor, since the strides of size-1 dimensions are ignored:
https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/torch/_inductor/ir.py#L5451
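A concrete illustration (a fixed batch size stands in for the symbolic `s0`; this snippet only reproduces the stride ambiguity and is not part of the PR):

```python
import torch

x = torch.randn(2, 1, 28, 28)  # 2 stands in for the symbolic batch size s0
print(x.stride())              # (784, 784, 28, 1)

# Because dimensions of size 1 are skipped when strides are compared, the same
# tensor passes both layout checks, so reordering it to channels last keeps
# the strides unchanged.
print(x.is_contiguous())                                         # True
print(x.is_contiguous(memory_format=torch.channels_last))        # True
print(x.contiguous(memory_format=torch.channels_last).stride())  # (784, 784, 28, 1)
```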
Meanwhile, the Conv kernel considers such a tensor to be **contiguous** rather than channels last, so the output of the Conv kernel will be in contiguous format:
https://github.com/pytorch/pytorch/blob/96ed37ac13366cc9a7e6645b8955061d0a14f80b/aten/src/ATen/native/ConvUtils.h#L396-L404
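Continuing the example above in eager mode (assuming a typical CPU build; the exact kernel picked may vary):

```python
import torch

conv = torch.nn.Conv2d(1, 16, 3)  # weight stays in the default contiguous layout
x = torch.randn(2, 1, 28, 28)
y = conv(x)

# The kernel sees a contiguous input and a contiguous weight, so the output
# comes back contiguous rather than channels last.
print(y.is_contiguous())                                   # expected: True
print(y.is_contiguous(memory_format=torch.channels_last))  # expected: False
```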
To align with the behavior of the Conv kernel, we set the `output_stride` in such a case to be contiguous instead of channels last.
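A minimal sketch of that rule (the helper name and signature below are hypothetical; the real change lives in the Inductor lowering and works on IR nodes rather than eager tensors):

```python
import torch


def conv_output_strides(output_size, x: torch.Tensor, dynamic_shapes: bool):
    # Hypothetical helper mirroring the fixed rule: with dynamic shapes and a
    # contiguous input (e.g. channels == 1), declare contiguous output strides
    # so Inductor matches what the non-prepacked Conv kernel actually produces;
    # otherwise keep the previous channels-last assumption.
    if dynamic_shapes and x.is_contiguous():
        return torch.empty(output_size).stride()
    return torch.empty(output_size, memory_format=torch.channels_last).stride()
```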
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang