Skip to content

Conversation

malfet
Copy link
Contributor

@malfet malfet commented Apr 10, 2025

Stack from ghstack (oldest at bottom):

If input is channels last than MPS will return a channels last output

This fixed GPUTests.test_convolution_4_mps from test_torchinductor.py

That previous failed with

AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3

As FakeTensor implementation of conv returned Contiguous, rather than ChannelLast layout on MacOS-15 or later.
This doesn't seem to be very well documented, so will try to document the call path for ExternKernel invocation for aten::convolution:

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

If input is channels last than MPS will return a channels last output

This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py

That previous failed with
```
AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3
```
As FakeTensor implementation of conv returned Contiguous, rather than
ChannelLast layout

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Apr 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151042

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2a649df with merge base 1a1a32c (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor ciflow/mps Run MPS tests (subset of trunk) module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor labels Apr 10, 2025
@malfet malfet requested review from Skylion007, dcci and jansel April 10, 2025 20:32
@malfet malfet added the topic: bug fixes topic category label Apr 10, 2025
If input is channels last than MPS will return a channels last output

This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py

That previous failed with
```
AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3
```
As FakeTensor implementation of conv returned Contiguous, rather than
ChannelLast layout

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
@malfet malfet added the release notes: mps Release notes category label Apr 10, 2025
@malfet
Copy link
Contributor Author

malfet commented Apr 11, 2025

@pytorchbot merge -f "Lint + MPS are green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Apr 11, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: #150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: #151042
pytorchmergebot pushed a commit that referenced this pull request Apr 12, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: #150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: #151042
pytorchmergebot pushed a commit that referenced this pull request Apr 12, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: #150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: #151042
pytorchmergebot pushed a commit that referenced this pull request Apr 12, 2025
That avoids double/triple invocation of welford reductions when both
mean and deviation must be returned

Code has been copy-n-pasted for Halide implementation
https://github.com/pytorch/pytorch/blob/575f348965abe8ea428eba7098f67ec9764a7f9a/torch/_inductor/codegen/halide.py#L1189-L1191

Pull Request resolved: #151151
Approved by: https://github.com/jansel
ghstack dependencies: #151042, #150824
pytorchmergebot pushed a commit that referenced this pull request Apr 12, 2025
By using `welford_combine` primitive in the loop
This fixes `GPUTests.test_multilayer_var_lowp_mps`

Pull Request resolved: #151152
Approved by: https://github.com/jansel
ghstack dependencies: #151042, #150824, #151151
pytorchmergebot pushed a commit that referenced this pull request Apr 12, 2025
By using `welford_combine` primitive in the loop
This fixes `GPUTests.test_multilayer_var_lowp_mps`

Pull Request resolved: #151152
Approved by: https://github.com/jansel
ghstack dependencies: #151042, #150824, #151151
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
If input is channels last than MPS will return a channels last output

This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py

That previous failed with
```
AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3
```
As FakeTensor implementation of conv returned `Contiguous`, rather than `ChannelLast` layout on MacOS-15 or later.
This doesn't seem to be very well documented, so will try to document the call path for `ExternKernel` invocation for `aten::convolution`:
 - First inductor decomp defined here is called
 https://github.com/pytorch/pytorch/blob/c93e4b829072c96e64f5d85f8f71c10f17771c06/torch/_inductor/kernel/conv.py#L424-L425

- Then it goes thru FakeTensor decomposition implemented here
https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_subclasses/fake_impls.py#L739-L740
- Finally it goes down to convolution meta registrations implemented here
https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_meta_registrations.py#L2416-L2417

Pull Request resolved: pytorch#151042
Approved by: https://github.com/dcci
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: pytorch#150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: pytorch#151042
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: pytorch#150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: pytorch#151042
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: pytorch#150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: pytorch#151042
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
That avoids double/triple invocation of welford reductions when both
mean and deviation must be returned

Code has been copy-n-pasted for Halide implementation
https://github.com/pytorch/pytorch/blob/575f348965abe8ea428eba7098f67ec9764a7f9a/torch/_inductor/codegen/halide.py#L1189-L1191

Pull Request resolved: pytorch#151151
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#151042, pytorch#150824
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
…#151152)

By using `welford_combine` primitive in the loop
This fixes `GPUTests.test_multilayer_var_lowp_mps`

Pull Request resolved: pytorch#151152
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#151042, pytorch#150824, pytorch#151151
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
…#151152)

By using `welford_combine` primitive in the loop
This fixes `GPUTests.test_multilayer_var_lowp_mps`

Pull Request resolved: pytorch#151152
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#151042, pytorch#150824, pytorch#151151
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
If input is channels last than MPS will return a channels last output

This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py

That previous failed with
```
AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3
```
As FakeTensor implementation of conv returned `Contiguous`, rather than `ChannelLast` layout on MacOS-15 or later.
This doesn't seem to be very well documented, so will try to document the call path for `ExternKernel` invocation for `aten::convolution`:
 - First inductor decomp defined here is called
 https://github.com/pytorch/pytorch/blob/c93e4b829072c96e64f5d85f8f71c10f17771c06/torch/_inductor/kernel/conv.py#L424-L425

- Then it goes thru FakeTensor decomposition implemented here
https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_subclasses/fake_impls.py#L739-L740
- Finally it goes down to convolution meta registrations implemented here
https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_meta_registrations.py#L2416-L2417

Pull Request resolved: pytorch#151042
Approved by: https://github.com/dcci
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: pytorch#150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: pytorch#151042
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: pytorch#150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: pytorch#151042
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Literal Python-to-Metal translation of
https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225

Fixed missing barrier in `welford_combine`
And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass

Pull Request resolved: pytorch#150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: pytorch#151042
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
That avoids double/triple invocation of welford reductions when both
mean and deviation must be returned

Code has been copy-n-pasted for Halide implementation
https://github.com/pytorch/pytorch/blob/575f348965abe8ea428eba7098f67ec9764a7f9a/torch/_inductor/codegen/halide.py#L1189-L1191

Pull Request resolved: pytorch#151151
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#151042, pytorch#150824
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
…#151152)

By using `welford_combine` primitive in the loop
This fixes `GPUTests.test_multilayer_var_lowp_mps`

Pull Request resolved: pytorch#151152
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#151042, pytorch#150824, pytorch#151151
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
…#151152)

By using `welford_combine` primitive in the loop
This fixes `GPUTests.test_multilayer_var_lowp_mps`

Pull Request resolved: pytorch#151152
Approved by: https://github.com/jansel
ghstack dependencies: pytorch#151042, pytorch#150824, pytorch#151151
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
If input is channels last than MPS will return a channels last output

This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py

That previous failed with
```
AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3
```
As FakeTensor implementation of conv returned Contiguous, rather than
ChannelLast layout

ghstack-source-id: 1c626ef
Pull Request resolved: pytorch/pytorch#151042
@github-actions github-actions bot deleted the gh/malfet/273/head branch May 16, 2025 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/mps Run MPS tests (subset of trunk) Merged module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor release notes: mps Release notes category topic: bug fixes topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants