[MPS] Fix `determine_backend_memory_format` logic #151042

malfet · 2025-04-10T20:24:08Z

Stack from ghstack (oldest at bottom):

If input is channels last than MPS will return a channels last output

This fixed GPUTests.test_convolution_4_mps from test_torchinductor.py

That previous failed with

AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3

As FakeTensor implementation of conv returned Contiguous, rather than ChannelLast layout on MacOS-15 or later.
This doesn't seem to be very well documented, so will try to document the call path for ExternKernel invocation for aten::convolution:

First inductor decomp defined here is called

pytorch/torch/_inductor/kernel/conv.py

Lines 424 to 425 in c93e4b8

@register_lowering(aten.convolution)

def convolution(

Then it goes thru FakeTensor decomposition implemented here

pytorch/torch/_subclasses/fake_impls.py

Lines 739 to 740 in 320914f

    
           @register_op_impl([aten.convolution.default, aten.convolution_backward.default]) 
        
           def conv(fake_mode, func, *args, **kwargs):

Finally it goes down to convolution meta registrations implemented here

pytorch/torch/_meta_registrations.py

Lines 2416 to 2417 in 320914f

@register_meta(aten.convolution.default)

def meta_conv(

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

If input is channels last than MPS will return a channels last output This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py That previous failed with ``` AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3 ``` As FakeTensor implementation of conv returned Contiguous, rather than ChannelLast layout [ghstack-poisoned]

pytorch-bot · 2025-04-10T20:24:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151042

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2a649df with merge base 1a1a32c ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

If input is channels last than MPS will return a channels last output This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py That previous failed with ``` AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3 ``` As FakeTensor implementation of conv returned Contiguous, rather than ChannelLast layout cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]

malfet · 2025-04-11T01:48:52Z

@pytorchbot merge -f "Lint + MPS are green"

pytorchmergebot · 2025-04-11T01:51:24Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

Literal Python-to-Metal translation of https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225 Fixed missing barrier in `welford_combine` And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass Pull Request resolved: #150824 Approved by: https://github.com/dcci, https://github.com/jansel ghstack dependencies: #151042

That avoids double/triple invocation of welford reductions when both mean and deviation must be returned Code has been copy-n-pasted for Halide implementation https://github.com/pytorch/pytorch/blob/575f348965abe8ea428eba7098f67ec9764a7f9a/torch/_inductor/codegen/halide.py#L1189-L1191 Pull Request resolved: #151151 Approved by: https://github.com/jansel ghstack dependencies: #151042, #150824

By using `welford_combine` primitive in the loop This fixes `GPUTests.test_multilayer_var_lowp_mps` Pull Request resolved: #151152 Approved by: https://github.com/jansel ghstack dependencies: #151042, #150824, #151151

If input is channels last than MPS will return a channels last output This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py That previous failed with ``` AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3 ``` As FakeTensor implementation of conv returned `Contiguous`, rather than `ChannelLast` layout on MacOS-15 or later. This doesn't seem to be very well documented, so will try to document the call path for `ExternKernel` invocation for `aten::convolution`: - First inductor decomp defined here is called https://github.com/pytorch/pytorch/blob/c93e4b829072c96e64f5d85f8f71c10f17771c06/torch/_inductor/kernel/conv.py#L424-L425 - Then it goes thru FakeTensor decomposition implemented here https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_subclasses/fake_impls.py#L739-L740 - Finally it goes down to convolution meta registrations implemented here https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_meta_registrations.py#L2416-L2417 Pull Request resolved: pytorch#151042 Approved by: https://github.com/dcci

Literal Python-to-Metal translation of https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225 Fixed missing barrier in `welford_combine` And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass Pull Request resolved: pytorch#150824 Approved by: https://github.com/dcci, https://github.com/jansel ghstack dependencies: pytorch#151042

That avoids double/triple invocation of welford reductions when both mean and deviation must be returned Code has been copy-n-pasted for Halide implementation https://github.com/pytorch/pytorch/blob/575f348965abe8ea428eba7098f67ec9764a7f9a/torch/_inductor/codegen/halide.py#L1189-L1191 Pull Request resolved: pytorch#151151 Approved by: https://github.com/jansel ghstack dependencies: pytorch#151042, pytorch#150824

…#151152) By using `welford_combine` primitive in the loop This fixes `GPUTests.test_multilayer_var_lowp_mps` Pull Request resolved: pytorch#151152 Approved by: https://github.com/jansel ghstack dependencies: pytorch#151042, pytorch#150824, pytorch#151151

If input is channels last than MPS will return a channels last output This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py That previous failed with ``` AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3 ``` As FakeTensor implementation of conv returned `Contiguous`, rather than `ChannelLast` layout on MacOS-15 or later. This doesn't seem to be very well documented, so will try to document the call path for `ExternKernel` invocation for `aten::convolution`: - First inductor decomp defined here is called https://github.com/pytorch/pytorch/blob/c93e4b829072c96e64f5d85f8f71c10f17771c06/torch/_inductor/kernel/conv.py#L424-L425 - Then it goes thru FakeTensor decomposition implemented here https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_subclasses/fake_impls.py#L739-L740 - Finally it goes down to convolution meta registrations implemented here https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_meta_registrations.py#L2416-L2417 Pull Request resolved: pytorch#151042 Approved by: https://github.com/dcci

Literal Python-to-Metal translation of https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225 Fixed missing barrier in `welford_combine` And this is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` to pass Pull Request resolved: pytorch#150824 Approved by: https://github.com/dcci, https://github.com/jansel ghstack dependencies: pytorch#151042

That avoids double/triple invocation of welford reductions when both mean and deviation must be returned Code has been copy-n-pasted for Halide implementation https://github.com/pytorch/pytorch/blob/575f348965abe8ea428eba7098f67ec9764a7f9a/torch/_inductor/codegen/halide.py#L1189-L1191 Pull Request resolved: pytorch#151151 Approved by: https://github.com/jansel ghstack dependencies: pytorch#151042, pytorch#150824

…#151152) By using `welford_combine` primitive in the loop This fixes `GPUTests.test_multilayer_var_lowp_mps` Pull Request resolved: pytorch#151152 Approved by: https://github.com/jansel ghstack dependencies: pytorch#151042, pytorch#150824, pytorch#151151

If input is channels last than MPS will return a channels last output This fixed `GPUTests.test_convolution_4_mps` from test_torchinductor.py That previous failed with ``` AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3 ``` As FakeTensor implementation of conv returned Contiguous, rather than ChannelLast layout ghstack-source-id: 1c626ef Pull Request resolved: pytorch/pytorch#151042

malfet mentioned this pull request Apr 10, 2025

[MPSInductor] Naive welford_reduce implementation #150824

Closed

pytorch-bot bot added ciflow/inductor ciflow/mps Run MPS tests (subset of trunk) module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor labels Apr 10, 2025

malfet requested review from Skylion007, dcci and jansel April 10, 2025 20:32

malfet added the topic: bug fixes topic category label Apr 10, 2025

dcci approved these changes Apr 10, 2025

View reviewed changes

malfet added the release notes: mps Release notes category label Apr 10, 2025

pytorchmergebot added the merging label Apr 11, 2025

pytorchmergebot added the Merged label Apr 11, 2025

pytorchmergebot closed this in b7c0fda Apr 11, 2025

pytorchmergebot removed the merging label Apr 11, 2025

This was referenced Apr 12, 2025

[MPSInductor][BE] Implement reduction caching #151151

Closed

[MPSInductor] Fix larger-than-threadgroup Welford reductions #151152

Closed

[MPS] Start benchmarking compile results #151155

Closed

github-actions bot deleted the gh/malfet/273/head branch May 16, 2025 02:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[MPS] Fix `determine_backend_memory_format` logic #151042

[MPS] Fix `determine_backend_memory_format` logic #151042

Uh oh!

malfet commented Apr 10, 2025 •

edited

Loading

Uh oh!

pytorch-bot bot commented Apr 10, 2025 •

edited

Loading

Uh oh!

malfet commented Apr 11, 2025

Uh oh!

pytorchmergebot commented Apr 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	@register_op_impl([aten.convolution.default, aten.convolution_backward.default])
	def conv(fake_mode, func, args, *kwargs):

	@register_lowering(aten.convolution)
	def convolution(

	@register_meta(aten.convolution.default)
	def meta_conv(

[MPS] Fix determine_backend_memory_format logic #151042

[MPS] Fix determine_backend_memory_format logic #151042

Uh oh!

Conversation

malfet commented Apr 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Apr 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151042

✅ No Failures

Uh oh!

malfet commented Apr 11, 2025

Uh oh!

pytorchmergebot commented Apr 11, 2025

Merge started

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[MPS] Fix `determine_backend_memory_format` logic #151042

[MPS] Fix `determine_backend_memory_format` logic #151042

malfet commented Apr 10, 2025 •

edited

Loading

pytorch-bot bot commented Apr 10, 2025 •

edited

Loading