[MPS] Fix `determine_backend_memory_format` logic #151042
Conversation
If the input is channels last, then MPS will return a channels last output. This fixes `GPUTests.test_convolution_4_mps` from test_torchinductor.py, which previously failed with
```
AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3
```
because the FakeTensor implementation of convolution returned a `Contiguous` rather than a `ChannelsLast` layout. [ghstack-poisoned]
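The mismatched strides in that assertion are exactly the difference between a contiguous (NCHW) and a channels-last (NHWC) layout for a tensor of shape (N, 3, 12, 16). A small pure-Python sketch (no torch dependency; helper names are illustrative) that derives both stride sets:

```python
def contiguous_strides(shape):
    # NCHW contiguous: the innermost dimension varies fastest
    strides = [0] * len(shape)
    acc = 1
    for i in range(len(shape) - 1, -1, -1):
        strides[i] = acc
        acc *= shape[i]
    return tuple(strides)

def channels_last_strides(shape):
    # Channels-last for a 4D logical NCHW shape: physical order is NHWC
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)

shape = (2, 3, 12, 16)
print(contiguous_strides(shape))     # (576, 192, 16, 1)
print(channels_last_strides(shape))  # (576, 1, 48, 3)
```

Note how the channels-last strides (1 at dim=1, 48 at dim=2, 3 at dim=3) versus the contiguous strides (192, 16, 1) reproduce the exact pairs in the assertion message.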
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151042
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 2a649df with merge base 1a1a32c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge -f "Lint + MPS are green"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Literal Python-to-Metal translation of https://github.com/pytorch/pytorch/blob/85549fe6de3b9a980d1dc98dc57379501bd2bb18/torch/_inductor/runtime/triton_helpers.py#L217-L225. Fixed a missing barrier in `welford_combine`. This is sufficient to make `GPUTests.test_batch_norm_2d_2_mps` pass.
Pull Request resolved: #150824
Approved by: https://github.com/dcci, https://github.com/jansel
ghstack dependencies: #151042
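For reference, the helper being translated combines two partial Welford states `(mean, m2, weight)` into one. A close Python paraphrase of the linked lines (reduced to plain scalars, so the zero-weight guard and names here are illustrative) looks like:

```python
def welford_combine(mean_1, m2_1, weight_1, mean_2, m2_2, weight_2):
    # Parallel-variance combination (Chan et al.) of two partial states:
    # m2 is the running sum of squared deviations, weight the element count.
    delta = mean_2 - mean_1
    new_weight = weight_1 + weight_2
    w2_over_w = weight_2 / new_weight if new_weight else 0.0
    return (
        mean_1 + delta * w2_over_w,
        m2_1 + m2_2 + delta * delta * weight_1 * w2_over_w,
        new_weight,
    )

# States for [1, 2] and [3, 4] combine into the state for [1, 2, 3, 4]:
print(welford_combine(1.5, 0.5, 2.0, 3.5, 0.5, 2.0))  # (2.5, 5.0, 4.0)
```

In the Metal translation, each thread holds one such partial state, which is why a missing threadgroup barrier between combination rounds can let a thread read a neighbor's state before it has been written.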
This avoids double/triple invocation of Welford reductions when both the mean and the deviation must be returned. The code was adapted from the Halide implementation: https://github.com/pytorch/pytorch/blob/575f348965abe8ea428eba7098f67ec9764a7f9a/torch/_inductor/codegen/halide.py#L1189-L1191
Pull Request resolved: #151151
Approved by: https://github.com/jansel
ghstack dependencies: #151042, #150824
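The reason a single Welford reduction suffices is that one pass produces the mean and the second central moment together, so variance and standard deviation come for free. A minimal single-pass sketch (illustrative Python, not the generated Metal code):

```python
def welford_reduce(xs):
    # One pass over the data yields (mean, m2, weight) simultaneously;
    # population variance is then m2 / weight, std is its square root.
    mean, m2, weight = 0.0, 0.0, 0.0
    for x in xs:
        weight += 1.0
        delta = x - mean
        mean += delta / weight
        m2 += delta * (x - mean)
    return mean, m2, weight

mean, m2, weight = welford_reduce([1.0, 2.0, 3.0, 4.0])
print(mean, m2 / weight)  # 2.5 1.25
```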
By using the `welford_combine` primitive in the loop. This fixes `GPUTests.test_multilayer_var_lowp_mps`.
Pull Request resolved: #151152
Approved by: https://github.com/jansel
ghstack dependencies: #151042, #150824, #151151
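As a sketch of the multilayer pattern: compute a `(mean, m2, weight)` state per chunk, then fold the chunk states together with `welford_combine` in a loop. This pure-Python stand-in for the generated kernel (names are illustrative) stays numerically stable even in low precision because no chunk ever sums raw squares:

```python
def welford_combine(state_1, state_2):
    # Combine two (mean, m2, weight) partial Welford states
    mean_1, m2_1, weight_1 = state_1
    mean_2, m2_2, weight_2 = state_2
    delta = mean_2 - mean_1
    new_weight = weight_1 + weight_2
    w2_over_w = weight_2 / new_weight if new_weight else 0.0
    return (
        mean_1 + delta * w2_over_w,
        m2_1 + m2_2 + delta * delta * weight_1 * w2_over_w,
        new_weight,
    )

def chunked_var(xs, chunk=4):
    # Multilayer reduction: one Welford state per chunk, folded in a loop
    acc = (0.0, 0.0, 0.0)
    for i in range(0, len(xs), chunk):
        part = xs[i:i + chunk]
        mean = sum(part) / len(part)
        m2 = sum((x - mean) ** 2 for x in part)
        acc = welford_combine(acc, (mean, m2, float(len(part))))
    mean, m2, weight = acc
    return m2 / weight  # population variance

print(chunked_var([float(i) for i in range(8)]))  # 5.25, matches direct variance
```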
If the input is channels last, then MPS will return a channels last output. This fixes `GPUTests.test_convolution_4_mps` from test_torchinductor.py, which previously failed with
```
AssertionError: expected size 3==3, stride 1==192 at dim=1; expected size 12==12, stride 48==16 at dim=2; expected size 16==16, stride 3==1 at dim=3
```
because the FakeTensor implementation of convolution returned a `Contiguous` rather than a `ChannelsLast` layout on MacOS-15 or later. This doesn't seem to be very well documented, so here is the call path for the `ExternKernel` invocation of `aten::convolution`:
- First, the inductor decomposition defined here is called: https://github.com/pytorch/pytorch/blob/c93e4b829072c96e64f5d85f8f71c10f17771c06/torch/_inductor/kernel/conv.py#L424-L425
- Then it goes through the FakeTensor decomposition implemented here: https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_subclasses/fake_impls.py#L739-L740
- Finally, it reaches the convolution meta registrations implemented here: https://github.com/pytorch/pytorch/blob/320914f1b6ce7303548f84ea1bdc3d3ce5cb6e55/torch/_meta_registrations.py#L2416-L2417

Pull Request resolved: pytorch#151042
Approved by: https://github.com/dcci
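A hypothetical sketch of the decision the fake/meta path has to mirror (the function name, signature, and string return values are invented for illustration; the real logic lives in the meta registrations linked above):

```python
def output_memory_format(input_is_channels_last, backend="mps", macos_ge_15=True):
    # Illustrative only: on MPS (macOS 15+), convolution propagates a
    # channels-last input layout to its output, so the meta/FakeTensor
    # path must predict the same layout or inductor asserts on strides.
    if backend == "mps" and macos_ge_15 and input_is_channels_last:
        return "channels_last"
    return "contiguous"

print(output_memory_format(True))   # channels_last
print(output_memory_format(False))  # contiguous
```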
Stack from ghstack (oldest at bottom):
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov