[MPS] Pixel shuffle unshuffle support #99306
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/99306. Note: links to docs will display an error until the docs builds have been completed.
❗ There is 1 active merge-blocking SEV. Please view it below.
✅ You can merge normally! (1 unrelated failure.) As of commit dc8bab3 with merge base 781b7eb: UNSTABLE: the following job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 4534efc to 6acce78 (compare).
This fails on macOS 12. Any ideas? Should I just make the MPS implementation unavailable on macOS 12?
Looks great. Can you please refactor the upscale and downscale code for pixel shuffle and unshuffle?
You can use a fallback for macOS 12. The API should work, but it seems there are bugs.
```cpp
using CachedGraph = MPSUnaryCachedGraph;
// ...
if (upscale_factor == 1) {
  return self;
```
@kulinseth should this return `self.clone()`? What is the expected behaviour?
Figured out that it should, because we don't want this:

```python
t = torch.randn(1, 4, 3, 3, device='mps')
res = F.pixel_shuffle(t, 1)
t -= 10000  # res modified too
```
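The hazard can be shown without MPS at all: if an op's fast path returns its input object unchanged, the result aliases the input, and later in-place edits leak into it. A pure-Python sketch (the `identity_op` helper is illustrative, not the PyTorch code):

```python
import copy

def identity_op(x, clone=True):
    # Sketch of the upscale_factor == 1 fast path: returning the input
    # directly would alias it; returning a copy keeps the result stable.
    return copy.deepcopy(x) if clone else x

t = [[1.0, 2.0], [3.0, 4.0]]
res = identity_op(t)                 # copy: safe against later mutation
alias = identity_op(t, clone=False)  # aliases t
t[0][0] -= 10000.0                   # mutate the input in place
# res[0][0] stays 1.0, while alias[0][0] becomes -9999.0
```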
Could you please clarify which code specifically? Is there something wrong in …?
@kulinseth any updates?
```cpp
out_shape.insert(out_shape.end(), {oc, oh, ow});
// ...
Tensor output = at::empty(out_shape, self.options());
```
I think this can be added:

```cpp
auto output = at::empty({0}, self.options());
if (output.numel() == 0) {
  return output;
}
```
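The intent of the suggestion is an early return that skips building the MPS graph when the output holds no elements. A pure-Python sketch of that guard (the `allocate_or_bail` name is hypothetical; the real code is Objective-C++ using `at::empty`):

```python
def allocate_or_bail(out_shape):
    # Compute the element count of the requested output shape; if it is
    # zero (any dimension is 0), return the empty result immediately
    # instead of constructing and running a compute graph.
    numel = 1
    for d in out_shape:
        numel *= d
    if numel == 0:
        return []              # nothing to compute, skip graph work
    return [0.0] * numel       # stand-in for the real allocation

empty = allocate_or_bail([1, 0, 6, 6])  # a zero-sized dim short-circuits
full = allocate_or_bail([1, 1, 4, 4])   # 16 zero-initialized elements
```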
Modulo this, it looks good.
I meant that if you compare the upscale and downscale code, the calculations are the same. The graph and other parts can be refactored:
I would encourage you to move all the code to a helper function and then call it from shuffle and unshuffle.
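The requested refactor, sketched in pure Python on a `[C][H][W]` nested list: shuffle and unshuffle can share one helper because the index mapping is identical in both directions, only the read and write sides swap. All names here are illustrative; the actual PR builds MPSGraph ops instead:

```python
def _shuffle_impl(x, r, upscale):
    """Shared index math for pixel shuffle/unshuffle on a [C][H][W]
    nested list (toy sketch; the real op handles N-d tensors)."""
    if upscale:
        C = len(x) // (r * r)            # output channels shrink by r^2
        h, w = len(x[0]), len(x[0][0])   # input spatial size
        out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(C)]
    else:
        C = len(x)                       # coarse channel count
        h, w = len(x[0]) // r, len(x[0][0]) // r
        out = [[[0.0] * w for _ in range(h)] for _ in range(C * r * r)]
    for c in range(C):
        for i in range(r):
            for j in range(r):
                for hh in range(h):
                    for ww in range(w):
                        # same mapping both ways; only read/write swap
                        if upscale:
                            out[c][hh * r + i][ww * r + j] = x[(c * r + i) * r + j][hh][ww]
                        else:
                            out[(c * r + i) * r + j][hh][ww] = x[c][hh * r + i][ww * r + j]
    return out

def pixel_shuffle(x, r):
    return _shuffle_impl(x, r, upscale=True)

def pixel_unshuffle(x, r):
    return _shuffle_impl(x, r, upscale=False)
```

With `r = 2`, four input channels interleave into one output channel of twice the spatial size, and unshuffle inverts the mapping exactly.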
@pytorchmergebot merge
Merge failed. Reason: Approval needed from one of the following:
@kulinseth, could you tag an appropriate core maintainer, please? ;)
@albanD can you please take a look?
Done, added @albanD
@albanD, could you please take a look?
@albanD, @kulinseth this PR is almost a month old.
Pinged @albanD.
```cpp
TORCH_WARN_ONCE("MPS: pixel_shuffle op is supported starting from macOS 13.0. ",
                "Falling back on CPU. This may have performance implications.");

return at::native::pixel_shuffle_cpu(self.to("cpu"), upscale_factor).clone().to("mps");
```
Why clone the output here and below?
True, unneeded
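For context, the pattern being reviewed is a CPU fallback for macOS 12: move the input to the CPU, run the existing CPU kernel, and move the result back. The device transfer already materializes a fresh tensor, which is why the extra `.clone()` is redundant. A device-free Python sketch of the dispatch idea (all names hypothetical):

```python
def with_cpu_fallback(native_impl, cpu_impl, native_supported):
    """Sketch of the fallback wiring: use the native implementation when
    the OS supports it, otherwise route through the CPU kernel."""
    def op(x, factor):
        if native_supported:
            return native_impl(x, factor)
        # list(...) stands in for the .to("cpu") / .to("mps") transfers;
        # each transfer already copies, so no extra clone is needed
        return list(cpu_impl(list(x), factor))
    return op

# wiring a trivial scaling op through the fallback path
op = with_cpu_fallback(lambda x, f: [v * f for v in x],
                       lambda x, f: [v * f for v in x],
                       native_supported=False)
result = op([1, 2, 3], 2)
```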
test/test_mps.py (outdated)

```diff
@@ -934,6 +934,110 @@ def leak_gpu0():
         with self.assertRaisesRegex(RuntimeError, r"MPS driver API confirmed .+"):
             leak_gpu0()

+    # These tests were taken from test/test_nn.py
```
We have OpInfo-based tests for this, so this should already be properly covered?
That’s a good question. This is a good additional test, which tests different ranges of upscale_factor. Although, we will enable test_nn in the near future; then we can remove these duplicates.
I meant there is a regular OpInfo entry:
`OpInfo(`
That one should already be run by the existing tests in this file, no?
If these tests already run, I'll be happy to remove them. Is there a way to check it?
In the CI logs on the mps machine, you can look for nn_functional_pixel_shuffle
Found these lines
Did you find similar ones for unshuffle there?
Yes
```
test_mps.py::TestConsistencyCPU::test_output_grad_match_nn_functional_pixel_shuffle_cpu_float16 PASSED [0.1346s] [ 21%]
test_mps.py::TestConsistencyCPU::test_output_grad_match_nn_functional_pixel_shuffle_cpu_float32 PASSED [0.1138s] [ 21%]
test_mps.py::TestConsistencyCPU::test_output_grad_match_nn_functional_pixel_unshuffle_cpu_float16 PASSED [0.0176s] [ 21%]
test_mps.py::TestConsistencyCPU::test_output_grad_match_nn_functional_pixel_unshuffle_cpu_float32 PASSED [0.0179s] [ 21%]
```
Do these already cover this test you just copied here? (and the test in test_nn is just redundant)
Or do they test different things?
They're different
IMO it's a bit redundant, but it can be done as a follow-up PR.
@alexdremov can you please remove the test copied from test_nn in this PR, to avoid duplication? We are planning to enable test_nn testing with the 'mps' device soon and will have this enabled there.
Force-pushed from 5f21e98 to ee3aee8 (compare).
@alexdremov, can we resurrect this PR?
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased.
Force-pushed from d161002 to 1abfa6c (compare).
@kulinseth Seems like the code side is fine and it successfully rebased. There are only @malfet's questions that need to be checked.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased.
Force-pushed from 1abfa6c to b6be7b6 (compare).
@kulinseth I've resurrected this PR. Merging?
@kulinseth could you take a look please? Anything else to fix?
@kulinseth any updates?
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #83196

Now, the MPS implementation is blazingly fast.

Though, I have several questions on improving this PR:

- Tests were taken from test_nn.py. Is there a better way to test this?
- I use `usePixelShuffleOrder:YES`. Am I right performance-wise? According to the docs: …

cc: @razarmehr @kulinseth