
Conversation

Contributor

@AshkanAliabadi AshkanAliabadi commented Feb 25, 2020

Add support for the XNNPACK 2D max pool operator. The operator is enabled through integration into at::max_pool2d(...), which is itself registered through native_functions.yaml.
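For illustration, a minimal C++ usage sketch (not taken from this PR; it assumes an eligible float NCHW input and relies only on the public at::max_pool2d call, which routes to XNNPACK internally when possible):

#include <ATen/ATen.h>

int main() {
  // NCHW float input; when the input and parameters are eligible, the
  // XNNPACK path is selected inside at::max_pool2d itself, so call sites
  // do not change.
  const at::Tensor input = at::rand({1, 3, 224, 224});
  const at::Tensor output = at::max_pool2d(
      input,
      /*kernel_size=*/{3, 3},
      /*stride=*/{2, 2},
      /*padding=*/{1, 1},
      /*dilation=*/{1, 1},
      /*ceil_mode=*/false);
  return output.defined() ? 0 : 1;
}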

Test Plan: CI

Contributor

@facebook-github-bot facebook-github-bot left a comment

@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dr-ci

dr-ci bot commented Feb 25, 2020

💊 CircleCI build failures summary and remediations

As of commit 24f19b5 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakages (reran 1 job to discount flakiness):

See CircleCI build pytorch_xla_linux_xenial_py3_6_clang7_test (1/1)

Step: "Test" (confirmed not flaky by 2 failures)

Mar 16 01:43:56 unknown file: Failure
Mar 16 01:38:38 [       OK ] AtenXlaTensorTest.TestAdaptiveAvgPool2DNoBatchBackward (300 ms) 
Mar 16 01:38:38 [ RUN      ] AtenXlaTensorTest.TestConv2DBackward 
Mar 16 01:40:20 [       OK ] AtenXlaTensorTest.TestConv2DBackward (101956 ms) 
Mar 16 01:40:20 [ RUN      ] AtenXlaTensorTest.TestTransposedConv2DBackward 
Mar 16 01:41:03 [       OK ] AtenXlaTensorTest.TestTransposedConv2DBackward (42557 ms) 
Mar 16 01:41:03 [ RUN      ] AtenXlaTensorTest.TestConv3DBackward 
Mar 16 01:42:41 [       OK ] AtenXlaTensorTest.TestConv3DBackward (97983 ms) 
Mar 16 01:42:41 [ RUN      ] AtenXlaTensorTest.TestTransposedConv3DBackward 
Mar 16 01:43:56 [       OK ] AtenXlaTensorTest.TestTransposedConv3DBackward (75418 ms) 
Mar 16 01:43:56 [ RUN      ] AtenXlaTensorTest.TestMaxPool2DBackward 
Mar 16 01:43:56 unknown file: Failure 
Mar 16 01:43:56 C++ exception with description "element 0 of tensors does not require grad and does not have a grad_fn0 (run_backward at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd.cpp:74) 
Mar 16 01:43:56 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x4a (0x7fe72d66d6da in /var/lib/jenkins/workspace/torch/lib/libc10.so) 
Mar 16 01:43:56 frame #1: <unknown function> + 0x30e4f33 (0x7fe71bcb4f33 in /var/lib/jenkins/workspace/torch/lib/libtorch_cpu.so) 
Mar 16 01:43:56 frame #2: torch::autograd::backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, c10::optional<bool>, bool) + 0x72 (0x7fe71bcb5572 in /var/lib/jenkins/workspace/torch/lib/libtorch_cpu.so) 
Mar 16 01:43:56 frame #3: <unknown function> + 0x347855e (0x7fe71c04855e in /var/lib/jenkins/workspace/torch/lib/libtorch_cpu.so) 
Mar 16 01:43:56 frame #4: void c10::KernelFunction::callUnboxed<void, at::Tensor const&, at::Tensor const&, bool, bool>(c10::OperatorHandle const&, at::Tensor const&, at::Tensor const&, bool, bool) const + 0x113 (0x57ca33 in ./test_ptxla) 
Mar 16 01:43:56 frame #5: torch_xla::cpp_test::TestBackward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&, c10::Device const&, std::function<at::Tensor (std::vector<at::Tensor, std::allocator<at::Tensor> > const&)> const&, double, double) + 0x604 (0x5723c4 in ./test_ptxla) 
Mar 16 01:43:56 frame #6: ./test_ptxla() [0x6d1241] 
Mar 16 01:43:56 frame #7: torch_xla::cpp_test::ForEachDevice(std::function<void (c10::Device const&)> const&) + 0x2d (0x56f5dd in ./test_ptxla) 
Mar 16 01:43:56 frame #8: torch_xla::cpp_test::AtenXlaTensorTest_TestMaxPool2DBackward_Test::TestBody() + 0x97 (0x624d27 in ./test_ptxla) 

This comment was automatically generated by Dr. CI and has been revised 74 times.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

Is ceil_mode used anywhere by XNNPACK for the maxpool op? The PyTorch documentation suggests it is used to compute the output shape, https://pytorch.org/docs/stable/nn.html#torch.nn.MaxPool2d.
If it is not used, don't we need to figure out what output shape XNNPACK wants and make sure to provide a buffer with that shape?
And also restrict whether maxpool2d can be mapped to XNNPACK depending on the value of ceil_mode?
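For reference, the output extent the documentation describes can be sketched as follows (an illustrative helper, not code from this PR; it follows the documented floor/ceil formula and ignores the extra corner-case rule about windows that would start entirely inside the padding):

#include <cmath>
#include <cstdint>

// Expected pooled extent along one spatial dimension, per the MaxPool2d
// documentation: floor by default, ceil when ceil_mode is set.
std::int64_t pooled_size(std::int64_t input, std::int64_t kernel,
                         std::int64_t padding, std::int64_t stride,
                         std::int64_t dilation, bool ceil_mode) {
  const double span = input + 2 * padding - dilation * (kernel - 1) - 1;
  const double size = span / static_cast<double>(stride) + 1.0;
  return static_cast<std::int64_t>(ceil_mode ? std::ceil(size)
                                             : std::floor(size));
}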

Contributor

I wonder if the seg faults in tests are related to the ceil_mode stuff.

Contributor Author

Right, maybe. I'll investigate, thanks.

Contributor

Why are we exposing this directly here? I thought, based on conv and linear, our philosophy was going to be to not expose this directly but to require explicit opt-in via an "xnnpackify.."-style transform of the network. @dreiss for comment.

Contributor

Also note that this will make it a little harder to fuse maxpool + relu.

Collaborator

Binding it directly is good, as it directly benefits existing models. It doesn't preclude us from doing further optimizations like the fusion you described.

Contributor

I agree with Dima. We can have a separate diff that exposes the fused version.

Contributor Author

Great.
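As a side note on the fusion discussed above, maxpool + relu reduces to a lower clamp at zero, which is the kind of thing an output_min-style clamp on the backend op could absorb in a later diff. A small illustrative check (not part of this PR):

#include <ATen/ATen.h>

int main() {
  const at::Tensor x = at::randn({1, 8, 16, 16});
  const at::Tensor pooled = at::max_pool2d(x, /*kernel_size=*/{2, 2});
  // relu(max_pool2d(x)) is the same as clamping the pooled output at zero.
  const bool same = at::allclose(at::relu(pooled), at::clamp_min(pooled, 0));
  return same ? 0 : 1;
}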

Contributor

@kimishpatel kimishpatel left a comment

Thanks Ashkan. Overall looks great. I have left a couple of comments.

Collaborator

@dzhulgakov dzhulgakov left a comment

Looks good from overall point of view!


Collaborator

How does this work when kernel.size() == 1? Layout::Parameter::width is 1, and you can't index kernel[1]. Ah, I see, you are expanding kernel in create, but then the check on kernel.size() should be different.

Contributor Author

I'm sorry, I didn't get your point about the kernel.size() check being different; can you elaborate? Yes, I'm allowing sizes 1 and 2 but expanding to 2 prior to use.

Collaborator

If you require 2, then you should not allow 1, and it looks like you do require 2, because otherwise kernel[1] will segfault.
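A sketch of the guard being discussed, using a hypothetical helper rather than the PR's actual code: allow a 1- or 2-element kernel_size, but expand it before anything indexes kernel[1].

#include <ATen/ATen.h>
#include <vector>

// Hypothetical helper: accept a 1- or 2-element kernel and expand it to
// {h, w} before any kernel[1] access, so a single value can never segfault.
std::vector<int64_t> expand_kernel(const at::IntArrayRef kernel) {
  TORCH_CHECK(kernel.size() == 1 || kernel.size() == 2,
              "max_pool2d: kernel_size must have 1 or 2 elements");
  return kernel.size() == 1
      ? std::vector<int64_t>{kernel[0], kernel[0]}
      : std::vector<int64_t>{kernel[0], kernel[1]};
}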

Contributor Author

Aaah ... how could I have missed that?! :/ Thanks.

Collaborator

nit: you'll probably need a similar structure for AvgPool and other kinds of pooling, so it makes sense not to put it in the max_pool2d namespace.

Contributor Author

Thanks, I'll move it in a future patch after Kimish merges his. The only place I can put it right now is inside Common.h, which is starting to become a cluttered mess.

Contributor Author

OK, done now that Kimish's patch is merged.

Contributor

@dreiss dreiss left a comment

Integration looks good. Should have some unit tests added for this.


Contributor

These changes should probably be a separate diff.

Contributor

Yes, please. Can you remove these changes? Otherwise one of us will run into a merge conflict.

Contributor Author

OK, no worries, will revert.

Comment on lines 236 to 237
Contributor

If output channels is always equal to input channels, and you assert later that it always will be equal to input channels, why have it as a separate parameter?

Contributor Author

Removed.

Contributor

Message needs to be updated. Also, shouldn't this be checked when the context is created?

Contributor Author

Updated the message.

The input tensor is not available at creation time: available() checks the parameters provided at creation time, while usable() gates anything that depends on the input tensor.
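Schematically, the split might look like the following (simplified, hypothetical signatures rather than the PR's actual ones):

#include <ATen/ATen.h>

// available(): depends only on parameters known when the context is created.
bool available(const at::IntArrayRef kernel,
               const at::IntArrayRef padding,
               const at::IntArrayRef stride,
               const at::IntArrayRef dilation) {
  return kernel.size() == 2 && padding.size() == 2 &&
         stride.size() == 2 && dilation.size() == 2 &&
         kernel[0] > 0 && kernel[1] > 0;
}

// usable(): gates on properties of the input tensor, which only exists at
// run time.
bool usable(const at::Tensor& input) {
  return input.defined() &&
         input.dim() == 4 &&
         input.scalar_type() == at::kFloat &&
         !input.requires_grad();
}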

Contributor

Do we need to check that input_nhwc was created with a guarding allocator, or is that only needed for the output?

Contributor Author

Yes, this should use allocate_padded_if_needed() from Kimish's patch: only re-allocate the input if it was not already allocated with this allocator.

Contributor

Why not just return output_nhwc?

Contributor Author

I think PyTorch's convention is to return tensors in the same layout they came in, so we'll have to switch back to NCHW if that's the layout the input tensor was in. Dima / Natalia can confirm.

Contributor

If that's the convention, then it makes sense to stick with it.

We should make sure that if we're returning NHWC, we're doing so with a guarding allocator to get maximum performance from a sequence of XNNPACK ops.

Contributor Author

@AshkanAliabadi AshkanAliabadi Mar 13, 2020

contiguous() is a no-op if the memory is already in the requested layout. In other words, if the input tensor is already in NHWC, contiguous() will short-circuit.
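A small illustrative check of that short-circuit behavior (not from this PR):

#include <ATen/ATen.h>

int main() {
  const at::Tensor nhwc =
      at::rand({1, 4, 8, 8}).contiguous(at::MemoryFormat::ChannelsLast);
  // Requesting the layout the tensor is already in returns the same
  // underlying TensorImpl, i.e. contiguous() short-circuits without a copy.
  const at::Tensor again = nhwc.contiguous(at::MemoryFormat::ChannelsLast);
  return again.is_same(nhwc) ? 0 : 1;
}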

Contributor

This seems like too tight of a coupling. Can you add a separate check for dim==4 instead?

Contributor Author

Done.

@AshkanAliabadi
Contributor Author

Thanks for the comments. Will address and upload a new patch.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor Author

Note: Had to add this to prevent a test in test/test_namedtensor.py from failing. Said test expects tensor names to propagate.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@AshkanAliabadi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@dreiss dreiss left a comment

Can you separate the named tensor and formatting/comment changes into separate diffs?

Comment on lines +169 to +170
const Tensor input_nhwc = input.contiguous(MemoryFormat::ChannelsLast);
const Tensor padded_input_nhwc = allocate_padded_if_needed(input_nhwc);
Contributor

This is potentially two copies. Can we combine these into one?

Comment on lines +301 to +310
return max_pool2d::available(
input.size(Layout::Activation4D::channels),
parameters.kernel,
parameters.padding,
parameters.stride,
parameters.dilation,
ceil_mode,
internal::max_pool2d::Context::kMin,
internal::max_pool2d::Context::kMax) &&
max_pool2d::usable(
Contributor

Why do we need available and usable as separate functions?

Comment on lines +212 to +234
Tensor create_and_run(
const Tensor& input,
const IntArrayRef kernel,
const IntArrayRef padding,
const IntArrayRef stride,
const IntArrayRef dilation,
const bool ceil_mode,
const float output_min,
const float output_max) {
using namespace internal;

return internal::max_pool2d::run(
internal::max_pool2d::create(
input.size(Layout::Activation4D::channels),
kernel,
padding,
stride,
dilation,
ceil_mode,
output_min,
output_max),
input);
}
Contributor

This seems like an unnecessary layer of abstraction since we're not persisting the Context. If you inline both create and run directly into max_pool2d, can you eliminate Context and shorten/simplify the entire diff?

Contributor Author

Don't we want to maintain the ability to separate create and run in the future?

Contributor

Why? We don't expect it will ever have a significant perf improvement over this implementation, right?

facebook-github-bot pushed a commit that referenced this pull request Mar 24, 2020
…95. (#35081)

Summary:
Required to fix a build issue in #33766.
Pull Request resolved: #35081

Reviewed By: dreiss

Differential Revision: D20567230

Pulled By: AshkanAliabadi

fbshipit-source-id: 1ed61708851402f60b80abc818ae7330e43adb83
@AshkanAliabadi
Contributor Author

Breaking PR into smaller chunks per David's request. #35354. Closing.
