add channels last for MaxPool2d #48917

mingfeima · 2020-12-07T04:42:24Z

Stack from ghstack:

- add channels last support for ConvTranspose2d #51185
- add channels last support for PixelShuffle and PixelUnshuffle #50573
- add channels last support for ChannelShuffle #50247
- add channel last support for MaxUnpool2d #49984
- add channels last for GroupNorm #49821
- add channels last support for thnn_conv2d (non-dilated) #49582
- add channels last for AdapativeMaxPool2d #48920
- optimize channels last for BatchNorm2d on CPU #48919
- add channels last support for AvgPool2d on CPU #48918
- add channels last for MaxPool2d #48917

max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: D25399470


mingfeima · 2020-12-07T05:06:35Z

Use this one to replace #42719.

This patch adds channels last memory format support for nn.MaxPool2d on CPU. The channels last path is manually vectorized with vec256 along the C dimension.
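To make the layout argument concrete, here is a minimal scalar sketch of a channels-last max_pool2d forward pass. It assumes batch size 1, no padding, no dilation and floor-mode output size; the function name and signature are hypothetical and this is not the kernel from this patch. In NHWC the C values of one pixel are contiguous, so the innermost loop over `c` has unit stride, which is the loop that a vectorized kernel would process several channels at a time.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical scalar sketch of a channels-last (NHWC) max_pool2d forward.
// Assumptions: batch size 1, no padding, no dilation, floor output size.
// input is laid out as [H][W][C]; output and indices as [OH][OW][C].
void max_pool2d_nhwc(const std::vector<float>& input, int64_t C, int64_t H,
                     int64_t W, int64_t kH, int64_t kW, int64_t dH, int64_t dW,
                     std::vector<float>& output, std::vector<int64_t>& indices) {
  const int64_t OH = (H - kH) / dH + 1;
  const int64_t OW = (W - kW) / dW + 1;
  output.assign(OH * OW * C, -std::numeric_limits<float>::infinity());
  indices.assign(OH * OW * C, 0);
  for (int64_t oh = 0; oh < OH; ++oh) {
    for (int64_t ow = 0; ow < OW; ++ow) {
      float* out = output.data() + (oh * OW + ow) * C;
      int64_t* ind = indices.data() + (oh * OW + ow) * C;
      // Default index: the top-left element of the pooling window.
      for (int64_t c = 0; c < C; ++c) ind[c] = oh * dH * W + ow * dW;
      for (int64_t ih = oh * dH; ih < oh * dH + kH; ++ih) {
        for (int64_t iw = ow * dW; iw < ow * dW + kW; ++iw) {
          // All C channels of pixel (ih, iw) are contiguous in memory.
          const float* in = input.data() + (ih * W + iw) * C;
          // Innermost loop over C has unit stride: the SIMD-friendly loop.
          for (int64_t c = 0; c < C; ++c) {
            const float val = in[c];
            // NaN is propagated: a NaN input always becomes the new max.
            if (val > out[c] || std::isnan(val)) {
              out[c] = val;
              ind[c] = ih * W + iw;  // flat index within the H x W plane
            }
          }
        }
      }
    }
  }
}
```

A vectorized version would load a vector's worth of channels from `in` at once and replace the `if` with a mask-and-blend update of both the running max and its index.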

Performance results on Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz, 2 x 20 cores, comparing channels last (CL) vs. contiguous:

- single core inference (BS=1): 9.7x speedup
- single socket inference (BS=1): 3.3x speedup
- single socket inference (BS=128): 2.6x speedup

The input size is taken from ResNet-50 (RN50). To reproduce, use this script: `./run.sh max_pool2d.py`.

### (before)
### using OMP_NUM_THREADS=1
MaxPool2d(contiguous): input_size [1, 64, 112, 112] time: 5.317 ms

### using OMP_NUM_THREADS=20
MaxPool2d(contiguous): input_size [1, 64, 112, 112] time: 6.443 ms
MaxPool2d(contiguous): input_size [128, 64, 112, 112] time: 45.178 ms

### (after)
### using OMP_NUM_THREADS=1
MaxPool2d(contiguous):   input_size [1, 64, 112, 112] time: 5.304 ms
MaxPool2d(channels_last): input_size [1, 64, 112, 112] time: 0.544 ms

### using OMP_NUM_THREADS=20
MaxPool2d(contiguous):   input_size [1, 64, 112, 112] time: 0.336 ms
MaxPool2d(channels_last): input_size [1, 64, 112, 112] time: 0.102 ms
MaxPool2d(contiguous):   input_size [128, 64, 112, 112] time: 42.141 ms
MaxPool2d(channels_last): input_size [128, 64, 112, 112] time: 15.897 ms
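The quoted speedups follow from the "(after)" timings as contiguous time divided by channels-last time. The small helper below (struct and function names are illustrative only) recomputes them: 5.304 / 0.544 ≈ 9.75, 0.336 / 0.102 ≈ 3.29 and 42.141 / 15.897 ≈ 2.65, which match the 9.7x, 3.3x and 2.6x figures listed above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// One row of the "(after)" results: same input shape, two memory layouts.
struct Timing {
  const char* config;
  double contiguous_ms;
  double channels_last_ms;
};

// Speedup of channels last over contiguous = contiguous time / CL time.
double speedup(const Timing& t) { return t.contiguous_ms / t.channels_last_ms; }

// The three "(after)" measurements reported in this PR description.
const Timing kAfter[] = {
    {"1 thread, [1, 64, 112, 112]", 5.304, 0.544},
    {"20 threads, [1, 64, 112, 112]", 0.336, 0.102},
    {"20 threads, [128, 64, 112, 112]", 42.141, 15.897},
};

// Print one line per configuration with its speedup factor.
void print_speedups() {
  for (const Timing& t : kAfter) {
    std::printf("%-34s %.2fx\n", t.config, speedup(t));
  }
}
```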

VitalyFedyunin · 2020-12-09T21:47:03Z

Fails internally on Android VR build:

Summary: 
[removed]/caffe2/aten/src/ATen/native/cpu/MaxPoolKernel.cpp:172:57: error: no member named 'isnan' in 'at::vec256::(anonymous namespace)::Vec256<float>'
            Vec mask = (val_vec > maxval_vec) | val_vec.isnan();
                                                ~~~~~~~ ^
stderr: [removed]/caffe2/aten/src/ATen/native/cpu/MaxPoolKernel.cpp:172:57: error: no member named 'isnan' in 'at::vec256::(anonymous namespace)::Vec256<float>'
            Vec mask = (val_vec > maxval_vec) | val_vec.isnan();
                                                ~~~~~~~ ^
 ** Summary of failures encountered during the build **
Rule [removed]
 FAILED because Command failed with exit code 1.

Due to Android build failures.
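The failing line implements a lane-wise "new max or NaN" test. The sketch below emulates it with a hypothetical 4-lane `std::array` type instead of a real SIMD register (it is not ATen's Vec256): lanes where `val > maxval` or `val` is NaN take the new value and its index, so a NaN, once seen, sticks as the pooled result. Per the error above, the `Vec256<float>` compiled for that target simply had no `isnan()` member.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical 4-lane stand-ins for SIMD registers (not ATen's Vec256).
constexpr int kLanes = 4;
using VecF = std::array<float, kLanes>;
using VecI = std::array<int64_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Lane-wise equivalent of: mask = (val_vec > maxval_vec) | val_vec.isnan();
Mask update_mask(const VecF& val, const VecF& maxval) {
  Mask m{};
  for (int i = 0; i < kLanes; ++i) {
    m[i] = (val[i] > maxval[i]) || std::isnan(val[i]);
  }
  return m;
}

// Blend step: lanes whose mask is set take the new value and its index,
// all other lanes keep the running max and index.
void masked_update(const VecF& val, int64_t index, VecF& maxval, VecI& maxidx) {
  const Mask m = update_mask(val, maxval);
  for (int i = 0; i < kLanes; ++i) {
    if (m[i]) {
      maxval[i] = val[i];
      maxidx[i] = index;
    }
  }
}
```

Note that a comparison against an existing NaN is always false, so lane 3 in a case like `maxval = {.., NaN}` keeps its NaN even when a larger ordinary value arrives.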


mingfeima · 2020-12-31T02:16:21Z


@VitalyFedyunin, the Android build failure has been fixed; please take a look!


VitalyFedyunin

Hello! I've finished with the oneDNN PRs and am moving on to this stack. Meanwhile, could you please add a vec256 test for the new function?

aten/src/ATen/cpu/vec256/vec256_base.h
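A test for a new vectorized `isnan()` would compare each lane against the scalar `std::isnan`. The sketch below uses a hypothetical `Vec<T, N>` wrapper over `std::array`, not ATen's Vec256 or its test harness, and implements `isnan()` with the IEEE 754 rule that NaN is the only value that compares unequal to itself. The interesting lanes to cover are NaN, infinity and signed zero.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical fixed-width vector type, used only to illustrate the kind of
// unit test requested for a new vectorized isnan(); not ATen's Vec256.
template <typename T, std::size_t N>
struct Vec {
  std::array<T, N> values{};

  // Lane-wise NaN check: NaN is the only value unequal to itself.
  // (Relies on IEEE semantics; flags such as -ffast-math may break it.)
  std::array<bool, N> isnan() const {
    std::array<bool, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = values[i] != values[i];
    return out;
  }
};

// Test helper: every lane must agree with the scalar std::isnan reference.
template <typename T, std::size_t N>
bool isnan_matches_reference(const Vec<T, N>& v) {
  const std::array<bool, N> got = v.isnan();
  for (std::size_t i = 0; i < N; ++i) {
    if (got[i] != static_cast<bool>(std::isnan(v.values[i]))) return false;
  }
  return true;
}
```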


VitalyFedyunin · 2021-03-31T22:05:35Z

Sorry, this now has merge conflicts with the just-landed #54898; please rebase again.


mingfeima · 2021-04-01T06:46:27Z

@VitalyFedyunin updated!

facebook-github-bot · 2021-04-02T16:14:49Z

@VitalyFedyunin merged this pull request in f43eb59.

agolynski · 2021-04-02T17:13:37Z

Seems like this broke the iOS build (MaxPoolKernel.cpp).

facebook-github-bot · 2021-04-02T17:16:52Z

This pull request has been reverted by 978fca6.

VitalyFedyunin · 2021-04-02T19:00:24Z

Broken build URL: https://app.circleci.com/pipelines/github/pytorch/pytorch/295279/workflows/5f586a11-d066-46a7-a085-9b72258ce6bc/jobs/12076475

mingfeima · 2021-04-08T12:41:31Z

Working on it.

mingfeima · 2021-04-12T07:32:00Z

I can't reproduce the build failure with this patch on iOS.
I followed the instructions at https://pytorch.org/mobile/ios/#build-libtorch-for-arm64-devices:

BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR ./scripts/build_ios.sh

I built for both iOS and macOS, and both builds succeeded.
My local environment is:

-- Toolchain using default iOS SDK: /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk
-- The CXX compiler identification is AppleClang 12.0.0.12000032
-- The C compiler identification is AppleClang 12.0.0.12000032

About the failure at https://app.circleci.com/pipelines/github/pytorch/pytorch/295279/workflows/5f586a11-d066-46a7-a085-9b72258ce6bc/jobs/12076475:

What goes wrong is that MaxPoolKernel.cpp is compiled twice.

The first compilation succeeds: MaxPoolKernel.cpp is a kernel file in the ATen/native/cpu directory, and it is compiled successfully (as the .DEFAULT variant) here:

Apr 02 16:27:52 [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.DEFAULT.cpp.o

The second compilation fails, and it should not happen at all:

Apr 02 16:33:41 [ 93%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.o

I don't understand why both MaxPoolKernel.cpp and MaxPoolKernel.cpp.DEFAULT.cpp are compiled.

@VitalyFedyunin Regarding the failed CI job, is there a special build recipe for it?

max_pool2d channels last support forward path max_pool2d channels last support backward path vectorize channels last forward path rename the header file fix windows build combine PoolingKernel.h into Pool.h add data type check loosen test_max_pool2d_nhwc to cover device CPU ghstack-source-id: 039bb43d75a3df01fdc47cb53c96d6961cfda3f0 Pull Request resolved: pytorch#48917

malfet · 2021-04-13T17:56:56Z

aten/src/ATen/native/Pool.h

+    int kW, int kH, int dW, int dH, int padW, int padH, int dilationW, int dilationH);
+using max_pool2d_backward_fn = void(*)(Tensor& grad_input, const Tensor& grad_output, const Tensor& indices);
+
+DECLARE_DISPATCH(max_pool2d_fn, max_pool2d_kernel);


I think this PR is missing a DEFINE_DISPATCH call in one of the .cpp files.
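For context, a dispatch stub goes through three steps: a header declares it, exactly one .cpp file defines it, and a kernel file registers an implementation. The toy model below uses hypothetical names throughout and does not reproduce ATen's DECLARE_DISPATCH / DEFINE_DISPATCH / REGISTER_DISPATCH macros; it only shows why a missing definition surfaces late: the `extern` declaration compiles everywhere, but nothing emits the symbol, so callers fail at link time.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of the declare/define/register
// pattern; NOT the real ATen dispatch macros.
using toy_kernel_fn = int (*)(int);

struct ToyDispatchStub {
  toy_kernel_fn fn = nullptr;
  int operator()(int x) const { return fn(x); }
};

// Header (Pool.h analogue): declaration only, like DECLARE_DISPATCH.
extern ToyDispatchStub toy_max_pool2d_stub;

// Exactly one .cpp file must emit the definition, like DEFINE_DISPATCH.
// Without this line, any call through the stub is an undefined reference
// at link time.
ToyDispatchStub toy_max_pool2d_stub;

// Kernel file (cpu/MaxPoolKernel.cpp analogue): installs the implementation
// during static initialization, like REGISTER_DISPATCH.
int toy_max_pool2d_impl(int x) { return x * 2; }

struct ToyRegisterer {
  explicit ToyRegisterer(toy_kernel_fn f) { toy_max_pool2d_stub.fn = f; }
};
static ToyRegisterer toy_register(&toy_max_pool2d_impl);
```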

aten/src/ATen/cpu/vec256/vec256_float_neon.h

malfet · 2021-04-13T18:03:54Z

aten/src/ATen/native/cpu/MaxPoolKernel.cpp

+    int64_t ow = 0;
+    data_index_init(begin, c, channels, oh, output_height, ow, output_width);
+
+    for (int64_t i = begin; i < end; i++) {


Can c10::irange be used here?
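The loop above walks one parallel chunk `[begin, end)` of a flattened (c, oh, ow) index space: as we understand it, `data_index_init` splits `begin` into per-dimension indices once, and each iteration then advances them like an odometer. The sketch below mimics that with hypothetical helpers `unflatten` and `step`, and uses a small `IntRange` class as a stand-in for `c10::irange` so it compiles without c10. In PyTorch itself, the suggested loop header would presumably read `for (const auto i : c10::irange(begin, end))`.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for c10::irange(begin, end): a half-open integer
// range usable in a range-based for loop.
class IntRange {
 public:
  class Iterator {
   public:
    explicit Iterator(int64_t v) : v_(v) {}
    int64_t operator*() const { return v_; }
    Iterator& operator++() { ++v_; return *this; }
    bool operator!=(const Iterator& other) const { return v_ != other.v_; }
   private:
    int64_t v_;
  };
  IntRange(int64_t first, int64_t last) : first_(first), last_(last) {}
  Iterator begin() const { return Iterator(first_); }
  Iterator end() const { return Iterator(last_); }
 private:
  int64_t first_;
  int64_t last_;
};

// Split a flat index over (C, OH, OW) into (c, oh, ow), ow varying fastest.
std::tuple<int64_t, int64_t, int64_t> unflatten(int64_t i, int64_t OH, int64_t OW) {
  return std::make_tuple(i / (OH * OW), (i / OW) % OH, i % OW);
}

// Advance (c, oh, ow) by one position like an odometer, avoiding
// divisions inside the hot loop.
void step(int64_t& c, int64_t& oh, int64_t& ow, int64_t OH, int64_t OW) {
  if (++ow == OW) {
    ow = 0;
    if (++oh == OH) {
      oh = 0;
      ++c;
    }
  }
}

// Walk one chunk [begin, end) and check the incremental indices stay in
// sync with direct decomposition at every step.
bool walk_chunk(int64_t begin, int64_t end, int64_t OH, int64_t OW) {
  int64_t c = 0, oh = 0, ow = 0;
  std::tie(c, oh, ow) = unflatten(begin, OH, OW);
  for (const auto i : IntRange(begin, end)) {  // replaces i = begin; i < end; i++
    if (std::make_tuple(c, oh, ow) != unflatten(i, OH, OW)) return false;
    step(c, oh, ow, OH, OW);
  }
  return true;
}
```

Besides reading more cleanly, the range-based form makes the loop variable `const`, so the body cannot accidentally modify the induction variable.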

VitalyFedyunin · 2021-04-13T18:46:29Z

@mingfeima let's try to figure out what is going on here. Could you please create this change as a separate PR (do NOT abandon the stack) and add the ci-all tag to it?

Also, please add aten/src/ATen/native/cpu/MaxPoolKernel.cpp to tools/build_variables.bzl.

malfet · 2021-04-13T19:52:36Z


This only happens when the lite interpreter is built, i.e. when BUILD_LITE_INTERPRETER=1.
To fix the problem, add MaxPoolKernel.cpp to

pytorch/tools/build_variables.bzl

Line 759 in bbdb37b

aten_native_source_codegen_list = [

@cccclai Can you please explain why this must be an explicit list instead of glob?

cccclai · 2021-04-13T20:15:25Z

Yes, the recipe for building the lite interpreter is here: https://pytorch.org/tutorials/prototype/lite_interpreter.html. To build for iOS, the command is:

BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR BUILD_LITE_INTERPRETER=1 ./scripts/build_ios.sh

cccclai · 2021-04-13T20:16:41Z

> @cccclai Can you please explain why this must be an explicit list instead of glob?

The main reason is to pick only the files needed for PyTorch Edge, so that we can build a smaller library (the lite interpreter). Some parts of ATen are not needed when deploying to mobile.

max_pool2d channels last support forward path max_pool2d channels last support backward path vectorize channels last forward path rename the header file fix windows build combine PoolingKernel.h into Pool.h add data type check loosen test_max_pool2d_nhwc to cover device CPU ghstack-source-id: 2f2487b881ab8080b261d07fc87c46fcef59b8a0 Pull Request resolved: #48917


Summary: add channels last support for MaxPool2d. This one is a replacement of #48917.
Pull Request resolved: #56361
Reviewed By: heitorschueroff
Differential Revision: D27874142
Pulled By: VitalyFedyunin
fbshipit-source-id: bc9604def9c974d7b59621fc709a39948088b992


facebook-github-bot added the cla signed label Dec 7, 2020

This was referenced Dec 7, 2020

add channels last for AdaptiveAvgPool2d #48916

Closed

add channels last support for AvgPool2d on CPU #48918

Closed

optimize channels last for BatchNorm2d on CPU #48919

Closed

add channels last for AdapativeMaxPool2d #48920

Closed

pytorchbot added the open source label Dec 7, 2020

mingfeima mentioned this pull request Dec 7, 2020

add channels last support for MaxPool2d on CPU path #42719

Closed

VitalyFedyunin self-requested a review December 8, 2020 17:19

VitalyFedyunin previously approved these changes Dec 8, 2020


mingfeima mentioned this pull request Dec 18, 2020

add channels last support for thnn_conv2d (non-dilated) #49582

Closed

mingfeima mentioned this pull request Dec 24, 2020

add channels last for GroupNorm #49821

Closed

mingfeima mentioned this pull request Dec 31, 2020

add channel last support for MaxUnpool2d #49984

Closed

mingfeima added 3 commits January 1, 2021 08:55

mingfeima mentioned this pull request Jan 8, 2021

add channels last support for ChannelShuffle #50247

Closed

mingfeima mentioned this pull request Jan 15, 2021

add channels last support for PixelShuffle and PixelUnshuffle #50573

Closed

mingfeima added 3 commits January 18, 2021 10:01

VitalyFedyunin reviewed Jan 22, 2021


aten/src/ATen/cpu/vec256/vec256_base.h

mingfeima added 2 commits January 25, 2021 15:07

mingfeima mentioned this pull request Jan 27, 2021

add channels last support for ConvTranspose2d #51185

Closed

facebook-github-bot closed this in f43eb59 Apr 2, 2021

facebook-github-bot added the Merged label Apr 2, 2021

facebook-github-bot added the Reverted label Apr 2, 2021

facebook-github-bot deleted the gh/mingfeima/3/head branch April 6, 2021 14:17

mingfeima mentioned this pull request Apr 8, 2021

add channels last (2d) support for mkldnn_convolution #55584

Closed

mingfeima mentioned this pull request Apr 13, 2021

enable BFloat16 mkldnn_convolution on both contiguous and channels last memory format #55864

Closed

malfet reviewed Apr 13, 2021


aten/src/ATen/cpu/vec256/vec256_float_neon.h

malfet reviewed Apr 13, 2021


mingfeima mentioned this pull request Apr 19, 2021

add channels last for MaxPool2d #56361

Closed
