Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add channels last for MaxPool2d #48917

Closed
wants to merge 18 commits into from

Conversation

mingfeima
Copy link
Collaborator

@mingfeima mingfeima commented Dec 7, 2020

Stack from ghstack:

max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: D25399470

max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

[ghstack-poisoned]
@mingfeima
Copy link
Collaborator Author

use this one to replace #42719

This patch adds channels last memory format support for nn.MaxPool2d on CPU, CL path is manually vectorized with vec256 on dimension of C.

Performance result on CPU Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz, 2*20 cores, comparing CL v.s. contiguous:

  • single core inference (BS=1): 9.7x speedup
  • single socket inference (BS=1): 3.3x speedup
  • single socket inference (BS=128): 2.6x speedup

input size is picked up from RN50. Use this script to reproduce, ./run.sh max_pool2d.py.

### (before)
### using OMP_NUM_THREADS=1
MaxPool2d(contiguous): input_size [1, 64, 112, 112] time: 5.317 ms

### using OMP_NUM_THREADS=20
MaxPool2d(contiguous): input_size [1, 64, 112, 112] time: 6.443 ms
MaxPool2d(contiguous): input_size [128, 64, 112, 112] time: 45.178 ms

### (after)
### using OMP_NUM_THREADS=1
MaxPool2d(contiguous):   input_size [1, 64, 112, 112] time: 5.304 ms
MaxPool2d(channels_last): input_size [1, 64, 112, 112] time: 0.544 ms

### using OMP_NUM_THREADS=20

MaxPool2d(contiguous):   input_size [1, 64, 112, 112] time: 0.336 ms
MaxPool2d(channels_last): input_size [1,   64, 112, 112] time: 0.102 ms
MaxPool2d(contiguous): input_size [128,   64, 112, 112] time: 42.141 ms
MaxPool2d(channels_last): input_size   [128, 64, 112, 112] time: 15.897 ms

VitalyFedyunin
VitalyFedyunin previously approved these changes Dec 8, 2020
@VitalyFedyunin
Copy link
Contributor

Fails internally on Android VR build:

Summary: 
[removed]/caffe2/aten/src/ATen/native/cpu/MaxPoolKernel.cpp:172:57: error: no member named 'isnan' in 'at::vec256::(anonymous namespace)::Vec256<float>'
            Vec mask = (val_vec > maxval_vec) | val_vec.isnan();
                                                ~~~~~~~ ^
stderr: [removed]/caffe2/aten/src/ATen/native/cpu/MaxPoolKernel.cpp:172:57: error: no member named 'isnan' in 'at::vec256::(anonymous namespace)::Vec256<float>'
            Vec mask = (val_vec > maxval_vec) | val_vec.isnan();
                                                ~~~~~~~ ^
 ** Summary of failures encountered during the build **
Rule [removed]
 FAILED because Command failed with exit code 1.

max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
@mingfeima
Copy link
Collaborator Author

mingfeima commented Dec 31, 2020

Fails internally on Android VR build:

Summary: 
[removed]/caffe2/aten/src/ATen/native/cpu/MaxPoolKernel.cpp:172:57: error: no member named 'isnan' in 'at::vec256::(anonymous namespace)::Vec256<float>'
            Vec mask = (val_vec > maxval_vec) | val_vec.isnan();
                                                ~~~~~~~ ^
stderr: [removed]/caffe2/aten/src/ATen/native/cpu/MaxPoolKernel.cpp:172:57: error: no member named 'isnan' in 'at::vec256::(anonymous namespace)::Vec256<float>'
            Vec mask = (val_vec > maxval_vec) | val_vec.isnan();
                                                ~~~~~~~ ^
 ** Summary of failures encountered during the build **
Rule [removed]
 FAILED because Command failed with exit code 1.

@VitalyFedyunin, the android build failure has been fixed, please check!

max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
Copy link
Contributor

@VitalyFedyunin VitalyFedyunin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! I finished with OneDDN PRs and moving to this stack. Meanwhile can you please add vec256 test for new function.

aten/src/ATen/cpu/vec256/vec256_base.h Show resolved Hide resolved
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
@VitalyFedyunin
Copy link
Contributor

Sorry, merge conflicts with just landed #54898 , please rebase again

max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

Differential Revision: [D25399470](https://our.internmc.facebook.com/intern/diff/D25399470)

[ghstack-poisoned]
@mingfeima
Copy link
Collaborator Author

@VitalyFedyunin updated!

@facebook-github-bot
Copy link
Contributor

@VitalyFedyunin merged this pull request in f43eb59.

@agolynski
Copy link
Contributor

Seems like this broke ios build MaxPoolKernel.cpp

@facebook-github-bot
Copy link
Contributor

This pull request has been reverted by 978fca6.

@VitalyFedyunin
Copy link
Contributor

Broken build url https://app.circleci.com/pipelines/github/pytorch/pytorch/295279/workflows/5f586a11-d066-46a7-a085-9b72258ce6bc/jobs/12076475

@mingfeima
Copy link
Collaborator Author

@mingfeima
Copy link
Collaborator Author

I can't reproduce the build failure on this patch on iOS.
I followed the instruction from https://pytorch.org/mobile/ios/#build-libtorch-for-arm64-devices

BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR ./scripts/build_ios.sh

I built on both iOS and MacOS and both succeeded.
My local environment is

-- Toolchain using default iOS SDK: /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.4.sdk
-- The CXX compiler identification is AppleClang 12.0.0.12000032
-- The C compiler identification is AppleClang 12.0.0.12000032

About the failure from https://app.circleci.com/pipelines/github/pytorch/pytorch/295279/workflows/5f586a11-d066-46a7-a085-9b72258ce6bc/jobs/12076475

The thing that goes wrong is MaxPoolKernel.cpp has been compiled twice

  1. the first time succeeded, MaxPoolKernel.cpp is a kernel file from ATen/native/cpu directory and it has been compiled successfully (with .DEFAULT) at
Apr 02 16:27:52 [ 84%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.DEFAULT.cpp.o
  1. second time goes wrong, this should not exist
Apr 02 16:33:41 [ 93%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.o

I don't understand why both MaxPoolKernel.cpp and MaxPoolKernel.cpp.DEFAULT.cpp are compiled.

@VitalyFedyunin About the failed CI, is there any special build recipe for it?

leslie-fang-intel pushed a commit to leslie-fang-intel/pytorch that referenced this pull request Apr 12, 2021
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

ghstack-source-id: 039bb43d75a3df01fdc47cb53c96d6961cfda3f0
Pull Request resolved: pytorch#48917
int kW, int kH, int dW, int dH, int padW, int padH, int dilationW, int dilationH);
using max_pool2d_backward_fn = void(*)(Tensor& grad_input, const Tensor& grad_output, const Tensor& indices);

DECLARE_DISPATCH(max_pool2d_fn, max_pool2d_kernel);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR misses a DEFINE_DISPATCH call in one of .cpp files

int64_t ow = 0;
data_index_init(begin, c, channels, oh, output_height, ow, output_width);

for (int64_t i = begin; i < end; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can c10::irange be used here?

@VitalyFedyunin
Copy link
Contributor

@mingfeima lets try to figure out what is going on here. Could you please create this change as separate PR (do NOT abandon stack) and add ci-all tag to it.

Also please add aten/src/ATen/native/cpu/MaxPoolKernel.cpp into the tools/build_variables.bzl

@malfet
Copy link
Contributor

malfet commented Apr 13, 2021

I can't reproduce the build failure on this patch on iOS.
I followed the instruction from https://pytorch.org/mobile/ios/#build-libtorch-for-arm64-devices

This only happens when Lite interpreter is build, i.e. when BUILD_LITE_INTERPRETER=1
And to fix the problem, add MaxPoolKernel.cpp to

aten_native_source_codegen_list = [

@cccclai Can you please explain why this must be an explicit list instead of glob?

@cccclai
Copy link
Contributor

cccclai commented Apr 13, 2021

Apr 02 16:33:41 [ 93%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/MaxPoolKernel.cpp.o

Yes the recipe is to build lite interpreter is here: https://pytorch.org/tutorials/prototype/lite_interpreter.html. To build for ios, the command is:

BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR BUILD_LITE_INTERPRETER=1 ./scripts/build_ios.sh

@cccclai
Copy link
Contributor

cccclai commented Apr 13, 2021

I can't reproduce the build failure on this patch on iOS.
I followed the instruction from https://pytorch.org/mobile/ios/#build-libtorch-for-arm64-devices

This only happens when Lite interpreter is build, i.e. when BUILD_LITE_INTERPRETER=1
And to fix the problem, add MaxPoolKernel.cpp to

aten_native_source_codegen_list = [

@cccclai Can you please explain why this must be an explicit list instead of glob?

The main reason is to pick the files needed for pytorch edge, such that we can build a library with smaller size (lite interpreter). Some resources from aten are not needed when deploying for mobile.

mingfeima added a commit that referenced this pull request Apr 19, 2021
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

ghstack-source-id: 2f2487b881ab8080b261d07fc87c46fcef59b8a0
Pull Request resolved: #48917
mingfeima added a commit to mingfeima/pytorch that referenced this pull request Apr 19, 2021
max_pool2d channels last support forward path

max_pool2d channels last support backward path

vectorize channels last forward path

rename the header file

fix windows build

combine PoolingKernel.h into Pool.h

add data type check

loosen test_max_pool2d_nhwc to cover device CPU

ghstack-source-id: 2f2487b881ab8080b261d07fc87c46fcef59b8a0
Pull Request resolved: pytorch#48917
facebook-github-bot pushed a commit that referenced this pull request Apr 20, 2021
Summary:
add channels last support for MaxPool2d.
this one is a replacement of #48917

Pull Request resolved: #56361

Reviewed By: heitorschueroff

Differential Revision: D27874142

Pulled By: VitalyFedyunin

fbshipit-source-id: bc9604def9c974d7b59621fc709a39948088b992
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
add channels last support for MaxPool2d.
this one is a replacement of pytorch#48917

Pull Request resolved: pytorch#56361

Reviewed By: heitorschueroff

Differential Revision: D27874142

Pulled By: VitalyFedyunin

fbshipit-source-id: bc9604def9c974d7b59621fc709a39948088b992
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants