
add channels last support for thnn_conv2d (non-dilated) #49582

Closed
wants to merge 38 commits

Conversation

mingfeima
Collaborator

@mingfeima mingfeima commented Dec 18, 2020

Stack from ghstack:

Differential Revision: D26007050

@facebook-github-bot
Contributor

facebook-github-bot commented Dec 18, 2020

💊 CI failures summary and remediations

As of commit 7279dd8 (more details on the Dr. CI page and at hud.pytorch.org/pr/49582):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test1 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jul 08 02:53:15 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Jul 08 02:53:15     #9 0x555efbea28f2 in PyEval_EvalCode /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:731
Jul 08 02:53:15     #10 0x555efbf0acd5 in run_mod /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:1025
Jul 08 02:53:15     #11 0x555efbf0cd5d in PyRun_StringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:949
Jul 08 02:53:15     #12 0x555efbf0cdbb in PyRun_SimpleStringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:445
Jul 08 02:53:15     #13 0x555efbf0d926 in run_command /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:301
Jul 08 02:53:15     #14 0x555efbf0d926 in Py_Main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:749
Jul 08 02:53:15     #15 0x555efbe47196 in main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Programs/python.c:69
Jul 08 02:53:15     #16 0x7f42f008183f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
Jul 08 02:53:15     #17 0x555efbed733d in _start (/opt/conda/bin/python3.6+0x1a733d)
Jul 08 02:53:15 
Jul 08 02:53:15 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Jul 08 02:53:15 + retcode=1
Jul 08 02:53:15 + set -e
Jul 08 02:53:15 + return 1
Jul 08 02:53:15 + [[ pytorch-linux-xenial-py3-clang5-asan-test1 == *-NO_AVX-* ]]
Jul 08 02:53:15 + [[ pytorch-linux-xenial-py3-clang5-asan-test1 == *-NO_AVX2-* ]]
Jul 08 02:53:15 + '[' -n https://github.com/pytorch/pytorch/pull/49582 ']'
Jul 08 02:53:15 + [[ pytorch-linux-xenial-py3-clang5-asan-test1 != *coverage* ]]
Jul 08 02:53:15 ++ mktemp
Jul 08 02:53:15 + DETERMINE_FROM=/tmp/tmp.CufbEAbXYp
Jul 08 02:53:15 + file_diff_from_base /tmp/tmp.CufbEAbXYp

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Jul 08 05:23:59 unknown file: Failure
Jul 08 05:23:59 frame #7: build/bin/test_api() [0xc0b8c5]
Jul 08 05:23:59 frame #8: build/bin/test_api() [0xc0bb65]
Jul 08 05:23:59 frame #9: testing::internal::UnitTestImpl::RunAllTests() + 0xbf9 (0xc0cba9 in build/bin/test_api)
Jul 08 05:23:59 frame #10: testing::UnitTest::Run() + 0x8f (0xc0ce4f in build/bin/test_api)
Jul 08 05:23:59 frame #11: main + 0xc8 (0x5833a8 in build/bin/test_api)
Jul 08 05:23:59 frame #12: __libc_start_main + 0xf0 (0x7f7f36387840 in /lib/x86_64-linux-gnu/libc.so.6)
Jul 08 05:23:59 frame #13: _start + 0x29 (0x5b9a19 in build/bin/test_api)
Jul 08 05:23:59 " thrown in the test body.
Jul 08 05:23:59 [  FAILED  ] IntegrationTest.MNIST_CUDA (3 ms)
Jul 08 05:23:59 [ RUN      ] IntegrationTest.MNISTBatchNorm_CUDA
Jul 08 05:23:59 unknown file: Failure
Jul 08 05:23:59 C++ exception with description "Error opening images file at test/cpp/api/mnist/train-images-idx3-ubyte
Jul 08 05:23:59 Exception raised from read_images at /var/lib/jenkins/workspace/torch/csrc/api/src/data/datasets/mnist.cpp:67 (most recent call first):
Jul 08 05:23:59 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f7f4ff34c6b in /var/lib/jenkins/workspace/build/lib/libc10.so)
Jul 08 05:23:59 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7f7f4ff3047e in /var/lib/jenkins/workspace/build/lib/libc10.so)
Jul 08 05:23:59 frame #2: <unknown function> + 0x4215952 (0x7f7f54595952 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Jul 08 05:23:59 frame #3: torch::data::datasets::MNIST::MNIST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 0x46 (0x7f7f545969f6 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Jul 08 05:23:59 frame #4: IntegrationTest_MNISTBatchNorm_CUDA_Test::TestBody() + 0x9d6 (0x7843b6 in build/bin/test_api)
Jul 08 05:23:59 frame #5: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x4a (0xc1489a in build/bin/test_api)
Jul 08 05:23:59 frame #6: build/bin/test_api() [0xc0b2d6]
Jul 08 05:23:59 frame #7: build/bin/test_api() [0xc0b8c5]


mingfeima added a commit that referenced this pull request Dec 18, 2020
ghstack-source-id: 8044e97be67244e41383d2a4a956860c000e90a0
Pull Request resolved: #49582
@mingfeima
Collaborator Author

Motivation for this patch

For performance or functionality reasons, nn.Conv2d chooses between mkldnn_convolution and thnn_conv2d on CPU, depending on the input tensor size and kernel size (see the use_mkldnn definition). So in order to claim that nn.Conv2d supports the channels-last memory format on CPU, we need to make sure the underlying implementations support the CL format. mkldnn_convolution CL support will be covered in later patches; this PR provides CL support for thnn_conv2d.
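As a minimal sketch of what "channels-last support" means from the user's side (assumes a PyTorch build with this support; shapes are illustrative, taken from the first row of the table below):

```python
# Request channels-last (NHWC) layout for both input and weights, and
# verify that the convolution preserves that layout in its output.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224).to(memory_format=torch.channels_last)
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3).to(
    memory_format=torch.channels_last)

with torch.no_grad():
    y = conv(x)

# With CL support in the underlying kernel, no NCHW round-trip is needed
# and the output stays channels-last.
print(y.is_contiguous(memory_format=torch.channels_last))
```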

Performance

Measured the Conv2d test configs from #40610 on a single core of an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz; the mkldnn path is manually disabled with torch._C._set_mkldnn_enabled(False) so that thnn_conv2d is exercised:

Notice that grouped conv performs worse with CL than with Contiguous in the following table, but with default settings grouped conv ends up in mkldnn, which has much better grouped-conv performance than the gemm-based algorithm.

input size output channels kernel stride padding groups before (nchw) after (nchw) after (nhwc)
[1,3,224,224] 64 7 2 3 1 4.443 4.435 4.488
[1,64,56,56] 128 1 1 0 1 0.337 0.337 0.360
[1,128,56,56] 128 3 1 1 32 1.784 1.798 9.012
[1,128,56,56] 256 1 1 0 1 1.117 1.116 1.155
[1,64,56,56] 256 1 1 0 1 0.773 0.774 0.783
[1,256,56,56] 128 1 1 0 1 0.994 1.006 1.084
[1,128,56,56] 128 3 1 1 32 1.787 1.784 9.013
[1,128,56,56] 256 1 1 0 1 1.117 1.117 1.159
[1,256,56,56] 128 1 1 0 1 0.994 0.995 1.085
[1,128,56,56] 128 3 1 1 32 1.783 1.786 9.004
[1,128,56,56] 256 1 1 0 1 1.123 1.115 1.157
[1,256,56,56] 256 1 1 0 1 2.057 2.059 2.136
[1,256,56,56] 256 3 2 1 32 4.083 4.101 4.186
[1,256,28,28] 512 1 1 0 1 1.059 1.057 1.073
[1,256,56,56] 512 1 2 0 1 1.395 1.400 1.149
[1,512,28,28] 256 1 1 0 1 1.048 1.050 1.109
[1,256,28,28] 256 3 1 1 32 1.379 1.383 2.936
[1,256,28,28] 512 1 1 0 1 1.057 1.056 1.072
[1,512,28,28] 256 1 1 0 1 1.048 1.051 1.106
[1,256,28,28] 256 3 1 1 32 1.376 1.381 2.942
[1,256,28,28] 512 1 1 0 1 1.059 1.065 1.073
[1,512,28,28] 256 1 1 0 1 1.049 1.056 1.110
[1,256,28,28] 256 3 1 1 32 1.377 1.387 2.933
[1,256,28,28] 512 1 1 0 1 1.058 1.056 1.072
[1,512,28,28] 512 1 1 0 1 2.014 2.008 2.059
[1,512,28,28] 512 3 2 1 32 2.263 2.284 1.583
[1,512,14,14] 1024 1 1 0 1 1.200 1.199 1.136
[1,512,28,28] 1024 1 2 0 1 1.356 1.359 1.175
[1,1024,14,14] 512 1 1 0 1 1.239 1.235 1.164
[1,512,14,14] 512 3 1 1 32 1.380 1.395 1.301
[1,512,14,14] 1024 1 1 0 1 1.192 1.190 1.140
[1,1024,14,14] 512 1 1 0 1 1.234 1.235 1.163
[1,512,14,14] 512 3 1 1 32 1.375 1.394 1.303
[1,512,14,14] 1024 1 1 0 1 1.192 1.195 1.143
[1,1024,14,14] 512 1 1 0 1 1.240 1.235 1.165
[1,512,14,14] 512 3 1 1 32 1.384 1.394 1.309
[1,512,14,14] 1024 1 1 0 1 1.195 1.189 1.134
[1,1024,14,14] 512 1 1 0 1 1.234 1.231 1.164
[1,512,14,14] 512 3 1 1 32 1.380 1.399 1.301
[1,512,14,14] 1024 1 1 0 1 1.196 1.188 1.141
[1,1024,14,14] 512 1 1 0 1 1.234 1.237 1.168
[1,512,14,14] 512 3 1 1 32 1.379 1.388 1.303
[1,512,14,14] 1024 1 1 0 1 1.192 1.192 1.140
[1,1024,14,14] 512 1 1 0 1 1.234 1.234 1.166
[1,512,14,14] 512 3 1 1 32 1.381 1.389 1.302
[1,512,14,14] 1024 1 1 0 1 1.189 1.194 1.138
[1,1024,14,14] 512 1 1 0 1 1.235 1.235 1.161
[1,512,14,14] 512 3 1 1 32 1.379 1.397 1.312
[1,512,14,14] 1024 1 1 0 1 1.198 1.191 1.143
[1,1024,14,14] 512 1 1 0 1 1.236 1.233 1.163
[1,512,14,14] 512 3 1 1 32 1.379 1.391 1.306
[1,512,14,14] 1024 1 1 0 1 1.191 1.190 1.137
[1,1024,14,14] 512 1 1 0 1 1.236 1.232 1.168
[1,512,14,14] 512 3 1 1 32 1.375 1.389 1.303
[1,512,14,14] 1024 1 1 0 1 1.194 1.195 1.136
[1,1024,14,14] 512 1 1 0 1 1.243 1.241 1.161
[1,512,14,14] 512 3 1 1 32 1.379 1.398 1.305
[1,512,14,14] 1024 1 1 0 1 1.195 1.189 1.141
[1,1024,14,14] 512 1 1 0 1 1.237 1.232 1.163
[1,512,14,14] 512 3 1 1 32 1.379 1.391 1.304
[1,512,14,14] 1024 1 1 0 1 1.193 1.188 1.136
[1,1024,14,14] 512 1 1 0 1 1.237 1.231 1.166
[1,512,14,14] 512 3 1 1 32 1.380 1.397 1.305
[1,512,14,14] 1024 1 1 0 1 1.191 1.193 1.136
[1,1024,14,14] 512 1 1 0 1 1.233 1.234 1.163
[1,512,14,14] 512 3 1 1 32 1.377 1.399 1.302
[1,512,14,14] 1024 1 1 0 1 1.194 1.191 1.140
[1,1024,14,14] 512 1 1 0 1 1.236 1.237 1.162
[1,512,14,14] 512 3 1 1 32 1.376 1.391 1.305
[1,512,14,14] 1024 1 1 0 1 1.194 1.194 1.144
[1,1024,14,14] 512 1 1 0 1 1.233 1.232 1.169
[1,512,14,14] 512 3 1 1 32 1.381 1.397 1.300
[1,512,14,14] 1024 1 1 0 1 1.191 1.189 1.140
[1,1024,14,14] 512 1 1 0 1 1.236 1.235 1.164
[1,512,14,14] 512 3 1 1 32 1.379 1.397 1.306
[1,512,14,14] 1024 1 1 0 1 1.199 1.187 1.136
[1,1024,14,14] 512 1 1 0 1 1.239 1.241 1.162
[1,512,14,14] 512 3 1 1 32 1.377 1.404 1.305
[1,512,14,14] 1024 1 1 0 1 1.192 1.189 1.136
[1,1024,14,14] 512 1 1 0 1 1.235 1.233 1.166
[1,512,14,14] 512 3 1 1 32 1.379 1.407 1.310
[1,512,14,14] 1024 1 1 0 1 1.202 1.193 1.151
[1,1024,14,14] 512 1 1 0 1 1.239 1.234 1.163
[1,512,14,14] 512 3 1 1 32 1.376 1.393 1.305
[1,512,14,14] 1024 1 1 0 1 1.194 1.199 1.134
[1,1024,14,14] 512 1 1 0 1 1.234 1.236 1.162
[1,512,14,14] 512 3 1 1 32 1.379 1.395 1.305
[1,512,14,14] 1024 1 1 0 1 1.191 1.194 1.138
[1,1024,14,14] 512 1 1 0 1 1.235 1.234 1.164
[1,512,14,14] 512 3 1 1 32 1.377 1.398 1.301
[1,512,14,14] 1024 1 1 0 1 1.192 1.191 1.137
[1,1024,14,14] 512 1 1 0 1 1.232 1.233 1.168
[1,512,14,14] 512 3 1 1 32 1.380 1.394 1.310
[1,512,14,14] 1024 1 1 0 1 1.191 1.189 1.135
[1,1024,14,14] 1024 1 1 0 1 2.336 2.333 2.280
[1,1024,14,14] 1024 3 2 1 32 1.711 1.735 1.288
[1,1024,7,7] 2048 1 1 0 1 1.620 1.615 1.746
[1,1024,14,14] 2048 1 2 0 1 1.720 1.728 1.763
[1,2048,7,7] 1024 1 1 0 1 1.630 1.629 1.763
[1,1024,7,7] 1024 3 1 1 32 1.492 1.503 1.166
[1,1024,7,7] 2048 1 1 0 1 1.620 1.614 1.749
[1,2048,7,7] 1024 1 1 0 1 1.624 1.640 1.760
[1,1024,7,7] 1024 3 1 1 32 1.489 1.510 1.172
[1,1024,7,7] 2048 1 1 0 1 1.630 1.616 1.750
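The timings above can be reproduced with a harness along these lines (a sketch, not the exact script used for the table; the config shown is one row of the table, iteration counts are arbitrary, and absolute numbers will differ by machine):

```python
# Disable the mkldnn path so nn.Conv2d falls back to thnn_conv2d, then
# time forward passes in NCHW (contiguous) vs NHWC (channels-last).
import time
import torch
import torch.nn as nn

torch.set_num_threads(1)              # single core, as in the table
torch._C._set_mkldnn_enabled(False)   # force the thnn_conv2d path

def bench(x, conv, iters=20):
    with torch.no_grad():
        for _ in range(3):            # warmup
            conv(x)
        start = time.perf_counter()
        for _ in range(iters):
            conv(x)
    return (time.perf_counter() - start) / iters * 1e3  # ms per iteration

x = torch.randn(1, 64, 56, 56)
conv = nn.Conv2d(64, 128, kernel_size=1, stride=1, padding=0)

nchw_ms = bench(x, conv)
nhwc_ms = bench(x.to(memory_format=torch.channels_last),
                conv.to(memory_format=torch.channels_last))
print(f"nchw: {nchw_ms:.3f} ms, nhwc: {nhwc_ms:.3f} ms")
```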

mingfeima added a commit that referenced this pull request Dec 21, 2020
ghstack-source-id: 420d9cdfbcf948e6695a89018643071ac1409d51
Pull Request resolved: #49582
dgl-intel pushed a commit to dgl-intel/pytorch that referenced this pull request May 15, 2021
ghstack-source-id: 0584ef4b7b499004aa55c1d62db34c584ee60aab
Pull Request resolved: pytorch#49582
@VitalyFedyunin
Contributor

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mingfeima
Collaborator Author

rebased!

@VitalyFedyunin
Contributor

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mingfeima
Collaborator Author

rebased!

@VitalyFedyunin
Contributor

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in 8b88c24.

@facebook-github-bot
Contributor

This pull request has been reverted by b60d1b7.

@VitalyFedyunin
Contributor

Reverting, as it might be the offender for an iPhone perf regression. Please do nothing at this point; it might be a false positive, I'm investigating.

@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/7/head branch July 24, 2021 14:17
@mingfeima
Collaborator Author

mingfeima commented Nov 3, 2021

@VitalyFedyunin The current code base in ConvolutionMM2d.cpp has replaced .mm() with cpublas::gemm, and also uses TensorAccessor to avoid .narrow(), which should save some time in dispatch and in TensorImpl construction/destruction.

I will rewrite this one accordingly and collect new performance data. @VitalyFedyunin
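For readers unfamiliar with the "gemm based algorithm" referenced above: thnn_conv2d unfolds each receptive field into a matrix row (im2col) and performs the convolution as a single matrix multiply. A NumPy sketch (function name hypothetical, loops kept naive for clarity), written for NHWC data so the gemm reads channels contiguously:

```python
import numpy as np

def conv2d_nhwc_im2col(x, w, stride=1, padding=0):
    # x: (N, H, W, C_in), w: (KH, KW, C_in, C_out)
    n, h, wd, cin = x.shape
    kh, kw, _, cout = w.shape
    xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding), (0, 0)))
    oh = (h + 2 * padding - kh) // stride + 1
    ow = (wd + 2 * padding - kw) // stride + 1
    # im2col: gather every receptive field into one row of a 2-D matrix
    cols = np.empty((n * oh * ow, kh * kw * cin), dtype=x.dtype)
    idx = 0
    for b in range(n):
        for i in range(oh):
            for j in range(ow):
                patch = xp[b, i*stride:i*stride+kh, j*stride:j*stride+kw, :]
                cols[idx] = patch.reshape(-1)
                idx += 1
    # one gemm: (N*OH*OW, KH*KW*Cin) @ (KH*KW*Cin, Cout)
    out = cols @ w.reshape(kh * kw * cin, cout)
    return out.reshape(n, oh, ow, cout)
```

In the real kernel the gemm goes through cpublas::gemm, and the per-window indexing uses TensorAccessor instead of materializing views.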
