
[Binary] Link whole CuDNN for CUDA-11.1 #59802

Closed

Conversation

@malfet (Contributor) commented Jun 10, 2021

Fixes #50153

@facebook-github-bot (Contributor) commented Jun 10, 2021

💊 CI failures summary and remediations

As of commit 98a4b0f (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 10 19:18:45 ERROR [63.061s]: test_backward_...i (__main__.ProcessGroupDistAutogradTestWithSpawn)
Jun 10 19:18:42 ok (2.757s)
Jun 10 19:18:44   test_send_remote_module_over_the_wire_script_not_supported (__main__.ProcessGroupThreeWorkersRemoteModuleTestWithSpawn) ... /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/__init__.py:162: UserWarning: RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. PyTorch v1.9 will be the last release that carries PROCESS_GROUP RPC backend. If you have concerns or suggestions please comment in https://github.com/pytorch/pytorch/issues/55615
Jun 10 19:18:44   warnings.warn(
Jun 10 19:18:44 /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/__init__.py:162: UserWarning: RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. PyTorch v1.9 will be the last release that carries PROCESS_GROUP RPC backend. If you have concerns or suggestions please comment in https://github.com/pytorch/pytorch/issues/55615
Jun 10 19:18:44   warnings.warn(
Jun 10 19:18:44 /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/__init__.py:162: UserWarning: RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. PyTorch v1.9 will be the last release that carries PROCESS_GROUP RPC backend. If you have concerns or suggestions please comment in https://github.com/pytorch/pytorch/issues/55615
Jun 10 19:18:44   warnings.warn(
Jun 10 19:18:45 ok (2.755s)
Jun 10 19:18:45 
Jun 10 19:18:45 ======================================================================
Jun 10 19:18:45 ERROR [63.061s]: test_backward_rref_multi (__main__.ProcessGroupDistAutogradTestWithSpawn)
Jun 10 19:18:45 ----------------------------------------------------------------------
Jun 10 19:18:45 Traceback (most recent call last):
Jun 10 19:18:45   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 10 19:18:45     self._join_processes(fn)
Jun 10 19:18:45   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 10 19:18:45     self._check_return_codes(elapsed_time)
Jun 10 19:18:45   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 633, in _check_return_codes
Jun 10 19:18:45     raise RuntimeError(error)
Jun 10 19:18:45 RuntimeError: Process 1 exited with error code 10 and exception:
Jun 10 19:18:45 Traceback (most recent call last):

See CircleCI build pytorch_paralleltbb_linux_xenial_py3_6_gcc5_4_test (2/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 10 19:50:40 [E request_callback_no_python.c...quest type 275: Unexpected end of pickler archive.
Jun 10 19:50:40 frame #7: torch::distributed::autograd::CleanupAutogradContextReq::fromMessage(torch::distributed::rpc::Message const&) + 0xcb (0x7f0d0e036bbb in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #8: torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) + 0x1ed (0x7f0d0e07d44d in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #9: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x7f (0x7f0d0e051a2f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #10: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x57 (0x7f0d0e049cf7 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #11: <unknown function> + 0xd32440 (0x7f0d16a83440 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jun 10 19:50:40 frame #12: c10::ThreadPool::main_loop(unsigned long) + 0x2a3 (0x7f0d156eabd3 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 10 19:50:40 frame #13: <unknown function> + 0xc8421 (0x7f0d173cc421 in /opt/conda/lib/libstdc++.so.6)
Jun 10 19:50:40 frame #14: <unknown function> + 0x76ba (0x7f0d24f366ba in /lib/x86_64-linux-gnu/libpthread.so.0)
Jun 10 19:50:40 frame #15: clone + 0x6d (0x7f0d24c6c51d in /lib/x86_64-linux-gnu/libc.so.6)
Jun 10 19:50:40 
Jun 10 19:50:40 [E request_callback_no_python.cpp:552] Received error while processing request type 275: Unexpected end of pickler archive.
Jun 10 19:50:40 Exception raised from readSlowWithBuffer at /var/lib/jenkins/workspace/torch/csrc/jit/serialization/unpickler.cpp:756 (most recent call first):
Jun 10 19:50:40 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f0d15714ed9 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 10 19:50:40 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xc5 (0x7f0d15711155 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 10 19:50:40 frame #2: <unknown function> + 0x3fd9838 (0x7f0d0ddc5838 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #3: torch::jit::Unpickler::run() + 0xdf (0x7f0d0ddcfd6f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #4: torch::jit::Unpickler::parse_ivalue() + 0x2e (0x7f0d0ddcfede in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #5: torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 0x25c (0x7f0d0dda2dbc in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #6: torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 0xdd (0x7f0d0dda32cd in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #7: torch::distributed::autograd::CleanupAutogradContextReq::fromMessage(torch::distributed::rpc::Message const&) + 0xcb (0x7f0d0e036bbb in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #8: torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) + 0x1ed (0x7f0d0e07d44d in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)

2 failures not recognized by patterns:

Job | Step | Action
CircleCI binary_linux_conda_3_6_cu111_devtoolset7_nightly_build | Unknown | 🔁 rerun
CircleCI binary_macos_conda_3_8_cpu_nightly_build | Build | 🔁 rerun

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot (Contributor) commented:

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@malfet force-pushed the malfet/build-nightly-cuda11-with-whole branch from 7dd35e3 to 1849413 on June 10, 2021 at 17:25
@malfet force-pushed the malfet/build-nightly-cuda11-with-whole branch from 1849413 to 98a4b0f on June 10, 2021 at 17:51
@malfet requested a review from a team on June 10, 2021 at 18:23

@driazati (Contributor) left a comment:

lgtm, is the minimal repro for the perf difference share-able in a gist or something?

@malfet (Contributor, Author) commented Jun 10, 2021

@driazati unfortunately there is no way to detect it in our current CI setup, as the perf difference does not show up on pre-sm_75 GPUs.

But here is a minimal C++ repro that demonstrates the effect static vs dynamic linking has on cudnnConvolutionForward() performance:
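
A minimal sketch of such a benchmark (the problem size and forward algorithm below are illustrative assumptions, not necessarily those of the original repro): it simply times repeated cudnnConvolutionForward() calls so a statically linked and a dynamically linked build can be compared.

```cpp
// Hypothetical benchmark: time cudnnConvolutionForward() for a fixed
// NCHW fp32 3x3 convolution. Shapes and algorithm are assumptions.
#include <cudnn.h>
#include <cuda_runtime.h>
#include <cstdio>

#define CHECK_CUDNN(expr)                                              \
  do {                                                                 \
    cudnnStatus_t s_ = (expr);                                         \
    if (s_ != CUDNN_STATUS_SUCCESS) {                                  \
      std::printf("cuDNN error %s at line %d\n",                       \
                  cudnnGetErrorString(s_), __LINE__);                  \
      return 1;                                                        \
    }                                                                  \
  } while (0)

int main() {
  cudnnHandle_t handle;
  CHECK_CUDNN(cudnnCreate(&handle));

  // Assumed problem size: ResNet-style 3x3 convolution, stride 1, pad 1.
  const int N = 32, C = 64, H = 56, W = 56, K = 64, R = 3, S = 3;

  cudnnTensorDescriptor_t xDesc, yDesc;
  cudnnFilterDescriptor_t wDesc;
  cudnnConvolutionDescriptor_t convDesc;
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
  CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
  CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, N, C, H, W));
  CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, K, C, R, S));
  CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                              CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT));

  int oN, oC, oH, oW;
  CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc, &oN, &oC, &oH, &oW));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, oN, oC, oH, oW));

  float *x, *w, *y;
  cudaMalloc(&x, sizeof(float) * N * C * H * W);
  cudaMalloc(&w, sizeof(float) * K * C * R * S);
  cudaMalloc(&y, sizeof(float) * oN * oC * oH * oW);

  // Fix the algorithm so both builds exercise the same code path.
  const cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
  size_t wsSize = 0;
  CHECK_CUDNN(cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc, convDesc, yDesc, algo, &wsSize));
  void* workspace = nullptr;
  cudaMalloc(&workspace, wsSize);

  const float alpha = 1.f, beta = 0.f;
  // Warm-up iterations.
  for (int i = 0; i < 10; ++i) {
    CHECK_CUDNN(cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                                        algo, workspace, wsSize, &beta, yDesc, y));
  }
  cudaDeviceSynchronize();

  // Timed iterations using CUDA events.
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  const int iters = 100;
  cudaEventRecord(start);
  for (int i = 0; i < iters; ++i) {
    CHECK_CUDNN(cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                                        algo, workspace, wsSize, &beta, yDesc, y));
  }
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  float ms = 0.f;
  cudaEventElapsedTime(&ms, start, stop);
  std::printf("cudnnConvolutionForward: %.3f ms/iter\n", ms / iters);
  // (Cleanup of descriptors and device memory omitted for brevity.)
  return 0;
}
```

Built once against libcudnn.so and once against libcudnn_static.a (without whole-archive linking), the reported time per iteration on an sm_75 or newer GPU should show the difference described above.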

@facebook-github-bot (Contributor) commented:

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@malfet merged this pull request in c2c35c0.

malfet added a commit to malfet/pytorch that referenced this pull request Jun 11, 2021
Summary:
Fixes pytorch#50153

Pull Request resolved: pytorch#59802

Reviewed By: driazati, seemethere

Differential Revision: D29033537

Pulled By: malfet

fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1
malfet added a commit that referenced this pull request Jun 11, 2021
* Move cublas dependency after CuDNN (#58287)

Summary:
Library linking order matters during static linking.
Not sure whether it's a bug or a feature, but if cublas is referenced
before CuDNN, it will be partially statically linked into the library,
even if it is not used.

Pull Request resolved: #58287

Reviewed By: janeyx99

Differential Revision: D28433165

Pulled By: malfet

fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c

* [CMake] Split caffe2::cudnn into public and private (#59721)

Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR, PyTorch wheels often accidentally contained several partial copies of the cudnn_static library.
Splitting the interface into header-only (cudnn-public) and library+headers (cudnn-private) targets prevents that from happening.
This is a preliminary step towards optionally linking the whole cudnn library to work around the issue reported in #50153.

Pull Request resolved: #59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336

* Add USE_WHOLE_CUDNN option (#59744)

Summary:
It is only enabled if USE_STATIC_CUDNN is enabled.

This is the next step after #59721 towards resolving the fast-kernel stripping reported in #50153.

Pull Request resolved: #59744

Reviewed By: seemethere, ngimel

Differential Revision: D29007314

Pulled By: malfet

fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a

* [Binary] Link whole CuDNN for CUDA-11.1 (#59802)

Summary:
Fixes #50153

Pull Request resolved: #59802

Reviewed By: driazati, seemethere

Differential Revision: D29033537

Pulled By: malfet

fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1