
[Binary] Link whole CuDNN for CUDA-11.1 #59802

Closed

Conversation

@malfet (Contributor) commented Jun 10, 2021

Fixes #50153

@facebook-github-bot (Contributor) commented Jun 10, 2021

💊 CI failures summary and remediations

As of commit 98a4b0f (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 10 19:18:45 ERROR [63.061s]: test_backward_...i (__main__.ProcessGroupDistAutogradTestWithSpawn)
Jun 10 19:18:42 ok (2.757s)
Jun 10 19:18:44   test_send_remote_module_over_the_wire_script_not_supported (__main__.ProcessGroupThreeWorkersRemoteModuleTestWithSpawn) ... /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/__init__.py:162: UserWarning: RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. PyTorch v1.9 will be the last release that carries PROCESS_GROUP RPC backend. If you have concerns or suggestions please comment in https://github.com/pytorch/pytorch/issues/55615
Jun 10 19:18:44   warnings.warn(
Jun 10 19:18:44 /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/__init__.py:162: UserWarning: RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. PyTorch v1.9 will be the last release that carries PROCESS_GROUP RPC backend. If you have concerns or suggestions please comment in https://github.com/pytorch/pytorch/issues/55615
Jun 10 19:18:44   warnings.warn(
Jun 10 19:18:44 /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/__init__.py:162: UserWarning: RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. PyTorch v1.9 will be the last release that carries PROCESS_GROUP RPC backend. If you have concerns or suggestions please comment in https://github.com/pytorch/pytorch/issues/55615
Jun 10 19:18:44   warnings.warn(
Jun 10 19:18:45 ok (2.755s)
Jun 10 19:18:45 
Jun 10 19:18:45 ======================================================================
Jun 10 19:18:45 ERROR [63.061s]: test_backward_rref_multi (__main__.ProcessGroupDistAutogradTestWithSpawn)
Jun 10 19:18:45 ----------------------------------------------------------------------
Jun 10 19:18:45 Traceback (most recent call last):
Jun 10 19:18:45   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 10 19:18:45     self._join_processes(fn)
Jun 10 19:18:45   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 10 19:18:45     self._check_return_codes(elapsed_time)
Jun 10 19:18:45   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 633, in _check_return_codes
Jun 10 19:18:45     raise RuntimeError(error)
Jun 10 19:18:45 RuntimeError: Process 1 exited with error code 10 and exception:
Jun 10 19:18:45 Traceback (most recent call last):

See CircleCI build pytorch_paralleltbb_linux_xenial_py3_6_gcc5_4_test (2/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 10 19:50:40 [E request_callback_no_python.c...quest type 275: Unexpected end of pickler archive.
Jun 10 19:50:40 frame #7: torch::distributed::autograd::CleanupAutogradContextReq::fromMessage(torch::distributed::rpc::Message const&) + 0xcb (0x7f0d0e036bbb in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #8: torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) + 0x1ed (0x7f0d0e07d44d in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #9: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x7f (0x7f0d0e051a2f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #10: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x57 (0x7f0d0e049cf7 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #11: <unknown function> + 0xd32440 (0x7f0d16a83440 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jun 10 19:50:40 frame #12: c10::ThreadPool::main_loop(unsigned long) + 0x2a3 (0x7f0d156eabd3 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 10 19:50:40 frame #13: <unknown function> + 0xc8421 (0x7f0d173cc421 in /opt/conda/lib/libstdc++.so.6)
Jun 10 19:50:40 frame #14: <unknown function> + 0x76ba (0x7f0d24f366ba in /lib/x86_64-linux-gnu/libpthread.so.0)
Jun 10 19:50:40 frame #15: clone + 0x6d (0x7f0d24c6c51d in /lib/x86_64-linux-gnu/libc.so.6)
Jun 10 19:50:40 
Jun 10 19:50:40 [E request_callback_no_python.cpp:552] Received error while processing request type 275: Unexpected end of pickler archive.
Jun 10 19:50:40 Exception raised from readSlowWithBuffer at /var/lib/jenkins/workspace/torch/csrc/jit/serialization/unpickler.cpp:756 (most recent call first):
Jun 10 19:50:40 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f0d15714ed9 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 10 19:50:40 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xc5 (0x7f0d15711155 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 10 19:50:40 frame #2: <unknown function> + 0x3fd9838 (0x7f0d0ddc5838 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #3: torch::jit::Unpickler::run() + 0xdf (0x7f0d0ddcfd6f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #4: torch::jit::Unpickler::parse_ivalue() + 0x2e (0x7f0d0ddcfede in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #5: torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 0x25c (0x7f0d0dda2dbc in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #6: torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>) + 0xdd (0x7f0d0dda32cd in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #7: torch::distributed::autograd::CleanupAutogradContextReq::fromMessage(torch::distributed::rpc::Message const&) + 0xcb (0x7f0d0e036bbb in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 10 19:50:40 frame #8: torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) + 0x1ed (0x7f0d0e07d44d in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)

2 failures not recognized by patterns:

Job | Step | Action
CircleCI binary_linux_conda_3_6_cu111_devtoolset7_nightly_build | Unknown | 🔁 rerun
CircleCI binary_macos_conda_3_8_cpu_nightly_build | Build | 🔁 rerun

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot (Contributor) commented:

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@malfet force-pushed the malfet/build-nightly-cuda11-with-whole branch from 7dd35e3 to 1849413 on June 10, 2021 at 17:25
@malfet force-pushed the malfet/build-nightly-cuda11-with-whole branch from 1849413 to 98a4b0f on June 10, 2021 at 17:51
@malfet requested a review from a team on June 10, 2021 at 18:23

@driazati (Contributor) left a comment:

lgtm, is the minimal repro for the perf difference share-able in a gist or something?

@malfet (Contributor, Author) commented Jun 10, 2021

@driazati unfortunately there is no way to detect it in our current CI setup, as the perf difference does not show up on pre-sm_75 GPUs.

But here is a minimal C++ repro that demonstrates the effect static vs dynamic linking has on cudnnConvolutionForward() performance:
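
A minimal sketch of such a benchmark (the problem size and forward algorithm below are illustrative assumptions, not necessarily those of the original repro): it simply times repeated cudnnConvolutionForward() calls so a statically linked and a dynamically linked build can be compared.

```cpp
// Hypothetical benchmark: time cudnnConvolutionForward() for a fixed
// NCHW fp32 3x3 convolution. Shapes and algorithm are assumptions.
#include <cudnn.h>
#include <cuda_runtime.h>
#include <cstdio>

#define CHECK_CUDNN(expr)                                              \
  do {                                                                 \
    cudnnStatus_t s_ = (expr);                                         \
    if (s_ != CUDNN_STATUS_SUCCESS) {                                  \
      std::printf("cuDNN error %s at line %d\n",                       \
                  cudnnGetErrorString(s_), __LINE__);                  \
      return 1;                                                        \
    }                                                                  \
  } while (0)

int main() {
  cudnnHandle_t handle;
  CHECK_CUDNN(cudnnCreate(&handle));

  // Assumed problem size: ResNet-style 3x3 convolution, stride 1, pad 1.
  const int N = 32, C = 64, H = 56, W = 56, K = 64, R = 3, S = 3;

  cudnnTensorDescriptor_t xDesc, yDesc;
  cudnnFilterDescriptor_t wDesc;
  cudnnConvolutionDescriptor_t convDesc;
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
  CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
  CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, N, C, H, W));
  CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, K, C, R, S));
  CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                              CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT));

  int oN, oC, oH, oW;
  CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc, &oN, &oC, &oH, &oW));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, oN, oC, oH, oW));

  float *x, *w, *y;
  cudaMalloc(&x, sizeof(float) * N * C * H * W);
  cudaMalloc(&w, sizeof(float) * K * C * R * S);
  cudaMalloc(&y, sizeof(float) * oN * oC * oH * oW);

  // Fix the algorithm so both builds exercise the same code path.
  const cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
  size_t wsSize = 0;
  CHECK_CUDNN(cudnnGetConvolutionForwardWorkspaceSize(handle, xDesc, wDesc, convDesc, yDesc, algo, &wsSize));
  void* workspace = nullptr;
  cudaMalloc(&workspace, wsSize);

  const float alpha = 1.f, beta = 0.f;
  // Warm-up iterations.
  for (int i = 0; i < 10; ++i) {
    CHECK_CUDNN(cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                                        algo, workspace, wsSize, &beta, yDesc, y));
  }
  cudaDeviceSynchronize();

  // Timed iterations using CUDA events.
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  const int iters = 100;
  cudaEventRecord(start);
  for (int i = 0; i < iters; ++i) {
    CHECK_CUDNN(cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                                        algo, workspace, wsSize, &beta, yDesc, y));
  }
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  float ms = 0.f;
  cudaEventElapsedTime(&ms, start, stop);
  std::printf("cudnnConvolutionForward: %.3f ms/iter\n", ms / iters);
  // (Cleanup of descriptors and device memory omitted for brevity.)
  return 0;
}
```

Built once against libcudnn.so and once against libcudnn_static.a (without whole-archive linking), the reported time per iteration on an sm_75 or newer GPU should show the difference described above.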

@facebook-github-bot (Contributor) commented:

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@malfet merged this pull request in c2c35c0.

malfet added a commit to malfet/pytorch that referenced this pull request Jun 11, 2021
Summary:
Fixes pytorch#50153

Pull Request resolved: pytorch#59802

Reviewed By: driazati, seemethere

Differential Revision: D29033537

Pulled By: malfet

fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1
malfet added a commit that referenced this pull request Jun 11, 2021
* Move cublas dependency after CuDNN (#58287)

Summary:
Library linking order matters during static linking.
Not sure whether it's a bug or a feature, but if cublas is referenced
before CuDNN, it will be partially statically linked into the library,
even if it is not used.

Pull Request resolved: #58287

Reviewed By: janeyx99

Differential Revision: D28433165

Pulled By: malfet

fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c

* [CMake] Split caffe2::cudnn into public and private (#59721)

Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR, PyTorch wheels often accidentally contained several partial copies of the cudnn_static library.
Splitting the interface into header-only (cudnn-public) and library+headers (cudnn-private) targets prevents that from happening.
This is a preliminary step towards optionally linking the whole cudnn library to work around the issue reported in #50153.

Pull Request resolved: #59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336

* Add USE_WHOLE_CUDNN option (#59744)

Summary:
It is only enabled if USE_STATIC_CUDNN is enabled.

This is the next step after #59721 towards resolving the fast-kernel stripping reported in #50153.

Pull Request resolved: #59744

Reviewed By: seemethere, ngimel

Differential Revision: D29007314

Pulled By: malfet

fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a

* [Binary] Link whole CuDNN for CUDA-11.1 (#59802)

Summary:
Fixes #50153

Pull Request resolved: #59802

Reviewed By: driazati, seemethere

Differential Revision: D29033537

Pulled By: malfet

fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1