Skip to content

[CMake] Split caffe2::cudnn into public and private #59721

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

malfet
Copy link
Contributor

@malfet malfet commented Jun 9, 2021

This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library.
Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening.
Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in #50153

@malfet malfet requested review from ngimel and a team June 9, 2021 17:14
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jun 9, 2021

💊 CI failures summary and remediations

As of commit 16d3e33 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test1 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 09 20:16:45 ERROR [0.015s]: test_multi_worker_with_nonfixed_world_size (__main__.TCPStoreTest)
Jun 09 20:16:43 Exception ignored in: <Finalize object, dead>
Jun 09 20:16:43 Traceback (most recent call last):
Jun 09 20:16:43   File "/opt/conda/lib/python3.6/multiprocessing/util.py", line 186, in __call__
Jun 09 20:16:43     res = self._callback(*self._args, **self._kwargs)
Jun 09 20:16:43 OSError: [Errno 9] Bad file descriptor
Jun 09 20:16:43   test_multitenancy (__main__.TCPStoreTest) ... ok (0.002s)
Jun 09 20:16:45   test_numkeys_delkeys (__main__.TCPStoreTest) ... ok (2.028s)
Jun 09 20:16:45   test_set_get (__main__.TCPStoreTest) ... ok (0.003s)
Jun 09 20:16:45 
Jun 09 20:16:45 ======================================================================
Jun 09 20:16:45 ERROR [0.015s]: test_multi_worker_with_nonfixed_world_size (__main__.TCPStoreTest)
Jun 09 20:16:45 ----------------------------------------------------------------------
Jun 09 20:16:45 Traceback (most recent call last):
Jun 09 20:16:45   File "distributed/test_store.py", line 291, in test_multi_worker_with_nonfixed_world_size
Jun 09 20:16:45     self._multi_worker_helper(-1)
Jun 09 20:16:45   File "distributed/test_store.py", line 279, in _multi_worker_helper
Jun 09 20:16:45     raise RuntimeError(error_message)
Jun 09 20:16:45 RuntimeError
Jun 09 20:16:45 
Jun 09 20:16:45 ----------------------------------------------------------------------
Jun 09 20:16:45 Ran 24 tests in 12.319s

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot
Copy link
Contributor

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Collaborator

@ngimel ngimel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -1493,6 +1497,10 @@ elseif(USE_CUDA)
torch_cuda PRIVATE ${Caffe2_GPU_INCLUDE})
target_link_libraries(
torch_cuda PRIVATE ${Caffe2_CUDA_DEPENDENCY_LIBS})
if(USE_CUDNN)
target_link_libraries(
torch_cuda PRIVATE caffe2::cudnn-private)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: double spaced.

@facebook-github-bot
Copy link
Contributor

@malfet merged this pull request in 8845cba.

@malfet malfet deleted the malfet/introduce-cudnn-private branch June 9, 2021 20:54
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library.
Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening.
Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in pytorch#50153

Pull Request resolved: pytorch#59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336
facebook-github-bot pushed a commit that referenced this pull request Jun 10, 2021
Summary:
It is only enabled if USE_STATIC_CUDNN is enabled

Next step after #59721 towards resolving fast kernels stripping reported in #50153

Pull Request resolved: #59744

Reviewed By: seemethere, ngimel

Differential Revision: D29007314

Pulled By: malfet

fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a
malfet added a commit to malfet/pytorch that referenced this pull request Jun 11, 2021
Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library.
Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening.
Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in pytorch#50153

Pull Request resolved: pytorch#59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336
malfet added a commit to malfet/pytorch that referenced this pull request Jun 11, 2021
Summary:
It is only enabled if USE_STATIC_CUDNN is enabled

Next step after pytorch#59721 towards resolving fast kernels stripping reported in pytorch#50153

Pull Request resolved: pytorch#59744

Reviewed By: seemethere, ngimel

Differential Revision: D29007314

Pulled By: malfet

fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a
malfet added a commit that referenced this pull request Jun 11, 2021
* Move cublas dependency after CuDNN (#58287)

Summary:
Library linking order matters during static linking
Not sure whether its a bug or a feature, but if cublas is reference
before CuDNN, it will be partially statically linked into the library,
even if it is not used

Pull Request resolved: #58287

Reviewed By: janeyx99

Differential Revision: D28433165

Pulled By: malfet

fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c

* [CMake] Split caffe2::cudnn into public and private (#59721)

Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library.
Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening.
Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in #50153

Pull Request resolved: #59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336

* Add USE_WHOLE_CUDNN option (#59744)

Summary:
It is only enabled if USE_STATIC_CUDNN is enabled

Next step after #59721 towards resolving fast kernels stripping reported in #50153

Pull Request resolved: #59744

Reviewed By: seemethere, ngimel

Differential Revision: D29007314

Pulled By: malfet

fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a

* [Binary] Link whole CuDNN for CUDA-11.1 (#59802)

Summary:
Fixes #50153

Pull Request resolved: #59802

Reviewed By: driazati, seemethere

Differential Revision: D29033537

Pulled By: malfet

fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants