[CMake] Split caffe2::cudnn into public and private #59721

malfet · 2021-06-09T17:14:05Z

This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library.
Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening.
Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in #50153

facebook-github-bot · 2021-06-09T17:14:11Z

💊 CI failures summary and remediations

As of commit 16d3e33 (more details on the Dr. CI page):

1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test1 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 09 20:16:45 ERROR [0.015s]: test_multi_worker_with_nonfixed_world_size (__main__.TCPStoreTest)

Jun 09 20:16:43 Exception ignored in: <Finalize object, dead>
Jun 09 20:16:43 Traceback (most recent call last):
Jun 09 20:16:43   File "/opt/conda/lib/python3.6/multiprocessing/util.py", line 186, in __call__
Jun 09 20:16:43     res = self._callback(*self._args, **self._kwargs)
Jun 09 20:16:43 OSError: [Errno 9] Bad file descriptor
Jun 09 20:16:43   test_multitenancy (__main__.TCPStoreTest) ... ok (0.002s)
Jun 09 20:16:45   test_numkeys_delkeys (__main__.TCPStoreTest) ... ok (2.028s)
Jun 09 20:16:45   test_set_get (__main__.TCPStoreTest) ... ok (0.003s)
Jun 09 20:16:45 
Jun 09 20:16:45 ======================================================================
Jun 09 20:16:45 ERROR [0.015s]: test_multi_worker_with_nonfixed_world_size (__main__.TCPStoreTest)
Jun 09 20:16:45 ----------------------------------------------------------------------
Jun 09 20:16:45 Traceback (most recent call last):
Jun 09 20:16:45   File "distributed/test_store.py", line 291, in test_multi_worker_with_nonfixed_world_size
Jun 09 20:16:45     self._multi_worker_helper(-1)
Jun 09 20:16:45   File "distributed/test_store.py", line 279, in _multi_worker_helper
Jun 09 20:16:45     raise RuntimeError(error_message)
Jun 09 20:16:45 RuntimeError
Jun 09 20:16:45 
Jun 09 20:16:45 ----------------------------------------------------------------------
Jun 09 20:16:45 Ran 24 tests in 12.319s

This comment was automatically generated by Dr. CI (expand for details).

Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

facebook-github-bot · 2021-06-09T18:34:16Z

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ngimel

lgtm

walterddr · 2021-06-09T20:11:02Z

caffe2/CMakeLists.txt

@@ -1493,6 +1497,10 @@ elseif(USE_CUDA)
      torch_cuda PRIVATE ${Caffe2_GPU_INCLUDE})
  target_link_libraries(
      torch_cuda PRIVATE ${Caffe2_CUDA_DEPENDENCY_LIBS})
+  if(USE_CUDNN)
+    target_link_libraries(
+        torch_cuda PRIVATE  caffe2::cudnn-private)


nit: double spaced.

facebook-github-bot · 2021-06-09T20:20:34Z

@malfet merged this pull request in 8845cba.

Summary: This is only important for builds where cuDNN is linked statically into libtorch_cpu. Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library. Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening. Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in pytorch#50153 Pull Request resolved: pytorch#59721 Reviewed By: ngimel Differential Revision: D29000967 Pulled By: malfet fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336

Summary: It is only enabled if USE_STATIC_CUDNN is enabled Next step after #59721 towards resolving fast kernels stripping reported in #50153 Pull Request resolved: #59744 Reviewed By: seemethere, ngimel Differential Revision: D29007314 Pulled By: malfet fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a

Summary: This is only important for builds where cuDNN is linked statically into libtorch_cpu. Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library. Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening. Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in pytorch#50153 Pull Request resolved: pytorch#59721 Reviewed By: ngimel Differential Revision: D29000967 Pulled By: malfet fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336

Summary: It is only enabled if USE_STATIC_CUDNN is enabled Next step after pytorch#59721 towards resolving fast kernels stripping reported in pytorch#50153 Pull Request resolved: pytorch#59744 Reviewed By: seemethere, ngimel Differential Revision: D29007314 Pulled By: malfet fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a

* Move cublas dependency after CuDNN (#58287) Summary: Library linking order matters during static linking Not sure whether its a bug or a feature, but if cublas is reference before CuDNN, it will be partially statically linked into the library, even if it is not used Pull Request resolved: #58287 Reviewed By: janeyx99 Differential Revision: D28433165 Pulled By: malfet fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c * [CMake] Split caffe2::cudnn into public and private (#59721) Summary: This is only important for builds where cuDNN is linked statically into libtorch_cpu. Before this PR PyTorch wheels often accidentally contained several partial copies of cudnn_static library. Splitting the interface into header only (cudnn-public) and library+headers(cudnn-private) prevents those from happening. Preliminary step towards enabling optional linking whole cudnn_library to workaround issue reported in #50153 Pull Request resolved: #59721 Reviewed By: ngimel Differential Revision: D29000967 Pulled By: malfet fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336 * Add USE_WHOLE_CUDNN option (#59744) Summary: It is only enabled if USE_STATIC_CUDNN is enabled Next step after #59721 towards resolving fast kernels stripping reported in #50153 Pull Request resolved: #59744 Reviewed By: seemethere, ngimel Differential Revision: D29007314 Pulled By: malfet fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a * [Binary] Link whole CuDNN for CUDA-11.1 (#59802) Summary: Fixes #50153 Pull Request resolved: #59802 Reviewed By: driazati, seemethere Differential Revision: D29033537 Pulled By: malfet fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1

Split caffe2::cudnn into public and private

5705e9e

malfet requested review from ngimel and a team June 9, 2021 17:14

facebook-github-bot added the cla signed label Jun 9, 2021

Do not link with cudnn-private if it is not defined

16d3e33

ngimel approved these changes Jun 9, 2021

View reviewed changes

driazati approved these changes Jun 9, 2021

View reviewed changes

walterddr approved these changes Jun 9, 2021

View reviewed changes

walterddr reviewed Jun 9, 2021

View reviewed changes

facebook-github-bot closed this in 8845cba Jun 9, 2021

facebook-github-bot added the Merged label Jun 9, 2021

malfet deleted the malfet/introduce-cudnn-private branch June 9, 2021 20:54

malfet mentioned this pull request Jun 9, 2021

Add USE_WHOLE_CUDNN option #59744

Closed

This was referenced Jun 11, 2021

[Release/1.9] Link whole CuDNN for CUDA-11.1 #59873

Merged

[v1.9.0] Release Tracker #58518

Closed

malfet mentioned this pull request Dec 20, 2021

Dot product return completely incorrect result when using pip but not when using conda #70162

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CMake] Split caffe2::cudnn into public and private #59721

[CMake] Split caffe2::cudnn into public and private #59721

Uh oh!

malfet commented Jun 9, 2021

Uh oh!

facebook-github-bot commented Jun 9, 2021 •

edited

Loading

Uh oh!

facebook-github-bot commented Jun 9, 2021

Uh oh!

ngimel left a comment

Uh oh!

walterddr Jun 9, 2021

Uh oh!

facebook-github-bot commented Jun 9, 2021

Uh oh!

Uh oh!

[CMake] Split caffe2::cudnn into public and private #59721

[CMake] Split caffe2::cudnn into public and private #59721

Uh oh!

Conversation

malfet commented Jun 9, 2021

Uh oh!

facebook-github-bot commented Jun 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💊 CI failures summary and remediations

🕵️ 1 new failure recognized by patterns

pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test1 (1/1)

Uh oh!

facebook-github-bot commented Jun 9, 2021

Uh oh!

ngimel left a comment

Choose a reason for hiding this comment

Uh oh!

walterddr Jun 9, 2021

Choose a reason for hiding this comment

Uh oh!

facebook-github-bot commented Jun 9, 2021

Uh oh!

Uh oh!

facebook-github-bot commented Jun 9, 2021 •

edited

Loading