
what should i do to disable libcaffe2_nvrtc.so #31985

Closed
HardLaugh opened this issue Jan 9, 2020 · 7 comments
Assignees
Labels
module: build (Build system issues), module: cuda (Related to torch.cuda, and CUDA support in general), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@HardLaugh

HardLaugh commented Jan 9, 2020

I find that my executable must link against libcaffe2_nvrtc.so, but because of my project's requirements I don't want to link against it. I need a pure executable with no torch .so dependencies, so I built static torch libs, and unfortunately libcaffe2_nvrtc.so stopped me.

The question is about this code in CUDAHooks.cpp:

static std::pair<std::unique_ptr<at::DynamicLibrary>, at::cuda::NVRTC*> load_nvrtc() {
#if defined(_WIN32)
  std::string libcaffe2_nvrtc = "caffe2_nvrtc.dll";
#elif defined(__APPLE__)
  std::string libcaffe2_nvrtc = "libcaffe2_nvrtc.dylib";
#else
  std::string libcaffe2_nvrtc = "libcaffe2_nvrtc.so";
#endif
  std::unique_ptr<at::DynamicLibrary> libnvrtc_stub(
      new at::DynamicLibrary(libcaffe2_nvrtc.c_str()));
  auto fn = (at::cuda::NVRTC * (*)()) libnvrtc_stub->sym("load_nvrtc");
  return std::make_pair(std::move(libnvrtc_stub), fn());
}

and the error is:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Error in dlopen or dlsym: libcaffe2_nvrtc.so: cannot open shared object file: No such file or directory
The above operation failed in interpreter, with the following stack trace:

cc @ngimel

@zou3519 zou3519 added the module: build, triaged, and module: cuda labels Jan 9, 2020
@zou3519
Contributor

zou3519 commented Jan 9, 2020

I'm not following what "pure executable file without anything .so about torch" means, could you please clarify?

@HardLaugh
Author

HardLaugh commented Jan 10, 2020

@zou3519 If the system environment changes to one where only MKL and CUDA are installed, the executable cannot run, because it needs the shared library libcaffe2_nvrtc.so:

linux-vdso.so.1 =>  (0x00007ffed67ed000)
        libmkl_intel_lp64.so => /home/46799/anaconda3/envs/eptorch/lib/libmkl_intel_lp64.so (0x00007f31abf5f000)
        libmkl_gnu_thread.so => /home/46799/anaconda3/envs/eptorch/lib/libmkl_gnu_thread.so (0x00007f31aa84c000)
        libmkl_core.so => /home/46799/anaconda3/envs/eptorch/lib/libmkl_core.so (0x00007f31a6843000)
        libgomp.so.1 => /home/46799/anaconda3/envs/eptorch/lib/libgomp.so.1 (0x00007f31a6815000)
        libcuda.so.1 => /usr/local/nvidia/lib64/libcuda.so.1 (0x00007f31a5687000)
        libnvrtc.so.10.0 => /usr/local/cuda/lib64/libnvrtc.so.10.0 (0x00007f31a406b000)
        libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f31a3e61000)
        libcudart.so.10.0 => /usr/local/cuda/lib64/libcudart.so.10.0 (0x00007f31a3be7000)
        libcufft.so.10.0 => /usr/local/cuda/lib64/libcufft.so.10.0 (0x00007f319d733000)
        libcublas.so.10.0 => /usr/local/cuda/lib64/libcublas.so.10.0 (0x00007f319919c000)
        libcurand.so.10.0 => /usr/local/cuda/lib64/libcurand.so.10.0 (0x00007f3195035000)
        libcusparse.so.10.0 => /usr/local/cuda/lib64/libcusparse.so.10.0 (0x00007f31915cd000)
        libcudnn.so.7 => /usr/local/cuda/lib64/libcudnn.so.7 (0x00007f317d03e000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f317ce20000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f317cc18000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f317ca13000)
        libstdc++.so.6 => /home/46799/anaconda3/envs/eptorch/lib/libstdc++.so.6 (0x00007f317c6d9000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f317c3d3000)
        libgcc_s.so.1 => /home/46799/anaconda3/envs/eptorch/lib/libgcc_s.so.1 (0x00007f317c3be000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f317bff5000)
        /lib64/ld-linux-x86-64.so.2 (0x0000562a03662000)
        libnvidia-fatbinaryloader.so.430.14 => /usr/local/nvidia/lib64/libnvidia-fatbinaryloader.so.430.14 (0x00007f317bda7000)

@wahlberg82

I'm in the same situation. I'm building a plugin for a commercial host application. The plugin is a shared library (.so file) that statically links libtorch (i.e., it bakes all of libtorch into its own .so file); the only libraries it links dynamically are the CUDA libs. The plugin needs to be portable to other systems without installing lots of dependencies (CUDA is fine as an additional install). Everything works, except that I can't stop my shared library from also needing the torch library libcaffe2_nvrtc.so. Since no static version of this library (libcaffe2_nvrtc.a) exists, I can't seem to get around it. Is there a way to keep libtorch from dynamically depending on this library while still having GPU support? If I build libtorch with CPU support only, the problem goes away, but I need the GPU acceleration.
Cheers, David

@yeoserene

I am having the same problem as @wahlberg82. Running my executable with CPU only doesn't cause any problems, but trying to run it on the GPU after compilation gives: "Error in dlopen or dlsym: libcaffe2_nvrtc.so: cannot open shared object file: No such file or directory". I also need this as a standalone that runs perfectly by itself.

@ghost

ghost commented Oct 1, 2020

I have the same problem. How do I statically link libtorch?

@malfet
Contributor

malfet commented Oct 8, 2020

The PR against master should address the issue (although your application will still have a dynamic dependency on the NVRTC library).

@wahlberg82

I did a pull of the nightly master yesterday and recompiled statically. I can confirm that this fix is working well and solving the issue. Thanks all for fixing this so quickly and being such a great community!
Cheers, David

jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this issue Oct 29, 2022
Summary:
Instead of dynamically loading `caffe2_nvrtc`, lazyNVRTC provides the same functionality by binding all the hooks to a lazy-binding implementation, very similar to shared-library jump tables:
On the first call, each function in the list gets a global handle to the respective shared library and replaces itself with the dynamically resolved symbol, using the following template:
```
  auto fn = reinterpret_cast<decltype(&NAME)>(getCUDALibrary().sym(C10_SYMBOLIZE(NAME)));
  if (!fn)
    throw std::runtime_error("Can't get " C10_SYMBOLIZE(NAME));
  lazyNVRTC.NAME = fn;
  return fn(...);
```
Fixes pytorch/pytorch#31985

Pull Request resolved: pytorch/pytorch#45674

Reviewed By: ezyang

Differential Revision: D24073946

Pulled By: malfet

fbshipit-source-id: 1479a75e5200e14df003144625a859d312885874
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this issue Nov 10, 2022
5 participants