
Simple C++ custom autograd function code throws error "CUDA error: driver shutting down" #35736

Open
yf225 opened this issue Mar 31, 2020 · 12 comments
Labels
module: autograd, module: cpp, triaged

@yf225
Contributor

yf225 commented Mar 31, 2020

🐛 Bug

Running the following code with CUDA-enabled libtorch throws the error "CUDA error: driver shutting down", even though the code doesn't use CUDA. Running the same code with CPU-only libtorch doesn't throw any error.

#include <iostream>
#include <torch/torch.h>

using namespace torch::autograd;

class MulConstant : public Function<MulConstant> {
 public:
  static Variable forward(AutogradContext *ctx, Variable variable, double constant) {
    ctx->saved_data["constant"] = constant;
    return variable * constant;
  }

  static variable_list backward(AutogradContext *ctx, variable_list grad_outputs) {
    return {grad_outputs[0] * ctx->saved_data["constant"].toDouble(), Variable()};
  }
};

int main(int argc, char* argv[])
{
  auto x = torch::randn({2}).requires_grad_();
  auto y = MulConstant::apply(x, 5.5);
  y.sum().backward();
  std::cout << x.grad() << std::endl;
}

Error:

terminate called after throwing an instance of 'c10::Error'
terminate called recursively
  what():  CUDA error: driver shutting down (setDevice at /pytorch/c10/cuda/impl/CUDAGuardImpl.h:42)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7fedc6342656 in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0xc6c2 (0x7fed7bad26c2 in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libc10_cuda.so)
frame #2: torch::autograd::Engine::set_device(int) + 0x159 (0x7fedb9c36b39 in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x34 (0x7fedb9c39064 in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xc70f (0x7fedc657b70f in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libtorch.so)
frame #5: <unknown function> + 0x76ba (0x7fed7c3756ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x6d (0x7fed7c8bc41d in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)

Better backtrace:

Thread 4 "example-app" hit Catchpoint 1 (exception thrown), 0x00007fffccab38bd in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007fffccab38bd in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007fffc4d14ab9 in c10::cuda::impl::CUDAGuardImpl::getDevice (this=0x997920) at ../c10/cuda/impl/CUDAGuardImpl.h:37
#2  0x00007fffc4d14ed6 in c10::cuda::impl::CUDAGuardImpl::setDevice (this=0x997920, d=...) at ../c10/cuda/impl/CUDAGuardImpl.h:51
#3  0x00007ffff0f101db in torch::autograd::Engine::set_device (this=0x7ffff7b16bc0 <torch::autograd::Engine::get_base_engine()::engine>, device=1) at ../torch/csrc/autograd/engine.cpp:264
#4  0x00007ffff0f1034d in torch::autograd::Engine::thread_init (this=0x7ffff7b16bc0 <torch::autograd::Engine::get_base_engine()::engine>, device=1, ready_queue=std::shared_ptr (count 2, weak 0) 0x1b33aa0)
    at ../torch/csrc/autograd/engine.cpp:293
#5  0x00007ffff0f3613e in std::_Mem_fn_base<void (torch::autograd::Engine::*)(int, std::shared_ptr<torch::autograd::ReadyQueue> const&), true>::operator()<int, std::shared_ptr<torch::autograd::ReadyQueue>, void>(torch::autograd::Engine*, int&&, std::shared_ptr<torch::autograd::ReadyQueue>&&) const (this=0x1b340d8, __object=0x7ffff7b16bc0 <torch::autograd::Engine::get_base_engine()::engine>) at /usr/include/c++/5/functional:600
#6  0x00007ffff0f360a1 in std::_Bind_simple<std::_Mem_fn<void (torch::autograd::Engine::*)(int, std::shared_ptr<torch::autograd::ReadyQueue> const&)> (torch::autograd::Engine*, int, std::shared_ptr<torch::autograd::ReadyQueue>)>::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x1b340b8) at /usr/include/c++/5/functional:1531
#7  0x00007ffff0f35cb8 in std::_Bind_simple<std::_Mem_fn<void (torch::autograd::Engine::*)(int, std::shared_ptr<torch::autograd::ReadyQueue> const&)> (torch::autograd::Engine*, int, std::shared_ptr<torch::autograd::ReadyQueue>)>::operator()() (this=0x1b340b8) at /usr/include/c++/5/functional:1520
#8  0x00007ffff0f35ac8 in std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (torch::autograd::Engine::*)(int, std::shared_ptr<torch::autograd::ReadyQueue> const&)> (torch::autograd::Engine*, int, std::shared_ptr<torch::autograd::ReadyQueue>)> >::_M_run() (this=0x1b340a0) at /usr/include/c++/5/thread:115
#9  0x00007fffccadec80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007fffcbafa6ba in start_thread (arg=0x7fff9cd8d700) at pthread_create.c:333
#11 0x00007fffcc24441d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Update: I noticed that if we initialize a CUDA tensor (e.g. auto cuda_tensor = torch::randn({3, 4}, torch::kCUDA); std::cout << cuda_tensor << std::endl;) before running the C++ custom autograd function, the program runs to completion with no error.
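
For concreteness, a minimal sketch of that warm-up workaround applied to the repro's main() above (it reuses the MulConstant class from the snippet; the CUDA tensor itself is never used afterwards):

int main(int argc, char* argv[])
{
  // Touch CUDA once up front, as described in the update above; the tensor is otherwise unused.
  auto cuda_tensor = torch::randn({3, 4}, torch::kCUDA);
  std::cout << cuda_tensor << std::endl;

  auto x = torch::randn({2}).requires_grad_();
  auto y = MulConstant::apply(x, 5.5);
  y.sum().backward();
  std::cout << x.grad() << std::endl;
}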

Expected behavior

It should just work without throwing any error.

Environment

Latest libtorch nightly

cc @ezyang @ssnl @albanD @zou3519 @gqchen @yf225

@albanD
Collaborator

albanD commented Mar 31, 2020

cc @wanchaol, who changed that code recently.

@wanchaol
Contributor

> cc @wanchaol, who changed that code recently.

Thanks, I'm looking into it.

@yf225
Contributor Author

yf225 commented Mar 31, 2020

Update: I noticed that if we initialize a CUDA tensor (e.g. auto cuda_tensor = torch::randn({3, 4}, torch::kCUDA); std::cout << cuda_tensor << std::endl;) before running the C++ custom autograd function, the program runs to completion with no error.

@ezyang
Contributor

ezyang commented Mar 31, 2020

This probably has something to do with CUDA being initialized from inside the autograd engine: the CUDA teardown then runs too early at shutdown, before the autograd destructors happen. I wonder if there's a way for the autograd engine to "promote" that destructor so that it stays live until autograd is done shutting down.
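
A rough, generic illustration of that hypothesis (a sketch under assumptions, not PyTorch's actual shutdown code; for simplicity it models the late CUDA access in the engine's destructor rather than in a worker thread): a "CUDA" singleton that is first constructed from inside the engine gets destroyed before the engine at exit, so the engine's teardown observes a dead driver, while constructing it before the engine (the warm-up workaround above) reverses the order.

#include <iostream>

bool cuda_alive = false;  // trivial global flag; safe to read during static destruction

struct Cuda {             // stand-in for CUDA driver/runtime state
  Cuda()  { cuda_alive = true; }
  ~Cuda() { cuda_alive = false; }          // "driver shutting down"
};
Cuda& cuda() { static Cuda c; return c; }  // lazily constructed on first use

struct Engine {           // stand-in for the autograd engine singleton
  void backward() { cuda(); }              // CUDA is first touched from inside the engine
  ~Engine() {
    std::cout << (cuda_alive ? "clean shutdown\n"
                             : "CUDA error: driver shutting down\n");
  }
};
Engine& engine() { static Engine e; return e; }

int main() {
  // cuda();              // warm-up workaround: construct Cuda first, so it is destroyed last
  engine().backward();    // bug order: Engine constructed first, Cuda second
}

Function-local statics are destroyed in reverse order of construction, so in the bug order Cuda is torn down before Engine; with the warm-up line uncommented, the order flips and shutdown is clean.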

@yf225
Contributor Author

yf225 commented Mar 31, 2020

Update: I obtained a better backtrace (same as the one under "Better backtrace" in the issue description above).

Also, @wanchaol and I found that this bug only reproduces when an external C++ program is linked against libtorch, not when the same code runs inside our test suite (i.e. running it as the only test in test/cpp/api/autograd.cpp doesn't repro the problem).

@wanchaol
Contributor

wanchaol commented Mar 31, 2020

> Update: I obtained a better backtrace (same as the one under "Better backtrace" in the issue description above).
>
> Also, @wanchaol and I found that this bug only reproduces when an external C++ program is linked against libtorch, not when the same code runs inside our test suite (i.e. running it as the only test in test/cpp/api/autograd.cpp doesn't repro the problem).

Thanks for the backtrace. Yeah, I am trying to link an external cpp file against the test, but it fails with a CUDA library linking error; I will need to figure out the build issue.

Do you know what the difference is between how our internal test suite links and how an external program links? Apart from this problem, I think we should make our internal test suite link the same way an external program does, so that we can catch this kind of problem directly in the tests.

@yf225
Contributor Author

yf225 commented Mar 31, 2020

> Thanks for the backtrace. Yeah, I am trying to link an external cpp file against the test, but it fails with a CUDA library linking error; I will need to figure out the build issue.
>
> Do you know what the difference is between how our internal test suite links and how an external program links? Apart from this problem, I think we should make our internal test suite link the same way an external program does, so that we can catch this kind of problem directly in the tests.

Yes, agreed. I think this is the CMakeLists.txt file we use to build the internal test suite: https://github.com/pytorch/pytorch/blob/master/test/cpp/api/CMakeLists.txt. And I suspect this block is what makes the internal test suite work:

target_link_libraries(test_api PRIVATE
  ${CUDA_LIBRARIES}
  ${CUDA_NVRTC_LIB}
  ${CUDA_CUDA_LIB}
  ${TORCH_CUDA_LIBRARIES})
target_compile_definitions(test_api PRIVATE "USE_CUDA")

@wanchaol
Contributor

wanchaol commented Apr 2, 2020

@yf225 I tried downloading libtorch cu10 from pytorch.org and building the cpp application locally, and still could not repro; it works fine on my side without the exception.

 …/build  cmake --build . --config Release
Scanning dependencies of target autogradtest
[ 50%] Building CXX object CMakeFiles/autogradtest.dir/autogradtest.cpp.o
[100%] Linking CXX executable autogradtest
[100%] Built target autogradtest
 …/build  ./autogradtest
 5.5000
 5.5000
[ CPUFloatType{2} ]

@endeavormoquan

Hi, I think the following is a workaround for this problem.

#include <torch/torch.h>

int main() {
  torch::cuda::is_available();  // adding this line makes everything work
  torch::Tensor x = torch::ones({2, 2}, c10::TensorOptions().device(torch::kCPU).requires_grad(true));
  x.mean().backward();
  return 0;
}

Just calling a simple CUDA-related function before initializing a tensor makes everything work, even though the tensor does not need CUDA support.
Without torch::cuda::is_available(), I get the following error:

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: driver shutting down
Exception raised from getDevice at ../c10/cuda/impl/CUDAGuardImpl.h:37 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f06e5ccdeb9 in /path/to/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0x1555a (0x7f069ce7855a in /path/to/libtorch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x339c0a5 (0x7f06d8af90a5 in /path/to/libtorch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x3a (0x7f06d8affcea in /path/to/libtorch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xceeee (0x7f06e5ff0eee in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x76db (0x7f069dc2d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x3f (0x7f069e52988f in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

I am using LibTorch 1.6.0 with CUDA 10.1. LibTorch was downloaded from https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.6.0%2Bcu101.zip.

The CMakeLists.txt is below:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(test)

set(CMAKE_PREFIX_PATH "/path/to/libtorch")
find_package(Torch REQUIRED)

add_executable(test mautograd.cpp)
target_link_libraries(test "${TORCH_LIBRARIES}")
set_property(TARGET test PROPERTY CXX_STANDARD 14)

@albanD
Collaborator

albanD commented Jan 22, 2021

@yf225 do you know if this is still an issue in the latest versions?

@edlanglois
Copy link

edlanglois commented May 4, 2021

I am currently experiencing what seems to be the same issue via the Rust bindings (tch-rs) for the C++ API.

Versions (Arch linux packages)

  • python-pytorch-cuda: 1.8.1-4
  • cuda: 11.3.0-1
  • cudnn: 8.2.0.53-1
  • tch-rs (not packaged) commit 8b16e2e after 0.4.0

Code:

use tch::{Device, Kind, Tensor};

fn main() {
    let x = Tensor::zeros(&[1], (Kind::Float, Device::Cpu)).requires_grad_(true);
    x.backward();
}

Output:

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: driver shutting down
Exception raised from getDevice at ../c10/cuda/impl/CUDAGuardImpl.h:37 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x68 (0x7f460dd6f7c8 in /usr/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xf3 (0x7f460dd38b87 in /usr/lib/libc10.so)
frame #2: <unknown function> + 0xd28c (0x7f460d6dd28c in /usr/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x3566820 (0x7f46112fe820 in /usr/lib/libtorch_cpu.so)
frame #4: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x3b (0x7f46112ffc1b in /usr/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0xcfbc4 (0x7f4695c77bc4 in /usr/lib/libstdc++.so.6)
frame #6: <unknown function> + 0x9299 (0x7f460dcc9299 in /usr/lib/libpthread.so.0)
frame #7: clone + 0x43 (0x7f460da9f053 in /usr/lib/libc.so.6)

Adding tch::Cuda::is_available() as the first line of main() prevents the crash.

@Mous-Anony

Still not fixed with the latest libtorch_1.9.0_cuda_10.2; a simple backward pass on the CPU causes this error.

#include <torch/torch.h>

int main() {
  torch::Tensor x = torch::tensor(1.0, torch::requires_grad());
  torch::Tensor w = torch::tensor(2.0, torch::requires_grad());
  torch::Tensor b = torch::tensor(3.0, torch::requires_grad());
  auto y = w * x + b;
  y.backward();
}
