
Simple C++ custom autograd function code throws error "CUDA error: driver shutting down" #35736

Open
yf225 opened this issue Mar 31, 2020 · 12 comments
Labels
module: autograd, module: cpp, triaged

@yf225
Contributor

yf225 commented Mar 31, 2020

🐛 Bug

Running the following code with CUDA-enabled libtorch throws the error "CUDA error: driver shutting down", even though the code doesn't use CUDA. Running the same code with CPU-only libtorch doesn't throw any error.

#include <iostream>
#include <torch/torch.h>

using namespace torch::autograd;

class MulConstant : public Function<MulConstant> {
 public:
  static Variable forward(AutogradContext *ctx, Variable variable, double constant) {
    ctx->saved_data["constant"] = constant;
    return variable * constant;
  }

  static variable_list backward(AutogradContext *ctx, variable_list grad_outputs) {
    return {grad_outputs[0] * ctx->saved_data["constant"].toDouble(), Variable()};
  }
};

int main(int argc, char* argv[])
{
  auto x = torch::randn({2}).requires_grad_();
  auto y = MulConstant::apply(x, 5.5);
  y.sum().backward();
  std::cout << x.grad() << std::endl;
}

Error:

terminate called after throwing an instance of 'c10::Error'
terminate called recursively
  what():  CUDA error: driver shutting down (setDevice at /pytorch/c10/cuda/impl/CUDAGuardImpl.h:42)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7fedc6342656 in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0xc6c2 (0x7fed7bad26c2 in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libc10_cuda.so)
frame #2: torch::autograd::Engine::set_device(int) + 0x159 (0x7fedb9c36b39 in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x34 (0x7fedb9c39064 in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xc70f (0x7fedc657b70f in /data/libtorch/libtorch_nightly_cu92/libtorch/lib/libtorch.so)
frame #5: <unknown function> + 0x76ba (0x7fed7c3756ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x6d (0x7fed7c8bc41d in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)

Better backtrace:

Thread 4 "example-app" hit Catchpoint 1 (exception thrown), 0x00007fffccab38bd in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007fffccab38bd in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007fffc4d14ab9 in c10::cuda::impl::CUDAGuardImpl::getDevice (this=0x997920) at ../c10/cuda/impl/CUDAGuardImpl.h:37
#2  0x00007fffc4d14ed6 in c10::cuda::impl::CUDAGuardImpl::setDevice (this=0x997920, d=...) at ../c10/cuda/impl/CUDAGuardImpl.h:51
#3  0x00007ffff0f101db in torch::autograd::Engine::set_device (this=0x7ffff7b16bc0 <torch::autograd::Engine::get_base_engine()::engine>, device=1) at ../torch/csrc/autograd/engine.cpp:264
#4  0x00007ffff0f1034d in torch::autograd::Engine::thread_init (this=0x7ffff7b16bc0 <torch::autograd::Engine::get_base_engine()::engine>, device=1, ready_queue=std::shared_ptr (count 2, weak 0) 0x1b33aa0)
    at ../torch/csrc/autograd/engine.cpp:293
#5  0x00007ffff0f3613e in std::_Mem_fn_base<void (torch::autograd::Engine::*)(int, std::shared_ptr<torch::autograd::ReadyQueue> const&), true>::operator()<int, std::shared_ptr<torch::autograd::ReadyQueue>, void>(torch::autograd::Engine*, int&&, std::shared_ptr<torch::autograd::ReadyQueue>&&) const (this=0x1b340d8, __object=0x7ffff7b16bc0 <torch::autograd::Engine::get_base_engine()::engine>) at /usr/include/c++/5/functional:600
#6  0x00007ffff0f360a1 in std::_Bind_simple<std::_Mem_fn<void (torch::autograd::Engine::*)(int, std::shared_ptr<torch::autograd::ReadyQueue> const&)> (torch::autograd::Engine*, int, std::shared_ptr<torch::autograd::ReadyQueue>)>::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (this=0x1b340b8) at /usr/include/c++/5/functional:1531
#7  0x00007ffff0f35cb8 in std::_Bind_simple<std::_Mem_fn<void (torch::autograd::Engine::*)(int, std::shared_ptr<torch::autograd::ReadyQueue> const&)> (torch::autograd::Engine*, int, std::shared_ptr<torch::autograd::ReadyQueue>)>::operator()() (this=0x1b340b8) at /usr/include/c++/5/functional:1520
#8  0x00007ffff0f35ac8 in std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (torch::autograd::Engine::*)(int, std::shared_ptr<torch::autograd::ReadyQueue> const&)> (torch::autograd::Engine*, int, std::shared_ptr<torch::autograd::ReadyQueue>)> >::_M_run() (this=0x1b340a0) at /usr/include/c++/5/thread:115
#9  0x00007fffccadec80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007fffcbafa6ba in start_thread (arg=0x7fff9cd8d700) at pthread_create.c:333
#11 0x00007fffcc24441d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Update: I noticed that if we initialize a CUDA tensor (e.g. auto cuda_tensor = torch::randn({3, 4}, torch::kCUDA); std::cout << cuda_tensor << std::endl;) before running the C++ custom autograd function, the program runs to completion with no error.
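
For concreteness, a minimal sketch of that warm-up workaround applied to the repro's main() above (it reuses the MulConstant class from the snippet; the CUDA tensor itself is never used afterwards):

int main(int argc, char* argv[])
{
  // Touch CUDA once up front, as described in the update above; the tensor is otherwise unused.
  auto cuda_tensor = torch::randn({3, 4}, torch::kCUDA);
  std::cout << cuda_tensor << std::endl;

  auto x = torch::randn({2}).requires_grad_();
  auto y = MulConstant::apply(x, 5.5);
  y.sum().backward();
  std::cout << x.grad() << std::endl;
}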

Expected behavior

It should just work without throwing any error.

Environment

Latest libtorch nightly

cc @ezyang @ssnl @albanD @zou3519 @gqchen @yf225

@albanD
Collaborator

albanD commented Mar 31, 2020

cc @wanchaol, who changed that code recently.

@wanchaol
Contributor

> cc @wanchaol, who changed that code recently.

Thanks, I'm looking into it.

@yf225
Contributor Author

yf225 commented Mar 31, 2020

Update: I noticed that if we initialize a CUDA tensor (e.g. auto cuda_tensor = torch::randn({3, 4}, torch::kCUDA); std::cout << cuda_tensor << std::endl;) before running the C++ custom autograd function, the program runs to completion with no error.

@ezyang
Contributor

ezyang commented Mar 31, 2020

This probably has something to do with CUDA being initialized from inside the autograd engine: the CUDA teardown then runs too early at shutdown, before the autograd destructors happen. I wonder if there's a way for the autograd engine to "promote" that destructor so that it stays live until autograd is done shutting down.
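
A rough, generic illustration of that hypothesis (a sketch under assumptions, not PyTorch's actual shutdown code; for simplicity it models the late CUDA access in the engine's destructor rather than in a worker thread): a "CUDA" singleton that is first constructed from inside the engine gets destroyed before the engine at exit, so the engine's teardown observes a dead driver, while constructing it before the engine (the warm-up workaround above) reverses the order.

#include <iostream>

bool cuda_alive = false;  // trivial global flag; safe to read during static destruction

struct Cuda {             // stand-in for CUDA driver/runtime state
  Cuda()  { cuda_alive = true; }
  ~Cuda() { cuda_alive = false; }          // "driver shutting down"
};
Cuda& cuda() { static Cuda c; return c; }  // lazily constructed on first use

struct Engine {           // stand-in for the autograd engine singleton
  void backward() { cuda(); }              // CUDA is first touched from inside the engine
  ~Engine() {
    std::cout << (cuda_alive ? "clean shutdown\n"
                             : "CUDA error: driver shutting down\n");
  }
};
Engine& engine() { static Engine e; return e; }

int main() {
  // cuda();              // warm-up workaround: construct Cuda first, so it is destroyed last
  engine().backward();    // bug order: Engine constructed first, Cuda second
}

Function-local statics are destroyed in reverse order of construction, so in the bug order Cuda is torn down before Engine; with the warm-up line uncommented, the order flips and shutdown is clean.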

@yf225
Contributor Author

yf225 commented Mar 31, 2020

Update: I obtained a better backtrace (same as the one under "Better backtrace" in the issue description above).

Also, @wanchaol and I found that this bug only reproduces when an external C++ program is linked against libtorch, not when the same code runs inside our test suite (i.e. running it as the only test in test/cpp/api/autograd.cpp doesn't repro the problem).

@wanchaol
Contributor

wanchaol commented Mar 31, 2020

> Update: I obtained a better backtrace (same as the one under "Better backtrace" in the issue description above).
>
> Also, @wanchaol and I found that this bug only reproduces when an external C++ program is linked against libtorch, not when the same code runs inside our test suite (i.e. running it as the only test in test/cpp/api/autograd.cpp doesn't repro the problem).

Thanks for the backtrace. Yeah, I am trying to link an external cpp file against the test, but it fails with a CUDA library linking error; I will need to figure out the build issue.

Do you know what the difference is between how our internal test suite links and how an external program links? Apart from this problem, I think we should make our internal test suite link the same way an external program does, so that we can catch this kind of problem directly in the tests.

@yf225
Contributor Author

yf225 commented Mar 31, 2020

> Thanks for the backtrace. Yeah, I am trying to link an external cpp file against the test, but it fails with a CUDA library linking error; I will need to figure out the build issue.
>
> Do you know what the difference is between how our internal test suite links and how an external program links? Apart from this problem, I think we should make our internal test suite link the same way an external program does, so that we can catch this kind of problem directly in the tests.

Yes, agreed. I think this is the CMakeLists.txt file we use to build the internal test suite: https://github.com/pytorch/pytorch/blob/master/test/cpp/api/CMakeLists.txt. And I suspect this block is what makes the internal test suite work:

target_link_libraries(test_api PRIVATE
  ${CUDA_LIBRARIES}
  ${CUDA_NVRTC_LIB}
  ${CUDA_CUDA_LIB}
  ${TORCH_CUDA_LIBRARIES})
target_compile_definitions(test_api PRIVATE "USE_CUDA")

@wanchaol
Contributor

wanchaol commented Apr 2, 2020

@yf225 I tried downloading libtorch cu10 from pytorch.org and building the cpp application locally, and still could not repro; it works fine on my side without the exception.

 …/build  cmake --build . --config Release
Scanning dependencies of target autogradtest
[ 50%] Building CXX object CMakeFiles/autogradtest.dir/autogradtest.cpp.o
[100%] Linking CXX executable autogradtest
[100%] Built target autogradtest
 …/build  ./autogradtest
 5.5000
 5.5000
[ CPUFloatType{2} ]

@endeavormoquan

Hi, I think the following is a workaround for this problem.

#include <torch/torch.h>

int main() {
  torch::cuda::is_available();  // adding this line makes everything work
  torch::Tensor x = torch::ones({2, 2}, c10::TensorOptions().device(torch::kCPU).requires_grad(true));
  x.mean().backward();
  return 0;
}

Just calling a simple CUDA-related function before initializing a tensor makes everything work, even though the tensor does not need CUDA support.
Without torch::cuda::is_available(), I get the following error:

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: driver shutting down
Exception raised from getDevice at ../c10/cuda/impl/CUDAGuardImpl.h:37 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f06e5ccdeb9 in /path/to/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0x1555a (0x7f069ce7855a in /path/to/libtorch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x339c0a5 (0x7f06d8af90a5 in /path/to/libtorch/lib/libtorch_cpu.so)
frame #3: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x3a (0x7f06d8affcea in /path/to/libtorch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xceeee (0x7f06e5ff0eee in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x76db (0x7f069dc2d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x3f (0x7f069e52988f in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

I am using LibTorch 1.6.0 with CUDA 10.1. LibTorch was downloaded from https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.6.0%2Bcu101.zip.

The CMakeLists.txt is below:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(test)

set(CMAKE_PREFIX_PATH "/path/to/libtorch")
find_package(Torch REQUIRED)

add_executable(test mautograd.cpp)
target_link_libraries(test "${TORCH_LIBRARIES}")
set_property(TARGET test PROPERTY CXX_STANDARD 14)

@albanD
Collaborator

albanD commented Jan 22, 2021

@yf225 do you know if this is still an issue in the latest versions?

@edlanglois
Copy link

edlanglois commented May 4, 2021

I am currently experiencing what seems to be the same issue via the Rust bindings (tch-rs) for the C++ API.

Versions (Arch linux packages)

  • python-pytorch-cuda: 1.8.1-4
  • cuda: 11.3.0-1
  • cudnn: 8.2.0.53-1
  • tch-rs (not packaged) commit 8b16e2e after 0.4.0

Code:

use tch::{Device, Kind, Tensor};

fn main() {
    let x = Tensor::zeros(&[1], (Kind::Float, Device::Cpu)).requires_grad_(true);
    x.backward();
}

Output:

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: driver shutting down
Exception raised from getDevice at ../c10/cuda/impl/CUDAGuardImpl.h:37 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x68 (0x7f460dd6f7c8 in /usr/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xf3 (0x7f460dd38b87 in /usr/lib/libc10.so)
frame #2: <unknown function> + 0xd28c (0x7f460d6dd28c in /usr/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x3566820 (0x7f46112fe820 in /usr/lib/libtorch_cpu.so)
frame #4: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x3b (0x7f46112ffc1b in /usr/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0xcfbc4 (0x7f4695c77bc4 in /usr/lib/libstdc++.so.6)
frame #6: <unknown function> + 0x9299 (0x7f460dcc9299 in /usr/lib/libpthread.so.0)
frame #7: clone + 0x43 (0x7f460da9f053 in /usr/lib/libc.so.6)

Adding tch::Cuda::is_available() as the first line of main() prevents the crash.

@Mous-Anony

Still not fixed with the latest libtorch_1.9.0_cuda_10.2; a simple backward pass on the CPU causes this error.

#include <torch/torch.h>

int main() {
  torch::Tensor x = torch::tensor(1.0, torch::requires_grad());
  torch::Tensor w = torch::tensor(2.0, torch::requires_grad());
  torch::Tensor b = torch::tensor(3.0, torch::requires_grad());
  auto y = w * x + b;
  y.backward();
}
