
When running “bash experiments/test.sh”: experiments/test.sh: line 20: 8344 Aborted (core dumped) #6

Closed
VanniZhou opened this issue Nov 25, 2020 · 4 comments


@VanniZhou

Using tensorboardX

Bad key "text.kerning_factor" on line 4 in
/home/vincent/.local/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test_patch.mplstyle.
You probably need to get an updated matplotlibrc file from
http://github.com/matplotlib/matplotlib/blob/master/matplotlibrc.template
or from the matplotlib source distribution
terminate called after throwing an instance of 'c10::Error'
what(): No CUDA GPUs are available
Exception raised from device_count_ensure_non_zero at /pytorch/c10/cuda/CUDAFunctions.cpp:111 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7faf745fa8b2 in /home/vincent/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::device_count_ensure_non_zero() + 0xc5 (0x7faf748488c5 in /home/vincent/.local/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: THCudaInit(THCState*) + 0x21 (0x7faf758169d1 in /home/vincent/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xcd9644 (0x7faf75742644 in /home/vincent/.local/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: at::Context::lazyInitCUDA()::{lambda()#1}::operator()() const + 0x36 (0x7faf53a9a99c in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #5: void std::__invoke_impl<void, at::Context::lazyInitCUDA()::{lambda()#1}>(std::__invoke_other, at::Context::lazyInitCUDA()::{lambda()#1}&&) + 0x20 (0x7faf53a9b61f in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #6: std::__invoke_result<at::Context::lazyInitCUDA()::{lambda()#1}>::type std::__invoke<at::Context::lazyInitCUDA()::{lambda()#1}>(std::__invoke_result&&, (at::Context::lazyInitCUDA()::{lambda()#1}&&)...) + 0x35 (0x7faf53a9b36e in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #7: std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::Context::lazyInitCUDA()::{lambda()#1}&&)::{lambda()#1}::operator()() const + 0x23 (0x7faf53a9ae5d in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #8: std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::Context::lazyInitCUDA()::{lambda()#1}&&)::{lambda()#2}::operator()() const + 0x27 (0x7faf53a9ae95 in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #9: std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::Context::lazyInitCUDA()::{lambda()#1}&&)::{lambda()#2}::_FUN() + 0xe (0x7faf53a9aea6 in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #10: + 0xf907 (0x7fafdfee8907 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #11: + 0x3c29a (0x7faf53a9629a in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #12: void std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::Context::lazyInitCUDA()::{lambda()#1}&&) + 0x82 (0x7faf53a9af3b in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #13: at::Context::lazyInitCUDA() + 0x36 (0x7faf53a9aa10 in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #14: + 0x40845 (0x7faf53a9a845 in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #15: + 0x40868 (0x7faf53a9a868 in /home/vincent/CenterFusion/src/lib/model/networks/DCNv2/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #16: + 0x108f3 (0x7fafe04f98f3 in /lib64/ld-linux-x86-64.so.2)
frame #17: + 0x153bf (0x7fafe04fe3bf in /lib64/ld-linux-x86-64.so.2)
frame #18: _dl_catch_exception + 0x6f (0x7fafe025f1ef in /lib/x86_64-linux-gnu/libc.so.6)
frame #19: + 0x1498a (0x7fafe04fd98a in /lib64/ld-linux-x86-64.so.2)
frame #20: + 0xf96 (0x7fafdfcd5f96 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #21: _dl_catch_exception + 0x6f (0x7fafe025f1ef in /lib/x86_64-linux-gnu/libc.so.6)
frame #22: _dl_catch_error + 0x2f (0x7fafe025f27f in /lib/x86_64-linux-gnu/libc.so.6)
frame #23: + 0x1745 (0x7fafdfcd6745 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #24: dlopen + 0x71 (0x7fafdfcd6051 in /lib/x86_64-linux-gnu/libdl.so.2)

frame #27: python3() [0x5fb62d]
frame #30: python3() [0x507be4]
frame #31: python3() [0x509900]
frame #32: python3() [0x50a2fd]
frame #34: python3() [0x5095c8]
frame #35: python3() [0x50a2fd]
frame #37: python3() [0x5095c8]
frame #38: python3() [0x50a2fd]
frame #40: python3() [0x5095c8]
frame #41: python3() [0x50a2fd]
frame #43: python3() [0x5095c8]
frame #44: python3() [0x50a2fd]
frame #51: python3() [0x507be4]
frame #52: python3() [0x516069]
frame #55: python3() [0x507be4]
frame #56: python3() [0x509900]
frame #57: python3() [0x50a2fd]
frame #59: python3() [0x5095c8]
frame #60: python3() [0x50a2fd]
frame #62: python3() [0x5095c8]
frame #63: python3() [0x50a2fd]

experiments/test.sh: line 20: 8344 Aborted (core dumped) python3 test.py ddd --exp_id centerfusion --dataset nuscenes --val_split mini_val --run_dataset_eval --num_workers 4 --nuscenes_att --velocity --gpus 0 --pointcloud --radar_sweeps 3 --max_pc_dist 60.0 --pc_z_offset -0.0 --load_model ../models/centerfusion_e60.pth --flip_test

Sorry to bother you.
THX!

@mrnabati
Owner

This seems to be a CUDA/PyTorch version incompatibility, since you are getting the "No CUDA GPUs are available" error. The code is tested and works with torch==1.2.0, torchvision==0.4.0, and CUDA 10.0.
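For reference, the NVIDIA driver is backward compatible with older runtimes: a driver that advertises CUDA 11.0 in nvidia-smi can run binaries built against CUDA 10.0, but not the other way around. A minimal sketch of that rule (the helper name `driver_supports_runtime` is just illustrative, not part of any library):

```python
def driver_supports_runtime(driver_cuda, runtime_cuda):
    """True if a driver advertising CUDA version `driver_cuda` (as reported
    by nvidia-smi) can run binaries built against toolkit `runtime_cuda`.
    NVIDIA drivers are backward compatible: driver version >= runtime version.
    """
    def parse(version):
        # Compare (major, minor) numerically, not as strings,
        # so that e.g. 10.2 < 11.0 works as expected.
        major, minor = version.split(".")[:2]
        return (int(major), int(minor))
    return parse(driver_cuda) >= parse(runtime_cuda)

# The setup reported later in this thread: driver CUDA 11.0, toolkit CUDA 10.0.
print(driver_supports_runtime("11.0", "10.0"))  # True: this pairing is fine
```

So a version mismatch of this kind would show up as the runtime being *newer* than the driver, not the combination reported here.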

@VanniZhou
Author

I changed my environment to torch==1.2.0, torchvision==0.4.0, and CUDA 10.0, and it still doesn't work either.

vincent@girlfriend:~/CenterFusion$ sudo bash experiments/test.sh
Using tensorboardX

Bad key "text.kerning_factor" on line 4 in
/home/vincent/.local/lib/python3.6/site-packages/matplotlib/mpl-data/stylelib/_classic_test_patch.mplstyle.
You probably need to get an updated matplotlibrc file from
http://github.com/matplotlib/matplotlib/blob/master/matplotlibrc.template
or from the matplotlib source distribution
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=38 : no CUDA-capable device is detected
terminate called after throwing an instance of 'std::runtime_error'
what(): cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50
experiments/test.sh: line 21: 27740 Aborted (core dumped) python3 test.py ddd --exp_id centerfusion --dataset nuscenes --val_split mini_val --run_dataset_eval --num_workers 4 --nuscenes_att --velocity --gpus 0 --pointcloud --radar_sweeps 3 --max_pc_dist 60.0 --pc_z_offset -0.0 --load_model ../models/centerfusion_e60.pth --flip_test --resume

Here is the deviceQuery output:

vincent@girlfriend:/usr/local/cuda-10.0/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1660"
CUDA Driver Version / Runtime Version 11.0 / 10.0
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 5942 MBytes (6230114304 bytes)
(22) Multiprocessors, ( 64) CUDA Cores/MP: 1408 CUDA Cores
GPU Max Clock rate: 1785 MHz (1.78 GHz)
Memory Clock rate: 4001 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS

Here is nvidia-smi:

vincent@girlfriend:~/CenterFusion$ nvidia-smi
Thu Nov 26 22:54:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1660 Off | 00000000:01:00.0 On | N/A |
| 43% 34C P8 14W / 120W | 625MiB / 5941MiB | 34% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

Here is nvcc -V:

vincent@girlfriend:~/CenterFusion$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

I want to know why no CUDA device is found, and how I can fix this problem.
Thanks!

@xiaocai506

> I changed my environment to torch==1.2.0, torchvision==0.4.0, and CUDA 10.0, and it still doesn't work either. […]

If you have only one GPU card, change this line in “experiments/test.sh” from

export CUDA_VISIBLE_DEVICES=1

to

export CUDA_VISIBLE_DEVICES=0
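This matters because CUDA treats CUDA_VISIBLE_DEVICES as a list of physical device indices and stops enumerating at the first invalid entry. With a single GPU (index 0), setting the variable to 1 leaves the process with zero visible devices, which is exactly the "No CUDA GPUs are available" error above. A rough pure-Python model of that behavior (the helper name is illustrative; real CUDA also accepts GPU UUIDs, which this sketch ignores):

```python
def visible_devices(physical_gpu_count, cuda_visible_devices):
    """Model how the CUDA runtime interprets CUDA_VISIBLE_DEVICES:
    each comma-separated entry is a physical device index, and
    enumeration stops at the first invalid or nonexistent entry."""
    visible = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        if not token:
            continue
        try:
            index = int(token)
        except ValueError:
            break  # malformed entry: everything after it is ignored
        if not (0 <= index < physical_gpu_count):
            break  # nonexistent device: everything after it is ignored
        visible.append(index)
    return visible

# One physical GPU, as on the machine in this thread:
print(visible_devices(1, "1"))  # [] -> torch sees no GPUs and aborts
print(visible_devices(1, "0"))  # [0] -> the single GPU is visible
```

That is why deviceQuery and nvidia-smi both see the card while the script does not: the script's own environment hides it.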

@VanniZhou
Author

@xiaocai506
Thank you so much!
