Hi! I have encountered an error when trying to train the model. I tried different datasets, but the error remains. Thank you for helping me!
My env: Python 3.7.0 and PyTorch 1.10.1
The complete error message is as follows:
terminate called after throwing an instance of 'c10::HIPError'
what(): HIP error: hipErrorNoDevice
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Exception raised from deviceCount at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:102 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9eff1e7212 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x5618da (0x7f9f14f4a8da in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_hip.so)
frame #2: torch::autograd::Engine::start_device_threads() + 0x21a (0x7f9f4b4d82ca in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0xf907 (0x7f9f61c86907 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: torch::autograd::Engine::initialize_device_threads_pool() + 0xcd (0x7f9f4b4d70bd in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::Engine::execute_with_graph_task(std::shared_ptr&lt;torch::autograd::GraphTask&gt; const&, std::shared_ptr&lt;torch::autograd::Node&gt;, torch::autograd::InputBuffer&&) + 0x28 (0x7f9f4b4ddf78 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::python::PythonEngine::execute_with_graph_task(std::shared_ptr&lt;torch::autograd::GraphTask&gt; const&, std::shared_ptr&lt;torch::autograd::Node&gt;, torch::autograd::InputBuffer&&) + 0x3c (0x7f9f5ed7ba3c in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: torch::autograd::Engine::execute(std::vector&lt;torch::autograd::Edge, std::allocator&lt;torch::autograd::Edge&gt; &gt; const&, std::vector&lt;at::Tensor, std::allocator&lt;at::Tensor&gt; &gt; const&, bool, bool, bool, std::vector&lt;torch::autograd::Edge, std::allocator&lt;torch::autograd::Edge&gt; &gt; const&) + 0x900 (0x7f9f4b4dc1a0 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::python::PythonEngine::execute(std::vector&lt;torch::autograd::Edge, std::allocator&lt;torch::autograd::Edge&gt; &gt; const&, std::vector&lt;at::Tensor, std::allocator&lt;at::Tensor&gt; &gt; const&, bool, bool, bool, std::vector&lt;torch::autograd::Edge, std::allocator&lt;torch::autograd::Edge&gt; &gt; const&) + 0x56 (0x7f9f5ed7b996 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: THPEngine_run_backward(_object*, _object*, _object*) + 0x9d4 (0x7f9f5ed7c4a4 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #31: __libc_start_main + 0xe7 (0x7f9f618a7c87 in /lib/x86_64-linux-gnu/libc.so.6)
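For context, `hipErrorNoDevice` is raised by the ROCm (HIP) build of PyTorch when the runtime cannot find any AMD GPU — a common cause is a ROCm wheel installed on a machine that has an NVIDIA GPU or no GPU at all. A minimal sketch of a sanity check before training (the helper name `pick_device` is hypothetical; the import is wrapped in try/except only so the sketch runs even where PyTorch is absent):

```python
def pick_device():
    """Return 'cuda' if a GPU backend is usable, else 'cpu'.

    Note: on ROCm builds of PyTorch, torch.cuda.is_available() is the
    check for AMD GPUs as well -- HIP masquerades as the CUDA API.
    """
    try:
        import torch
    except ImportError:
        # PyTorch not installed in this environment (assumption for the sketch).
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print("selected device:", pick_device())
```

If this prints `cpu` on a machine where you expected a GPU, the installed wheel likely does not match the hardware; reinstalling the PyTorch build that corresponds to your platform (CUDA, ROCm, or CPU-only) is the usual fix.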