Hi! I have encountered an error when trying to train the model. I tried different datasets, but the error remains. Thank you for helping me!
My env: Python 3.7.0 and PyTorch 1.10.1
The complete error message is as follows:
terminate called after throwing an instance of 'c10::HIPError'
what(): HIP error: hipErrorNoDevice
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Exception raised from deviceCount at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:102 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9eff1e7212 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x5618da (0x7f9f14f4a8da in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_hip.so)
frame #2: torch::autograd::Engine::start_device_threads() + 0x21a (0x7f9f4b4d82ca in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0xf907 (0x7f9f61c86907 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: torch::autograd::Engine::initialize_device_threads_pool() + 0xcd (0x7f9f4b4d70bd in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::Engine::execute_with_graph_task(std::shared_ptr&lt;torch::autograd::GraphTask&gt; const&, std::shared_ptr&lt;torch::autograd::Node&gt;, torch::autograd::InputBuffer&&) + 0x28 (0x7f9f4b4ddf78 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::python::PythonEngine::execute_with_graph_task(std::shared_ptr&lt;torch::autograd::GraphTask&gt; const&, std::shared_ptr&lt;torch::autograd::Node&gt;, torch::autograd::InputBuffer&&) + 0x3c (0x7f9f5ed7ba3c in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: torch::autograd::Engine::execute(std::vector&lt;torch::autograd::Edge, std::allocator&lt;torch::autograd::Edge&gt; &gt; const&, std::vector&lt;at::Tensor, std::allocator&lt;at::Tensor&gt; &gt; const&, bool, bool, bool, std::vector&lt;torch::autograd::Edge, std::allocator&lt;torch::autograd::Edge&gt; &gt; const&) + 0x900 (0x7f9f4b4dc1a0 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::python::PythonEngine::execute(std::vector&lt;torch::autograd::Edge, std::allocator&lt;torch::autograd::Edge&gt; &gt; const&, std::vector&lt;at::Tensor, std::allocator&lt;at::Tensor&gt; &gt; const&, bool, bool, bool, std::vector&lt;torch::autograd::Edge, std::allocator&lt;torch::autograd::Edge&gt; &gt; const&) + 0x56 (0x7f9f5ed7b996 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: THPEngine_run_backward(_object*, _object*, _object*) + 0x9d4 (0x7f9f5ed7c4a4 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #31: __libc_start_main + 0xe7 (0x7f9f618a7c87 in /lib/x86_64-linux-gnu/libc.so.6)
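For context, `hipErrorNoDevice` is raised by the ROCm (HIP) build of PyTorch when the runtime cannot find any AMD GPU — a common cause is a ROCm wheel installed on a machine that has an NVIDIA GPU or no GPU at all. A minimal sketch of a sanity check before training (the helper name `pick_device` is hypothetical; the import is wrapped in try/except only so the sketch runs even where PyTorch is absent):

```python
def pick_device():
    """Return 'cuda' if a GPU backend is usable, else 'cpu'.

    Note: on ROCm builds of PyTorch, torch.cuda.is_available() is the
    check for AMD GPUs as well -- HIP masquerades as the CUDA API.
    """
    try:
        import torch
    except ImportError:
        # PyTorch not installed in this environment (assumption for the sketch).
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print("selected device:", pick_device())
```

If this prints `cpu` on a machine where you expected a GPU, the installed wheel likely does not match the hardware; reinstalling the PyTorch build that corresponds to your platform (CUDA, ROCm, or CPU-only) is the usual fix.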