
Having Error 'c10::HIPError' #6

Closed
haotiansun14 opened this issue Sep 7, 2022 · 1 comment
@haotiansun14

Hi! I encountered an error when trying to train the model. I tried different datasets, but the error remains. Thank you for helping me!
My environment: Python 3.7.0 and PyTorch 1.10.1

The complete error message is as follows:

terminate called after throwing an instance of 'c10::HIPError' | 0/500 [00:00<?, ?it/s]
what(): HIP error: hipErrorNoDevice
HIP kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Exception raised from deviceCount at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:102 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9eff1e7212 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x5618da (0x7f9f14f4a8da in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_hip.so)
frame #2: torch::autograd::Engine::start_device_threads() + 0x21a (0x7f9f4b4d82ca in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0xf907 (0x7f9f61c86907 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #4: torch::autograd::Engine::initialize_device_threads_pool() + 0xcd (0x7f9f4b4d70bd in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::autograd::Engine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>, torch::autograd::InputBuffer&&) + 0x28 (0x7f9f4b4ddf78 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::autograd::python::PythonEngine::execute_with_graph_task(std::shared_ptr<torch::autograd::GraphTask> const&, std::shared_ptr<torch::autograd::Node>, torch::autograd::InputBuffer&&) + 0x3c (0x7f9f5ed7ba3c in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: torch::autograd::Engine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0x900 (0x7f9f4b4dc1a0 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::autograd::python::PythonEngine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0x56 (0x7f9f5ed7b996 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: THPEngine_run_backward(_object*, _object*, _object*) + 0x9d4 (0x7f9f5ed7c4a4 in /nethome/hsun409/anaconda3/envs/GDSS_env/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #31: __libc_start_main + 0xe7 (0x7f9f618a7c87 in /lib/x86_64-linux-gnu/libc.so.6)
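For context, hipErrorNoDevice usually means the installed torch wheel is a ROCm/HIP build (note the HIPGuardImplMasqueradingAsCUDA frame) running on a machine with no visible AMD GPU. A minimal diagnostic sketch, not part of the original report, that prints which backend the installed torch build targets using standard torch attributes:

    # Diagnostic sketch: check which GPU backend this torch build was compiled for.
    import torch

    print("torch version:", torch.__version__)
    print("CUDA build:", torch.version.cuda)      # None if this is not a CUDA build
    print("HIP/ROCm build:", torch.version.hip)   # None if this is not a ROCm build
    print("GPU available:", torch.cuda.is_available())
    print("Device count:", torch.cuda.device_count())

If torch.version.hip is set but the machine has NVIDIA GPUs (or no AMD GPUs), the wrong wheel was installed.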

@haotiansun14
Author

I updated my torch to the latest version and the problem is solved.
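A quick sanity check after reinstalling, before launching training again (a sketch, assuming a CUDA-capable machine; the device index 0 is just an example):

    # Sanity check: confirm torch can see a GPU before starting training.
    import torch

    assert torch.cuda.is_available(), "No GPU visible to torch; check the installed build and drivers"
    print(torch.cuda.get_device_name(0))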
