When I try to train Baichuan, I pass in some arguments and get the following errors:
```
[real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/sunmy/anaconda3/envs/gra/lib/python3.9/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
Unable to find hostfile, will proceed with training with local resources only.
/home/sunmy/anaconda3/envs/gra/lib/python3.9/site-packages/torch/cuda/__init__.py:628: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Traceback (most recent call last):
  File "/home/sunmy/anaconda3/envs/gra/bin/deepspeed", line 6, in
    main()
  File "/home/sunmy/anaconda3/envs/gra/lib/python3.9/site-packages/deepspeed/launcher/runner.py", line 418, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available
```
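The DeepSpeed error appears to be a downstream symptom: PyTorch itself fails to initialize CUDA (error 803 is `cudaErrorSystemDriverMismatch` in the CUDA runtime, i.e. the display driver and the CUDA driver versions do not match), so the launcher finds zero GPUs. A minimal diagnostic sketch like the one below, run in the same conda env, can help confirm that outside of DeepSpeed. It is not part of DeepSpeed; the helper name `cuda_diagnostics` is hypothetical, and it only reports what PyTorch sees, assuming the launcher's GPU count ultimately comes from the same `torch.cuda` calls.

```python
import importlib.util
import warnings


def cuda_diagnostics() -> dict:
    """Hypothetical helper: report what PyTorch sees about CUDA in this environment."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    import torch

    # Record the UserWarnings torch emits when CUDA/NVML initialization fails,
    # e.g. "Error 803: system has unsupported display driver / cuda driver combination".
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        info["cuda_available"] = torch.cuda.is_available()
        info["device_count"] = torch.cuda.device_count()

    # CUDA version this torch build was compiled against (None for CPU-only builds).
    info["torch_cuda_build"] = torch.version.cuda
    info["warnings"] = [str(w.message) for w in caught]
    return info


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `device_count` is 0 and the recorded warnings mention error 803, the problem is likely in the NVIDIA driver installation (for example, a driver upgraded without a reboot) rather than in the Baichuan training arguments.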