NVIDIA: no NVIDIA devices found (THCudaCheck FAIL file=torch/csrc/autograd/engine.cpp) #1154
Comments
Hello, I have the same problem. I'm trying to run on a machine with no GPU. I installed with pip for Python 2.7 on Linux, as instructed, using the no-CUDA option:
Error:
Can someone help with this issue? Thanks a lot. |
Why do you have the CUDA driver installed if you don't have a GPU? |
I've reproduced the issue on a node that has neither the driver nor CUDA installed. The check Sam introduced here: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp#L353 is wrong. We first have to check if I'll fix this and rebuild binaries. Thanks for the report @woofie56 |
@soumith Thanks. I would love to play with this over the weekend, so if you could rebuild the binaries today that would be great. |
for sure. On it. Should be ready in 6 to 7 hours. |
@soumith Wicked 👍 ! Thanks |
I had the same issue installing PyTorch 0.1.11 build 4 on any AWS Linux CPU instance via Build number 5 remedies it: |
@soumith Hi, I assumed that the package had been updated and so I tried installing via
Thanks. |
This should now be fixed with the new binaries. @woofie56 I'm not sure why you see that error, but please try: |
@soumith Hi, I created a new Linux user account, and this time installed Anaconda3-4.3.1-Linux-x86_64.sh (Python 3.6). However, I still get the same problem:
Then when I run I get the following error message :
I also get the same error when I install via pip :
Should I be using the Python 3.5 version? |
@woofie56 I've tested the binaries on two Linux machines with no GPUs, and I ran this script: They seem to run fine. I am presuming you did either of the following:
Are either of these cases what you did? If so can you revert either of them? |
@soumith: Hi, thanks for the reply. I uninstalled the CUDA drivers and reinstalled PyTorch, but that didn't help. Maybe I will try reinstalling Anaconda from scratch (since I didn't reinstall it after removing the CUDA drivers). If this fails, I will make a fresh Ubuntu virtual machine installation and try again in that. Thanks |
@soumith Hi, it turned out that despite uninstalling NVIDIA from Ubuntu, I still had nvidia-375 installed (maybe from an earlier attempt to install the NVIDIA drivers). When I removed this and reinstalled Anaconda and PyTorch, everything worked. Thanks for taking the time out of your weekend to help me out. |
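For anyone checking for a leftover driver package like the one described above, here is a minimal sketch. It assumes a Debian/Ubuntu system where `dpkg -l` lists installed packages with status `ii` in the first column. The helper name `find_nvidia_packages` and the sample listing are made up for illustration and are not taken from this thread.

```python
def find_nvidia_packages(dpkg_output):
    """Return names of installed packages whose name starts with 'nvidia-'.

    `dpkg_output` is the text printed by `dpkg -l` (assumed format:
    status, package name, version, architecture, description).
    """
    found = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # 'ii' means the package is currently installed; other states
        # (for example 'rc', removed but config files remain) are skipped.
        if len(fields) >= 2 and fields[0] == "ii" and fields[1].startswith("nvidia-"):
            found.append(fields[1])
    return found


# Invented sample listing, only to show the expected input shape.
sample = """\
ii  nvidia-375   0.0-example  amd64  NVIDIA binary driver
ii  python2.7    0.0-example  amd64  Interactive high-level language
rc  nvidia-367   0.0-example  amd64  NVIDIA binary driver
"""

print(find_nvidia_packages(sample))
```

On a real machine the input could come from `subprocess.check_output(["dpkg", "-l"])` decoded to text. Whether a package found this way is actually the cause of the error still has to be confirmed by removing it and reinstalling, as described above.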
Hi @soumith, My case is I'm using the queuing system on the server, some are gpu queues, some are cpu queues. They share the same cuda installation. When I submit my jobs to cpu queues with no NVIDIA GPUs, this error occurs. |
Hello, I think I found where the issue comes from. Is there a particular reason for using a blacklist like https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp#L357 to block the GPU? I understand the logic here: if the error code is 35 (CUDA driver version is insufficient for CUDA runtime version), fall back to CPU. But shouldn't it be a whitelist, allowing only error code 0 to pass and otherwise falling back to CPU? I suggest this because I have a weird situation where the error code is 30 (unknown error), and it crashed the whole program at https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp#L360. I've already asked my server admin how this error code could happen (something is apparently wrong with the system setup), but could this be changed to a whitelist to make it more robust? Any concerns about that? |
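To make the difference between the two approaches concrete, here is a small Python sketch. It is not the code in engine.cpp, only a model of the logic as described in the comment above. The function names `use_gpu_blacklist` and `use_gpu_whitelist` are hypothetical; the numeric values 0, 30 and 35 are the error codes mentioned in the comment.

```python
CUDA_SUCCESS = 0                      # no error
CUDA_ERROR_UNKNOWN = 30               # the "unknown" code reported above
CUDA_ERROR_INSUFFICIENT_DRIVER = 35   # driver too old for the runtime


def use_gpu_blacklist(err):
    """Blacklist style: only code 35 falls back to CPU, other errors raise."""
    if err == CUDA_ERROR_INSUFFICIENT_DRIVER:
        return False  # fall back to CPU
    if err != CUDA_SUCCESS:
        # Any other failure aborts, which models the crash described above.
        raise RuntimeError("CUDA error %d" % err)
    return True


def use_gpu_whitelist(err):
    """Whitelist style: only success enables the GPU, everything else is CPU."""
    return err == CUDA_SUCCESS


for code in (CUDA_SUCCESS, CUDA_ERROR_UNKNOWN, CUDA_ERROR_INSUFFICIENT_DRIVER):
    print(code, use_gpu_whitelist(code))
```

The trade-off is that the whitelist version silently runs on CPU when a GPU is present but misconfigured, so a real implementation would probably want to log the error code before falling back.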
Hi,
I did the following:
Installed PyTorch on Ubuntu 16.04 via:
conda install pytorch torchvision -c soumith
http://pytorch.org/
using Anaconda2-4.3.1
Installed CUDA Linux Ubuntu 16.04 x86_64 via :
https://developer.nvidia.com/cuda-downloads
There is no CUDA GPU installed on the machine (it is a VirtualBox virtual machine).
Ran autograd_tutorial.py from:
http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
and I got the following error (using pytorch-0.1.11-py27_2):
How do I get around not having a CUDA GPU? Thanks