NVIDIA: no NVIDIA devices found (THCudaCheck FAIL file=torch/csrc/autograd/engine.cpp) #1154

woofie56 · 2017-03-31T11:47:07Z

Hi,

I did the following:

1. Installed PyTorch on Ubuntu 16.04 (http://pytorch.org/), using Anaconda2-4.3.1, via:
   conda install pytorch torchvision -c soumith
2. Installed CUDA for Linux Ubuntu 16.04 x86_64 via:
   https://developer.nvidia.com/cuda-downloads
   There is no CUDA GPU installed on the machine (it is a VirtualBox virtual machine).
3. Ran autograd_tutorial.py from:
   http://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

and got the following error (using pytorch-0.1.11-py27_2):

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]

<torch.autograd._functions.basic_ops.AddConstant object at 0x7feddc620220>
(Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]
, Variable containing:
 27
[torch.FloatTensor of size 1]
)
NVIDIA: no NVIDIA devices found
THCudaCheck FAIL file=torch/csrc/autograd/engine.cpp line=352 error=30 : unknown error
Traceback (most recent call last):
  File "autograd_tutorial.py", line 81, in <module>
    out.backward()
  File "/home/testuser/Anaconda2-4.3.1/lib/python2.7/site-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: cuda runtime error (30) : unknown error at torch/csrc/autograd/engine.cpp:352

How do I get around not having a CUDA GPU card? Thanks


useryc · 2017-03-31T12:57:50Z

Hello,

I have the same problem. I'm trying to run on a machine with no GPU.

I installed with pip for Python 2.7 on Linux, following the instructions for the no-CUDA option:

pip install http://download.pytorch.org/whl/cu75/torch-0.1.11.post4-cp27-none-linux_x86_64.whl
pip install torchvision

Error:

THCudaCheck FAIL file=torch/csrc/autograd/engine.cpp line=353 error=35 : CUDA driver version is insufficient for CUDA runtime version
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at torch/csrc/autograd/engine.cpp:353

Can someone help with this issue?

Thanks a lot.

apaszke · 2017-03-31T15:20:28Z

Why do you have the CUDA driver installed if you don't have a GPU?

soumith · 2017-03-31T15:35:45Z

I've reproduced the issue on a node that has neither the driver nor CUDA.

The check Sam introduced here: https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp#L353 is wrong.
It needs access to the driver.

We first have to check isDriverSufficient, and if the driver is not sufficient, we have to skip this case.

I'll fix this and rebuild binaries. Thanks for the report @woofie56
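The fix described above (check the driver first, and only then count devices) can be sketched as a small Python simulation. This is not PyTorch's engine.cpp code: the two callables stand in for isDriverSufficient and cudaGetDeviceCount, the function name num_cuda_devices is hypothetical, and the error codes 30 and 35 are simply the ones reported in this thread.

```python
# Hypothetical simulation of the "check the driver first" fix; not PyTorch source.
CUDA_SUCCESS = 0
CUDA_ERROR_UNKNOWN = 30              # "unknown error", as reported in this thread
CUDA_ERROR_INSUFFICIENT_DRIVER = 35  # "CUDA driver version is insufficient ..."


def num_cuda_devices(is_driver_sufficient, get_device_count):
    """Return how many CUDA devices to use; 0 means run everything on the CPU.

    is_driver_sufficient: callable returning bool (stands in for isDriverSufficient)
    get_device_count: callable returning (error_code, count)
                      (stands in for cudaGetDeviceCount)
    """
    # Without a usable driver there is nothing to query: skip the device count.
    if not is_driver_sufficient():
        return 0
    err, count = get_device_count()
    if err != CUDA_SUCCESS:
        raise RuntimeError("cuda runtime error (%d)" % err)
    return count


# CPU-only node without a driver: the device query is never reached.
print(num_cuda_devices(lambda: False, lambda: (CUDA_ERROR_INSUFFICIENT_DRIVER, 0)))
# Machine with a working driver and two GPUs.
print(num_cuda_devices(lambda: True, lambda: (CUDA_SUCCESS, 2)))
```

In this sketch, a machine whose driver passes the sufficiency check but whose device query still fails (for example with error 30) still raises, so the ordering only helps nodes that have no usable driver at all.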

woofie56 · 2017-03-31T15:41:13Z

@soumith Thanks. I would love to play with this over the weekend, so if you could rebuild the binaries today that would be great.

soumith · 2017-03-31T15:41:42Z

for sure. On it. Should be ready in 6 to 7 hours.

woofie56 · 2017-03-31T15:45:17Z

@soumith Wicked 👍 ! Thanks

ethancaballero · 2017-03-31T20:43:11Z

I had the same issue after installing PyTorch 0.1.11 build 4 on any AWS Linux CPU instance via:
pip install http://download.pytorch.org/whl/cu75/torch-0.1.11.post4-cp35-cp35m-linux_x86_64.whl

Build number 5 remedies it:
pip install http://download.pytorch.org/whl/cu75/torch-0.1.11.post5-cp35-cp35m-linux_x86_64.whl

woofie56 · 2017-03-31T21:59:28Z

@soumith Hi, I assumed that the package had been updated, so I tried installing via conda install pytorch torchvision -c soumith and got the following error message:

The following packages will be UPDATED:

pytorch: 0.1.11-py27_2 soumith --> 0.1.11-py27_5 soumith

Proceed ([y]/n)? y



CondaError: dist_name is not a valid conda package: c

Thanks.

soumith · 2017-03-31T23:05:22Z

This should now be fixed with the new binaries.

@woofie56 I'm not sure why you see that error, but please try running conda install conda to update conda first.

woofie56 · 2017-04-01T08:53:34Z

@soumith Hi, I created a new Linux user account and this time installed Anaconda3-4.3.1-Linux-x86_64.sh (Python 3.6). However, I still get the same problem:

conda install pytorch -c soumith
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /home/tftest2/anaconda3:

The following NEW packages will be INSTALLED:

    pytorch: 0.1.11-py36_5 soumith

Proceed ([y]/n)? y

pytorch-0.1.11 100% |################################| Time: 0:18:07 255.07 kB/s

Then, when I run the script, I get the following error message:

> python autograd_tutorial.py 

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]

<torch.autograd._functions.basic_ops.AddConstant object at 0x7fdb51c43748>
Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]
 Variable containing:
 27
[torch.FloatTensor of size 1]

NVIDIA: no NVIDIA devices found
THCudaCheck FAIL file=torch/csrc/autograd/engine.cpp line=359 error=30 : unknown error
Traceback (most recent call last):
  File "autograd_tutorial.py", line 81, in <module>
    out.backward()
  File "/home/tftest2/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
RuntimeError: cuda runtime error (30) : unknown error at torch/csrc/autograd/engine.cpp:359

I also get the same error when I install via pip:

>pip install http://download.pytorch.org/whl/cu75/torch-0.1.11.post5-cp36-cp36m-linux_x86_64.whl

Collecting torch==0.1.11.post5 from http://download.pytorch.org/whl/cu75/torch-0.1.11.post5-cp36-cp36m-linux_x86_64.whl
  Downloading http://download.pytorch.org/whl/cu75/torch-0.1.11.post5-cp36-cp36m-linux_x86_64.whl (343.0MB)
    100% |████████████████████████████████| 343.0MB 353kB/s 
Requirement already satisfied: pyyaml in /home/tftest2/anaconda3/lib/python3.6/site-packages (from torch==0.1.11.post5)
Installing collected packages: torch
Successfully installed torch-0.1.11.post5

Should I be using the Python 3.5 version?

soumith · 2017-04-01T13:42:01Z

@woofie56 I've tested the binaries on two Linux machines with no GPUs, and I ran this script:
http://pytorch.org/tutorials/_downloads/autograd_tutorial.py

They seem to run fine.

I presume you did one of the following:

1. You modified the script in some way to add .cuda() calls.
2. You proactively installed CUDA on your VM. I am not sure what happened, but it is probably a botched install that is messing things up. For example, I don't know what happens when you install the NVIDIA driver on a machine with no NVIDIA GPUs.

Did you do either of these? If so, can you revert them?

woofie56 · 2017-04-01T14:46:51Z

@soumith: Hi, thanks for the reply. I uninstalled the CUDA drivers and reinstalled PyTorch, but that didn't help. Maybe I will try reinstalling Anaconda from scratch (since I didn't reinstall it after removing the CUDA drivers).

If that fails, I will make a fresh Ubuntu virtual machine installation and try again there. Thanks.

woofie56 · 2017-04-01T15:41:02Z

@soumith Hi, it turned out that despite uninstalling NVIDIA from Ubuntu, I still had nvidia-375 installed (maybe from an earlier attempt to install the NVIDIA drivers). When I removed it and reinstalled Anaconda and PyTorch, everything worked.

Thanks for taking the time out of your weekend to help me out.
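A leftover driver package like this one can be hard to spot. As a rough diagnostic sketch (an assumption, not something suggested in the thread), Python's standard ctypes.util.find_library can report whether a CUDA driver library is still visible to the dynamic loader. The NVIDIA driver ships libcuda, so on a VM with no NVIDIA GPU a non-None result hints that a driver package is still installed. The helper name cuda_driver_library is hypothetical.

```python
import ctypes.util


def cuda_driver_library():
    """Return the CUDA driver library name the loader can find, or None.

    Assumes the driver ships a library called libcuda (e.g. libcuda.so.1 on
    Linux). On a machine without NVIDIA GPUs, a non-None result suggests a
    leftover driver install.
    """
    return ctypes.util.find_library("cuda")


lib = cuda_driver_library()
if lib is None:
    print("No CUDA driver library found.")
else:
    print("CUDA driver library still visible: %s" % lib)
```

If a library is reported, listing installed packages (for example with dpkg -l on Ubuntu) can help identify which driver package still provides it.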

wddabc · 2017-04-14T00:44:05Z

Hi @soumith,
I ran into the same issue. It looks like the problem in my case does come from the second case you mentioned, "install the nvidia driver on a machine with no NVIDIA GPUs". But it is hard for me to avoid.

In my case, I'm using a queuing system on a server where some queues are GPU queues and some are CPU queues. They share the same CUDA installation. When I submit my jobs to the CPU queues, which have no NVIDIA GPUs, this error occurs.

wddabc · 2017-04-14T04:25:26Z

Hello, I think I found where the issue comes from.

I'd like to ask: is there a particular reason for using a blacklist like https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp#L357 for blocking the GPU? I understand the logic here: if the error code == 35 (CUDA driver version is insufficient for CUDA runtime version), then back off to the CPU. But shouldn't it be a whitelist, only allowing error code == 0 to pass and backing off to the CPU otherwise?

The reason for my suggestion is that I have a weird situation where the error code is unknown (30), and it crashed the whole program at https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/engine.cpp#L360. I've already asked my server admin how this error code could happen (something is apparently wrong with the system setup). But may I suggest changing this to a whitelist to make it more robust? Any concerns about that?
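The difference between the two policies can be sketched in Python. This does not reproduce engine.cpp, and the function names are hypothetical: each takes the (error code, device count) pair that a device query such as cudaGetDeviceCount would report, using the codes from this thread (0 for success, 30 for unknown error, 35 for an insufficient driver).

```python
# Hypothetical comparison of the two error-handling policies; not PyTorch source.
CUDA_SUCCESS = 0
CUDA_ERROR_UNKNOWN = 30
CUDA_ERROR_INSUFFICIENT_DRIVER = 35


def devices_blacklist(err, count):
    """Fall back to the CPU only for a known 'no usable GPU' error; others are fatal."""
    if err == CUDA_SUCCESS:
        return count
    if err == CUDA_ERROR_INSUFFICIENT_DRIVER:
        return 0
    raise RuntimeError("cuda runtime error (%d)" % err)


def devices_whitelist(err, count):
    """Use GPUs only when the query succeeds; any error means run on the CPU."""
    return count if err == CUDA_SUCCESS else 0


# The situation reported above: error 30 on a CPU-only queue.
print(devices_whitelist(CUDA_ERROR_UNKNOWN, 0))  # falls back to the CPU
try:
    devices_blacklist(CUDA_ERROR_UNKNOWN, 0)
except RuntimeError as e:
    print("blacklist policy crashes: %s" % e)
```

The whitelist is more forgiving on misconfigured nodes such as CPU queues that share a CUDA installation, at the cost of silently running on the CPU when a GPU machine has a genuine driver problem.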


soumith added bug high priority labels Mar 31, 2017

soumith mentioned this issue Mar 31, 2017

check for nvidia driver's sufficiency before checking for number of CUDA devices #1156

Merged

soumith closed this as completed in #1156 Mar 31, 2017

soumith reopened this Mar 31, 2017

apaszke mentioned this issue Mar 31, 2017

No cuda device, but "CUDA driver version is insufficient for CUDA runtime version" #1159

Closed

soumith closed this as completed Mar 31, 2017

lizeng614 mentioned this issue Apr 4, 2017

CPU Only failed lizeng614/SqueezeNet-Neural-Style-Pytorch#3

Closed

chsasank mentioned this issue Apr 12, 2017

Running .backward() without a CUDA-capable device pytorch/tutorials#64

Closed

wddabc mentioned this issue Apr 15, 2017

Handling the error message of cudaGetDeviceCount #1267

Closed

junyanz mentioned this issue Apr 23, 2017

CPU only junyanz/pytorch-CycleGAN-and-pix2pix#11

Closed

jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this issue Nov 5, 2021

Expose some of the utility functions (pytorch#1154)

642c58d

* Expose some of the utility functions They are useful to have for the C++ interface.

hubertlu-tw pushed a commit to hubertlu-tw/pytorch that referenced this issue Nov 1, 2022

Merge pull request pytorch#1154 from NVIDIA/rework_spatial_bottleneck…

d934eca

…_code_split Add functions to compute grad_out1, grad_out1_halo
