Unknown error with CUDA #62

Closed
zzhat0706 opened this issue Feb 15, 2020 · 10 comments

@zzhat0706

First of all, thank you for your great work!
When I tried to run the sphere-to-dolphin deformation tutorial, I hit an unexpected error while loading the mesh vertices onto the device, which is set to CUDA:0. Here is the error log:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/THCGeneral.cpp line=50 error=30 : unknown error
Traceback (most recent call last):
  File "dolphin.py", line 33, in <module>
    faces_idx = faces.verts_idx.to(device)
  File "/home/jormungandr/anaconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/cuda/__init__.py", line 197, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/THCGeneral.cpp:50

I tried rmmod nvidia and rmmod nvidia-uvm, but each command failed with one of these errors:
rmmod: ERROR: Module nvidia_uvm is not currently loaded
rmmod: ERROR: Module nvidia is in use by: nvidia_modeset
I also rebooted once, but nothing changed.
My environment is as follows:
PyTorch: 1.4
Python: 3.6.10
CUDA: 10.0 (runtime, as reported by nvcc)
cuDNN: 7.0
OS: Ubuntu 18.04

@nikhilaravi
Contributor

@zzhat0706 I assume you built PyTorch3D from a local clone? Does it run if you change the device to cpu? Can you run other PyTorch code on the GPU, i.e. without PyTorch3D? This looks like an issue with PyTorch, not PyTorch3D.

There's an issue on the PyTorch repo that references this problem. Did you check it? pytorch/pytorch#17108
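
One way to answer the questions above is to test CUDA from plain PyTorch, with no PyTorch3D import involved. The following is a minimal sketch and was not posted in this thread: the helper name check_plain_pytorch_cuda is hypothetical, and it relies only on standard PyTorch calls (torch.cuda.is_available, torch.version.cuda, Tensor.to). It repeats the same kind of .to(device) call that fails in the traceback.

```python
import importlib.util


def check_plain_pytorch_cuda(device_name="cuda:0"):
    """Report whether plain PyTorch (no PyTorch3D) can move a tensor to the GPU.

    Hypothetical triage helper; returns a dict of findings.
    """
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    try:
        # Moving a tensor to the GPU triggers CUDA's lazy initialisation,
        # which is the step that raised "unknown error" in the report above.
        torch.zeros(1).to(device_name)
        report["to_device_ok"] = True
    except Exception as err:
        report["to_device_ok"] = False
        report["error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in check_plain_pytorch_cuda().items():
        print(f"{key}: {value}")
```

If to_device_ok comes back False here, the failure is reproducible without PyTorch3D, which points at the PyTorch, CUDA, or driver setup instead.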

@zzhat0706
Author

@nikhilaravi Thanks for your answers!
Before PyTorch3D, I had already run other PyTorch code, such as CycleGAN and W-GAN. I installed PyTorch3D from Anaconda Cloud, not from a local clone.
I haven't tried the CPU yet; I'll check it later.

nikhilaravi self-assigned this Feb 18, 2020
@shersoni610

Hello,

I tried everything mentioned above, but I still get this error:


RuntimeError Traceback (most recent call last)
in
23
24 # Create a textures object
---> 25 tex = Textures(verts_uvs=verts_uvs, faces_uvs=faces_uvs, maps=texture_image)
26
27 # Create a meshes object with textures

~/Disk/Software/Anaconda3/envs/pytorch3d/lib/python3.7/site-packages/pytorch3d/structures/textures.py in __init__(self, maps, faces_uvs, verts_uvs, verts_rgb)
118
119 if self._faces_uvs_padded is not None:
--> 120 self._num_faces_per_mesh = faces_uvs.gt(-1).all(-1).sum(-1).tolist()
121
122 def clone(self):

RuntimeError: CUDA error: no kernel image is available for execution on the device

@shersoni610

I downgraded the Nvidia driver but the error persists:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 00000000:04:00.0  On |                  N/A |
| 26%   32C    P8    14W / 250W |    642MiB /  6082MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

@shersoni610

Surprisingly, all the tests in the test folder passed, but the notebook tutorials still fail with the following error:
RuntimeError: CUDA error: no kernel image is available for execution on the device

@bottler
Contributor

bottler commented Feb 19, 2020

I can't see the full name of the GPU you are using because it is truncated ("GeForce GTX TIT..."). Is it a TITAN X? If it is one of the other TITANs, then I think it has compute capability 3.5, and so you will need a local build of PyTorch.
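
To illustrate why compute capability matters here, the sketch below applies the CUDA compatibility rule as I understand it: a binary built for sm_XY runs only on devices of the same major version X with minor version at least Y, while embedded PTX (compute_XY) can be JIT-compiled for any device of capability X.Y or higher. The function name has_kernel_image and the architecture list ASSUMED_ARCHS are hypothetical and illustrative only; the list is an assumption, not something confirmed in this thread. On a working install, the capability tuple can be read with torch.cuda.get_device_capability(0).

```python
def has_kernel_image(capability, arch_list):
    """Return True if a build targeting `arch_list` can run on `capability`.

    capability: (major, minor) tuple, e.g. (3, 5).
    arch_list: strings such as "sm_50" (binary) or "compute_50" (PTX).
    Assumes plain numeric suffixes; variants like "sm_90a" are not handled.
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True  # binary-compatible within the same major version
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True  # PTX can be JIT-compiled for a newer device
    return False


# Illustrative assumption only: architectures a prebuilt binary might target.
ASSUMED_ARCHS = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75"]

if __name__ == "__main__":
    print(has_kernel_image((3, 5), ASSUMED_ARCHS))  # capability 3.5
    print(has_kernel_image((5, 2), ASSUMED_ARCHS))  # capability 5.2
```

Under that assumed list, a capability 3.5 card finds no matching kernel image, which would be consistent with the "no kernel image is available for execution on the device" error, whereas a capability 5.2 card is covered by sm_50.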

@shersoni610

shersoni610 commented Feb 19, 2020 via email

@bottler
Contributor

bottler commented Feb 20, 2020

I don't think the driver matters, but you might as well use the latest driver that works with your GPU. If nvidia-smi is working then you probably have a working driver (although I am not sure about this).

The problem is that, with your GPU, you cannot use PyTorch installed from conda (except old versions which we don't support). You will need to set up a new conda environment and follow the instructions at https://github.com/pytorch/pytorch#from-source for your GPU. I suggest you check out the branch v1.4. I think you will then need to install torchvision from source as well, and then install PyTorch3D from source.

(If all this sounds too hard, and you just want to get a feel for the tutorials, and you are not expecting that you will be using pytorch3d much on your computer, maybe you can run them on colab instead. Alternatively, if you install just pytorch3d from github, then you may be able to run the tutorial entirely on the CPU - just change device = torch.device("cuda:0") to device = torch.device("cpu") in the tutorial.)
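
The CPU fallback suggested above can also be made automatic. This is a minimal sketch under stated assumptions, not code from the tutorial: the helper pick_device is hypothetical. It runs a tiny computation on the preferred device and falls back to "cpu" if that fails, because torch.cuda.is_available() on its own may still report True on a GPU for which the installed PyTorch has no kernel image.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` if a small computation runs on it, otherwise "cpu".

    Hypothetical helper; returns a device string usable with torch.device().
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # nothing to probe without PyTorch installed

    try:
        x = torch.ones(4, device=preferred)
        # The addition and sum launch real kernels, so this raises if the
        # device cannot be initialised or has no usable kernel image.
        ok = (x + 1).sum().item() == 8.0
    except Exception:
        ok = False
    return preferred if ok else "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

In the tutorial, the device line would then become device = torch.device(pick_device()).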

@nikhilaravi
Contributor

@zzhat0706, @shersoni610 were you able to resolve this installation issue? If so, please share what you did here for others to replicate!

nikhilaravi added the "question" (Further information is requested) label Feb 24, 2020
@zzhat0706
Author

zzhat0706 commented Mar 9, 2020

@nikhilaravi Sorry for the late reply. I eventually chose to run it under nvidia-docker, and then everything worked perfectly!
Thanks again for your great work!
