2 processes cannot use same GPU #3871

Closed
glample opened this Issue Nov 25, 2017 · 2 comments

glample (Contributor) commented Nov 25, 2017

Hi,

For a few weeks now (or maybe longer), I haven't been able to have more than one Python script access a single GPU. For instance, if GPU 0 is used by one process (even one that is not actively running, but merely holding some memory on GPU 0), then I can't use GPU 0 at all: I have to find the process that holds it and kill it before I can run anything else on that GPU. That's very inconvenient, and it wasn't like this in the past; I used to be able to run several scripts on the same GPU in parallel.

Is this expected behavior due to some recent change, or does the problem come from my setup?
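
(For illustration, and not part of the original report: a minimal sketch of the "find and kill" workaround described above. It assumes nvidia-smi is on the PATH; the query fields are standard nvidia-smi options.)

import subprocess

# List the compute processes currently holding each GPU, so the offending
# PID can be identified and killed.
out = subprocess.check_output(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader"],
    universal_newlines=True,
)
for line in out.strip().splitlines():
    print(line)  # e.g. "12345, python, 1024 MiB"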

glample (Contributor) commented Nov 25, 2017

And this is the error I get when I run:

import torch
torch.FloatTensor(3).normal_().cuda()
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.cu line=66 error=46 : all CUDA-capable devices are busy or unavailable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/guismay/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 66, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/private/home/guismay/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 269, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: cuda runtime error (46) : all CUDA-capable devices are busy or unavailable at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.cu:66
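
(For illustration, and not from the original report: a minimal sketch that reproduces the symptom by launching two interpreters that each hold a tensor on GPU 0. It assumes GPU 0 is visible and that PyTorch was built with CUDA support.)

import subprocess
import sys

# Each child allocates a small tensor on GPU 0 and holds it for a while.
child = ("import torch, time; "
         "torch.FloatTensor(3).normal_().cuda(); "
         "time.sleep(30)")

procs = [subprocess.Popen([sys.executable, "-c", child]) for _ in range(2)]
for i, p in enumerate(procs):
    p.wait()
    # With the GPU in EXCLUSIVE_PROCESS mode, the second child exits
    # non-zero with "all CUDA-capable devices are busy or unavailable".
    print("process", i, "exit code:", p.returncode)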
apaszke (Member) commented Nov 25, 2017

It happens because your GPUs are in EXCLUSIVE_PROCESS mode, so the CUDA driver will forbid two processes from using the same GPU. You should be able to change that by running nvidia-smi -g <GPU number> -c 0.
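
(For illustration, and not part of the original reply: a short sketch that prints each GPU's compute mode via nvidia-smi, to confirm whether EXCLUSIVE_PROCESS is the cause before resetting it. Note that changing the compute mode requires root privileges.)

import subprocess

# Print the compute mode of every visible GPU. Anything other than
# "Default" prevents multiple processes from sharing that GPU.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=index,compute_mode", "--format=csv,noheader"],
    universal_newlines=True,
)
for line in out.strip().splitlines():
    index, mode = (field.strip() for field in line.split(",", 1))
    print("GPU %s: %s" % (index, mode))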

apaszke closed this Nov 25, 2017
