
Can not use .cuda() function to load the model into GPU using Pytorch 1.3 #27738

Closed
phongnhhn92 opened this issue Oct 11, 2019 · 56 comments
Labels: high priority, module: binaries (Anything related to official binaries that we release to users)

phongnhhn92 commented Oct 11, 2019

🐛 Bug

I am trying to run the Captum CIFAR10 example (link) and want to test it on the GPU, so I modified a line to net = Net().cuda() to load the model onto the GPU (I have a single RTX 2080 Ti). However, I got this error:

AssertionError: 
The NVIDIA driver on your system is too old (found version 10000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

At the moment I am using NVIDIA driver version 410. I tried upgrading the NVIDIA GPU driver to version 435; I no longer see that error, but the code gets stuck trying to load the model onto the GPU.

To Reproduce

Steps to reproduce the behavior:

  1. Upgrade to the latest PyTorch version using this command: conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
  2. Download the Captum CIFAR10 example code and modify the line that loads the network onto the GPU (see the sketch below).
  3. Run the code and observe the error.
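
For context, a minimal sketch of the kind of modification described in step 2. The Net class below is a stand-in, not the tutorial's actual model; only the final .cuda() call matters here:

import torch
import torch.nn as nn

# Stand-in for the tutorial's Net; the real definition lives in the Captum CIFAR10 example.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3 * 32 * 32, 10)

    def forward(self, x):
        return self.fc(x.flatten(1))

net = Net().cuda()  # the modified line; this is the call that raises the driver assertion (or hangs)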

Environment

Collecting environment information...
PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.1.243

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: No
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 410.104
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.17.0
[pip3] numpydoc==0.7.0
[conda] _tflow_select 2.3.0 mkl
[conda] blas 1.0 mkl
[conda] captum 0.1.0 0 pytorch
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.14 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch 1.3.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] tensorflow 1.14.0 mkl_py37h45c423b_0
[conda] tensorflow-base 1.14.0 mkl_py37h7ce6ba3_0
[conda] torchtext 0.4.0 pypi_0 pypi
[conda] torchvision 0.4.1 py37_cu101 pytorch

cc @ezyang @gchanan @zou3519

@mazzma12

Same issue here. I think it is related to the 1.3.0 release. Installing 1.2.0 solved it for me.

@phongnhhn92 (Author)

I think the problem is related to the CUDA version, since the new PyTorch 1.3 is built with CUDA 10.1.243 and my current CUDA version is 10.1.168 (I installed it from the conda package). I guess I have to wait until the cudatoolkit conda package is updated to the new version. Another solution is installing CUDA 10.1.243 manually.
Btw, the Captum code works perfectly with pytorch 1.2 :D
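
For anyone checking whether they are hitting the same mismatch, a minimal sketch of the relevant version checks (torch.version.cuda reports the CUDA version the binary was built with; conda list cudatoolkit shows what the environment actually provides):

import torch

print(torch.__version__)          # e.g. 1.3.0
print(torch.version.cuda)         # CUDA version the binary was built with, e.g. 10.1.243
print(torch.cuda.is_available())  # False here matches the "driver is too old" assertion above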

phongnhhn92 changed the title from "Can not use .cuda() function to load the model into GPU" to "Can not use .cuda() function to load the model into GPU using Pytorch 1.3" on Oct 11, 2019

Zehaos commented Oct 11, 2019

Same issue here. The current conda cudatoolkit version is old (10.1.168).

@bryant1410 (Contributor)

Guys, you can still use pytorch=1.3.0 with cudatoolkit=10.0

soumith (Member) commented Oct 11, 2019

@jjhelmus would you know when anaconda would upgrade to 10.1.243 if there's a plan. Also, if there's a better way to ask about anaconda cuda / cudnn upgrades let me know I'll follow it :)

soumith (Member) commented Oct 11, 2019

@ptrblck can you folks try to repro this (see the user's 2nd comment)? Basically, they are running into a hang of some sort with 10.1.243 vs 10.1.168.

huyvnphan commented Oct 12, 2019

I have the same issue. Right now, the only solution is to revert to PyTorch 1.2.

ssnl (Collaborator) commented Oct 12, 2019

@soumith I can reproduce this with system CUDA cuda_10.1.105_418.39, conda cudatoolkit pkgs/main/linux-64::cudatoolkit-10.1.168-0 and 1.3 binary.

@mazzma12

Guys, you can still use pytorch=1.3.0 with cudatoolkit=10.0

Interesting, so the CUDA version would not be the cause of the problem? Where do you find information about the CUDA version required for each torch version, and the installation steps? I couldn't find it on the torch website.

I still see CUDA unavailable when installing with pip, though.

@WenmuZhou

pytorch 1.3 with cuda 10 has the same error

leftthomas commented Oct 12, 2019

@mazzma12 here you can find the exact supported CUDA version

soumith (Member) commented Oct 12, 2019

Another issue reporter says that the startup time for 10.1 is in minutes, and we are looking into it. So it's not deadlocked, but starts after a few minutes. It looks like some PTX->SASS compilation is happening: #27807
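
For anyone unsure whether they are seeing a real hang or just this slow first-time compilation, a small timing sketch (assumes a CUDA-capable machine); on an affected setup the first transfer takes minutes while the second is near-instant:

import time
import torch

start = time.time()
x = torch.rand(10).cuda()   # first CUDA call; any PTX->SASS JIT compilation happens here
torch.cuda.synchronize()
print(f"first .cuda() call: {time.time() - start:.1f} s")

start = time.time()
y = torch.rand(10).cuda()   # subsequent calls should be fast
torch.cuda.synchronize()
print(f"second .cuda() call: {time.time() - start:.1f} s")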

soumith added the high priority and module: binaries labels on Oct 12, 2019
soumith (Member) commented Oct 12, 2019

@leftthomas @phongnhhn92 @ssnl can you confirm or deny that pip install -U torch does not produce the slow-down?

soumith (Member) commented Oct 12, 2019

In the interest of not having two threads (and not copy-pasting my comments), I am closing this in favor of #27807.

Please follow updates at #27807.
I think I have a possible solution; I'll post an update there.

soumith closed this as completed on Oct 12, 2019
soumith (Member) commented Oct 12, 2019

This issue is now fixed with newly updated binaries.
Uninstalling and reinstalling PyTorch from Anaconda will fix it.

@phongnhhn92 (Author)

I have tried creating a new conda environment and I can still see the mismatch between the cudatoolkit version (10.1.168) and the CUDA version PyTorch was built with (10.1.243).
[screenshot: conda list output]
When I check CUDA with the installed PyTorch, this is the result:
[screenshot: Python CUDA check output]

soumith (Member) commented Oct 14, 2019

@phongnhhn92 upgrade your NVIDIA driver

@phongnhhn92 (Author)

@phongnhhn92 upgrade your NVIDIA driver

It works. Thanks!

@jjhelmus

@jjhelmus would you know when anaconda would upgrade to 10.1.243 if there's a plan. Also, if there's a better way to ask about anaconda cuda / cudnn upgrades let me know I'll follow it :)

I'll add an update to the cudatoolkit and related packages to our backlog. A new sprint starts next Monday so these will likely be available sometime next week. The anaconda-issues repository is a better place for requests like these. Multiple members of Anaconda's distribution team monitor that repository's issue tracker.

@MOAboAli

@phongnhhn92 upgrade your NVIDIA driver

I get the same error, but I don't have an NVIDIA driver to upgrade; my display adapter is Intel HD Graphics 4000.

So in my case, what can I do?

phongnhhn92 (Author) commented Oct 18, 2019

In your case, you don't have an NVIDIA GPU, so you shouldn't use the .cuda() function at all.
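
For reference, the usual device-agnostic pattern, which runs the same code on CPU-only and GPU machines; a minimal sketch with a toy model (not the Captum example itself):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)        # any nn.Module moves the same way
inputs = torch.rand(8, 4, device=device)  # create (or move) tensors on the same device
outputs = model(inputs)                   # runs on the GPU if one is available, otherwise on the CPU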

@MOAboAli

I get this error:

[screenshot: the error message]

@MOAboAli

So what is the replacement?

@isalirezag

I still see that error, although I have CUDA 10.1 and nvcc -V shows that I have CUDA 10.1.

ptrblck (Collaborator) commented Oct 21, 2019

@mohamedaboali1990 as @phongnhhn92 explained, you won't be able to use the .cuda() calls if you don't have an NVIDIA GPU.

@isalirezag did you (re)install the binaries after the fix was published?

ptrblck (Collaborator) commented Oct 24, 2019

@KoalaSheep Your local CUDA (and cudnn) installations won't be used, as the PyTorch binaries ship with their own CUDA, cudnn and other libs.

Could you create a new conda environment and reinstall the latest PyTorch version, please?
Let us know, if you still face this issue.

olix86 commented Oct 25, 2019

Is this issue specific to conda? I'm having the same issue with CUDA 10.0 and pytorch 1.3.0 but using pip... @soumith

vlad-i commented Oct 28, 2019

@phongnhhn92 upgrade your NVIDIA driver

It works. Thanks!

How do you update the Nvidia driver? Is there a one-liner? I'm on a GCP instance.

I'm also looking into this, trying to get the Nvidia driver to update via the terminal, no luck so far.

KoalaSheep commented Oct 29, 2019

@KoalaSheep Your local CUDA (and cudnn) installations won't be used, as the PyTorch binaries ship with their own CUDA, cudnn and other libs.

Could you create a new conda environment and reinstall the latest PyTorch version, please?
Let us know, if you still face this issue.
@ptrblck

Thank you for the reply. After making a new environment, I still face the same problem.
When I install PyTorch, cudatoolkit 10.1.168 is installed and it still takes too long to move to CUDA.

I reinstalled pytorch 1.2; cudatoolkit was downgraded to 10.0.130-0, and it works well for me now.
Still waiting for a way to use pytorch 1.3.

Smerity (Contributor) commented Nov 1, 2019

Just to note, I ran into a potentially related issue when upgrading to PyTorch 1.3, so in case this helps anyone:

My Titan V card had no issue but my GTX 1080 Ti reported "cuda runtime error (209): ... no kernel image is available for execution on the device" upon using something from rnn.py.

This was with conda install pytorch cudatoolkit=10.1 -c pytorch.

Whilst this may be related to an older NVIDIA driver, my system decided to make upgrading that difficult. I instead ran conda install pytorch torchvision cudatoolkit=10.0 -c pytorch and that seems to work for now.

I'll tackle broken Ubuntu / Nvidia packages another day ^_^
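
For anyone hitting the same "no kernel image is available" error, a small diagnostic sketch that prints each GPU's compute capability (the GTX 1080 Ti is 6.1 and the Titan V is 7.0, which may be why only one card was affected); whether the installed binary ships kernels for that capability is what matters:

import torch

for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name}, compute capability {major}.{minor}")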

bryant1410 (Contributor) commented Nov 1, 2019 via email

Smerity (Contributor) commented Nov 1, 2019

Using conda install pytorch torchvision cudatoolkit=10.1.243 -c pytorch still results in a RuntimeError on my 1080 Ti sadly:

  File "/home/smerity/anaconda3/envs/pyt/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 516, in forward_impl
    dtype=input.dtype, device=input.device)
RuntimeError: cuda runtime error (209) : no kernel image is available for execution on the device at /opt/conda/conda-bld/pytorch_1570910687650/work/aten/src/THC/generic/THCTensorMath.cu:35

soumith (Member) commented Nov 1, 2019

OK, let me get my hands on a 1080 Ti. This shouldn't happen; it is weird.

soumith (Member) commented Nov 1, 2019

@Smerity can you please confirm that running https://github.com/pytorch/examples/tree/master/word_language_model with python main.py --cuda is also failing on your end?

soumith (Member) commented Nov 1, 2019

Also, can you give your output of nvidia-smi, and preferably open a new issue? I want to get to reproducing it; so far I haven't been able to.

jjhelmus commented Nov 4, 2019

@jjhelmus would you know when anaconda would upgrade to 10.1.243 if there's a plan

cudatoolkit 10.1.243 packages are now available in defaults for the linux-64 and win-64 platforms.

soumith (Member) commented Nov 4, 2019

thanks, noticed that over the weekend :)

@magic282

I still have this problem.
conda packages:

pytorch                   1.3.1           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
cudatoolkit               10.1.243             h6bb024c_0

nvidia-smi output:

Mon Nov 11 14:46:19 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 0000A9AA:00:00.0 Off |                    0 |
| N/A   31C    P0    29W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 0000D120:00:00.0 Off |                    0 |
| N/A   30C    P0    29W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 0000E26F:00:00.0 Off |                    0 |
| N/A   30C    P0    32W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 0000F5EF:00:00.0 Off |                    0 |
| N/A   30C    P0    30W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

@magic282

Found that downgrading to pytorch 1.3.0 fixes this issue:

sudo /opt/conda/bin/conda install cudatoolkit=10.0 pytorch=1.3.0 -c pytorch -n pytorch-py3.6

But this seems to be just an ad-hoc fix.

@sayakpaul

I am facing this issue on my GCP instance, which is equipped with CUDA 10.0. Is anyone else facing the same on a GCP instance?

Also, out of curiosity, I ran !nvcc --version on a Colab notebook and found that the CUDA version there is also 10.0, yet PyTorch 1.3.1 runs successfully.

justanhduc commented Nov 12, 2019

I have a problem with .cuda(): loading to the GPU using .cuda() takes forever on an RTX 2080. I tried the following simple script:

import torch as T
foo = T.rand(10)
foo.cuda()

The system hangs (or takes an unreasonably long time) at the last command. I checked nvidia-smi and found that the memory slowly increased.

Environment:

  • Pytorch 1.3.0 installed using the default conda command.
  • OS: Ubuntu 16.04 LTS
  • Python version: 3.7
  • CUDA runtime version: 10.1.243
  • CuDNN version: 7603

Edit: Fixed in 1.3.1.

soumith (Member) commented Nov 14, 2019

@magic282 please try upgrading your NVIDIA driver to 430 or above and confirm whether that fixes things. I have just tried things on a P100 and GP100, and the binaries worked fine for me.

@sayakpaul

I am facing this issue on my GCP instance, which is equipped with CUDA 10.0. Is anyone else facing the same on a GCP instance?

Also, out of curiosity, I ran !nvcc --version on a Colab notebook and found that the CUDA version there is also 10.0, yet PyTorch 1.3.1 runs successfully.

Anything on this? :(

soumith (Member) commented Nov 15, 2019

@sayakpaul what GPU does your instance have? By "the same issue", can you expand on what you're seeing?

@sayakpaul

@soumith it's a P100. I am seeing:

The NVIDIA driver on your system is too old (found version 10000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

soumith (Member) commented Nov 15, 2019

@sayakpaul as the error message says, upgrade your CUDA driver; you installed the CUDA 10.1-compatible pytorch package, which is the default.

sayakpaul commented Nov 16, 2019

@soumith yeah! But Colab also has CUDA 10.0, and PyTorch 1.3.1 still runs there.

soumith (Member) commented Nov 16, 2019

Colab is loaded up with special builds of PyTorch that are built against CUDA 10.0.

@magic282

@soumith Thank you. Upgrading the driver indeed solves this problem. I was actually running inside Docker on a cluster, so I didn't know that the driver on the host machine was older.
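
For anyone else running inside Docker, a small sketch to read the host driver version from within a container (nvidia-smi in an NVIDIA-runtime container reports the host driver; the query flags below are standard nvidia-smi options):

import subprocess

# Ask nvidia-smi for just the driver version; inside Docker this is the host's driver.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    stdout=subprocess.PIPE, universal_newlines=True, check=True,
)
print(out.stdout.strip())  # e.g. 410.78, which is too old for the CUDA 10.1 binaries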

@sayakpaul

@soumith thanks much for the clarification!

darolt commented Jan 11, 2020

Just to register here: I got the same error on a fresh install of pytorch 1.3 and cuda 10.1. Both showed the same CUDA version in conda list. For me, updating the driver was not an option because my card (Tesla K40) has 418.xx as its suggested driver (NVIDIA drivers page). Downgrading to pytorch 1.3.0 solved the problem for me.

AceEviliano commented Jan 21, 2020

@darolt I am new to this community, so I don't understand where the above discussion finally ended. Can you help me with this? I have a K40c as well, and I am stuck installing PyTorch. I tried all versions, and most of them take too long to load a model to the GPU, and then this pops up:

RuntimeError: cuda runtime error (209) : no kernel image is available for execution

Here is some more info:

     active environment : base
    active env location : /home/rishi/anaconda3
            shell level : 1
       user config file : /home/rishi/.condarc
 populated config files : /home/rishi/.condarc
          conda version : 4.8.1
    conda-build version : 3.18.8
         python version : 3.7.3.final.0
       virtual packages : __cuda=10.1
                          __glibc=2.23
       base environment : /home/rishi/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/rishi/anaconda3/pkgs
                          /home/rishi/.conda/pkgs
       envs directories : /home/rishi/anaconda3/envs
                          /home/rishi/.conda/envs
               platform : linux-64
             user-agent : conda/4.8.1 requests/2.22.0 CPython/3.7.3 Linux/4.4.0-170-generic ubuntu/16.04 glibc/2.23
                UID:GID : 1014:1014
             netrc file : None
           offline mode : False

And here's my GPU info:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 00000000:02:00.0 Off |                    0 |
| 43%   74C    P0   132W / 235W |   3609MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
