Cannot use .cuda() function to load the model onto the GPU using PyTorch 1.3 #27738
Comments
Same issue here. I think it is related to the release of 1.3.0. Installing 1.2.0 solved it for me.
I think the problem is related to the CUDA version: the new PyTorch 1.3 is built with CUDA 10.1.243, while my current CUDA version is 10.1.168 (installed from the conda package). I guess I have to wait until the CUDA conda package is updated to the new version. Another solution is installing CUDA 10.1.243 manually.
Same issue here. The current conda cudatoolkit version is old (10.1.168).
Guys, you can still use pytorch=1.3.0 with cudatoolkit=10.0.
@jjhelmus would you know when Anaconda will upgrade to 10.1.243, if there's a plan? Also, if there's a better way to ask about Anaconda CUDA / cuDNN upgrades, let me know and I'll follow it :)
@ptrblck can you folks try to repro this (see the user's 2nd comment)? Basically, they are running into a hang of some sort with 10.1.243 vs 10.1.168.
I have the same issue. Right now, the only solution is to revert to PyTorch 1.2.
@soumith I can reproduce this with system CUDA.
Interesting, so the CUDA version would not be the cause of the problem? Where do you find information about the CUDA version required for each torch version, and the installation steps? I couldn't find it on the torch website. I still have:
PyTorch 1.3 with CUDA 10 has the same error.
Another issue reporter says that the startup time with 10.1 is in the minutes, and we are looking into it. So it's not deadlocked; it starts after a few minutes. It looks like some PTX->SASS compilation is happening: #27807
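To make the PTX->SASS point concrete, here is a minimal sketch of how the architectures compiled into a build relate to a particular GPU. The helper `kernel_source` is hypothetical. It assumes arch strings in the `sm_XY` / `compute_XY` style that newer PyTorch releases return from `torch.cuda.get_arch_list()` (PyTorch 1.3 itself does not have that function), and it encodes NVIDIA's general compatibility rules: precompiled SASS for capability X.y runs on devices X.z with z >= y, while PTX for X.y can be JIT-compiled by the driver for any device of capability X.y or newer. A `"ptx-jit"` result corresponds to a slow first start, and `"none"` corresponds to the "no kernel image is available for execution on the device" error reported later in this thread.

```python
import re


def kernel_source(capability, arch_list):
    """Classify how a CUDA build could run on a device (hypothetical helper).

    capability: (major, minor), e.g. from torch.cuda.get_device_capability().
    arch_list: strings like "sm_75" or "compute_75" (assumed format, as in
        torch.cuda.get_arch_list() on newer PyTorch releases).
    Returns "sass", "ptx-jit", or "none".
    """
    major, minor = capability
    sass, ptx = [], []
    for arch in arch_list:
        m = re.match(r"(sm|compute)_(\d+)", arch)
        if not m:
            continue
        # "75" -> (7, 5); "100" -> (10, 0)
        version = divmod(int(m.group(2)), 10)
        (sass if m.group(1) == "sm" else ptx).append(version)
    # SASS for X.y runs on devices X.z with z >= y (same major version only).
    if any(v[0] == major and v[1] <= minor for v in sass):
        return "sass"
    # PTX for X.y can be JIT-compiled for any device with capability >= X.y.
    if any(v <= (major, minor) for v in ptx):
        return "ptx-jit"
    return "none"
```

For example, `kernel_source((6, 1), ["sm_70", "compute_60"])` returns `"ptx-jit"`: no SASS matches a 6.1 device, but the `compute_60` PTX can be JIT-compiled for it.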
@leftthomas @phongnhhn92 @ssnl can you confirm or deny that?
This issue is now fixed with newly updated binaries.
@phongnhhn92 upgrade your NVIDIA driver.
It works. Thanks!
I'll add an update to the
I get the same error, but I don't have an NVIDIA driver to upgrade (display driver: Intel HD Graphics 4000). What can I do in my case?
In your case, you don't have an NVIDIA GPU, so you shouldn't use the .cuda() function at all.
So what is the replacement?
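One common replacement for hard-coded `.cuda()` calls, sketched here as an editorial suggestion rather than an answer given in the thread, is device-agnostic code: pick a `torch.device` once with `torch.cuda.is_available()` and move the model and tensors with `.to(device)`. On a machine with only an Intel GPU (or a driver too old for the build), everything then stays on the CPU. The small `nn.Sequential` model is a hypothetical stand-in for the `Net` class from the Captum example.

```python
import torch
import torch.nn as nn

# Use the GPU only when PyTorch can actually use one; otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the Captum CIFAR10 Net (3x32x32 inputs, 10 classes).
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# .to(device) replaces .cuda(): on a CPU-only machine the model stays on the CPU.
net = net.to(device)
inputs = torch.randn(4, 3, 32, 32, device=device)
outputs = net(inputs)
print(device, outputs.shape)
```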
I still see that error, although I have CUDA 10.1 and nvcc -V shows CUDA 10.1.
@mohamedaboali1990 as @phongnhhn92 explained, you won't be able to use the
@isalirezag did you (re)install the binaries after the fix was published?
@KoalaSheep your local CUDA (and cuDNN) installations won't be used, as the PyTorch binaries ship with their own CUDA, cuDNN, and other libs. Could you create a new conda environment and reinstall the latest PyTorch version, please?
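A quick way to see which CUDA and cuDNN the installed PyTorch binary actually uses, as opposed to what `nvcc -V` reports for a local toolkit, is to ask PyTorch itself. This is a minimal sketch: `build_info` is a hypothetical helper, while `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.is_available()` are existing PyTorch attributes and functions. Running `python -m torch.utils.collect_env`, which produces the environment report at the end of this issue, gives a fuller picture.

```python
import torch


def build_info():
    """Summarize the CUDA pieces this PyTorch binary uses (not the system install)."""
    return {
        "torch": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # cuDNN version bundled with the binary; None if cuDNN is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        # False with no NVIDIA GPU, or with a driver too old for this build.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in build_info().items():
        print(f"{key}: {value}")
```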
Is this issue specific to conda? I'm having the same issue with CUDA 10.0 and PyTorch 1.3.0, but using pip... @soumith
I'm also looking into this, trying to get the NVIDIA driver to update via the terminal; no luck so far.
Thank you for the reply. After making a new environment, I still face the same problem. I reinstalled PyTorch 1.2; cudatoolkit was downgraded to 10.0.130-0, and it works well for me now.
Just to note, I ran into a potentially related issue when upgrading to PyTorch 1.3, so in case this helps anyone: my Titan V card had no issue, but my GTX 1080 Ti reported "cuda runtime error (209): ... no kernel image is available for execution on the device" upon using something from
This was with conda install pytorch cudatoolkit=10.1 -c pytorch. Whilst this may be related to an older NVIDIA driver, my system decided to make upgrading that difficult. I instead ran conda install pytorch torchvision cudatoolkit=10.0 -c pytorch, and that seems to work for now. I'll tackle broken Ubuntu / NVIDIA packages another day ^_^
With cudatoolkit=10.1.243?
On Fri, Nov 1, 2019 at 4:59 PM Stephen Merity wrote:
Just to note I ran into a potentially related issue in upgrading to
PyTorch 1.3 so in case this helps anyone:
My Titan V card had no issue but my GTX 1080 Ti reported "no kernel image
is available for execution on the device".
This was with conda install pytorch cudatoolkit=10.1 -c pytorch.
Whilst this may be related to an older Nvidia driver my system decided to
make upgrading that difficult. I instead ran conda install pytorch
torchvision cudatoolkit=10.0 -c pytorch and that seems to work for now.
I'll tackle broken Ubuntu / Nvidia packages another day ^_^
Using
OK, let me get my hands on a 1080 Ti. This shouldn't happen; it is weird.
@Smerity can you please confirm that running https://github.com/pytorch/examples/tree/master/word_language_model with
Also, can you give your output of
cudatoolkit 10.1.243 packages are now available in
Thanks, noticed that over the weekend :)
Still have this problem.
nvidia-smi output:
nvcc output:
Found that downgrading to PyTorch 1.3.0 fixes this issue:
But this seems to be just an ad-hoc fix.
I am facing this issue on my GCP instance, which is equipped with CUDA 10.0. Is anyone else also facing the same on a GCP instance? Also, out of curiosity, I ran
I got a problem with
The system hangs (or takes so unreasonably long that I cannot wait for it) at the last command. I checked
Environment:
Edit: Fixed in 1.3.1.
@magic282 please try upgrading your NVIDIA driver to 430 or above, and please confirm whether that fixes things. I have just tried things on a P100 and a GP100, and the binaries worked fine for me.
Anything on this? :(
@sayakpaul what GPU does your instance have? By "the same issue", can you expand on what you're seeing?
@soumith it's a P100. I am seeing:
@sayakpaul as the error message says, upgrade your CUDA driver: you installed the CUDA 10.1-compatible PyTorch package, which is the default.
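The driver requirement can be checked before installing a build. The sketch below uses hypothetical helpers, not part of PyTorch: it reads the driver version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and compares it with the minimum Linux driver versions NVIDIA lists for CUDA 10.0 (410.48) and CUDA 10.1 (418.39). Treat that table as an assumption to re-check against NVIDIA's CUDA release notes; the maintainer advice in this thread (430 or above) is stricter.

```python
import shutil
import subprocess

# Assumed minimum Linux driver versions from NVIDIA's CUDA compatibility table;
# verify against the current CUDA release notes before relying on them.
MIN_LINUX_DRIVER = {
    "10.0": "410.48",
    "10.1": "418.39",
}


def parse_version(text):
    """'418.39' -> (418, 39); '418.87.00' -> (418, 87, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(cuda_version, driver_version):
    """True if the driver meets the assumed minimum for this CUDA major.minor."""
    major_minor = ".".join(cuda_version.split(".")[:2])
    minimum = MIN_LINUX_DRIVER[major_minor]
    return parse_version(driver_version) >= parse_version(minimum)


def query_driver_version():
    """Return the first GPU's driver version via nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return result.stdout.splitlines()[0].strip()
```

With the values from the environment report in this issue (driver 410.104, PyTorch built with CUDA 10.1.243), `driver_supports("10.1.243", "410.104")` returns `False`, which matches "Is CUDA available: No".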
@soumith yeah! But Colab also has CUDA 10.0, and PyTorch 1.3.1 still runs there.
Colab is loaded with special builds of PyTorch that are built against CUDA 10.0.
@soumith Thank you. Upgrading the driver indeed solves this problem. I was actually running inside Docker on a cluster, so I didn't know that the driver on the host machine was older.
@soumith thanks very much for the clarification!
Just to register here: I got the same error on a fresh install of PyTorch 1.3 and CUDA 10.1. Both showed the same CUDA version in conda list. For me, updating the driver was not an option, because the suggested driver for my card (Tesla K40) on the NVIDIA drivers page is 418.xx. Downgrading to PyTorch 1.3.0 solved the problem for me.
@darolt
Here is some more info:
And here's my GPU info:
🐛 Bug
I am trying to run the Captum CIFAR10 example, and I want to test it on the GPU, so I modified a line
net = Net().cuda()
to load the model onto the GPU (I have a single RTX 2080 Ti GPU). However, I got this error:
At the moment, I am using NVIDIA driver version 410. I tried upgrading the NVIDIA GPU driver to version 435; after that I no longer see the error, but the code gets stuck trying to load the model onto the GPU.
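Since a slow PTX JIT at startup can look like a hang, one way to tell them apart is to time the first CUDA operation separately from the model transfer. This is a minimal sketch under the assumption that CUDA context creation, plus any driver JIT compilation of kernels missing from the binary, is paid on the first CUDA call; the exact behavior depends on the CUDA version and driver. `time_first_cuda_call` is a hypothetical helper.

```python
import time

import torch


def time_first_cuda_call():
    """Return seconds taken by the first CUDA operation, or None without CUDA.

    Assumption: this first call pays for CUDA context creation and, if the
    binary lacks SASS for this GPU, any PTX JIT compilation by the driver.
    """
    if not torch.cuda.is_available():
        return None
    start = time.perf_counter()
    torch.zeros(1, device="cuda")
    torch.cuda.synchronize()  # wait for the GPU so the measurement is complete
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_first_cuda_call()
    if elapsed is None:
        print("CUDA is not available to this PyTorch build; .cuda() would raise.")
    else:
        print(f"First CUDA call took {elapsed:.1f} s")
```

If the first call takes minutes and a subsequent `net.cuda()` is then quick, the process was most likely compiling rather than deadlocked, in line with the PTX->SASS explanation earlier in the thread.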
To Reproduce
Steps to reproduce the behavior:
Environment
Collecting environment information...
PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.1.243
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: No
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 410.104
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.17.0
[pip3] numpydoc==0.7.0
[conda] _tflow_select 2.3.0 mkl
[conda] blas 1.0 mkl
[conda] captum 0.1.0 0 pytorch
[conda] mkl 2019.4 243
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.14 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch 1.3.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] tensorflow 1.14.0 mkl_py37h45c423b_0
[conda] tensorflow-base 1.14.0 mkl_py37h7ce6ba3_0
[conda] torchtext 0.4.0 pypi_0 pypi
[conda] torchvision 0.4.1 py37_cu101 pytorch
cc @ezyang @gchanan @zou3519