Version 1.3 no longer supporting Tesla K40m? #30532

Open
JamesOwers opened this issue Nov 27, 2019 · 78 comments
Labels
module: cuda module: docs oncall: binaries triaged

Comments


@JamesOwers JamesOwers commented Nov 27, 2019

🐛 Bug

I am using a Tesla K40m with PyTorch 1.3 installed via conda, using CUDA 10.1.

To Reproduce

Steps to reproduce the behavior:

  1. Have a box with a Tesla K40m
  2. conda install pytorch cudatoolkit -c pytorch
  3. Show that CUDA is available:
python -c 'import torch; print(torch.cuda.is_available());'
>>> True
  4. Instantiate a model and call .forward()
Traceback (most recent call last):
  File "./baselines/get_results.py", line 395, in <module>
    main(args)
  File "./baselines/get_results.py", line 325, in main
    log_info = eval_main(eval_args)
  File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/baselines/eval_task.py", line 165, in main
    log_info = trainer.test(0, evaluate=True)
  File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/mdtk/pytorch_trainers.py", line 110, in test
    evaluate=evaluate)
  File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/mdtk/pytorch_trainers.py", line 220, in iteration
    model_output = self.model.forward(input_data, input_lengths)
  File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/mdtk/pytorch_models.py", line 49, in forward
    self.hidden = self.init_hidden(batch_size, device=device)
  File "/mnt/cdtds_cluster_home/s0816700/git/midi_degradation_toolkit/mdtk/pytorch_models.py", line 40, in init_hidden
    return (torch.randn(1, batch_size, self.hidden_dim, device=device),
RuntimeError: CUDA error: no kernel image is available for execution on the device

I first tried downgrading to cudatoolkit=10.0, but that exhibited the same issue.

The code runs fine if you repeat the steps above but instead run conda install pytorch=1.2 cudatoolkit=10.0 -c pytorch.

Expected behavior

If no longer supporting a specific GPU, please bomb out upon load with useful error message.
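
For illustration, here is a rough sketch of what such an up-front check could look like. It is not PyTorch's actual implementation. The helper names (`parse_arch_tag`, `binary_supports`, `check_gpu_or_fail`) are made up for this example, and it assumes a PyTorch release that provides `torch.cuda.get_arch_list()`, which is newer than the 1.3 build discussed here. The compatibility rule used is the general CUDA one: a compiled `sm_XY` kernel runs on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any equal or newer capability.

```python
import re


def parse_arch_tag(tag):
    """Turn 'sm_37' into ('sm', (3, 7)) and 'compute_70' into ('compute', (7, 0))."""
    match = re.fullmatch(r"(sm|compute)_(\d+)[a-z]?", tag)
    if match is None:
        raise ValueError(f"unrecognised arch tag: {tag!r}")
    kind, digits = match.groups()
    # The last digit is the minor version; everything before it is the major.
    return kind, (int(digits[:-1]), int(digits[-1]))


def binary_supports(capability, arch_tags):
    """Return True if a binary built for arch_tags can run on a device
    with the given (major, minor) compute capability."""
    major, minor = capability
    for tag in arch_tags:
        kind, (arch_major, arch_minor) = parse_arch_tag(tag)
        # Compiled kernels: same major version, device minor at least as new.
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


def check_gpu_or_fail(device_index=0):
    """Hypothetical load-time check; raises with a readable message."""
    import torch  # imported lazily so the helpers above work without torch

    capability = torch.cuda.get_device_capability(device_index)
    arch_tags = torch.cuda.get_arch_list()  # assumption: available in this release
    if not binary_supports(capability, arch_tags):
        name = torch.cuda.get_device_name(device_index)
        raise RuntimeError(
            f"{name} has compute capability {capability[0]}.{capability[1]}, "
            f"but this build only contains kernels for {arch_tags}"
        )
```

With the architectures mentioned later in this thread, `binary_supports((3, 5), ["sm_37", "sm_50"])` is False, which matches the K40m failing on a build that targets 3.7 and 5.0+.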

Environment

Unfortunately I ran your script after I 'fixed' the problem, so the PyTorch version shows as 1.2 here; the issue was encountered with version 1.3.

Collecting environment information...
PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Scientific Linux release 7.6 (Nitrogen)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
CMake version: version 2.8.12.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla K40m
Nvidia driver version: 430.50
cuDNN version: /usr/lib64/libcudnn.so.6.5.18

Versions of relevant libraries:
[pip3] numpy==1.16.3
[pip3] numpydoc==0.8.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2019.4                      243  
[conda] mkl-service               2.3.0            py37he904b0f_0  
[conda] mkl_fft                   1.0.15           py37ha843d7b_0  
[conda] mkl_random                1.1.0            py37hd6b4f25_0  
[conda] pytorch                   1.2.0           py3.7_cuda10.0.130_cudnn7.6.2_0    pytorch
[conda] torchvision               0.4.0                py37_cu100    pytorch

cc @ezyang @gchanan @zou3519 @jerryzh168 @ngimel

@albanD albanD added high priority oncall: binaries module: cuda triaged labels Nov 27, 2019
Contributor

@albanD albanD commented Nov 27, 2019

Just to be sure, were you using 1.3.0 or 1.3.1?

Author

@JamesOwers JamesOwers commented Nov 27, 2019

1.3.1

conda list 'pytorch|cuda'
>>> # packages in environment at /home/s0816700/miniconda3/envs/mdtk:
>>> #
>>> # Name                    Version                   Build  Channel
>>> cudatoolkit               10.1.243             h6bb024c_0  
>>> pytorch                   1.3.1           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch

This was the environment at the point of failure.

Contributor

@albanD albanD commented Nov 27, 2019

cc @ngimel

Collaborator

@SsnL SsnL commented Nov 27, 2019

The K40m has a compute capability of 3.5, which I believe we have dropped support for.

Author

@JamesOwers JamesOwers commented Nov 27, 2019

OK. Could you please implement a useful "oldgpu" warning? Like here: #6529

The error at the moment is very unclear to a casual user like me.

--- EDIT ---
It would also be great to link users:

  1. to a page detailing which compute capabilities you support (if this exists) and
  2. to instructions for finding out the compute capability of your GPU (I guess here: https://developer.nvidia.com/cuda-gpus#compute for most?)

Struggl(ed/ing) to find both of those things!
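
As a small illustration of the second point, the compute capability can also be read from within PyTorch itself. This is only a sketch: `describe_cuda_devices` is a made-up helper name, and it relies on `torch.cuda.get_device_name` and `torch.cuda.get_device_capability`. It returns an empty list when torch is missing or no CUDA device is usable.

```python
def describe_cuda_devices():
    """Return a list of (index, name, 'major.minor') for each visible CUDA GPU."""
    try:
        import torch
    except ImportError:
        return []  # torch is not installed in this environment
    if not torch.cuda.is_available():
        return []  # no usable driver or GPU
    devices = []
    for index in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(index)
        major, minor = torch.cuda.get_device_capability(index)
        devices.append((index, name, f"{major}.{minor}"))
    return devices


for index, name, capability in describe_cuda_devices():
    print(f"GPU {index}: {name}, compute capability {capability}")
```

Note that this only reports what the hardware is; it says nothing about whether the installed binary contains kernels for that capability.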

As an aside, @SsnL - possibly this line needs updating if you are correct:

on an NVIDIA GPU with compute capability >= 3.0.

Where did you get your information about minimal compute capability support?

Collaborator

@ptrblck ptrblck commented Nov 30, 2019

@JamesOwers If I'm not mistaken, this commit bumped the minimal compute capability to 3.7.

Contributor

@xsacha xsacha commented Nov 30, 2019

There's no technical reason for it to be changed to 3.7 right?
The code still supports 3.5 (and even 3.0 again).

This is just for Conda? Looks like it went from 3.5 and 5.0+ to 3.7 and 5.0+ so it was always missing either 3.5 or 3.7. I suppose it takes too long/becomes too large to support more than 2 built architectures.

Collaborator

@ptrblck ptrblck commented Dec 1, 2019

@soumith might correct me, but I think the main reason is the growing size of the binaries.

Contributor

@xsacha xsacha commented Dec 2, 2019

@ptrblck that is the reason but it is strange it went from supporting K40 (+ several consumer cards) and not K80 to supporting K80 and not K40 (+ several consumer cards).

on an NVIDIA GPU with compute capability >= 3.0.

I also wish there was a way for the message to reflect the minimum cuda arch from the cuda arch list for when it was compiled. This would make it easier when it gets changed to 3.7, for example. Or when a user supports 3.0 by compiling it themselves.
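
A minimal sketch of that idea, assuming the arch list is available as a `TORCH_CUDA_ARCH_LIST`-style string (entries such as `3.5` or `6.0+PTX`, separated by spaces or semicolons). The function names are hypothetical and this is not how PyTorch generates the message; named architectures such as `Kepler` are not handled here.

```python
import re


def minimum_capability(arch_list):
    """Return the smallest (major, minor) found in a TORCH_CUDA_ARCH_LIST-style string."""
    capabilities = []
    for entry in re.split(r"[;\s]+", arch_list.strip()):
        if not entry:
            continue
        version = entry.split("+")[0]  # drop a trailing "+PTX"
        major, minor = version.split(".")
        capabilities.append((int(major), int(minor)))
    if not capabilities:
        raise ValueError("empty architecture list")
    return min(capabilities)


def requirement_message(arch_list):
    """Build the requirement sentence from the arch list instead of hard-coding 3.0."""
    major, minor = minimum_capability(arch_list)
    return f"on an NVIDIA GPU with compute capability >= {major}.{minor}."


print(requirement_message("3.7;5.0;6.0+PTX"))
```

A user who compiles with `3.0` in the list would then see `>= 3.0` without any change to the message text.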

Contributor

@ezyang ezyang commented Dec 3, 2019

This is also being discussed at #24205 (comment)


@jeherr jeherr commented Dec 16, 2019

I'd just like to suggest that the compatible compute capabilities for the precompiled binaries be added somewhere to the documentation, especially when providing installation instructions for the binaries. That information does not appear to be readily available anywhere.

@ngimel ngimel added the module: docs label Dec 16, 2019

@shiyongde shiyongde commented Mar 4, 2020

A K40m with CUDA 10.0 gets the same error!
Building from source produces even more errors!


@jayenashar jayenashar commented May 1, 2020

hi guys, i have made a python 3.6 pytorch 1.3.1 linux_x86_64 wheel without restriction on compute capability, and it's working on my 3.5 GPU. i would be more than happy to build wheels for different python and pytorch versions if someone can tell me a proper distribution channel (i.e. not google drive).

jayenashar referenced this issue in pytorch/builder May 11, 2020

@aln3 aln3 commented May 15, 2020

@jayenashar Are you able to provide instructions to build PyTorch version 1.3.1 for a specific GPU (NVIDIA Tesla K20) and Python 3.6.8? I've attempted to build a compatible version but am still having hardware compatibility issues:
[W NNPACK.cpp:77] Could not initialize NNPACK! Reason: Unsupported hardware.


@jayenashar jayenashar commented May 15, 2020

@anowlan123 I don't see a reason to build for a specific GPU, but I believe you can export the environment variable TORCH_CUDA_ARCH_LIST for your specific compute capability (3.5), then use the build-from-source instructions for pytorch.

The pytorch 1.3.1 wheel I made should work for you (python 3.6.9, NVIDIA Tesla K20 GPU). I setup a pypi account to try and distribute it, but it seems there is a 60MB limit, and my wheel is 139MB. So I have uploaded it here: https://github.com/UNSWComputing/pytorch/releases/download/v1.3.1/torch-1.3.1-cp36-cp36m-linux_x86_64.whl


@PeteKey PeteKey commented May 18, 2020

Dear @anowlan123,

I would be very interested in a wheel of PyTorch 1.4 that works with a Kepler K40 and CUDA 9.2. Would you be able to help out? I am thinking about installing it via miniconda.


@jayenashar jayenashar commented May 18, 2020

@PeteKey you didn't specify a python version, but i made a wheel with python 3.6, pytorch 1.4.1, and magma-cuda92. please try it here: https://github.com/UNSWComputing/pytorch/releases/download/v1.4.1/torch-1.4.1-cp36-cp36m-linux_x86_64.whl if you have any issues, please upgrade to cuda10.2.


@PeteKey PeteKey commented May 18, 2020

@anowlan123, python 3.6 is fine but I guess something is not quite working yet. Any idea how to fix this? I am running this on ubuntu 14.04 if that matters.

File "/home/pk/miniconda3/envs/pytorch1.4py36_unsw_anowlan123/lib/python3.6/site-packages/torch/__init__.py", line 81, in <module>
from torch._C import *
ImportError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory


@aln3 aln3 commented May 18, 2020

@jayenashar Thanks, still having the same compatibility issues though. @PeteKey once I created a conda environment, I used this script to build from source.

#!/bin/bash
# Make sure the conda environment is activated
cd /home/user
conda activate env

# Install build dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi
conda install -c pytorch magma-cuda101

# Temporarily rename the ld shipped in the environment's compiler_compat directory
cd ~/anaconda3/envs/env/compiler_compat
mv ld ld-old

# Prep PyTorch repo

cd /home/user/Downloads
git clone --recursive https://github.com/pytorch/pytorch
cd /home/user/Downloads/pytorch
git submodule sync
git submodule update --init --recursive

# Specify environment variables for the specific PyTorch build

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export PYTORCH_BUILD_VERSION=1.3.1
export PYTORCH_BUILD_NUMBER=1
export TORCH_CUDA_ARCH_LIST=3.5

# Build and install

python setup.py install

# Clean the build

python setup.py clean --all

# Restore the original ld
cd ~/anaconda3/envs/env/compiler_compat
mv ld-old ld


@jayenashar jayenashar commented Oct 13, 2020

@nelson-liu that's great. then i only need to worry about conda packages.

this is how i test for cuda: python -c 'import torch; torch.randn([3,5]).cuda()'
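
A slightly longer variant of that check, as a sketch only: it creates the tensor on the device and runs an arithmetic kernel, since a later note in this thread reports that moving a tensor to the device can succeed even when no kernel image is available. `cuda_smoke_test` is a made-up helper name.

```python
def cuda_smoke_test():
    """Return (ok, message) after trying to run a real kernel on the default GPU."""
    try:
        import torch
    except ImportError:
        return False, "torch is not installed"
    if not torch.cuda.is_available():
        return False, "torch.cuda.is_available() returned False"
    try:
        x = torch.randn(3, 5, device="cuda")  # random generation runs on the GPU
        y = (x @ x.t()).sum()                 # matrix multiply plus reduction
        torch.cuda.synchronize()              # surface asynchronous CUDA errors here
        return True, f"kernel ran, result {y.item():.4f}"
    except RuntimeError as error:
        return False, f"CUDA kernel failed: {error}"


ok, message = cuda_smoke_test()
print("OK" if ok else "FAILED", "-", message)
```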


@Guptajakala Guptajakala commented Oct 30, 2020

@jayenashar
That's so great to see your releases!
I'm planning to follow your future releases as well, if you still plan to keep up with later PyTorch releases. Do you plan to always release to that forked repo? Where should I request a specific build, here or elsewhere?


@jayenashar jayenashar commented Oct 30, 2020

@Guptajakala yes i will release to that forked repo, unless someone knows a better place. i tried pypi but it seems they have a file size limit and that is the reason the official builds don't support old GPUs. i can try an anaconda channel.

right now i'm taking requests here as it seems to be the discoverable place.


@Guptajakala Guptajakala commented Oct 30, 2020

@jayenashar
Hi, does this work with python 3.6.9?
image
I downloaded this one and run command
conda install ./pytorch-1.6.0-py3.7_cuda10.1.243_cudnn7.6.3_0.tar.bz2

After installation, I import it but it says
No module named 'torch'

conda list shows this item
pytorch 1.6.0 py3.7_cuda10.1.243_cudnn7.6.3_0 <unknown>


@jayenashar jayenashar commented Oct 30, 2020

@Guptajakala no you can't use a py3.7 pytorch with python 3.6.

do you want me to build you one for python 3.6?
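
To catch this kind of mismatch before installing, one could compare the Python version embedded in the package file name with the running interpreter. The sketch below is an assumption-laden helper, not a conda feature: the function names are made up, and it assumes the build string starts with `py` followed by the version, as in the file name from the previous comment.

```python
import re
import sys


def conda_build_python(filename):
    """Extract (major, minor) from a build string such as 'py3.7_cuda10.1...' or 'py37h...'.

    Returns None when no Python version is embedded in the file name.
    """
    match = re.search(r"-py(\d)\.?(\d+)", filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_compatible(filename, version_info=sys.version_info):
    """True if the package's Python version matches the given interpreter version."""
    built_for = conda_build_python(filename)
    return built_for is not None and built_for == tuple(version_info[:2])


package = "pytorch-1.6.0-py3.7_cuda10.1.243_cudnn7.6.3_0.tar.bz2"
print(conda_build_python(package), is_compatible(package, (3, 6, 9)))
```

For the package above the check fails against Python 3.6.9, which is consistent with the "No module named 'torch'" result.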


@Guptajakala Guptajakala commented Oct 30, 2020

@jayenashar
that would be great, thank you!


@jayenashar jayenashar commented Oct 30, 2020

@Guptajakala https://github.com/UNSWComputing/pytorch/releases/tag/v1.7.0


@Guptajakala Guptajakala commented Oct 30, 2020

@jayenashar awesome!


@igor-krawczuk igor-krawczuk commented Dec 26, 2020

@nelson-liu thank you very much for the wheels, the 1.6 wheel helped me a lot to get a project running just now

@jayenashar thanks to you as well for being so incredibly considerate! If you are still taking requests, could I ask you to please create a build for the following?

  • pytorch: the latest possible (1.7,1.8,1.9?)
  • k40c Driver Version: 435.21
  • conda python 3.8
  • cuda 10.1 according to nvidia-smi


@jayenashar jayenashar commented Dec 26, 2020

@igor-krawczuk https://github.com/UNSWComputing/pytorch/releases/tag/v1.7.0

sorry i didn't realise there is a 1.7.1. let me know if you absolutely need it.


@AlbertoZandara AlbertoZandara commented Jan 4, 2021

@jayenashar Thank you very much for your availability. I have an Nvidia 920m with compute capability 3.5; can I use your prebuilt packages?

  • pytorch: the latest possible (1.7,1.8,1.9?)
  • Nvidia 920m Driver Version: 419.67
  • conda python 3.8.5
  • cuda 10.1 according to nvidia-smi

PS: Could using Windows cause installation problems? I tried to install it, but after the installation it says
No module named 'torch'


@jayenashar jayenashar commented Jan 4, 2021

hi @AlbertoZandara any CUDA GPU will work with my packages. pytorch removes the old compute capabilities to save space, and that's why my packages are much larger.

unfortunately i don't have windows, so i am only building linux packages: #30532 (comment)


@sophiaas sophiaas commented Mar 24, 2021

Hi @jayenashar, any chance I can get a 1.8.0 wheel for the following?

PyTorch: 1.8.0
Python: 3.8
CUDA: 10.1
conda

Thanks so much for your help!


@jayenashar jayenashar commented Mar 24, 2021

hi @sophiaas unfortunately the way i used to build isn't working anymore. i'll try and get them for you another way, hopefully in the next week, but no promises.


@sophiaas sophiaas commented Mar 24, 2021

@jayenashar appreciate you!


@jayenashar jayenashar commented Mar 28, 2021

@sophiaas i regret to inform you that i can't figure it out. i can help you to build from source, maybe?


@geemk geemk commented Apr 22, 2021

Hi @jayenashar, any chance I can get a wheel for the most recent possible version of PyTorch (>= 1.5) for the following configuration?

Tesla K40c
PyTorch: >= 1.5
Python: 3.6.13
CUDA: 9.2
conda

Thank you so much, I'm becoming desperate to make this work.


@jayenashar jayenashar commented Apr 24, 2021


@beifengli beifengli commented May 29, 2021

@jayenashar Thank you very much for your availability. I have an Nvidia 820m whose compute capability is only 2.1. Could I get a wheel for the following?

  • PyTorch: latest possible
  • Python: >=3.6
  • CUDA: 9.1 or 9.0
  • conda

Thanks so much for your help!


@jayenashar jayenashar commented May 29, 2021

@beifengli looks like CUDA 9.1 was never supported and 9.0 was last supported in pytorch 1.1. Have you tried https://pytorch.org/get-started/previous-versions/#v110 ?


@beifengli beifengli commented May 29, 2021

@jramseyer I tried PyTorch 1.0 and 1.1, but it always shows me the following information:
image

I found that PyTorch has recommended a GPU compute capability >= 3.0 since version 0.2. But my GPU is too old and its compute capability is only 2.1.

image

my test code:

import torch

x = torch.rand(5, 3)
print(x)
print("available:", torch.cuda.is_available(), " device_count:", torch.cuda.device_count())
# current_device() raises when CUDA is unavailable, so only call it when a GPU is usable
if torch.cuda.is_available():
    print("current_device:", torch.cuda.current_device())

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.tensor([1, 2, 3], device=device)
# or
# x = torch.tensor([1, 2, 3]).to(device)
y = torch.tensor([1, 4, 9]).to(device)

print(x, y)
print(x + y)

I really want to use my GPU, I hope you can help, thank you very much.


@jayenashar jayenashar commented May 29, 2021

sorry i don't think i can help you. 0.1.12 seems like your best chance


@beifengli beifengli commented May 29, 2021

@jayenashar thank you very much. I will try it right away

ktagen-sudo added a commit to UNM-CARC/QuickBytes that referenced this issue Oct 9, 2021
PyTorch+GPU **silently** fails with the previous tutorial. A simple command such as "torch.cuda.is_available()" reports the GPU, and a command such as "y = torch.tensor([1,4,9]).to(device)" even "allocates" a tensor on the device. However, more advanced commands such as ".forward()", or even matrix operations, print an error similar to "RuntimeError: CUDA error: no kernel image is available for execution on the device".
The following command above fixes the issue. The pip command installed pytorch in this directory "/users/kfotso/.conda/envs/compat_gpu/lib/python3.7/site-packages/" .

**More info here**: --> https://blog.nelsonliu.me/2020/10/13/newer-pytorch-binaries-for-older-gpus/ and here pytorch/pytorch#30532 .
**Package can be found here**: --> https://github.com/nelson-liu/pytorch-manylinux-binaries/releases
I just downloaded it from here.

@sbrl sbrl commented Mar 3, 2022

Looks like your error message for this is buggy:

/home/USERNAME/.conda/envs/py38/lib/python3.8/site-packages/torch/cuda/__init__.py:120: UserWarning: 
    Found GPU%d %s which is of cuda capability %d.%d.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is %d.%d.

It would be helpful if you actually filled in the %d values here. Tested on a Tesla K40m on my University's HPC with Python 3.8.8 & PyTorch 1.10.2+cu102. I came through this StackOverflow answer which has over 2K views.

I strongly suggest updating your main website page (https://pytorch.org/get-started/locally/) to clearly state the minimum CUDA capability index supported. You link to https://developer.nvidia.com/cuda-zone, which gives people the false impression that any CUDA-capable Nvidia GPU will work, which is not the case.
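
For reference, this is what the warning would look like with the placeholders filled in. The sketch below only demonstrates `%`-style formatting of the template quoted above; it makes no claim about where the formatting step went missing in PyTorch. `format_old_gpu_warning` is a hypothetical helper, and the minimum capability of 3.7 is an illustrative value taken from earlier in this thread, not a confirmed figure for 1.10.2.

```python
OLD_GPU_TEMPLATE = (
    "Found GPU%d %s which is of cuda capability %d.%d.\n"
    "PyTorch no longer supports this GPU because it is too old.\n"
    "The minimum cuda capability supported by this library is %d.%d."
)


def format_old_gpu_warning(index, name, capability, minimum):
    """Fill every placeholder of the template with concrete values."""
    return OLD_GPU_TEMPLATE % (
        index,
        name,
        capability[0],
        capability[1],
        minimum[0],
        minimum[1],
    )


# Illustrative values only: device 0, a Tesla K40m (3.5), assumed minimum of 3.7.
print(format_old_gpu_warning(0, "Tesla K40m", (3, 5), (3, 7)))
```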


@weiaicunzai weiaicunzai commented Jun 11, 2022

Agreed, a CUDA capability matrix for PyTorch on the official website would be very helpful.
