pycuda uses unavailable compute capabilities on older versions of CUDA with new hardware #134

Open
mbrubake opened this issue Feb 8, 2017 · 4 comments


@mbrubake
Contributor

mbrubake commented Feb 8, 2017

pycuda defaults to asking nvcc to target the maximum compute capability of the GPU. This fails if the installed version of CUDA doesn't support that compute capability. For instance, if you're trying to use a GTX 1080 on CUDA 7.5, you get error messages like:

ExecError: error invoking 'nvcc --preprocess -arch sm_61 -Ifile.cu --compiler-options -P': [Errno 2] No such file or directory

The solution seems to be to use the highest compute capability that is supported by both the installed CUDA version and the card, but I'm not sure of the best way to do that.

@inducer
Owner

inducer commented Feb 8, 2017

You can force an arch by passing an argument to SourceModule: https://documen.tician.de/pycuda/driver.html#pycuda.compiler.SourceModule
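A minimal sketch of forcing the arch through the `arch` argument of `SourceModule`. The kernel, the helper name `compile_for_arch`, and the value `"sm_52"` are illustrative assumptions, not something taken from this thread; which arch value actually works for a given CUDA toolkit and GPU pair still has to be checked on the machine in question.

```python
# Trivial kernel used only to have something to compile (hypothetical example).
KERNEL_SOURCE = """
__global__ void double_it(float *a)
{
    a[threadIdx.x] *= 2.0f;
}
"""


def compile_for_arch(source, arch):
    """Compile `source` with an explicit arch instead of PyCUDA's default.

    `arch` is a string such as "sm_52" (placeholder value, an assumption).
    The imports are deferred so that this file can be imported on a machine
    without a GPU; calling the function still requires a working CUDA setup.
    """
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    from pycuda.compiler import SourceModule

    # `arch` is passed through to nvcc in place of the automatically
    # detected device capability.
    return SourceModule(source, arch=arch)


# Usage on a machine with a GPU (not executed here):
#     mod = compile_for_arch(KERNEL_SOURCE, "sm_52")
#     double_it = mod.get_function("double_it")
```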

I'd be happy to take a patch/pull request that reads an environment variable for this, e.g. PYCUDA_DEFAULT_JIT_ARCH.
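One way such an override could look, as a rough sketch only. The variable name is the one proposed in the comment above; the helper `default_jit_arch` and its fallback behavior are assumptions made for illustration, and nothing here claims that PyCUDA itself reads this variable.

```python
import os


def default_jit_arch(device_arch, environ=None):
    """Return the arch to pass to nvcc.

    `device_arch` is the arch derived from the GPU (e.g. "sm_61").
    If PYCUDA_DEFAULT_JIT_ARCH is set to a non-empty value, it wins;
    otherwise the device-derived arch is used unchanged.
    """
    if environ is None:
        environ = os.environ
    override = environ.get("PYCUDA_DEFAULT_JIT_ARCH", "").strip()
    return override or device_arch


# Example with explicit dictionaries standing in for the process environment:
no_override = default_jit_arch("sm_61", {})
with_override = default_jit_arch("sm_61", {"PYCUDA_DEFAULT_JIT_ARCH": "sm_52"})
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.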

@mbrubake
Contributor Author

mbrubake commented Feb 8, 2017

Is there an easy way to determine what the maximum supported compute capability of the linked version of CUDA is? Seems like we want to use an arch which is min(max supported by CUDA, max supported by device).

@inducer
Owner

inducer commented Feb 8, 2017

Short of parsing nvcc output, I don't think so.
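A sketch of what "parsing nvcc output" plus the min(...) rule from the previous comment might look like. Everything here is an assumption for illustration: the helper names are hypothetical, the sample text is made up rather than real nvcc output, and which nvcc invocation lists the supported architectures depends on the toolkit version. The device capability is taken as a `(major, minor)` tuple, the form returned by `pycuda.driver.Device.compute_capability()`.

```python
import re


def parse_sm_versions(nvcc_text):
    """Collect every sm_XX token found in some nvcc output, as sorted ints."""
    return sorted({int(v) for v in re.findall(r"\bsm_(\d+)\b", nvcc_text)})


def pick_arch(nvcc_text, device_cc):
    """Pick the highest arch that both the toolkit and the device support.

    `device_cc` is a (major, minor) tuple, e.g. (6, 1) for compute
    capability 6.1.
    """
    device = device_cc[0] * 10 + device_cc[1]
    usable = [v for v in parse_sm_versions(nvcc_text) if v <= device]
    if not usable:
        raise ValueError("no arch known to nvcc is usable on this device")
    return "sm_%d" % max(usable)


# Made-up text standing in for nvcc output from an older toolkit.
SAMPLE_NVCC_TEXT = "Allowed values: sm_30, sm_35, sm_50, sm_52"

newer_gpu = pick_arch(SAMPLE_NVCC_TEXT, (6, 1))  # toolkit is the limit
older_gpu = pick_arch(SAMPLE_NVCC_TEXT, (3, 5))  # device is the limit
```

The result is only a candidate value for the `arch` argument; whether code built for the lower arch then loads on the newer GPU is a separate question that this sketch does not answer.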

@vincefn
Contributor

vincefn commented Dec 12, 2017

Having an environment variable like PYCUDA_DEFAULT_JIT_ARCH would be very useful.

For custom kernels you can indeed use the arch argument, but this is not possible for ElementWise or Reduction kernels (and I guess Parallel Scan, but I do not use them).


3 participants