Fill() regression in 0.3.9 #19
Comments
I have the impression that 0.3.9 was a broken build that I uploaded to PyPI by accident, and PyPI doesn't allow me to overwrite a released package.
Can't reproduce so far.
Try deleting the file "__ptx_cache__.db". Chances are that it contains some broken PTX code.
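For anyone scripting the cleanup suggested above (for example in a notebook cell), here is a minimal sketch. It assumes the cache file sits in the current working directory, which this thread does not confirm; `remove_ptx_cache` is a hypothetical helper name, not part of ThrustRTC.

```python
from pathlib import Path

# File name quoted in the comment above.
CACHE_NAME = "__ptx_cache__.db"


def remove_ptx_cache(directory="."):
    """Delete the PTX cache file in `directory` if it exists.

    Returns True if a file was removed, False if none was found.
    The default directory (the working directory) is an assumption.
    """
    cache = Path(directory) / CACHE_NAME
    if cache.is_file():
        cache.unlink()
        return True
    return False


if __name__ == "__main__":
    print("removed" if remove_ptx_cache() else "no cache file found")
```

If the file lives elsewhere, pass that directory explicitly to `remove_ptx_cache`.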
Thanks, found the file here on Colab:
There should be a detailed failure message printed to stdout from the C code, but on Colab you can't see it. We should check whether it also reproduces in an environment where stdout is visible.
Got it by executing Python through the shell:
Full output:
What CUDA version and GPU type are being used here? In ThrustRTC, I use:
I'm not sure yet how to fix this. Most likely, the arch compute_70 is not supported by the CUDA (NVRTC) version being used here. A CUDA version above 10 should not have this issue.
Thanks.
from numba import cuda
cuda.detect()
gives
and
OK. That is compute_30 hitting the lower bound of CUDA 11.1. In a Kepler + CUDA 11.1 setup, basically any CUDA code that requires compilation won't work. For ThrustRTC, there are 2 different cases where you'll hit a compilation problem:
Thanks. There is no control over hardware in Colab. I will try getting a non-default CUDA version, but it's tricky to ask users to do that every time. Why does it work with ThrustRTC 0.3.8, then? Can ThrustRTC do any better at reporting this to users?
I think it was "working" in 0.3.8 only because some checks were missing. It might have silently given you incorrect results back then.
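As a rough illustration of the kind of up-front check discussed here, the sketch below compares a GPU's compute capability against the lowest architecture a CUDA major version can compile for. The table values are assumptions based on my reading of the CUDA release notes, not something ThrustRTC ships, and `can_compile_for` is a hypothetical helper.

```python
# Lowest compute capability each CUDA major version can compile for.
# ASSUMPTION: values taken from the CUDA release notes; verify before relying on them.
MIN_COMPUTE_CAPABILITY = {
    9: (3, 0),
    10: (3, 0),
    11: (3, 5),
    12: (5, 0),
}


def can_compile_for(cuda_major, compute_capability):
    """Return True if `cuda_major` can target the given (major, minor) capability.

    Only the lower bound is checked; GPUs newer than the toolkit are not covered.
    """
    minimum = MIN_COMPUTE_CAPABILITY.get(cuda_major)
    if minimum is None:
        raise ValueError(f"no data for CUDA {cuda_major}")
    return tuple(compute_capability) >= minimum


if __name__ == "__main__":
    # The Kepler (compute_30) + CUDA 11 combination from this thread.
    if not can_compile_for(11, (3, 0)):
        print("compute_30 is below the minimum architecture for CUDA 11")
```

With numba installed, the capability tuple could be read from `cuda.get_current_device().compute_capability` and passed in directly.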
Confirming it works on Colab after downgrading CUDA.
Downgrading CUDA does make the short example above work, but with the actual code from which I extracted the minimal reproducer, I now get (with CUDA 10.0):
Any hints?
Try export LD_LIBRARY_PATH="/usr/local/cuda/lib64"
In particular, if you downgraded CUDA with a hack, make sure that the libnvrtc-builtins.so on LD_LIBRARY_PATH is the one from CUDA 10, not the one from CUDA 11.
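To see which copies of libnvrtc-builtins.so are reachable through LD_LIBRARY_PATH, and in what order, something like the following sketch may help. `find_nvrtc_builtins` is a hypothetical helper; it only inspects LD_LIBRARY_PATH, whereas the dynamic loader also consults other locations, so treat the result as a hint rather than proof.

```python
import os
from pathlib import Path


def find_nvrtc_builtins(ld_library_path=None):
    """List libnvrtc-builtins.so* files found in LD_LIBRARY_PATH directories.

    Results follow the order of the path entries, so the first hit is the one
    most likely to be picked up. Entries that are empty or missing are skipped.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue
        directory = Path(entry)
        if directory.is_dir():
            hits.extend(sorted(str(p) for p in directory.glob("libnvrtc-builtins.so*")))
    return hits


if __name__ == "__main__":
    for path in find_nvrtc_builtins():
        print(path)
```

If the first path printed belongs to a CUDA 11 install, the CUDA 10 directory needs to come earlier in LD_LIBRARY_PATH.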
It works out of the box with the default Colab CUDA and ThrustRTC 0.3.15!
The following used to work up until 0.3.8 but fails with newer versions without any clear error message:
Here's an example on a fresh Google Colab GPU runtime:
OK on 0.3.8: