Cudasim acting differently than Cuda (when allocating) #6055
Comments
With the code as written, the compiler could optimize the local array away. Unfortunately there isn't a way to allocate global memory in a CUDA kernel - this is supported by CUDA C/C++, but has not yet been implemented in Numba.
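For reference, a minimal sketch of the usual workaround — allocating the global memory on the host and passing it into the kernel — is shown below. The kernel name (fill) and sizes are illustrative and not taken from this issue.

from numba import cuda
import numpy as np

@cuda.jit
def fill(out):
    # each thread writes into global memory that was allocated on the host;
    # the kernel itself never allocates a new global array
    i = cuda.grid(1)
    if i < out.shape[0]:
        out[i] = i

out = cuda.device_array(32, dtype=np.float64)  # global memory, allocated host-side
fill[1, 32](out)
print(out.copy_to_host())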
I got a minimal example:

import numpy as np

CUDASIM = False
#CUDASIM = True
if CUDASIM:
    import os
    os.environ['NUMBA_ENABLE_CUDASIM'] = '1'
from numba import cuda, float32, float64, int32

@cuda.jit(device=True)
def f(p):
    a = cuda.local.array(2, float64)
    a[0] = p[0]
    a[1] = p[1]
    b = 1.
    return a, b   ## Variation 1
    #return a     ## Variation 2

@cuda.jit
def kernel(ps, ret_a):
    bidx = cuda.threadIdx.x  # worker index
    if bidx >= len(ps): return
    # Algorithm
    a, b = f(ps[bidx])   ## Variation 1
    #a = f(ps[bidx])     ## Variation 2
    ret_a[bidx, 0] = a[0]
    ret_a[bidx, 1] = a[1]
    return

n = 10
ps = np.random.rand(n, 2)
a = np.zeros((n, 2))
kernel[10, 10](ps, a)
print(a)

With cuda it prints a list of [0. 0.] (the array is discarded), while with cudasim it gives out random numbers (the input values).
Are you using the latest version of Numba (0.50.1)? Variation 1 behaves similarly for me. Variation 2 seems to be a limitation of the current implementation, but that could probably be fixed without too much effort, given that returning an array as part of a tuple appears to be working.
Hello, I have Numba version 0.50.1 installed.
@mha-py Could you post the output of
Here is the output:
This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.
@mha-py quick question: is this still present in the current version (0.53.1)?
This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.
Just checked this with Numba 0.54 RC:
I also noticed that this is returning a local array from a function, which isn't expected to work (see also discussion in #7090), so I'm going to close this.
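For reference, a minimal sketch of a pattern that sidesteps returning a local array — the kernel allocates the array and the device function fills it in place — is given below. The names (fill_pair, etc.) are illustrative and not part of this issue.

from numba import cuda, float64

@cuda.jit(device=True)
def fill_pair(p, out):
    # hypothetical device function: writes into a caller-provided array
    # instead of returning a freshly created local array
    out[0] = p[0]
    out[1] = p[1]
    return 1.0  # scalars can still be returned normally

@cuda.jit
def kernel(ps, ret_a):
    i = cuda.grid(1)
    if i < ps.shape[0]:
        a = cuda.local.array(2, float64)  # allocated in the kernel, not in the device function
        b = fill_pair(ps[i], a)
        ret_a[i, 0] = a[0]
        ret_a[i, 1] = a[1]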
Hello!
I had a bug which I finally found, but while debugging I noticed that cudasim was acting differently (no bug) than cuda (bug).
It was something like:
With cuda, "a" seems to be discarded, but with cudasim it has its values which made debugging quite hard. Maybe in cudasim mode a could also be discarded?
I have another question: how do you allocate global memory in a cuda kernel? I only found passing the kernel an array which is allocated on the CPU side, but I can't find a counterpart to cuda.local.array.
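(For reference: the in-kernel allocators Numba provides are cuda.local.array for per-thread memory and cuda.shared.array for per-block memory; global memory has to be allocated on the host, e.g. with cuda.device_array or cuda.to_device, and passed in as an argument. A minimal sketch with illustrative names follows.)

import numpy as np
from numba import cuda, float64

@cuda.jit
def example(out):
    # per-thread scratch, private to each thread
    tmp = cuda.local.array(2, float64)
    # per-block scratch, shared by all threads in the block
    # (assumes a block size of at most 32 threads)
    buf = cuda.shared.array(32, float64)
    i = cuda.grid(1)
    if i < out.shape[0]:
        tmp[0] = out[i]
        buf[cuda.threadIdx.x] = tmp[0] + 1.0
        cuda.syncthreads()
        out[i] = buf[cuda.threadIdx.x]

d_out = cuda.to_device(np.zeros(32))  # global memory: allocated host-side, passed in
example[1, 32](d_out)
print(d_out.copy_to_host())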
And another suggestion: in cudasim you can use print(), while in cuda it throws an error. I think it would be convenient to simply ignore print() in pure cuda mode, because then there would be no need to comment out the print statements when switching between cuda and cudasim mode.
Thanks!