Slices of dynamic shared arrays all alias #5073
Comments
Thanks for the report. I'm of the view that the simulator behaviour is the expected behaviour too. Mark as a bug to fix?
Sounds good - hoping to look into this fairly soon, so I don't have to write "Don't use slices of dynamic shared memory because it doesn't work" in the docs :-)
One cause of the problem is that the shape and element count of a dynamic shared array are 0. If I manually hack the creation of the shared array such that its shape and element count are correct for this particular instance:
diff --git a/numba/cuda/cudaimpl.py b/numba/cuda/cudaimpl.py
index 31ede32..170b57f 100644
--- a/numba/cuda/cudaimpl.py
+++ b/numba/cuda/cudaimpl.py
@@ -629,6 +629,9 @@ def _generic_array(context, builder, shape, dtype, symbol_name, addrspace,
     elemcount = reduce(operator.mul, shape)
     lldtype = context.get_data_type(dtype)
     laryty = Type.array(lldtype, elemcount)
+    if elemcount == 0:
+        elemcount = 2
+        shape = (2,)
     if addrspace == nvvm.ADDRSPACE_LOCAL:
         # Special case local addrespace allocation to use alloca
then the correct output is produced in this case.
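To illustrate why a zero element count could make slices alias, here is a minimal sketch that uses Python's built-in `slice.indices` as a stand-in for bounds clamping. It is not Numba's lowering code, and `resolve_slice` is a hypothetical helper introduced only for this illustration.

```python
def resolve_slice(start, stop, length):
    """Clamp slice bounds to an array of `length` items.

    Returns (offset, count): where the slice starts in the underlying
    buffer and how many items it covers. Hypothetical helper, for
    illustration only.
    """
    lo, hi, _step = slice(start, stop).indices(length)
    return lo, max(hi - lo, 0)


# With a length of 0, both slices clamp to offset 0, so they would
# refer to the same location in the buffer.
zero_len = [resolve_slice(0, 1, 0), resolve_slice(1, 2, 0)]

# With a length of 2 (the hacked shape), the slices get distinct offsets.
two_len = [resolve_slice(0, 1, 2), resolve_slice(1, 2, 2)]

print(zero_len)
print(two_len)
```

Under this model, forcing the shape to `(2,)` gives each slice its own offset, which is consistent with the hack producing the correct output.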
The root of the issue was that computations of indices and slice bounds were incorrect because the shape of dynamic shared memory is generally (0,). To fix this, we compute the shape (1-D only) of dynamic shared arrays using the dynamic shared memory size and the itemsize of the type of the array when it is created. This is implemented by reading the special register %dynamic_smem_size - unfortunately NVVM doesn't provide an intrinsic for this, so we access it using inline assembly.
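The shape computation described above can be sketched in plain Python. This is only an illustration of the arithmetic, assuming the dynamic shared memory size is already known in bytes; `dynshared_shape` is a hypothetical name, and rounding down when the size is not a multiple of the itemsize is an assumption, not something stated in the fix.

```python
def dynshared_shape(dynsmem_bytes, itemsize):
    """Return the 1-D shape of a dynamic shared array.

    dynsmem_bytes: dynamic shared memory size in bytes (on a device this
                   would come from the %dynamic_smem_size register).
    itemsize:      size in bytes of one array element.
    Hypothetical helper; floor division is an assumption.
    """
    if itemsize <= 0:
        raise ValueError("itemsize must be positive")
    return (dynsmem_bytes // itemsize,)


# 8 bytes of dynamic shared memory viewed as 4-byte integers.
print(dynshared_shape(8, 4))
```

With a non-zero shape available when the array is created, index and slice bounds can be checked against the real extent rather than against `(0,)`.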
Fix #5073: Slices of dynamic shared memory all alias
The following:
when run on a CUDA device:
and on the simulator:
Although we haven't documented dynamic shared memory (yet - I discovered this while writing examples), I think the simulator behaviour should be considered correct, because it matches what a normal user would expect to happen.
The PTX (edited for clarity) clearly does nothing to differentiate the slices, but I have yet to trace the issue back any further: