Slices of dynamic shared arrays all alias #5073

Closed
gmarkall opened this issue Jan 13, 2020 · 3 comments

Labels
bug CUDA CUDA related issue/PR

Comments

@gmarkall
Member

The following:

from numba import cuda, int32
import numpy as np


@cuda.jit(debug=True)
def sm_slice(x):
    dynsmem = cuda.shared.array(0, dtype=int32)
    sm1 = dynsmem[0:1]
    sm2 = dynsmem[1:2]

    sm1[0] = 1
    sm2[0] = 2
    x[0] = sm1[0]
    x[1] = sm2[0]


arr = np.zeros(2, dtype=np.int32)
nshared = 2 * arr.dtype.itemsize
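# Launch configuration: 1 block, 1 thread, stream 0, nshared bytes of dynamic shared memory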
sm_slice[1, 1, 0, nshared](arr)
print(arr)

when run on a CUDA device:

$ python alias.py 
[2 2]

and on the simulator:

$ NUMBA_ENABLE_CUDASIM=1 python alias.py 
[1 2]

Although we haven't documented dynamic shared memory (yet - I discovered this while writing examples), I think the simulator behaviour should be considered correct, because it matches what a normal user would expect to happen.
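
For comparison, plain NumPy slices on the host behave the way the simulator does: non-overlapping slices are distinct views, and a write through one is not visible through the other. A minimal host-side analogue of the kernel body:

import numpy as np

# Non-overlapping slices of the same array are distinct views, so the two
# writes land at different elements.
buf = np.zeros(2, dtype=np.int32)
sm1 = buf[0:1]
sm2 = buf[1:2]
sm1[0] = 1
sm2[0] = 2
print(buf)  # [1 2]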

The PTX (edited for clarity) clearly does nothing to differentiate the slices, but I have yet to trace the issue back any further:

.visible .entry sm_slice(
        .param .u64 sm_slice_param_0,
        .param .u64 sm_slice_param_1,
        .param .u64 sm_slice_param_2,
        .param .u64 sm_slice_param_3,
        .param .u64 sm_slice_param_4,
        .param .u64 sm_slice_param_5,
        .param .u64 sm_slice_param_6
)
{
        .reg .b32       %r<5>;
        .reg .b64       %rd<5>;


        ld.param.u64    %rd1, [sm_slice_param_4];
        cvta.to.global.u64      %rd2, %rd1;
        mov.u64         %rd3, 0;
        mov.u64         %rd4, %rd3;
        // dbg 11
        mov.u32         %r1, 1;
        st.shared.u32   [_cudapy_smem_1], %r1;
        // dbg 12
        mov.u32         %r2, 2;
        st.shared.u32   [_cudapy_smem_1], %r2;
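        // NOTE: same shared-memory address as the store above, so the slices alias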
        // dbg 13
        ld.shared.u32   %r3, [_cudapy_smem_1];
        st.global.u32   [%rd2], %r3;
        // dbg 14
        ld.shared.u32   %r4, [_cudapy_smem_1];
        st.global.u32   [%rd2+4], %r4;
        ret;
}
@stuartarchibald added the needtriage and CUDA (CUDA related issue/PR) labels on Jan 13, 2020
@stuartarchibald
Contributor

Thanks for the report. I'm of the view that the simulator behaviour is the expected behaviour too. Mark as a bug to fix?

@gmarkall
Member Author

Sounds good - hoping to look into this fairly soon, so I don't have to write "Don't use slices of dynamic shared memory because it doesn't work" in the docs :-)

@gmarkall
Member Author

One cause of the problem is that the shape and element count of a dynamic shared array are both 0. If I manually hack the creation of the shared array so that its shape and element count are correct for this particular instance:

diff --git a/numba/cuda/cudaimpl.py b/numba/cuda/cudaimpl.py
index 31ede32..170b57f 100644
--- a/numba/cuda/cudaimpl.py
+++ b/numba/cuda/cudaimpl.py
@@ -629,6 +629,9 @@ def _generic_array(context, builder, shape, dtype, symbol_name, addrspace,
     elemcount = reduce(operator.mul, shape)
     lldtype = context.get_data_type(dtype)
     laryty = Type.array(lldtype, elemcount)
+    if elemcount == 0:
+        elemcount = 2
+        shape = (2,)
 
     if addrspace == nvvm.ADDRSPACE_LOCAL:
         # Special case local addrespace allocation to use alloca

then the correct output is produced in this case.
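
As an illustration of the same root cause (a sketch for comparison, not something verified in this thread): a statically sized shared array has a non-zero shape at compile time, so the equivalent kernel should produce the expected [1 2]:

from numba import cuda, int32
import numpy as np


@cuda.jit
def sm_slice_static(x):
    # Static shared array: the shape (2,) is known at compile time, so slice
    # bounds are computed from a non-zero shape.
    smem = cuda.shared.array(2, dtype=int32)
    sm1 = smem[0:1]
    sm2 = smem[1:2]

    sm1[0] = 1
    sm2[0] = 2
    x[0] = sm1[0]
    x[1] = sm2[0]


arr = np.zeros(2, dtype=np.int32)
sm_slice_static[1, 1](arr)
print(arr)  # expected: [1 2]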

gmarkall added a commit to gmarkall/numba that referenced this issue Jan 17, 2020
The root of the issue was that computations of indices and slice bounds
were incorrect because the shape of dynamic shared memory is generally
(0,). To fix this, we compute the shape (1-D only) of dynamic shared
arrays using the dynamic shared memory size and the itemsize of the type
of the array when it is created.

This is implemented by reading the special register %dynamic_smem_size -
unfortunately NVVM doesn't provide an intrinsic for this, so we access
it using inline assembly.
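
A minimal llvmlite sketch of the approach described above; the helper names, the "=r" constraint, and the 32-bit width are assumptions for illustration, not necessarily what the actual patch emits:

from llvmlite import ir

def read_dynamic_smem_size(builder):
    # Hypothetical helper: read the PTX special register %dynamic_smem_size
    # (dynamic shared memory size in bytes) via inline assembly, since NVVM
    # provides no intrinsic for it.
    fnty = ir.FunctionType(ir.IntType(32), [])
    get_size = ir.InlineAsm(fnty, "mov.u32 $0, %dynamic_smem_size;", "=r")
    return builder.call(get_size, [])

def dynamic_smem_nelem_1d(builder, itemsize):
    # The 1-D element count is the byte size divided by the dtype's itemsize.
    nbytes = read_dynamic_smem_size(builder)
    return builder.udiv(nbytes, ir.Constant(ir.IntType(32), itemsize))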
seibert added a commit that referenced this issue Feb 26, 2020
Fix #5073: Slices of dynamic shared memory all alias