Make setitem to fill all subscript positions even if size of values array is less than subscript size. #72

inferrna · 2015-01-13T09:15:32Z

It's needed to be more compatible with numpy. For example:

>>> sx = array.empty(queue, 4, np.int32)
>>> sx
array([0, 0, 0, 0], dtype=int32)
>>> idx = clrandom.rand(queue, 2, np.int32, luxury=None, a=0, b=4)
>>> idx
array([3, 1], dtype=int32)
>>> val = clrandom.rand(queue, 1, np.int32, luxury=None, a=0, b=99)
>>> val
array([48], dtype=int32)
>>> sx[idx] = val
>>> sx
array([ 0,  0,  0, 48], dtype=int32) #Only one position filled

>>> npsx = np.zeros(sx.shape, dtype=sx.dtype)
>>> npsx
array([0, 0, 0, 0], dtype=int32)
>>> npval = np.array([48], dtype=np.int32)
>>> npval
array([48], dtype=int32)
>>> npidx = np.array([3, 1], dtype=np.int32)
>>> npidx
array([3, 1], dtype=int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48,  0, 48], dtype=int32) #Both positions filled

Also, as I observed in other cases, numpy is using equation like values[n%values.size] (where n is current subscription position) to get value for current subscription index. For example:

>>> npval = np.array([48, 32], dtype=np.int32)
>>> npidx = np.array([1, 2, 3], dtype=np.int32)
>>> npsx = np.zeros((4,), dtype=np.int32)
>>> npsx
array([0, 0, 0, 0], dtype=int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48, 32, 48], dtype=int32)

>>> npidx = np.array([2, 3, 1], dtype=np.int32)
>>> npsx = np.zeros((4,), dtype=np.int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48, 48, 32], dtype=int32)

I think, it is possible to make custom kernels for cases like:

values is array with size 1
values is array with size less than subscription size
values is array with size bigger or equal than subscription size

The text was updated successfully, but these errors were encountered:

inducer · 2015-01-13T18:50:51Z

Good catch. Can you provide a patch?

inferrna · 2015-01-14T06:20:52Z

Works for me on a one-dimensional arrays, but I can't run all the tests provided, because on my laptop I have only two buggy opencl implementations - pocl and beignet. It anyway crashes with as without my changes.

diff --git a/pyopencl/array.py b/pyopencl/array.py
index 71ce3d8..5f4cddd 100644
--- a/pyopencl/array.py
+++ b/pyopencl/array.py
@@ -1998,19 +1998,22 @@ def multi_put(arrays, dest_indices, dest_shape=None, out=None, queue=None,

     chunk_size = _builtin_min(vec_count, 10)

-    def make_func_for_chunk_size(chunk_size):
+    vals_count = arrays[0].size
+
+    def make_func_for_chunk_size(chunk_size, vc=0):
         knl = elementwise.get_put_kernel(
                 context,
-                a_dtype, dest_indices.dtype, vec_count=chunk_size)
+                a_dtype, dest_indices.dtype, vec_count=chunk_size, vals_count=vc)
         return knl

-    knl = make_func_for_chunk_size(chunk_size)
+    knl = make_func_for_chunk_size(chunk_size, vals_count)

     for start_i in range(0, len(arrays), chunk_size):
         chunk_slice = slice(start_i, start_i+chunk_size)

+        vals_count = arrays[start_i].size
         if start_i + chunk_size > vec_count:
-            knl = make_func_for_chunk_size(vec_count-start_i)
+            knl = make_func_for_chunk_size(vec_count-start_i, vals_count)

         gs, ls = dest_indices.get_sizes(queue,
                 knl.get_work_group_info(
diff --git a/pyopencl/elementwise.py b/pyopencl/elementwise.py
index 398936f..a4973fd 100644
--- a/pyopencl/elementwise.py
+++ b/pyopencl/elementwise.py
@@ -413,7 +413,7 @@ def get_take_put_kernel(context, dtype, idx_dtype, with_offsets, vec_count=1):


 @context_dependent_memoize
-def get_put_kernel(context, dtype, idx_dtype, vec_count=1):
+def get_put_kernel(context, dtype, idx_dtype, vec_count=1, vals_count=0):
     ctx = {
             "idx_tp": dtype_to_ctype(idx_dtype),
             "tp": dtype_to_ctype(dtype),
@@ -429,9 +429,12 @@ def get_put_kernel(context, dtype, idx_dtype, vec_count=1):
                 for i in range(vec_count)
             ]

+    idxs = ('i','0','i%'+str(vals_count))
+    idx = idxs[min(vals_count, 2)]
+
     body = (
             "%(idx_tp)s dest_idx = gmem_dest_idx[i];\n" % ctx
-            + "\n".join("dest%d[dest_idx] = src%d[i];" % (i, i)
+            + "\n".join("dest%d[dest_idx] = src%d[%s];" % (i, i, idx)
                 for i in range(vec_count)))

     return get_elwise_kernel(context, args, body, name="put")

inducer · 2015-01-18T01:09:00Z

I'm not quite happy with this, for two reasons:

The code is a little wasteful in terms of read bandwidth. Broadcasting the scalar value to all threads through a kernel argument would likely cut bandwidth usage.
The code is a little hard to decipher, with the computed index into a pre-constructed tuple. I'd prefer somewhat plainer (if marginally slower) if-then code. I'm also not convinced that "modulo mode" is sufficiently general to merit being included in a general-purpose array toolkit.

inferrna · 2015-01-19T03:04:15Z

Broadcasting the scalar value to all threads through a kernel argument would likely cut bandwidth usage.

Agree with that. But hardcoding value into kernel needs recompilation each time. 2-step broadcasting over local data may be an solution. Is there for now any way to do this with pyopencl? If isn't, any extension for ElementwiseKernel to work with array and single value would be also helpful.

inducer · 2015-01-19T19:47:05Z

Passing a kernel argument seems like a good solution--you do have to bake the type into the kernel (but you have to do that anyway), but the can be provided at runtime.

hrfuller · 2016-11-17T23:49:09Z

@inducer Currently working on a solution to this. If we don't want to use "modulo mode" in the general case, does it make sense to throw an Exception if the indices array and the source array have different lengths (of course excluding the case where the source array has length == 1, in which case we fill all indices with the value)?. Behavior for a case with say, 2 values and 3 indices, would be undefined without "modulo mode" and could potentially seg fault or write garbage memory to the destination array (as it does in my build). However, I believe adding an Exception would not be backwards compatible.

inducer · 2016-11-19T01:00:00Z

Upon reconsidering this, I'm OK with (and will take patches for) either (a) matching numpy behavior exactly (but doing the broadcast via kernel arguments) or (b) only taking care of the same-size case and throwing an error in all other cases.

hrfuller mentioned this issue Dec 2, 2016

Multi __setitem__ numpy compliance #158

Merged

inducer closed this as completed in #158 Dec 7, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make setitem to fill all subscript positions even if size of values array is less than subscript size. #72

Make setitem to fill all subscript positions even if size of values array is less than subscript size. #72

inferrna commented Jan 13, 2015

inducer commented Jan 13, 2015

inferrna commented Jan 14, 2015

inducer commented Jan 18, 2015

inferrna commented Jan 19, 2015

inducer commented Jan 19, 2015

hrfuller commented Nov 17, 2016

inducer commented Nov 19, 2016

Make __setitem__ to fill all subscript positions even if size of values array is less than subscript size. #72

Make __setitem__ to fill all subscript positions even if size of values array is less than subscript size. #72

Comments

inferrna commented Jan 13, 2015

inducer commented Jan 13, 2015

inferrna commented Jan 14, 2015

inducer commented Jan 18, 2015

inferrna commented Jan 19, 2015

inducer commented Jan 19, 2015

hrfuller commented Nov 17, 2016

inducer commented Nov 19, 2016

Make setitem to fill all subscript positions even if size of values array is less than subscript size. #72

Make setitem to fill all subscript positions even if size of values array is less than subscript size. #72