Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make __setitem__ to fill all subscript positions even if size of values array is less than subscript size. #72

Closed
inferrna opened this issue Jan 13, 2015 · 7 comments

Comments

@inferrna
Copy link

It's needed to be more compatible with numpy. For example:

>>> sx = array.empty(queue, 4, np.int32)
>>> sx
array([0, 0, 0, 0], dtype=int32)
>>> idx = clrandom.rand(queue, 2, np.int32, luxury=None, a=0, b=4)
>>> idx
array([3, 1], dtype=int32)
>>> val = clrandom.rand(queue, 1, np.int32, luxury=None, a=0, b=99)
>>> val
array([48], dtype=int32)
>>> sx[idx] = val
>>> sx
array([ 0,  0,  0, 48], dtype=int32) #Only one position filled

>>> npsx = np.zeros(sx.shape, dtype=sx.dtype)
>>> npsx
array([0, 0, 0, 0], dtype=int32)
>>> npval = np.array([48], dtype=np.int32)
>>> npval
array([48], dtype=int32)
>>> npidx = np.array([3, 1], dtype=np.int32)
>>> npidx
array([3, 1], dtype=int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48,  0, 48], dtype=int32) #Both positions filled

Also, as I observed in other cases, numpy is using equation like values[n%values.size] (where n is current subscription position) to get value for current subscription index. For example:

>>> npval = np.array([48, 32], dtype=np.int32)
>>> npidx = np.array([1, 2, 3], dtype=np.int32)
>>> npsx = np.zeros((4,), dtype=np.int32)
>>> npsx
array([0, 0, 0, 0], dtype=int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48, 32, 48], dtype=int32)

>>> npidx = np.array([2, 3, 1], dtype=np.int32)
>>> npsx = np.zeros((4,), dtype=np.int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48, 48, 32], dtype=int32)

I think, it is possible to make custom kernels for cases like:

  • values is array with size 1
  • values is array with size less than subscription size
  • values is array with size bigger or equal than subscription size
@inducer
Copy link
Owner

inducer commented Jan 13, 2015

Good catch. Can you provide a patch?

@inferrna
Copy link
Author

Works for me on a one-dimensional arrays, but I can't run all the tests provided, because on my laptop I have only two buggy opencl implementations - pocl and beignet. It anyway crashes with as without my changes.

diff --git a/pyopencl/array.py b/pyopencl/array.py
index 71ce3d8..5f4cddd 100644
--- a/pyopencl/array.py
+++ b/pyopencl/array.py
@@ -1998,19 +1998,22 @@ def multi_put(arrays, dest_indices, dest_shape=None, out=None, queue=None,

     chunk_size = _builtin_min(vec_count, 10)

-    def make_func_for_chunk_size(chunk_size):
+    vals_count = arrays[0].size
+
+    def make_func_for_chunk_size(chunk_size, vc=0):
         knl = elementwise.get_put_kernel(
                 context,
-                a_dtype, dest_indices.dtype, vec_count=chunk_size)
+                a_dtype, dest_indices.dtype, vec_count=chunk_size, vals_count=vc)
         return knl

-    knl = make_func_for_chunk_size(chunk_size)
+    knl = make_func_for_chunk_size(chunk_size, vals_count)

     for start_i in range(0, len(arrays), chunk_size):
         chunk_slice = slice(start_i, start_i+chunk_size)

+        vals_count = arrays[start_i].size
         if start_i + chunk_size > vec_count:
-            knl = make_func_for_chunk_size(vec_count-start_i)
+            knl = make_func_for_chunk_size(vec_count-start_i, vals_count)

         gs, ls = dest_indices.get_sizes(queue,
                 knl.get_work_group_info(
diff --git a/pyopencl/elementwise.py b/pyopencl/elementwise.py
index 398936f..a4973fd 100644
--- a/pyopencl/elementwise.py
+++ b/pyopencl/elementwise.py
@@ -413,7 +413,7 @@ def get_take_put_kernel(context, dtype, idx_dtype, with_offsets, vec_count=1):


 @context_dependent_memoize
-def get_put_kernel(context, dtype, idx_dtype, vec_count=1):
+def get_put_kernel(context, dtype, idx_dtype, vec_count=1, vals_count=0):
     ctx = {
             "idx_tp": dtype_to_ctype(idx_dtype),
             "tp": dtype_to_ctype(dtype),
@@ -429,9 +429,12 @@ def get_put_kernel(context, dtype, idx_dtype, vec_count=1):
                 for i in range(vec_count)
             ]

+    idxs = ('i','0','i%'+str(vals_count))
+    idx = idxs[min(vals_count, 2)]
+
     body = (
             "%(idx_tp)s dest_idx = gmem_dest_idx[i];\n" % ctx
-            + "\n".join("dest%d[dest_idx] = src%d[i];" % (i, i)
+            + "\n".join("dest%d[dest_idx] = src%d[%s];" % (i, i, idx)
                 for i in range(vec_count)))

     return get_elwise_kernel(context, args, body, name="put")

@inducer
Copy link
Owner

inducer commented Jan 18, 2015

I'm not quite happy with this, for two reasons:

  • The code is a little wasteful in terms of read bandwidth. Broadcasting the scalar value to all threads through a kernel argument would likely cut bandwidth usage.
  • The code is a little hard to decipher, with the computed index into a pre-constructed tuple. I'd prefer somewhat plainer (if marginally slower) if-then code. I'm also not convinced that "modulo mode" is sufficiently general to merit being included in a general-purpose array toolkit.

@inferrna
Copy link
Author

Broadcasting the scalar value to all threads through a kernel argument would likely cut bandwidth usage.

Agree with that. But hardcoding value into kernel needs recompilation each time. 2-step broadcasting over local data may be an solution. Is there for now any way to do this with pyopencl? If isn't, any extension for ElementwiseKernel to work with array and single value would be also helpful.

@inducer
Copy link
Owner

inducer commented Jan 19, 2015

Passing a kernel argument seems like a good solution--you do have to bake the type into the kernel (but you have to do that anyway), but the can be provided at runtime.

@hrfuller
Copy link
Contributor

@inducer Currently working on a solution to this. If we don't want to use "modulo mode" in the general case, does it make sense to throw an Exception if the indices array and the source array have different lengths (of course excluding the case where the source array has length == 1, in which case we fill all indices with the value)?. Behavior for a case with say, 2 values and 3 indices, would be undefined without "modulo mode" and could potentially seg fault or write garbage memory to the destination array (as it does in my build). However, I believe adding an Exception would not be backwards compatible.

@inducer
Copy link
Owner

inducer commented Nov 19, 2016

Upon reconsidering this, I'm OK with (and will take patches for) either (a) matching numpy behavior exactly (but doing the broadcast via kernel arguments) or (b) only taking care of the same-size case and throwing an error in all other cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants