
copy() for some discontiguous arrays; __setitem__; get2() provisional… #76

Merged · 8 commits · Jul 12, 2015

Conversation

davidweichiang (Contributor)

Adds a private function _copy() that copies a GPUArray or ndarray into another GPUArray or ndarray. The two arrays must have the same shape and dtype, must be <= 3d, and must have the same order and be contiguous along the minor axis, but otherwise they don't have to have the same strides. Sorry that it's verbose; I can compact it later if it's decided to keep it.

This function is used in copy() and __setitem__(), and in a dumbly-named get2() method which doesn't automatically reshape arrays with the same size but different shapes. I wasn't sure what the right thing to do here was.

There isn't an asynchronous version because I'm not familiar yet with how that works.


def _copy(dst, src):
    """Copy the contents of src into dst."""
    if not (isinstance(src, GPUArray) or isinstance(src, np.ndarray)):
inducer (Owner) commented on the diff:

isinstance(src, (GPUArray, np.ndarray)) is preferable to or'd-together single isinstance() calls, since isinstance is kind of a slow operation.

That said, the host side should not really be limited to numpy arrays; in principle, anything with a buffer interface is OK.
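A tiny illustration of both points (the checker names are hypothetical, not from the patch):

import numpy as np
from pycuda.gpuarray import GPUArray

def _check_src(src):
    # one isinstance() call with a tuple of types
    if not isinstance(src, (GPUArray, np.ndarray)):
        raise TypeError("src must be a GPUArray or ndarray")

def _has_buffer_interface(obj):
    # hypothetical check for the more general case: anything that
    # exposes the buffer protocol could serve as the host-side array
    try:
        memoryview(obj)
        return True
    except TypeError:
        return False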

inducer (Owner) commented Jul 11, 2015

One way in which this could be more general is that it could just look for sets of contiguous dimensions and batch those up. It could then use CUDA's support for noncontiguity for those dimensions where it's needed. That's sort of the more general statement of what I said in the line comment.
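To illustrate, a minimal sketch of that dimension collapsing for C-ordered strides (collapse_contiguous_dims is a hypothetical helper, not part of pycuda):

import numpy as np

def collapse_contiguous_dims(shape, strides):
    """Merge adjacent dimensions that are contiguous with one another
    (C order assumed), returning an equivalent shape/strides pair with
    as few dimensions as possible."""
    new_shape = [shape[-1]]
    new_strides = [strides[-1]]
    for dim, stride in zip(shape[-2::-1], strides[-2::-1]):
        if stride == new_strides[0] * new_shape[0]:
            # this axis steps exactly over the inner block: merge them
            new_shape[0] *= dim
        else:
            new_shape.insert(0, dim)
            new_strides.insert(0, stride)
    return tuple(new_shape), tuple(new_strides)

a = np.zeros((3, 4, 5))[:, :2, :]   # shape (3, 2, 5), strides (160, 40, 8)
print(collapse_contiguous_dims(a.shape, a.strides))
# ((3, 10), (160, 8)): one strided dim plus a contiguous inner block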

Overall, I'd be happy to merge this with the following issues resolved:

  • I'd like to have some tests, to ensure that this stuff actually works, and to put me in a position to maintain it.
  • I don't think I like the presence of get2(); I'd like this to be part of the main get() codepath. (Or maybe you can explain the use case, and we can try to come up with a better name.)
  • As it stands, I think this breaks compatibility for high-rank contiguous arrays, for which __setitem__ no longer works.

davidweichiang (Author)

Sure, I can address these issues. If get and get2 were merged, the combined method would work under the condition (self and ary are both contiguous BUT not necessarily the same shape) OR (self and ary have the same shape and ndim <= 3 BUT are not necessarily contiguous). That seemed odd to me, so I kept them separate. I can think of a few options:

  • Keep them under separate names (get and get_discontig, perhaps)
  • A single method that works under the condition written above.
  • A single method that works under the condition (self and ary have the same shape and order) AND ((self and ary are both contiguous) OR (self and ary have ndim <= 3)). But this would break the current reshaping behavior of get.
  • Your extension of collapsing contiguous dimensions would meld both cases into one condition: self and ary have the same shape and order, and together they have <= 2 discontiguous dimensions.

Personally, I would prefer the last option.
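For concreteness, the fourth option's precondition might look roughly like this, reusing a dimension-collapsing helper like the one sketched above (names are hypothetical, and this glosses over checking that the two arrays' collapsed shapes actually agree):

def _can_copy(dst, src):
    # same shape and dtype are required up front
    if dst.shape != src.shape or dst.dtype != src.dtype:
        return False
    # collapse whatever runs of dimensions are already contiguous...
    dst_shape, _ = collapse_contiguous_dims(dst.shape, dst.strides)
    src_shape, _ = collapse_contiguous_dims(src.shape, src.strides)
    # ...and allow at most two strided dimensions plus one contiguous
    # inner block, which CUDA's 2D/3D memcpy variants can handle
    return max(len(dst_shape), len(src_shape)) <= 3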

inducer (Owner) commented Jul 11, 2015

Agree, I like the last option best as well. FWIW, if .get() allows reshaping (it does?), that's by accident for lack of checking. That's a behavior I'd be happy to deprecate.

davidweichiang (Author)

Yes, it's in the documentation :) I'll look at implementing the last option.

inducer (Owner) commented Jul 11, 2015

That seems like a terrible idea. I've deprecated that in 8a764bf.

davidweichiang (Author)

OK, the last two commits implement the third and fourth ideas above, respectively. I'm not sure I totally understand buffers yet, and will add a couple of line-by-line comments where I have doubts.

# so that the order is neither Fortran nor C.
# So, we attempt to get a contiguous view of dst.
dst_flat = dst.ravel(order='K')
assert np.byte_bounds(dst) == np.byte_bounds(dst_flat)
davidweichiang (Author) commented on the diff:

I can't guarantee this assertion will always hold. It seems to hold in the case that I'm concerned about. The other solution would be for memcpy_dtoh to allow a non-contiguous buffer, but that doesn't seem to make sense.
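A small numpy demonstration of why the assertion can fail for some stride patterns (illustrative only, not part of the patch):

import numpy as np

a = np.arange(60, dtype=np.float64).reshape(3, 4, 5)
b = a[:, :2, :]                    # noncontiguous view: no flat view exists
flat = b.ravel(order='K')          # so numpy silently makes a copy here
print(np.byte_bounds(b) == np.byte_bounds(flat))                # False
print(np.byte_bounds(a) == np.byte_bounds(a.ravel(order='K')))  # True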

inducer (Owner) commented Jul 12, 2015

Looks great, thank you for your contribution. Ready to merge as far as I'm concerned. Any lingering concerns on your end?

davidweichiang (Author)

Only the question above about Memcpy*async. Thanks!
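For reference, an asynchronous device-to-host copy with pycuda's driver API might look roughly like this (src_gpu is an assumed contiguous GPUArray; the host buffer must be page-locked for the transfer to actually be asynchronous):

import pycuda.driver as drv

stream = drv.Stream()
# pinned host memory is required for truly asynchronous transfers
dst = drv.pagelocked_empty(src_gpu.shape, src_gpu.dtype)
drv.memcpy_dtoh_async(dst, src_gpu.gpudata, stream)
stream.synchronize()  # wait for the copy to finish before reading dst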

inducer added a commit that referenced this pull request Jul 12, 2015
copy() for some discontiguous arrays; __setitem__; get2() provisional…
inducer merged commit fde69b0 into inducer:master on Jul 12, 2015
inducer (Owner) commented Jul 12, 2015

Thanks again!

inducer (Owner) commented Jul 12, 2015

Hmm. Both of your tests fail on my CI machine:

____________________________ TestGPUArray.test_copy ____________________________
Traceback (most recent call last):
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/test/test_gpuarray.py", line 983, in test_copy
    a_gpu = curand((3,3))
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/.env/local/lib/python2.7/site-packages/pycuda-2015.1.2-py2.7-linux-x86_64.egg/pycuda/curandom.py", line 195, in rand
    result = GPUArray(shape, dtype)
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/.env/local/lib/python2.7/site-packages/pycuda-2015.1.2-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 202, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
LogicError: cuMemAlloc failed: invalid device context
__________________________ TestGPUArray.test_get_set ___________________________
Traceback (most recent call last):
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/test/test_gpuarray.py", line 1011, in test_get_set
    a_gpu = gpuarray.to_gpu(a)
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/.env/local/lib/python2.7/site-packages/pycuda-2015.1.2-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 951, in to_gpu
    result = GPUArray(ary.shape, ary.dtype, allocator, strides=_compact_strides(ary))
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/.env/local/lib/python2.7/site-packages/pycuda-2015.1.2-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 202, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
LogicError: cuMemAlloc failed: invalid device context

inducer (Owner) commented Jul 12, 2015

That's a K20, FWIW.

davidweichiang (Author)

I'm confused because the first test is raising an exception before it reaches any of the new code...

inducer (Owner) commented Jul 13, 2015

Mystery solved. The tests were missing @mark_cuda_test. Among other things, this accomplishes proper context set-up/tear-down for the tests.
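For context, this is roughly how the decorator is applied in pycuda's test suite (test body elided):

from pycuda.tools import mark_cuda_test

class TestGPUArray:
    @mark_cuda_test
    def test_copy(self):
        # mark_cuda_test ensures a CUDA context exists for the
        # duration of the test and is torn down cleanly afterwards
        pass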

untom mentioned this pull request on Jul 29, 2015