
copy() for some discontiguous arrays; __setitem__; get2() provisional… #76

Merged · 8 commits · Jul 12, 2015

Conversation

davidweichiang (Contributor)

Adds a private function _copy() that copies a GPUArray or ndarray into another GPUArray or ndarray. The two arrays must have the same shape and dtype, must be <= 3d, and must have the same order and be contiguous along the minor axis, but otherwise they don't have to have the same strides. Sorry that it's verbose; I can compact it later if it's decided to keep it.

This function is used in copy() and __setitem__(), and in a dumbly-named get2() method which doesn't automatically reshape arrays with the same size but different shapes. I wasn't sure what the right thing to do here was.

There isn't an asynchronous version because I'm not familiar yet with how that works.


def _copy(dst, src):
    """Copy the contents of src into dst."""
    if not (isinstance(src, GPUArray) or isinstance(src, np.ndarray)):
inducer (Owner) commented on the diff:

isinstance(src, (GPUArray, np.ndarray)) is preferable to or'd-together single isinstance() calls, since isinstance is kind of a slow operation.

That said, the host side should not really be limited to numpy arrays; in principle, anything with a buffer interface is OK.
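A tiny illustration of both points (the checker names are hypothetical, not from the patch):

import numpy as np
from pycuda.gpuarray import GPUArray

def _check_src(src):
    # one isinstance() call with a tuple of types
    if not isinstance(src, (GPUArray, np.ndarray)):
        raise TypeError("src must be a GPUArray or ndarray")

def _has_buffer_interface(obj):
    # hypothetical check for the more general case: anything that
    # exposes the buffer protocol could serve as the host-side array
    try:
        memoryview(obj)
        return True
    except TypeError:
        return False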

inducer (Owner) commented Jul 11, 2015

One way in which this could be more general is that it could just look for sets of contiguous dimensions and batch those up. It could then use CUDA's support for noncontiguity for those dimensions where it's needed. That's sort of the more general statement of what I said in the line comment.
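To illustrate, a minimal sketch of that dimension collapsing for C-ordered strides (collapse_contiguous_dims is a hypothetical helper, not part of pycuda):

import numpy as np

def collapse_contiguous_dims(shape, strides):
    """Merge adjacent dimensions that are contiguous with one another
    (C order assumed), returning an equivalent shape/strides pair with
    as few dimensions as possible."""
    new_shape = [shape[-1]]
    new_strides = [strides[-1]]
    for dim, stride in zip(shape[-2::-1], strides[-2::-1]):
        if stride == new_strides[0] * new_shape[0]:
            # this axis steps exactly over the inner block: merge them
            new_shape[0] *= dim
        else:
            new_shape.insert(0, dim)
            new_strides.insert(0, stride)
    return tuple(new_shape), tuple(new_strides)

a = np.zeros((3, 4, 5))[:, :2, :]   # shape (3, 2, 5), strides (160, 40, 8)
print(collapse_contiguous_dims(a.shape, a.strides))
# ((3, 10), (160, 8)): one strided dim plus a contiguous inner block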

Overall, I'd be happy to merge this with the following issues resolved:

  • I'd like to have some tests, to ensure that this stuff actually works, and to put me in a position to maintain it.
  • I don't think I like the presence of get2(); I'd like this to be part of the main get() codepath. (Or maybe you can explain the use case, and we can try to come up with a better name.)
  • As it stands, I think this breaks compatibility for high-rank contiguous arrays, for which __setitem__ no longer works.

davidweichiang (Author)

Sure, I can address these issues. If get and get2 were merged, the combined method would work under the condition (self and ary are both contiguous BUT not necessarily the same shape) OR (self and ary have the same shape and ndim <= 3 BUT are not necessarily contiguous). That seemed odd to me, so I kept them separate. I can think of a few options:

  • Keep them under separate names (get and get_discontig, perhaps)
  • A single method that works under the condition written above.
  • A single method that works under the condition (self and ary have the same shape and order) AND ((self and ary are both contiguous) OR (self and ary have ndim <= 3)). But this would break the current reshaping behavior of get.
  • Your extension of collapsing contiguous dimensions would meld both cases into one condition: self and ary have the same shape and order, and together they have <= 2 discontiguous dimensions.

Personally, I would prefer the last option.
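For concreteness, the fourth option's precondition might look roughly like this, reusing a dimension-collapsing helper like the one sketched above (names are hypothetical, and this glosses over checking that the two arrays' collapsed shapes actually agree):

def _can_copy(dst, src):
    # same shape and dtype are required up front
    if dst.shape != src.shape or dst.dtype != src.dtype:
        return False
    # collapse whatever runs of dimensions are already contiguous...
    dst_shape, _ = collapse_contiguous_dims(dst.shape, dst.strides)
    src_shape, _ = collapse_contiguous_dims(src.shape, src.strides)
    # ...and allow at most two strided dimensions plus one contiguous
    # inner block, which CUDA's 2D/3D memcpy variants can handle
    return max(len(dst_shape), len(src_shape)) <= 3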

inducer (Owner) commented Jul 11, 2015

Agree, I like the last option best as well. FWIW, if .get() allows reshaping (it does?), that's by accident for lack of checking. That's a behavior I'd be happy to deprecate.

davidweichiang (Author)

Yes, it's in the documentation :) I'll look at implementing the last option.

inducer (Owner) commented Jul 11, 2015

That seems like a terrible idea. I've deprecated that in 8a764bf.

davidweichiang (Author)

OK, the last two commits implement the third and fourth ideas above, respectively. I'm not sure I totally understand buffers yet, and will add a couple of line-by-line comments where I have doubts.

# so that the order is neither Fortran nor C.
# So, we attempt to get a contiguous view of dst.
dst_flat = dst.ravel(order='K')
assert np.byte_bounds(dst) == np.byte_bounds(dst_flat)
davidweichiang (Author) commented on the diff:

I can't guarantee this assertion will always hold. It seems to hold in the case that I'm concerned about. The other solution would be for memcpy_dtoh to allow a non-contiguous buffer, but that doesn't seem to make sense.
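A small numpy demonstration of why the assertion can fail for some stride patterns (illustrative only, not part of the patch):

import numpy as np

a = np.arange(60, dtype=np.float64).reshape(3, 4, 5)
b = a[:, :2, :]                    # noncontiguous view: no flat view exists
flat = b.ravel(order='K')          # so numpy silently makes a copy here
print(np.byte_bounds(b) == np.byte_bounds(flat))                # False
print(np.byte_bounds(a) == np.byte_bounds(a.ravel(order='K')))  # True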

inducer (Owner) commented Jul 12, 2015

Looks great, thank you for your contribution. Ready to merge as far as I'm concerned. Any lingering concerns on your end?

davidweichiang (Author)

Only the question above about Memcpy*async. Thanks!
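For reference, an asynchronous device-to-host copy with pycuda's driver API might look roughly like this (src_gpu is an assumed contiguous GPUArray; the host buffer must be page-locked for the transfer to actually be asynchronous):

import pycuda.driver as drv

stream = drv.Stream()
# pinned host memory is required for truly asynchronous transfers
dst = drv.pagelocked_empty(src_gpu.shape, src_gpu.dtype)
drv.memcpy_dtoh_async(dst, src_gpu.gpudata, stream)
stream.synchronize()  # wait for the copy to finish before reading dst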

inducer added a commit that referenced this pull request Jul 12, 2015
copy() for some discontiguous arrays; __setitem__; get2() provisional…
inducer merged commit fde69b0 into inducer:master on Jul 12, 2015
inducer (Owner) commented Jul 12, 2015

Thanks again!

inducer (Owner) commented Jul 12, 2015

Hmm. Both of your tests fail on my CI machine:

____________________________ TestGPUArray.test_copy ____________________________
Traceback (most recent call last):
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/test/test_gpuarray.py", line 983, in test_copy
    a_gpu = curand((3,3))
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/.env/local/lib/python2.7/site-packages/pycuda-2015.1.2-py2.7-linux-x86_64.egg/pycuda/curandom.py", line 195, in rand
    result = GPUArray(shape, dtype)
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/.env/local/lib/python2.7/site-packages/pycuda-2015.1.2-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 202, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
LogicError: cuMemAlloc failed: invalid device context
__________________________ TestGPUArray.test_get_set ___________________________
Traceback (most recent call last):
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/test/test_gpuarray.py", line 1011, in test_get_set
    a_gpu = gpuarray.to_gpu(a)
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/.env/local/lib/python2.7/site-packages/pycuda-2015.1.2-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 951, in to_gpu
    result = GPUArray(ary.shape, ary.dtype, allocator, strides=_compact_strides(ary))
  File "/home/gitlab_ci_multi_runner/builds/5b708ae5/0/inducer/pycuda/.env/local/lib/python2.7/site-packages/pycuda-2015.1.2-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 202, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
LogicError: cuMemAlloc failed: invalid device context

inducer (Owner) commented Jul 12, 2015

That's a K20, FWIW.

davidweichiang (Author)

I'm confused because the first test is raising an exception before it reaches any of the new code...

inducer (Owner) commented Jul 13, 2015

Mystery solved. The tests were missing @mark_cuda_test. Among other things, this accomplishes proper context set-up/tear-down for the tests.
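For context, this is roughly how the decorator is applied in pycuda's test suite (test body elided):

from pycuda.tools import mark_cuda_test

class TestGPUArray:
    @mark_cuda_test
    def test_copy(self):
        # mark_cuda_test ensures a CUDA context exists for the
        # duration of the test and is torn down cleanly afterwards
        pass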

untom mentioned this pull request on Jul 29, 2015