copy() for some discontiguous arrays; __setitem__; get2() provisional… #76
…ly is like get() for same-shaped arrays
def _copy(dst, src):
    """Copy the contents of src into dst."""
    if not (isinstance(src, GPUArray) or isinstance(src, np.ndarray)):
isinstance(src, (GPUArray, np.ndarray)) is preferable to or-ing together single isinstance() calls, since isinstance is kind of a slow operation.
That said, the host side should not really be limited to numpy arrays--in principle anything with a buffer interface is OK.
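A minimal sketch of both suggestions, assuming stand-in classes: `GPUArray` and `HostArray` below are hypothetical placeholders (not the real PyCUDA or numpy types), and `supports_buffer` is an illustrative helper name. The only real API relied on is the built-in `memoryview`, which raises `TypeError` for objects without a buffer interface.

```python
from array import array


class GPUArray:
    """Hypothetical stand-in for the device array type."""


class HostArray:
    """Hypothetical stand-in for np.ndarray."""


def is_array_like(obj):
    # One isinstance() call with a tuple of types instead of two
    # calls joined by `or`.
    return isinstance(obj, (GPUArray, HostArray))


def supports_buffer(obj):
    # Accept any host object that exposes the buffer interface,
    # not only one specific array type.
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True
```

With this check, a `bytearray` or an `array.array` would be accepted on the host side, while a plain list would not.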
One way in which this could be more general is that it could just look for sets of contiguous dimensions and batch those up. It could then use CUDA's support for noncontiguity for those dimensions where it's needed. That's sort of the more general statement of what I said in the line comment. Overall, I'd be happy to merge this with the following issues resolved:
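The idea of batching contiguous dimensions could look roughly like the following. This is only a sketch, not code from the PR: `collapse_contiguous_dims` is a hypothetical helper, and it assumes axes are listed outermost first with strides given in bytes.

```python
def collapse_contiguous_dims(shape, strides):
    """Merge adjacent axes that are contiguous relative to each other.

    Returns a list of (extent, stride_in_bytes) pairs, outermost first.
    Fewer pairs mean fewer dimensions that need strided-copy support.
    """
    merged = []
    for extent, stride in zip(shape, strides):
        if extent == 1:
            # Axes of length 1 never affect the memory layout.
            continue
        if merged and merged[-1][1] == extent * stride:
            # Stepping once along the outer axis equals walking the
            # whole inner axis, so the two behave as one longer axis.
            outer_extent, _ = merged[-1]
            merged[-1] = (outer_extent * extent, stride)
        else:
            merged.append((extent, stride))
    return merged
```

For a fully C-contiguous float64 array of shape (2, 3, 4) this collapses to a single axis, i.e. one flat copy. If only the rows are padded, two axes remain, which corresponds to a 2D pitched copy.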
Sure, I can address these issues. If get and get2 were to be merged, it would work under the condition (self and ary are both contiguous BUT not necessarily the same shape) OR (self and ary are the same shape and ndim <= 3 BUT not necessarily contiguous). That seemed odd to me, so I kept it separate. I can think of a few options:
Personally, I would prefer the last option.
Agree, I like the last option best as well. FWIW, if
Yes, it's in the documentation :) I'll look at implementing the last option.
That seems like a terrible idea. I've deprecated that in 8a764bf.
OK, the last two commits implement the 3rd and 4th idea above, respectively. I am not sure I understand buffers totally, and will add a couple of line-by-line comments where I have doubts.
# so that the order is neither Fortran nor C.
# So, we attempt to get a contiguous view of dst.
dst_flat = dst.ravel(order='K')
assert np.byte_bounds(dst) == np.byte_bounds(dst_flat)
I can't guarantee this assertion will always be true. It seems to be true in the case that I'm concerned about. The other solution would be for memcpy_dtoh to allow a non-contiguous buffer, but that doesn't seem to make sense.
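To illustrate when the flattened array is actually a view, here is a small numpy-only sketch. It uses `np.shares_memory` as the check, which is a different test from the `byte_bounds` comparison in the diff, and `flat_view_or_none` is a hypothetical helper name.

```python
import numpy as np


def flat_view_or_none(a):
    """Return a flat view of `a` in memory order, or None if numpy had to copy."""
    flat = a.ravel(order='K')
    # ravel() only copies when it must; a copy does not share memory
    # with the original array.
    return flat if np.shares_memory(flat, a) else None


# Axes permuted so the layout is neither C nor Fortran order,
# yet the elements still occupy one dense block of memory.
permuted = np.zeros((2, 3, 4)).transpose(1, 0, 2)

# A strided slice leaves gaps in memory, so no flat view exists.
sliced = np.zeros((4, 6))[:, ::2]
```

Here `flat_view_or_none(permuted)` gives a view that can be handed to a contiguous copy, while `flat_view_or_none(sliced)` gives None, which is the case the assertion is meant to catch.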
Looks great, thank you for your contribution. Ready to merge as far as I'm concerned. Any lingering concerns on your end?
Only the question above about Memcpy*async. Thanks!
Thanks again!
Hmm. Both of your tests fail on my CI machine:
That's a K20, FWIW.
I'm confused because the first test is raising an exception before it reaches any of the new code...
Mystery solved. The tests were missing
Adds a private function _copy() that copies a GPUArray or ndarray into another GPUArray or ndarray. The two arrays must have the same shape and dtype. They must have at most 3 dimensions. They must have the same order and must be contiguous along the minor axis, but otherwise don't have to have the same strides. Sorry that it's verbose; I can compact it later if it's decided to keep it.
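The constraints above can be written down as a predicate. This is a rough sketch rather than the code in the PR: `can_strided_copy` is a hypothetical name, it works on plain shape and stride tuples (strides in bytes), and it assumes the caller has already checked that both dtypes match, so only the shared item size is passed in.

```python
def can_strided_copy(dst_shape, dst_strides, src_shape, src_strides, itemsize):
    """Check the layout rules described for _copy().

    Assumes both arrays already have the same dtype of size `itemsize`.
    """
    if tuple(dst_shape) != tuple(src_shape):
        return False
    ndim = len(dst_shape)
    if ndim > 3:
        return False
    if ndim == 0:
        return True
    # "Same order": sorting the axes by stride gives the same ranking
    # for both arrays (fastest-varying axis first).
    dst_order = sorted(range(ndim), key=lambda i: dst_strides[i])
    src_order = sorted(range(ndim), key=lambda i: src_strides[i])
    if dst_order != src_order:
        return False
    # Contiguous along the minor axis: neighbouring elements on the
    # fastest-varying axis are exactly one item apart in both arrays.
    minor = dst_order[0]
    return dst_strides[minor] == itemsize and src_strides[minor] == itemsize
```

For example, a float64 array of shape (4, 6) with padded rows can be copied into a dense one, but a copy between a C-ordered and a Fortran-ordered array is rejected.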
This function is used in copy() and __setitem__(), and in a dumbly-named get2() method which doesn't automatically reshape arrays with the same size but different shape. I wasn't sure what the right thing to do here was.
There isn't an asynchronous version because I'm not familiar yet with how that works.