Fix for non-contiguous strides #4736
rerun tests
else:
    cupy_data = cp.array(data, copy=True, order='C')
    self._ptr = cupy_data.data.ptr
    self._owner = cupy_data if cupy_data.flags.owndata \
        else data
    self.order = 'C'
    self.strides = cupy_data.strides
Since the newly created array should have a conformant CAI, could we just call the constructor again?
Suggested change
- else:
-     cupy_data = cp.array(data, copy=True, order='C')
-     self._ptr = cupy_data.data.ptr
-     self._owner = cupy_data if cupy_data.flags.owndata \
-         else data
-     self.order = 'C'
-     self.strides = cupy_data.strides
+ else:
+     cupy_data = cp.array(data, copy=True, order='C')
+     super().__init__(data=cupy_data)
python/cuml/common/memory_utils.py
Outdated
itemsize = cp.dtype(dtype).itemsize
shape = list(shape)
strides = list(strides)
I don't think these copies are needed; you can just use the sliced views of those sequences directly.
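A minimal sketch of the point above, assuming `shape` arrives as a tuple or list. The helper name `reversed_leading_dims` is hypothetical and not part of cuML. Slicing already builds a new object, so the caller's sequence is left untouched even without a `list(...)` copy first:

```python
def reversed_leading_dims(shape):
    """Hypothetical helper: slice the input directly, with no list() copy first."""
    # Slicing a tuple or list always returns a new object, so the caller's
    # sequence is never modified by the two slices below.
    return shape[::-1][:-1]


shape = (2, 3, 4)
dims = reversed_leading_dims(shape)
print(dims)   # (4, 3)
print(shape)  # (2, 3, 4), unchanged
```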
python/cuml/common/memory_utils.py
Outdated
    shape = shape[::-1]
    for dim_size in shape[:-1]:
        strides.append(dim_size * strides[-1])
    strides = strides[::-1]

else:
    raise ValueError('Order must be "F" or "C". ')
Suggested change
- raise ValueError('Order must be "F" or "C". ')
+ raise ValueError('Order must be "F" or "C".')
python/cuml/common/memory_utils.py
Outdated
shape = shape[::-1]
for dim_size in shape[:-1]:
    strides.append(dim_size * strides[-1])
strides = strides[::-1]
We could consider using a combination of itertools.accumulate() and operator.mul() to compute the strides a bit more succinctly:
from itertools import accumulate
from operator import mul
f_strides = list(accumulate(shape[:-1], func=mul, initial=item_size))
c_strides = list(accumulate(shape[:0:-1], func=mul, initial=item_size))[::-1]
Edit: If you like the suggestion, I can run some benchmarks to ensure that this isn't slower by any chance.
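For reference, here is a small self-contained sketch of how the two one-liners behave on a concrete shape. The wrapper name `strides_with_accumulate` and its `order` argument are assumptions made for illustration, not code from the PR. Note that the `initial` keyword of `accumulate()` requires Python 3.8 or newer.

```python
from itertools import accumulate
from operator import mul


def strides_with_accumulate(shape, item_size, order="C"):
    """Hypothetical wrapper around the two accumulate() one-liners."""
    if order == "F":
        # First dimension varies fastest: running product of the leading dims.
        return tuple(accumulate(shape[:-1], func=mul, initial=item_size))
    if order == "C":
        # Last dimension varies fastest: running product of the trailing dims,
        # taken in reverse order and then flipped back.
        return tuple(accumulate(shape[:0:-1], func=mul, initial=item_size))[::-1]
    raise ValueError('Order must be "F" or "C".')


print(strides_with_accumulate((2, 3, 4), 8, order="F"))  # (8, 16, 48)
print(strides_with_accumulate((2, 3, 4), 8, order="C"))  # (96, 32, 8)
```

For a shape of (2, 3, 4) with 8-byte items, these match the byte strides of a dense Fortran-ordered and C-ordered array respectively.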
Thanks for the review. Sure, that could be interesting :)
Looks like loops are the fastest; however, I was able to apply a micro-optimization that is a bit faster (see below). Runtimes relative to the original loop implementation (lower is faster):
Loops: 100.00%
Loops (alt): 94.73%
With accumulate: 132.87%
I ran this a few times and the results appear pretty consistent.
Benchmark code
# Save as benchmark_strides.py so the timeit setup imports below resolve.
from itertools import accumulate
from operator import mul


def compute_strides(shape, item_size):
    # F-order, then C-order; results are discarded because only timing matters.
    tuple(accumulate(shape[:-1], func=mul, initial=item_size))
    tuple(accumulate(shape[:0:-1], func=mul, initial=item_size))[::-1]


def compute_strides_loops(shape, item_size):
    # F-order
    strides = [item_size]
    for dim_size in shape[:-1]:
        strides.append(dim_size * strides[-1])
    tuple(strides)

    # C-order: reverse the shape first, then drop its last entry
    strides = [item_size]
    shape = shape[::-1]
    for dim_size in shape[:-1]:
        strides.append(dim_size * strides[-1])
    tuple(strides[::-1])


def compute_strides_loops_alternative(shape, item_size):
    # F-order
    strides = [item_size]
    for dim_size in shape[:-1]:
        strides.append(dim_size * strides[-1])
    tuple(strides)

    # C-order: a single slice replaces the reverse-then-slice step
    strides = [item_size]
    for dim_size in shape[:0:-1]:
        strides.append(dim_size * strides[-1])
    tuple(strides[::-1])


if __name__ == "__main__":
    from timeit import timeit

    result_w_loops = timeit(
        "compute_strides((2, 3), 8)",
        setup="from benchmark_strides import compute_strides_loops as compute_strides",
    )
    result_w_loops_alt = timeit(
        "compute_strides((2, 3), 8)",
        setup="from benchmark_strides import compute_strides_loops_alternative as compute_strides",
    )
    result_w_accumulate = timeit(
        "compute_strides((2, 3), 8)",
        setup="from benchmark_strides import compute_strides",
    )

    # Report each runtime relative to the original loop implementation.
    print(f"Loops: {result_w_loops / result_w_loops:.2%}")
    print(f"Loops (alt): {result_w_loops_alt / result_w_loops:.2%}")
    print(f"With accumulate: {result_w_accumulate / result_w_loops:.2%}")
Micro-optimization:
# F-order
strides = [item_size]
for dim_size in shape[:-1]:
    strides.append(dim_size * strides[-1])
tuple(strides)

# C-order
strides = [item_size]
for dim_size in shape[:0:-1]:
    strides.append(dim_size * strides[-1])
tuple(strides[::-1])
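The micro-optimization relies on `shape[:0:-1]` yielding the same elements as `shape[::-1][:-1]` while building one intermediate sequence instead of two. A quick sketch to check that equivalence for the C-order case follows. Both function names are hypothetical and exist only for this comparison.

```python
def c_strides_original(shape, item_size):
    # Loop as in the reviewed diff: reverse a copy, then drop its last entry.
    strides = [item_size]
    reversed_shape = shape[::-1]
    for dim_size in reversed_shape[:-1]:
        strides.append(dim_size * strides[-1])
    return tuple(strides[::-1])


def c_strides_single_slice(shape, item_size):
    # Micro-optimized loop: one slice walks the shape backwards and stops
    # before index 0, which skips the first dimension.
    strides = [item_size]
    for dim_size in shape[:0:-1]:
        strides.append(dim_size * strides[-1])
    return tuple(strides[::-1])


for shape in [(5,), (2, 3), (2, 3, 4), (7, 1, 5, 2)]:
    assert c_strides_original(shape, 8) == c_strides_single_slice(shape, 8)

print(c_strides_single_slice((2, 3, 4), 8))  # (96, 32, 8)
```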
@gpucibot merge

Fixes rapidsai#4731

Authors:
  - Victor Lafargue (https://github.com/viclafargue)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4736
Fixes #4731