ENH: Use buffered space in linalg.matrix_power #18137
Conversation
The linalg.matrix_power function allocates new space for each matrix multiplication that it performs. For large matrices, creating and using a buffer can lead to performance benefits.
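As a rough sketch of the idea (using `np.matmul` directly; the actual change operates on the `fmatmul` binding inside `matrix_power`):

```python
import numpy as np

a = np.random.rand(500, 500)

# Unbuffered: each matmul allocates a fresh result array.
a_cubed = np.matmul(np.matmul(a, a), a)

# Buffered: reuse the intermediate array via the `out=` argument.
buffer = np.matmul(a, a)
a_cubed_buffered = np.matmul(buffer, a, out=buffer)

assert np.allclose(a_cubed, a_cubed_buffered)
```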
```diff
@@ -1039,6 +1039,18 @@ def tz(mat):
         if dt != object:
             tz(self.stacked.astype(dt))
 
+    def test_power_is_three(self, dt):
```
The case of the exponent being 3 (a hard-coded case) was actually untested before.
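A minimal standalone version of such a test might look like this (my sketch, not the PR's actual test body):

```python
import numpy as np
from numpy.linalg import matrix_power
from numpy.testing import assert_equal

def test_power_is_three():
    # n == 3 exercises the previously untested hard-coded branch.
    mat = np.arange(9).reshape(3, 3)
    expected = np.matmul(np.matmul(mat, mat), mat)
    assert_equal(matrix_power(mat, 3), expected)
```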
Thanks @yunfeim. It would be helpful if you could add a benchmark for `matrix_power`.
```diff
-        return fmatmul(fmatmul(a, a), a)
+        # create and use buffered space
+        buffer = fmatmul(a, a)
+        return fmatmul(buffer, a, out=buffer)
```
Seems clear that this will help.
Do we even support in-place matrix multiply? (Does BLAS even support it?) I honestly expect the `matmul` ufunc (and `np.dot`) will just run overlap detection and create an additional internal buffer here. (I am having problems with `timeit` not running my setup code right now.)

Unless there are some clear timings, we should probably keep the new tests and maybe see how to improve the `n > 3` case, where I expect something might work (if it is worth it). Otherwise, we first need to dig into avoiding the additional copy in `np.dot` and `np.matmul` (and making sure that is correct).
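For what it is worth, the kind of timing comparison being asked for could be collected along these lines (a sketch; matrix size and repeat count are arbitrary choices of mine):

```python
import timeit

setup = "import numpy as np; a = np.random.rand(1000, 1000); buf = np.empty_like(a)"

# Fresh allocation on every call.
t_alloc = timeit.timeit("np.matmul(a, a)", setup=setup, number=20)
# Writing into a preallocated, non-overlapping buffer.
t_out = timeit.timeit("np.matmul(a, a, out=buf)", setup=setup, number=20)
# Output aliases the input; overlap detection may copy internally.
# (Note: `a` is overwritten across iterations, which is fine for a
# rough timing but makes the resulting values meaningless.)
t_alias = timeit.timeit("np.matmul(a, a, out=a)", setup=setup, number=20)

print(f"fresh allocation:    {t_alloc:.3f}s")
print(f"out=, no overlap:    {t_out:.3f}s")
print(f"out= aliasing input: {t_alias:.3f}s")
```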
```diff
     # Use binary decomposition to reduce the number of matrix multiplications.
     # Here, we iterate over the bits of n, from LSB to MSB, raise `a` to
     # increasing powers of 2, and multiply into the result as needed.
     z = result = None
     while n > 0:
-        z = a if z is None else fmatmul(z, z)
+        z = a.copy() if z is None else fmatmul(z, z, out=z)
```
This is a little less obvious.
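For context, here is a self-contained sketch of the binary-decomposition loop this hunk amends (simplified; `fmatmul` stands in for the operator bound inside `matrix_power`, and the copies guard against `out=` aliasing):

```python
import numpy as np

def matrix_power_sketch(a, n, fmatmul=np.matmul):
    # Exponentiation by squaring: z holds a**(2**i) at bit i of n.
    z = result = None
    while n > 0:
        # Copy `a` on the first pass so the caller's array is never
        # used as an in-place `out=` buffer; afterwards square z into
        # its own storage.
        z = a.copy() if z is None else fmatmul(z, z, out=z)
        if n & 1:
            # `result` must not alias `z`, since z is squared in place
            # on later iterations; hence an extra copy relative to the
            # unbuffered version.
            result = z.copy() if result is None else fmatmul(result, z)
        n >>= 1
    return result

# Quick check against the library implementation.
m = np.random.rand(4, 4)
assert np.allclose(matrix_power_sketch(m, 5), np.linalg.matrix_power(m, 5))
```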
This looks like a fairly clear win to me, although in principle this now does an extra copy that would be possible to avoid.
@yunfeim is there a benchmark that is improved by this code? If not, please add one. In any case, you should show a before/after comparison of speed or memory usage.
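For instance, an ASV-style benchmark along these lines could measure it (a sketch following the conventions of NumPy's `benchmarks/` suite; the class name and parameters are my assumptions):

```python
import numpy as np

class MatrixPower:
    # Exponents chosen to hit both the hard-coded (n <= 3) and the
    # binary-decomposition (n > 3) paths.
    params = [2, 3, 5, 9, 16]
    param_names = ['n']

    def setup(self, n):
        self.a = np.random.rand(300, 300)

    def time_matrix_power(self, n):
        np.linalg.matrix_power(self.a, n)
```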
This benchmark shows no change in this PR. I think the ufunc overlap checks make a copy of the overlapping operand anyway. Unfortunately, I think we should close this PR.
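One way to see that overlap handling in action (my own check, not from the thread): if `matmul` wrote directly into an aliased `out`, partially written values would corrupt the input mid-computation, so a matching result implies an internal temporary was used.

```python
import numpy as np

z = np.random.rand(200, 200)
reference = np.matmul(z, z)
np.matmul(z, z, out=z)  # out aliases both inputs
assert np.allclose(reference, z)
```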
I agree, the current state is not useful. Parts of it could be, but there has been no follow-up in a long time.