
ENH: Use buffered space in linalg.matrix_power #18137

Closed
yunfeim wants to merge 1 commit

Conversation

@yunfeim commented Jan 8, 2021

The linalg.matrix_power function allocates new space for each matrix multiplication that it performs. For large matrices, creating and using a buffer can lead to performance benefits.

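A minimal timing harness (not part of the PR) that could be used to check the claimed benefit; the 1000x1000 size and the use of np.matmul as a stand-in for the internal fmatmul are assumptions:

import timeit

import numpy as np

a = np.random.default_rng(0).random((1000, 1000))

def naive():
    # allocate a fresh array for each product
    return np.matmul(np.matmul(a, a), a)

def buffered():
    # reuse the intermediate product as the output buffer
    buf = np.matmul(a, a)
    return np.matmul(buf, a, out=buf)

print("naive:   ", min(timeit.repeat(naive, number=5, repeat=3)))
print("buffered:", min(timeit.repeat(buffered, number=5, repeat=3)))

(As the later discussion shows, ufunc overlap handling may make the two variants perform the same in practice.)
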
@@ -1039,6 +1039,18 @@ def tz(mat):
         if dt != object:
             tz(self.stacked.astype(dt))
 
+    def test_power_is_three(self, dt):
Review comment (@yunfeim, author):
The case of the exponent being 3 (a hard-coded case) was actually untested before.
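
A sketch of what the new test body might look like (hypothetical; only the test's name is visible in the hunk above, and names such as self.rshft_all, matmul, dot, and assert_equal are assumed from the surrounding test class):

    def test_power_is_three(self, dt):
        def tz(mat):
            mz = matrix_power(mat, 3)
            # compare against three explicit multiplications
            mmul = matmul if mat.dtype != object else dot
            assert_equal(mz, mmul(mmul(mat, mat), mat))
            assert_equal(mz.dtype, mat.dtype)
        for mat in self.rshft_all:
            tz(mat.astype(dt))
        if dt != object:
            tz(self.stacked.astype(dt))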

@rgommers (Member) commented:

Thanks @yunfeim. It would be helpful if you could add a benchmark for matrix_power in benchmarks/benchmarks/bench_linalg.py that shows the performance benefit.

-        return fmatmul(fmatmul(a, a), a)
+        # create and use buffered space
+        buffer = fmatmul(a, a)
+        return fmatmul(buffer, a, out=buffer)
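
As a quick sanity check (not from the PR), the buffered form produces the same result as the naive form, because np.matmul accepts an out= argument that aliases one of its inputs:

import numpy as np

a = np.random.default_rng(0).random((50, 50))
naive = np.matmul(np.matmul(a, a), a)
buf = np.matmul(a, a)
buffered = np.matmul(buf, a, out=buf)  # out aliases the first operand
assert np.allclose(naive, buffered)
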
Review comment (Member):

Seems clear that this will help.

Review comment (Member):

Do we even support in-place matrix multiply? (Does BLAS even support it?)

I honestly expect the matmul ufunc (and np.dot) will just run overlap detection and create an additional internal buffer here. (I am having problems with timeit not running my setup code right now).

Unless there are some clear timings, we should probably keep the new tests and maybe see how to improve the n > 3 case, where I expect something might work (if it is worth it). Otherwise, we first need to dig into avoiding the additional copy in np.dot and np.matmul (and making sure that is correct).
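
For reference (not from the thread), np.matmul does accept an out= that overlaps an input; the ufunc machinery detects the overlap and copies internally, so the result is correct even though the allocation is not actually saved:

import numpy as np

a = np.arange(4.0).reshape(2, 2)
expected = a @ a
np.matmul(a, a, out=a)  # overlap detected; NumPy buffers internally
assert np.array_equal(a, expected)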


     # Use binary decomposition to reduce the number of matrix multiplications.
     # Here, we iterate over the bits of n, from LSB to MSB, raise `a` to
     # increasing powers of 2, and multiply into the result as needed.
     z = result = None
     while n > 0:
-        z = a if z is None else fmatmul(z, z)
+        z = a.copy() if z is None else fmatmul(z, z, out=z)
Review comment (Member):

This is a little less obvious.

Review comment (Member):

This looks like a fairly clear win to me, although in principle it now does an extra copy that could be avoided.
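
For context, here is a self-contained sketch of the binary-decomposition loop this hunk modifies (simplified; matrix_power_sketch is a hypothetical name, np.matmul stands in for the dtype-dispatched fmatmul, and n is assumed to be a positive integer since NumPy handles n <= 0 separately):

import numpy as np

def matrix_power_sketch(a, n):
    # Iterate over the bits of n from LSB to MSB, squaring z at each
    # step and folding it into result whenever the current bit is set.
    fmatmul = np.matmul
    z = result = None
    while n > 0:
        # z.copy() ensures result never aliases z, which is squared
        # in place on later iterations
        z = a.copy() if z is None else fmatmul(z, z, out=z)
        n, bit = divmod(n, 2)
        if bit:
            result = z.copy() if result is None else fmatmul(result, z, out=result)
    return result

a = np.random.default_rng(0).random((8, 8))
assert np.allclose(matrix_power_sketch(a, 13), np.linalg.matrix_power(a, 13))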

@mattip (Member) commented Feb 22, 2021

@yunfeim is there a benchmark that is improved by this code? If not, please add one. In any case, you should show a before/after comparison of speed or memory usage.

Base automatically changed from master to main March 4, 2021 02:05
@seberg added the '52 - Inactive (pending author response)' and '55 - Needs work' labels on Jun 10, 2022
@mattip (Member) commented Mar 28, 2023

This benchmark shows no change in this PR:

# imports assumed from the surrounding benchmarks file
import numpy as np

from .common import Benchmark, get_squares_


class MatrixPower(Benchmark):
    def setup(self):
        self.a = get_squares_()['float64']

    def time_matrix_power_3(self):
        np.linalg.matrix_power(self.a, 3)

I think ufunc overlap checks make a copy of out before reaching the linalg code, so deeper refactoring is needed to avoid copying. Perhaps OpenBLAS can avoid some extra allocations, but in general matmul(a, a, out=a) will have to copy a.

Unfortunately I think we should close this PR.

@seberg (Member) commented Mar 28, 2023

I agree, the current state is not useful. Parts of it could be, but there has been no follow-up in a long time.

@seberg closed this Mar 28, 2023